
Python Find Substring

## Python: How to Find All Occurrences of a Substring

In Python, finding whether a substring exists within a larger string is a common task. The `in` operator and the `str.find()` method work well for checking membership or locating the first occurrence. However, you often need to locate **all** occurrences of a substring along with their starting indices.

This tutorial covers how to find all occurrences of a substring in Python using both the built-in `str.find()` method and the `re` (regular expression) module.

---

## Method 1: Using the `str.find()` Method

The `str.find()` method returns the lowest index in the string where the substring is found. If the substring is not found, it returns `-1`. By calling it inside a `while` loop and advancing the starting search index after each match, we can find all occurrences.

### Code Example

```python
def find_all_substrings(main_string, sub_string):
    start = 0
    positions = []

    while True:
        # Search for the substring starting from the 'start' index
        start = main_string.find(sub_string, start)

        # If find() returns -1, no more occurrences exist
        if start == -1:
            break

        # Record the found index
        positions.append(start)

        # Move the index forward by 1 to find overlapping matches
        # (Or use 'start += len(sub_string)' to find non-overlapping matches)
        start += 1

    return positions

# Define the main string and the substring to search for
main_string = "hello world, hello python, hello programming"
sub_string = "hello"

# Execute the function
positions = find_all_substrings(main_string, sub_string)
print("Substring positions:", positions)
```

### Output

```text
Substring positions: [0, 13, 27]
```

### Code Explanation

1. **Function Parameters**: The `find_all_substrings` function accepts two arguments: `main_string` (the text to search within) and `sub_string` (the target text to find).
2. **Initialization**: The `start` variable tracks the current search position (initially `0`), and `positions` is an empty list that stores the resulting indices.
3. **The Loop**: A `while True` loop runs until no more matches are found.
4. **`main_string.find(sub_string, start)`**: This searches for `sub_string` starting at the index specified by `start`.
5. **Termination**: If `find()` returns `-1`, the loop breaks.
6. **Index Increment**: If a match is found, its index is appended to `positions`. We then increment `start` by `1`, so the next search begins one character after the start of the current match. This also allows overlapping occurrences to be found.

---

## Method 2: Using Regular Expressions (`re` module)

For a more concise and Pythonic approach, you can use Python's built-in `re` module. The `re.finditer()` function returns an iterator that yields match objects for all non-overlapping matches in a string.

### Code Example (Non-Overlapping)

```python
import re

main_string = "hello world, hello python, hello programming"
sub_string = "hello"

# re.finditer returns an iterator yielding match objects;
# match.start() gives the index where each match begins
positions = [match.start() for match in re.finditer(re.escape(sub_string), main_string)]
print("Substring positions (Regex):", positions)
```

### Output

```text
Substring positions (Regex): [0, 13, 27]
```

> **Note**: `re.escape()` escapes any special characters in your substring, ensuring it is treated as a literal string during the regex search.

---

## Considerations: Overlapping vs. Non-Overlapping Matches

When searching for substrings, you need to decide how to handle overlapping patterns. For example, if you search for `"ana"` in `"anana"`:

* **Overlapping matches** should return indices `[0, 2]` (finding both `"ana"na` and `an"ana"`).
* **Non-overlapping matches** should return index `[0]` (finding `"ana"na` and leaving only `"na"` remaining).

### Handling Overlapping Matches with Regex

Python's `re` module, like most regex engines, does not return overlapping matches by default, because each new search resumes at the end of the previous match.
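Here is a quick check of that behavior. The plain escaped pattern from Method 2 finds only the first `"ana"` in `"anana"`. The first match consumes indices 0 to 2, so the search resumes at index 3, where only `"na"` remains:

```python
import re

main_string = "anana"
sub_string = "ana"

# Without a lookahead, each match consumes its characters, so the scan
# resumes after index 3 and the second "ana" (starting at index 2) is skipped
positions = [match.start() for match in re.finditer(re.escape(sub_string), main_string)]
print("Non-overlapping positions:", positions)  # prints: Non-overlapping positions: [0]
```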
To find overlapping matches using regular expressions, you can use a **positive lookahead assertion** (`(?=...)`). The lookahead checks for the substring but matches an empty string and consumes no characters. As a result, the next search can begin inside the previous occurrence:

```python
import re

main_string = "anana"
sub_string = "ana"

# Using positive lookahead to capture overlapping matches
positions = [match.start() for match in re.finditer(f'(?={re.escape(sub_string)})', main_string)]
print("Overlapping positions:", positions)
```

### Output

```text
Overlapping positions: [0, 2]
```
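The `str.find()` loop from Method 1 can cover both behaviors too. Below is a minimal sketch that adds a hypothetical `overlapping` parameter, which is not part of the original function. The parameter chooses between advancing by `1` and advancing by `len(sub_string)`. The sketch also rejects an empty substring, because with a step of `len("")`, which is `0`, the search index would never move and the loop would never end.

```python
def find_all_substrings(main_string, sub_string, overlapping=True):
    """Return the starting index of every occurrence of sub_string.

    overlapping (hypothetical parameter): True advances one character past
    the start of each match; False skips past the whole match.
    """
    if not sub_string:
        # An empty substring matches at every index; with a step of 0 the
        # non-overlapping search would repeat forever, so reject it up front
        raise ValueError("sub_string must not be empty")

    step = 1 if overlapping else len(sub_string)
    positions = []
    start = main_string.find(sub_string)
    while start != -1:
        positions.append(start)
        # Resume the search just after the current match start (or its end)
        start = main_string.find(sub_string, start + step)
    return positions


print(find_all_substrings("anana", "ana"))                     # [0, 2]
print(find_all_substrings("anana", "ana", overlapping=False))  # [0]
```

Raising `ValueError` for an empty substring is one design choice among several. Returning an empty list would also avoid the infinite loop.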