## Python: How to Find All Occurrences of a Substring
In Python, checking whether a substring exists within a larger string is a common task. The `in` operator tells you whether a substring is present, and `str.find()` returns the index of its first occurrence, but you often need to locate **all** occurrences of a substring along with their starting indices.
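As a quick refresher on the two built-ins mentioned above, the sketch below (the sample `text` is only illustrative) shows that `in` returns a boolean, while `str.find()` returns the first matching index or `-1`. It also shows the optional `start` argument of `str.find()`, which is the building block the loop in Method 1 relies on.

```python
text = "hello world, hello python"

# The `in` operator only answers "is it there?"
print("python" in text)        # True

# str.find() returns the index of the FIRST occurrence only
print(text.find("hello"))      # 0

# Passing a start index begins the search later in the string
print(text.find("hello", 1))   # 13

# find() returns -1 when the substring is absent
print(text.find("java"))       # -1
```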
This tutorial covers how to find all occurrences of a substring in Python using both the native `str.find()` method and the powerful `re` (Regular Expression) module.
---
## Method 1: Using the `str.find()` Method
The `str.find(sub, start)` method returns the lowest index at or after `start` where `sub` is found, or `-1` if it is not found. Calling it in a `while` loop and advancing `start` after each match lets us collect every occurrence.
### Code Example
```python
def find_all_substrings(main_string, sub_string):
    start = 0
    positions = []
    while True:
        # Search for the substring starting from the 'start' index
        start = main_string.find(sub_string, start)
        # If find() returns -1, no more occurrences exist
        if start == -1:
            break
        # Record the found index
        positions.append(start)
        # Move the index forward by 1 to find overlapping matches
        # (Or use 'start += len(sub_string)' to find non-overlapping matches)
        start += 1
    return positions

# Define the main string and the substring to search for
main_string = "hello world, hello python, hello programming"
sub_string = "hello"
# Execute the function
positions = find_all_substrings(main_string, sub_string)
print("Substring positions:", positions)
```
### Output
```text
Substring positions: [0, 13, 27]
```
### Code Explanation
1. **Function Parameters**: The `find_all_substrings` function accepts two arguments: `main_string` (the text to search within) and `sub_string` (the target text to find).
2. **Initialization**: The `start` variable tracks the current search position (initially `0`), and `positions` is an empty list to store the index results.
3. **The Loop**: A `while True` loop runs continuously until no more matches are found.
4. **`main_string.find(sub_string, start)`**: This searches for `sub_string` starting at the index specified by `start`.
5. **Termination**: If `find()` returns `-1`, the loop breaks.
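6. **Index Increment**: If a match is found, its index is appended to `positions`. We then increment `start` by `1`, so the next search can begin inside the previous match and overlapping occurrences are found as well. Incrementing by `len(sub_string)` instead would return only non-overlapping matches.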
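If you want the non-overlapping behavior mentioned in the code comment, one possible sketch is below. The function name `find_all_non_overlapping` is hypothetical, and the empty-substring guard is an added assumption: with an empty `sub_string`, advancing by `len(sub_string)` would never move the search forward, so the loop would never end.

```python
def find_all_non_overlapping(main_string, sub_string):
    """Return start indices of non-overlapping occurrences of sub_string (hypothetical helper)."""
    # Guard: an empty pattern would never advance the search position
    if not sub_string:
        raise ValueError("sub_string must not be empty")
    positions = []
    start = main_string.find(sub_string)
    while start != -1:
        positions.append(start)
        # Skip past the whole match so it cannot overlap the next one
        start = main_string.find(sub_string, start + len(sub_string))
    return positions


print(find_all_non_overlapping("anana", "ana"))  # [0]
print(find_all_non_overlapping("aaaa", "aa"))    # [0, 2]
```

For comparison, `find_all_substrings("aaaa", "aa")` (which advances by `1`) returns `[0, 1, 2]`.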
---
## Method 2: Using Regular Expressions (`re` module)
For a more concise approach, you can use Python's built-in `re` module. The `re.finditer()` function returns an iterator of match objects for all non-overlapping matches in a string, and each match object's `.start()` method gives the index where that match begins.
### Code Example (Non-Overlapping)
```python
import re
main_string = "hello world, hello python, hello programming"
sub_string = "hello"
# re.finditer returns an iterator yielding match objects
positions = [match.start() for match in re.finditer(re.escape(sub_string), main_string)]
print("Substring positions (Regex):", positions)
```
### Output
```text
Substring positions (Regex): [0, 13, 27]
```
> **Note**: `re.escape()` is used to escape any special characters in your substring, ensuring it is treated as a literal string during the regex search.
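To see why escaping matters, the sketch below searches for a substring containing a `.` (the sample text and pattern are illustrative). Without `re.escape()`, the `.` acts as a regex wildcard and matches any character, so an unintended match appears.

```python
import re

text = "version 1.5 and 125"
pattern = "1.5"

# Unescaped, '.' is a wildcard, so "125" also matches
unescaped = [m.start() for m in re.finditer(pattern, text)]

# Escaped, the pattern matches only the literal text "1.5"
escaped = [m.start() for m in re.finditer(re.escape(pattern), text)]

print(unescaped)  # [8, 16]
print(escaped)    # [8]
```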
---
## Considerations: Overlapping vs. Non-Overlapping Matches
When searching for substrings, decide how you want to handle overlapping patterns. For example, if you search for `"ana"` in `"anana"`:
* **Overlapping matches** return indices `[0, 2]` (one match at `[ana]na` and another at `an[ana]`).
* **Non-overlapping matches** return only `[0]` (after matching `[ana]na`, only `na` remains, which cannot contain `ana`).
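The built-in `str.count()` method counts non-overlapping occurrences, which makes it a handy way to observe the difference. In the sketch below, `count_overlapping` is a hypothetical helper that reuses the advance-by-one idea from Method 1.

```python
def count_overlapping(text, sub):
    """Count occurrences of sub in text, allowing matches to overlap (hypothetical helper)."""
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Advance by one character so a match may begin inside the previous one
        start = text.find(sub, start + 1)
    return count


text = "anana"
print(text.count("ana"))               # 1 (non-overlapping)
print(count_overlapping(text, "ana"))  # 2 (overlapping)
```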
### Handling Overlapping Matches with Regex
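Python's `re` module, like most regex engines, does not return overlapping matches by default, because each match consumes the characters it covers. To find overlapping matches, you can use a **positive lookahead assertion** (`(?=...)`), which checks for the pattern at each position without consuming any characters: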
```python
import re
main_string = "anana"
sub_string = "ana"
# Using positive lookahead to capture overlapping matches
positions = [match.start() for match in re.finditer(f'(?={re.escape(sub_string)})', main_string)]
print("Overlapping positions:", positions)
```
### Output
```text
Overlapping positions: [0, 2]
```
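Because a lookahead consumes no characters, each match it produces is an empty string, so `match.group()` does not return the text you searched for. A small sketch (same sample string as above): wrapping the pattern in a capturing group inside the lookahead exposes the matched text through `match.group(1)`, which is mainly useful when the pattern is not a fixed literal.

```python
import re

main_string = "anana"

# The lookahead match itself is zero-width: group(0) is an empty string
zero_width = [m.group(0) for m in re.finditer(r"(?=ana)", main_string)]
print(zero_width)  # ['', '']

# A capturing group inside the lookahead records the text it checked
matches = [(m.start(), m.group(1)) for m in re.finditer(r"(?=(ana))", main_string)]
print(matches)  # [(0, 'ana'), (2, 'ana')]
```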