Regular Expressions - Grouping and Referencing
In regular expressions, grouping allows us to treat multiple characters as a single unit, much like parentheses in mathematics. Grouping primarily serves two purposes:
- Treat multiple characters as a whole: quantifiers (such as *, +, ?, {n}) can be applied to the whole unit.
- Capture matched content: the text matched by this part can be referenced or extracted later.
Basic Syntax
Use parentheses () to create a group:
(expression)
For example, (ab)+ can match "ab", "abab", "ababab", etc., but cannot match "a" or "b".
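A quick way to check this behavior is Python's standard `re.fullmatch`, which succeeds only when the entire string matches the pattern. This is a minimal sketch; the sample strings are chosen for illustration.

```python
import re

# The group (ab) is treated as one unit, so + repeats the whole "ab"
pattern = re.compile(r'(ab)+')

samples = ["ab", "abab", "ababab", "a", "b", "aba"]
# fullmatch succeeds only if the entire string is one or more "ab" units
results = {s: bool(pattern.fullmatch(s)) for s in samples}
print(results)

# Without parentheses, + applies only to the preceding character "b"
no_group = bool(re.fullmatch(r'ab+', 'abbb'))
print(no_group)  # True
```

The last line shows why the parentheses matter: `ab+` accepts "abbb", while `(ab)+` would reject it.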
Group Types
There are several different types of groups in regular expressions:
1. Capturing Group
The most common form of grouping. It captures the matched content and assigns it a number (starting from 1).
Example
(\d{4})-(\d{2})-(\d{2}) # Match date format YYYY-MM-DD
This expression creates 3 groups:
- Group 1: 4-digit year
- Group 2: 2-digit month
- Group 3: 2-digit day
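In Python, the groups can be read back from the match object by number, as in this short sketch using the standard `re` module. Note that the pattern only checks digit counts, so a string such as "2023-99-99" would match as well.

```python
import re

# Three capturing groups: year, month and day
date_pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

m = date_pattern.match("2023-05-15")
if m:
    print(m.group(0))   # the whole match: 2023-05-15
    print(m.group(1))   # group 1: 2023
    print(m.group(2))   # group 2: 05
    print(m.group(3))   # group 3: 15
    print(m.groups())   # all groups as a tuple: ('2023', '05', '15')
```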
2. Non-capturing Group
Use the syntax (?:expression) to group without capturing.
Example
(?:Mr|Ms|Mrs)\. (\w+) # Matches "Mr. Smith" but only captures "Smith"
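The following sketch, using Python's `re` module and an illustrative input string, shows that the title is matched but does not appear among the captured groups.

```python
import re

# (?:Mr|Ms|Mrs) groups the alternatives without capturing them;
# \. matches a literal dot, and (\w+) is the only capturing group
pattern = re.compile(r'(?:Mr|Ms|Mrs)\. (\w+)')

m = pattern.search("Dear Mrs. Smith,")
print(m.group(0))   # Mrs. Smith
print(m.groups())   # ('Smith',) - the title was not captured
```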
3. Named Capturing Group
Assign a name to a group for better readability (syntax may vary across languages).
Python example:
Example
(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})
JavaScript example:
Example
(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
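A sketch of the Python form in use: the group names `year`, `month` and `day` are illustrative choices, and the match object exposes them through `group(name)`, subscripting, and `groupdict()`. Named groups still receive numbers, so numeric access keeps working.

```python
import re

# (?P<name>...) is Python's named-group syntax; the names here are illustrative
pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')

m = pattern.match("2023-05-15")
print(m.group('year'))   # 2023
print(m['month'])        # 05 (subscript access also works)
print(m.groupdict())     # {'year': '2023', 'month': '05', 'day': '15'}
print(m.group(1))        # named groups are still numbered: 2023
```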
Group Referencing
One of the most powerful features of groups is the ability to reference previously matched content, either inside the regular expression itself or outside it, for example in replacement strings.
1. Backreference
Reference a previous group inside the regular expression with a backslash followed by the group number (\1, \2, ...):
Example
(\w+) \1 # Matches repeated words, e.g., "hello hello"
This pattern matches two identical words separated by a space.
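The sketch below tests the pattern with Python's `re.search` on a few illustrative strings. The last case shows a limitation: without word boundaries, `\1` can match the beginning of a longer word.

```python
import re

# \1 must match exactly the same text that group 1 captured
pattern = re.compile(r'(\w+) \1')

print(bool(pattern.search("hello hello")))   # True
print(bool(pattern.search("hello world")))   # False
# "the" is captured, and \1 then matches the start of "theory"
print(bool(pattern.search("the theory")))    # True
```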
2. Named Backreference
For named groups, use the name for referencing:
Example
(?P<word>\w+) (?P=word) # Python syntax
(?<word>\w+) \k<word> # JavaScript syntax
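Here is the Python variant applied to a sample sentence. This is a minimal sketch; the group name `word` is simply the label used in the pattern.

```python
import re

# (?P=word) refers back to the text captured by the group named "word"
pattern = re.compile(r'(?P<word>\w+) (?P=word)')

m = pattern.search("it was very very cold")
print(m.group(0))        # very very
print(m.group('word'))   # very
```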
3. Replacement Reference
Reference group content in replacement operations:
Python example:
Example
import re
text = "2023-05-15"
new_text = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', text)
# Result: "05/15/2023"
JavaScript example:
Example
let text = "2023-05-15";
let newText = text.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1');
// Result: "05/15/2023"
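Python's `re.sub` also accepts `\g<name>` for named groups and `\g<number>` for numbered ones (JavaScript's counterpart for named groups is `$<name>`). The sketch below uses illustrative group names; the second call shows why `\g<1>` is useful: writing `\10` would be read as a reference to group 10, not group 1 followed by a literal "0".

```python
import re

text = "2023-05-15"

# In Python replacement strings, \g<name> refers to a named group
us_style = re.sub(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})',
                  r'\g<month>/\g<day>/\g<year>', text)
print(us_style)   # 05/15/2023

# \g<1> keeps the group number separate from a digit that follows it
padded = re.sub(r'(\d+)', r'\g<1>0', "price: 5")
print(padded)     # price: 50
```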
Practical Application Examples
Example 1: Matching HTML Tags
Example
<([a-z][a-z0-9]*)\b[^>]*>.*?</\1>
This pattern can match paired HTML tags (like <div>...</div>), where:
- ([a-z][a-z0-9]*) captures the tag name
- \1 references the previously captured tag name, so the closing tag must match the opening tag
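A minimal Python sketch using `re.finditer` on an illustrative HTML string. Keep in mind that `.` does not match newlines unless `re.DOTALL` is set, and that nested tags with the same name (such as a `<div>` inside a `<div>`) will not pair correctly. A regular expression is a quick tool here, not a replacement for an HTML parser.

```python
import re

# Group 1 captures the tag name; \1 requires the closing tag to use the same name
tag_pattern = re.compile(r'<([a-z][a-z0-9]*)\b[^>]*>.*?</\1>')

html = '<div>hello</div><p class="note">world</p><b>bold</i>'
matches = [(m.group(1), m.group(0)) for m in tag_pattern.finditer(html)]
for name, whole in matches:
    print(name, '->', whole)
# The mismatched <b>...</i> pair is not reported
```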
Example 2: Finding Repeated Words
Example
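\b(\w+)\b\s+\1\b
Finds consecutive repeated words in text, such as "is is". The \b word boundaries keep it from matching inside longer words (for example, "the theory").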
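The sketch below runs the pattern over an illustrative sentence with Python's `re` module. `findall` returns only the captured group, while `finditer` gives access to the full match. Adding `re.IGNORECASE` also lets the backreference match repeats that differ only in case.

```python
import re

# \b word boundaries stop the match from landing inside longer words
repeated = re.compile(r'\b(\w+)\b\s+\1\b')

text = "This is is a test test of the theory"
print(repeated.findall(text))                          # ['is', 'test']
print([m.group(0) for m in repeated.finditer(text)])   # ['is is', 'test test']

# Case-insensitive variant: "The the" now counts as a repeat
ci = re.compile(r'\b(\w+)\b\s+\1\b', re.IGNORECASE)
print(bool(repeated.search("The the end")))   # False
print(bool(ci.search("The the end")))         # True
```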
Example 3: Date Format Conversion
Python code:
Example
import re
date = "2023-12-25"
# Convert YYYY-MM-DD to DD/MM/YYYY
new_date = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', date)
print(new_date) # Output: 25/12/2023
Advanced Applications of Grouping
1. Conditional Matching
Some regex engines (for example, Python's re module, PCRE and .NET, but not JavaScript) support conditional matching based on whether a group has matched:
Example
(?(1)true-pattern|false-pattern)
This means: if group 1 participated in the match, the engine tries true-pattern; otherwise it tries false-pattern.
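Python's `re` module supports this syntax. The pattern below is adapted from the example in the Python `re` documentation: an email address may be wrapped in angle brackets, but only if both brackets are present. It is a toy email pattern, used here only to show the conditional.

```python
import re

# (<)? optionally captures an opening bracket as group 1.
# (?(1)>|$): if group 1 matched, require '>', otherwise require end of string.
pattern = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

samples = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
results = {s: bool(pattern.match(s)) for s in samples}
for s, ok in results.items():
    print(s, ok)
```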
2. Balancing Groups (Advanced Feature)
Used for matching nested structures (such as balanced parentheses). This feature is engine-specific: .NET supports balancing groups, while Python's re module and JavaScript do not.
Common Issues and Pitfalls
- Overusing groups: every capturing group makes the engine record what it matched, so unnecessary groups add overhead and clutter the group numbering
  - Bad example: (a)|(b) (if capturing is not needed, use (?:a|b) instead)
- Confusing group numbering:
  - Group numbers are assigned from 1 in the order of the opening parentheses
  - Non-capturing groups do not participate in numbering
- Greedy matching issues:
  - <.*>  # Will greedily match up to the last >
  - Should use: <.*?>  # Non-greedy matching
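The numbering and greediness pitfalls can both be checked with a few lines of Python. This sketch uses made-up inputs: the nested pattern shows numbering by opening parenthesis, and the `findall` calls compare greedy and non-greedy quantifiers.

```python
import re

# Numbering follows opening parentheses from left to right;
# (?:b) is non-capturing, so it gets no number
m = re.match(r'((a)(?:b)(c))', 'abc')
groups = m.groups()
print(groups)   # ('abc', 'a', 'c'): group 1 is the outer group

html = '<b>bold</b>'
greedy = re.findall(r'<.*>', html)
lazy = re.findall(r'<.*?>', html)
print(greedy)   # ['<b>bold</b>']: .* runs to the last >
print(lazy)     # ['<b>', '</b>']: .*? stops at the first >
```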
Practice Challenges
- Write a regular expression to match repeated email usernames (e.g., user@domain.com;user@domain.com)
- Convert phone number format from (123) 456-7890 to 123-456-7890
- Extract all attributes from an HTML tag (e.g., src and alt from <img src="..." alt="...">)
Summary of Key Points
| Concept | Syntax | Purpose |
|---|---|---|
| Capturing Group | (pattern) | Capture matched content and assign a number |
| Non-capturing Group | (?:pattern) | Group without capturing |
| Named Group | (?P<name>pattern) (Python), (?<name>pattern) (JavaScript) | Assign a name to the group |
| Backreference | \1, \2, etc. | Reference a previously matched group |
| Named Backreference | (?P=name) (Python), \k<name> (JavaScript) | Reference a group by name |
| Replacement Reference | $1, $2 (JavaScript) or \1, \2 (Python) | Reference groups in replacement strings |
Mastering the grouping and referencing features of regular expressions allows you to:
- Build more complex matching patterns
- Extract and process specific parts of strings
- Implement intelligent string transformations
- Validate complex text structures