

Regular Expressions - Grouping and Referencing

In regular expressions, grouping allows us to treat multiple characters as a single unit, much like parentheses in mathematics. Grouping primarily serves two purposes:

  1. Treat multiple characters as a whole: Quantifiers (such as *, +, ?, {n}) can be applied to this whole unit.
  2. Capture matched content: The matched content of this part can be referenced or extracted later.

Basic Syntax

Use parentheses () to create a group:

(expression)

For example, (ab)+ can match "ab", "abab", "ababab", etc., but cannot match "a" or "b".
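The effect of applying a quantifier to a whole group can be checked with a minimal Python sketch. The test strings and variable names are chosen here for illustration only.

```python
import re

grouped = re.compile(r"(ab)+")   # "+" repeats the whole unit "ab"
ungrouped = re.compile(r"ab+")   # "+" repeats only the "b"

# fullmatch succeeds only when the entire string fits the pattern.
samples = ["ab", "abab", "ababab", "a", "b", "aba"]
results = {s: bool(grouped.fullmatch(s)) for s in samples}
print(results)

# Without the group, the quantifier binds to the last character only.
print(bool(ungrouped.fullmatch("abbb")))  # True
print(bool(grouped.fullmatch("abbb")))    # False
```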


Group Types

There are several different types of groups in regular expressions:

1. Capturing Group

The most common form of grouping. It captures the matched content and assigns it a number (starting from 1).

Example

(\d{4})-(\d{2})-(\d{2}) # Match date format YYYY-MM-DD

This expression creates 3 groups:

  • Group 1: 4-digit year
  • Group 2: 2-digit month
  • Group 3: 2-digit day
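As a rough illustration of how the three numbered groups can be read back in Python, the sketch below searches a made-up sentence. The sample text is an assumption, not part of the original example.

```python
import re

date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
match = date_pattern.search("Released on 2023-05-15.")

# group(0) is the whole match; groups 1..3 follow the order of the parentheses.
whole = match.group(0)
year, month, day = match.group(1), match.group(2), match.group(3)

print(whole)           # 2023-05-15
print(year, month, day)  # 2023 05 15
print(match.groups())  # ('2023', '05', '15')
```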

2. Non-capturing Group

Use the syntax (?:expression) to group without capturing.

Example

(?:Mr|Ms|Mrs)\. (\w+) # Matches "Mr. Smith" but only captures "Smith"
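The difference between the non-capturing and the capturing form shows up in what the match object reports. A small Python sketch, with an invented input sentence:

```python
import re

text = "Please welcome Mr. Smith"

non_capturing = re.compile(r"(?:Mr|Ms|Mrs)\. (\w+)")
capturing = re.compile(r"(Mr|Ms|Mrs)\. (\w+)")

m1 = non_capturing.search(text)
m2 = capturing.search(text)

# Both patterns match the same text, but they expose different groups.
print(m1.group(0))  # Mr. Smith
print(m1.groups())  # ('Smith',)
print(m2.groups())  # ('Mr', 'Smith')
```

With the non-capturing form, the surname stays group 1 no matter how the title alternation is written.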

3. Named Capturing Group

Assign a name to a group for better readability (syntax may vary across languages).

Python example:

Example

(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})

JavaScript example:

Example

(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})
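In Python, named groups can be read by name or by number. The sketch below assumes the group names year, month, and day, which are illustrative choices.

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
match = pattern.fullmatch("2023-05-15")

year = match.group("year")   # access by name
month = match["month"]       # subscript shorthand for group("month")
day = match.group(3)         # named groups still receive a number
parts = match.groupdict()    # all named groups as a dict

print(year, month, day)
print(parts)
```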

Group Referencing

One of the most powerful features of groups is the ability to reference previously matched content either inside the regular expression or externally.

1. Backreference

Reference a previous group inside the regular expression using the \number syntax (\1, \2, and so on):

Example

(\w+) \1 # Matches repeated words, e.g., "hello hello"

This pattern matches two identical words separated by a space.
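A short Python sketch of the numbered backreference follows. The sample sentences are invented. The last case shows that, without word boundaries, the pattern can also match inside longer words.

```python
import re

pattern = re.compile(r"(\w+) \1")

hit = pattern.search("say hello hello again")
miss = pattern.search("hello world")
partial = pattern.search("this is fine")

print(hit.group(0))      # hello hello
print(hit.group(1))      # hello
print(miss)              # None
# "is is" is found inside "this is" because nothing anchors the word start.
print(partial.group(0))  # is is
```

Anchoring the group with \b on both sides avoids the partial-word match.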

2. Named Backreference

For named groups, use the name for referencing:

Example

(?P<word>\w+) (?P=word) # Python syntax

(?<word>\w+) \k<word> # JavaScript syntax
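The Python form of a named backreference can be tried out as below. The group name word and the sample sentence are assumptions made for this sketch.

```python
import re

# (?P=word) must match the same text that the group named "word" captured.
pattern = re.compile(r"(?P<word>\w+) (?P=word)")

match = pattern.search("it was was late")
repeated = match.group("word")

print(match.group(0))  # was was
print(repeated)        # was
```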

3. Replacement Reference

Reference group content in replacement operations:

Python example:

Example

import re

text = "2023-05-15"

# \1, \2, \3 in the replacement refer to the year, month, and day groups
new_text = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', text)

# Result: "05/15/2023"

JavaScript example:

Example

let text = "2023-05-15";
let newText = text.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1');

// Result: "05/15/2023"
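Replacement strings can also refer to groups by name. In Python this is written \g<name>. The sketch below assumes the group names year, month, and day, and also shows the \g<number> form, which avoids ambiguity when a digit follows the reference.

```python
import re

pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")
text = "2023-05-15"

# Reference groups by name in the replacement string.
by_name = pattern.sub(r"\g<month>/\g<day>/\g<year>", text)

# The same rewrite using the explicit numeric form \g<N>.
by_number = pattern.sub(r"\g<2>/\g<3>/\g<1>", text)

print(by_name)    # 05/15/2023
print(by_number)  # 05/15/2023
```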

Practical Application Examples

Example 1: Matching HTML Tags

Example

<([a-z]+)[^>]*>.*?</\1>

This pattern can match paired HTML tags (like <div>...</div>), where:

  • ([a-z]+) captures the tag name
  • \1 references the previously captured tag name to ensure consistency
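A hedged Python sketch of paired-tag matching follows. It assumes lowercase tag names and a made-up HTML fragment, and it is not a substitute for a real HTML parser (nested tags of the same name, for instance, are not handled).

```python
import re

# Group 1 captures the tag name; \1 requires the closing tag to repeat it.
tag_pattern = re.compile(r"<([a-z]+)[^>]*>.*?</\1>")

html = '<div class="box">hello</div> <p>text</span>'
matches = [(m.group(1), m.group(0)) for m in tag_pattern.finditer(html)]

# The mismatched <p>...</span> pair is skipped.
print(matches)
```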

Example 2: Validating Repeated Words

Example

\b(\w+)\b\s+\1\b

This pattern finds consecutive repeated words in text. The \b word boundaries ensure that only whole words are compared.
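The repeated-word pattern can be used both to report duplicates and to remove them. A minimal Python sketch with an invented sentence:

```python
import re

pattern = re.compile(r"\b(\w+)\b\s+\1\b")
text = "This is is a test of the the pattern"

# Collect each word that appears twice in a row.
repeated = [m.group(1) for m in pattern.finditer(text)]

# Replace each doubled word with a single copy of it.
cleaned = pattern.sub(r"\1", text)

print(repeated)  # ['is', 'the']
print(cleaned)   # This is a test of the pattern
```

Because of the word boundaries, the "is" inside "This" is not treated as the first half of a repeated pair.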

Example 3: Date Format Conversion

Python code:

Example

import re

date = "2023-12-25"

# Convert YYYY-MM-DD to DD/MM/YYYY
new_date = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', date)

print(new_date)  # Output: 25/12/2023

Advanced Applications of Grouping

1. Conditional Matching

Some regex engines support conditional matching based on groups:

Example

(?(1)true-pattern|false-pattern)

This means: if group 1 participated in the match, the engine tries true-pattern; otherwise it tries false-pattern.
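Python's re module is one engine that supports this conditional syntax. The sketch below uses a pattern adapted from the Python documentation: a closing ">" is required only when an opening "<" was captured. The sample addresses are made up.

```python
import re

# Group 1 is the optional "<". If it matched, ">" must follow; otherwise the string must end.
pattern = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)")

samples = ["<user@host.com>", "user@host.com", "<user@host.com", "user@host.com>"]
accepted = [s for s in samples if pattern.fullmatch(s)]

print(accepted)  # ['<user@host.com>', 'user@host.com']
```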

2. Balancing Groups (Advanced Feature)

Balancing groups are used to match nested structures (such as balanced parentheses). They require support from specific regex engines, most notably .NET.
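Python's standard re module does not provide balancing groups, so the sketch below is a substitute technique, not a regex feature: a plain depth counter that performs the kind of nesting check balancing groups are used for. The function name is_balanced is hypothetical.

```python
def is_balanced(text: str) -> bool:
    """Return True if every "(" in text has a matching ")" in the right order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                return False
    return depth == 0      # no "(" left unclosed


print(is_balanced("(a(b)c)"))  # True
print(is_balanced("(a(b)c"))   # False
print(is_balanced(")("))       # False
```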


Common Issues and Pitfalls

  1. Overusing groups: Unnecessary groups can impact performance
    • Bad example: (a)|(b) (if capturing is not needed, use (?:a|b) instead)
  2. Confusing group numbering:
    • Group numbers are assigned from 1 based on the order of opening parentheses
    • Non-capturing groups do not participate in numbering
  3. Greedy matching issues:
     <.*> # Will greedily match until the last >
    Should use:
     <.*?> # Non-greedy matching
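The numbering and greediness pitfalls can be observed directly in Python. The patterns and sample strings below are illustrative assumptions.

```python
import re

# Numbering follows the order of opening parentheses; (?:...) is skipped.
m = re.fullmatch(r"((\d{4})-(?:\d{2}))-(\d{2})", "2023-05-15")
numbered = m.groups()
print(numbered)  # ('2023-05', '2023', '15')

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" in the string.
greedy = re.findall(r"<.*>", html)
# Non-greedy: .*? stops at the first ">" it can.
lazy = re.findall(r"<.*?>", html)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```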

Practice Challenges

  1. Write a regular expression to match repeated email usernames (e.g., user@domain.com;user@domain.com)
  2. Convert phone number format from (123) 456-7890 to 123-456-7890
  3. Extract all attributes from an HTML tag (e.g., src and alt from <img src="..." alt="...">)

Summary of Key Points

Concept                Syntax                        Purpose
Capturing Group        (pattern)                     Capture matched content and assign a number
Non-capturing Group    (?:pattern)                   Group without capturing
Named Group            (?P<name>pattern) (Python)    Assign a name to the group
Backreference          \1, \2, etc.                  Reference a previously matched group
Named Backreference    (?P=name) (Python)            Reference a group by name
Replacement Reference  $1, $2 or \1, \2              Reference groups in replacement strings

Mastering the grouping and referencing features of regular expressions allows you to:

  • Build more complex matching patterns
  • Extract and process specific parts of strings
  • Implement intelligent string transformations
  • Validate complex text structures