# Regular Expressions - Metacharacters
Metacharacters in regular expressions are characters with special meanings. They do not represent literal values but are used to control the matching pattern.
* * *
## Basic Metacharacters
`.` (Dot)
* Matches any single character except the newline character (`\n`)
* Example: `a.b` matches "aab", "a1b", "a b", etc.
`^` (Caret)
* Matches the start position of a string
* Example: `^abc` matches strings that start with "abc"
`$` (Dollar Sign)
* Matches the end position of a string
* Example: `xyz$` matches strings that end with "xyz"
`\` (Backslash)
* Escape character, makes the following character lose its special meaning
* Example: `\.` matches a literal dot instead of any character
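
As a quick check of the four basic metacharacters above, the following JavaScript sketch tests each pattern against a few sample strings. The sample strings and variable names are illustrative assumptions, not part of the original examples.

```javascript
// Sample strings are made up for illustration.

// `.` matches any single character except a newline.
const dotResults = ["aab", "a1b", "a b", "a\nb"].map((s) => /a.b/.test(s));

// `^` anchors the match to the start of the string, `$` to the end.
const startsWithAbc = /^abc/.test("abcdef");
const abcNotAtStart = /^abc/.test("xabc");
const endsWithXyz = /xyz$/.test("wxyz");

// `\.` matches only a literal dot.
const literalDot = /a\.b/.test("a.b");
const otherCharRejected = /a\.b/.test("aXb");

console.log(dotResults);                                // [ true, true, true, false ]
console.log(startsWithAbc, abcNotAtStart, endsWithXyz); // true false true
console.log(literalDot, otherCharRejected);             // true false
```

Note that the unescaped pattern `a.b` would also accept "aXb", which is why the backslash matters when a literal dot is intended.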
* * *
## Character Class Metacharacters
`[]` (Square Brackets)
* Defines a character set, matches any one character within the set
* Example: `[aeiou]` matches any one vowel
`[^]` (Negated Character Class)
* Matches any character not in the square brackets
* Example: `[^0-9]` matches any non-digit character
`-` (Hyphen)
* Represents a range within a character class
* Example: `[a-z]` matches any lowercase letter
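
The character class forms above can be tried out with a short JavaScript sketch. The input strings are assumptions chosen for illustration; the `g` flag is used only so that `match` returns every hit rather than the first one.

```javascript
// Input strings are illustrative only.

// [aeiou]: any one vowel
const vowels = "hello world".match(/[aeiou]/g);

// [^0-9]: any one character that is not a digit
const nonDigits = "a1b2".match(/[^0-9]/g);

// [a-z]: any one lowercase letter (the hyphen defines a range)
const lowercase = "Hello World".match(/[a-z]/g).join("");

console.log(vowels);    // [ 'e', 'o', 'o' ]
console.log(nonDigits); // [ 'a', 'b' ]
console.log(lowercase); // elloorld
```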
* * *
## Quantifier Metacharacters
`*` (Asterisk)
* Matches the preceding sub-expression zero or more times
* Example: `ab*c` matches "ac", "abc", "abbc", etc.
`+` (Plus Sign)
* Matches the preceding sub-expression one or more times
* Example: `ab+c` matches "abc", "abbc" but not "ac"
`?` (Question Mark)
* Matches the preceding sub-expression zero or one time
* Example: `colou?r` matches "color" and "colour"
`{n}` (Curly Braces)
* Matches exactly n times
* Example: `a{3}` matches "aaa"
`{n,}`
* Matches at least n times
* Example: `a{2,}` matches "aa", "aaa", etc.
`{n,m}`
* Matches between n and m times
* Example: `a{2,4}` matches "aa", "aaa", "aaaa"
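
The quantifier examples above can be verified with a small JavaScript sketch. The helper `fullMatches` is hypothetical and exists only for this illustration: it wraps each pattern in `^` and `$` so that a string counts as a match only when the pattern covers all of it.

```javascript
// Hypothetical helper: returns the inputs that the pattern matches in full.
// The ^ and $ anchors force the pattern to cover the whole string.
function fullMatches(source, inputs) {
  const pattern = new RegExp("^" + source + "$");
  return inputs.filter((s) => pattern.test(s));
}

const star = fullMatches("ab*c", ["ac", "abc", "abbc"]);
const plus = fullMatches("ab+c", ["ac", "abc", "abbc"]);
const optional = fullMatches("colou?r", ["color", "colour", "colouur"]);
const exact = fullMatches("a{3}", ["aa", "aaa", "aaaa"]);
const atLeast = fullMatches("a{2,}", ["a", "aa", "aaaaa"]);
const between = fullMatches("a{2,4}", ["a", "aa", "aaaa", "aaaaa"]);

console.log(star);     // [ 'ac', 'abc', 'abbc' ]
console.log(plus);     // [ 'abc', 'abbc' ]
console.log(optional); // [ 'color', 'colour' ]
console.log(exact);    // [ 'aaa' ]
console.log(atLeast);  // [ 'aa', 'aaaaa' ]
console.log(between);  // [ 'aa', 'aaaa' ]
```

Without the anchors, `a{3}` would also find a match inside "aaaa", because an unanchored pattern may match any substring.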
* * *
## Grouping and Alternation Metacharacters
`()` (Parentheses)
* Defines a sub-expression or capturing group
* Example: `(ab)+` matches "ab", "abab", etc.
`|` (Vertical Bar)
* Represents an "OR" relationship
* Example: `cat|dog` matches "cat" or "dog"
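
A brief JavaScript sketch of grouping and alternation follows. The input strings and the `key=value` pattern are assumptions added for illustration; the last example shows how the text matched by each pair of parentheses can be read back from the match result.

```javascript
// Inputs are illustrative only.

// (ab)+ repeats the whole group, not just the final character.
const repeatedGroup = /^(ab)+$/.test("abab");
const brokenGroup = /^(ab)+$/.test("aba");

// cat|dog matches either alternative.
const pets = "cat, dog, bird".match(/cat|dog/g);

// Each pair of parentheses captures the text it matched.
const pair = "key=value".match(/([a-z]+)=([a-z]+)/);

console.log(repeatedGroup, brokenGroup); // true false
console.log(pets);                       // [ 'cat', 'dog' ]
console.log(pair[1], pair[2]);           // key value
```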
* * *
## Special Character Class Metacharacters
`\d`
* Matches any digit, equivalent to `[0-9]`
`\D`
* Matches any non-digit, equivalent to `[^0-9]`
`\w`
* Matches any word character (letter, digit, underscore), equivalent to `[a-zA-Z0-9_]`
`\W`
* Matches any non-word character, equivalent to `[^a-zA-Z0-9_]`
`\s`
* Matches any whitespace character (space, tab, newline, etc.)
`\S`
* Matches any non-whitespace character
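
The shorthand classes `\d`, `\w`, `\W`, and `\s` can be seen at work in the JavaScript sketch below. The sample text is an assumption made up for this illustration.

```javascript
// Sample text is illustrative; it contains spaces and one tab.
const text = "Order 42: 3 items\tshipped";

// \d: every single digit
const digits = text.match(/\d/g);

// \w+: runs of word characters (letters, digits, underscore)
const words = text.match(/\w+/g);

// \s+: split on any run of whitespace, including the tab
const fields = text.split(/\s+/);

// \W: the underscore counts as a word character, the hyphen does not
const nonWord = "a_b-c".match(/\W/g);

console.log(digits);  // [ '4', '2', '3' ]
console.log(words);   // [ 'Order', '42', '3', 'items', 'shipped' ]
console.log(fields);  // [ 'Order', '42:', '3', 'items', 'shipped' ]
console.log(nonWord); // [ '-' ]
```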
* * *
## Boundary Matching Metacharacters
`\b`
* Matches a word boundary
* Example: `\bcat\b` matches the whole word "cat" but not the "cat" in "category"
`\B`
* Matches a non-word boundary
* Example: `\Bcat\B` matches "cat" in "scattered" but not the standalone "cat"
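
The difference between a word boundary and a non-word boundary can be checked with the JavaScript sketch below. The test sentence and the replacement word "dog" are assumptions added for illustration.

```javascript
// Word boundary: "cat" must stand on its own.
const wholeWord = /\bcat\b/;
const inSentence = wholeWord.test("the cat sat");
const inCategory = wholeWord.test("category");

// Non-word boundary: "cat" must have word characters on both sides.
const innerOnly = /\Bcat\B/;
const inScattered = innerOnly.test("scattered");
const standalone = innerOnly.test("cat");

// Replacing only the whole word leaves the longer words untouched.
const replaced = "cat category scattered".replace(/\bcat\b/g, "dog");

console.log(inSentence, inCategory);  // true false
console.log(inScattered, standalone); // true false
console.log(replaced);                // dog category scattered
```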
* * *
## Other Metacharacters
`\n`
* Matches a newline character
`\t`
* Matches a tab character
`\r`
* Matches a carriage return character
`\f`
* Matches a form feed character
`\v`
* Matches a vertical tab character
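
One common use of these escapes is splitting text into lines and fields. The JavaScript sketch below assumes a small tab-separated input with Windows-style line endings; the data itself is invented for illustration.

```javascript
// Invented tab-separated data with \r\n line endings.
const raw = "name\tage\r\nAnn\t30\r\n";

// \r?\n accepts both "\r\n" and "\n" as a line break.
const lines = raw.split(/\r?\n/).filter((line) => line.length > 0);

// \t splits each line into its fields.
const rows = lines.map((line) => line.split(/\t/));

console.log(rows); // [ [ 'name', 'age' ], [ 'Ann', '30' ] ]
```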
* * *
## Greedy vs. Non-Greedy Quantifiers
By default, quantifiers (`*`, `+`, `?`, `{}`) are greedy and match as many characters as possible. Adding `?` after a quantifier makes it non-greedy (lazy):
* `*?`: Zero or more times, but as few as possible
* `+?`: One or more times, but as few as possible
* `??`: Zero or one time, but as few as possible
* `{n,m}?`: Between n and m times, but as few as possible
Example: `<.*?>` matches HTML tags without crossing tag boundaries
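
The effect of greedy versus lazy matching is easiest to see side by side. The JavaScript sketch below assumes `<.*>` and `<.*?>` as the greedy and lazy tag patterns; the HTML snippet is invented for illustration.

```javascript
// Invented HTML snippet.
const html = "<b>bold</b> and <i>italic</i>";

// Greedy: .* runs on to the last ">" in the string.
const greedy = html.match(/<.*>/)[0];

// Lazy: .*? stops at the first ">" it can, so each tag is matched separately.
const lazy = html.match(/<.*?>/g);

// The same contrast on a run of identical characters.
const greedyRun = "oooo".match(/o+/)[0];
const lazyRun = "oooo".match(/o+?/)[0];

console.log(greedy);             // <b>bold</b> and <i>italic</i>
console.log(lazy);               // [ '<b>', '</b>', '<i>', '</i>' ]
console.log(greedyRun, lazyRun); // oooo o
```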
* * *
## Lookahead and Lookbehind Assertions
`(?=...)` (Positive Lookahead)
* Matches a position followed by a specific pattern
* Example: `Windows(?=95|98)` matches "Windows" followed by 95 or 98
`(?!...)` (Negative Lookahead)
* Matches a position not followed by a specific pattern
* Example: `Windows(?!95|98)` matches "Windows" not followed by 95 or 98
`(?<=...)` (Positive Lookbehind)
* Matches a position preceded by a specific pattern
* Example: `(?<=95|98)Windows` matches "Windows" preceded by 95 or 98
`(?<!...)` (Negative Lookbehind)
* Matches a position not preceded by a specific pattern
* Example: `(?<!95|98)Windows` matches "Windows" not preceded by 95 or 98
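
The four assertions above can be compared in a JavaScript sketch that replaces each matched "Windows" with "Win", which makes it visible which occurrence was selected. The input strings and the replacement text are assumptions for illustration. Lookbehind requires a JavaScript engine with ES2018 support.

```javascript
// Invented inputs; "Win" marks the occurrence that matched.
const ahead = "Windows98 Windows3.1";
const behind = "98Windows 3.1Windows";

// Lookahead: the digits after "Windows" are checked but not consumed.
const positiveAhead = ahead.replace(/Windows(?=95|98)/g, "Win");
const negativeAhead = ahead.replace(/Windows(?!95|98)/g, "Win");

// Lookbehind (ES2018+): the text before "Windows" is checked but not consumed.
const positiveBehind = behind.replace(/(?<=95|98)Windows/g, "Win");
const negativeBehind = behind.replace(/(?<!95|98)Windows/g, "Win");

console.log(positiveAhead);  // Win98 Windows3.1
console.log(negativeAhead);  // Windows98 Win3.1
console.log(positiveBehind); // 98Win 3.1Windows
console.log(negativeBehind); // 98Windows 3.1Win
```

Because the assertions consume nothing, the version numbers stay in the output even though they decided whether a match occurred.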
### Example
Next, we analyze a regular expression for matching email addresses:

    // example.com is a placeholder domain
    var str = "abcd test@example.com 1234";
    var patt1 = /\b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b/g;
    document.write(str.match(patt1));

The matched text is:

    test@example.com

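
To see how the email pattern is put together, the sketch below rebuilds it from its parts and runs it outside the browser. It assumes that the top-level-domain part is `[a-zA-Z]{2,6}` and uses `test@example.com` as a placeholder address; the `parts` array and variable names are hypothetical. This is a simplified pattern for illustration, not a full validator for email addresses.

```javascript
// Each part is a small regex; .source gives its pattern text.
const parts = [
  /\b/,            // word boundary before the address
  /[\w.%+-]+/,     // local part: word characters, ".", "%", "+", "-"
  /@/,             // the literal "@"
  /[\w.-]+/,       // domain labels
  /\./,            // literal dot before the top-level domain
  /[a-zA-Z]{2,6}/, // top-level domain, 2 to 6 letters (assumed)
  /\b/,            // word boundary after the address
];

const emailPattern = new RegExp(parts.map((p) => p.source).join(""), "g");

// Placeholder input string.
const sample = "abcd test@example.com 1234";
const found = sample.match(emailPattern);

console.log(emailPattern.source); // \b[\w.%+-]+@[\w.-]+\.[a-zA-Z]{2,6}\b
console.log(found);               // [ 'test@example.com' ]
```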
The following table contains a complete list of metacharacters and their behavior in regular expression context:
| Character | Description |
| --- | --- |
| `\` | Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, `n` matches the character "n", while `\n` matches a newline character. The sequence `\\` matches `\` and `\(` matches "(". |
| `^` | Matches the start position of the input string. If the RegExp object's Multiline property is set, `^` also matches the position after `\n` or `\r`. |
| `$` | Matches the end position of the input string. If the RegExp object's Multiline property is set, `$` also matches the position before `\n` or `\r`. |
| * | Matches the preceding sub-expression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}. |
| + | Matches the preceding sub-expression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}. |
| ? | Matches the preceding sub-expression zero or one time. For example, "do(es)?" matches "do" or "does". ? is equivalent to {0,1}. |
| {n} | n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food". |
| {n,} | n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'. |
| {n,m} | m and n are non-negative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" will match the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there cannot be a space between the comma and the two numbers. |
| ? | When this character follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. Non-greedy mode matches as few characters as possible, while the default greedy mode matches as many as possible. For example, for the string "oooo", 'o+?' matches a single "o", while 'o+' matches all 'o's. |
| `.` | Matches any single character except the newline characters (`\n`, `\r`). To match any character including `\n`, use a pattern like `(.\|\n)`. |
| `(pattern)` | Matches pattern and captures the match. The captured match can be accessed from the resulting Matches collection, using the SubMatches collection in VBScript and the `$0`…`$9` properties in JScript. To match parentheses characters, use `\(` or `\)`. |
| `(?:pattern)` | Matches pattern but does not capture the match, i.e., it is a non-capturing match and is not stored for later use. This is useful when combining parts of a pattern with the "or" character (`\|`). For example, `industr(?:y\|ies)` is a more concise expression than `industry\|industries`. |
| `(?=pattern)` | Positive lookahead, matches a search string at any point where a string matching pattern begins. This is a non-capturing match, i.e., the match does not need to be captured for later use. For example, `Windows(?=95\|98\|NT\|2000)` matches "Windows" in "Windows2000", but not in "Windows3.1". Lookahead does not consume characters, meaning after a match occurs, the next match search starts immediately after the last match, not after the characters that made up the lookahead. |
| `(?!pattern)` | Negative lookahead, matches a search string at any point where a string not matching pattern begins. This is a non-capturing match, i.e., the match does not need to be captured for later use. For example, `Windows(?!95\|98\|NT\|2000)` matches "Windows" in "Windows3.1", but not in "Windows2000". Lookahead does not consume characters, meaning after a match occurs, the next match search starts immediately after the last match, not after the characters that made up the lookahead. |
| `(?<=pattern)` | Positive lookbehind, similar to positive lookahead but in the opposite direction. For example, `(?<=95\|98\|NT\|2000)Windows` matches "Windows" in "2000Windows", but not in "3.1Windows". |
| `(?<!pattern)` | Negative lookbehind, similar to negative lookahead but in the opposite direction. For example, `(?<!95\|98\|NT\|2000)Windows` matches "Windows" in "3.1Windows", but not in "2000Windows". |
| `x\|y` | Matches x or y. For example, `z\|food` matches "z" or "food". `(z\|f)ood` matches "zood" or "food". |
| `[xyz]` | Character set. Matches any one of the contained characters. For example, `[abc]` matches the "a" in "plain". |
| `[^xyz]` | Negated character set. Matches any character not contained. For example, `[^abc]` matches "p", "l", "i", and "n" in "plain". |
| `[a-z]` | Character range. Matches any character in the specified range. For example, `[a-z]` matches any lowercase letter from "a" to "z". |
| `[^a-z]` | Negated character range. Matches any character not in the specified range. For example, `[^a-z]` matches any character not between "a" and "z". |
| `\b` | Matches a word boundary, i.e., the position between a word and a space. For example, `er\b` matches the "er" in "never", but not in "verb". |
| `\B` | Matches a non-word boundary. `er\B` matches the "er" in "verb", but not in "never". |
| `\cx` | Matches the control character indicated by x. For example, `\cM` matches a Control-M or carriage return. The value of x must be A-Z or a-z. Otherwise, `c` is treated as a literal "c" character. |
| `\d` | Matches a digit character. Equivalent to `[0-9]`. |
| `\D` | Matches a non-digit character. Equivalent to `[^0-9]`. |
| `\f` | Matches a form feed character. Equivalent to `\x0c` and `\cL`. |
| `\n` | Matches a newline character. Equivalent to `\x0a` and `\cJ`. |
| `\r` | Matches a carriage return character. Equivalent to `\x0d` and `\cM`. |
| `\s` | Matches any whitespace character, including space, tab, form feed, etc. Equivalent to `[ \f\n\r\t\v]`. |
| `\S` | Matches any non-whitespace character. Equivalent to `[^ \f\n\r\t\v]`. |
| `\t` | Matches a tab character. Equivalent to `\x09` and `\cI`. |
| `\v` | Matches a vertical tab character. Equivalent to `\x0b` and `\cK`. |
| `\w` | Matches a word character (letter, digit, underscore). Equivalent to `[A-Za-z0-9_]`. |
| `\W` | Matches a non-word character. Equivalent to `[^A-Za-z0-9_]`. |
| `\xn` | Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, `\x41` matches "A". `\x041` is equivalent to `\x04` followed by "1". ASCII encoding can be used in regular expressions. |
| `\num` | Matches num, where num is a positive integer. A backreference to the captured match. For example, `(.)\1` matches two consecutive identical characters. |
| `\n` | Identifies an octal escape value or a backreference. If `\n` is preceded by at least n captured sub-expressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape value. |
| `\nm` | Identifies an octal escape value or a backreference. If `\nm` is preceded by at least nm captured sub-expressions, nm is a backreference. If `\nm` is preceded by at least n captures, n is a backreference followed by the literal m. If none of the previous conditions are met, and n and m are octal digits (0-7), `\nm` matches the octal escape value nm. |
| `\nml` | Matches the octal escape value nml if n is an octal digit (0-3), and m and l are octal digits (0-7). |
| `\un` | Matches n, where n is a Unicode character represented by four hexadecimal digits. For example, `\u00A9` matches the copyright symbol (©). |
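
A few table entries that have no example elsewhere on this page can be tried in JavaScript as follows: the multiline behavior of `^`, non-capturing groups, backreferences, and hexadecimal and Unicode escapes. The input strings are assumptions chosen for illustration, and the Multiline property mentioned in the table is taken here to correspond to the JavaScript `m` flag.

```javascript
// Inputs are illustrative only.

// With the m flag, ^ also matches right after a line break.
const withMultiline = "one\ntwo".match(/^\w+/gm);
const withoutMultiline = "one\ntwo".match(/^\w+/g);

// (?:...) groups without capturing, so the result holds only the full match.
const nonCapturing = "industries".match(/industr(?:y|ies)/);

// \1 refers back to whatever the first group captured.
const doubled = "hello".match(/(.)\1/)[0];

// \x41 is "A"; \u00A9 is the copyright symbol.
const hexEscape = /\x41/.test("A");
const unicodeEscape = /\u00A9/.test(String.fromCharCode(0xa9));

console.log(withMultiline);               // [ 'one', 'two' ]
console.log(withoutMultiline);            // [ 'one' ]
console.log(nonCapturing.length);         // 1
console.log(doubled);                     // ll
console.log(hexEscape, unicodeEscape);    // true true
```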