Ruby Regular Expressions

Regular expressions are special sequences of characters that use a pattern with specific syntax to match or find sets of strings.

A regular expression combines predefined special characters, and combinations of them, into a "rule string" that expresses filtering logic for strings.

A regular expression literal is a pattern between slashes, or between any delimiters following %r, as shown below:

/pattern/
/pattern/im
%r!/usr/local!

Example

line1 = "Cats are smarter than dogs"
line2 = "Dogs also like meat"

if line1 =~ /Cats(.*)/
  puts "Line1 contains Cats"
end

# No output here: line2 does not contain "Cats"
if line2 =~ /Cats(.*)/
  puts "Line2 contains Cats"
end


The output of the above example is:

Line1 contains Cats
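The `=~` operator used above returns the index where the match starts (or `nil`), and it stores the captured groups in `$~`, `$1`, `$2`, and so on. A minimal sketch; `String#match` returns the same information as a `MatchData` object:

```ruby
line1 = "Cats are smarter than dogs"

# =~ returns the start index of the match, or nil
pos = /Cats(.*)/ =~ line1
puts pos   # 0
puts $1    # text captured by (.*): " are smarter than dogs"

# String#match returns a MatchData object (or nil)
m = line1.match(/Cats(.*)/)
puts m[0]  # the whole match: "Cats are smarter than dogs"
puts m[1]  # the first group: " are smarter than dogs"

p(/Cats/ =~ "Dogs also like meat")  # nil
```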

A regular expression may end with optional modifiers that control various aspects of matching. Modifiers are placed after the closing slash, as in /pattern/im above. The following table lists the available modifiers:

Modifier	Description
i	Ignore case when matching text.
o	Perform `#{}` interpolation only once, the first time the regular expression literal is evaluated.
x	Extended mode: ignore whitespace in the pattern and allow `#` comments.
m	Multiline mode: `.` also matches a newline character.
u, e, s, n	Interpret the regular expression as UTF-8, EUC-JP, Windows-31J (SJIS), or ASCII-8BIT, respectively. If no modifier is specified, the regular expression uses the source encoding.
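A minimal sketch of the `i`, `m`, `x`, and `o` modifiers; the sample strings and variable names are only illustrative, and `match?` requires Ruby 2.4 or later:

```ruby
# i: ignore case
puts("RUBY" =~ /ruby/i)            # 0

# m: '.' also matches a newline
puts "a\nb".match?(/a.b/)          # false
puts "a\nb".match?(/a.b/m)         # true

# x: whitespace in the pattern is ignored and '#' starts a comment
date = /
  (\d{4})   # year
  -         # separator
  (\d{2})   # month
/x
md = date.match("2024-05")
puts md[1], md[2]                  # 2024, then 05

# o: #{} is interpolated only the first time the literal is evaluated,
# so the second iteration reuses the regexp built from "a"
patterns = %w[a b].map { |s| /#{s}/o }
p patterns.map(&:source)           # ["a", "a"]
```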

Just as strings can be delimited with %Q, Ruby allows you to start a regular expression with %r followed by any delimiter. This is very useful when describing a pattern containing many slash characters that you don't want to escape.

%r|/|

Modifiers can follow a %r literal as well:

%r[</(.*)>]i
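A sketch of `%r` with brackets as delimiters and the `i` flag, using a pattern that captures the name in a closing HTML tag (the sample markup is illustrative):

```ruby
# %r with square brackets: the '/' inside the pattern needs no escaping
closing_tag = %r[</(.*)>]i

m = closing_tag.match("<b>bold</b>")
puts m[1]                       # b
puts closing_tag.casefold?      # true, because of the i flag

# The same trick works with other delimiters
puts(%r{/usr/local}.match?("/usr/local/bin"))  # true
```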

Except for the metacharacters + ? . * ^ $ ( ) [ ] { } | \, all characters match themselves. To match a metacharacter literally, escape it with a backslash (for example, \. matches a literal dot).
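A minimal sketch of escaping, using a made-up price string. When the literal text comes from a variable, `Regexp.escape` adds the backslashes automatically:

```ruby
price = "Total: $9.99 (tax included)"

# Escaped by hand: \$ and \. match a literal dollar sign and dot
puts price.match?(/\$9\.99/)          # true
puts "$9x99".match?(/\$9\.99/)        # false: \. no longer matches any character

# Regexp.escape escapes every metacharacter in the given text
literal = Regexp.new(Regexp.escape("$9.99 (tax"))
puts literal.match?(price)            # true
```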

The following table lists the regular expression syntax available in Ruby.

Pattern	Description
^	Matches the beginning of a line.
$	Matches the end of a line.
.	Matches any single character except a newline. With the `m` option, it also matches a newline.
[...]	Matches any single character within the brackets.
[^...]	Matches any single character not within the brackets.
re*	Matches the preceding sub-expression zero or more times.
re+	Matches the preceding sub-expression one or more times.
re?	Matches the preceding sub-expression zero or one time.
re{n}	Matches the preceding sub-expression exactly n times.
re{n,}	Matches the preceding sub-expression n or more times.
re{n,m}	Matches the preceding sub-expression at least n and at most m times.
a|b	Matches a or b.
(re)	Groups the regular expression and captures the matched text.
(?imx)	Turns on the `i`, `m`, or `x` option for the rest of the regular expression. Inside a group, it affects only the rest of that group.
(?-imx)	Turns off the `i`, `m`, or `x` option for the rest of the regular expression. Inside a group, it affects only the rest of that group.
(?:re)	Groups the regular expression without capturing the matched text.
(?imx:re)	Turns on the `i`, `m`, or `x` option only within the parentheses.
(?-imx:re)	Turns off the `i`, `m`, or `x` option only within the parentheses.
(?#...)	Comment; ignored when matching.
(?=re)	Positive lookahead: succeeds if re matches at this position, without consuming any characters.
(?!re)	Negative lookahead: succeeds if re does not match at this position, without consuming any characters.
(?>re)	Atomic group: matches re independently and never backtracks into it.
\w	Matches a word character: [a-zA-Z0-9_].
\W	Matches a non-word character.
\s	Matches a whitespace character, equivalent to [ \t\r\n\f\v].
\S	Matches a non-whitespace character.
\d	Matches a digit, equivalent to [0-9].
\D	Matches a non-digit.
\A	Matches the beginning of the string.
\Z	Matches the end of the string. If the string ends with a newline, it matches just before that newline.
\z	Matches the end of the string.
\G	Matches the position where the last match finished.
\b	Matches a word boundary outside brackets, and a backspace (0x08) inside brackets.
\B	Matches a non-word boundary.
\n, \r, \t, etc.	Match a newline, carriage return, tab, etc.
\1...\9	Matches the text captured by the nth group.
\10	Matches the text captured by the 10th group if that group exists; otherwise, it is interpreted as an octal character code.
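A few rows in this table are easier to grasp by running them. The sketch below (illustrative strings only) contrasts an ordinary group with an atomic group `(?>re)` and shows `\G` anchoring successive matches made by `String#scan`:

```ruby
# Ordinary group: fo+ can give back one "o" so the final o can match
p("foobar" =~ /(fo+)o/)     # 0

# Atomic group: (?>fo+) keeps "foo" and never backtracks into it,
# so the trailing o has nothing left to match
p("foobar" =~ /(?>fo+)o/)   # nil

# \G: in scan, each match must start where the previous one ended,
# so scanning stops at the "-"
p "aa-a".scan(/\Ga/)        # ["a", "a"]
```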

Characters

Example	Description
/ruby/	Matches "ruby"
/¥/	Matches the Yen symbol. Ruby 1.8 and Ruby 1.9 support multibyte characters.

Character Classes

Example	Description
/[Rr]uby/	Matches "Ruby" or "ruby"
/rub[ye]/	Matches "ruby" or "rube"
/[aeiou]/	Matches any lowercase vowel
/[0-9]/	Matches any digit, same as `/[0123456789]/`
/[a-z]/	Matches any lowercase ASCII letter
/[A-Z]/	Matches any uppercase ASCII letter
/[a-zA-Z0-9]/	Matches any ASCII letter or digit
/[^aeiou]/	Matches any character that is not a lowercase vowel
/[^0-9]/	Matches any non-digit character
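A quick way to try the character classes is `String#scan`, which returns every non-overlapping match. A minimal sketch with made-up input strings:

```ruby
p "ruby rube rubs".scan(/rub[ye]/)            # ["ruby", "rube"]
p "Ruby and ruby".scan(/[Rr]uby/)             # ["Ruby", "ruby"]
p "regular expression".scan(/[aeiou]/).size   # 7 lowercase vowels
p "a1b2c3".scan(/[^0-9]/)                     # ["a", "b", "c"]
```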

Special Character Classes

Example	Description
/./	Matches any character except the newline
/./m	In multiline mode, it also matches the newline
/\d/	Matches a digit, equivalent to `/[0-9]/`
/\D/	Matches a non-digit, equivalent to `/[^0-9]/`
/\s/	Matches a whitespace character, equivalent to `/[ \t\r\n\f\v]/`
/\S/	Matches a non-whitespace character, equivalent to `/[^ \t\r\n\f\v]/`
/\w/	Matches a word character, equivalent to `/[A-Za-z0-9_]/`
/\W/	Matches a non-word character, equivalent to `/[^A-Za-z0-9_]/`
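The shorthand classes combine naturally with `scan`, `split`, and `gsub`. A small sketch, assuming plain ASCII input:

```ruby
p "Order 66 shipped on 2024-05-01".scan(/\d+/)   # ["66", "2024", "05", "01"]
p "a  b\tc\nd".split(/\s+/)                      # ["a", "b", "c", "d"]
p "snake_case & CamelCase!".scan(/\w+/)          # ["snake_case", "CamelCase"]
p "2024-05-01".gsub(/\D/, "")                    # "20240501"
```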

Repetition

Example	Description
/ruby?/	Matches "rub" or "ruby". Here, y is optional.
/ruby*/	Matches "rub" followed by zero or more y's.
/ruby+/	Matches "rub" followed by one or more y's.
/\d{3}/	Matches exactly 3 digits.
/\d{3,}/	Matches 3 or more digits.
/\d{3,5}/	Matches 3, 4, or 5 digits.
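The quantifiers can be checked against a few sample strings. In this sketch `\A` and `\z` anchor the pattern to the whole string so that `match?` tests the entire word:

```ruby
# y? makes the y optional
p %w[rub ruby rubyyy].map { |w| w.match?(/\Aruby?\z/) }   # [true, true, false]

# y+ needs at least one y; y* accepts any number
p "rub".match?(/ruby+/)        # false
p "rubyyy"[/ruby*/]            # "rubyyy"

# \b keeps a run of digits from being part of a longer number
p "12 123 12345 123456".scan(/\b\d{3,5}\b/)   # ["123", "12345"]
```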

Non-Greedy Repetition

Adding ? after a quantifier makes it non-greedy (lazy): it matches as few repetitions as possible.

Example	Description
/<.*>/	Greedy repetition: matches "<ruby>perl>"
/<.*?>/	Non-greedy repetition: matches "<ruby>" in "<ruby>perl>"
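The difference between greedy and lazy repetition shows up directly when extracting the match with `String#[]`:

```ruby
html = "<ruby>perl>"
p html[/<.*>/]    # "<ruby>perl>"  greedy: runs to the last >
p html[/<.*?>/]   # "<ruby>"       lazy: stops at the first >
```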

Grouping with Parentheses

Example	Description
/\D\d+/	No grouping: + repeats \d
/(\D\d)+/	Grouping: + repeats the \D\d pair
/([Rr]uby(, )?)+/	Matches "Ruby", "Ruby, ruby, ruby", etc.
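A short sketch showing how the parentheses change what `+` repeats; `String#[]` with a regular expression returns the first matched substring:

```ruby
s = "A1B22C3"
p s[/\D\d+/]     # "A1"    + applies only to \d
p s[/(\D\d)+/]   # "A1B2"  + applies to each letter-digit pair

p "Ruby, ruby, ruby!"[/([Rr]uby(, )?)+/]   # "Ruby, ruby, ruby"
```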

Backreferences

A backreference matches the same text that an earlier group captured.

Example	Description
/([Rr])uby&\1ails/	Matches "ruby&rails" or "Ruby&Rails"
/(['"])(?:(?!\1).)*\1/	Matches a single- or double-quoted string. \1 matches the text captured by the first group, \2 the text captured by the second group, and so on.
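The sketch below checks both backreference patterns from the table against sample strings (the inputs are illustrative):

```ruby
amp = /([Rr])uby&\1ails/
p amp.match?("Ruby&Rails")   # true
p amp.match?("ruby&rails")   # true
p amp.match?("Ruby&rails")   # false: \1 must repeat the captured "R"

quoted = /(['"])(?:(?!\1).)*\1/
text = %q(say "hi" and 'bye')
puts text[quoted]            # prints "hi" including its double quotes
```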

Alternation

Example	Description
/ruby|rube/	Matches "ruby" or "rube"
/rub(y|le)/	Matches "ruby" or "ruble"
/ruby(!+|\?)/	Matches "ruby" followed by one or more ! or by a single ?
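Alternation in practice. Note that `String#scan` returns the captured groups instead of the whole match when the pattern contains capturing groups, which is why the second pattern below uses the non-capturing form `(?:...)`:

```ruby
p "ruby rube ruble".scan(/ruby|rube/)        # ["ruby", "rube"]
p "ruby rube ruble".scan(/rub(?:y|le)/)      # ["ruby", "ruble"]
p "ruby!!! ruby?"[/ruby(!+|\?)/]             # "ruby!!!"
p "ruby?"[/ruby(!+|\?)/]                     # "ruby?"
```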

Anchors

Anchors match a position in the string rather than a character.

Example	Description
/^Ruby/	Matches a string or line starting with "Ruby"
/Ruby$/	Matches a string or line ending with "Ruby"
/\ARuby/	Matches "Ruby" at the start of the string
/Ruby\Z/	Matches "Ruby" at the end of the string (before an optional final newline)
/\bRuby\b/	Matches "Ruby" as a whole word
/\brub\B/	\B is a non-word boundary: matches "rub" in "rube" and "ruby", but not a standalone "rub"
/Ruby(?=!)/	Matches "Ruby" if followed by an exclamation mark
/Ruby(?!!)/	Matches "Ruby" if not followed by an exclamation mark
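In Ruby, `^` and `$` always match at line boundaries, so a string containing newlines can satisfy `/^Ruby/` even when it does not start with "Ruby". A minimal sketch contrasting them with `\A`, `\Z`, and `\z`, plus the lookahead forms:

```ruby
text = "first line\nRuby rocks"
p text.match?(/^Ruby/)        # true: ^ matches after any newline
p text.match?(/\ARuby/)       # false: \A is only the start of the string

p "Ruby\n".match?(/Ruby\Z/)   # true: \Z allows one trailing newline
p "Ruby\n".match?(/Ruby\z/)   # false: \z is the very end

p "Ruby!"[/Ruby(?=!)/]        # "Ruby" (the ! is checked but not consumed)
p "Ruby?".match?(/Ruby(?!!)/) # true
p "Ruby!".match?(/Ruby(?!!)/) # false
```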

Special Syntax for Parentheses

Example	Description
/R(?#comment)/	Matches "R". The (?#comment) part is ignored.
/R(?i)uby/	Matches "R" case-sensitively, then "uby" case-insensitively.
/R(?i:uby)/	Same as above.
/rub(?:y|le)/	Groups without capturing, so no \1 backreference is created.
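A small sketch of inline options and non-capturing groups, using `MatchData#captures` to show which groups were recorded:

```ruby
p "RUBY".match?(/R(?i)uby/)   # true: only "uby" ignores case
p "rUBY".match?(/R(?i)uby/)   # false: the leading R is still case-sensitive
p "RuBy".match?(/R(?i:uby)/)  # true

m = "ruble".match(/rub(?:y|le)/)
p m.captures                  # [] (no group was captured)
m = "ruble".match(/rub(y|le)/)
p m[1]                        # "le"
```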

sub and gsub, along with their bang variants sub! and gsub!, are important string methods when using regular expressions.

All these methods perform search and replace operations using regular expression patterns. sub and sub! replace the first occurrence of the pattern, while gsub and gsub! replace all occurrences.

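sub and gsub return a new string, leaving the original string unmodified, whereas sub! and gsub! modify the string they are called on. The bang methods return nil when no substitution is made, so their result should not be assigned back to the variable.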
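A minimal sketch comparing the four methods on an illustrative string. It also shows two replacement forms that `sub` and `gsub` accept: a replacement string referring to groups with `\1`, `\2`, and a block that computes each replacement:

```ruby
s = "hello world"

p s.sub(/o/, "0")     # "hell0 world"  (first match only)
p s.gsub(/o/, "0")    # "hell0 w0rld"  (every match)
p s                   # "hello world"  (unchanged)

# Replacement strings can refer to groups with \1, \2, ...
p "John Smith".sub(/(\w+) (\w+)/, '\2, \1')   # "Smith, John"

# A block receives each match and returns its replacement
p "a1b2".gsub(/\d/) { |d| (d.to_i * 2).to_s } # "a2b4"

# Bang versions modify in place and return nil when nothing changed
p s.sub!(/z/, "")     # nil
s.gsub!(/o/, "0")
p s                   # "hell0 w0rld"
```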

Example

phone = "138-3453-1111 #This is a phone number"

# Delete the comment that starts with #
phone.sub!(/#.*$/, "")
puts "Phone number : #{phone}"

# Remove every non-digit character
phone.gsub!(/\D/, "")
puts "Phone number : #{phone}"


The output of the above example is:

Phone number : 138-3453-1111
Phone number : 13834531111

Example

text = "rails is rails, Ruby on Rails is a great Ruby framework"
# Replace every "rails" with "Rails"
text.gsub!("rails", "Rails")
# Capitalize "rails" only where it is a whole word
text.gsub!(/\brails\b/, "Rails")
puts text


The output of the above example is:

Rails is Rails, Ruby on Rails is a great Ruby framework
