Regular expressions

Regular expressions are patterns that can be searched for within a text string, instead of searching for an exact match to a known piece of text. They are much more versatile for find and replace operations, and therefore useful for parsing, filtering, etc.

Some example regular expressions are:

Pattern	Code	Meaning
B.*D	regex(“B.*D”)	Find `B`, followed by any number of characters (including none), followed by a `D`.
[0-3]	regex(@“[0-3]“)	Find any digit from 0 to 3
foo	bar	regex(“foo
\d+	regex(@“\d+”,“g”)	Find all sequences of digits

These are some of the patterns you can use. If you want to use any of the operators as an actual character, it must be escaped with a backslash.

It is highly recommended that you use raw strings like @"..." for your regular expression patterns, because with a regular DM string you have to escape all backslash “ and open bracket [ characters, which will make your regular expression much harder for you to read. It’s easier to write @"[d]n" than "[d]n".

Pattern	Matches
a	b
.	Any character (except a line break)
^	Beginning of text; or line if `m` flag is used
$	End of text; or line if `m` flag is used
\A	Beginning of text
\Z	End of text
[chars]	Any character between the brackets. Ranges can be specified with a hyphen, like 0-9. Character classes like `d` and `s` can also be used (see below).
[^chars]	Any character NOT matching the ones between the brackets.
\b	Word break
\B	Word non-break
(pattern)	Capturing group: the pattern must match, and its contents will be captured in the group list.
(?:pattern)	Non-capturing group: Match the pattern, but do not capture its contents.
\1 through \9	Backreference; `N` is whatever was captured in the Nth capturing group.
Modifiers
Modifiers are “greedy” by default, looking for the longest match possible. When following a word, they only apply to the last character.
a*	Match a zero or more times
a+	Match a one or more times
a?	Match a zero or one time
a{n}	Match a, exactly n times
a{n,}	Match a, n or more times
a{n,m}	Match a, n to m times
modifier?	Make the previous modifier non-greedy (match as little as possible)
Escape codes and character classes
\xNN	Escape code for a single character, where NN is its hexadecimal ASCII value
\uNNNN	Escape code for a single 16-bit Unicode character, where NNNN is its hexadecimal value
\UNNNNNN	Escape code for a single 21-bit Unicode character, where NNNNNN is its hexadecimal value
\d	Any digit 0 through 9
\D	Any character except a digit or line break
\l	Any letter A through Z, case-insensitive
\L	Any character except a letter or line break
\w	Any identifier character: digits, letters, or underscore
\W	Any character except an identifier character or line break
\s	Any space character
\S	Any character except a space or line break
Assertions
(?=pattern)	Look-ahead: Require this pattern to come next, but don’t include it in the match
(?!pattern)	Look-ahead: Require this pattern NOT to come next
(?⇐pattern)	Look-behind: Require this pattern to come before, but don’t include it in the match (must be a fixed byte length)
(?<!pattern)	Look-behind: Require this pattern NOT to come before (must be a fixed byte length)

The optional flags can be any combination of these:

Flag	Meaning
i	Case-insensitive matching
g	Global: In Find() subsequent calls will start where this left off, and in Replace() all matches are replaced.
m	Multi-line: ^ and $ refer to the beginning and end of a line, respectively.

After calling Find() on a /regex datum, the datum’s group var will contain a list—if applicable—of any sub-patterns found with the () parentheses operator. For instance, searching the string "123" for 1(d)(d) will match "123", and the group var will be list("2","3"). Groups can also be used in replacement expressions; see the Replace() proc for more details.

DM Reference

DM Ref

Explorer

See also

Graph View

Backlinks