Regular expressions are patterns that can be searched for within a text string, instead of searching for an exact match to a known piece of text. They are much more versatile for find and replace operations, and therefore useful for parsing, filtering, etc.
Some example regular expressions are:
| Pattern | Code | Meaning |
|---|---|---|
| B.*D | regex("B.*D") | Find B, followed by any number of characters (including none), followed by a D. |
| [0-3] | regex(@"[0-3]") | Find any digit from 0 to 3 |
| foo\|bar | regex("foo\|bar") | Find foo or bar |
| \d+ | regex(@"\d+","g") | Find all sequences of digits |
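As a rough illustration of the first three table rows, the sketch below builds each pattern and runs Find() on a sample string. The proc name and the sample strings are made up for this example; the expected values in the comments assume Find() returns the 1-based position of the match and stores the matched text in the match var.

```dm
// Hypothetical demo proc; the name and sample text are illustrative only.
/proc/example_patterns_demo()
	// B, then any run of characters, then D.
	var/regex/bd = regex("B.*D")
	if(bd.Find("ABCDE"))
		world.log << bd.match                // "BCD"

	// Any single digit from 0 to 3; the 9 is skipped.
	var/regex/low_digit = regex(@"[0-3]")
	world.log << low_digit.Find("x9y2")      // 4, the position of "2"

	// Either alternative may match.
	var/regex/either = regex("foo|bar")
	if(either.Find("crowbar"))
		world.log << either.match            // "bar"
```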
These are some of the patterns you can use. To match any of the operators as a literal character, escape it with a backslash.
It is highly recommended that you use raw strings like @"..." for your regular expression patterns, because in a regular DM string you have to escape every backslash \ and open bracket [ character, which makes your regular expression much harder to read. It is easier to write `@"[\d]\n"` than `"\[\\d]\\n"`.
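To make the escaping rules concrete, here is a minimal sketch that writes the same pattern as a raw string and as a regular string, then matches a literal dot by escaping it. The proc name, variable names, and sample text are assumptions for illustration.

```dm
// Hypothetical demo proc; names and sample text are illustrative only.
/proc/raw_string_demo()
	// The same pattern twice: one digit followed by a newline.
	var/regex/raw_form = regex(@"[\d]\n")
	var/regex/escaped_form = regex("\[\\d]\\n")

	var/sample = "7\n"
	world.log << raw_form.Find(sample)       // 1
	world.log << escaped_form.Find(sample)   // 1

	// "." is an operator, so "\." is needed to match an actual dot.
	var/regex/dotted = regex(@"\d+\.\d+")
	if(dotted.Find("build 515.1642"))
		world.log << dotted.match            // "515.1642"
```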
| Pattern | Matches |
|---|---|
| a\|b | Match a or b |
| . | Any character (except a line break) |
| ^ | Beginning of text; or line if m flag is used |
| $ | End of text; or line if m flag is used |
| \A | Beginning of text |
| \Z | End of text |
| [chars] | Any character between the brackets. Ranges can be specified with a hyphen, like 0-9. Character classes like \d and \s can also be used (see below). |
| [^chars] | Any character NOT matching the ones between the brackets. |
| \b | Word break |
| \B | Word non-break |
| (pattern) | Capturing group: the pattern must match, and its contents will be captured in the group list. |
| (?:pattern) | Non-capturing group: Match the pattern, but do not capture its contents. |
| \1 through \9 | Backreference; \N is whatever was captured in the Nth capturing group. |
| Modifiers | |
| Modifiers are “greedy” by default, looking for the longest match possible. When following a word, they only apply to the last character. | |
| a* | Match a zero or more times |
| a+ | Match a one or more times |
| a? | Match a zero or one time |
| a{n} | Match a, exactly n times |
| a{n,} | Match a, n or more times |
| a{n,m} | Match a, n to m times |
| modifier? | Make the previous modifier non-greedy (match as little as possible) |
| Escape codes and character classes | |
| \xNN | Escape code for a single character, where NN is its hexadecimal ASCII value |
| \uNNNN | Escape code for a single 16-bit Unicode character, where NNNN is its hexadecimal value |
| \UNNNNNN | Escape code for a single 21-bit Unicode character, where NNNNNN is its hexadecimal value |
| \d | Any digit 0 through 9 |
| \D | Any character except a digit or line break |
| \l | Any letter A through Z, case-insensitive |
| \L | Any character except a letter or line break |
| \w | Any identifier character: digits, letters, or underscore |
| \W | Any character except an identifier character or line break |
| \s | Any space character |
| \S | Any character except a space or line break |
| Assertions | |
| (?=pattern) | Look-ahead: Require this pattern to come next, but don’t include it in the match |
| (?!pattern) | Look-ahead: Require this pattern NOT to come next |
| (?<=pattern) | Look-behind: Require this pattern to come before, but don’t include it in the match (must be a fixed byte length) |
| (?<!pattern) | Look-behind: Require this pattern NOT to come before (must be a fixed byte length) |
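A few of the rows above are easier to see side by side. The following sketch compares a greedy and a non-greedy modifier, then tries a backreference and a look-ahead. The proc name and sample strings are invented for this example, and the comments show what the patterns would be expected to match under the rules in the table.

```dm
// Hypothetical demo proc; names and sample text are illustrative only.
/proc/pattern_table_demo()
	var/haystack = "<b>one</b> and <b>two</b>"

	// Greedy: .* takes the longest match, running to the last </b>.
	var/regex/greedy = regex("<b>.*</b>")
	if(greedy.Find(haystack))
		world.log << greedy.match    // "<b>one</b> and <b>two</b>"

	// Non-greedy: .*? stops at the first </b>.
	var/regex/lazy = regex("<b>.*?</b>")
	if(lazy.Find(haystack))
		world.log << lazy.match      // "<b>one</b>"

	// Backreference: a letter followed by the same letter again.
	var/regex/doubled = regex(@"(\l)\1")
	if(doubled.Find("balloon"))
		world.log << doubled.match   // "ll"

	// Look-ahead: digits that are followed by "px", without matching "px".
	var/regex/pixels = regex(@"\d+(?=px)")
	if(pixels.Find("width 12px"))
		world.log << pixels.match    // "12"
```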
The optional flags can be any combination of these:
| Flag | Meaning |
|---|---|
| i | Case-insensitive matching |
| g | Global: In Find() subsequent calls will start where this left off, and in Replace() all matches are replaced. |
| m | Multi-line: ^ and $ refer to the beginning and end of a line, respectively. |
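The effect of the flags can be sketched as follows: a Replace() with and without g, and a Find() loop that relies on g to resume after the previous match. The proc name and sample text are assumptions made for this illustration.

```dm
// Hypothetical demo proc; names and sample text are illustrative only.
/proc/flags_demo()
	var/haystack = "Cat, cat, CAT"

	// i only: case-insensitive, but just the first match is replaced.
	var/regex/first_only = regex("cat", "i")
	world.log << first_only.Replace(haystack, "dog")  // "dog, cat, CAT"

	// i and g together: every match is replaced.
	var/regex/every = regex("cat", "ig")
	world.log << every.Replace(haystack, "dog")       // "dog, dog, dog"

	// With g, each Find() call resumes where the previous one stopped.
	var/regex/digits = regex(@"\d+", "g")
	var/fruit = "10 apples, 25 pears"
	while(digits.Find(fruit))
		world.log << digits.match                     // "10", then "25"
```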
After calling Find() on a /regex datum, the datum’s group var will contain a list, if applicable, of any sub-patterns found with the () parentheses operator. For instance, searching the string "123" for 1(\d)(\d) will match "123", and the group var will be list("2","3"). Groups can also be used in replacement expressions; see the Replace() proc for more details.
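The "123" case above might look like this in practice. The proc name is made up, and the last two lines assume that Replace() accepts $1 and $2 in the replacement text to refer to captured groups; check the Replace() entry before relying on that syntax.

```dm
// Hypothetical demo proc; the name is illustrative only.
/proc/group_demo()
	var/regex/r = regex(@"1(\d)(\d)")
	if(r.Find("123"))
		world.log << r.match       // "123"
		world.log << r.group[1]    // "2"
		world.log << r.group[2]    // "3"

	// Assumption: $1 and $2 in the replacement stand for the captured groups.
	var/regex/swap = regex(@"(\l+) (\l+)")
	world.log << swap.Replace("hello world", "$2 $1")  // "world hello"
```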