Suppose we are looking for a numeric digit. The regular expression to search for is then [0-9].
The brackets indicate that the character being compared should match any one of the characters enclosed within the brackets.
The dash (-) between 0 and 9 indicates a range from 0 to 9. Therefore, this regular expression will match any
character between 0 and 9, that is, any digit. To search for a special character literally, we must put
a backslash before it. For example, the single-character regular expression \*
matches a single asterisk. The special characters are briefly described in the table below.
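As a minimal sketch, the two expressions above can be tried with Python's `re` module. Python is an assumption made for illustration only; this section does not name a particular regexp engine, and the sample strings are invented.

```python
import re

# [0-9] matches any single digit; re.search returns the first one found.
first_digit = re.search(r"[0-9]", "room 42")
print(first_digit.group())  # 4

# \* matches a literal asterisk instead of acting as a repetition operator.
star = re.search(r"\*", "2 * 3")
print(star.start())  # 2 (index of the asterisk in the sample string)
```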
Table 4-1. Regexp Control Characters
| ^ | Beginning of the string. The expression ^A will match an A only at the beginning of the string. |
| ^ | A caret (^) immediately following the left bracket ([) has a different meaning: it excludes the remaining characters within the brackets from matching the target string. The expression [^0-9] indicates that the target character should not be a digit. |
| $ | The dollar sign ($) will match the end of the string. The expression abc$ will match the sub-string abc only if it is at the end of the string. |
| | | The alternation character (|) allows the expression on either side of it to match the target string. The expression a|b will match a as well as b. |
| . | The dot (.) will match any character. |
| * | The asterisk (*) indicates that the character to the left of the asterisk in the expression should match 0 or more times. |
| + | The plus (+) is similar to the asterisk, but there should be at least one match of the character to the left of the + sign in the expression. |
| ? | The question mark (?) matches the character to its left 0 or 1 times. |
| () | Parentheses affect the order of pattern evaluation. |
| [ ] | Brackets ([ and ]) enclosing a set of characters indicate that any of the enclosed characters may match the target character. |
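The control characters in Table 4-1 can be exercised one by one. The sketch below uses Python's `re` module as a stand-in engine (an assumption; the engine this section describes may differ in details), and every pattern and sample string in it is invented for illustration.

```python
import re

# Each entry: (pattern, text, expected first match, or None for no match).
checks = [
    (r"^A", "ABC", "A"),           # ^ anchors at the start of the string
    (r"^A", "BAC", None),          # A is present, but not at the start
    (r"[^0-9]", "42x", "x"),       # ^ inside brackets excludes the set
    (r"abc$", "xyzabc", "abc"),    # $ anchors at the end of the string
    (r"a|b", "xbz", "b"),          # alternation: either side may match
    (r"a.c", "a-c", "a-c"),        # . matches any character
    (r"ab*c", "ac", "ac"),         # * allows zero occurrences of b
    (r"ab+c", "ac", None),         # + requires at least one b
    (r"ab?c", "abbc", None),       # ? allows at most one b
    (r"(ab)+", "ababx", "abab"),   # parentheses make + apply to ab as a unit
]

def first_match(pattern, text):
    """Return the first matching sub-string, or None if nothing matches."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

results = [first_match(p, t) for p, t, _ in checks]
for (pattern, text, expected), got in zip(checks, results):
    print(f"{pattern!r} on {text!r}: got {got!r}, expected {expected!r}")
```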
Parentheses, besides affecting the evaluation order of the regular expression,
also serve as tagged expressions, which act as a kind of temporary memory. This memory can then
be used when we want to replace the matched text with a replace expression. The replace expression
can contain an & character, which stands for the sub-string that was found.
So, if the sub-string that matched the regular expression is abcd, then a replace
expression of xyz&xyz will change it to xyzabcdxyz. The
replace expression can also be written as xyz\0xyz. The \0
is a tagged expression representing the entire sub-string that was matched.
Similarly, other tagged expressions are represented by \1, \2, etc.
Note that although tagged expression 0 is always defined,
tagged expressions 1, 2, etc. are only defined if the regular expression used in the search
has enough sets of parentheses. Here are a few examples:
Table 4-2. Regexp Examples
| String | Search | Replace | Result |
| Mr. | (Mr)(\.) | \1s\2 | Mrs. |
| abc | (a)b(c) | &-\1-\2 | abc-a-c |
| bcd | (a|b)c*d | &-\1 | bcd-b |
| abcde | (.*)c(.*) | &-\1-\2 | abcde-ab-de |
| cde | (ab|cd)e | &-\1 | cde-cd |
| foo bar STOP: lkasdfkjakjlf | ([0-9,A-Z,a-z,\ ]*)(STOP:)([0-9,A-Z,a-z,\ ]*) | \1\2 | foo bar STOP: |
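The rows of Table 4-2 can be reproduced with a short script. This is a sketch under stated assumptions: it uses Python's `re` module, whose replacement syntax differs from the one described here (Python has no & shorthand and does not treat \0 as the whole match), so a hypothetical helper, `to_python_template`, rewrites both forms to Python's `\g<0>`. The helper is illustrative only and does not handle escaped backslashes or literal & characters.

```python
import re

def to_python_template(replace):
    # Hypothetical helper: translate this section's replace syntax to Python's.
    # Both & and \0 (the whole matched sub-string) become \g<0>.
    return replace.replace("\\0", "\\g<0>").replace("&", "\\g<0>")

# (string, search, replace) taken from the rows of Table 4-2.
rows = [
    ("Mr.", r"(Mr)(\.)", r"\1s\2"),
    ("abc", r"(a)b(c)", r"&-\1-\2"),
    ("bcd", r"(a|b)c*d", r"&-\1"),
    ("abcde", r"(.*)c(.*)", r"&-\1-\2"),
    ("cde", r"(ab|cd)e", r"&-\1"),
    ("foo bar STOP: lkasdfkjakjlf",
     r"([0-9,A-Z,a-z,\ ]*)(STOP:)([0-9,A-Z,a-z,\ ]*)", r"\1\2"),
]

# Replace only the first match in each string.
results = [
    re.sub(search, to_python_template(replace), text, count=1)
    for text, search, replace in rows
]
for (text, search, replace), result in zip(rows, results):
    print(f"{text!r} -> {result!r}")
```

Tagged expressions \1 and \2 pass through unchanged because Python uses the same notation for numbered groups; only the whole-match forms need translating.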