This section walks through a real example of the XReplace-32 regular expression mechanism.
It explains, step by step, a complex replacement that saved a real XReplace-32 user hours of work.
This is a real-world industry problem that could not be solved without regular expressions.
There are hundreds of files in the following format:
Software - Package - (Path:C:\WINNT\SYSTEM32\NTOSKRNL.EXE) - Version = 4.0
Hardware - Package - (Path:T:\32bit\system\4nt\PRODUCTS.TXT) - Name = Microsoft
Windows NT SP3
Software - Package - (Path:N:\32bit\Xreplace\xrep32.exe) - File Size = 914688
Hardware - Package - (Path:C:\WINNT\SYSTEM32\MFC42.DLL) - File Date = 862437600
No
The task is to remove the paths inside the parentheses, keeping only the file names, to produce the following
output:
Software - Package - NTOSKRNL.EXE - Version = 4.0
Hardware - Package - PRODUCTS.TXT - Name = Microsoft
Windows NT SP3
Software - Package - xrep32.exe - File Size = 914688
Hardware - Package - MFC42.DLL - File Date = 862437600
No
A regular expression is a series of patterns that are matched against the real data.
Once a match is found, the matched text is cut into pieces following the pattern format. Each piece is identified
by a number and can be altered, copied or replaced.
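To illustrate the idea of numbered pieces, here is a minimal sketch using Python's `re` module rather than XReplace-32 itself. The syntax is Python's, and the sample pattern and variable names are chosen for illustration only.

```python
import re

line = "File Size = 914688"

# Two parenthesized patterns: the label before " = " and the digits after it.
match = re.match(r"([A-Za-z ]+) = ([0-9]+)", line)

label = match.group(1)   # first piece
value = match.group(2)   # second piece

# Pieces can be referenced by number in the replacement text, here to swap them.
swapped = re.sub(r"([A-Za-z ]+) = ([0-9]+)", r"\2 = \1", line)
print(label, value, swapped)
```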
A range defines which characters a pattern may contain. For example, capital letters range from A to Z.
In XReplace-32, the range for capital letters is written [A-Z]. A range for all alphanumeric
characters is thus [A-Z,a-z,0-9]. To include a literal parenthesis ( in the pattern, precede it with a backslash: \(
Patterns are separated in XReplace-32 by enclosing them in parentheses. A full pattern for all alphanumeric characters is
written ([A-Z,a-z,0-9]) and will match any single character between A and Z, a and z or 0 and 9.
To match a sequence of characters, the pattern must be repeated multiple times
(as many as possible). This is done by adding a * to the pattern: ([A-Z,a-z,0-9]*)
Multiple patterns are easy to write. For example, a filename is of the form name.extension
and will be matched by
([A-Z,a-z,0-9]*\.[A-Z,a-z,0-9]*)
This reads: a sequence of alphanumeric characters followed by a dot and another sequence of alphanumeric characters.
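For readers who want to experiment outside XReplace-32, the same file name pattern can be approximated with Python's `re` module. This is only a sketch: Python writes the ranges without commas (a comma inside a Python character class would match a literal comma), and `FILENAME` is a name chosen here for illustration.

```python
import re

# Alphanumeric sequence, a literal dot, another alphanumeric sequence.
FILENAME = re.compile(r"[A-Za-z0-9]*\.[A-Za-z0-9]*")

line = r"Software - Package - (Path:C:\WINNT\SYSTEM32\NTOSKRNL.EXE) - Version = 4.0"

# findall returns every non-overlapping match in the line.
matches = FILENAME.findall(line)
print(matches)
```

Note that the version number 4.0 also fits the pattern, which is why the later steps anchor the file name between a backslash and a right parenthesis.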
We can now get rid of (Path:?:\ and replace it with a single backslash. For the first line, this gives:
Software - Package - \WINNT\SYSTEM32\NTOSKRNL.EXE) - Version = 4.0
The source pattern matches a left parenthesis followed by Path:, a single character (the drive letter), a colon and a backslash. The target is a single backslash.
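As a rough Python equivalent of this first step (not XReplace-32 syntax), the sketch below matches the same prefix with `re.sub`. In Python a single arbitrary character is written `.`, and the function name `strip_path_prefix` is hypothetical.

```python
import re

def strip_path_prefix(line: str) -> str:
    # \(      literal left parenthesis
    # Path:   the fixed text
    # .       any single character (the drive letter)
    # :\\     a colon and a literal backslash
    # The replacement template r"\\" produces a single backslash.
    return re.sub(r"\(Path:.:\\", r"\\", line)

first = strip_path_prefix(
    r"Software - Package - (Path:C:\WINNT\SYSTEM32\NTOSKRNL.EXE) - Version = 4.0"
)
print(first)
```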
We must now get rid of all sequences between backslashes:
Software - Package - \WINNT\SYSTEM32\NTOSKRNL.EXE) - Version = 4.0
Software - Package - \SYSTEM32\NTOSKRNL.EXE) - Version = 4.0
Software - Package - \NTOSKRNL.EXE) - Version = 4.0
XReplace-32 has no mechanism to repeat a replacement until no more occurrences are found, so we have to repeat the operation ourselves.
Each pass replaces a backslash, followed by a sequence of alphanumeric characters (a directory name) and
terminated by another backslash, with a single backslash.
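Outside XReplace-32 the repetition can be automated. The following Python sketch, with a hypothetical helper `strip_directories`, applies the directory replacement until a pass changes nothing. It assumes directory names are purely alphanumeric, as in the sample data.

```python
import re

# A backslash, an alphanumeric directory name, and the closing backslash.
DIRECTORY = re.compile(r"\\[A-Za-z0-9]*\\")

def strip_directories(line: str) -> str:
    while True:
        # Each pass replaces every matched "\dir\" sequence with a single backslash.
        line, count = DIRECTORY.subn(lambda m: "\\", line)
        if count == 0:
            return line

result = strip_directories(
    r"Software - Package - \WINNT\SYSTEM32\NTOSKRNL.EXE) - Version = 4.0"
)
print(result)
```

The loop always terminates because every replacement shortens the line.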
The replacement is almost finished. What remains is a file name preceded by a backslash
and followed by a right parenthesis, which is easy to match. The source pattern sequence is:
(\\)([A-Z,a-z,0-9]*\.[A-Z,a-z,0-9]*)(\))
In the previous steps the target was a single backslash; this time the target must be the file name itself.
Remember that each pattern is identified by a number. In this example,
the patterns are the following:
1: \\
2: [A-Z,a-z,0-9]*\.[A-Z,a-z,0-9]*
3: \)
The target replacement is \2, which is the file name only.
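The same numbered-pattern idea can be tried in Python, whose `re` module also numbers parenthesized groups from 1 and accepts `\2` in the replacement text. This is a sketch in Python syntax, not the XReplace-32 notation; `SOURCE` is an illustrative name.

```python
import re

# Three numbered groups, mirroring the pattern sequence above:
#   1: the backslash   2: the file name   3: the right parenthesis
SOURCE = re.compile(r"(\\)([A-Za-z0-9]*\.[A-Za-z0-9]*)(\))")

line = r"Software - Package - \NTOSKRNL.EXE) - Version = 4.0"

# The target keeps only group 2, dropping the backslash and the parenthesis.
result = SOURCE.sub(r"\2", line)
print(result)
```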
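The full replacement sequence is finally: the Path prefix replacement, the directory replacement repeated as many times as the deepest path requires, and the file name replacement.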
Imagine making such a replacement in 10'000 files by hand!
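For comparison, the three steps can be chained in a few lines of Python. This is a sketch under the same assumptions as the example data (alphanumeric directory and file names, a one-character drive); `clean_line` and the constant names are hypothetical, and reading and writing the actual files is left out.

```python
import re

PREFIX = re.compile(r"\(Path:.:\\")                                # (Path:X:\
DIRECTORY = re.compile(r"\\[A-Za-z0-9]*\\")                        # \dir\
FILENAME = re.compile(r"(\\)([A-Za-z0-9]*\.[A-Za-z0-9]*)(\))")     # \name.ext)

def clean_line(line: str) -> str:
    # Step 1: replace the Path prefix with a single backslash.
    line = PREFIX.sub(lambda m: "\\", line)
    # Step 2: strip directory names until a pass changes nothing.
    while True:
        line, count = DIRECTORY.subn(lambda m: "\\", line)
        if count == 0:
            break
    # Step 3: keep only the file name (group 2).
    return FILENAME.sub(r"\2", line)

sample = [
    r"Software - Package - (Path:C:\WINNT\SYSTEM32\NTOSKRNL.EXE) - Version = 4.0",
    r"Hardware - Package - (Path:T:\32bit\system\4nt\PRODUCTS.TXT) - Name = Microsoft",
    "Windows NT SP3",
    r"Software - Package - (Path:N:\32bit\Xreplace\xrep32.exe) - File Size = 914688",
]
cleaned = [clean_line(line) for line in sample]
for line in cleaned:
    print(line)
```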
Use the prompted mode to preview replacements before they are actually made.
Regular expressions can have many side effects in complex replacements, because
patterns may match far more text than intended.
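The kind of side effect meant here is easy to reproduce. The Python sketch below is not the XReplace-32 prompted mode; it is a hypothetical `preview` helper that lists what a replacement would change without modifying anything, so an over-eager pattern shows up before any file is touched.

```python
import re

def preview(pattern: str, target: str, lines: list[str]) -> list[tuple[str, str]]:
    """Return (before, after) pairs for the lines the replacement would change."""
    changes = []
    for line in lines:
        after = re.sub(pattern, target, line)
        if after != line:
            changes.append((line, after))
    return changes

# A pattern meant for file names also rewrites the version number "4.0".
changes = preview(
    r"[A-Za-z0-9]*\.[A-Za-z0-9]*",
    "X",
    ["NTOSKRNL.EXE - Version = 4.0", "No"],
)
for before, after in changes:
    print(before, "->", after)
```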