package require nre ?2.0?
nrematch ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?
If additional arguments are specified after string then they are treated as the names of variables in which to return information about which part(s) of string matched exp. matchVar will be set to the range of string that matched all of exp. The first subMatchVar will contain the characters in string that matched the leftmost parenthesized subexpression within exp, the next subMatchVar will contain the characters that matched the next parenthesized subexpression to the right in exp, and so on.
Instead of the standard regular expression package, nrematch uses the package described in this man page.
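A minimal sketch of how matchVar and the subMatchVars are filled in. The variable names and the sample date string are made up for illustration. So that the sketch also runs where nre is not installed, it falls back to Tcl's built-in regexp (8.1 or later), which is assumed to behave the same way for this simple pattern.

```tcl
# Assumption: if nre is unavailable, alias nrematch to the built-in regexp.
if {[catch {package require nre}]} {
    interp alias {} nrematch {} regexp
}

set date "2024-05-17"   ;# hypothetical input
# Braces keep Tcl from treating [0-9] as a command substitution.
set found [nrematch {([0-9]+)-([0-9]+)-([0-9]+)} $date all year month day]

# found is 1; all holds the whole match; year, month and day hold the
# three parenthesized subexpressions from left to right.
puts "$found $all $year $month $day"
```

The command returns 1 when exp matches and 0 otherwise, so it can be used directly as an `if` condition.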
If the initial arguments to nrematch start with - then they are treated as switches. The following switches are currently supported:
A regular expression is zero or more branches, separated by ``|''. It matches anything that matches one of the branches.
A branch is zero or more pieces, concatenated. It matches a match for the first, followed by a match for the second, etc.
A piece is an atom possibly followed by ``*'', ``+'', ``?'', or ``{x,y}'', which in turn might be followed by a ``?''.
A ``*'' matches a sequence of 0 or more matches of the atom.
A ``+'' matches a sequence of 1 or more matches of the atom.
A ``?'' matches a sequence of 0 or 1 matches of the atom.
A ``{x}'' matches a sequence of x matches of the atom.
A ``{x,}'' matches a sequence of x or more matches of the atom.
A ``{x,y}'' matches a sequence of at least x and at most y matches of
the atom.
By default a piece will match as long a sequence as
possible. However, if any of the piece constructs described above has a ``?''
after it then the piece will match as short a sequence as possible.
Note that the ``{x,y}'' repetition construct is only recognized if
the p flag is set.
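The difference between the default (longest) and the ``?''-suffixed (shortest) forms can be sketched as follows. The input strings are invented for illustration, and the fallback to the built-in regexp is an assumption made only so the sketch runs without nre. The ``{x,y}'' construct is left out because it depends on the p flag.

```tcl
# Assumption: if nre is unavailable, alias nrematch to the built-in regexp.
if {[catch {package require nre}]} {
    interp alias {} nrematch {} regexp
}

nrematch {a+}  aaa greedy    ;# as long as possible: greedy is "aaa"
nrematch {a+?} aaa lazy      ;# as short as possible: lazy is "a"

set html "<b>x</b>"          ;# hypothetical input
nrematch {<.*>}  $html longTag    ;# longTag is "<b>x</b>"
nrematch {<.*?>} $html shortTag   ;# shortTag is "<b>"

puts "$greedy $lazy $longTag $shortTag"
```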
An atom is a regular expression in parentheses (matching a match for the regular expression), a range (see below), ``.'' (matching any single character), ``^'' (matching the null string at the beginning of the input string), ``$'' (matching the null string at the end of the input string), a ``\'' followed by a single character (matching that character, or matching something special if the p flag is used; see the FLAGS section for details), or a single character with no other significance (matching that character).
A range is a sequence of characters enclosed in ``[]''. It normally matches any single character from the sequence. If the sequence begins with ``^'', it matches any single character not from the rest of the sequence. If two characters in the sequence are separated by ``-'', this is shorthand for the full list of ASCII characters between them (e.g. ``[0-9]'' matches any decimal digit). To include a literal ``]'' in the sequence, make it the first character (following a possible ``^''). To include a literal ``-'', make it the first or last character.
A parentheses atom in which the character immediately after the ``(''
is a ``?'' is a special construct with one of the following meanings:
``(?:''regexp``)'' is a shy group. It groups like
``()'' but does not capture the text for backreferences as ``()'' does.
It matches if regexp matches.
``(?=''regexp``)'' is a non-capturing zero-width positive lookahead
assertion. It matches if regexp matches.
The matched text is not consumed.
``(?!''regexp``)'' is a non-capturing zero-width negative lookahead
assertion. It matches if regexp does not match.
``(?#''any text``)'' is a comment. The entire atom is treated as an
empty string.
``(?ipxm)'' is used to set flags. Any combination of the flag characters
``ipxm'' is allowed. The entire atom is treated as an empty string.
See the FLAGS section for a description of each flag.
``(?|''range``)'' is an alternate syntax for a character range.
Its benefit is that it does not use the Tcl special characters ``[]'' to
enclose the range.
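A short sketch of a few of the constructs above: a shy group, both lookahead assertions, and a flag atom. The patterns and inputs are illustrative only. The ``(?|''range``)'' form is not shown, and the fallback to the built-in regexp is an assumption made only so the sketch runs without nre.

```tcl
# Assumption: if nre is unavailable, alias nrematch to the built-in regexp.
if {[catch {package require nre}]} {
    interp alias {} nrematch {} regexp
}

# Shy group: (?:ab) groups for "+" but is not captured, so the first
# subMatchVar receives the text matched by (c).
nrematch {(?:ab)+(c)} ababc whole sub     ;# whole is "ababc", sub is "c"

# Positive lookahead: "bar" must follow but is not consumed.
nrematch {foo(?=bar)} foobar ahead        ;# ahead is "foo"

# Negative lookahead: fails because "bar" does follow "foo".
set neg [nrematch {foo(?!bar)} foobar]    ;# neg is 0

# Flag atom: (?i) at the start makes the comparison ignore case.
set ci [nrematch {(?i)hello} HELLO]       ;# ci is 1

puts "$whole $sub $ahead $neg $ci"
```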
FLAGS
Flags can be set using a ``(?''flag-char``)'' atom. Some commands that
use regular expressions have options that set some of these same
flags. For example the -nocase option sets the i flag. The advantage
of having the flags in the regular expression itself is that they can
then be used by any command without the need to add new command
switches. It is best to set the flags at the very beginning of the
regular expression; however they apply to the entire regular expression
no matter where they appear.
The i flag causes case to be ignored when alphabetic characters are
compared.
The m flag enables multi-line mode. The ``^'' atom is changed to match
at the beginning of the string or the beginning of any line in the
string. The ``$'' atom is changed to match at the end of the string or
the end of any line in the string. The ``.'' atom is changed to match
any character except ``\n''.
The x flag causes white space in the regular expression to be ignored
and removed during compilation. To include literal white space as an atom
to be matched, precede it with a backslash ``\''. White space is only ignored
between atoms, pieces, branches, and regular expressions.
It is not ignored in ranges or in any other complex atom.
Comments count as white space; a comment starts with a ``#''
and continues to the end of the line.
The p flag enables extra escape sequences and constructs to be
recognized. See the BACKWARDS COMPATIBILITY section for why these
constructs are not enabled by default. The following are enabled:
Consider the command:
    nrematch (a*)b* aabaaabb x y
Considering only the rules given so far, x and y could end up with the values aabb and aa, aaab and aaa, ab and a, or any of several other combinations. To resolve this potential ambiguity nrematch chooses among alternatives using the following rules, applied in decreasing order of priority:
For example, consider the command:
    nrematch (ab|a)(b*)c abc x y z
After this command x will be abc, y will be ab, and z will be an empty string. Rule 4 specifies that (ab|a) gets first shot at the input string and Rule 2 specifies that the ab sub-expression is checked before the a sub-expression. Thus the b has already been claimed before the (b*) component is checked and (b*) must match an empty string.
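The (ab|a)(b*)c case can be checked directly with a small script. The pattern is braced here only as a habit; the fallback to the built-in regexp is an assumption made so the sketch runs without nre, and the built-in command is assumed to resolve this particular ambiguity the same way.

```tcl
# Assumption: if nre is unavailable, alias nrematch to the built-in regexp.
if {[catch {package require nre}]} {
    interp alias {} nrematch {} regexp
}

nrematch {(ab|a)(b*)c} abc x y z

# (ab|a) claims "ab" first, so (b*) is left with the empty string.
puts "x=$x y=$y z=<$z>"
```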
A compiled regular expression is limited in size to 32678 bytes. If during compilation it is discovered that the regular expression requires more memory than that, the operation will fail with the error: ``regexp too big''.
The counts in the repetition construct ``{x,y}'' must be greater than or equal to zero and less than or equal to 255.
The maximum number of unique ranges in a regular expression is 64.
BACKWARDS COMPATIBILITY
Regular expressions from previous releases of Tcl should behave
exactly the same. The following new constructs:
    (?...), *?, +?, and ??
would cause compilation errors in older regular expressions, so they are
always recognized in new regular expressions.
All the other new constructs would have meant something else in older
regular expressions, so they always have the old meaning unless you
turn on one of the new flags. For example you need to start a regular
expression with (?p) if you want to use the new ``\'' sequences or
the ``{x,y}'' repetition construct.
PERFORMANCE INFORMATION
The first time a regular expression is used it is compiled into
a Tcl object. The next time that object needs to be used as a
regular expression the compilation step will not be needed if the
object still exists and is still a regular expression.
So if the regular expression is a constant string:
    nrematch {abc|def|zeq} $str
then the first time the above command is executed the string
constant object is converted to a regular expression object
and will remain so, giving a performance boost.
However if the regular expression string is not constant:
    nrematch "$W1|$W2|$W3" $str
then the string object will need to be recreated each time the
above command executes.
If instead you stored the regular expression string into a variable
then the regular expression object would remain and not need to
be recreated each time:
    set re "$W1|$W2|$W3"
    proc foo {str} {
        # re is compiled once and reused on every call
        global re
        nrematch $re $str
    }
If it is a complex regular expression used in more than one place
this can be a win in both time and space.
It is best to use (?i) instead of -nocase if you can
because then the text of the regular expression object describes
its state.
If you do not need the matchVar or a subMatchVar then you
can set that argument to an empty string ``{}''.
This tells nrematch not to bother setting a variable to that
particular captured subexpression.
BINARY CLEAN
The new regular expression compiler and matcher are binary clean.
This means that it is ok for the regular expression and the string
being matched to contain binary data including null bytes.
EXAMPLES
To match a number if not followed by a period:
    nrematch {[0-9]+(?![.])} $str
To match a number if followed by something other than a period:
    nrematch {[0-9]+(?=[^.])} $str
To match an item that contains only letters, but not all uppercase:
    nrematch {^(?![A-Z]*$)[a-zA-Z]*$} $str
To see if a string contains both 'this' and 'that':
    nrematch {^(?=.*?this)(?=.*?that)} $str
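The example patterns can be exercised on a few sample strings, as sketched below. The helper procedure names and the inputs are hypothetical, and the fallback to the built-in regexp is an assumption made only so the sketch runs without nre.

```tcl
# Assumption: if nre is unavailable, alias nrematch to the built-in regexp.
if {[catch {package require nre}]} {
    interp alias {} nrematch {} regexp
}

# Hypothetical helper: only letters, but not all uppercase.
proc mixedCaseWord {s} {
    return [nrematch {^(?![A-Z]*$)[a-zA-Z]*$} $s]
}

# Hypothetical helper: contains both "this" and "that", in either order.
proc hasBoth {s} {
    return [nrematch {^(?=.*?this)(?=.*?that)} $s]
}

puts [mixedCaseWord Hello]        ;# 1
puts [mixedCaseWord HELLO]        ;# 0: all uppercase
puts [mixedCaseWord abc1]         ;# 0: contains a digit
puts [hasBoth "that and this"]    ;# 1
puts [hasBoth "only this"]        ;# 0

# A single digit followed by a period is rejected by both number patterns;
# a digit followed by a letter is accepted.
set notPeriod   [nrematch {[0-9]+(?![.])} "7."]        ;# 0
set otherFollow [nrematch {[0-9]+(?=[^.])} "7a" num]   ;# 1, num is "7"
puts "$notPeriod $otherFollow $num"
```

Note that the two number patterns differ at the end of the string: ``(?![.])'' is satisfied when nothing follows, while ``(?=[^.])'' needs at least one more character.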
Copyright © 1997 Darrel Schneider.