Linux Newbie Guide XI - Newbie goes Pro! (Advanced)
Regular expressions (regexpr)
Regular expressions are used for "pattern" matching in search,
replace, etc. They are often used with utilities (e.g., grep, sed)
and programming languages (e.g., perl).
In regular expressions, most characters just match themselves. The exceptions
are the "metacharacters" that have special meaning.
In regexpr, the special characters are: "\" (backslash),
"." (dot), "*" (asterisk), "[" (bracket),
"^" (caret, special only at the beginning of a string), "$"
(dollar sign, special only at the end of a string). A character terminating a
pattern string is also special for this string.
The backslash, "\", is used as an "escape" character,
i.e., to quote a subsequent special character.
Thus, "\\" searches for a backslash, "\." searches for a
dot, "\*" searches for the asterisk, "\[" searches for the
bracket, "\^" searches for the caret even at the beginning of the
string, and "\$" searches for the dollar sign even at the end of the
string.
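The difference between a special character and its escaped form can be tried out directly with grep. This is only a small sketch: the three sample lines and the variable names are made up for the illustration.

```shell
# Hypothetical sample input: three lines that differ only in the middle character.
sample=$(printf '%s\n' 'a.b' 'axb' 'a*b')

# An unescaped dot matches any single character, so all three lines match.
any_char=$(printf '%s\n' "$sample" | grep 'a.b')

# An escaped dot matches only a literal dot.
literal_dot=$(printf '%s\n' "$sample" | grep 'a\.b')

# An escaped asterisk matches only a literal asterisk.
literal_star=$(printf '%s\n' "$sample" | grep 'a\*b')

printf 'a.b   matched:\n%s\n' "$any_char"
printf 'a\\.b  matched: %s\n' "$literal_dot"
printf 'a\\*b  matched: %s\n' "$literal_star"
```

The single quotes around each pattern keep the shell from interpreting the backslash before grep sees it.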
Backslash followed by a regular (non-special) character may gain a special
meaning. Thus, the symbols \< and \> match an
empty string at the beginning and the end of a word, respectively. The
symbol \b matches the empty string at the edge
of a word, and \B matches the empty string provided it's not at the
edge of a word.
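Here is a quick way to see the word-edge symbols at work. The sketch assumes GNU grep (these symbols are extensions and may not exist in every grep), and the sample lines are invented for the illustration.

```shell
# Hypothetical sample input: "the" as a whole word, inside a word, and at a word's end.
sample=$(printf '%s\n' 'the cat' 'other cats' 'bathe')

# \<the\> matches "the" only when it is a complete word.
whole_word=$(printf '%s\n' "$sample" | grep '\<the\>')

# the\> matches "the" only at the end of a word.
word_end=$(printf '%s\n' "$sample" | grep 'the\>')

# \Bthe\B matches "the" only when neither side is a word edge.
inside=$(printf '%s\n' "$sample" | grep '\Bthe\B')

printf 'whole word: %s\n' "$whole_word"
printf 'word end:\n%s\n' "$word_end"
printf 'inside a word: %s\n' "$inside"
```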
The dot, ".", matches any single character. [The dir
command uses "?" in this place.] Thus, "m.a" matches
"mpa" and "mea" but not "ma" or "mppa".
Any string is matched by ".*" (dot and asterisk). [The dir
command uses "*" instead.] In general, any pattern
followed by "*" matches zero or more occurrences of this pattern.
Thus, "m*" matches zero or more occurrences of "m". To
search for one or more "m", I could use "mm*".
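The dot and asterisk examples above can be checked with a few lines of shell. This is a sketch on made-up word lists; the variable names are only for the illustration.

```shell
# The four words from the "m.a" example above.
words=$(printf '%s\n' 'mpa' 'mea' 'ma' 'mppa')

# "m.a" needs exactly one character between m and a.
dot_match=$(printf '%s\n' "$words" | grep 'm.a')

# "m.*a" allows any number of characters (including none) between m and a.
dot_star_count=$(printf '%s\n' "$words" | grep -c 'm.*a')

# Hypothetical lines with zero, one, and two m's between x and y.
runs=$(printf '%s\n' 'xy' 'xmy' 'xmmy')

# "xm*y" accepts zero or more m's; "xmm*y" insists on at least one.
zero_or_more=$(printf '%s\n' "$runs" | grep -c 'xm*y')
one_or_more=$(printf '%s\n' "$runs" | grep -c 'xmm*y')

printf 'm.a matched:\n%s\n' "$dot_match"
printf 'm.*a matched %s lines\n' "$dot_star_count"
printf 'xm*y matched %s lines, xmm*y matched %s lines\n' "$zero_or_more" "$one_or_more"
```

The -c option makes grep print the number of matching lines instead of the lines themselves.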
The * is a repetition operator. Other repetition operators
are used less often. Here is the full list:
*      the preceding item is to be matched zero or more times;
+      the preceding item is to be matched one or more times;
?      the preceding item is optional and matched at most once;
{n}    the preceding item is to be matched exactly n times;
{n,}   the preceding item is to be matched n or more times;
{n,m}  the preceding item is to be matched at least n times, but not more than m times.
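The repetition operators can be compared side by side on a small word list. One caveat for this sketch: in grep, the operators +, ? and {} belong to the "extended" syntax, so the example uses grep -E. The word list and variable names are invented for the illustration.

```shell
# Hypothetical sample: "c", then zero to three a's, then "t".
sample=$(printf '%s\n' 'ct' 'cat' 'caat' 'caaat')

# One or more a's.
plus=$(printf '%s\n' "$sample" | grep -E '^ca+t$')

# At most one a.
optional=$(printf '%s\n' "$sample" | grep -E '^ca?t$')

# Exactly two a's.
exactly_two=$(printf '%s\n' "$sample" | grep -E '^ca{2}t$')

# Two or more a's.
two_or_more=$(printf '%s\n' "$sample" | grep -E '^ca{2,}t$')

# Between one and two a's.
one_to_two=$(printf '%s\n' "$sample" | grep -E '^ca{1,2}t$')

printf 'a+     : %s\n' "$(printf '%s' "$plus" | tr '\n' ' ')"
printf 'a?     : %s\n' "$(printf '%s' "$optional" | tr '\n' ' ')"
printf 'a{2}   : %s\n' "$exactly_two"
printf 'a{2,}  : %s\n' "$(printf '%s' "$two_or_more" | tr '\n' ' ')"
printf 'a{1,2} : %s\n' "$(printf '%s' "$one_to_two" | tr '\n' ' ')"
```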
The caret, "^", means "the beginning of the line". So
"^a" means "find a line starting with an 'a'".
The dollar sign, "$", means "the end of the line". So
"a$" means "find a line ending with an 'a'".
Example. This command searches the file myfile for lines
starting with an "s" and ending with an "n", and prints
them to the standard output (screen):
grep '^s.*n$' myfile
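To try the same pattern without creating a file, the lines can be fed to grep on standard input. This is a sketch; the four sample lines stand in for the contents of myfile and are made up.

```shell
# Hypothetical stand-in for the contents of myfile.
sample=$(printf '%s\n' 'sun' 'salmon' 'spin it' 'unseen')

# Keep only lines that start with "s" and end with "n".
matches=$(printf '%s\n' "$sample" | grep '^s.*n$')

printf '%s\n' "$matches"
```

"spin it" is rejected because it does not end with an "n", and "unseen" because it does not start with an "s".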
Any character that terminates the pattern string is special. Precede it with a
backslash if you want to use it within this string.
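A common case is the sed substitute command, where "/" usually terminates the pattern. The sketch below assumes a made-up path as input and shows two ways around the problem: escaping the slash, or choosing a different terminating character.

```shell
# Hypothetical input containing the character that normally ends the pattern.
path='usr/local/bin'

# The slash inside the pattern is preceded by a backslash.
escaped=$(printf '%s\n' "$path" | sed 's/usr\/local/opt/')

# Alternatively, use "|" to terminate the pattern, so "/" is no longer special.
alt_delim=$(printf '%s\n' "$path" | sed 's|usr/local|opt|')

printf 'escaped slash : %s\n' "$escaped"
printf 'other delimiter: %s\n' "$alt_delim"
```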
The bracket, "[", introduces a set. Thus [abD] means: either
a or b or D. [a-zA-C] means any character from a to z or from A to C.
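Sets are easy to experiment with. The sketch below uses invented sample lines; it also shows two things the examples above do not: a caret placed first inside a set negates the set, and a dot inside a set is just a dot. LC_ALL=C is set for the range only to keep its meaning independent of the locale, which is an extra precaution on my part.

```shell
# Hypothetical sample lines, each starting with a different letter.
sample=$(printf '%s\n' 'a1' 'b2' 'D3' 'c4' 'E5')

# Lines starting with a, b, or D.
in_set=$(printf '%s\n' "$sample" | grep '^[abD]')

# Lines starting with any character from a to c.
in_range=$(printf '%s\n' "$sample" | LC_ALL=C grep '^[a-c]')

# A caret as the first character inside the set negates it:
# lines starting with anything except a, b, or D.
not_in_set=$(printf '%s\n' "$sample" | grep '^[^abD]')

# Inside a set the dot is an ordinary character, so [.] matches only a real dot.
literal_dot=$(printf '%s\n' 'a.b' 'axb' | grep 'a[.]b')

printf '[abD]  : %s\n' "$(printf '%s' "$in_set" | tr '\n' ' ')"
printf '[a-c]  : %s\n' "$(printf '%s' "$in_range" | tr '\n' ' ')"
printf '[^abD] : %s\n' "$(printf '%s' "$not_in_set" | tr '\n' ' ')"
printf 'a[.]b  : %s\n' "$literal_dot"
```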
Take care with some characters inside sets. Within a set, the only special
characters are "[", "]", "-", and
"^", and the combinations "[:", "[=", and
"[.". The backslash is not special within a set.
Some useful categories of characters are listed below. They are used inside a
set, as in [[:digit:]].
[:upper:]  upper-case letters;
[:lower:]  lower-case letters;
[:alpha:]  letters, meaning upper + lower;
[:digit:]  0 to 9;
[:alnum:]  alpha and digits;
[:space:]  whitespace, meaning space, tab, newline, and similar;
[:graph:]  graphically printable characters except space;
[:print:]  printable characters including space;
[:punct:]  punctuation characters, meaning graphical characters minus alpha and digits;
[:cntrl:]  control characters, meaning non-printable characters.
Example. This command prints lines containing a capital letter followed by a
digit:
dir -l | grep '[[:upper:]][[:digit:]]'
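The same pattern can be tested on fixed input instead of a directory listing. This is a sketch; the sample lines are invented. The second pattern is an extra illustration of [:space:], not something taken from the command above.

```shell
# Hypothetical sample lines in place of the directory listing.
sample=$(printf '%s\n' 'File A1' 'file a1' 'FILE AB' 'x Z9 y')

# A capital letter immediately followed by a digit.
upper_digit=$(printf '%s\n' "$sample" | grep '[[:upper:]][[:digit:]]')

# Count lines where whitespace is immediately followed by a lower-case letter.
space_lower=$(printf '%s\n' "$sample" | grep -c '[[:space:]][[:lower:]]')

printf 'upper+digit:\n%s\n' "$upper_digit"
printf 'space+lower matched %s lines\n' "$space_lower"
```

Note the double brackets: the outer pair introduces the set, and the inner [:upper:] names the category inside it.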