
Linux Newbie Guide XI - Newbie goes Pro! (Advanced)




Regular expressions (regexpr)
Regular expressions are used for "pattern" matching in search, replace, etc. They are often used with utilities (e.g., grep, sed) and programming languages (e.g., Perl).
In regular expressions, most characters just match themselves. The exceptions are the "metacharacters" that have special meaning.

In regexpr, the special characters are: "\" (backslash), "." (dot), "*" (asterisk), "[" (bracket), "^" (caret, special only at the beginning of a pattern), and "$" (dollar sign, special only at the end of a pattern). The character that terminates a pattern string is also special within that string.

The backslash, "\", is used as an "escape" character, i.e., to quote a subsequent special character.
Thus, "\\" searches for a backslash, "\." searches for a dot, "\*" searches for an asterisk, "\[" searches for a bracket, "\^" searches for a caret even at the beginning of the pattern, and "\$" searches for a dollar sign even at the end of the pattern.
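
To see escaping in action, one can feed grep a few made-up sample lines with printf. In this sketch, `sample` is just a hypothetical helper function that prints the test lines; the unescaped dot matches any character, while each escaped form matches only the literal character:

```shell
# Hypothetical helper: prints three made-up sample lines.
sample() { printf 'a.b\naxb\na*b\n'; }

sample | grep 'a.b'     # "." is special: prints a.b, axb and a*b
sample | grep 'a\.b'    # "\." is a literal dot: prints only a.b
sample | grep 'a\*b'    # "\*" is a literal asterisk: prints only a*b

printf 'C:\\temp\n' | grep '\\'     # "\\" matches a single literal backslash
printf 'price: $5\n' | grep '\$5'   # "\$" matches a literal dollar sign
```

The single quotes around each pattern keep the shell from interpreting the backslashes and the dollar sign before grep sees them.
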

A backslash followed by a regular (non-special) character may gain a special meaning. Thus, "\<" and "\>" match the empty string at the beginning and the end of a word, respectively. The symbol "\b" matches the empty string at the edge of a word, and "\B" matches the empty string provided it is not at the edge of a word. (These word-boundary operators are GNU extensions, available in GNU grep and sed.)

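
The word-boundary operators are easiest to compare side by side. This sketch assumes GNU grep (these operators are not part of POSIX) and uses made-up sample lines printed by a hypothetical `sample` helper:

```shell
# Assumes GNU grep: \<, \>, \b and \B are GNU extensions.
# Hypothetical helper: prints three made-up sample lines.
sample() { printf 'the cat\nother\nbathe\n'; }

sample | grep '\<the\>'   # "the" as a whole word: prints "the cat"
sample | grep '\bthe\b'   # same as above, written with \b
sample | grep 'the\>'     # "the" at the end of a word: prints "the cat" and "bathe"
sample | grep '\Bthe'     # "the" not at the start of a word: prints "other" and "bathe"
```
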
The dot, ".", matches any single character. [Shell filename wildcards, e.g., in arguments to dir or ls, use "?" for this.] Thus, "m.a" matches "mpa" and "mea" but not "ma" or "mppa".
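
The "m.a" example can be checked directly with a few made-up lines (printed here by a hypothetical `sample` helper). Note that grep reports a line if the pattern matches anywhere in it:

```shell
# Hypothetical helper: prints the four strings from the text.
sample() { printf 'mpa\nmea\nma\nmppa\n'; }

sample | grep 'm.a'              # prints mpa and mea
printf 'xmpay\n' | grep 'm.a'    # prints xmpay: "mpa" occurs inside the line
```
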

Any string, including the empty string, is matched by ".*" (dot and asterisk). [Shell filename wildcards use "*" instead.] In general, any item (a character, a dot, or a set) followed by "*" matches zero or more occurrences of that item. Thus, "m*" matches zero or more occurrences of "m". To search for one or more "m", I could use "mm*".
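
Because "*" allows zero occurrences, "m*" matches every line, while "mm*" requires at least one "m". The sketch below uses made-up lines (printed by a hypothetical `sample` helper) and also shows that "*" applies only to the single item just before it:

```shell
# Hypothetical helper: prints four made-up sample lines.
sample() { printf 'abc\nmoon\nm\nxyz\n'; }

sample | grep -c 'm*'    # 4: "m*" also matches zero m's, so every line matches
sample | grep 'mm*'      # one or more m's: prints moon and m

printf 'mn\nmon\nmoon\nman\n' | grep 'mo*n'   # "*" applies only to "o": prints mn, mon and moon
```
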

The "*" is a repetition operator. The other repetition operators are used less often; here is the full list:
* the preceding item is to be matched zero or more times;
+ the preceding item is to be matched one or more times;
? the preceding item is optional and is matched at most once;
{n} the preceding item is to be matched exactly n times;
{n,} the preceding item is to be matched n or more times;
{n,m} the preceding item is to be matched at least n times, but not more than m times.
Note that in the basic syntax grep uses by default, "+", "?", "{" and "}" are ordinary characters: either use grep -E (extended syntax), or write the operators with a backslash, e.g., "\{n,m\}" (and, in GNU grep, "\+" and "\?").

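
Here is a sketch that runs each repetition operator over the same made-up lines (printed by a hypothetical `sample` helper). It uses grep -E, which selects the extended syntax where these operators need no backslash, and ends with the equivalent interval written in grep's default basic syntax:

```shell
# Hypothetical helper: prints four made-up lines with 0 to 3 b's.
sample() { printf 'ac\nabc\nabbc\nabbbc\n'; }

sample | grep -E 'ab*c'      # zero or more b's: prints all four lines
sample | grep -E 'ab+c'      # one or more b's: prints abc, abbc and abbbc
sample | grep -E 'ab?c'      # optional b: prints ac and abc
sample | grep -E 'ab{2}c'    # exactly two b's: prints abbc
sample | grep -E 'ab{2,}c'   # two or more b's: prints abbc and abbbc
sample | grep -E 'ab{1,2}c'  # one or two b's: prints abc and abbc

# The same interval in the basic syntax (no -E): braces need a backslash.
sample | grep 'ab\{1,2\}c'   # prints abc and abbc
```

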
The caret, "^", means "the beginning of the line". So "^a" means: find a line starting with an "a".

The dollar sign, "$", means "the end of the line". So "a$" means: find a line ending with an "a".

Example. This command searches the file myfile for lines starting with an "s" and ending with an "n", and prints them to the standard output (screen):

grep '^s.*n$' myfile
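
The same pattern can be tried on made-up input without creating myfile, by piping sample lines from printf. The sketch also shows two consequences of the anchor rules: "^$" matches empty lines, and in grep's default basic syntax a "^" or "$" in the middle of a pattern is an ordinary character:

```shell
printf 'sun\nsunny\nlisten\nseason\n' | grep '^s.*n$'   # prints sun and season
printf 'a\n\nb\n\n' | grep -c '^$'                      # 2: counts the empty lines
printf 'a^b\nab\n' | grep 'a^b'                         # mid-pattern "^" is literal: prints a^b
printf 'x$y\nxy\n' | grep 'x$y'                         # mid-pattern "$" is literal: prints x$y
```
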

Any character that terminates the pattern string is special within it (for example, "/" in sed's s/old/new/ command); precede it with a backslash if you want to use it literally within the pattern.
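
A common case is a sed substitution on file paths, where "/" both delimits the s command and appears in the text. This sketch uses a made-up path; the second command shows that sed also accepts a different delimiter after "s", in which case "/" needs no escaping:

```shell
# "/" terminates each part of s/old/new/, so a literal "/" is written "\/".
echo '/usr/local/bin' | sed 's/\/usr\/local/\/opt/'   # prints /opt/bin

# With "|" as the delimiter, "/" is no longer special (but "|" now is).
echo '/usr/local/bin' | sed 's|/usr/local|/opt|'      # prints /opt/bin
```
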

The bracket, "[", introduces a set, which matches exactly one character. Thus, [abD] means: either a, or b, or D. [a-zA-C] means any character from a to z or from A to C.
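
The two sets from the text can be tried on made-up lines. Setting LC_ALL=C for grep is an assumption added here so that the range a-z follows plain ASCII order regardless of the user's locale:

```shell
printf 'a\nb\nD\nd\nB\n' | grep '[abD]'                # prints a, b and D

# LC_ALL=C: ranges are interpreted in plain ASCII order.
printf 'A\nC\nD\nz\n5\n' | LC_ALL=C grep '[a-zA-C]'    # prints A, C and z
```
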

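Be careful with some characters inside sets. Within a set, the only special characters are "[", "]", "-", and "^", and the combinations "[:", "[=", and "[.". In particular, "]" closes the set unless it comes first, "-" marks a range unless it comes first or last, and "^" negates the set when it comes first (so [^0-9] matches any character that is not a digit). The backslash is not special within a set.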
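
The placement rules above can be checked with a few made-up lines. This sketch assumes GNU grep, but the rules shown are the POSIX ones:

```shell
printf 'a]b\nc-d\nefg\n' | grep '[]-]'     # "]" first and "-" last are literal: prints a]b and c-d
printf '123\n12a\n' | grep '[^0-9]'        # "^" first negates the set: prints 12a
printf 'a^b\nab\n' | grep '[x^]'           # "^" not first is literal: prints a^b
printf 'C:\\temp\nCtemp\n' | grep '[\]'    # backslash is literal in a set: prints C:\temp
```
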

Some useful categories of characters (used inside a set, e.g., [[:digit:]]) are:
[:upper:] upper-case letters;
[:lower:] lower-case letters;
[:alpha:] letters, i.e., upper + lower;
[:digit:] the digits 0 to 9;
[:alnum:] letters and digits, i.e., alpha + digit;
[:space:] whitespace: spaces, tabs, newlines, and similar;
[:graph:] graphically printable characters, excluding space;
[:print:] printable characters, including space;
[:punct:] punctuation characters, i.e., graphical characters other than letters and digits;
[:cntrl:] control characters, i.e., non-printable characters.
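
A short sketch of a few categories on made-up lines (printed by a hypothetical `sample` helper). The category name always goes inside a set, so the pattern is written [[:digit:]], not [:digit:]:

```shell
# Hypothetical helper: the third line contains a tab between a and b.
sample() { printf 'hello\nRoom 101\na\tb\nEND\n'; }

sample | grep '[[:digit:]]'       # prints "Room 101"
sample | grep -c '[[:space:]]'    # 2: "Room 101" (space) and the line with the tab
sample | grep '^[[:upper:]]*$'    # lines made only of capitals: prints END
```
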

Example. This command prints the lines of the dir -l listing that contain a capital letter followed by a digit:

dir -l | grep '[[:upper:]][[:digit:]]'
