GREP — Find Regular Expressions in Files
Quick Start for Release 8.0
Program Dated 5 May 2005 / Document Dated 17 Apr 2006
Copyright © 1986–2008 by Stan Brown, Oak Road Systems
Summary: GREP searches named input files, or the standard input, and displays lines that match one or more patterns called regular expressions or regexes. GREP can also search binary files and display records or buffers that contain matches. This Quick Start is your overview of GREP.
These documents are sometimes revised between software releases — you may want to check for revisions at <http://oakroadsystems.com/sharware/grep.htm>.
The DOS filter FIND is useful for finding a given string in one or more files. But what if you want to find the word “the” in caps or lower case, without also finding “other”, “There”, “then”, and so on? You don’t really want to search for a specific string. Rather, what you’re looking for is a regular expression or regex, namely “the” preceded and followed by something other than a letter. GREP to the rescue!
GREP takes one or more regexes, matches them against the input files, and displays the hits.
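To make the idea concrete, here is a minimal sketch in Python rather than GREP itself. It uses Python's `re` module, whose syntax differs in places from GREP's, to look for “the” with a non-letter (or the edge of the line) on each side. The name `has_word_the` and the sample lines are made up for this illustration.

```python
import re

# "the" preceded and followed by something other than a letter
# (or by the start/end of the line), in caps or lower case.
WORD_THE = re.compile(r"(^|[^A-Za-z])the([^A-Za-z]|$)", re.IGNORECASE)

def has_word_the(line):
    """Return True if the line contains "the" as a separate word."""
    return WORD_THE.search(line) is not None

for sample in ["The cat sat.", "On the mat", "other", "There and then"]:
    print(sample, "->", has_word_the(sample))
```

A plain string search for “the” would report all four sample lines; the pattern reports only the first two.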
Oak Road Systems GREP combines most features of UNIX grep, egrep, and fgrep. GREP has many other advantages over FIND besides using regular expressions. Indeed, customers have cited some of these as features they couldn’t find in competing GREPs:
The 16-bit version, GREP16, runs under DOS 2.0 or higher, including a DOS box under any version of Windows. The 32-bit version, GREP32, requires a DOS box (or “command prompt”) under Windows 95, Windows NT, or any later Windows.
The two executables operate the same and have the same features, except that you need GREP32 for long filenames, for extended regexes, and for character mapping. If you typically run GREP in a DOS box (“command prompt”) under Windows 95 or NT or later, GREP32 is the one you want.
There’s no special installation procedure. Simply move GREP16.EXE, GREP32.EXE, or both to any convenient directory in your path. Each executable is completely self-contained. If installing on Windows XP, don’t fiddle with any compatibility settings: GREP runs fine with the XP defaults.
An interactive program tour is included as file TOUR.BAT; just type TOUR after unZIPping the archive.
You may wish to rename the executable you use more often to the simpler GREP.EXE. All the examples in this GREP Quick Start assume you’ve done that. Otherwise, just substitute GREP16 or GREP32 wherever you see GREP in the examples.
Starting with release 7.5, a Quick Reference Card is included as an MS-Word file, GREPQRC.DOC. It’s suitable for printing in 8½×11 or A4 format.
GREP is shareware. You are encouraged to “try before you buy” with the free download.
If you use GREP past a 30-day evaluation period, you must register and pay for it. Please see the file LICENSE.TXT for full details, including support and warranty information.
The unregistered evaluation version displays a registration reminder when you run it, and a request for feedback at the end.
Warning for batch files: About once per hundred runs, the unregistered version prompts you to press a key to continue execution. GREP works just fine in batch files, but you need to be at your computer when running unregistered GREP so that you can answer that prompt. If you like GREP enough to put it into batch files that run unattended, you like it enough to register it!
When you register, you get the registered version with these benefits:
There’s no special uninstall procedure; simply delete the GREP files. GREP doesn’t write any secret files or modify the Windows registry.
The basic GREP command form is
grep options regex inputfiles
(You can also run GREP from the Windows desktop; see the GREP Manual.)
Options are listed later in this GREP Quick Start and are fully explained in the GREP Manual.
regex is a string or a special pattern-matching string called a regular expression or regex. Regex patterns are listed later in this GREP Quick Start and are explained in detail in the GREP Manual. (A regex is normally required on the command line; however, if you use the /F option, one or more regexes are taken from a file or the keyboard instead of the command line.)
You can specify inputfiles on the command line; otherwise GREP reads the standard input.
As with any command, you can redirect or pipe inputs or output. GREP can return a useful value in ERRORLEVEL, as explained in the GREP Manual.
Here are two simple examples. First,
grep /I pic[t\s] \proj\*.cob
examines every COBOL source file in the root-level PROJ directory and displays every line that contains a picture clause (“pic” followed by either “t” or a space) in caps or lower case (the /I option). Adding the /S option
grep /I /S pic[t\s] \*.cob
examines every COBOL source file in all directories on the current disk.
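If it helps to see the matching step on its own, the sketch below approximates it in Python. This is an analogy, not GREP's engine: a literal space stands in for `\s`, because Python's `\s` would also match tabs and other whitespace. The function name `picture_lines` and the COBOL lines are invented for the example.

```python
import re

# Rough Python counterpart of the regex pic[t\s] with the /I option.
# A literal space replaces \s so that only "t" or a space can follow "pic".
PICTURE = re.compile(r"pic[t ]", re.IGNORECASE)

def picture_lines(lines):
    """Return the lines that contain "pic" followed by "t" or a space."""
    return [line for line in lines if PICTURE.search(line)]

cobol = [
    "05 CUST-NAME  PIC X(30).",
    "05 CUST-BAL   PICTURE S9(7)V99.",
    "MOVE ZERO TO CUST-BAL.",
]
for line in picture_lines(cobol):
    print(line)
```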
For a summary of operating instructions, type
grep /? | more
Since the help text is over 100 lines long, you might prefer to redirect it to your printer or a file:
grep /? >prn:
GREP scans either named input files or the standard input — the standard input can be a named file, a pipe, or the keyboard.
Named input files provide the greatest flexibility. They can be read as text or binary, and you can search subdirectory trees.
GREP32 can use long filenames; GREP16 requires short (8.3) filenames.
GREP expands any wildcards in named input files. Not only DOS-style * and ?, but UNIX-style [...] can be used. For instance, "c:\My Documents\[abc]*doc" tells GREP to read every file in the indicated directory whose name starts with A, B, or C and ends with DOC (including “.DOC”). Please see Named Input Files in the GREP Manual for complete rules.
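For a feel of how a pattern like `[abc]*doc` selects names, here is a small Python sketch using the standard `fnmatch` module. Python's wildcard rules are an assumption standing in for GREP's, which the GREP Manual defines and which may differ in detail. The helper `matches_wildcard` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase

def matches_wildcard(filename, pattern):
    """Case-insensitive wildcard test, in the spirit of DOS file names."""
    return fnmatchcase(filename.lower(), pattern.lower())

names = ["Budget.doc", "aardvark.DOC", "cdoc", "Dates.doc", "board.txt"]
selected = [n for n in names if matches_wildcard(n, "[abc]*doc")]
print(selected)
```

Note that “cdoc” is selected too: the pattern asks only that the name start with A, B, or C and end with DOC.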
You can use the /X option to exclude some files or groups of files from consideration. For instance, if you want all 2001 reports except December, you might specify something like
grep [options] [regex] *2001* -x*dec2001*
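The include-then-exclude logic can be sketched in a few lines of Python. This only illustrates the idea; it says nothing about how GREP implements /X, and the function `select_files` and the report names are invented.

```python
from fnmatch import fnmatchcase

def select_files(names, include, exclude):
    """Keep names matching `include` unless they also match `exclude`."""
    return [
        n for n in names
        if fnmatchcase(n.lower(), include) and not fnmatchcase(n.lower(), exclude)
    ]

reports = ["nov2001.txt", "dec2001.txt", "jan2002.txt", "sum2001q4.txt"]
print(select_files(reports, "*2001*", "*dec2001*"))
```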
If you have many named input files, you may want to store the list in a file; see the /@ option.
If you set the /S option, GREP searches not only the files indicated on the command line, but also the same-named files in subdirectories.
(The /S option is fully functional in the registered version, and searches all the way to the bottom of a directory tree. In the unregistered evaluation version, GREP searches the named or implied directories and all directories immediately below them, but no further in any one execution. You can either make multiple runs, or register GREP for the convenience of searching the entire directory tree.)
For example, with the command
grep /S regex \hazax* *.c g:\mumble\*.htm
GREP examines all files on the entire current drive whose names start with “hazax”; then it looks at all C source files in the current directory and all subdirectories under it; finally it looks at all HTML files in directory “g:\mumble” and all subdirectories under it.
Perhaps a more realistic example: you have a document about Vandelay Industries somewhere on your disk, but you can’t remember where. You can find it this way:
grep /S Vandelay \*
or: grep /S Vandelay \*.*
(Both * and *.* select all files; see Wildcard Expansion in the GREP Manual.) You might want to add the /I option if you can’t remember how “Vandelay” was capitalized.
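Conceptually, a subdirectory search walks the directory tree and tests every line of every file. The Python sketch below shows that idea with a plain string search; it is an illustration only, its output format is not GREP's, and the function `find_in_tree` and the sample file are made up.

```python
import os
import tempfile

def find_in_tree(root, needle, ignore_case=False):
    """Yield (path, line_number, line) for each line under root containing needle."""
    if ignore_case:
        needle = needle.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="replace") as handle:
                    for number, line in enumerate(handle, start=1):
                        haystack = line.lower() if ignore_case else line
                        if needle in haystack:
                            yield path, number, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it

# Build a tiny throwaway tree so the example is self-contained.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "letters", "old"))
    with open(os.path.join(root, "letters", "old", "memo.txt"), "w") as f:
        f.write("Dear Sir,\nI work for VANDELAY Industries.\n")
    hits = list(find_in_tree(root, "Vandelay", ignore_case=True))
    for path, number, line in hits:
        print(f"{os.path.relpath(path, root)}:{number}: {line}")
```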
If you don’t specify any named input files, GREP takes its input from the standard input. That can mean any of these three sources:
input redirected from a single file (DOS doesn’t allow wildcards):
grep [options] [regex] <inputfile
another command’s output piped into GREP for further processing:
other-command | grep [options] [regex]
keyboard input (GREP prompts you):
grep [options] [regex]
Example:
tracert oakroadsystems.com | grep 123
tells GREP to read the tracert command’s output and display any lines that contain the string “123”.
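The filtering step in a pipe like this amounts to “keep the lines that contain the string”. A minimal Python sketch follows; the function `filter_lines` and the sample lines are invented stand-ins for another command's output. To filter a real pipe, you could pass `sys.stdin` as the `lines` argument.

```python
def filter_lines(lines, text):
    """Return the lines that contain the given string."""
    return [line for line in lines if text in line]

# Made-up lines standing in for another command's output.
piped = [
    "  1    12 ms  10.0.0.1",
    "  2    31 ms  192.0.2.123",
    "  3    45 ms  198.51.100.7",
]
for line in filter_lines(piped, "123"):
    print(line)
```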
GREP was originally written with plain text files in mind, but you can also use it quite well with binary files like word-processing files, databases, and executable programs. GREP not only reads binary files differently, it also adjusts the display format for matches.
DOS and Windows don’t mark a file as text or binary; the program that reads the file just has to know. GREP “knows” files are binary when you tell it via the /R2 or /R3 option; otherwise it treats input files as text. Use the /R3 option when you don’t know any details of the internal structure of the binary file; please see Binary Files and Text Files in the GREP Manual for much more about binary files.
Registered users can use the /R-1 or /R-2 option to have GREP examine each file and decide whether it’s text or free-form binary; please see the /R option in the GREP Manual for details. If you have the registered version, I recommend /R-1.
Normally, GREP displays hits on your screen. “Hits” are the text lines, binary records, or binary buffers that contain matches for the regex(es). As part of the output, GREP displays the file path and name as a header above the group of hits from that file. You can use various options to display abbreviated or expanded forms of hits or to suppress those headers, move them to the lines with the hits, or display headers even for files that had no hits.
You can also redirect GREP’s output into a file or pipe GREP’s output to another command (even another GREP command). To redirect GREP output, follow the DOS rules and put one of these at the end of the GREP command line:
>>reportfile
appends GREP’s output to an existing file, or creates the file and writes to it if it doesn’t exist.
>reportfile
overwrites an existing file with GREP’s output, or creates the file and writes to it if it doesn’t exist.
| other-command
pipes GREP’s output to the standard input stream of the other command.
You can pipe or redirect output regardless of whether input was piped or redirected.
Only the hits (and file path\name headers, if present) are redirected by the above syntax. Errors and warning messages are still sent to the standard error stream. That is usually your screen, though some OSes or shell replacements let you redirect error output. For example, in 4DOS and 4NT type help piping or help redirection for information.
The /D option lets you create extra debugging output and send it to a named file or the standard error output.
Each description is hyperlinked from the downloaded copy of this GREP Quick Start to the full description in the GREP Manual.
| Option | Effect | UNIX grep* | DOS FIND* |
|---|---|---|---|
| ? | Display help for files, regexes, and options. | --help | /? |
| @ | Take input file names from keyboard or file. | | |
| A | Include hidden and system files when expanding wildcards. | | |
| B | Display a header for every file, even if it contains no hits. | | |
| C | Display the hit count, not the actual hits. | -c | /C |
| D | Display debugging output. | | |
| E | Select extended regular expressions or strings, or search for a word. | (-E), (-w) | |
| F | Read regexes from keyboard or file. | (-f) | |
| G | Read variable-length text lines or paragraphs. | | |
| H | Don’t display headers (file names) in output. | -h | |
| I | Ignore case when matching. | -i | /I |
| J | Display just the part of each line that matches the regex. | -o | |
| K | Report only the first few hits. | | |
| L | List the files that contain hits, not the actual hits. | -l | |
| M | Specify character mapping and define “word”. | | |
| N | Show line numbers with hits. | -n | /N |
| O | Set output format. | | |
| P | Show context lines around matching lines. | (-A, -B, -C) | |
| Q | Suppress program logo and some or all warnings. | (-s) | |
| R | Read and display input files as binary or text. | -U, (-a) | |
| S | Scan files in subdirectories too. | -r | |
| U | UNIX-style output: show filespec with each hit. | (implied) | |
| V | Display lines that don’t contain a match. | -v | /V |
| W | Specify line width or binary block length. | | |
| X | Exclude specified files from scan. | -x | |
| Y | Multiple regexes must all match. | | |
| Z | Reset all options (recommended for batch files). | | |
| 0 | Set ERRORLEVEL to 0 if any hits were found. | | |
| 1 | Set ERRORLEVEL to 1 if any hits were found. | (-v) | |
| 3 | Set ERRORLEVEL to 3 if warnings were displayed. | | |

* UNIX grep options are case sensitive; GREP and FIND options are not. (An option is shown in parentheses if the GREP option’s effect is similar but not identical.)
On the command line, options can appear anywhere, before or after the regex and the input files. All options are processed before any files are read.
You have a lot of freedom about how you enter options: use a leading hyphen or slash, use upper- or lower-case letters, leave spaces between options or combine them. For instance, the following are just some of the different ways of turning on the /P3 option and /B option:
/p3 -b /b/P3 /p3B -B/P3 -P3 -b
This GREP Quick Start always uses capital letters for the options, to make it easier to distinguish letter l and figure 1.
For clarity, you should always use a hyphen or slash before the numeric /0 option, /1 option, or /3 option. Example: /E0 means the /E option with a value of 0, but /E/0 means the /E option with no value specified, followed by the /0 option.
Registered users who use certain options frequently can put them in the ORS_GREP environment variable. Use the SET command in the c:\config.sys file (if present) or on the command line:
set ORS_GREP=options...
You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.
Example: If you prefer to have GREP sense the type of each file (/R-1 option) and you prefer UNIX-style output (/U option) with line numbers (/N option), then you want to set the environment variable as
set ORS_GREP=/R-1UN
The GREP Manual gives more information about the environment variable, including instructions for overriding a particular stored option on the command line.
A regular expression or regex is a pattern of characters to compare to lines, records, or buffers from one or more input files. GREP reports a hit if the input contains a match with the pattern in the regex.
A regex can be a simple text string, like “mother”, or something more complex. (If you want to search only for simple strings, use the /E0 option and ignore all this regex stuff.)
Example 1: If you want both the English and the American spellings of the word “grey/gray”, use
gr[ea]y
as your regex. (See Example 5 for “colour/color”.)
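You can try the same pattern in Python's `re` module, which treats this particular regex the same way. This is only a sketch for experimenting; the word list is made up.

```python
import re

# One character class: the third letter may be "e" or "a".
GREY = re.compile(r"gr[ea]y")

words = ["grey", "gray", "groy", "greyhound", "greay"]
matched = [w for w in words if GREY.search(w)]
print(matched)
```

“greyhound” is reported because the regex only has to match somewhere in the text, and “greay” is not, because the class stands for exactly one character.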
Example 2: The basic regex for any word starting with “moth” is
moth[a-z]*
which is the letters “moth” followed by any number of letters a through z. Yes, that regex does match “moth” itself: see * or + for Repetition in the GREP Manual.
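To see that zero letters after “moth” still counts as a match, here is a short Python sketch (Python's `re` handles `*` the same way for this pattern). The helper `matched_text` and the sample strings are invented.

```python
import re

MOTH_WORD = re.compile(r"moth[a-z]*")

def matched_text(text):
    """Return the part of text that the regex matched, or None."""
    found = MOTH_WORD.search(text)
    return found.group(0) if found else None

for text in ["moth", "mother", "mothball season", "math"]:
    print(text, "->", matched_text(text))
```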
Example 3: A word in double quotes would be matched by
\"[a-z]+\"
Read that regex as “a double quote mark, followed by one or more letters, followed by another double quote mark.” (You need the backslashes \ to tell most flavors of DOS to pass the quote marks forward to GREP. See Quotes in a Regex in the GREP Manual.)
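The same pattern can be tried in Python, where the backslashes are not needed because no DOS command line is involved. This is a sketch for experimenting, not GREP's own behavior; the function `quoted_words` and the sentence are made up.

```python
import re

# A double quote, one or more lower-case letters, another double quote.
QUOTED_WORD = re.compile(r'"[a-z]+"')

def quoted_words(text):
    """Return every quoted lower-case word found in text."""
    return QUOTED_WORD.findall(text)

sentence = 'He said "hello" and "Goodbye" and "two words" and "ok".'
print(quoted_words(sentence))
```

“Goodbye” is skipped because of its capital letter, and “two words” because a space is not a letter.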
Example 4: A U.S. local telephone number has the basic regex
[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]
That signifies three digits, followed by a hyphen, followed by four digits. (You could express it more simply with an extended regex: [0-9]{3}-[0-9]{4} or even \d{3}-\d{4}.)
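The three spellings of the telephone regex can be compared side by side in Python. Treat this as an illustration using Python's `re` module; `re.ASCII` is added so that `\d` means only 0 through 9, and the sample strings are invented.

```python
import re

BASIC = re.compile(r"[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")
COUNTED = re.compile(r"[0-9]{3}-[0-9]{4}")
SHORT = re.compile(r"\d{3}-\d{4}", re.ASCII)

def agree(text):
    """Return the shared verdict of the three patterns, or raise if they differ."""
    verdicts = {p.search(text) is not None for p in (BASIC, COUNTED, SHORT)}
    if len(verdicts) != 1:
        raise ValueError("patterns disagree on " + repr(text))
    return verdicts.pop()

for sample in ["Call 555-1234 today", "Ext. 55-1234", "1-800-555-0199"]:
    print(sample, "->", agree(sample))
```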
Example 5: To get the American and English spellings of “color/colour” is easy with GREP32: specify an extended regex (with the /E2 option) of
colou?r
GREP16 doesn’t support extended regexes, so you could either use colou*r (which would also match the non-words colouur, colouuuuur, etc.), or else use the /F- option and enter color and colour as two regexes.
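The difference between `?` and `*` here is easy to check in Python, whose `re` module treats both operators the same way for these patterns. This is a sketch for comparison only; the word list is made up.

```python
import re

OPTIONAL_U = re.compile(r"colou?r")   # zero or one "u"
ANY_US = re.compile(r"colou*r")       # zero or more "u"

words = ["color", "colour", "colouur"]
for word in words:
    print(word, bool(OPTIONAL_U.search(word)), bool(ANY_US.search(word)))
```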
From the examples you can see that a regex is essentially a string of characters with a bunch of operators thrown in to express possibilities like “any of these characters” and “repeated”. Here’s a quick summary of the characters that have special meaning in a regex; note that some work in any regex and others only in an extended regex (/E2 option). Each one is hyperlinked from the downloaded copy of this GREP Quick Start to the section of the GREP Manual where you’ll find a full description.
| Character | Which regexes? | Description |
|---|---|---|
| Characters with special meaning outside square brackets: | | |
| . period | any | matches any character |
| * asterisk | any | matches 0 or more occurrences of the preceding |
| + plus sign | any | matches 1 or more occurrences of the preceding |
| ? question mark | extended | matches 0 or 1 occurrence of the preceding |
| [ left square bracket | any | start a character class, e.g. [abcde] to match any one of a, b, c, d, e |
| ^ caret | any | match start of line in text mode or start of record in binary mode |
| $ dollar sign | any | match end of line in text mode or end of record in binary mode |
| \ backslash | any | treat any of the listed special characters as normal |
| \ backslash | extended | (1) character types like \w for a word character; (2) simple assertions like \b for a word boundary; (3) back references to parenthesized subexpressions; (4) character encoding for odd characters like \x3c for < |
| { left brace | extended | repetition count, e.g. {3,} for three or more occurrences of the preceding |
| \| vertical bar | extended | alternatives, e.g. mother\|father to match “mother” or “father” |
| (...) parentheses or round brackets | extended | subexpressions, e.g. ( )+ to match one or more occurrences of “ ” |
| Characters with special meaning inside square brackets: | | |
| ] right square bracket | any | end the character class |
| - minus sign or hyphen | any | character range, e.g. [a-z] to match any lower-case English letter |
| ^ caret | any | negate the character class, e.g. [^a-z] to match any character except a lower-case English letter |
| \ backslash | any | treat the next character as normal |
| \ backslash | extended | character encoding |
| [: left square bracket followed by colon | extended | introduce a named character class, e.g. [[:punct:]0-9] for any punctuation character or a digit |
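A few of these operators can be tried out in Python's `re` module, as sketched below. This is an approximation for experimenting, not a statement about GREP's engine: Python makes no distinction between basic and extended regexes, and it does not support named classes such as `[[:punct:]]`, so those are left out. The patterns and sample strings are chosen for illustration.

```python
import re

# (pattern, sample text, whether a match is expected)
checks = [
    (r"^moth", "mother", True),               # ^ anchors at the start
    (r"^moth", "behemoth", False),
    (r"moth$", "behemoth", True),             # $ anchors at the end
    (r"[^a-z]", "abc", False),                # negated character class
    (r"[^a-z]", "abc1", True),
    (r"mother|father", "grandfather", True),  # alternatives
    (r"x{3,}", "xx", False),                  # three or more occurrences
    (r"x{3,}", "xxxx", True),
    (r"\bthe\b", "other", False),             # word boundaries
    (r"\bthe\b", "on the mat", True),
    (r"a\.b", "a.b", True),                   # backslash makes . ordinary
    (r"a\.b", "axb", False),
]

results = []
for pattern, text, expected in checks:
    found = re.search(pattern, text) is not None
    results.append(found == expected)
    print(f"{pattern!r:18} {text!r:14} match={found}")
```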
this page: http://oakroadsystems.com/sharware/grep101.htm