grep and Regular Expressions

Posted on August 15, 2017; updated August 17, 2017

Symbol	Meaning
[:alnum:]	Letters and digits: 0-9, A-Z, a-z
[:alpha:]	Letters: A-Z, a-z
[:blank:]	The space and Tab characters
[:cntrl:]	Control characters, such as CR, LF, Tab, and Del
[:digit:]	Digits: 0-9
[:graph:]	All visible characters, that is, everything printable except whitespace such as space and Tab
[:lower:]	Lowercase letters: a-z
[:print:]	Any printable character (the visible characters plus the space)
[:punct:]	Punctuation symbols, such as " ' ? ! ; : # $…
[:upper:]	Uppercase letters: A-Z
[:space:]	Any whitespace character, including space, Tab, CR, and so on
[:xdigit:]	Hexadecimal digits: 0-9, A-F, a-f
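
These class names only work inside a bracket expression, so in a pattern they are written with doubled brackets, for example [[:digit:]]. A minimal sketch on invented sample lines (not the test file used below):

```shell
# Invented sample input, one item per line.
sample=$(printf 'abc\nABC\n123\na1 b2\n')

# Count the lines that contain at least one digit.
digits=$(printf '%s\n' "$sample" | grep -c '[[:digit:]]')

# Lines that consist only of uppercase letters (-x matches the whole line).
upper=$(printf '%s\n' "$sample" | grep -x '[[:upper:]]*')

# Lines that contain a space or a Tab.
blank=$(printf '%s\n' "$sample" | grep '[[:blank:]]')

echo "lines with digits: $digits"
echo "uppercase-only lines: $upper"
echo "lines with blanks: $blank"
```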

Using the test file provided by VBird (鸟哥)

http://linux.vbird.org/linux_basic/0330regularex/regular_express.txt

Print the line containing hair together with the two lines before it and the three lines after it, with line numbers

[root@localhost test]# grep -n -A3 -B2 "hair" regular_express.txt

5-However, this dress is about $ 3183 dollars.

6-GNU is free air not free beer.

7:Her hair is very beauty.

8-I can't finish the test.

9-Oh! The soup taste good.

10-motorcycle is cheap than car.

Ignore case

[root@localhost test]# grep -in 'the' regular_express.txt

8:I can't finish the test.

9:Oh! The soup taste good.

12:the symbol '*' is represented as start.

14:The gd software is a library for drafting programs.

15:You are the best is mean you are the no. 1.

16:The world <Happy> is the same with "glad".

18:google is the best tools for search keyword.

Match a character set

[root@localhost test]# grep -n 't[ae]st' regular_express.txt

8:I can't finish the test.

9:Oh! The soup taste good.

Match oo preceded by any character other than g

[root@localhost test]# grep -n '[^g]oo' regular_express.txt

2:apple is my favorite food.

3:Football game is not use feet only.

18:google is the best tools for search keyword.

19:goooooogle yes!

Match oo preceded by a character that is not a lowercase letter

[root@localhost test]# grep -n '[^[:lower:]]oo' regular_express.txt

3:Football game is not use feet only.

Lines beginning with the

z@z-desktop:~/test$ grep -n '^the' regular_express.txt

12:the symbol '*' is represented as start.

Lines beginning with a lowercase letter (a-z)

z@z-desktop:~/test$ grep -n '^[a-z]' regular_express.txt

2:apple is my favorite food.

4:this dress doesn't fit me.

10:motorcycle is cheap than car.

12:the symbol '*' is represented as start.

18:google is the best tools for search keyword.

19:goooooogle yes!

20:go! go! Let's go.

Lines ending with a period (note: if the file was edited on Windows, the trailing \r must be removed first, using sed -e "s/\r$//g" regular_express.txt or tr -d '\r' < regular_express.txt; cat -A shows the result)

z@z-desktop:~/test$ grep -n '\.$' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

2:apple is my favorite food.

3:Football game is not use feet only.

4:this dress doesn't fit me.

10:motorcycle is cheap than car.

11:This window is clear.

12:the symbol '*' is represented as start.

15:You are the best is mean you are the no. 1.

16:The world <Happy> is the same with "glad".

17:I like dog.

18:google is the best tools for search keyword.

20:go! go! Let's go.
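
The effect of Windows line endings on the '\.$' pattern can be checked on a small invented file. This is only a sketch, assuming a Unix-like system where grep does not strip carriage returns itself:

```shell
# Hypothetical two-line sample with Windows (CRLF) line endings.
crlf=$(mktemp)
printf 'I like dog.\r\nno dot here\r\n' > "$crlf"

# With the trailing \r in place, the period is not the last character,
# so nothing matches (|| true keeps the zero-match exit status harmless).
before=$(grep -c '\.$' "$crlf" || true)

# Strip the carriage returns, then search again.
after=$(tr -d '\r' < "$crlf" | grep -c '\.$')

echo "matches before: $before, after: $after"
rm -f "$crlf"
```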

Remove blank lines and lines beginning with #

grep -v '^$' /etc/rsyslog.conf | grep -v '^#'

Match g and d with exactly two characters between them

z@z-desktop:~/test$ grep -n 'g..d' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

9:Oh! The soup taste good.

16:The world <Happy> is the same with "glad".

Match g and g with any number of characters between them

z@z-desktop:~/test$ grep -n 'g.*g' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

14:The gd software is a library for drafting programs.

18:google is the best tools for search keyword.

19:goooooogle yes!

20:go! go! Let's go.

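RE character	Meaning and example	Command
^word	The search string (word) is at the beginning of the line. Example: find lines that begin with # and show line numbers.	grep -n '^#' regular_express.txt
word$	The search string (word) is at the end of the line. Example: print lines that end with ! and show line numbers.	grep -n '!$' regular_express.txt
.	Exactly one arbitrary character. Example: the match can be (eve) (eae) (eee) (e e), but not (ee); there must be exactly one character between the two e's, and a space counts as a character.	grep -n 'e.e' regular_express.txt
\	Escape character; removes the special meaning of the symbol that follows. Example: find lines containing a single quote '.	grep -n \' regular_express.txt
*	Repeats the preceding RE character zero or more times. Example: find strings such as (es) (ess) (esss). Because * allows zero repetitions, es also matches. Because * repeats the preceding RE character, it must directly follow one; "any run of characters" is therefore written .*	grep -n 'ess*' regular_express.txt
[list]	Bracket expression listing the characters to match. Example: find lines containing (gl) or (gd). Note that [] stands for exactly one character: a[afl]y matches aay, afy, or aly, because [afl] means a or f or l.	grep -n 'g[ld]' regular_express.txt
[n1-n2]	Bracket expression giving a range of characters to match. Example: find lines containing any uppercase letter, [A-Z] (any digit would be [0-9]). The hyphen - inside [] is special: it stands for all consecutive characters between the two endpoints. What counts as consecutive depends on the character encoding, so the locale must be set correctly (in bash, check the LANG and LANGUAGE variables).	grep -n '[A-Z]' regular_express.txt
[^list]	Bracket expression listing the characters or ranges to exclude. Example: the match can be (oog) (ood) but not (oot). A ^ inside [] means negation; for instance, [^A-Z] means "not an uppercase letter". Note that grep -n '[^A-Z]' regular_express.txt lists every line of the file, because [^A-Z] matches any single non-uppercase character and every line contains one; the first line, "Open Source", has p, e, n, and so on.	grep -n 'oo[^t]' regular_express.txt
\{n,m\}	Between n and m consecutive occurrences of the preceding RE character. \{n\} means exactly n occurrences; \{n,\} means n or more. Example: strings with 2 to 3 o's between g and g, that is, (goog) (gooog).	grep -n 'go\{2,3\}g' regular_express.txt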
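
The difference between the interval \{2,3\} and the star can be seen on a few invented words. A small sketch, not taken from the test file:

```shell
# Invented sample words, one per line.
words=$(printf 'gog\ngoog\ngooog\ngoooog\ngg\n')

# o\{2,3\}: exactly two or three o's between the g's (basic RE syntax).
interval=$(printf '%s\n' "$words" | grep 'go\{2,3\}g')

# o*: zero or more o's, so even "gg" matches and every line is counted.
star=$(printf '%s\n' "$words" | grep -c 'go*g')

echo "$interval"
echo "lines matching go*g: $star"
```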

Extended regular expressions

Remove blank lines and comment lines

egrep -v '^$|^#' regular_express.txt
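
As a rough check that the single extended pattern does the same job as the earlier two-stage pipeline, here is a sketch on invented config-like input. It uses grep -E in place of egrep, which newer GNU grep releases mark as obsolescent:

```shell
# Invented sample: a comment, a blank line, and two settings.
conf=$(printf '# comment\n\nkey=value\n#another\nport=514\n')

# Two-stage pipeline, as in the rsyslog.conf example.
two_step=$(printf '%s\n' "$conf" | grep -v '^$' | grep -v '^#')

# Single pass with an extended regex alternation.
one_step=$(printf '%s\n' "$conf" | grep -E -v '^$|^#')

echo "$one_step"
```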

RE character	Meaning and example	Command
+	One or more of the preceding RE character. Example: find (god) (good) (goood) and so on; o+ means one or more o's, so the command lists lines 1, 9, and 13.	egrep -n 'go+d' regular_express.txt
?	Zero or one of the preceding RE character. Example: find (gd) or (god); o? means no o or one o, so the command lists lines 13 and 14. Notice that the results of 'go+d' and 'go?d' taken together are the same as the result of 'go*d'. Think about why!	egrep -n 'go?d' regular_express.txt
|	Alternation (or) between several strings. Example: find gd or good; lines 1, 9, and 14 are printed. To find dog as well, add it as a third alternative.	egrep -n 'gd|good' regular_express.txt and egrep -n 'gd|good|dog' regular_express.txt
()	Grouping. Example: find (glad) or (good); since g and d are common to both, put la and oo inside ( ) and separate them with |.	egrep -n 'g(la|oo)d' regular_express.txt
()+	Repetition of a group. Example: echo the string AxyzxyzxyzxyzC and search it for an A, then one or more "xyz", then a C.	echo 'AxyzxyzxyzxyzC' | egrep 'A(xyz)+C'
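
The extended operators can be tried side by side on a handful of invented words. This is only an illustrative sketch, written with grep -E, the equivalent of egrep:

```shell
# Invented sample words, one per line.
words=$(printf 'gd\ngod\ngood\nglad\ndog\n')

# o+ : one or more o's between g and d.
plus=$(printf '%s\n' "$words" | grep -E 'go+d')

# o? : zero or one o between g and d.
opt=$(printf '%s\n' "$words" | grep -E 'go?d')

# (la|oo) : a group holding two alternatives.
alt=$(printf '%s\n' "$words" | grep -E 'g(la|oo)d')

printf 'go+d:\n%s\ngo?d:\n%s\ng(la|oo)d:\n%s\n' "$plus" "$opt" "$alt"
```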

Reference: VBird's site

http://linux.vbird.org/linux_basic/0330regularex.php

About grep -F

https://unix.stackexchange.com/questions/17949/what-is-the-difference-between-grep-egrep-and-fgrep

The -F switch switches grep into a different mode where it accepts a pattern to match, but then splits that pattern up into one search string per line and does an OR search on any of the strings without doing any special pattern matching.

-F, --fixed-strings

Interpret PATTERN as a list of fixed strings, separated by newlines, any of which is to be matched.

Here are some example scenarios:

You have a file with a list of say ten Unix usernames in plain text. You want to search the group file on your machine to see if any of the ten users listed are in any special groups:

grep -F -f user_list.txt /etc/group

The reason the -F switch helps here is that the usernames in your pattern file are interpreted as plain text strings. Dots for example would be interpreted as dots rather than wild-cards.
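
A minimal sketch of that difference, using two invented entries rather than a real group file:

```shell
# Invented entries: only the first contains the literal text "a.b".
entries=$(printf 'a.b\naxb\n')

# As a regular expression, the dot matches any character: both lines match.
regex_hits=$(printf '%s\n' "$entries" | grep -c 'a.b')

# With -F the dot is an ordinary dot: only the first line matches.
fixed_hits=$(printf '%s\n' "$entries" | grep -c -F 'a.b')

echo "regex: $regex_hits, fixed: $fixed_hits"
```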

You want to search using a fancy expression. For example, parentheses () can be used to indicate groups, with | used as an OR operator. You could run this search using -E:

grep -E '^no(fork|group)' /etc/group

…to return lines that start with either "nofork" or "nogroup". Without the -E switch you would have to escape the special characters involved, because with normal pattern matching they would just be searched for literally:

grep '^no\(fork\|group\)' /etc/group
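
A short sketch comparing the extended and basic spellings of this pattern on invented group-file lines. Note that \| in a basic regular expression is a GNU extension, so the second form assumes GNU grep:

```shell
# Invented group-file lines.
lines=$(printf 'nofork:x:1\nnogroup:x:2\nnobody:x:3\n')

# Extended syntax: ( ) and | are operators without backslashes.
ere=$(printf '%s\n' "$lines" | grep -E '^no(fork|group)')

# Basic syntax: the same operators need backslashes.
bre=$(printf '%s\n' "$lines" | grep '^no\(fork\|group\)')

echo "$ere"
```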

GNU grep 3.0 usage examples

http://www.gnu.org/software/grep/manual/grep.html

Here is an example command that invokes GNU grep:

grep -i 'hello.*world' menu.h main.c

This lists all lines in the files menu.h and main.c that contain the string ‘hello’ followed by the string ‘world’; this is because ‘.*’ matches zero or more characters within a line. See Regular Expressions. The -i option causes grep to ignore case, causing it to match the line ‘Hello, world!’, which it would not otherwise match. See Invoking, for more details about how to invoke grep.

Here are some common questions and answers about grep usage.

How can I list just the names of matching files?

grep -l 'main' *.c

lists the names of all C files in the current directory whose contents mention ‘main’.

How do I search directories recursively?

grep -r 'hello' /home/gigi

searches for ‘hello’ in all files under the /home/gigi directory. For more control over which files are searched, use find, grep, and xargs. For example, the following command searches only C files:

find /home/gigi -name '*.c' -print0 | xargs -0r grep -H 'hello'

This differs from the command:

grep -H 'hello' *.c

which merely looks for ‘hello’ in all files in the current directory whose names end in ‘.c’. The ‘find …’ command line above is more similar to the command:

grep -rH --include='*.c' 'hello' /home/gigi

What if a pattern has a leading ‘-’?

grep -e '--cut here--' *

searches for all lines matching ‘--cut here--’. Without -e, grep would attempt to parse ‘--cut here--’ as a list of options.

Suppose I want to search for a whole word, not a part of a word?

grep -w 'hello' *

searches only for instances of ‘hello’ that are entire words; it does not match ‘Othello’. For more control, use ‘\<’ and ‘\>’ to match the start and end of words. For example:

grep 'hello\>' *

searches only for words ending in ‘hello’, so it matches the word ‘Othello’.
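
The two behaviours can be compared on a few invented lines. A small sketch, assuming GNU grep for the \> word-end anchor:

```shell
# Invented sample lines.
text=$(printf 'hello world\nOthello\nhelloworld\n')

# -w: "hello" must be a whole word.
whole=$(printf '%s\n' "$text" | grep -w 'hello')

# \>: "hello" only has to end at a word boundary, so "Othello" matches too.
ending=$(printf '%s\n' "$text" | grep 'hello\>')

printf 'whole word:\n%s\nword ending:\n%s\n' "$whole" "$ending"
```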

How do I output context around the matching lines?

grep -C 2 'hello' *

prints two lines of context around each matching line.

How do I force grep to print the name of the file?

Append /dev/null:

grep 'eli' /etc/passwd /dev/null

gets you:

/etc/passwd:eli:x:2098:1000:Eli Smith:/home/eli:/bin/bash

Alternatively, use -H, which is a GNU extension:

grep -H 'eli' /etc/passwd

Why do people use strange regular expressions on ps output?

ps -ef | grep '[c]ron'

If the pattern had been written without the square brackets, it would have matched not only the ps output line for cron, but also the ps output line for grep. Note that on some platforms, ps limits the output to the width of the screen; grep does not have any limit on the length of a line except the available memory.
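
The trick works because the text "[c]ron" on the grep command line does not itself contain the string "cron". A sketch with simulated ps output (the process lines are invented):

```shell
# Plain pattern: the simulated grep process line also contains "cron".
plain=$(printf 'root 1 cron\nme 2 grep cron\n' | grep -c 'cron')

# Bracketed pattern: the grep process line now reads "[c]ron",
# which the regex [c]ron (meaning c, r, o, n) does not match.
bracket=$(printf 'root 1 cron\nme 2 grep [c]ron\n' | grep -c '[c]ron')

echo "plain: $plain, bracketed: $bracket"
```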

Why does grep report “Binary file matches”?

If grep listed all matching “lines” from a binary file, it would probably generate output that is not useful, and it might even muck up your display. So GNU grep suppresses output from files that appear to be binary files. To force GNU grep to output lines even from files that appear to be binary, use the -a or ‘--binary-files=text’ option. To eliminate the “Binary file matches” messages, use the -I or ‘--binary-files=without-match’ option.

Why doesn’t ‘grep -lv’ print non-matching file names?

‘grep -lv’ lists the names of all files containing one or more lines that do not match. To list the names of all files that contain no matching lines, use the -L or --files-without-match option.
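
A sketch of the difference, using two invented files in a temporary directory:

```shell
dir=$(mktemp -d)
printf 'main\nother\n' > "$dir/a.txt"   # one matching and one non-matching line
printf 'other\n' > "$dir/b.txt"         # no matching line at all

# -lv: files with at least one line that does NOT match; lists both files.
lv_files=$(cd "$dir" && grep -lv 'main' a.txt b.txt || true)

# -L: files with no matching line at all; lists only b.txt.
no_match_files=$(cd "$dir" && grep -L 'main' a.txt b.txt || true)

printf 'grep -lv:\n%s\ngrep -L:\n%s\n' "$lv_files" "$no_match_files"
rm -rf "$dir"
```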

I can do “OR” with ‘|’, but what about “AND”?

grep 'paul' /etc/motd | grep 'franc,ois'

finds all lines that contain both ‘paul’ and ‘franc,ois’.

Why does the empty pattern match every input line?

The grep command searches for lines that contain strings that match a pattern. Every line contains the empty string, so an empty pattern causes grep to find a match on each line. It is not the only such pattern: ‘^’, ‘$’, ‘.*’, and many other patterns cause grep to match every line.

To match empty lines, use the pattern ‘^$’. To match blank lines, use the pattern ‘^[[:blank:]]*$’. To match no lines at all, use the command ‘grep -f /dev/null’.

How can I search in both standard input and in files?

Use the special file name ‘-’:

cat /etc/passwd | grep 'alain' - /etc/motd

How to express palindromes in a regular expression?

It can be done by using back-references; for example, a palindrome of 5 characters can be written with a BRE:

grep -w -e '\(.\)\(.\).\2\1' file

It matches the word “radar” or “civic.”
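
A quick check of the back-reference pattern \(.\)\(.\).\2\1 on a few invented words. It captures the first two characters, skips a middle one, and then requires the captures in reverse order:

```shell
# Invented sample words; only the five-letter palindromes should be printed.
pal=$(printf 'radar\ncivic\nrobot\nabba\n' | grep -w -e '\(.\)\(.\).\2\1')

echo "$pal"
```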

Guglielmo Bondioni proposed a single RE that finds all palindromes up to 19 characters long using 9 subexpressions and 9 back-references:

grep -E -e '^(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?)(.?).?\9\8\7\6\5\4\3\2\1$' file

Note this is done by using GNU ERE extensions; it might not be portable to other implementations of grep.

Why is this back-reference failing?

echo 'ba' | grep -E '(a)\1|b\1'

This gives no output, because the first alternate ‘(a)\1’ does not match, as there is no ‘aa’ in the input, so the ‘\1’ in the second alternate has nothing to refer back to, meaning it will never match anything. (The second alternate in this example can only match if the first alternate has matched—making the second one superfluous.)

How can I match across lines?

Standard grep cannot do this, as it is fundamentally line-based. Therefore, merely using the [:space:] character class does not match newlines in the way you might expect.

With the GNU grep option -z (--null-data), each input and output “line” is null-terminated; see Other Options. Thus, you can match newlines in the input, but typically if there is a match the entire input is output, so this usage is often combined with output-suppressing options like -q, e.g.:

printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'

If this does not suffice, you can transform the input before giving it to grep, or turn to awk, sed, perl, or many other utilities that are designed to operate across lines.
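
The difference between the default line mode and -z can be observed through the exit status alone. A sketch, assuming GNU grep (both -z and \+ in a basic regular expression are GNU features):

```shell
# Default mode: "foo" and "bar" are on separate lines, so nothing matches.
if printf 'foo\nbar\n' | grep -q 'foo[[:space:]]\+bar'; then
  line_mode=match
else
  line_mode=nomatch
fi

# With -z the whole input is one record, so [[:space:]] can match the newline.
if printf 'foo\nbar\n' | grep -z -q 'foo[[:space:]]\+bar'; then
  null_mode=match
else
  null_mode=nomatch
fi

echo "line mode: $line_mode, -z mode: $null_mode"
```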

What do grep, fgrep, and egrep stand for?

The name grep comes from the way line editing was done on Unix. For example, ed uses the following syntax to print a list of matching lines on the screen:

global/regular expression/print

g/re/p

fgrep stands for Fixed grep; egrep stands for Extended grep.
