Regular expressions and Linux text-search tools

Published: 2020-12-14 04:33:29 | Category: Encyclopedia | Source: compiled from the web

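First, what is a regular expression?

Put simply, a regular expression is a set of rules and methods defined for processing large amounts of text. Put another way, it gives certain special characters a meaning beyond their literal one:

For example, "." stands for any single character. Redefinitions of this kind are what we call regular expressions.

Regular expressions are used heavily by grep, so we will introduce them step by step through grep...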

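I. Three text-search commands to know before regular expressions on Linux

grep: (Global search Regular Expression and Print out the line) searches globally for a regular expression and prints the matching lines

Background: the earliest text-matching program; it matches text using the basic regular expressions (BRE) defined by POSIX

Name: "print lines matching a pattern". grep is a powerful text-search tool. By default it interprets the pattern as a basic regular expression and prints the lines that match

[root@linux ~]# grep 'root' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
[root@linux ~]#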

Syntax:

1) grep [OPTIONS] PATTERN [FILE...]

################ The examples below are based on this file ################
[root@linux ~]# cat test.txt
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?


oh!Ican`ttellyou?



canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...


Beatwishtoyou?
#########################################################################################

2）grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]

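Description: grep searches the named FILEs (or standard input if no file is given) for lines that match PATTERN. By default it prints the matching lines

[root@linux ~]# grep 'telphone' test.txt
Mytelphonenumberis15648562351...
[root@linux ~]#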

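Common options:

-E: equivalent to egrep; the option is specified by POSIX. It makes grep interpret the pattern as an extended regular expression and print the lines that match

Note: with egrep (or grep -E), metacharacters such as ?, +, {}, | and () no longer need to be escaped with a backslash; this is one difference from plain grep:

First, egrep in use:

[root@linux ~]# egrep 'beautiful' test.txt
Thisisabeautifulgirl
[root@linux ~]#

grep -E behaves like egrep:

[root@linux ~]# grep -E '^(a|J)' /etc/passwd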
adm:x:3:4:adm:/var/adm:/sbin/nologin
avahi-autoipd:x:170:170:AvahiIPv4LLStack:/var/lib/avahi-autoipd:/sbin/nologin
abrt:x:173:173::/etc/abrt:/sbin/nologin
Jason:x:1000:1000::/home/Jason:/bin/bash
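
To make the escaping difference concrete, here is a minimal sketch that runs the same alternation in both syntaxes. The sample lines are made up for illustration (they are not read from /etc/passwd), and `\|` inside a basic regular expression is a GNU grep extension rather than part of POSIX, so the first command may behave differently with other grep implementations.

```shell
# Hypothetical sample input, invented for this sketch.
sample='adm:x:3:4
Jason:x:1000:1000
root:x:0:0'

# Basic syntax: grouping and alternation need backslashes
# (\| is a GNU extension, an assumption about the grep in use).
bre=$(printf '%s\n' "$sample" | grep '^\(a\|J\)')

# Extended syntax: the same pattern without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E '^(a|J)')

printf 'BRE result:\n%s\n' "$bre"
printf 'ERE result:\n%s\n' "$ere"
```

Both commands should select the same two lines, the ones beginning with "a" or "J".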

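-F: equivalent to fgrep; the option is specified by POSIX. The pattern is treated as a fixed string rather than a regular expression, which is why this is usually the fastest of the three

[root@linux ~]# grep -F 'root' /etc/passwd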
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin

--color=auto/never/always: highlights the matched text in color; usually set through an alias;

[root@linux ~]# alias
alias cp='cp -i'
alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'
alias l.='ls -d .* --color=auto'
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'
alias mv='mv -i'
alias rm='rm -i'
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'
[root@linux ~]#

[root@linux ~]# grep 'home' --color=auto /etc/passwd
Jason:x:1000:1000::/home/Jason:/bin/bash
[root@linux ~]# grep 'home' --color=never /etc/passwd
Jason:x:1000:1000::/home/Jason:/bin/bash
[root@linux ~]# grep 'home' --color=always /etc/passwd
Jason:x:1000:1000::/home/Jason:/bin/bash
[root@linux ~]#

-i: ignore case;

[root@linux ~]# cat test.txt
Goodmorning,zhangAn!
[root@linux ~]# grep -i 'a' test.txt
Goodmorning,zhangAn!

-o: print only the matched text itself, not the whole line;

[root@linux ~]# cat test.txt
Goodmorning,zhangAn!
[root@linux ~]# grep -o 'zhang' test.txt
zhang
[root@linux ~]#

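-v, --invert-match: invert the match, selecting the lines that do not match the pattern

[root@linux ~]# cat test.txt
Goodmorning,zhangAn!
nihao
[root@linux ~]# grep -v 'Good' test.txt
nihao
[root@linux ~]#
# As shown, the inverted match prints the lines that do not contain 'Good'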

-q, --quiet, --silent: quiet mode; prints nothing. The exit status still reports whether any line was selected;

[root@linux ~]# grep -v 'Good' test.txt
nihao
[root@linux ~]# grep -qv 'Good' test.txt
[root@linux ~]#
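
Because -q prints nothing, it is mostly useful in scripts that branch on grep's exit status. The following is a minimal sketch of that idea; the temporary file, its contents and the variable names are assumptions made up for the illustration.

```shell
# Hypothetical two-line input file, created only for this sketch.
sample=$(mktemp)
printf '%s\n' 'Good morning, zhang An!' 'nihao' > "$sample"

# grep -q is silent; exit status 0 means at least one line was selected.
if grep -q 'Good' "$sample"; then found_good=yes; else found_good=no; fi
if grep -q 'absent' "$sample"; then found_absent=yes; else found_absent=no; fi

echo "Good: $found_good, absent: $found_absent"
rm -f "$sample"
```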

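-n: print the matching lines, each prefixed with its line number

[root@linux ~]# grep -n 'o' test.txt
1:Goodmorning,zhangAn!
2:nihao
[root@linux ~]# grep 'o' test.txt | cat -n
1	Goodmorning,zhangAn!
2	nihao
[root@linux ~]#
# grep -n prints the number followed by a colon (colored when --color is active), so its output differs slightly from that of cat -n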

-c: print the number of lines that match PATTERN (a count of lines, not of individual occurrences)

[root@linux ~]# grep -c 'o' test.txt
2
[root@linux ~]#
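
The distinction between counting lines and counting occurrences can be checked with a small sketch. The sample text is invented for the example, and -o is a GNU/BSD option rather than a POSIX one, so this assumes a grep that supports it.

```shell
# Hypothetical sample text; the first line contains "o" three times.
sample='Good morning, zhang An!
nihao'

# -c counts matching lines.
line_count=$(printf '%s\n' "$sample" | grep -c 'o')

# One way to count occurrences: print each match on its own line, then count lines.
match_count=$(printf '%s\n' "$sample" | grep -o 'o' | wc -l | tr -d ' ')

echo "lines: $line_count, occurrences: $match_count"
```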

-A n: also print the n lines after each matching line

[root@linux ~]# cat test.txt
gegVDFwer34fs43dfwerFG4g
gegVDFweSDFGertgg
23ere67fgSD5436fe
nihao,zhandge
[root@linux ~]# grep -A 1 '23' test.txt
23ere67fgSD5436fe
nihao,zhandge
[root@linux ~]#

-B n: also print the n lines before each matching line

[root@linux ~]# cat test.txt
gegVDFwer34fs43dfwerFG4g
gegVDFweSDFGertgg
23ere67fgSD5436fe
nihao,zhandge
[root@linux ~]# grep -B 2 '23' test.txt
gegVDFwer34fs43dfwerFG4g
gegVDFweSDFGertgg
23ere67fgSD5436fe
[root@linux ~]#

-C n: also print the n lines before and the n lines after each matching line

[root@linux ~]# grep -C 1 '23' test.txt
gegVDFweSDFGertgg
23ere67fgSD5436fe
nihao,zhandge
[root@linux ~]#
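
The three context options can be compared side by side in a small sketch. The four-line input is made up for the illustration, and -A, -B and -C are GNU/BSD options rather than POSIX ones, so this assumes a grep that supports them.

```shell
# Hypothetical four-line input.
sample='one
two
three
four'

after=$(printf '%s\n' "$sample" | grep -A 1 'three')    # match plus 1 line after
before=$(printf '%s\n' "$sample" | grep -B 2 'three')   # match plus 2 lines before
around=$(printf '%s\n' "$sample" | grep -C 1 'three')   # match plus 1 line on each side

printf -- '-A 1:\n%s\n-B 2:\n%s\n-C 1:\n%s\n' "$after" "$before" "$around"
```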

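-G, --basic-regexp: interpret the pattern as a basic regular expression (the default);

-P, --perl-regexp: interpret the pattern as a Perl-compatible regular expression (PCRE);

-e PATTERN, --regexp=PATTERN: specify a pattern; may be given several times to search for multiple patterns;

-f FILE, --file=FILE: read patterns from FILE, a text file with one pattern per line (in effect a grep script);

The last two are not demonstrated with a session transcript here.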
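
As a rough sketch of how -e and -f might be used, the following searches one file for two patterns, first by repeating -e and then by reading the same patterns from a file. The directory, file names and contents are hypothetical, created only for the example.

```shell
# Hypothetical working files, created in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' 'root:x:0:0' 'adm:x:3:4' 'ftp:x:14:50' > "$workdir/users.txt"
printf '%s\n' '^root' '^ftp' > "$workdir/patterns.txt"

# Several patterns on the command line: a line is selected if any of them matches.
by_e=$(grep -e '^root' -e '^ftp' "$workdir/users.txt")

# The same patterns read from a file, one per line.
by_f=$(grep -f "$workdir/patterns.txt" "$workdir/users.txt")

printf '%s\n' "$by_e"
rm -r "$workdir"
```

Both forms should select the same lines, so the choice is mainly a matter of how many patterns there are and whether they need to be reused.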

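egrep: extended grep, which matches text using extended regular expressions (ERE).

The egrep command is equivalent to grep -E: it interprets the pattern as an extended regular expression and prints the lines that match.

fgrep: fast grep. This variant matches fixed strings rather than regular expressions. Historically, it was also the only variant that could match several strings in parallel.

The fgrep command is equivalent to grep -F: it searches for fixed strings and does not interpret regular-expression syntax, which is why it is usually the fastest.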
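
The practical effect of fixed-string matching shows up when the search text contains a metacharacter. A minimal sketch, with made-up input:

```shell
# Hypothetical input: one line with a literal dot, one without.
sample='a.b
axb'

# As a regular expression, "." matches any character, so both lines match.
as_regex=$(printf '%s\n' "$sample" | grep -c 'a.b')

# As a fixed string, only the line containing a literal "a.b" matches.
as_fixed=$(printf '%s\n' "$sample" | grep -c -F 'a.b')

echo "regex: $as_regex, fixed: $as_fixed"
```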

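II. Basic regular expressions

Definition: a string built from ordinary characters and certain special characters, following a set of syntax rules, that makes it easy to search for and match text

Categories: basic regular expressions and extended regular expressions

1) Metacharacters of basic regular expressions

What is a metacharacter?

A metacharacter is a character, or group of characters, that stands for one or more other characters rather than for itself. They fall into the categories below.

1) Character matching

.: matches any single character

[root@linux ~]# grep 'r..t' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTPUser:/var/ftp:/sbin/nologin
[root@linux ~]#

Note the following example:

[root@linux ~]# grep '.' test.txt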
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?
oh!Ican`ttellyou?
canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...
Beatwishtoyou?

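This selects every non-empty line; the blank lines in the file are skipped, because "." needs a character to match

[]: matches any single character listed inside the brackets:

[root@linux ~]# grep '[aj]h' test002.txt
ahjhb
[root@linux ~]#

[^]: matches any single character not listed inside the brackets

[root@linux ~]# grep '[^a]h' test002.txt
ahjhb
[root@linux ~]#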
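
Since the single-line file above matches both patterns, the difference between [] and [^] is easier to see on separate lines. A minimal sketch with made-up input:

```shell
# Hypothetical input: three two-character lines.
sample='ah
jh
bh'

# An "h" preceded by "a" or "j".
inside=$(printf '%s\n' "$sample" | grep '[aj]h')

# An "h" preceded by any character other than "a" or "j".
outside=$(printf '%s\n' "$sample" | grep '[^aj]h')

printf 'inside:\n%s\noutside:\n%s\n' "$inside" "$outside"
```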

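[:alnum:] : letters (upper and lower case) and digits --> "A-Za-z0-9". A class name is only valid inside a bracket expression, which is why the commands write [[:alnum:]]

[root@linux ~]# grep '[[:alnum:]]' test002.txt
b
ab
acb
aaX2Ab
a[Ah?jhb
aba1baba5bab
[root@linux ~]#
#### The remaining classes work the same way, so they are not all explained in detail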

[:digit:] : digits --> "0-9"

[root@linux ~]# grep '[[:digit:]]' test.txt
Mytelphonenumberis15648562351...
[root@linux ~]#
# The line containing the phone number is matched

[:punct:] : punctuation characters, for example "?.,"

[root@linux ~]# grep '[[:punct:]]' test.txt
Sodoyouwanttoknowwhoisher?
oh!Ican`ttellyou?
canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...
Beatwishtoyou?
[root@linux ~]#
# Every line that contains a punctuation character is matched

[:alpha:] : letters --> "A-Za-z"

[root@linux ~]# grep '[[:alpha:]]' test.txt
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?
oh!Ican`ttellyou?
canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...
Beatwishtoyou?
# Every line that contains at least one letter is printed in full; only the blank lines are filtered out

[:graph:] : all printable characters except whitespace (space and Tab)

[root@linux ~]# grep '[[:graph:]]' test.txt
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?
oh!Ican`ttellyou?
canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...
Beatwishtoyou?
[root@linux ~]#
# Compare this with the source file shown earlier: the blank lines are gone

[:space:] : whitespace characters, including space and Tab

[root@linux ~]# grep '[[:graph:]]' test.txt
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?
oh!Ican`ttellyou?
canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...
Beatwishtoyou?
[root@linux ~]#
# The command shown here is again [[:graph:]], so the effect of [[:space:]] is not obvious; you can try "grep '[^[:space:]]' test.txt" for comparison

[:blank:] : space and Tab

[:lower:] : lowercase letters --> "a-z"

[root@linux ~]# grep '[[:lower:]]' test.txt
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?
oh!Ican`ttellyou?
canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...
Beatwishtoyou?

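[:upper:] : uppercase letters --> "A-Z"

[root@linux ~]# grep '[^[:lower:]]' test.txt
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?
oh!Ican`ttellyou?
canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...
Beatwishtoyou?
# Is this an equivalent way to write it? Not quite: [^[:lower:]] also matches digits and punctuation, so the "canyou..." line is selected here only because of its "?"
[root@linux ~]# grep '[[:upper:]]' test.txt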
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?
oh!Ican`ttellyou?
Mytelphonenumberis15648562351...
Beatwishtoyou?

[:cntrl:] : control characters, such as CR, LF, Tab and Del

[:print:] : printable characters

[:xdigit:] : hexadecimal digits --> "0-9, A-F, a-f"

These three are rarely used, so they are not demonstrated
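
Because grep prints whole lines, the transcripts above do not show which characters a class actually matched. One way to see that is to combine a class with -o, as in this minimal sketch. The sample line is an assumed, spaced-out version of the phone-number line, and -o is a GNU/BSD option rather than a POSIX one.

```shell
# Hypothetical sample line.
line='My telphone number is 15648562351...'

# -o prints only the matched text; -E allows "+" for "one or more".
digits=$(printf '%s\n' "$line" | grep -oE '[[:digit:]]+')
puncts=$(printf '%s\n' "$line" | grep -oE '[[:punct:]]+')
uppers=$(printf '%s\n' "$line" | grep -o '[[:upper:]]')

echo "digits: $digits"
echo "punct:  $puncts"
echo "upper:  $uppers"
```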

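2) Repetition counts

Usage: placed immediately after the character that is to be repeated

Purpose: limits how many times the preceding character may occur

*: matches the preceding character any number of times (zero, one, or many) [Note from Jasonforcto: do not confuse this with the shell wildcard *]

Example: a, b, ab, aab, acb, adb, amnb

Of these strings, "a*b" matches in full only b, ab and aab. grep, which looks for the pattern anywhere in a line, would also select acb, adb and amnb, because each contains a "b" preceded by zero "a" characters

In detail: in a*b, a is the character being repeated, * gives the repetition count, and b is the character that follows the a's

So: when a is repeated 0 times, the match is b

when a is repeated 1 time, the match is ab

when a is repeated 2 times, the match is aab

Keep in mind that * means "any number of times", not "any character"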
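
The a*b case can be checked directly. In this minimal sketch the seven strings from the example are fed to grep one per line; the -x option (match the whole line) is used to separate "the pattern matches the entire string" from grep's default of matching anywhere in the line.

```shell
# The seven example strings, one per line.
items='a
b
ab
aab
acb
adb
amnb'

# Default behavior: select every line that contains a match somewhere.
anywhere=$(printf '%s\n' "$items" | grep -c 'a*b')

# -x: select only lines that the pattern matches in full.
whole=$(printf '%s\n' "$items" | grep -x 'a*b')

echo "lines containing a match: $anywhere"
printf 'whole-line matches:\n%s\n' "$whole"
```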

[root@linux ~]# grep '.*' test.txt
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?


oh!Ican`ttellyou?



canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...


Beatwishtoyou?
# Now look at the next command and note how the two differ
[root@linux ~]# grep '.' test.txt
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?
oh!Ican`ttellyou?
canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...
Beatwishtoyou?
# Or look at this one, which selects only the empty lines; is it clear now?
[root@linux ~]# grep -v '.' test.txt







[root@linux ~]#

.*: matches any characters of any length, including nothing at all (here * repeats the preceding .)

[root@linux ~]# grep '.*' test.txt
Thisisabeautifulgirl
Sodoyouwanttoknowwhoisher?


oh!Ican`ttellyou?



canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...


Beatwishtoyou?

Example: t.*e matches text that starts with t and ends with e; "." can be any character, and "*" lets the "." occur any number of times

[root@linux ~]# grep 't.*e' test.txt
Sodoyouwanttoknowwhoisher?
oh!Ican`ttellyou?
canyoutellmeyourphonenumber?
Mytelphonenumberis15648562351...

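The command prints every line that meets the condition. Why is a whole line printed each time? grep always prints the complete line that contains a match. Separately, note that regular expressions on Linux are greedy by default: a quantifier such as * matches as much text as it can. POSIX basic and extended regular expressions offer no way to switch greediness off; with grep -P (PCRE), a quantifier followed by ?, for example .*?, becomes non-greedy

Note: matching is greedy by default

Greedy and non-greedy (lazy) matching both presuppose that the overall match succeeds; the difference is:

Greedy matching, provided the match succeeds, matches as much as possible

Non-greedy (lazy) matching, provided the match succeeds, matches as little as possible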
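
The difference only becomes visible when the matched text itself is printed, so the following sketch uses -o. The input string is made up, and the lazy variant relies on grep -P, which is available only when grep was built with PCRE support; the sketch therefore probes for it first and skips that part otherwise.

```shell
text='the tree'

# Greedy: t.*e extends to the last "e" on the line.
greedy=$(printf '%s\n' "$text" | grep -o 't.*e')
echo "greedy: $greedy"

# Lazy: t.*?e stops at the first "e" it can reach (PCRE only).
lazy=''
if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
    lazy=$(printf '%s\n' "$text" | grep -oP 't.*?e')
    printf 'lazy matches:\n%s\n' "$lazy"
else
    echo "grep -P is not available here; skipping the lazy example"
fi
```

With the greedy pattern the whole string "the tree" is one match; with the lazy pattern the same line yields two shorter matches.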

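?: the preceding character is optional (one time or zero times)

[root@linux ~]# grep 'a\?b' test002.txt
b
ab
acb
aab
ahjhb
ababababab
[root@linux ~]#
# Note: in grep's basic syntax these special characters must be escaped with a backslash, as in \{\}, \? and \<\>; this is not pointed out again below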

{m,n}: matches the preceding character at least m and at most n times (written \{m,n\} in basic syntax)

[root@linux ~]# grep 'a\{1,3\}b' test002.txt
ab
aab
ababababab
[root@linux ~]#

{1,}: matches at least once

[root@linux ~]# grep 'a\{1,\}b' test002.txt
ab
aab
ababababab
[root@linux ~]#

{0,2}: matches at most twice

[root@linux ~]# grep 'a\{0,2\}b' test002.txt
b
ab
acb
aab
ahjhb
ababababab
[root@linux ~]#
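
The interval syntax differs only in its escaping between the two flavors. A minimal sketch with made-up input; -x (whole-line match) is used so that a line such as "aaaab" is not selected merely because it contains the shorter "aaab".

```shell
# Hypothetical input: "b" preceded by 0, 1, 2 and 4 "a" characters.
sample='b
ab
aab
aaaab'

# Basic syntax: the braces are escaped.
bre=$(printf '%s\n' "$sample" | grep -x 'a\{1,3\}b')

# Extended syntax: the braces are written bare.
ere=$(printf '%s\n' "$sample" | grep -xE 'a{1,3}b')

printf 'BRE:\n%s\nERE:\n%s\n' "$bre" "$ere"
```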

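3) Position anchors

Purpose: restrict where in the target text the pattern is allowed to match;

Character	Description
^	start-of-line anchor; placed at the far left of the pattern: ^PATTERN
$	end-of-line anchor; placed at the far right of the pattern: PATTERN$
^PATTERN$	PATTERN must match the entire line
^$	an empty line: no characters at all, not even spaces
^[[:space:]]*$	a blank line: empty, or containing only whitespace

[root@linux ~]# grep '^r..t' /etc/passwd
root:x:0:0:root:/root:/bin/bash
## Without the start-of-line anchor, lines that contain r..t elsewhere are printed as well
[root@linux ~]# grep 'r..t' /etc/passwd
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
ftp:x:14:50:FTPUser:/var/ftp:/sbin/nologin

[root@linux ~]# grep 't$' /etc/passwd
halt:x:7:0:halt:/sbin:/sbin/halt
[root@linux ~]#

^$:

[root@linux ~]# grep -n '^$' test002.txt
8:
[root@linux ~]#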
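
The difference between ^$ and ^[[:space:]]*$ can be seen by counting lines with each pattern. A minimal sketch on made-up input that contains one truly empty line, one line of spaces and one line holding a single Tab:

```shell
# Hypothetical five-line input: text, empty, two spaces, a Tab, text.
count_lines() {
    printf 'text\n\n  \n\t\nmore\n' | grep -c "$1"
}

empty=$(count_lines '^$')               # no characters at all
blank=$(count_lines '^[[:space:]]*$')   # empty or whitespace only

echo "empty lines: $empty, blank lines: $blank"
```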

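Word: a run of consecutive non-special characters (in practice letters, digits and underscores) is called a word;

\< or \b: word-start anchor, placed on the left of the word pattern, written \<PATTERN or \bPATTERN

### First example
[root@linux ~]# grep "\<th" test003.txt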
thisissuper
thisisasuper
thisisabigsuper
[root@linux ~]#
##########################################
[root@linux ~]# grep "super\>" test003.txt
thisissuper
thisisasuper
thisisabigsuper
superisme
hesuperishei
[root@linux ~]#

### Second example
[root@linux ~]# grep "\<[0-9]\{3,4\}\>" /etc/passwd
games:x:12:100:games:/usr/games:/sbin/nologin
avahi-autoipd:x:170:170:AvahiIPv4LLStack:/var/lib/avahi-autoipd:/sbin/nologin
systemd-bus-proxy:x:999:997:systemdBusProxy:/:/sbin/nologin
systemd-network:x:998:996:systemdNetworkManagement:/:/sbin/nologin
polkitd:x:997:995:Userforpolkitd:/:/sbin/nologin
abrt:x:173:173::/etc/abrt:/sbin/nologin
colord:x:996:994:Userforcolord:/var/lib/colord:/sbin/nologin
libstoragemgmt:x:995:992:daemonaccountforlibstoragemgmt:/var/run/lsm:/sbin/nologin
setroubleshoot:x:994:991::/var/lib/setroubleshoot:/sbin/nologin
rtkit:x:172:172:RealtimeKit:/proc:/sbin/nologin
chrony:x:993:990::/var/lib/chrony:/sbin/nologin
geoclue:x:992:989:Userforgeoclue:/var/lib/geoclue:/sbin/nologin
usbmuxd:x:113:113:usbmuxduser:/:/sbin/nologin
pulse:x:171:171:PulseAudioSystemDaemon:/var/run/pulse:/sbin/nologin
Jason:x:1000:1000::/home/Jason:/bin/bash
[root@linux ~]#

\> or \b: word-end anchor, placed on the right of the word pattern, written PATTERN\> or PATTERN\b

[root@linux ~]# fdisk -l | grep "/dev/[sh]d[a-z]\>"
Disk/dev/sda:128.8GB,128849018880bytes,251658240sectors
[root@linux ~]#

[root@linux ~]# grep "super\>" test003.txt
thisissuper
thisisasuper
thisisabigsuper
superisme
hesuperishei
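[root@linux ~]#

\<PATTERN\>: whole-word anchor;

[root@linux ~]# grep "\(is\)*" test003.txt
thisissuper
thisisasuper
thisisabigsuper
superisme
hesuperishei
[root@linux ~]#
# \(is\)* may also match zero times, so it selects every line; to match "is" only as a whole word, write \<is\>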
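
Whole-word matching is easier to see on input where the same letters also occur inside longer words. The sketch below uses made-up lines; note that \< and \> are GNU extensions and that -w, which has a similar effect, is a GNU/BSD option, so both are assumptions about the grep in use.

```shell
# Hypothetical input: "is" as a word, inside a word, and at the start of a word.
sample='this is super
thisissuper
island'

plain=$(printf '%s\n' "$sample" | grep -c 'is')        # "is" anywhere
anchored=$(printf '%s\n' "$sample" | grep '\<is\>')    # "is" as a whole word
with_w=$(printf '%s\n' "$sample" | grep -w 'is')       # same idea via -w

echo "lines containing is: $plain"
echo "whole word only:     $anchored"
```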

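2) Extended regular expressions

1) Character matching works exactly as in basic regular expressions, so it is not repeated here.

2) Repetition counts:

* : matches the preceding character any number of times

[root@linux ~]# cat muli.txt
goodgodgooodfoodfoolishgooooooooooogle
[root@linux ~]# egrep 'o*' muli.txt
goodgodgooodfoodfoolishgooooooooooogle
[root@linux ~]#

.* : matches any character, any number of times

[root@linux ~]# cat muli.txt
goodgodgooodfoodfoolishgooooooooooogle
[root@linux ~]# egrep 'l.*' muli.txt
goodgodgooodfoodfoolishgooooooooooogle
[root@linux ~]#

?: matches the preceding character 0 or 1 times

[root@linux ~]# egrep 'hi?' test003.txt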
thisissuper
thisisasuper
thisisabigsuper
hesuperishei

+ : matches the preceding character at least once

[root@linux ~]# egrep 'hi+' test003.txt
thisissuper
thisisasuper
thisisabigsuper

{m,n} : matches the preceding character between m and n times

[root@linux ~]# egrep 'hi{1,2}' test003.txt
thisissuper
thisisasuper
thisisabigsuper

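3) Position anchors work exactly as in basic regular expressions, so they are not repeated here.

Grouping and references:

(pattern): grouping (written \(pattern\) in basic syntax); the text matched by the pattern inside the parentheses is recorded in internal variables of the regular-expression engine;

Back-references: \1, \2, ... refer to the text matched by the first, second, ... group

[root@linux ~]# grep 'l..e.*l..e' test004.txt
helikehisliker
helovehislover
helikehislikes
helovehislovers

[root@linux ~]# grep '\(l..e\).*\1' test004.txt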
helikehisliker
helovehislover
helikehislikes
helovehislovers
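
Every line in test004.txt repeats the same word, so both commands above print the same result. The effect of a back-reference shows up once a line pairs two different words, as in this minimal sketch with made-up input:

```shell
# Hypothetical input; the last line pairs "like" with "lover".
sample='he like his liker
he love his lover
he like his lover'

# Two independent l..e groups: any two such words will do.
plain=$(printf '%s\n' "$sample" | grep -c 'l..e.*l..e')

# \1 must repeat exactly what the first group matched.
backref=$(printf '%s\n' "$sample" | grep -c '\(l..e\).*\1')

echo "without back-reference: $plain lines, with back-reference: $backref lines"
```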

Alternation:

a|b: a or b

[root@linux ~]# egrep 'C|is' test3.txt
Catisnotcat!
[root@linux ~]#

C|cat: C or cat

[root@linux ~]# cat test3.txt
Catisnotcat!
[root@linux ~]# egrep 'C|cat' test3.txt
Catisnotcat!
[root@linux ~]#

(C|c)at: Cat or cat

[root@linux ~]# cat test3.txt
Catisnotcat!
[root@linux ~]# egrep '(C|c)at' test3.txt
Catisnotcat!
[root@linux ~]#
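
Both patterns select the same line, so the transcripts do not show how the parentheses change the scope of the alternation. Printing only the matched text makes it visible. This is a minimal sketch on an assumed, spaced-out version of the sample line; -o is a GNU/BSD option rather than a POSIX one.

```shell
line='Cat is not cat!'

# Without grouping, the alternatives are "C" and "cat".
ungrouped=$(printf '%s\n' "$line" | grep -oE 'C|cat')

# With grouping, only the first letter alternates: "Cat" or "cat".
grouped=$(printf '%s\n' "$line" | grep -oE '(C|c)at')

printf 'C|cat matches:\n%s\n(C|c)at matches:\n%s\n' "$ungrouped" "$grouped"
```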

(Editor: 李大同)
