Common Regular Expressions

Published: 2020-12-13 19:47:37 | Category: Encyclopedia | Source: collected from the web


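The examples below use Python as the implementation language and rely on Python's re module.

0. Regular expression documentation.

(1) The 30-minute introductory regex tutorial (正则表达式30分钟入门教程)

(2) Another good introductory tutorial.

(3) "Unveiling the mystery of regular expressions" (揭开正则表达式的神秘面纱). I find this article's explanation of Multiline mode especially clear.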

1. Extract the double quotes together with the content between them.

(1) Using re.findall.

import re
content='''abc"def"ghi'''
re.findall(r'".+"',content)
# Result
['"def"']

(2) Using re.search.

>>> content='''abc"def"ghi'''
>>> re.search(r'"(.+)"',content).group(0)
'"def"'

2. Extract only the content between the double quotes. Rule: (pattern)

(1) Using re.findall.

content='''abc"def"ghi'''
re.findall(r'"(.+)"',content)
# Result
['def']

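The difference from item 1 is that the part to be returned is wrapped in parentheses, so findall returns the group instead of the whole match.

(2) Using re.search.

>>> content='''abc"def"ghi'''
>>> re.search(r'"(.+)"',content).group(1)
'def'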

3. Same effect as item 2. Rule: (?<=pattern), (?=pattern)

content='''abc"def"ghi'''
re.findall(r'(?<=").+(?=")',content)
# Result
['def']
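
As a runnable recap of items 1 to 3, the sketch below puts the three patterns side by side. The second input string (`two`) is not from the original article; it is an assumed example added to show that the greedy `.+` runs to the last quote once there is more than one quoted string, and that a lazy `.+?` is one possible way around it.

```python
import re

content = 'abc"def"ghi'

# Item 1: the whole match, quotes included
whole = re.findall(r'".+"', content)
# Item 2: a capture group makes findall return only the group
inner_group = re.findall(r'"(.+)"', content)
# Item 3: lookbehind/lookahead assert the quotes without consuming them
inner_look = re.findall(r'(?<=").+(?=")', content)
print(whole, inner_group, inner_look)

# Assumed extra input with two quoted strings (not in the original article)
two = 'a"b"c"d"e'
greedy = re.findall(r'"(.+)"', two)   # .+ extends to the last quote
lazy = re.findall(r'"(.+?)"', two)    # .+? stops at the nearest quote
print(greedy, lazy)
```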

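4. A comparison of three regular expression implementations in C++ (C regex, C++ regex, boost regex).

5. Find lines that start with certain strings, for example lines starting with +++, ---, or index: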

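# Method 1: match line by line (lst is a list of lines)
for i in lst:
    if re.match(r"(---|\+\+\+|index).*", i):
        print(i)
# Method 2: match everything in one pass (+ must be escaped as \+)
re.findall(r'^(?:\+\+\+|---|index).*$', content, re.M)
# Method 2, condensed; note that [-+]{3} also accepts mixed runs such as +-+
re.findall(r'^(?:[-+]{3}|index).*$', content, re.M)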
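
The two methods can be tried end to end with the sketch below. The sample `content` is a made-up fragment that merely resembles diff output, and the `prefixes` list and the use of `re.escape` to build the alternation are assumptions for illustration, not part of the original article.

```python
import re

# Hypothetical sample text resembling unified diff output
content = "\n".join([
    "index 83db48f..bf2a1c0 100644",
    "--- a/demo.txt",
    "+++ b/demo.txt",
    "@@ -1,2 +1,2 @@",
    "-old line",
    "+new line",
])

# Method 1: test each line separately; re.match anchors at the line start
per_line = [line for line in content.splitlines()
            if re.match(r"(?:---|\+\+\+|index)", line)]

# Method 2: one pass; re.M makes ^ and $ match at every line boundary
one_pass = re.findall(r"^(?:\+\+\+|---|index).*$", content, re.M)

# Building the alternation from literal prefixes; re.escape handles the +
prefixes = ["+++", "---", "index"]
built = r"^(?:%s).*$" % "|".join(re.escape(p) for p in prefixes)
from_built = re.findall(built, content, re.M)

print(one_pass)
```

Escaping the literals programmatically avoids the "nothing to repeat" error that an unescaped `+++` would raise.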

6. Contains / does not contain (reference: using regular expressions to exclude specific strings)

(0) The text

>>> print(text)
www.sina.com.cn
www.educ.org
www.hao.cc
www.baidu.com
www.123.com

sina.com.cn
educ.org
hao.cc
baidu.com
123.com

(1) Match lines that start with www

>>> re.findall(r'^www.*$', text, re.M)
['www.sina.com.cn', 'www.educ.org', 'www.hao.cc', 'www.baidu.com', 'www.123.com']

(2) Match lines that do not start with www (the empty string comes from the blank line)

>>> re.findall(r'^(?!www).*$', text, re.M)
['', 'sina.com.cn', 'educ.org', 'hao.cc', 'baidu.com', '123.com']

(3) Match lines that end with cn

>>> re.findall(r'^.*?cn$', text, re.M)
['www.sina.com.cn', 'sina.com.cn']

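(4) Match lines that do not end with com

>>> re.findall(r'^.*?(?<!com)$', text, re.M)
['www.sina.com.cn', 'www.educ.org', 'www.hao.cc', '', 'sina.com.cn', 'educ.org', 'hao.cc']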

(5) Match lines that contain com

>>> re.findall(r'^.*?com.*?$', text, re.M)
['www.sina.com.cn', 'www.baidu.com', 'www.123.com', 'sina.com.cn', 'baidu.com', '123.com']

(6) Match lines that do not contain com

>>> re.findall(r'^(?!.*com).*$', text, re.M)
['www.educ.org', 'www.hao.cc', '', 'educ.org', 'hao.cc']
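
The contains / does-not-contain patterns of item 6 can be checked with the self-contained sketch below. It assumes that `text` holds exactly the lines printed in (0), with one blank line in the middle and no trailing newline. The helper name `find_lines` and the last pattern (using `.+` to skip the blank line) are illustrative additions.

```python
import re

text = """www.sina.com.cn
www.educ.org
www.hao.cc
www.baidu.com
www.123.com

sina.com.cn
educ.org
hao.cc
baidu.com
123.com"""

def find_lines(pattern):
    """Return every line of `text` matched by `pattern` in multiline mode."""
    return re.findall(pattern, text, re.M)

ends_cn = find_lines(r'^.*?cn$')             # ends with cn
not_ends_com = find_lines(r'^.*?(?<!com)$')  # negative lookbehind at line end
has_com = find_lines(r'^.*?com.*?$')         # contains com
no_com = find_lines(r'^(?!.*com).*$')        # negative lookahead at line start

# Requiring at least one character (.+) leaves out the blank line
no_com_nonblank = find_lines(r'^(?!.*com).+$')

print(no_com)
print(no_com_nonblank)
```

Without `re.M`, `^` and `$` would only anchor at the start and end of the whole string, so none of these would work per line.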

7. Use a group to keep only the first level of a URL, i.e. drop the trailing path levels. (Match everything, discard part.)

Method 1:

>>> strr = 'http://www.baidu.com/abc/d.html'
>>> re.findall(r'(http://.+?)/.*', strr)
['http://www.baidu.com']

Method 2:

>>> re.sub(r'(http://.+?)/.*', r'\1', strr)
'http://www.baidu.com'

8. Two examples that help in understanding regex groups.

(1)

>>> strr = 'A/B/C'
>>> re.sub(r'(.)/(.)/(.)', r'xx', strr)
'xx'
>>> re.sub(r'(.)/(.)/(.)', r'\1xx', strr)
'Axx'
>>> re.sub(r'(.)/(.)/(.)', r'\2xx', strr)
'Bxx'
>>> re.sub(r'(.)/(.)/(.)', r'\3xx', strr)
'Cxx'

(2)

>>> text = 'AA,BB:222'
>>> re.search(r'(.+),(.+):(\d+)', text).group(0)
'AA,BB:222'
>>> re.search(r'(.+),(.+):(\d+)', text).group(1)
'AA'
>>> re.search(r'(.+),(.+):(\d+)', text).group(2)
'BB'
>>> re.search(r'(.+),(.+):(\d+)', text).group(3)
'222'
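
The group examples in items 7 and 8 can be condensed into the sketch below. It uses `groups()` to read all groups at once and shows the `\g<1>` spelling of a backreference, which is handy when the reference is followed by a digit. The variable names are illustrative only.

```python
import re

# All capture groups of one match, as a tuple
m = re.search(r'(.+),(.+):(\d+)', 'AA,BB:222')
all_groups = m.groups()

# Backreferences in the replacement can reorder the captured parts
swapped = re.sub(r'(.)/(.)/(.)', r'\3/\2/\1', 'A/B/C')

# \g<1> is equivalent to \1 and stays unambiguous before a digit
host = re.sub(r'(http://.+?)/.*', r'\g<1>', 'http://www.baidu.com/abc/d.html')
numbered = re.sub(r'(.)/(.)/(.)', r'\g<1>0', 'A/B/C')

print(all_groups, swapped, host, numbered)
```

Writing `r'\10'` in the last call would be read as a reference to group 10, which does not exist here, hence the `\g<1>0` form.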

9. Extract the div tags that contain the string hello.

>>> content
'<div id="abc"><div id="hello1"><div id="def"><div id="hello2"><div id="hij">'
>>>
>>> p = r'<div((?!div).)+hello.+?>'
>>> re.search(p, content).group()
'<div id="hello1">'
>>> re.findall(p, content)  # findall returns the capture group, not the whole match
['"', '"']
>>> for m in re.finditer(p, content):
...     print(m.group())
...
<div id="hello1">
<div id="hello2">
>>>
>>> p = r'<div[^>]+hello.+?>'
>>> re.findall(p, content)
['<div id="hello1">', '<div id="hello2">']
>>> for m in re.finditer(p, content):
...     print(m.group())
...
<div id="hello1">
<div id="hello2">
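
The odd findall result in item 9 comes from the capturing group: when a pattern contains a group, findall returns the group (here its last repetition) rather than the whole match. The sketch below, which assumes the sample string has a space between `div` and `id`, shows a non-capturing variant `(?:...)` as one possible way to get the full tags from findall directly.

```python
import re

content = ('<div id="abc"><div id="hello1"><div id="def">'
           '<div id="hello2"><div id="hij">')

p_capturing = r'<div((?!div).)+hello.+?>'
p_noncapturing = r'<div(?:(?!div).)+hello.+?>'

# With a capturing group, findall yields the last character the group matched
last_chars = re.findall(p_capturing, content)

# With a non-capturing group, findall yields the whole match
tags = re.findall(p_noncapturing, content)

# finditer gives the whole match regardless of groups
tags_via_iter = [m.group() for m in re.finditer(p_capturing, content)]

print(last_chars, tags)
```

For real HTML an HTML parser is usually the safer tool; the patterns here only illustrate how grouping affects findall.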

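10. walker's guess: under Python 3's Unicode character set, \s matches \f, \n, \r, \t, \v plus the half-width and full-width space, 7 characters in total. The Python documentation defines \s more broadly for str patterns, as any Unicode whitespace character, so further characters such as the no-break space (U+00A0) match as well.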
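
The guess about \s can be tested directly. The sketch below checks the seven characters named above plus the no-break space, once in the default Unicode mode and once with `re.ASCII`, and then counts how many characters of the Basic Multilingual Plane match. The dictionary of candidates is an illustrative choice, and no exact count is claimed because it can depend on the Python version's Unicode data.

```python
import re

candidates = {
    'form feed': '\f',
    'newline': '\n',
    'carriage return': '\r',
    'tab': '\t',
    'vertical tab': '\v',
    'half-width space': ' ',
    'full-width space U+3000': '\u3000',
    'no-break space U+00A0': '\u00a0',
}

# Default mode for str patterns: Unicode whitespace
unicode_hits = {name for name, ch in candidates.items()
                if re.fullmatch(r'\s', ch)}

# re.ASCII restricts \s to [ \t\n\r\f\v]
ascii_hits = {name for name, ch in candidates.items()
              if re.fullmatch(r'\s', ch, re.ASCII)}

# Count matching code points in the Basic Multilingual Plane
bmp_count = sum(1 for cp in range(0x10000)
                if re.fullmatch(r'\s', chr(cp)))

print(sorted(unicode_hits - ascii_hits))
print(bmp_count)
```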

*** walker * Updated 2014-12-09 ***

(Editor: 李大同)
