
Python: extracting sentences from a line - regex needed, based on criteria

Published: 2020-12-16 21:43:27 | Category: Python | Source: compiled from the web

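Bit of a Python/programming newbie here...

I'm trying to come up with a regex that can extract the sentences from a line of a text file and then append them to a list. The code: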

import re

txt_list = []

with open('sample.txt', 'r') as txt:
    # Both alternatives start with a greedy ".*}" that anchors on a closing brace
    patt = r'.*}[.!?]\s?\n?|.*}.+[.!?]\s?\n?'
    read_txt = txt.readlines()

    for line in read_txt:
        if line == "\n":
            # Keep blank lines so they can be reported later
            txt_list.append("\n")
        else:
            found = re.findall(patt, line)
            for f in found:
                txt_list.append(f)


for line in txt_list:
    if line == "\n":
        print("newline")
    else:
        print(line)

Printed output from the last 5 lines of the code above:

{Hello there|Hello|Howdy} Dr. Munchauson you {gentleman|fine fellow}! 
What {will|shall|should} we {eat|have} for lunch? Peas by the {thousand|hundred|1000} said Dr. Munchauson; {that|is} what he said.

newline
I am the {very last|last} sentence for this {instance|example}.

Contents of 'sample.txt':

{Hello there|Hello|Howdy} Dr. Munchauson you {gentleman|fine fellow}! What {will|shall|should} we {eat|have} for lunch? Peas by the {thousand|hundred|1000} said Dr. Munchauson; {that|is} what he said.

I am the {very last|last} sentence for this {instance|example}.

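I've been playing with the regex for a couple of hours now and I can't seem to crack it. As it stands, the regex does not match on the ? at the end of "for lunch?". As a result, these two sentences are not separated: What {will|shall|should} we {eat|have} for lunch? Peas by the {thousand|hundred|1000} said Dr. Munchauson; {that|is} what he said. Having them separated is what I want.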
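To see the behaviour in isolation, here is a minimal sketch that runs the question's pattern on the first line of sample.txt only. It assumes Python 3 and that the pattern is meant to contain `\s` and `\n`; the variable names `line` and `found` are just for illustration.

```python
import re

# Pattern from the question, with the backslashes written out explicitly.
patt = r'.*}[.!?]\s?\n?|.*}.+[.!?]\s?\n?'

line = ("{Hello there|Hello|Howdy} Dr. Munchauson you {gentleman|fine fellow}! "
        "What {will|shall|should} we {eat|have} for lunch? "
        "Peas by the {thousand|hundred|1000} said Dr. Munchauson; "
        "{that|is} what he said.\n")

found = re.findall(patt, line)

# The greedy ".*" first runs to the end of the line and then backs up only
# as far as the last "}" that lets the rest of the alternative match.
# For the second match that is the "}" of "{that|is}", so the "?" after
# "lunch" is swallowed instead of ending a sentence.
for piece in found:
    print(repr(piece))
```

The sketch yields two pieces rather than three: the first sentence on its own, and the second and third sentences fused together, which is the output shown above.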

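Some important details for the regex:

>Each sentence will always end with a period, an exclamation mark or a question mark.
>Each sentence will always contain at least one pair of curly braces "{}" with some words in them. Furthermore, there will never be a misleading "." after the last brace in a sentence, so "Dr." will always appear before the last pair of curly braces in each sentence. This is why I tried to base my regex on '}'. That way I can avoid the exceptions approach, where I would have to create exceptions for abbreviations such as Dr., Jr., approx. and so on. For every file I run this code on, I personally make sure that there is no "misleading period" after the last '}' in any sentence.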

The output I want is this:

{Hello there|Hello|Howdy} Dr. Munchauson you {gentleman|fine fellow}! 
What {will|shall|should} we {eat|have} for lunch?
Peas by the {thousand|hundred|1000} said Dr. Munchauson; {that|is} what he said.

newline
I am the {very last|last} sentence for this {instance|example}.
Best answer
The most intuitive solution I came up with is this. Essentially, you need to treat the Dr. and Mr. tokens as atoms in themselves.

patt = r'(?:Dr\.|Mr\.|.)*?[.!?]\s?\n?'

Broken down, it says:

Find me the smallest number of "Mr."s, "Dr."s or any other characters up to a punctuation mark (., ! or ?), followed by zero or one whitespace characters, followed by zero or one newlines.
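As a quick check of that reading, the sketch below applies the pattern to the first line of the question's sample.txt. It assumes Python 3; the names `line` and `sentences` are illustrative and the `strip()` call is only there to tidy the trailing whitespace the pattern captures.

```python
import re

# "Dr\." and "Mr\." are tried before the catch-all ".", so each title is
# consumed as one unit and its period is never tested as a sentence end.
patt = r'(?:Dr\.|Mr\.|.)*?[.!?]\s?\n?'

line = ("{Hello there|Hello|Howdy} Dr. Munchauson you {gentleman|fine fellow}! "
        "What {will|shall|should} we {eat|have} for lunch? "
        "Peas by the {thousand|hundred|1000} said Dr. Munchauson; "
        "{that|is} what he said.\n")

# The lazy "*?" stops at the first terminator that is not part of a title.
sentences = [s.strip() for s in re.findall(patt, line)]

for s in sentences:
    print(s)
```

Because the quantifier is lazy, each match ends at the earliest acceptable terminator, which is what splits "for lunch?" from "Peas by the ...".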

When the pattern is used on this sample.txt (I added a line):

{Hello there|Hello|Howdy} Dr. Munchauson you {gentleman|fine fellow}! What {will|shall|should} we {eat|have} for lunch? Peas by the {thousand|hundred|1000} said Dr. Munchauson; {that|is} what he said.

But there are no {misters|doctors} here good sir! Help us if there is an emergency.

I am the {very last|last} sentence for this {instance|example}.

it gives:

{Hello there|Hello|Howdy} Dr. Munchauson you {gentleman|fine fellow}!
What {will|shall|should} we {eat|have} for lunch?
Peas by the {thousand|hundred|1000} said Dr. Munchauson; {that|is} what he said.

newline
But there are no {misters|doctors} here good sir!
Help us if there is an emergency.

newline
I am the {very last|last} sentence for this {instance|example}.
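If more abbreviations than Dr. and Mr. have to be protected, the same idea could be generalised by building the pattern from a list. The following is only a sketch under that assumption: `build_sentence_pattern` is a hypothetical helper, not part of the answer, and the sample line is made up for illustration.

```python
import re

def build_sentence_pattern(abbreviations):
    """Hypothetical helper: treat every given abbreviation as one atom."""
    # re.escape turns "Dr." into a literal match, so the "." is not a wildcard.
    atoms = "|".join(re.escape(abbr) for abbr in abbreviations)
    # Abbreviations are listed before the catch-all "." so they win first.
    return re.compile(r"(?:" + atoms + r"|.)*?[.!?]\s?\n?")

pattern = build_sentence_pattern(["Dr.", "Mr.", "Jr."])

# Made-up input, not taken from sample.txt.
line = "Ask {Mr.|Dr.} Smith Jr. about the {plan|idea}. Then {go|leave}!\n"
sentences = [s.strip() for s in pattern.findall(line)]

for s in sentences:
    print(s)
```

One limitation of this approach: when an abbreviation really does end a sentence and another sentence follows, its period is consumed as part of the atom, so the two sentences are merged.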
