
regex - A regular expression for identifying in-text citations

Published: 2020-12-14 06:26:04 | Category: Encyclopedia | Source: compiled from the web
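I am trying to write a regular expression that captures in-text citations.

Here are a few example sentences that contain in-text citations: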

  1. … and the reported results in (Nivre et al.,2007) were not representative …

  2. … two systems used a Markov chain approach (Sagae and Tsujii 2007).

  3. Nivre (2007) showed that …

  4. … for attaching and labeling dependencies (Chen et al.,2007; Dredze et al.,2007).

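Currently, my regular expression is

\(\D*\d\d\d\d\)

which matches examples 1-3 but not example 4. How can I modify it so that it also captures example 4?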

Thanks!
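The reason example 4 is missed can be checked directly: `\D*` cannot cross the digits of the first year, and after that year the pattern requires `)` but finds `;`. The following is a minimal sketch, not a definitive fix. It assumes that years are always four digits and that multiple citations inside one pair of parentheses are separated by semicolons. The names `$single` and `$multi` are illustrative only.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# The pattern from the question: "(", non-digits, four digits, ")".
my $single = qr/\(\D*\d\d\d\d\)/;

# Illustrative variant (an assumption, not the asker's code): allow one or
# more "non-digits + four-digit year" groups, each optionally followed by
# a semicolon, before the closing ")".
my $multi = qr/\((?:\D*\d{4};?)+\)/;

my @examples = (
    '... and the reported results in (Nivre et al.,2007) were not representative ...',
    '... two systems used a Markov chain approach (Sagae and Tsujii 2007).',
    'Nivre (2007) showed that ...',
    '... for attaching and labeling dependencies (Chen et al.,2007; Dredze et al.,2007).',
);

for my $text (@examples) {
    # Neither pattern has capture groups, so wrap each in one here.
    my ($old) = $text =~ /($single)/;
    my ($new) = $text =~ /($multi)/;
    say 'single: ', $old // '(no match)';
    say 'multi:  ', $new // '(no match)';
}
```

Because `\D` also matches parentheses and letters of any kind, this variant is still very permissive: any parenthesised text that happens to end in a four-digit number will match. It is best read as the smallest change that makes example 4 match.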

I recently used something like this for that purpose:
#!/usr/bin/env perl

use 5.010;
use utf8;
use strict;
use autodie;
use warnings qw< FATAL all >;
use open qw< :std IO :utf8 >;

my $citation_rx = qr{
    \( (?:
        \s*

        # optional author list
        (?: 
            # has to start capitalized
            \p{Uppercase_Letter}        

            # then have a lower case letter, or maybe an apostrophe
            (?=  [\p{Lowercase_Letter}\p{Quotation_Mark}] )

            # before a run of letters and admissible punctuation
            [\p{Alphabetic}\p{Dash_Punctuation}\p{Quotation_Mark}\s,.] +

        ) ?  # hook if and only if you want the authors to be optional!!

        # a reasonable year
        \b (18|19|20) \d\d 

        # citation series suffix, up to a six-parter
        [a-f] ?         \b                 

        # trailing semicolon to separate multiple citations
        ; ?  
        \s*
    ) +
    \)
}x;

while (<DATA>) {
    # /p makes the matched text available in ${^MATCH}
    while (/$citation_rx/gp) {
        say ${^MATCH};
    } 
} 

__END__
... and the reported results in (Nivré et al.,2007) were not representative ...
... two systems used a Markov chain approach (Sagae and Tsujii 2007).
Nivre (2007) showed that ...
... for attaching and labelling dependencies (Chen et al.,2007; Dredze et al.,2007).

When run, it produces:

(Nivré et al.,2007)
(Sagae and Tsujii 2007)
(2007)
(Chen et al.,2007; Dredze et al.,2007)
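
Since the pattern treats a semicolon as the separator between multiple citations, a matched group can be broken into its individual citations afterwards. The following is a rough sketch of that post-processing step. `split_citations` is a hypothetical helper that is not part of the answer above, and it assumes its input is a single parenthesised match.

```perl
#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: takes one parenthesised match such as
# "(Chen et al.,2007; Dredze et al.,2007)" and returns its parts.
sub split_citations {
    my ($group) = @_;
    $group =~ s/^\(\s*//;    # drop the opening parenthesis
    $group =~ s/\s*\)$//;    # drop the closing parenthesis
    # Split on semicolons, discarding surrounding whitespace.
    return split /\s*;\s*/, $group;
}

say for split_citations('(Chen et al.,2007; Dredze et al.,2007)');
```

For the sample input this prints `Chen et al.,2007` and `Dredze et al.,2007` on separate lines. In the loop above, it could be applied to `${^MATCH}` for each match.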

