
perl - How can I extract all quotations in a text?

Published: 2020-12-15 23:35:04 | Category: Big Data | Source: compiled from the web
I am looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.

Example 1:

echo '"HAL," noted Frank, "said that everything was going extremely well."' | SimpleGrepSedPerlOrPythonOneLiner

stdout:

"HAL,"
"said that everything was going extremely well."

Example 2:

cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner

stdout:

"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"

etc.

(link to the corresponding text).

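Solution

I like this one:

perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'

It is a little verbose, but it handles escaped quotes and backtracking much better than the simplest implementation. Here is what it says: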

my $re = qr{
   "               # Begin it with literal quote
   ( 
     (?>           # prevent backtracking once the alternation has been
                   # satisfied. It either agrees or it does not. This expression
                   # only needs one direction, or we fail out of the branch

         [^"\\]    # a character that is not a dquote or a backslash
     |   \\+       # OR if a backslash, then any number of backslashes followed by
         [^"]      # something that is not a quote
     |   \\        # OR again a backslash
         (?>\\\\)* # followed by any number of *pairs* of backslashes (as units)
         "         # and a quote
     )*            # any number of *set* qualifying phrases
  )                # all batched up together
  "                # Ended by a literal quote
}x;
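
Since the question also accepts Python, here is a minimal sketch of the same idea in that language. It is not a port of the Perl expression above: it swaps in a simpler, commonly used pattern that treats a backslash plus the character after it as one unit, and it avoids atomic groups, which Python's re module only supports from version 3.11. The names QUOTED and extract_quotes are hypothetical, and the assumption that a backslash escapes whatever follows it is mine.

```python
import re

# Assumption: a backslash escapes whatever character follows it (C-style strings).
# The two alternatives are mutually exclusive, so the pattern cannot backtrack badly.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)

def extract_quotes(text):
    """Return the contents of every double-quoted span, honoring backslash escapes."""
    return [match.group(1) for match in QUOTED.finditer(text)]

sample = r'He said "she replied \"no\" twice" and then "left".'
for quote in extract_quotes(sample):
    print(f'"{quote}"')
```

The escaped quotes around "no" stay inside the first match instead of ending it early, which is the case the plain pattern further down cannot handle.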

If you don't need that much power, say the text is likely to be plain dialog rather than structured quotes, then

/"([^"]*)"/

probably works about as well as anything else.
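
As a rough illustration of that simpler pattern, the sketch below applies it in Python to the sentence from Example 1. The function name simple_quotes is hypothetical. It assumes straight double quotes and no escaped quotes inside a quotation.

```python
import re

def simple_quotes(text):
    # [^"]* cannot cross a double quote, so each match ends at the next quote.
    return re.findall(r'"([^"]*)"', text)

sample = '"HAL," noted Frank, "said that everything was going extremely well."'
for quote in simple_quotes(sample):
    print(f'"{quote}"')
```

This prints the two quotations from Example 1 on separate lines. Typographic quotes (“ and ”) would need their own character class, since the pattern only looks for the ASCII double quote.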

