perl – How do I extract all quotations from a text?
Published: 2020-12-15 23:35:04 Category: Big Data Source: compiled from the web
I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that prints every quotation in a text.

Example 1:
echo “HAL,” noted Frank, “said that everything was going extremely well.” | SimpleGrepSedPerlOrPythonOneLiner
Standard output:
"HAL,"
"said that everything was going extremely well."

Example 2:
cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner
Standard output:
"EULA"
"Software"
"Workstation Computer"
"Device"
"DRM"
and so on (link to the corresponding text).

Solution
I like this one:

perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'

It is a bit verbose, but it handles escaped quotes and backtracking better than the simplest implementation. What it says is:

my $re = qr{
    "                # Begin it with literal quote
    (
      (?>            # prevent backtracking once the alternation has been
                     # satisfied. It either agrees or it does not. This expression
                     # only needs one direction, or we fail out of the branch
          [^"\\]     # a character that is not a dquote or a backslash
      |   \\+        # OR if a backslash, then any number of backslashes followed by
          [^"]       # something that is not a quote
      |   \\         # OR again a backslash
          (?>\\\\)*  # followed by any number of *pairs* of backslashes (as units)
          "          # and a quote
      )*             # any number of *set* qualifying phrases
    )                # all batched up together
    "                # Ended by a literal quote
}x;
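Since the question also allows Python, the escape-aware idea can be sketched there as well. This is only an illustration under stated assumptions, not a port of the regex above: it swaps in the more common "non-special character, or backslash plus any character" alternation instead of atomic groups (which Python's `re` only accepts from version 3.11), and the name `extract_escaped_quotes` is hypothetical.

```python
import re

# Hypothetical helper, not part of the original answer.
# A quoted span is: an opening quote, then any run of either
#   - one character that is neither a quote nor a backslash, or
#   - a backslash followed by any single character (an escape pair),
# then a closing quote. The capture group holds the contents only.
QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)


def extract_escaped_quotes(text):
    """Return the contents of each double-quoted span, escapes left as written."""
    return QUOTED.findall(text)


if __name__ == "__main__":
    sample = r'He wrote "a \"nested\" word" and then "plain".'
    for quote in extract_escaped_quotes(sample):
        print(quote)
```

Like the Perl one-liner, this returns the text between the quotes rather than the quotes themselves, and it treats `\"` as part of the quotation instead of as its end.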
If you don't need that much power (say the text is probably just dialogue rather than structured quoting), then /"([^"]*)"/ probably works about as well as anything else.

(Editor: 李大同)
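For comparison, here is a minimal Python sketch of the simple pattern applied to the sentence from Example 1. It assumes the input uses straight ASCII double quotes (typographic quotes such as “ and ” would need their own character class), and the function name `extract_simple_quotes` is made up for this illustration.

```python
import re


def extract_simple_quotes(text):
    """Return each double-quoted span, quote marks included."""
    # "[^"]*" : a quote, any run of non-quote characters, a closing quote.
    # No capture group is used, so findall returns the whole match.
    return re.findall(r'"[^"]*"', text)


if __name__ == "__main__":
    sample = '"HAL," noted Frank, "said that everything was going extremely well."'
    for quote in extract_simple_quotes(sample):
        print(quote)
```

The pattern has no notion of escaping, so a backslash-escaped quote inside a quotation would end the match early. To use the function as a filter, pass it the result of `sys.stdin.read()`.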
