perl – How can I extract all quotations from a text?
Published: 2020-12-15 23:35:04 | Category: Big Data | Source: compiled from the web
I'm looking for a SimpleGrepSedPerlOrPythonOneLiner that outputs all quotations in a text.
Example 1:

    echo “HAL,” noted Frank, “said that everything was going extremely well.” | SimpleGrepSedPerlOrPythonOneLiner

Standard output:

    "HAL,"
    "said that everything was going extremely well."

Example 2:

    cat MicrosoftWindowsXPEula.txt | SimpleGrepSedPerlOrPythonOneLiner

Standard output:

    "EULA"
    "Software"
    "Workstation Computer"
    "Device"
    "DRM"

and so on (link to the corresponding text).

Solution
I like this one:

    perl -ne 'print "$_\n" foreach /"((?>[^"\\]|\\+[^"]|\\(?:\\\\)*")*)"/g;'

It is a bit verbose, but it handles escaped quotes and backtracking better than the simplest implementation. Note that it prints the text between the quotes, without the quote characters themselves. What it says is:

    my $re
        = qr{
            "               # Begin it with literal quote
            (
                (?>         # prevent backtracking once the alternation has been
                            # satisfied. It either agrees or it does not. This expression
                            # only needs one direction, or we fail out of the branch

                    [^"\\]  # a character that is not a dquote or a backslash
                |   \\+     # OR if a backslash, then any number of backslashes followed by
                    [^"]    # something that is not a quote
                |   \\      # OR again a backslash
                    (?>\\\\)* # followed by any number of *pairs* of backslashes (as units)
                    "       # and a quote
                )*          # any number of *set* qualifying phrases
            )               # all batched up together
            "               # Ended by a literal quote
        }x;

If you don't need that much power (say, the input is probably just dialogue rather than structured quotations), then

    /"([^"]*)"/

probably works about as well as anything else.
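Since the question also allows Python, here is a minimal sketch of the same idea. It is not a port of the Perl regex above: it swaps in a simpler, commonly used pattern in which a backslash escapes whatever character follows it, and it avoids atomic groups so it does not depend on a recent Python version. The names `extract_quotes` and `extract_quotes_simple` are made up for illustration. Like the Perl patterns, it only recognizes straight double quotes, not typographic ones.

```python
import re

# Assumed pattern (not taken from the answer above): a double-quoted span
# in which a backslash escapes the character that follows it.
ESCAPED_QUOTE_RE = re.compile(r'"((?:[^"\\]|\\.)*)"', re.DOTALL)

# The simple pattern from the answer: everything between two double quotes.
SIMPLE_QUOTE_RE = re.compile(r'"([^"]*)"')


def extract_quotes(text):
    """Return the contents of each double-quoted span, honoring backslash escapes."""
    return ESCAPED_QUOTE_RE.findall(text)


def extract_quotes_simple(text):
    """Return the contents of each double-quoted span, ignoring escapes."""
    return SIMPLE_QUOTE_RE.findall(text)


sample = '"HAL," noted Frank, "said that everything was going extremely well."'
for quote in extract_quotes(sample):
    print(f'"{quote}"')
```

On plain dialogue such as the sample, both functions return the same two quotations. They differ only when the input contains escaped quotes like `\"`, where the simple pattern cuts the quotation short at the escaped quote.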