perl – Replace all whitespace in the content of any tag with `&nbsp;`
Published: 2020-12-15 21:45:54  Category: Big Data  Source: compiled from the web
Task
Replace all whitespace in the content of any tag with &nbsp;.

y.html (sample file)

    <p class=MsoNormal style='margin-top:1.0pt;margin-right:0cm;margin-bottom:1.0pt;
    margin-left:34.0pt;text-indent:-19.8pt'><span lang=NL-BE style='font-size:10.0pt;
    font-family:Symbol;color:black;mso-ansi-language:NL-BE'>·</span><span class=GramE><span style='font-size:7.0pt;color:black'> </span><span style='font-size:10.0pt;font-family:Arial;color:black'>Kit</span></span><span style='font-size:10.0pt;font-family:Arial;color:black'> </span><span class=SpellE><i><span style='font-size:10.0pt;font-family:Arial'>Strongyloides</span></i></span><i><span style='font-size:10.0pt;font-family:Arial'> <span class=SpellE>ratti</span></span></i><span style='font-size:10.0pt;font-family:Arial'> (nr. 9450) van <span class=SpellE>Bordier</span> Affinity Products. </span><span lang=NL-BE style='font-size:10.0pt;font-family:
    Arial;mso-ansi-language:NL-BE'>Zie bijsluiter in bijlage: CLKB_B_0306.
    Te bewaren bij 2 – 8 °C tot vervaldatum.</span><span lang=NL-BE style='mso-ansi-language:
    NL-BE'><o:p></o:p></span></p>

What I tried

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Mojo::DOM;

    open (my $fh, "<", "y.html") or die $!;
    my $dom = Mojo::DOM->new(do { local $/ = undef; <$fh> });

    $dom->find("*")->each(
        sub {
            $_->content( $_->content =~ s/\s/&nbsp;/gr )
        }
    );

    print $dom;

Result of the script above

    <p class="MsoNormal" style="margin-top:1.0pt;margin-right:0cm;margin-bottom:1.0pt; margin-left:34.0pt;text-indent:-19.8pt"><span lang="nl-be" style="font-size:10.0pt; font-family:symbol;color:black;mso-ansi-language:nl-be">·<span class="grame"><span style="font-size:7.0pt;color:black"> <span style="font-size:10.0pt;font-family:arial;color:black">Kit<span style="font-size:10.0pt;font-family:arial;color:black"> <span class="spelle"><i><span&nbsp;style="font-size:10.0pt;font-family:arial">Strongyloides<i><span style="font-size:10.0pt;font-family:arial"> <span class="spelle">ratti<span style="font-size:10.0pt;font-family:arial"> (nr. 9450) van <span class="spelle">Bordier Affinity Products. <span lang="nl-be" style="font-size:10.0pt;font-family: arial;mso-ansi-language:nl-be">Zie bijsluiter in bijlage: CLKB_B_0306. Te bewaren bij 2 – 8 °C tot vervaldatum.<span lang="nl-be" style="mso-ansi-language: nl-be"><o:p></o:p></span lang="nl-be" style="mso-ansi-language: nl-be"></span lang="nl-be" style="font-size:10.0pt;font-family: arial;mso-ansi-language:nl-be"></span class="spelle"></span style="font-size:10.0pt;font-family:arial"></span class="spelle"></span&nbsp;style="font-size:10.0pt;font-family:arial"></i></span style="font-size:10.0pt;font-family:arial"></i></span class="spelle"></span style="font-size:10.0pt;font-family:arial;color:black"></span style="font-size:10.0pt;font-family:arial;color:black"></span style="font-size:7.0pt;color:black"></span class="grame"></span lang="nl-be" style="font-size:10.0pt; font-family:symbol;color:black;mso-ansi-language:nl-be"></p>

I am not getting the desired output: the script also adds &nbsp; inside the tags (for example </span&nbsp;), and I want the replacement done only on the content.

PS: I tried Mojo::DOM, but it is not required; you may use any other parser if you like. I would still like to know what is wrong with my code.

Solution
This is a job for tokenizing the input, which makes it much easier to work with. I therefore suggest using HTML::TokeParser:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    use HTML::TokeParser;

    # Slurp the sample document from the __DATA__ section.
    my $data = do { local $/; <DATA> };

    # A plain scalar would be taken as a file name, so pass a reference to the string.
    my $p = HTML::TokeParser->new(\$data);

    while (my $token = $p->get_token) {
        if ($token->[0] eq 'T') {
            # Text token: replace every space with &nbsp;
            my $text = $token->[1];
            $text =~ s/ /&nbsp;/g;
            print $text;
        }
        else {
            # Any other token: the last element holds the original source text.
            print $token->[-1];
        }
    }

    __DATA__
    <html>
    <body>
    <p class=MsoNormal style='margin-top:1.0pt;margin-right:0cm;margin-bottom:1.0pt;
    margin-left:34.0pt;text-indent:-19.8pt'><span lang=NL-BE style='font-size:10.0pt;
    font-family:Symbol;color:black;mso-ansi-language:NL-BE'>·</span><span class=GramE><span style='font-size:7.0pt;color:black'> </span><span style='font-size:10.0pt;font-family:Arial;color:black'>Kit</span></span><span style='font-size:10.0pt;font-family:Arial;color:black'> </span><span class=SpellE><i><span style='font-size:10.0pt;font-family:Arial'>Strongyloides</span></i></span><i><span style='font-size:10.0pt;font-family:Arial'> <span class=SpellE>ratti</span></span></i><span style='font-size:10.0pt;font-family:Arial'> (nr. 9450) van <span class=SpellE>Bordier</span> Affinity Products. </span><span lang=NL-BE style='font-size:10.0pt;font-family:
    Arial;mso-ansi-language:NL-BE'>Zie bijsluiter in bijlage: CLKB_B_0306.
    Te bewaren bij 2 – 8 °C tot vervaldatum.</span><span lang=NL-BE style='mso-ansi-language:
    NL-BE'><o:p></o:p></span></p>
    </body>
    </html>

Output:

    <html>
    <body>
    <p class=MsoNormal style='margin-top:1.0pt;margin-right:0cm;margin-bottom:1.0pt;
    margin-left:34.0pt;text-indent:-19.8pt'><span lang=NL-BE style='font-size:10.0pt;
    font-family:Symbol;color:black;mso-ansi-language:NL-BE'>·</span><span class=GramE><span style='font-size:7.0pt;color:black'>&nbsp;</span><span style='font-size:10.0pt;font-family:Arial;color:black'>Kit</span></span><span style='font-size:10.0pt;font-family:Arial;color:black'>&nbsp;</span><span class=SpellE><i><span style='font-size:10.0pt;font-family:Arial'>Strongyloides</span></i></span><i><span style='font-size:10.0pt;font-family:Arial'>&nbsp;<span class=SpellE>ratti</span></span></i><span style='font-size:10.0pt;font-family:Arial'>&nbsp;(nr.&nbsp;9450)&nbsp;van&nbsp;<span class=SpellE>Bordier</span>&nbsp;Affinity&nbsp;Products.&nbsp;</span><span lang=NL-BE style='font-size:10.0pt;font-family:
    Arial;mso-ansi-language:NL-BE'>Zie&nbsp;bijsluiter&nbsp;in&nbsp;bijlage:&nbsp;CLKB_B_0306.
    Te&nbsp;bewaren&nbsp;bij&nbsp;2&nbsp;–&nbsp;8&nbsp;°C&nbsp;tot&nbsp;vervaldatum.</span><span lang=NL-BE style='mso-ansi-language:
    NL-BE'><o:p></o:p></span></p>
    </body>
    </html>

(Editor: 李大同)
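On the "what is wrong with my code" part: as far as the Mojo::DOM documentation describes it, find("*") visits every element and content returns an element's inner markup, child tags included, so the substitution also sees the whitespace inside nested tags. Splitting the input into tag and text pieces first avoids that. The following is only a minimal sketch of that idea in core Perl, not the answer's method and not a substitute for a real parser: the helper name nbsp_text_only is hypothetical, and it assumes no '>' ever appears inside attribute values, comments, or script/style blocks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): replace literal
# spaces with &nbsp; in text chunks only, leaving tag chunks untouched.
# Assumption: every tag matches <[^>]*>, i.e. no '>' inside attribute values.
sub nbsp_text_only {
    my ($html) = @_;

    # The capturing group makes split() return the tags as their own chunks.
    my @chunks = split /(<[^>]*>)/, $html;

    for my $chunk (@chunks) {
        next if $chunk =~ /^</;     # tag chunk: keep verbatim
        $chunk =~ s/ /&nbsp;/g;     # text chunk: replace each space
    }

    return join '', @chunks;
}

print nbsp_text_only(q{<p class=a style='x: 1'>Kit <i>S. ratti</i> (nr. 9450)</p>}), "\n";
# prints: <p class=a style='x: 1'>Kit&nbsp;<i>S.&nbsp;ratti</i>&nbsp;(nr.&nbsp;9450)</p>
```

The space inside the style attribute survives because it sits in a tag chunk, which is the same separation that the 'T' token check gives you with HTML::TokeParser. For real-world markup, prefer the parser.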