
Perl HTML::TreeBuilder HTML parsing: can't seem to match a DIV, getting the error "Use of uninitialized value in pattern match"

Published: 2020-12-16 06:16:21 | Category: Big Data | Source: compiled from the web
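I'm new to using Perl's HTML::TreeBuilder module for HTML parsing and can't figure out what the problem is. I've spent hours trying to get this to work and gone through several tutorials, but I still get this warning: "Use of uninitialized value in pattern match", which refers to this line in my code: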

sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
        );

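This warning is printed to the terminal many times. I have rechecked everything repeatedly, and the parser is definitely getting input, because $downloadedpage is a complete HTML file that contains the string I give below. Any suggestions are much appreciated.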

Sample string, contained in the $downloadedpage variable:

<div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos,quesadillas,enchiladas and barbacoa are consistently explored for options by some of the world&#8217;s foremost gourmet chefs. A celebration of spices and unique culinary trends,Mexican food is now dominating world cuisines.</p>
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>

My code:

my $tree = HTML::TreeBuilder->new();
    $tree->parse($downloadedpage);
    $tree->eof();

    #the article is in the div with class "snap_preview"
    @article = $tree->look_down(
    sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
    );

Answer

Using the exact code and sample you gave,

use warnings;
use strict;
use HTML::TreeBuilder;
my $downloadedpage=<<EOF;
<div class='snap_preview'><p><a href="http://recipe4all.com/dishes/mexican/"><img src="http://www.dishbase.com/recipe_images/large/chicken-enchiladas-12005010871.jpg" width="160" height="115" align="left" border="0" alt="Mexican dishes recipes" style="border:none;"></a><a href="http://recipe4all.com/dishes/mexican/"><b>Mexican dishes recipes</b></a> <i></i><br />
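Mexican cuisine is popular the world over for its intense flavor and colorful presentation. Traditional Mexican recipes such as tacos,quesadillas,enchiladas and barbacoa are consistently explored for options by some of the world&#8217;s foremost gourmet chefs. A celebration of spices and unique culinary trends,Mexican food is now dominating world cuisines.</p>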
<div style="margin-top: 1em" class="possibly-related"><hr /><p><strong>Possibly related posts: (automatically generated)</strong></p><ul><li><a rel='related' href='http://vireja59.wordpress.com/2010/02/13/all-best-italian-dishes-recipes/' style='font-weight:bold'>All best Italian dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/05/24/liver-dishes-recipes/' style='font-weight:bold'>Liver dishes recipes</a></li><li><a rel='related' href='http://vireja59.wordpress.com/2010/04/24/parsley-in-cooking/' style='font-weight:bold'>Parsley in cooking</a></li></ul></div>
EOF

my $tree = HTML::TreeBuilder->new();
    $tree->parse($downloadedpage);
    $tree->eof();

    #the article is in the div with class "snap_preview"
    my @article = $tree->look_down(
    sub{ $_[0]-> tag() eq 'div' and ($_[0]->attr('class') =~ /snap_preview/)}
    );

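I get no errors at all. My first guess is that the full HTML contains some <div>s that have no class attribute. For those elements attr('class') returns undef, and matching undef against a regex is what produces the warning.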
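To illustrate that guess without needing the full page, here is a minimal sketch in core Perl only. The `@class_values` list is a hypothetical stand-in for what `attr('class')` might return for three successive divs, with `undef` representing a div that has no class attribute. It counts the warnings raised by an unguarded match and by a match guarded with `defined`.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the values attr('class') could return;
# undef represents a <div> that has no class attribute.
my @class_values = ('snap_preview', undef, 'possibly-related');

# Collect warnings in an array instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, $_[0] };

# Unguarded match: the undef entry is matched against the regex and warns.
my @unguarded = grep { $_ =~ /snap_preview/ } @class_values;
my $unguarded_warnings = scalar @warnings;

# Guarded match: undef entries are skipped before the regex is applied.
@warnings = ();
my @guarded = grep { defined $_ && $_ =~ /snap_preview/ } @class_values;
my $guarded_warnings = scalar @warnings;

print "unguarded: $unguarded_warnings warning(s), ",
      "guarded: $guarded_warnings warning(s)\n";
```

Both variants select the same single value; only the unguarded one warns. On a real page with many class-less divs, the warning repeats once per such div, which would explain why it appears many times in the terminal.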

Maybe you need to write

sub{
     $_[0]-> tag() eq 'div' and 
     $_[0]->attr('class') and 
     ($_[0]->attr('class') =~ /snap_preview/)
}

there instead?
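As an alternative to a hand-written guard, and not part of the original answer, `look_down` also accepts attribute criteria directly. As I understand the HTML::Element documentation, a regex criterion only matches when the attribute is defined, so elements without a class are skipped without a warning. The HTML below is a hypothetical, shortened sample, not the asker's page.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample: one div without a class, one with the target class.
my $html = <<'EOF';
<html><body>
<div>no class here</div>
<div class="snap_preview"><p>Article text</p></div>
</body></html>
EOF

# Parse the whole string in one step.
my $tree = HTML::TreeBuilder->new_from_content($html);

# Attribute criteria: the tag must be 'div' and the class attribute
# must match the regex. No code ref is needed.
my @article = $tree->look_down(
    _tag  => 'div',
    class => qr/snap_preview/,
);

print scalar(@article), " matching div(s)\n";
print $article[0]->as_text, "\n" if @article;

# Release the tree explicitly once the elements are no longer needed.
$tree->delete;
```

In list context `look_down` returns every matching element; in scalar context it returns only the first, which is convenient when exactly one article div is expected.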
