
Fetching Web Page Information with Perl

Published: 2020-12-16 00:04:29 | Category: Big Data | Source: compiled from the web

demo:

#!/usr/bin/perl -w
# Perl pragma to restrict unsafe constructs
use strict;
# load the LWP::UserAgent module (HTTP client)
use LWP::UserAgent;

# main function
sub main {
    # Read the URL from the parameters.
    # Within a subroutine, the array @_ contains the parameters passed to it,
    # and it is the default array for shift and pop.
    my ($url) = @_;
    # fall back to the demo URL when no argument is given
    $url = 'http://www.taobao.com' unless defined $url;
    die "no url param!\n" unless $url;

    # create LWP::UserAgent object
    my $ua = LWP::UserAgent->new;
    # set the request timeout to 20 seconds
    $ua->timeout(20);
    # set User-Agent header
    $ua->agent("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1; .NET CLR 2.0.50727)");
    # send a GET request for the URL and store the response in $resp
    my $resp = $ua->get($url);

    # check response
    if ($resp->is_success) {
        # get the response content (HTML source), decoded according to its charset
        my $content = $resp->decoded_content;
        # use a regex to extract the page title from $content
        if ( $content =~ m{<title>(.*?)</title>}si ) {
            # (.*?) matches the title text non-greedily, so the match stops at the
            # first </title>. The parentheses create a capture group whose contents
            # are available after the match as $1 (a second group would be $2, and so on).
            my $head = $1;
            print "find page title : $head\n";
        } else {
            print "no page title for url : $url\n";
        }
    } else {
        # print the HTTP status line and exit
        die $resp->status_line;
    }
}

# Pass the command-line arguments to main.
# The array @ARGV contains the command-line arguments intended for the script.

main(@ARGV);
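
The title-matching step can be tried without any network access. The following is a minimal sketch, assuming the HTML is already in a string; `extract_title` and the sample markup are hypothetical names made up for illustration and are not part of the demo above. It shows why a non-greedy capture group is the safer choice when a page contains more than one `<title>` element.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# extract_title is a hypothetical helper, not part of the original demo.
# It returns the text of the first <title> element, or undef if there is none.
sub extract_title {
    my ($html) = @_;
    # (.*?) is non-greedy, so the match ends at the first </title>;
    # /s lets . match newlines and /i makes the tag names case-insensitive.
    if ( $html =~ m{<title>(.*?)</title>}si ) {
        my $title = $1;
        # trim leading and trailing whitespace
        $title =~ s/^\s+|\s+$//g;
        return $title;
    }
    return;
}

# Sample markup (made up): a second <title> inside an inline SVG.
my $html = "<html><head><title> Demo Page </title></head>"
         . "<body><svg><title>icon</title></svg></body></html>";

my $title = extract_title($html);
print defined $title ? "find page title : $title\n" : "no page title\n";
```

With a greedy `(.*)` the capture would run on to the last `</title>` in the document and swallow the markup in between. For anything beyond a quick check, a real HTML parser is likely a more robust choice than a regex.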

(Editor: 李大同)
