[Perl] Fetching a cookie-authenticated HTTPS page with LWP
Published: 2020-12-15 21:05:35 | Category: Big Data | Source: compiled from the web
I recently wanted to download some documents from my company's website, but the site limits the total download size, so I had no choice but to write a script to fetch them.
Environment 1: Windows XP, Strawberry Perl 5.10 (wiki: http://win32.perl.org/wiki/index.php?title=Strawberry_Perl)
Environment 2: Linux, perl 5.8.8
Packet capture tool: HttpWatch Professional v6.0.14 (IE plug-in)
I also tried Firebug and Live HTTP Headers in Firefox and the Developer Tools in Chrome, but HttpWatch gave the captures I was most satisfied with.
Until now, I logged in by pasting the link I wanted into the browser; the browser then popped up a dialog asking for a username and password.
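A browser password dialog like this usually means the server is asking for HTTP authentication, and the script later in this article uses the Basic scheme. As a minimal sketch of what a script has to send in place of the dialog, the snippet below builds the Authorization header value with the core module MIME::Base64. The helper name basic_auth_value and the credentials are placeholders chosen for illustration.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: build the value of an HTTP Basic "Authorization" header.
sub basic_auth_value {
    my ($login, $password) = @_;
    # The empty second argument keeps encode_base64 from appending "\n",
    # which must not end up inside a header value.
    return 'Basic ' . encode_base64("$login:$password", '');
}

print basic_auth_value('user', 'pass'), "\n";   # prints: Basic dXNlcjpwYXNz
```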
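To script this, I needed to capture the traffic. I had always used Firebug for that, but it only showed GET requests. The last of them returned a page with status 200, whose content was as follows: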
<HTML>
<!-- File: llgettz.html -->
<SCRIPT Language="Javascript1.2">
function getTime()
{
    var llglogin_CurrentClientTime = new Date()
    var llglogin_year = ( llglogin_CurrentClientTime.getFullYear == null ) ? llglogin_CurrentClientTime.getYear() : llglogin_CurrentClientTime.getFullYear()
    var llglogin_month = llglogin_CurrentClientTime.getMonth() + 1
    var llglogin_date = llglogin_CurrentClientTime.getDate()
    var llglogin_hour = llglogin_CurrentClientTime.getHours()
    var llglogin_minute = llglogin_CurrentClientTime.getMinutes()
    var llglogin_second = llglogin_CurrentClientTime.getSeconds()
    document.LoginForm.CurrentClientTime.value = 'D/' + llglogin_year + '/' + llglogin_month + '/' + llglogin_date
    document.LoginForm.CurrentClientTime.value += ':' + llglogin_hour + ':' + llglogin_minute + ':' + llglogin_second
    document.LoginForm.submit()
}
</SCRIPT>
<BODY BGCOLOR="#FFFFFF" BACKGROUND="/img/pattern.gif" onLoad="getTime()">
<FORM NAME="LoginForm" METHOD="POST" ACTION="/livelink/livelink.exe">
<INPUT TYPE="HIDDEN" NAME="func" VALUE="ll.login">
<INPUT TYPE="HIDDEN" NAME="CurrentClientTime" VALUE="">
<INPUT TYPE="HIDDEN" NAME="NextURL" VALUE="/livelink/livelink.exe?Redirect=1">
</FORM>
Your browser should have redirected you to the next Livelink page. Please click on the link below to continue.
<BR>
<BR>
<A HREF="/livelink/livelink.exe?Redirect=1">/livelink/livelink.exe?Redirect=1</A>
</BODY>
</HTML>
<!-- End File: llgettz.html -->
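In fact, the capture shows that the site really does not return a cookie at this point. The cookie only comes back after the POST to the login page. So I still do not understand why the cookie could be obtained on Windows. (I asked on both CU and CSDN and nobody answered, so if anyone knows, I would appreciate an explanation.)
With HttpWatch the POST is visible in the capture, whereas neither Firebug nor Live HTTP Headers caught it.
The POST happens because the browser runs the JavaScript in the returned page, which submits this form: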
<FORM NAME="LoginForm" METHOD="POST" ACTION="/livelink/livelink.exe">
<INPUT TYPE="HIDDEN" NAME="func" VALUE="ll.login">
<INPUT TYPE="HIDDEN" NAME="CurrentClientTime" VALUE="">
<INPUT TYPE="HIDDEN" NAME="NextURL" VALUE="/livelink/livelink.exe?Redirect=1">
</FORM>
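The form and the getTime() function can be mirrored in Perl. The following is only a sketch in core Perl: client_time is a hypothetical helper that reproduces the unpadded 'D/year/month/day:hour:minute:second' string the JavaScript writes into CurrentClientTime, and %form lists the three hidden fields. Whether the server actually requires CurrentClientTime is not something this article establishes.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring getTime() from llgettz.html.
# Takes an optional epoch time (defaults to now) and returns the
# client-time string without zero padding, as the JavaScript does.
sub client_time {
    my $epoch = @_ ? shift : time;
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime $epoch;
    return sprintf 'D/%d/%d/%d:%d:%d:%d',
        $year + 1900, $mon + 1, $mday, $hour, $min, $sec;
}

# The hidden fields of LoginForm, with CurrentClientTime filled in.
my %form = (
    func              => 'll.login',
    CurrentClientTime => client_time(),
    NextURL           => '/livelink/livelink.exe?Redirect=1',
);

# Show what would be submitted.
print "$_=$form{$_}\n" for sort keys %form;
```

With LWP::UserAgent, a hash like this can be passed as $ua->post($url, \%form), which URL-encodes the fields into the request body.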
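From now on, whenever I need to log in to a site before retrieving information from it, I can simply follow the pattern of the script below.
One more thing to note: if an HTTPS request returns a 500 error, add ssl_opts => { verify_hostname => 0 } when creating the LWP::UserAgent object. This turns off host-name checking of the server certificate.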
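As a sketch of that setting (it needs libwww-perl 6 or later, where ssl_opts was introduced), the constructor call might look like the following. It is an assumption here that the target is a trusted internal host, since the server's identity is no longer verified.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Create a user agent that skips host-name verification for HTTPS.
my $ua = LWP::UserAgent->new(
    ssl_opts => { verify_hostname => 0 },
);

# Read the option back to confirm what the agent will use.
print "verify_hostname = ", $ua->ssl_opts('verify_hostname'), "\n";
```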
#!/usr/bin/perl
use strict;
use warnings;

package Livelink;

use LWP::UserAgent;
use HTTP::Request;
use HTTP::Cookies;
use MIME::Base64 qw(encode_base64);
use URI;

my $Basic_Url = "https://wcdma-ll.app.alcatel-lucent.com/livelink/livelink.exe";

sub new {
    my $invocant = shift;
    my $class    = ref $invocant || $invocant;
    my $self     = {
        login    => shift,
        password => shift,
    };
    bless $self, $class;
    $self->init();
    return $self;
}

sub init {
    my $self = shift;
    $self->{browser} = LWP::UserAgent->new(ssl_opts => { verify_hostname => 0 });
    # Set the HTTP proxy
    $self->{browser}->proxy('http', 'http://135.251.33.31:80');
    # Base64-encode the username and password
    die "Error: Login NONE.\n"    if !$self->{login};
    die "Error: Password NONE.\n" if !$self->{password};
    # The empty second argument stops encode_base64 from appending a newline,
    # which must not end up inside a header value
    $self->{encode_login} = encode_base64($self->{login} . ":" . $self->{password}, "");
    # Imitate a browser's headers. The site I access is a bit unusual: the
    # request carries an Authorization header for authentication.
    # Logging in through the credentials method is another option.
    my @headers = (
        'User-Agent'  => 'Mozilla/5.0',
        Authorization => "Basic $self->{encode_login}",
    );
    # default_header is called on the LWP::UserAgent object, so it is global:
    # every later request carries the headers in @headers.
    # HTTP::Request objects have a header method that is used the same way,
    # but its headers only apply to that one request.
    $self->{browser}->default_header(@headers);
    $self->{cookie_jar} = HTTP::Cookies->new;
    # Set the cookie jar. As described above, settings made on the
    # LWP::UserAgent object apply to every later request.
    $self->{browser}->cookie_jar($self->{cookie_jar});
    # Build the content of the POST from the JavaScript and form shown earlier
    my $content = {
        NextURL => '/livelink/livelink.exe?Redirect=1',
        func    => 'll.login',
    };
    # $self->{browser}->post is the simpler way to do this.
    # When POSTing through HTTP::Request->new(POST => $Basic_Url) instead,
    # $req->content_type has to be set before $req->content is filled in.
    #my $resp = $self->{browser}->post($Basic_Url, Content => $content);
    my $req = HTTP::Request->new(POST => $Basic_Url);
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('func=ll.login');
    my $resp = $self->{browser}->request($req);
    if ($resp->is_success) {
        print "Login Success.\n";
    }
    else {
        print "Login Error.\n" . $resp->status_line . "\n";
        exit 1;
    }
}

sub get_web_content {
    my $self = shift;
    my ($url) = @_;
    print "\tNow, processing $url\n";
    while (1) {
        # Issue the GET through an HTTP::Request object. The effect is the same
        # as $self->{browser}->get($url); the request object is kept so that
        # $request->headers_as_string can be printed below.
        my $request = HTTP::Request->new('GET', $url);
        # default_header is already set, so there is no need to repeat this
        #$request->header(Authorization => "Basic $self->{encode_login}");
        my $response = $self->{browser}->request($request);
        $self->{cookie_jar}->extract_cookies($response);
        print "=== Cookies:\n", $self->{cookie_jar}->as_string, "\n";
        if ($response->is_success) {
            # Print the status line, request headers and response headers.
            # Once the request succeeds (200 OK), the cookie value is available.
            print "Login Success.\n** " . $response->status_line . " **\n";
            print "=== Request header: \n",  $request->headers_as_string,  "\n";
            print "=== Response header: \n", $response->headers_as_string, "\n";
            return $response->content;
        }
        print "Login Error.\n** " . $response->status_line . " **\n";
        print $response->content . "\n";
        # Anything other than a redirect is fatal
        exit 1 if $response->code != 302;
        # On a redirect, rebuild the URL from the link in the body and try again
        my ($redirect_url) = ($response->content =~ m/<a\s+href\s*=\s*"(.*?)"/i);
        if (!defined $redirect_url) {
            print "No redirect link found.\n";
            exit 1;
        }
        print "\n=== redirect url is $redirect_url\n";
        # Some redirect links contain only the trailing part of the URL;
        # URI->new_abs joins them onto $response->base
        $url = $redirect_url =~ m{^https?://}i
            ? $redirect_url
            : URI->new_abs($redirect_url, $response->base);
    }
}

sub get_file {
    my $self      = shift;
    my $object_id = shift;
    die "Invalid object id!\n" unless $object_id;
    my $ll_url = "$Basic_Url/open/$object_id";
    return $self->get_web_content($ll_url);
}

package main;

my $ll      = Livelink->new('username', 'password');
my $content = $ll->get_file('50967038');
open my $fh, '>', "myfile.html" or die "Can't open file, $!\n";
binmode $fh;    # write the bytes exactly as received
print $fh $content;
close $fh;

(Editor: 李大同)