The difference between FormRequest and FormRequest.from_response
Published: 2020-12-15 07:29:45 | Category: Java | Source: Web (aggregated)
Overview: scrapy.FormRequest sends a POST request to the server; the request parameters must include whatever special fields the target site's form expects, so you usually have to extract them from the page first.
scrapy.FormRequest
import re

import scrapy
from scrapy.spiders import CrawlSpider


class FormrequestSpider(CrawlSpider):
    name = 'github'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']

    def parse(self, response):
        # Extract the hidden fields the login form expects
        authenticity_token = response.xpath("//input[@name='authenticity_token']/@value").extract_first()
        utf8 = response.xpath("//input[@name='utf8']/@value").extract_first()
        commit = response.xpath("//input[@name='commit']/@value").extract_first()
        post_data = dict(
            login="***********",
            password="**********",
            authenticity_token=authenticity_token,
            utf8=utf8,
            commit=commit,
        )
        # Form request
        yield scrapy.FormRequest(
            "https://github.com/session",
            formdata=post_data,
            callback=self.after_login,
        )

    def after_login(self, response):
        # with open("a.html", "w", encoding="utf-8") as f:
        #     f.write(response.body.decode())
        print(re.findall("********", response.body.decode()))

scrapy.FormRequest.from_response
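Under the hood, FormRequest URL-encodes the formdata dict into the request body, the same way a browser submits an application/x-www-form-urlencoded form. A minimal stdlib sketch of that encoding (scrapy itself is not required; the field values here are placeholders, not real credentials):

```python
from urllib.parse import urlencode

# Placeholder values standing in for the fields scraped from the login page
post_data = {
    "login": "user",
    "password": "secret",
    "authenticity_token": "abc123",
    "utf8": "✓",
    "commit": "Sign in",
}

# This is essentially the POST body FormRequest builds from formdata;
# non-ASCII values like "✓" are percent-encoded as UTF-8 bytes.
body = urlencode(post_data)
print(body)
```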
import scrapy
from scrapy.spiders import CrawlSpider


class GithubSpider(CrawlSpider):
    name = 'github2'
    allowed_domains = ['github.com']
    start_urls = ['https://github.com/login']

    def parse(self, response):
        yield scrapy.FormRequest.from_response(
            response,  # automatically locates the <form> in the response
            # formdata only needs the login name and password; the dict keys
            # are the name attributes of the corresponding <input> tags
            formdata={"login": "***********", "password": "**********"},
            callback=self.after_login,
        )

    def after_login(self, response):
        print(response.text)

Further reading: https://www.cnblogs.com/ywjfx/p/11089248.html
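What from_response automates can be sketched with the stdlib html.parser: collect every <input> field in the form (including the hidden ones), then overlay the caller-supplied formdata. This is an illustrative sketch, not Scrapy's actual implementation; the field names mirror GitHub's login form but the HTML and values are made up:

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collects name/value pairs from <input> tags, as a stand-in for
    the form-parsing step that FormRequest.from_response performs."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            a = dict(attrs)
            if "name" in a:
                self.fields[a["name"]] = a.get("value", "")


# Hypothetical login page; hidden fields carry the CSRF token
html = """
<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123">
  <input type="text" name="login">
  <input type="password" name="password">
</form>
"""

parser = FormFieldCollector()
parser.feed(html)

# Overlay the user-supplied formdata on the scraped fields,
# which is what from_response does with its formdata argument
form = {**parser.fields, "login": "user", "password": "secret"}
print(form)
```

The point of the overlay is that the caller never has to know about the hidden authenticity_token field: it is picked up from the page automatically, while login and password come from formdata.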