Passing a custom parameter to a Scrapy request

Published: 2020-12-20 13:39:41 | Category: Python | Source: compiled from the web


I want to set a custom parameter on my request so that I can retrieve it when I handle the response in parse_item. This is my code:

from scrapy import Request

def start_requests(self):
    yield Request("site_url", meta={'test_meta_key': 'test_meta_value'})

def parse_item(self, response):
    print(response.meta)

parse_item is called according to the following rules:

self.rules = (
    Rule(SgmlLinkExtractor(deny=tuple(self.deny_keywords),
                           allow=tuple(self.client_keywords)),
         callback='parse_item'),
    Rule(SgmlLinkExtractor(deny=tuple(self.deny_keywords),
                           allow=('',))),
)

According to the Scrapy docs:

the Response.meta attribute is propagated along redirects and retries, so you will get the original Request.meta sent from your spider.

But I do not see the custom meta in parse_item. Is there a way to fix this? Is meta the right approach?

Solution

When you create a new Request, you need to specify a callback function; otherwise the response is passed by default to CrawlSpider's parse method.

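I ran into a similar problem, and it took me a while to debug it. The Scrapy documentation describes the relevant Request parameters as follows: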

callback (callable) – the function that will be called with the response of this request (once it's downloaded) as its first parameter. For more information see Passing additional data to callback functions below. If a Request doesn't specify a callback, the spider's parse() method will be used. Note that if exceptions are raised during processing, errback is called instead.

method (string) – the HTTP method of this request. Defaults to 'GET'.

meta (dict) – the initial values for the Request.meta attribute. If given, the dict passed in this parameter will be shallow copied.
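The "shallow copied" note matters when meta holds mutable values. The plain-Python sketch below uses dict(meta) as a stand-in for the copy made when a Request is constructed; the "tags" and "depth_hint" keys are hypothetical and appear only for illustration.

```python
# Original dict that would be passed as Request(..., meta=meta).
meta = {"test_meta_key": "test_meta_value", "tags": ["first"]}

# Stand-in for the shallow copy made for Request.meta.
request_meta = dict(meta)

# Adding a top-level key to the copy leaves the original untouched.
request_meta["depth_hint"] = 1

# Nested objects are shared between the copy and the original,
# so mutating the list is visible through both dicts.
request_meta["tags"].append("second")

print("depth_hint" in meta)  # the new key exists only on the copy
print(meta["tags"])          # the nested list is shared
```

If a callback needs its own independent nested data, copy.deepcopy(meta) before building the request is one option.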

(Editor: 李大同)
