How to Run Scrapy from a Script in Python
Published: 2020-12-16 19:58:44 | Category: Python
This article explains how to run Scrapy from a script in Python. It is shared here for your reference; the details are as follows.

Each crawl is executed in a separate child process because Scrapy's Twisted reactor cannot be restarted once it has stopped; spawning a fresh process per crawl works around that limitation. The code is as follows:

#!/usr/bin/python
# Note: this script targets legacy Scrapy (0.x) and Python 2; scrapy.conf,
# scrapy.xlib.pydispatch and the scrapy.project module were removed in
# later releases.
import os
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'project.settings')  # must be set before the scrapy imports

from scrapy import log, signals, project
from scrapy.xlib.pydispatch import dispatcher
from scrapy.conf import settings
from scrapy.crawler import CrawlerProcess
from multiprocessing import Process, Queue

class CrawlerScript():

    def __init__(self):
        self.crawler = CrawlerProcess(settings)
        if not hasattr(project, 'crawler'):
            self.crawler.install()
        self.crawler.configure()
        self.items = []
        # Collect every item the spiders emit via the item_passed signal.
        dispatcher.connect(self._item_passed, signals.item_passed)

    def _item_passed(self, item):
        self.items.append(item)

    def _crawl(self, queue, spider_name):
        # Runs inside the child process: crawl, then hand the items back.
        spider = self.crawler.spiders.create(spider_name)
        if spider:
            self.crawler.queue.append_spider(spider)
        self.crawler.start()
        self.crawler.stop()
        queue.put(self.items)

    def crawl(self, spider):
        # Launch each crawl in its own process so the reactor starts fresh.
        queue = Queue()
        p = Process(target=self._crawl, args=(queue, spider))
        p.start()
        p.join()
        return queue.get(True)

# Usage: runs spider1 once, then spider2 three times.
if __name__ == "__main__":
    log.start()
    items = list()
    crawler = CrawlerScript()
    items.append(crawler.crawl('spider1'))
    for i in range(3):
        items.append(crawler.crawl('spider2'))
    print items

Hopefully what is described here proves helpful to your Python programming.
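The script above depends on APIs that were removed in later Scrapy releases (scrapy.conf, scrapy.xlib.pydispatch, the project module, the item_passed signal). For comparison, here is a minimal sketch of the same idea against modern Scrapy (1.x or later) on Python 3. The spider names 'spider1' and 'spider2' are placeholders assumed to be registered in your project, and get_project_settings() assumes the script runs inside a Scrapy project (or with SCRAPY_SETTINGS_MODULE set):

from scrapy import signals
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

items = []

def collect_item(item, response, spider):
    # item_scraped handlers receive the item, the response and the spider.
    items.append(item)

process = CrawlerProcess(get_project_settings())
# Queue every crawl before calling start(): the Twisted reactor cannot be
# restarted, so repeated start/stop cycles as in the legacy script will fail.
for name in ['spider1', 'spider2', 'spider2', 'spider2']:
    crawler = process.create_crawler(name)  # resolves the spider by name
    crawler.signals.connect(collect_item, signal=signals.item_scraped)
    process.crawl(crawler)
process.start()  # blocks until all queued crawls have finished
print(items)

One behavioural difference to keep in mind: the queued crawls run inside a single reactor and may overlap, whereas the multiprocessing version above runs them strictly one after another.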