
Python – Scrapy randomly crashing with Celery in Django

I am running a Scrapy project inside Django on an Ubuntu server.
The problem is that Scrapy crashes at random, even when only one spider is running.

Below is a snippet of the traceback. Not being an expert, I googled

_SIGCHLDWaker Scrapy

but could not make sense of the solutions I found for the snippet below:

--- <exception caught here> ---
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/internet/posixbase.py",line 602,in _doReadOrWrite
    why = selectable.doWrite()
exceptions.AttributeError: '_SIGCHLDWaker' object has no attribute 'doWrite'

I am not familiar with Twisted; I tried to understand it, but it seems quite unapproachable to me.

Here is the full traceback:

[2015-10-10 14:17:13,652: INFO/Worker-4] Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, RandomUserAgentMiddleware, ProxyMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
[2015-10-10 14:17:13,655: INFO/Worker-4] Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
[2015-10-10 14:17:13,656: INFO/Worker-4] Enabled item pipelines: MadePipeline
[2015-10-10 14:17:13,656: INFO/Worker-4] Spider opened
[2015-10-10 14:17:13,657: INFO/Worker-4] Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
Unhandled Error
Traceback (most recent call last):
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/log.py",line 101,in callWithLogger
    return callWithContext({"system": lp},func,*args,**kw)
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/log.py",line 84,in callWithContext
    return context.call({ILogContext: newCtx},**kw)
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/python/context.py",line 118,in callWithContext
    return self.currentContext().callWithContext(ctx,line 81,in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/home/b2b/virtualenvs/venv/local/lib/python2.7/site-packages/twisted/internet/posixbase.py",in _doReadOrWrite
    why = selectable.doWrite()
exceptions.AttributeError: '_SIGCHLDWaker' object has no attribute 'doWrite'

Here is how I implemented the task, based on the Scrapy documentation:

# Celery / Django imports (the app-specific models and spiders referenced below
# live elsewhere in the project and their import paths are not shown here).
from celery import shared_task
from celery.result import AsyncResult
from django.utils import timezone

from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings
from twisted.internet import reactor


@shared_task
def run_spider(**kwargs):
    task_id = run_spider.request.id
    status = AsyncResult(str(task_id)).status
    source = kwargs.get("source")

    pro, created = Project.objects.get_or_create(name="b2b")
    query, _ = SearchTerm.objects.get_or_create(term=kwargs['query'])
    src, _ = Source.objects.get_or_create(term=query, engine=kwargs['source'])

    b, _ = Bot.objects.get_or_create(project=pro, query=src, spiderid=str(task_id),
                                     status=status, start_time=timezone.now())

    # CrawlerRunner does not start the reactor itself; it is run manually below.
    process = CrawlerRunner(get_project_settings())

    if source == "amazon":
        d = process.crawl(ComberSpider, query=kwargs['query'], job_id=task_id)
    else:
        d = process.crawl(MadeSpider, job_id=task_id)
    d.addBoth(lambda _: reactor.stop())
    reactor.run()
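
For context, a task defined this way gets queued through Celery's normal delay/apply_async API. A minimal, hypothetical call site (only the kwarg names come from run_spider above; the import path and query value are made up for illustration):

# Hypothetical trigger from the Django side.
from myproject.tasks import run_spider  # import path is an assumption

result = run_spider.delay(query="solar panels", source="amazon")
print(result.id, result.status)  # AsyncResult handle returned by Celery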

I also tried something along the lines of this tutorial, but it led to a different problem that I could not trace.

For completeness, here is a snippet of my spider:

# Imports assumed for this snippet (Scrapy ~1.0 / Python 2.7 era paths).
from scrapy import signals
from scrapy.linkextractors import LinkExtractor
from scrapy.signalmanager import SignalManager
from scrapy.spiders import CrawlSpider, Rule
from scrapy.xlib.pydispatch import dispatcher


class ComberSpider(CrawlSpider):

    name = "amazon"
    allowed_domains = ["amazon.com"]
    rules = (Rule(LinkExtractor(allow=r'corporations/.+/-*50/[0-9]+.html',
                                restrict_xpaths="//a[@class='next']"),
                  callback="parse_items", follow=True),)

    def __init__(self, *args, **kwargs):
        super(ComberSpider, self).__init__(*args, **kwargs)
        self.query = kwargs.get('query')
        self.job_id = kwargs.get('job_id')
        SignalManager(dispatcher.Any).connect(self.closed_handler, signal=signals.spider_closed)
        self.start_urls = (
            "http://www.amazon.com/corporations/%s/------------"
            "--------50/1.html" % self.query.strip().replace(" ", "_").lower(),)

Solution

This is a known Scrapy issue. For details and possible workarounds, see the issue report thread.
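
One workaround that comes up often for this kind of reactor/Celery conflict is to run every crawl in a fresh child process, so the Twisted reactor and its signal wakers are never reused inside a worker. A rough sketch under that assumption, reusing ComberSpider and the task arguments from the question (the task and helper names here are made up, and Celery's prefork pool may require billiard.Process instead of multiprocessing.Process):

# Sketch only: each task forks a child process that owns a brand-new reactor.
from multiprocessing import Process  # the prefork pool may need billiard.Process

from celery import shared_task
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


def _crawl(query, job_id):
    # Runs inside the child process; ComberSpider is the spider from the question.
    process = CrawlerProcess(get_project_settings())
    process.crawl(ComberSpider, query=query, job_id=job_id)
    process.start()  # blocks until the crawl finishes


@shared_task
def run_spider_in_child_process(query, job_id):  # hypothetical task name
    p = Process(target=_crawl, args=(query, job_id))
    p.start()
    p.join()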
