[Python for Beginners] Crawling Qiushibaike with the Scrapy Framework
Published: 2020-12-20 10:17:21 Category: Python Source: compiled from the web
Step 1: Create the project

    scrapy startproject [name]

For example:

    scrapy startproject choushibaike

Step 2: Change into the project directory and create the spider

    scrapy genspider baike lovehhy.net

Step 3: Configure baike.py

    # -*- coding: utf-8 -*-

Step 4: Configure items.py

    import scrapy


    class ChoushibaikeItem(scrapy.Item):
        # The field definitions are missing from the source.
        pass

Step 5: Configure pipelines.py

    import pymongo


    class MongoPipeline(object):
        def __init__(self, mongo_uri, mongo_db):
            self.mongo_uri = mongo_uri
            self.mongo_db = mongo_db

        @classmethod
        def from_crawler(cls, crawler):
            # Read the connection settings defined in settings.py.
            return cls(
                mongo_uri=crawler.settings.get('MONGO_URI'),
                mongo_db=crawler.settings.get('MONGO_DB')
            )

        def open_spider(self, spider):
            # Open the MongoDB connection when the spider starts.
            self.client = pymongo.MongoClient(self.mongo_uri)
            self.db = self.client[self.mongo_db]

        def process_item(self, item, spider):
            # Store each item in a collection named after its item class.
            name = item.__class__.__name__
            # insert() was removed in PyMongo 4; insert_one() replaces it.
            self.db[name].insert_one(dict(item))
            return item

        def close_spider(self, spider):
            # Close the connection when the spider finishes.
            self.client.close()

Step 6: Configure settings.py

    # -*- coding: utf-8 -*-

    # Scrapy settings for choushibaike project
    #
    # For simplicity, this file contains only settings considered important or
    # commonly used. You can find more settings consulting the documentation:
    #
    #     https://docs.scrapy.org/en/latest/topics/settings.html
    #     https://docs.scrapy.org/en/latest/topics/downloader-middleware.html
    #     https://docs.scrapy.org/en/latest/topics/spider-middleware.html

    BOT_NAME = 'choushibaike'

    SPIDER_MODULES = ['choushibaike.spiders']
    NEWSPIDER_MODULE = 'choushibaike.spiders'

    # Crawl responsibly by identifying yourself (and your website) on the user-agent
    USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36'

    # Obey robots.txt rules
    ROBOTSTXT_OBEY = False

    ITEM_PIPELINES = {
        # 'choushibaike.pipelines.ChoushibaikePipeline': 300,
        'choushibaike.pipelines.MongoPipeline': 400,
    }

    # The password and host are obscured in the source; substitute your own.
    MONGO_URI = 'mongodb://admin:[email?protected]/'
    MONGO_DB = 'choushibaike'

Step 7: Run the project

    scrapy crawl baike

(Editor: 李大同)
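The MongoPipeline class above relies on Scrapy calling its hooks in a fixed order: from_crawler, then open_spider, then process_item once per item, and finally close_spider. The following is a minimal sketch of that order that runs without Scrapy or a MongoDB server. Every name in it (InMemoryPipeline, JokeItem, STORE_NAME, run_pipeline) is a hypothetical stand-in rather than part of the original project, and a plain dict takes the place of the database.

```python
from types import SimpleNamespace


class JokeItem(dict):
    """Hypothetical stand-in for a scrapy.Item; behaves like a dict."""


class InMemoryPipeline:
    """Same hook layout as MongoPipeline, but stores items in a dict of lists."""

    def __init__(self, store_name):
        self.store_name = store_name
        self.db = None
        self.closed = False

    @classmethod
    def from_crawler(cls, crawler):
        # Called first; configuration values are read from crawler.settings.
        return cls(store_name=crawler.settings.get("STORE_NAME"))

    def open_spider(self, spider):
        # Called once when the spider starts; MongoPipeline connects here.
        self.db = {}

    def process_item(self, item, spider):
        # Called for every item; group items by their class name.
        name = item.__class__.__name__
        self.db.setdefault(name, []).append(dict(item))
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes; MongoPipeline disconnects here.
        self.closed = True


def run_pipeline(items):
    """Drive the hooks by hand in the order Scrapy would call them."""
    # Assumption: a plain dict is enough to imitate crawler.settings.get().
    crawler = SimpleNamespace(settings={"STORE_NAME": "choushibaike"})
    pipeline = InMemoryPipeline.from_crawler(crawler)
    pipeline.open_spider(spider=None)
    for item in items:
        pipeline.process_item(item, spider=None)
    pipeline.close_spider(spider=None)
    return pipeline


if __name__ == "__main__":
    result = run_pipeline([JokeItem(content="a"), JokeItem(content="b")])
    print(result.db)
```

Because process_item returns the item, any pipeline with a higher number in ITEM_PIPELINES would still receive it afterwards.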