Using Scrapy to Crawl All of BLAH's EPUB Books
Published: 2020-12-17 17:14:03 | Category: Python | Source: compiled from the web
The spider below walks the blah.me listing pages, pulls the EPUB download link and title out of each book entry, appends a title:URL line to an index file, and streams every EPUB to disk with requests before queueing the remaining listing pages. The code targets Python 2 and an older Scrapy release (note the print statements and the reload(sys) encoding hack).

# -*- coding:utf-8 -*-
__author__ = 'Kiun'

import sys
import requests
import scrapy
from scrapy.selector import Selector

# Python 2 hack so non-ASCII book titles can be concatenated and written
# to files without explicit encoding; unnecessary in Python 3.
reload(sys)
sys.setdefaultencoding('utf8')


class NovelSpider(scrapy.Spider):
    name = "novel"
    allowed_domains = ["blah.me"]
    start_urls = [
        "http://blah.me/"
    ]

    def parse(self, response):
        sel = Selector(response)
        # Every book on a listing page sits in its own ok-book-item div.
        for site in sel.xpath("//div[@class='ok-book-item']"):
            # Relative XPaths (leading ".") keep the query scoped to the
            # current book instead of matching across the whole page.
            author = site.xpath(".//div[@class='ok-book-author']/text()").extract_first()  # extracted but not used further
            link = site.xpath(".//a[@data-book-type='epub']/@href").extract_first()
            title = site.xpath(".//a[@data-book-type='epub']/@data-book-title").extract_first()
            if not link or not title:
                continue
            url = 'http://blah.me' + link

            # Keep a running "title:url" index of every EPUB found.
            with open('/caonima.txt', 'a') as f:
                f.write(title.strip() + ':' + url + '\n')

            # Download the EPUB with requests, streaming it to disk in 1 KB blocks.
            r = requests.get(url, stream=True)
            if not r.ok:
                print 'failed:%s' % (title + ':' + url)
                continue
            with open(title + '.epub', 'wb') as handle:
                for block in r.iter_content(1024):
                    if not block:
                        break
                    handle.write(block)
            print '%s finished' % title

        # Queue the remaining listing pages (?p=2 .. ?p=99) for the same callback.
        for page in range(2, 100):
            yield scrapy.Request("http://blah.me/?p=" + str(page), callback=self.parse)
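For readers on current tool versions, here is a minimal sketch of how the same spider might look on Python 3 with a recent Scrapy release, where scrapy.contrib and the old scrapy.log module no longer exist. It assumes the same markup as the original (the ok-book-item class and data-book-* attributes on blah.me) and uses an illustrative books.txt index path instead of the original one; none of this has been verified against the live site.

# -*- coding: utf-8 -*-
# Python 3 sketch of the same spider for a recent Scrapy release.
import requests
import scrapy


class NovelSpider(scrapy.Spider):
    name = "novel"
    allowed_domains = ["blah.me"]
    # Generate listing pages 1..99 up front instead of yielding them
    # from inside parse() as the original does.
    start_urls = ["http://blah.me/"] + [
        "http://blah.me/?p=%d" % p for p in range(2, 100)
    ]

    def parse(self, response):
        for site in response.xpath("//div[@class='ok-book-item']"):
            link = site.xpath(".//a[@data-book-type='epub']/@href").get()
            title = site.xpath(".//a[@data-book-type='epub']/@data-book-title").get()
            if not link or not title:
                continue
            url = response.urljoin(link)

            # Append a title:url line to the index file (illustrative path).
            with open('books.txt', 'a', encoding='utf-8') as f:
                f.write('%s:%s\n' % (title.strip(), url))

            # Stream the EPUB to disk.
            r = requests.get(url, stream=True)
            if not r.ok:
                self.logger.warning('failed: %s (%s)', title, url)
                continue
            with open(title + '.epub', 'wb') as handle:
                for block in r.iter_content(1024):
                    handle.write(block)
            self.logger.info('%s finished', title)

With Scrapy installed, a standalone spider like this can be run without a project scaffold via scrapy runspider novel_spider.py (the filename is only an example). Scrapy's built-in FilesPipeline would be the more idiomatic way to fetch the EPUBs asynchronously; the sketch keeps the original requests-based download so the two versions stay directly comparable.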