加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 编程开发 > Python > 正文

生产者消费者模型爬取某金融网站数据!Python无所不爬!

发布时间:2020-12-17 01:20:48 所属栏目:Python 来源:网络整理
导读:div style="margin:0px;padding:0px;border:0px;font-size:14px;line-height:inherit;font-family:helvetica,Arial,'Hiragino Sans GB','Microsoft YaHei',simsun;vertical-align:baseline;color:rgb(89,89,89);"div class="wrap1 sclearfix" style="margin:

<div style="margin:0px;padding:0px;border:0px;font-size:14px;line-height:inherit;font-family:helvetica,Arial,'Hiragino Sans GB','Microsoft YaHei',simsun;vertical-align:baseline;color:rgb(89,89,89);"><div class="wrap1 sclearfix" style="margin:5px auto;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-size:0px;line-height:inherit;font-family:inherit;vertical-align:baseline;width:1000px;background-color:rgb(255,255,255);"><div class="center" style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;"><div class="container-main" style="margin:0px auto 0px 0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;float:left;width:700px;"><div class="article-detail" style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;"><div class="article-content" style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;font-size:18px;line-height:1.5;vertical-align:baseline;color:rgb(93,93,93);"><div style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:1.5;vertical-align:baseline;"><p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">那么这个模型和爬虫有什么关系呢?其实,爬虫可以认为是一个生产者,它不断从网站爬取数据,爬取到的数据就是食物;而所得数据需要消费者进行数据清洗,把有用的数据吸收掉,把无用的数据丢弃。

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">以上便是对生产者消费者模型的简单介绍了,下面针对本次爬取任务予以详细说明。

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">分析站点

<pre style="margin-bottom:0px;padding-right:0px;padding-left:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;"><pre class="source-code" style="margin-bottom:0px;padding:15px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;background:rgb(243,244,245);"><code style="margin:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;"><span style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;color:rgb(153,115);">http://www.cfachina<span style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;color:rgb(153,153,153);">.org/cfainfo/organbaseinfoServlet?all=personinfo<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">从网址及网页内容可以提取出以下信息:

<ol class="list-paddingleft-2" style="margin-top:1em;margin-bottom:0px;padding-left:30px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;list-style-position:outside;"><li style="margin:0px;padding:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;clear:both;"><p style="margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">网址

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">获取机构名称

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">获取机构信息对应的网页数量

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">每个机构的数据量是不等的,幸好每个页面都包含了当前页面数及总页面数。使用以下代码即可获取页码数。

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">获取当前页面从业人员信息

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">确定爬取方案

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">一般的想法当然是逐页爬取主页信息,然后获取每页所有机构对应的网页链接,进而继续爬取每个机构信息。

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">但是由于该网站的机构信息网址具有明显的规律,我们根据每个机构的编号便可直接得到每个机构每个信息页面的网址。所以具体爬取方案如下:

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">main

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">主函数用于创建和启动生产者线程和消费者线程,同时为生产者线程提供机构编号队列。

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">源码

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">

<p style="margin-top:1em;margin-bottom:0px;border:0px;font-style:inherit;font-variant:inherit;font-weight:inherit;line-height:inherit;font-family:inherit;vertical-align:baseline;">源码图如果看不清,请保存到本地观看!源码群:125240963

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读