
A Web Scraper Based on Regular-Expression Matching

Published: 2020-12-14 06:16:29 | Category: Encyclopedia | Source: compiled from the web

import re

import requests


class Anjuke(object):
    """Fetch an Anjuke listing page and extract the listing text with regular expressions."""

    def __init__(self):
        self.url = "https://beijing.anjuke.com/sale/huairou/o5/"
        self.headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.3 (KHTML,like Gecko) Chrome/19.0.1063.0 Safari/536.3"}
        # Capture everything inside the listing <ul>; re.S lets "." match newlines too.
        self.pattern = re.compile(r'<ul id="houselist-mod-new" class="houselist-mod houselist-mod-new">(.*?)</ul>', re.S)
        # Match HTML tags, character entities (such as &nbsp;), and whitespace so they can be stripped.
        self.second_pattern = re.compile(r'<(.*?)>|&(.*?);|\s')

    def send_request(self):
        # Download the page and decode the response body (UTF-8 by default).
        response = requests.get(self.url, headers=self.headers)
        data = response.content.decode()
        print(data)
        return data

    def save_data(self, result_data):
        # Append each cleaned block to the output file, separated by a blank line.
        # The explicit encoding keeps Chinese text from failing on non-UTF-8 platforms.
        with open('anjuke.text', 'a', encoding='utf-8') as f:
            for data in result_data:
                second_content = self.second_pattern.sub('', data) + '\n\n'
                f.write(second_content)

    def analysis_data(self, data):
        # Return the inner HTML of every matching listing <ul>.
        result_list = self.pattern.findall(data)
        return result_list

    def run(self):
        data = self.send_request()
        result_list = self.analysis_data(data)
        print(result_list)
        self.save_data(result_list)


if __name__ == '__main__':
    Anjuke().run()
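
The two patterns can be tried offline, without sending a request. The following is a minimal sketch under stated assumptions: the names `LIST_PATTERN`, `CLEANUP_PATTERN`, `SAMPLE_HTML`, and `extract_text` are hypothetical helpers introduced here, and the sample markup is invented for illustration rather than copied from the real page.

```python
import re

# The same two patterns used by the scraper above.
LIST_PATTERN = re.compile(
    r'<ul id="houselist-mod-new" class="houselist-mod houselist-mod-new">(.*?)</ul>',
    re.S,
)
CLEANUP_PATTERN = re.compile(r'<(.*?)>|&(.*?);|\s')

# Hypothetical markup, invented for illustration only.
SAMPLE_HTML = """
<div>
<ul id="houselist-mod-new" class="houselist-mod houselist-mod-new">
  <li><a href="#">Two-bedroom flat</a>&nbsp;<span>89 m2</span></li>
</ul>
</div>
"""


def extract_text(html):
    """Return the cleaned text of every matching listing <ul> block."""
    # findall returns the captured inner HTML of each <ul>;
    # sub then deletes tags, entities, and all whitespace from it.
    return [CLEANUP_PATTERN.sub('', block) for block in LIST_PATTERN.findall(html)]


if __name__ == '__main__':
    print(extract_text(SAMPLE_HTML))  # → ['Two-bedroomflat89m2']
```

Note that the cleanup pattern removes every whitespace character, including the spaces between words, so the fields of one listing run together. That matters little for Chinese text but is visible in the English sample. The first pattern also depends on the exact `id` and `class` attributes of the list element, so it stops matching if the site changes its markup.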

(Editor: 李大同)
