A High-Performance Crawler with asyncio
Published: 2020-12-15 01:23:48 Category: C language Source: compiled from the web
# asyncio has no HTTP-level interface of its own (aiohttp provides one)
import asyncio
from urllib.parse import urlparse


async def get_url(url):
    # Request the HTML over a raw socket
    url = urlparse(url)
    host = url.netloc
    path = url.path
    if path == "":
        path = "/"

    # Open the socket connection
    reader, writer = await asyncio.open_connection(host, 80)
    # HTTP lines must end with \r\n, and a blank line ends the header block
    writer.write("GET {} HTTP/1.1\r\nHost:{}\r\nConnection:close\r\n\r\n".format(path, host).encode("utf8"))
    all_lines = []
    # "Connection: close" makes the server close the socket, which ends this loop
    async for raw_line in reader:
        data = raw_line.decode("utf8")
        all_lines.append(data)
    writer.close()
    html = "\n".join(all_lines)
    return html


async def main():
    tasks = []
    for url in range(20):
        url = "http://shop.projectsedu.com/goods/{}/".format(url)
        tasks.append(asyncio.ensure_future(get_url(url)))
    # Print each page as soon as its request completes
    for task in asyncio.as_completed(tasks):
        result = await task
        print(result)


if __name__ == "__main__":
    import time
    start_time = time.time()
    # asyncio.run (Python 3.7+) replaces get_event_loop() + run_until_complete()
    asyncio.run(main())
    print('last time:{}'.format(time.time() - start_time))

(Editor: 李大同)
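Because asyncio works at the socket level, the crawler has to write the HTTP request by hand and gets back the raw response, status line and headers included. The following is a minimal sketch of those two steps in isolation, so they can be checked without a network connection. The helper names `build_request` and `split_response` are hypothetical and do not appear in the original code; the sketch also assumes a plain, non-chunked response.

```python
from urllib.parse import urlparse


def build_request(url: str) -> bytes:
    """Build the raw HTTP/1.1 GET request bytes for url (hypothetical helper)."""
    parts = urlparse(url)
    path = parts.path or "/"  # an empty path means the site root
    request = (
        "GET {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "Connection: close\r\n"
        "\r\n"  # the blank line terminates the header block
    ).format(path, parts.netloc)
    return request.encode("utf8")


def split_response(raw: bytes):
    """Split a raw HTTP response into (status_line, headers, body).

    Assumption: the response is not chunked or compressed, so the body
    is simply everything after the first blank line.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers, body
```

In a crawler like the one above, `writer.write(build_request(url))` could stand in for the inline format string, and `split_response` could be applied to the bytes collected from the reader before decoding the body.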