python – HTTP Error 403: Forbidden when reading HTML
Published: 2020-12-20 12:12:34  Category: Python  Source: collected from the web
I want to read the tables from the following HTML page:

    import pandas as pd

    daily_info = pd.read_html('https://www.investing.com/earnings-calendar/', flavor='html5lib')
    print(daily_info)

Unfortunately, this raises:

    urllib.error.HTTPError: HTTP Error 403: Forbidden

Is there any way to work around it?

Solution
Pretend to be a browser by sending browser-like request headers:
    from io import StringIO

    import pandas as pd
    import requests

    url = 'https://www.investing.com/earnings-calendar/'
    # Browser-like headers, so the request does not carry a default Python client User-Agent.
    header = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }

    r = requests.get(url, headers=header)
    # Wrap the text in StringIO: newer pandas versions deprecate passing raw HTML strings.
    dfs = pd.read_html(StringIO(r.text))

Result:

    In [201]: len(dfs)
    Out[201]: 7

    In [202]: dfs[0]
    Out[202]:
        0   1   2   3
    0 NaN NaN NaN NaN

    In [203]: dfs[1]
    Out[203]:
        Unnamed: 0              Company                                     EPS   /  Forecast  Revenue  /  Forecast.1  Market Cap  Time
    0   Monday, April 24, 2017  NaN                                         NaN   NaN          NaN      NaN            NaN         NaN
    1   NaN                     Acadia (AKR)                                --    / 0.11       --       / --           2.63B       NaN
    2   NaN                     Agree (ADC)                                 --    / 0.39       --       / --           1.34B       NaN
    3   NaN                     Alcoa (AA)                                  --    / 0.53       --       / --           5.84B       NaN
    4   NaN                     American Campus (ACC)                       --    / 0.27       --       / --           6.62B       NaN
    5   NaN                     Ameriprise Financial (AMP)                  --    / 2.52       --       / --           19.76B      NaN
    6   NaN                     Avacta Group (AVTG)                         --    / --         1.26M    / --           47.53M      NaN
    7   NaN                     Bank of Hawaii (BOH)                        1.2   / 1.08       165.8M   / --           3.48B       NaN
    8   NaN                     Bank of Marin (BMRC)                        0.74  / 0.8        --       / --           422.29M     NaN
    9   NaN                     Banner (BANR)                               --    / 0.68       --       / --           1.82B       NaN
    10  NaN                     Barrick Gold (ABX)                          --    / 0.2        --       / --           22.44B      NaN
    11  NaN                     Barrick Gold (ABX)                          --    / 0.28       --       / --           30.28B      NaN
    12  NaN                     Berkshire Hills Bancorp (BHLB)              --    / 0.54       --       / --           1.25B       NaN
    13  NaN                     Brookfield Canada Office Properties (BOXC)  --    / --         --       / --           NaN         NaN
    ...
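The traceback in the question comes from urllib, so the same idea can also be sketched with the standard library alone, for cases where requests is not available. This is a minimal sketch, not the answer author's code: the names BROWSER_USER_AGENT, build_browser_request, and fetch_html are hypothetical helpers introduced here for illustration, and there is no guarantee the site accepts this request, since a server may apply other checks beyond the User-Agent header.

```python
import urllib.request

# Assumption: the User-Agent string from the answer above is reused here;
# any common desktop browser string may serve the same purpose.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"
)


def build_browser_request(url):
    """Return a urllib Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": BROWSER_USER_AGENT})


def fetch_html(url, timeout=30):
    """Download url and return the decoded body.

    urlopen raises urllib.error.HTTPError if the server still answers
    with an error status such as 403.
    """
    request = build_browser_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

With pandas installed, the text returned by fetch_html(url) could then be wrapped in io.StringIO and passed to pd.read_html, in the same way as r.text in the requests-based answer.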