Can I supply a URL to lxml.etree.parse on Python 3?
Published: 2020-12-16 23:44:20 | Category: Python | Source: compiled from online sources
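The documentation says I can:

> lxml can parse from a local file, an HTTP URL or an FTP URL. It also auto-detects and reads gzip-compressed XML files (.gz).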
(from http://lxml.de/parsing.html, under "Parsers")

But a quick experiment seems to suggest otherwise:

```
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> parser = etree.HTMLParser()
>>> from urllib.request import urlopen
>>> with urlopen('https://pypi.python.org/simple') as f:
...     tree = etree.parse(f, parser)
...
>>> tree2 = etree.parse('https://pypi.python.org/simple', parser)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 3299, in lxml.etree.parse (src\lxml\lxml.etree.c:72655)
  File "parser.pxi", line 1791, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:106263)
  File "parser.pxi", line 1817, in lxml.etree._parseDocumentFromURL (src\lxml\lxml.etree.c:106564)
  File "parser.pxi", line 1721, in lxml.etree._parseDocFromFile (src\lxml\lxml.etree.c:105561)
  File "parser.pxi", line 1122, in lxml.etree._BaseParser._parseDocFromFile (src\lxml\lxml.etree.c:100456)
  File "parser.pxi", line 580, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:94543)
  File "parser.pxi", line 690, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:96003)
  File "parser.pxi", line 618, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:95015)
OSError: Error reading file 'https://pypi.python.org/simple': failed to load external entity "https://pypi.python.org/simple"
>>>
```

I can use the urlopen approach, but the documentation seems to imply that passing a URL is somehow better. Also, if the documentation is inaccurate, I'm a little worried about relying on lxml, especially if I start needing to do more complicated things.

What is the correct way to parse HTML from a known URL with lxml, and where should I have found this documented?

Update: I get the same error if I use an http URL instead of an https URL.

Solution
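The problem is that lxml does not support HTTPS URLs, and http://pypi.python.org/simple redirects to the HTTPS version, so the plain http URL from the update fails in the same way.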
So for any secure site, you need to fetch the URL yourself:

```python
from lxml import etree
from urllib.request import urlopen

parser = etree.HTMLParser()
with urlopen('https://pypi.python.org/simple') as f:
    tree = etree.parse(f, parser)
```

(Editor: 李大同)
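If you need this in more than one place, the answer's approach can be wrapped in a small helper. The following is a minimal sketch, not part of lxml's API: the name `parse_html_from_url` and its `timeout` parameter are hypothetical, and it assumes the third-party `lxml` package is installed. Because lxml only sees the raw bytes of a file-like object, the sketch also forwards the charset from the response's Content-Type header to `etree.HTMLParser(encoding=...)`; when no charset is declared, `None` is passed and lxml falls back to its default detection.

```python
from urllib.request import urlopen

from lxml import etree


def parse_html_from_url(url, timeout=30):
    """Hypothetical helper: fetch ``url`` with urllib, then parse it with lxml.

    urllib handles HTTPS and follows redirects, so this avoids the
    "failed to load external entity" error seen when the URL string
    itself is passed to etree.parse.
    """
    with urlopen(url, timeout=timeout) as response:
        # lxml never sees the HTTP headers, so pass the declared charset
        # along explicitly (None means "let lxml detect it").
        charset = response.headers.get_content_charset()
        parser = etree.HTMLParser(encoding=charset)
        # etree.parse accepts any object with a read() method.
        return etree.parse(response, parser)


if __name__ == "__main__":
    # A data: URL keeps the demo offline; an https:// URL such as
    # https://pypi.python.org/simple goes through the same code path.
    demo = "data:text/html;charset=utf-8,<html><body><p>caf%C3%A9</p></body></html>"
    tree = parse_html_from_url(demo)
    print(tree.findtext(".//p"))
```

The `data:` URL is only there so the example runs without network access; `urlopen` has supported `data:` URLs since Python 3.4, the same version used in the question.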