lxml fails to parse XML (with or without encoding to utf-8) [python]
Published: 2020-12-16 23:12:31 | Category: Encyclopedia | Source: compiled from the web
My code:

    import re
    import requests
    from lxml import etree

    url = 'http://weixin.sogou.com/gzhjs?openid=oIWsFt__d2wSBKMfQtkFfeVq_u8I&ext=2JjmXOu9jMsFW8Sh4E_XmC0DOkcPpGX18Zm8qPG7F0L5ffrupfFtkDqSOm47Bv9U'
    r = requests.get(url)
    items = r.json()['items']

Without encode('utf-8'):

    etree.fromstring(items[0])

Output:

    ValueError                                Traceback (most recent call last)
    <ipython-input-69-cb8697498318> in <module>()
    ----> 1 etree.fromstring(items[0])

    lxml.etree.pyx in lxml.etree.fromstring (src\lxml\lxml.etree.c:68121)()

    parser.pxi in lxml.etree._parseMemoryDocument (src\lxml\lxml.etree.c:102435)()

    ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.

With encode('utf-8'):

    etree.fromstring(items[0].encode('utf-8'))

Output:

    File "<string>", line unknown
    XMLSyntaxError: CData section not finished
    鎶楀啺鎶㈤櫓鎹锋姤:闃冲寳I绾挎, line 1, column 281

I don't know how to parse this XML.

Solution
As a workaround, you can remove the encoding attribute from the XML declaration before passing the string to etree.fromstring:
    xml = re.sub(r'\bencoding="[-\w]+"', '', items[0], count=1)
    root = etree.fromstring(xml)

Update after seeing @Lea's comment on the question:

Specify a parser with an explicit encoding, so that the encoding named in the XML declaration is overridden by the one the bytes are actually in:

    xml = r.json()['items'][0].encode('utf-8')
    root = etree.fromstring(xml, parser=etree.XMLParser(encoding='utf-8'))

(Editor: 李大同)
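Both approaches can be tried offline with a small stand-in string. The sketch below is only an illustration: `SAMPLE`, its `gbk` declaration, its element names, and the two helper function names are assumptions, not the real response from the URL in the question. It assumes the declared encoding differs from utf-8, which would explain the mojibake in the second error.

```python
import re
from lxml import etree

# Hypothetical stand-in for items[0]: a str whose XML declaration names an
# encoding (assumed here to be gbk) and whose body contains a CDATA section.
SAMPLE = (
    '<?xml version="1.0" encoding="gbk"?>'
    '<DOCUMENT><title><![CDATA[抗冰抢险捷报]]></title></DOCUMENT>'
)


def parse_by_stripping_declaration(text):
    """Remove the encoding pseudo-attribute, then parse the str directly."""
    cleaned = re.sub(r'\bencoding="[-\w]+"', '', text, count=1)
    return etree.fromstring(cleaned)


def parse_with_explicit_encoding(text):
    """Encode to utf-8 bytes and tell the parser to read them as utf-8."""
    data = text.encode('utf-8')
    # The parser's encoding argument overrides the declaration in the document.
    parser = etree.XMLParser(encoding='utf-8')
    return etree.fromstring(data, parser=parser)


if __name__ == '__main__':
    for parse in (parse_by_stripping_declaration, parse_with_explicit_encoding):
        root = parse(SAMPLE)
        print(parse.__name__, root.findtext('title'))
```

Passing `SAMPLE` straight to `etree.fromstring` raises the `ValueError` from the question, because lxml refuses a `str` that still carries an encoding declaration. The first helper avoids that by dropping the declaration's encoding; the second keeps the declaration but overrides it at the parser level.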