
python – Parsing a string for nested patterns

Published: 2020-12-20 12:15:13 Category: Python Source: compiled from the web
What is the best way to do this?

The input string is

<133_3><135_3><116_2>The other system worked for about 1 month</116_2> got some good images <137_3>on it then it started doing the same thing as the first one</137_3> so then I quit using either camera now they are just sitting and collecting dust.</135_3></133_3>

The expected output is

{
 'The other system worked for about 1 month got some good images on it then it started doing the same thing as the first one so then I quit using either camera now they are just sitting and collecting dust.': [133, 135],
 'The other system worked for about 1 month': [116],
 'on it then it started doing the same thing as the first one': [137]
}

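This looks like a job for a recursive regular-expression search, but I can't figure out exactly how to write one.

The only thing I can think of right now is a tedious recursive function, but it feels like there should be a better way.

Related question: Can regular expressions be used to match nested patterns?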

Solution

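Use expat or another XML parser; given that you are dealing with XML data, it is more explicit than anything else.

Note, however, that XML element names cannot start with a digit, as the ones in your example do.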
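One way to work around the naming restriction is to prefix every numeric tag with an underscore before parsing. The following is only a minimal sketch: the helper name prefix_numeric_tags is hypothetical, and it assumes that every tag has the exact form <digits_digits> and that the text contains no other "<" or "&" characters that would need escaping.

```python
import re

# Matches opening and closing tags such as <133_3> and </133_3>.
# Assumption: tag names always consist of digits, an underscore, and digits.
TAG_RE = re.compile(r'<(/?)(\d+_\d+)>')

def prefix_numeric_tags(text):
    """Rewrite <133_3> as <_133_3> and </133_3> as </_133_3>."""
    return TAG_RE.sub(r'<\1_\2>', text)

sample = '<133_3><116_2>worked for 1 month</116_2> and then stopped</133_3>'
print(prefix_numeric_tags(sample))
```

The rewritten string can then be passed to an XML parser, and the underscore stripped again when the element names are read back.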

Here is a parser that does what you need, although you will have to tweak it to combine elements with duplicate text into a single dict key:

from xml.parsers.expat import ParserCreate

open_elements = {}  # names of the elements that are currently open
result_dict = {}    # element name -> text accumulated inside that element

def start_element(name, attrs):
    open_elements[name] = True

def end_element(name):
    del open_elements[name]

def char_data(data):
    # Append the text to every open element, so that outer elements
    # also collect the text of the elements nested inside them.
    for element in open_elements:
        cur = result_dict.setdefault(element, '')
        result_dict[element] = cur + data

if __name__ == '__main__':
    p = ParserCreate()

    p.StartElementHandler = start_element
    p.EndElementHandler = end_element
    p.CharacterDataHandler = char_data

    # Element names are prefixed with "_" because XML names cannot start with a digit.
    p.Parse('<_133_3><_135_3><_116_2>The other system worked for about 1 month</_116_2> got some good images <_137_3>on it then it started doing the same thing as the first one</_137_3> so then I quit using either camera now they are just sitting and collecting dust.</_135_3></_133_3>', True)

    print(result_dict)
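The tweak mentioned above, combining elements that share the same text into one dict key, could look roughly like this. It is a sketch rather than part of the original answer: the function name group_ids_by_text is hypothetical, and it assumes that the number wanted in the output is the part of the element name before the second underscore (133 for _133_3), which is what the expected output in the question suggests.

```python
def group_ids_by_text(result_dict):
    """Invert {element name: text} into {text: [numeric ids]}."""
    grouped = {}
    for name, text in result_dict.items():
        # '_133_3' -> 133: drop the leading underscore, keep the part before the next '_'.
        element_id = int(name.lstrip('_').split('_')[0])
        grouped.setdefault(text, []).append(element_id)
    return grouped

# Hypothetical, shortened parser output used only for illustration.
sample = {
    '_133_3': 'worked for a month then stopped',
    '_135_3': 'worked for a month then stopped',
    '_116_2': 'worked for a month',
}
print(group_ids_by_text(sample))
```

Calling group_ids_by_text(result_dict) after p.Parse(...) should give a dict of the shape shown in the question, with the ids listed in the order in which the elements were first seen.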

(Editor: 李大同)
