python – Parsing a string for nested patterns
Published: 2020-12-20 12:15:13 | Category: Python | Source: compiled from the web
What is the best way to do the following?
The input string is:

    <133_3><135_3><116_2>The other system worked for about 1 month</116_2> got some good images <137_3>on it then it started doing the same thing as the first one</137_3> so then I quit using either camera now they are just sitting and collecting dust.</135_3></133_3>

The expected output is:

    {'The other system worked for about 1 month got some good images on it then it started doing the same thing as the first one so then I quit using either camera now they are just sitting and collecting dust.': [133, 135],
     'The other system worked for about 1 month': [116],
     'on it then it started doing the same thing as the first one': [137]}

This looks like a job for a recursive regular-expression search, but I cannot work out exactly how to write one. I can think of a tedious recursive function, but it feels like there should be a better way.

Related question: Can regular expressions be used to match nested patterns?

Solution:
Use expat or another XML parser; given that you are dealing with XML data anyway, it is more explicit than anything else.
Note, however, that XML element names cannot start with a digit, as the ones in your example do.

Here is a parser that does what you need, although you will have to adjust it to combine duplicate elements into a single dict key:

    from xml.parsers.expat import ParserCreate

    open_elements = {}   # names of the elements that are currently open
    result_dict = {}     # element name -> text seen inside that element

    def start_element(name, attrs):
        open_elements[name] = True

    def end_element(name):
        del open_elements[name]

    def char_data(data):
        # Append this chunk of text to every element that is currently open.
        for element in open_elements:
            cur = result_dict.setdefault(element, '')
            result_dict[element] = cur + data

    if __name__ == '__main__':
        p = ParserCreate()
        p.StartElementHandler = start_element
        p.EndElementHandler = end_element
        p.CharacterDataHandler = char_data
        # Tag names are prefixed with "_" so that they are valid XML names.
        p.Parse(u'<_133_3><_135_3><_116_2>The other system worked for about 1 month</_116_2> got some good images <_137_3>on it then it started doing the same thing as the first one</_137_3> so then I quit using either camera now they are just sitting and collecting dust.</_135_3></_133_3>', 1)
        print(result_dict)

(Editor: 李大同)
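The answer leaves two adjustments to the reader: making the digit-leading tags legal XML names, and merging elements that wrap identical text into one dict key. A minimal sketch of both steps follows. The function name `parse_nested` is hypothetical, and the sketch assumes that every tag has the form `<digits_digits>`, that each tag name occurs only once, that a single outer element wraps the whole string, and that the text contains no bare `&` or `<` characters.

```python
import re
from xml.parsers.expat import ParserCreate


def parse_nested(text):
    """Map each tagged span of text to the list of numeric ids wrapping it."""
    # Assumption: tags look like <133_3> and </133_3>. Prefixing "_" turns
    # them into legal XML names, which cannot start with a digit.
    xml = re.sub(r'<(/?)(\d+_\d+)>', r'<\1_\2>', text)

    open_elements = []      # names of currently open elements, outermost first
    text_by_element = {}    # element name -> text accumulated inside it

    def start_element(name, attrs):
        open_elements.append(name)
        text_by_element.setdefault(name, '')

    def end_element(name):
        open_elements.pop()

    def char_data(data):
        # Every open element contains this chunk of character data.
        for name in open_elements:
            text_by_element[name] += data

    parser = ParserCreate()
    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = char_data
    parser.Parse(xml, True)

    # Invert the mapping so that elements with identical text share one key.
    result = {}
    for name, content in text_by_element.items():
        element_id = int(name.lstrip('_').split('_')[0])  # "_133_3" -> 133
        result.setdefault(content, []).append(element_id)
    return result


if __name__ == '__main__':
    print(parse_nested('<1_0><2_0>a<3_0>b</3_0></2_0></1_0>'))
    # {'ab': [1, 2], 'b': [3]}
```

Keeping the open elements in a list rather than a dict makes the nesting order explicit. If a tag name can repeat in the input, the per-name accumulation would need to be keyed by element instance instead of by name.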