Python – Find duplicates in a list of dictionaries and group them
Published: 2020-12-16 23:21:49 Category: Python Source: compiled from the web
I'm not a programmer and I'm new to Python. I have a list of dicts that comes from a JSON file:

    # JSON file (film.json)
    [{"year": ["1999"],"director": ["Wachowski"],"film": ["The Matrix"],"price": ["19,00"]},
     {"year": ["1994"],"director": ["Tarantino"],"film": ["Pulp Fiction"],"price": ["20,…
     {"year": ["2003"],…"film": ["Kill Bill vol.1"],"price": ["10,…
     …"film": ["The Matrix Reloaded"],"price": ["9,99"]},
     …"film": ["Pulp Fyction"],"price": ["15,…
     …"director": ["E. de Souza"],"film": ["Street Fighter"],"price": ["2,…
     {"year": ["1999"],…
     {"year": ["1982"],"director": ["Ridley Scott"],"film": ["Blade Runner"],…99"]}]

I can import the JSON file:

    import json
    json_file = open('film.json')
    f = json.load(json_file)

But after that I can't work out how to find the repeated occurrences in f and group them by film title.

    ## result grouped by 'film'
    #group 1
    {"year": ["1999"],…00"]}
    {"year": ["1999"],…00"]}
    #group 2
    {"year": ["1994"],…00"]}
    {"year": ["1994"],…00"]}
    #group X
    ...

Or, better:

    new_dict = { 'group1':[[],[],...],'group2':[[],…'groupX':[...] }

At the moment I'm experimenting with nested loops, but without luck. Thanks.

Note: "Pulp Fyction" is a deliberate misspelling for the fuzzy string matching I plan to implement later; for now I only need a "duplicate grouper".
Note 2: I'm using Python 2.x.

Solution
Since your data is not sorted, use a collections.defaultdict() object, which creates a new list for each new key, and key it by film title:
    from collections import defaultdict

    grouped = defaultdict(list)

    for film in f:
        grouped[film['film'][0]].append(film)

The film['film'][0] value is used to group the films. If you want more sophisticated title grouping, you will have to create a canonical version of that key.

Demo:

    >>> from collections import defaultdict
    >>> import json
    >>> with open('film.json') as film_file:
    ...     f = json.load(film_file)
    ...
    >>> grouped = defaultdict(list)
    >>> for film in f:
    ...     grouped[film['film'][0]].append(film)
    ...
    >>> grouped
    defaultdict(<type 'list'>,
     {u'Street Fighter': [{u'director': [u'E. de Souza'],u'price': [u'2,00'],u'film': [u'Street Fighter'],u'year': [u'1994']}],
      u'Pulp Fiction': [{u'director': [u'Tarantino'],u'price': [u'20,…u'film': [u'Pulp Fiction'],…
      u'Pulp Fyction': [{u'director': [u'Tarantino'],u'price': [u'15,…u'film': [u'Pulp Fyction'],…
      u'The Matrix': [{u'director': [u'Wachowski'],u'price': [u'19,…u'film': [u'The Matrix'],u'year': [u'1999']},
                      {u'director': [u'Wachowski'],…u'year': [u'1999']}],
      u'Blade Runner': [{u'director': [u'Ridley Scott'],…99'],u'film': [u'Blade Runner'],u'year': [u'1982']}],
      u'Kill Bill vol.1': [{u'director': [u'Tarantino'],u'price': [u'10,…u'film': [u'Kill Bill vol.1'],u'year': [u'2003']}],
      u'The Matrix Reloaded': [{u'director': [u'Wachowski'],u'price': [u'9,…u'film': [u'The Matrix Reloaded'],u'year': [u'2003']}]})
    >>> from pprint import pprint
    >>> pprint(dict(grouped))
    {u'Blade Runner': [{u'director': [u'Ridley Scott'],…
     u'Street Fighter': [{u'director': [u'E. de Souza'],…u'year': [u'2003']}]}

Grouping the films with SoundEx would look like this:

    from itertools import groupby, islice

    # Consonant classes; the position in the tuple (starting at 1) is the SoundEx digit.
    _codes = ('bfpv', 'cgjkqsxz', 'dt', 'l', 'mn', 'r')
    _sounds = {c: str(i) for i, code in enumerate(_codes, 1) for c in code}
    # Vowels map to None: they separate consonant groups but emit no digit.
    _sounds.update(dict.fromkeys('aeiouy'))

    def soundex(word, _sounds=_sounds):
        # Collapse runs of identical codes; characters without an entry are skipped.
        grouped = groupby(_sounds[c] for c in word.lower() if c in _sounds)
        if _sounds.get(word[0].lower()):
            next(grouped)  # remove first group.
        # Keep the first three non-vowel codes.
        sdx = ''.join([k for k, g in islice((g for g in grouped if g[0]), 3)])
        # First letter plus the digits, padded on the right with zeros to three characters.
        return word[0].upper() + format(sdx, '<03')

    grouped_by_soundex = defaultdict(list)
    for film in f:
        grouped_by_soundex[soundex(film['film'][0])].append(film)

Resulting in:

    >>> pprint(dict(grouped_by_soundex))
    {u'B436': [{u'director': [u'Ridley Scott'],…
     u'K414': [{u'director': [u'Tarantino'],…
     u'P412': [{u'director': [u'Tarantino'],…u'year': [u'1994']},
               {u'director': [u'Tarantino'],…
     u'S363': [{u'director': [u'E. de Souza'],…
     u'T536': [{u'director': [u'Wachowski'],…u'year': [u'2003']},
               …u'year': [u'1999']}]}

(Editor: 李大同)
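The answer says that more sophisticated title grouping needs a canonical version of the key, and the question asks only for the titles that occur more than once. Here is a minimal sketch of both ideas, not part of the original answer. The names `canonical_title` and `duplicate_groups` are hypothetical, the normalisation rule (lowercase, collapsed whitespace) is just one possible assumption, and the sample records are made up in the same shape as the question's data.

```python
from collections import defaultdict


def canonical_title(title):
    # Hypothetical normalisation: lowercase and collapse runs of whitespace.
    return ' '.join(title.lower().split())


def duplicate_groups(films):
    # Group the records by canonical title, exactly as in the defaultdict approach.
    grouped = defaultdict(list)
    for film in films:
        grouped[canonical_title(film['film'][0])].append(film)
    # Keep only the titles that were seen more than once.
    return {title: group for title, group in grouped.items() if len(group) > 1}


# Made-up sample records, shaped like the entries loaded from film.json.
sample = [
    {"film": ["The Matrix"], "year": ["1999"]},
    {"film": ["the  matrix"], "year": ["1999"]},
    {"film": ["Blade Runner"], "year": ["1982"]},
]
duplicates = duplicate_groups(sample)
```

With the real data, `duplicate_groups(f)` would be called on the list loaded from the JSON file. Swapping `canonical_title` for a phonetic key such as the SoundEx function would make the grouping tolerant of misspellings, at the cost of also merging titles that merely sound alike.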