python – Pandas: Most efficient way to create a dictionary of dictionaries from DataFrame columns
Published: 2020-12-20 11:46:02 | Category: Python | Source: compiled from the web
    import pandas as pd
    import numpy as np
    import random

    labels = ["c1", "c2", "c3"]
    c1 = ["one", "one", "one", "two", "two", "three", "three", "three", "three"]
    c2 = [random.random() for i in range(len(c1))]
    c3 = ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"]
    DF = pd.DataFrame(np.array([c1, c2, c3])).T
    DF.columns = labels

The DataFrame looks like this:

          c1               c2     c3
    0    one   0.440958516531  alpha
    1    one   0.476439953723   beta
    2    one   0.254235673552  gamma
    3    two   0.882724336464  alpha
    4    two    0.79817899139  gamma
    5  three   0.677464637887  alpha
    6  three   0.292927670096   beta
    7  three  0.0971956881825  gamma
    8  three   0.993934915508   zeta

The only way I can think of to build the dictionary is:

    D_greek_value = {}
    for greek in set(DF["c3"]):
        D_c1_c2 = {}
        for i in range(DF.shape[0]):
            row = DF.iloc[i, :]
            # use .iloc for positional access: row[2] on a label-indexed Series is deprecated
            if row.iloc[2] == greek:
                D_c1_c2[row.iloc[0]] = row.iloc[1]
        D_greek_value[greek] = D_c1_c2
    D_greek_value

The resulting dictionary looks like this:

    {'alpha': {'one': '0.67919712421', 'three': '0.67171020684', 'two': '0.571150669821'},
     'beta': {'one': '0.895090207979', 'three': '0.489490074662'},
     'gamma': {'one': '0.964777504708', 'three': '0.134397632659', 'two': '0.10302290374'},
     'zeta': {'three': '0.0204226923557'}}

I don't want to assume that c1 comes in blocks (i.e., that all the "one" rows are always together). I'm doing this on a CSV of a few hundred MB, and I feel like I'm doing it wrong. Any ideas would be appreciated!

Solution
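Note that np.array([c1, c2, c3]) builds a single NumPy array, and because it mixes strings and floats, NumPy stores every element, including the c2 numbers, as a string. That is why the values in the dictionary above are strings such as '0.67919712421'. A minimal sketch, assuming you want c2 to stay numeric, builds the frame from a dict of columns instead. The name DF_strings and the random.seed(0) call are additions for illustration only; the 9-row lists mirror the table shown above.

```python
import random

import numpy as np
import pandas as pd

random.seed(0)  # illustration only: makes the random values repeatable

c1 = ["one", "one", "one", "two", "two", "three", "three", "three", "three"]
c2 = [random.random() for _ in range(len(c1))]
c3 = ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"]

# Construction used in the question: one NumPy array holding both strings
# and floats, so NumPy converts every element (c2 included) to a string.
DF_strings = pd.DataFrame(np.array([c1, c2, c3])).T
DF_strings.columns = ["c1", "c2", "c3"]

# Building from a dict of columns keeps each column's own type,
# so c2 stays float64.
DF = pd.DataFrame({"c1": c1, "c2": c2, "c3": c3})

print(type(DF_strings.loc[0, "c2"]))  # a string type, not a float
print(DF.dtypes)
```

The same applies when the data comes from the CSV: pd.read_csv infers a float dtype for a purely numeric column, so the string conversion only appears when the frame is assembled through a single mixed-type array.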
IIUC, you can let groupby do most of the work:
    >>> import pprint
    >>> result = df.groupby("c3")[["c1", "c2"]].apply(lambda x: dict(x.values)).to_dict()
    >>> pprint.pprint(result)
    {'alpha': {'one': 0.440958516531, 'three': 0.677464637887, 'two': 0.8827243364640001},
     'beta': {'one': 0.47643995372299996, 'three': 0.29292767009599996},
     'gamma': {'one': 0.254235673552, 'three': 0.0971956881825, 'two': 0.79817899139},
     'zeta': {'three': 0.993934915508}}

Some explanation: first we group by c3, then select the columns c1 and c2. This gives us the groups we want to turn into dictionaries:

    >>> grouped = df.groupby("c3")[["c1", "c2"]]
    >>> grouped.apply(lambda x: print(x, "\n", "--"))  # just for display purposes
          c1                   c2
    0    one    0.679926178687387
    3    two  0.11495090934413166
    5  three   0.7458197179794177
     --
          c1                   c2
    0    one    0.679926178687387
    3    two  0.11495090934413166
    5  three   0.7458197179794177
     --
          c1                   c2
    1    one  0.12943266757277916
    6  three  0.28944292691097817
     --
          c1                   c2
    2    one  0.36642834809341274
    4    two   0.5690944224514624
    7  three   0.7018221838129789
     --
          c1                  c2
    8  three  0.7195852795555373
     --

(The alpha group is printed twice because older versions of pandas called the applied function on the first group twice to decide which code path to take.)

Given any one of these subframes, say the second-to-last one (the gamma group), we need a way to turn it into a dictionary. For example:

    >>> d3 = grouped.get_group("gamma")
    >>> d3
          c1        c2
    2    one  0.366428
    4    two  0.569094
    7  three  0.701822

If we try dict or to_dict, we don't get what we want, because the index and the column labels get in the way:

    >>> dict(d3)
    {'c1': 2      one
    4      two
    7    three
    Name: c1, dtype: object, 'c2': 2    0.366428
    4    0.569094
    7    0.701822
    Name: c2, dtype: float64}
    >>> d3.to_dict()
    {'c1': {2: 'one', 4: 'two', 7: 'three'}, 'c2': {2: 0.36642834809341279, 4: 0.56909442245146236, 7: 0.70182218381297889}}

But we can sidestep that by dropping down to the underlying data with .values and passing it to dict. dict accepts any iterable of key/value pairs, and each row of this two-column array is exactly such a pair, so c1 becomes the key and c2 the value:

    >>> d3.values
    array([['one', 0.3664283480934128],
           ['two', 0.5690944224514624],
           ['three', 0.7018221838129789]], dtype=object)
    >>> dict(d3.values)
    {'three': 0.7018221838129789, 'one': 0.3664283480934128, 'two': 0.5690944224514624}

So if we apply this to every group, we get a Series whose index holds the c3 keys we want and whose values are dictionaries, and .to_dict() turns it into a plain dictionary:

    >>> result = df.groupby("c3")[["c1", "c2"]].apply(lambda x: dict(x.values))
    >>> result
    c3
    alpha    {'three': '0.7458197179794177', 'one': '0.6799...
    beta     {'one': '0.12943266757277916', 'three': '0.289...
    gamma    {'three': '0.7018221838129789', 'one': '0.3664...
    zeta                      {'three': '0.7195852795555373'}
    dtype: object
    >>> result.to_dict()
    {'zeta': {'three': '0.7195852795555373'},
     'gamma': {'three': '0.7018221838129789', 'one': '0.36642834809341274', 'two': '0.5690944224514624'},
     'beta': {'one': '0.12943266757277916', 'three': '0.28944292691097817'},
     'alpha': {'three': '0.7458197179794177', 'one': '0.679926178687387', 'two': '0.11495090934413166'}}

(Editor: 李大同)
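An alternative that skips apply, offered as a sketch rather than as part of the answer above: iterate over the groups yourself and zip each group's c1 and c2 columns into a dict. As with the question's loop and with dict(x.values), if the same c1 value appears twice within one c3 group, the later row wins. The frame below is a hypothetical stand-in for the question's data with fixed numbers instead of random ones, and the names via_apply and via_zip are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame, with fixed c2 values.
df = pd.DataFrame({
    "c1": ["one", "one", "one", "two", "two", "three", "three", "three", "three"],
    "c2": [0.44, 0.48, 0.25, 0.88, 0.80, 0.68, 0.29, 0.10, 0.99],
    "c3": ["alpha", "beta", "gamma", "alpha", "gamma", "alpha", "beta", "gamma", "zeta"],
})

# The answer's approach: one dict per c3 group, then Series -> dict.
via_apply = df.groupby("c3")[["c1", "c2"]].apply(lambda x: dict(x.values)).to_dict()

# Alternative: loop over (key, subframe) pairs and pair c1 with c2 directly.
via_zip = {key: dict(zip(group["c1"], group["c2"])) for key, group in df.groupby("c3")}

print(via_zip["gamma"])  # {'one': 0.25, 'two': 0.8, 'three': 0.1}
```

Neither version assumes that rows with the same c1 value are contiguous: groupby collects each c3 group's rows wherever they appear in the frame, which addresses the question's concern about c1 not coming in blocks.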