python – Split the index into separate columns in pandas
Published: 2020-12-20 12:02:53  Category: Python  Source: compiled from the web
I have a large DataFrame from which I get the data I need with groupby. I need to get several separate columns from the index of the new DataFrame.
Part of the original DataFrame looks like this:

        code           place    vl  year  week
    0   111.0002.0056  region1   1  2017    29
    1   112.6500.2285  region2   1  2017    31
    2   112.5600.6325  region2   1  2017    30
    3   112.5600.6325  region2   1  2017    30
    4   112.5600.8159  region2   1  2017    30
    5   111.0002.0056  region2   1  2017    29
    6   111.0002.0056  region2   1  2017    30
    7   111.0002.0056  region2   1  2017    28
    8   112.5600.8159  region3   1  2017    31
    9   112.5600.8159  region3   1  2017    28
    10  111.0002.0114  region3   1  2017    31
    ....

After applying groupby (code: df_test1 = df_test.groupby(['code','year','week','place'])['vl'].sum().unstack(fill_value=0)), it looks like this:

    place                   region1  region2  region3  region4  index1
    code          year week
    111.0002.0006 2017 29         0        3        0        0  (111.0002.0006,2017,29)
                       30         0        7        0        0  (111.0002.0006,30)
    111.0002.0018 2017 29         0        0        0        0  (111.0002.0018,29)
    111.0002.0029 2017 30         0        0        0        0  (111.0002.0029,30)
    111.0002.0055 2017 28         0       33        0        8  (111.0002.0055,28)
                       29         1      155        2       41  (111.0002.0055,29)
                       30         0      142        1       39  (111.0002.0055,30)
                       31         0       31        0       13  (111.0002.0055,31)
    111.0002.0056 2017 28         9       36        0        4  (111.0002.0056,28)
                       29        20       75        2       37  (111.0002.0056,29)
                       30        17       81        2       33  (111.0002.0056,30)
    ....

I saved the index in a separate column, index1 (code: df_test1['index1'] = df_test1.index).

The result should look like this:

    region1  region2  region3  region4  code           year  week
          0        3        0        0  111.0002.0006  2017    29
          0        7        0        0  111.0002.0006  2017    30
          0        0        0        0  111.0002.0018  2017    29
          0        0        0        0  111.0002.0029  2017    30
          0       33        0        8  111.0002.0055  2017    28
          1      155        2       41  111.0002.0055  2017    29
          0      142        1       39  111.0002.0055  2017    30
          0       31        0       13  111.0002.0055  2017    31
    ....

I would appreciate any suggestions!

Solution
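To try the steps that follow, the question's 11 sample rows can be rebuilt as shown here. This is a minimal sketch, assuming pandas is installed; because only the rows shown above are used, the result has fewer codes than the question's groupby output and no region4 column. The code column is kept as a string, since values like 111.0002.0056 are identifiers, not numbers.

```python
import pandas as pd

# The 11 sample rows from the question. code stays a string because
# values like "111.0002.0056" are identifiers, not numbers.
df_test = pd.DataFrame({
    'code': ['111.0002.0056', '112.6500.2285', '112.5600.6325',
             '112.5600.6325', '112.5600.8159', '111.0002.0056',
             '111.0002.0056', '111.0002.0056', '112.5600.8159',
             '112.5600.8159', '111.0002.0114'],
    'place': ['region1', 'region2', 'region2', 'region2', 'region2',
              'region2', 'region2', 'region2', 'region3', 'region3',
              'region3'],
    'vl': [1] * 11,
    'year': [2017] * 11,
    'week': [29, 31, 30, 30, 30, 29, 30, 28, 31, 28, 31],
})

# The question's aggregation: sum vl per (code, year, week, place),
# then unstack place into one column per region.
df_test1 = (df_test.groupby(['code', 'year', 'week', 'place'])['vl'].sum()
                   .unstack(fill_value=0))

print(df_test1)
print(df_test1.index.names)   # the index levels that should become columns
print(df_test1.columns.name)  # 'place', left over from unstack
```

The three names in df_test1.index.names (code, year, week) are the index levels that need to become ordinary columns.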
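Instead of df_test1['index1'] = df_test1.index, add reset_index, which turns the code, year and week index levels into ordinary columns. For a clean DataFrame, also add rename_axis, which removes the columns name (place):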
    df_test1 = (df_test.groupby(['code','year','week','place'])['vl'].sum()
                       .unstack(fill_value=0)
                       .reset_index()
                       .rename_axis(None, axis=1))
    print(df_test1)

                code  year  week  region1  region2  region3
    0  111.0002.0056  2017    28        0        1        0
    1  111.0002.0056  2017    29        1        1        0
    2  111.0002.0056  2017    30        0        1        0
    3  111.0002.0114  2017    31        0        0        1
    4  112.5600.6325  2017    30        0        2        0
    5  112.5600.8159  2017    28        0        0        1
    6  112.5600.8159  2017    30        0        1        0
    7  112.5600.8159  2017    31        0        0        1
    8  112.6500.2285  2017    31        0        1        0

If necessary, change the column order at the end:

    # every name in cols must be a column of df_test1
    cols = ['code','year','week']
    df_test1 = df_test1[[x for x in df_test1.columns if x not in cols] + cols]
    print(df_test1)

       region1  region2  region3           code  year  week
    0        0        1        0  111.0002.0056  2017    28
    1        1        1        0  111.0002.0056  2017    29
    2        0        1        0  111.0002.0056  2017    30
    3        0        0        1  111.0002.0114  2017    31
    4        0        2        0  112.5600.6325  2017    30
    5        0        0        1  112.5600.8159  2017    28
    6        0        1        0  112.5600.8159  2017    30
    7        0        0        1  112.5600.8159  2017    31
    8        0        1        0  112.6500.2285  2017    31

(Editor: 李大同)
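If the tuple column index1 from the question already exists, it can also be split into separate columns directly. This is a minimal sketch, assuming pandas is installed and using the first three rows of the question's groupby table; the names idx, parts, result and levels are illustrative only.

```python
import pandas as pd

# Three rows copied from the question's groupby/unstack table:
# a (code, year, week) MultiIndex and one column per region.
idx = pd.MultiIndex.from_tuples(
    [('111.0002.0006', 2017, 29),
     ('111.0002.0006', 2017, 30),
     ('111.0002.0018', 2017, 29)],
    names=['code', 'year', 'week'])
df_test1 = pd.DataFrame({'region1': [0, 0, 0], 'region2': [3, 7, 0],
                         'region3': [0, 0, 0], 'region4': [0, 0, 0]},
                        index=idx)

# The question's step: store the whole index as one column of tuples.
df_test1['index1'] = df_test1.index

# Split the tuple column into one column per index level.
parts = pd.DataFrame(df_test1['index1'].tolist(),
                     columns=['code', 'year', 'week'])

# Drop the tuple column, give both pieces the same 0..n-1 row index
# so they line up, and put the split columns on the right.
result = pd.concat([df_test1.drop(columns='index1').reset_index(drop=True),
                    parts], axis=1)
print(result)

# The same three columns can be read straight from the MultiIndex.
levels = df_test1.index.to_frame(index=False)
```

pd.concat(..., axis=1) aligns rows by index label, which is why both pieces are first given the same 0..n-1 index. index.to_frame(index=False) yields the same code, year and week columns without going through tuples.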