How to group Pandas data by column using regular expression matching
Published: 2020-12-14 05:37:49 | Category: Encyclopedia | Source: compiled from the web
I have the following data frame:
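
    import pandas as pd

    # Every column needs five values, one per id; the values match the output shown below.
    df = pd.DataFrame({
        'id': ['a', 'b', 'c', 'd', 'e'],
        'XX_111_S5_R12_001_Mobile_05': [-14, -90, -90, -96, -91],
        'YY_222_S00_R12_001_1-999_13': [-103, 0, -110, -114, -114],
        'ZZ_111_S00_R12_001_1-999_13': [1, 2.3, 3, 5, 6],
    })
    df.set_index('id', inplace=True)
    df

It looks like this:

    Out[6]:
        XX_111_S5_R12_001_Mobile_05  YY_222_S00_R12_001_1-999_13  ZZ_111_S00_R12_001_1-999_13
    id
    a                           -14                         -103                          1.0
    b                           -90                            0                          2.3
    c                           -90                         -110                          3.0
    d                           -96                         -114                          5.0
    e                           -91                         -114                          6.0

What I want to do is group the columns based on the following regular expression:

    \w+_\w+_\w+_\d+_([\w\d-]+)_\d+

so that in the end the columns are grouped by Mobile and 1-999.

Is there a way to do this? I tried the following, but it failed to group them:

    import re

    grouped = df.groupby(
        lambda x: re.search(r"\w+_\w+_\w+_\d+_([\w\d-]+)_\d+", x).group(),
        axis=1,
    )
    for name, group in grouped:
        print(name)
        print(group)

This prints the full column names as the group names:

    XX_111_S5_R12_001_Mobile_05
    YY_222_S00_R12_001_1-999_13
    ZZ_111_S00_R12_001_1-999_13

What I want is for name to print as:

    Mobile
    1-999
    1-999

and for group to print the corresponding data frame.

Solution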
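One reason the attempt in the question returns full column names is that a match object's `.group()` with no argument returns the entire match (group 0), while `.group(1)` returns the first capture group. The following is a minimal sketch of that difference using only the standard `re` module. The column names are copied from the question; the variable names `pattern`, `columns`, and `keys` are illustrative and not part of the original post.

```python
import re

# Pattern from the question, with the backslashes restored.
pattern = re.compile(r"\w+_\w+_\w+_\d+_([\w\d-]+)_\d+")

columns = [
    "XX_111_S5_R12_001_Mobile_05",
    "YY_222_S00_R12_001_1-999_13",
    "ZZ_111_S00_R12_001_1-999_13",
]

for col in columns:
    match = pattern.search(col)
    # group() is the whole matched text; group(1) is the captured part only.
    print(match.group(), "->", match.group(1))

# Grouping keys taken from the capture group.
keys = [pattern.search(col).group(1) for col in columns]
```

Under these assumptions, swapping `.group()` for `.group(1)` in the question's lambda should yield the intended keys as well.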
You can use .str.extract on the columns to extract the substrings to use as keys for your groupby:

    # Perform the groupby on the substring captured from each column name.
    # Note: groupby with axis=1 is deprecated in newer pandas releases.
    pat = r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+'
    grouped = df.groupby(df.columns.str.extract(pat, expand=False), axis=1)

    # Show the group information.
    for name, group in grouped:
        print(name)
        print(group, '\n')

This returns the expected groups:

    1-999
        YY_222_S00_R12_001_1-999_13  ZZ_111_S00_R12_001_1-999_13
    id
    a                          -103                          1.0
    b                             0                          2.3
    c                          -110                          3.0
    d                          -114                          5.0
    e                          -114                          6.0

    Mobile
        XX_111_S5_R12_001_Mobile_05
    id
    a                           -14
    b                           -90
    c                           -90
    d                           -96
    e                           -91

(Editor: 李大同)
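The groupby in the answer relies on `axis=1`, which, as far as I am aware, pandas deprecates from version 2.1 onward. As a hedged sketch that does not depend on that argument, the same grouping can be done by collecting column names per extracted key in a plain dictionary and then selecting those columns. The names `column_groups` and `frames` are illustrative, the data is copied from the question's displayed output, and this is one possible alternative rather than the answer's own method.

```python
import re

import pandas as pd

df = pd.DataFrame({
    'id': ['a', 'b', 'c', 'd', 'e'],
    'XX_111_S5_R12_001_Mobile_05': [-14, -90, -90, -96, -91],
    'YY_222_S00_R12_001_1-999_13': [-103, 0, -110, -114, -114],
    'ZZ_111_S00_R12_001_1-999_13': [1, 2.3, 3, 5, 6],
}).set_index('id')

pattern = re.compile(r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+')

# Map each captured key to the list of column names that produced it.
column_groups = {}
for col in df.columns:
    match = pattern.search(col)
    if match is None:
        continue  # skip columns that do not follow the naming scheme
    column_groups.setdefault(match.group(1), []).append(col)

# Build one sub-frame per key by plain column selection.
frames = {name: df[cols] for name, cols in column_groups.items()}

for name, group in frames.items():
    print(name)
    print(group, '\n')
```

Unlike groupby, which sorts its keys, this dictionary keeps the keys in the order the columns appear, so Mobile is printed before 1-999. Column selection also leaves each column's dtype unchanged.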