Pandas / Python: converting two columns into a matrix, with the column names in the matrix
Published: 2020-12-20 12:02:30 | Category: Python | Source: compiled from the web
I can convert the two columns into a matrix with the following commands.

    dfb = datab.parse("a")
    dfb

         Name        Product
    0    Mike     Apple,Pear
    1    John  Orange,Banana
    2     Bob         Banana
    3  Connie           Pear

    pd.get_dummies(dfb.Product).groupby(dfb.Name).apply(max)

            Apple,Pear  Banana  Orange,Banana  Pear
    Name
    Bob              0       1              0     0
    Connie           0       0              0     1
    John             0       0              1     0
    Mike             1       0              0     0

However, each comma-separated string is treated as a single category. The matrix I want has one column per product, as follows.

            Apple  Banana  Orange  Pear
    Name
    Bob         0       1       0     0
    Connie      0       0       0     1
    John        0       1       1     0
    Mike        1       0       0     1

Solution
1. Use str.get_dummies:

    # Split each Product string on ',' and create one 0/1 column per product
    df = dfb.set_index('Name').Product.str.get_dummies(',')
    print (df)

            Apple  Banana  Orange  Pear
    Name
    Mike        1       0       0     1
    John        0       1       1     0
    Bob         0       1       0     0
    Connie      0       0       0     1

2. Solution with pd.get_dummies:

    dfb = dfb.set_index('Name')
    # Split into separate columns, encode them, then merge the
    # duplicate column labels by taking the maximum per label
    df = (pd.get_dummies(dfb.Product.str.split(',', expand=True), prefix='', prefix_sep='')
            .groupby(axis=1, level=0).max())
    print (df)

            Apple  Banana  Orange  Pear
    Name
    Mike        1       0       0     1
    John        0       1       1     0
    Bob         0       1       0     0
    Connie      0       0       0     1

3. Solution with split and MultiLabelBinarizer:

    from sklearn.preprocessing import MultiLabelBinarizer

    # dfb is the original frame here, with Name still a regular column
    mlb = MultiLabelBinarizer()
    df = pd.DataFrame(mlb.fit_transform(dfb.Product.str.split(',')),
                      columns=mlb.classes_,
                      index=dfb.Name)
    print (df)

            Apple  Banana  Orange  Pear
    Name
    Mike        1       0       0     1
    John        0       1       1     0
    Bob         0       1       0     0
    Connie      0       0       0     1

If the Name column contains duplicate values:

    # Collapse repeated names into one row each
    df = df.groupby('Name').max()
    print (df)

            Apple  Banana  Orange  Pear
    Name
    Bob         0       1       0     0
    Connie      0       0       0     1
    John        0       1       1     0
    Mike        1       0       0     1

(Editor: 李大同)
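The question loads its data from a spreadsheet that is not available here, so the sketch below rebuilds the four sample rows in memory to make the str.get_dummies approach reproducible. The in-memory frame and the variable names indicators, matrix, wide and alt are assumptions made for illustration. The second half shows one possible way to merge duplicate column labels without groupby(axis=1), which is deprecated in recent pandas releases; it is an alternative sketch, not the original answer's code.

```python
import pandas as pd

# Assumed in-memory stand-in for the sheet returned by datab.parse("a")
dfb = pd.DataFrame({
    "Name": ["Mike", "John", "Bob", "Connie"],
    "Product": ["Apple,Pear", "Orange,Banana", "Banana", "Pear"],
})

# Split each Product string on "," and build one 0/1 column per product
indicators = dfb.set_index("Name")["Product"].str.get_dummies(sep=",")

# Group on the index so repeated names (if any) collapse into one row;
# this also sorts the rows by name
matrix = indicators.groupby(level="Name").max()
print(matrix)

# Alternative to groupby(axis=1, level=0): transpose, group the rows,
# then transpose back
wide = dfb.set_index("Name")["Product"].str.split(",", expand=True)
dummies = pd.get_dummies(wide, prefix="", prefix_sep="")
alt = dummies.T.groupby(level=0).max().T.astype(int)
print(alt)
```

With the sample rows, matrix has the columns Apple, Banana, Orange and Pear, with John marked for both Banana and Orange and Mike for both Apple and Pear. The alt frame holds the same values but keeps the original row order.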