李大同 (https://www.lidatong.com.cn/)

python – DataFrame: make columns represent a vector

Published: 2020-12-20 11:06:49  Category: Python  Source: compiled from the web
I have a DataFrame of genres:

import pandas as pd

df = pd.DataFrame({'genres': [['Drama'],['Music','Drama','Romance'],['Action','Adventure','Comedy'],['Thriller','Romance','Drama'],['Adventure','Family']]})
print(df)
genres = ['Action','Comedy','Family','Music','Thriller']  # list of all genres

Data:

                        genres
0                      [Drama]
1      [Music, Drama, Romance]
2  [Action, Adventure, Comedy]
3   [Thriller, Romance, Drama]
4          [Adventure, Family]

I want output like this:

                        genres  Action  Adventure  Comedy  Drama  Family  Music  Romance  Thriller
0                      [Drama]       0          0       0      1       0      0        0         0
1      [Music, Drama, Romance]       0          0       0      1       0      1        1         0
2  [Action, Adventure, Comedy]       1          1       1      0       0      0        0         0
3   [Thriller, Romance, Drama]       0          0       0      1       0      0        1         1
4          [Adventure, Family]       0          1       0      0       1      0        0         0

Solution

Use MultiLabelBinarizer:

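from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()

# one 0/1 column per distinct genre; mlb.classes_ holds the sorted genre names
df1 = pd.DataFrame(mlb.fit_transform(df['genres']),columns=mlb.classes_,index=df.index)
# attach the indicator columns next to the original genres column
df = df.join(df1)
print(df)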
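                        genres  Action  Adventure  Comedy  Drama  Family  Music  Romance  Thriller
0                      [Drama]       0          0       0      1       0      0        0         0
1      [Music, Drama, Romance]       0          0       0      1       0      1        1         0
2  [Action, Adventure, Comedy]       1          1       1      0       0      0        0         0
3   [Thriller, Romance, Drama]       0          0       0      1       0      0        1         1
4          [Adventure, Family]       0          1       0      0       1      0        0         0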
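As an alternative that avoids the scikit-learn dependency (this is not part of the original answer), pandas' own `Series.str.join` and `Series.str.get_dummies` can build the same indicator columns. A minimal sketch, assuming no genre name ever contains the `|` separator:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'genres': [['Drama'],
                              ['Music', 'Drama', 'Romance'],
                              ['Action', 'Adventure', 'Comedy'],
                              ['Thriller', 'Romance', 'Drama'],
                              ['Adventure', 'Family']]})

# Join each list into one '|'-separated string, then split it back out
# into one 0/1 column per distinct genre (str.get_dummies sorts the columns).
dummies = df['genres'].str.join('|').str.get_dummies(sep='|')
out = df.join(dummies)
print(out)
```

Because `str.get_dummies` sorts the new column names alphabetically, the column order here matches `mlb.classes_` in the MultiLabelBinarizer version.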

To keep only the genres from a given list, add reindex:

genres = ['Action','Drama']

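# columns=mlb.classes_ is needed so that reindex can match the columns by genre name
df1 = pd.DataFrame(mlb.fit_transform(df['genres']),columns=mlb.classes_,index=df.index)
# start from the genres column only: df already holds all indicator columns from the step above
df = df[['genres']].join(df1.reindex(columns=genres,fill_value=0))
print(df)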
                        genres  Action  Drama
0                      [Drama]       0      1
1      [Music, Drama, Romance]       0      1
2  [Action, Adventure, Comedy]       1      0
3   [Thriller, Romance, Drama]       0      1
4          [Adventure, Family]       0      0
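The filtering step can be wrapped in a small helper. The name `add_genre_indicators` and the extra genre `'Western'` are hypothetical, chosen only to show that `reindex(..., fill_value=0)` turns a genre missing from the data into an all-zero column instead of an error. The detail that matters is `columns=mlb.classes_`: without it the indicator columns are labelled 0..n-1 and `reindex` matches none of the genre names.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer


def add_genre_indicators(df, wanted):
    """Hypothetical helper: append one 0/1 column per genre in `wanted`."""
    mlb = MultiLabelBinarizer()
    # Label the indicator columns with the genre names (mlb.classes_),
    # otherwise they are numbered 0..n-1 and reindex cannot match them.
    ind = pd.DataFrame(mlb.fit_transform(df['genres']),
                       columns=mlb.classes_, index=df.index)
    # Keep only the wanted genres, in the given order; a genre that never
    # occurs in the data becomes an all-zero column via fill_value=0.
    return df.join(ind.reindex(columns=wanted, fill_value=0))


df = pd.DataFrame({'genres': [['Drama'],
                              ['Music', 'Drama', 'Romance'],
                              ['Action', 'Adventure', 'Comedy'],
                              ['Thriller', 'Romance', 'Drama'],
                              ['Adventure', 'Family']]})

# 'Western' is a hypothetical genre that does not appear in the data
out = add_genre_indicators(df, ['Action', 'Drama', 'Western'])
print(out)
```

`MultiLabelBinarizer` also accepts a `classes` argument, but labels outside that list are then dropped with a warning, so reindexing after a plain fit is the quieter option.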

(Editor: 李大同)

