python – dataframe: make columns represent a vector
Published: 2020-12-20 11:06:49  Category: Python  Source: compiled from the web
I have a DataFrame of genres:
    import pandas as pd

    df = pd.DataFrame({'genres': [['Drama'],
                                  ['Music','Drama','Romance'],
                                  ['Action','Adventure','Comedy'],
                                  ['Thriller','Romance','Drama'],
                                  ['Adventure','Family']]})
    print(df)
    genres = ['Action','Comedy','Family','Music','Thriller'] # list of all genres

Data:

                            genres
    0                      [Drama]
    1      [Music, Drama, Romance]
    2  [Action, Adventure, Comedy]
    3   [Thriller, Romance, Drama]
    4          [Adventure, Family]

I want output like this:

                            genres  Action  Adventure  Comedy  Drama  Family  Music  Romance  Thriller
    0                      [Drama]       0          0       0      1       0      0        0         0
    1      [Music, Drama, Romance]       0          0       0      1       0      1        1         0
    2  [Action, Adventure, Comedy]       1          1       1      0       0      0        0         0
    3   [Thriller, Romance, Drama]       0          0       0      1       0      0        1         1
    4          [Adventure, Family]       0          1       0      0       1      0        0         0

Solution
Use MultiLabelBinarizer:
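    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer

    mlb = MultiLabelBinarizer()
    # One 0/1 column per genre; mlb.classes_ holds the sorted genre names
    df1 = pd.DataFrame(mlb.fit_transform(df['genres']), columns=mlb.classes_, index=df.index)
    df = df.join(df1)
    print(df)

                            genres  Action  Adventure  Comedy  Drama  Family  Music  Romance  Thriller
    0                      [Drama]       0          0       0      1       0      0        0         0
    1      [Music, Drama, Romance]       0          0       0      1       0      1        1         0
    2  [Action, Adventure, Comedy]       1          1       1      0       0      0        0         0
    3   [Thriller, Romance, Drama]       0          0       0      1       0      0        1         1
    4          [Adventure, Family]       0          1       0      0       1      0        0         0

If you want to keep only the genres in a given list, reindex the columns. fill_value=0 adds an all-zero column for any listed genre that never occurs in the data. Note that df1 must still be built with columns=mlb.classes_; otherwise its columns are integer positions and reindexing by genre names returns only zeros.

    genres = ['Action','Drama']
    df1 = pd.DataFrame(mlb.fit_transform(df['genres']), columns=mlb.classes_, index=df.index)
    # Join onto the 'genres' column only, so indicator columns are not duplicated
    df = df[['genres']].join(df1.reindex(columns=genres, fill_value=0))
    print(df)

                            genres  Action  Drama
    0                      [Drama]       0      1
    1      [Music, Drama, Romance]       0      1
    2  [Action, Adventure, Comedy]       1      0
    3   [Thriller, Romance, Drama]       0      1
    4          [Adventure, Family]       0      0

(Editor: 李大同)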
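If you would rather not depend on scikit-learn, the same indicator columns can probably be built with pandas alone via Series.str.join and Series.str.get_dummies. This is a swapped-in alternative, not part of the original answer. It is a minimal sketch assuming no genre name contains the "|" separator; 'Horror' is a hypothetical genre that does not occur in the data, included only to show what fill_value=0 does.

```python
import pandas as pd

# Sample data from the question: each row holds a list of genre labels.
df = pd.DataFrame({'genres': [['Drama'],
                              ['Music', 'Drama', 'Romance'],
                              ['Action', 'Adventure', 'Comedy'],
                              ['Thriller', 'Romance', 'Drama'],
                              ['Adventure', 'Family']]})

# Join each list into one "|"-separated string, then let pandas split it
# back apart and create one 0/1 column per distinct genre.
# Assumption: no genre name contains the "|" character.
dummies = df['genres'].str.join('|').str.get_dummies(sep='|')
out = df.join(dummies)
print(out)

# Keep only selected genres. 'Horror' is hypothetical and absent from the
# data, so fill_value=0 gives it an all-zero column instead of NaN.
wanted = ['Action', 'Drama', 'Horror']
filtered = df.join(dummies.reindex(columns=wanted, fill_value=0))
print(filtered)
```

Like MultiLabelBinarizer, str.get_dummies orders the genre columns alphabetically, so out should have the same columns as the MultiLabelBinarizer result.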