How to sort a DataFrame by the number of occurrences in a column in Python (pandas)
Published: 2020-12-20 13:11:17 Category: Python Source: collected from the web
I am trying to use pandas in Python to build a DataFrame from my data (scores between chemicals and proteins).

I want the DataFrame to show the most frequently occurring proteins first, so I sorted the data beforehand. But when I create the DataFrame, I do not get the expected result.

Here is a sample of my data:

    chemicals     prots                     scores
    CID000000006  10116.ENSRNOP00000003921  196
    CID000000051  10116.ENSRNOP00000003921  246
    CID000000085  10116.ENSRNOP00000003921  196
    CID000000119  10116.ENSRNOP00000003921  247
    CID000000134  10116.ENSRNOP00000008952  159
    CID000000135  10116.ENSRNOP00000008952  157
    CID000000174  10116.ENSRNOP00000008952  439
    CID000000175  10116.ENSRNOP00000001021  858
    CID000000177  10116.ENSRNOP00000004027  760

As you can see, "10116.ENSRNOP00000003921" is the protein that occurs most often in my data. So I would like to get something like this:

                  10116.ENSRNOP00000003921  10116.ENSRNOP00000008952
    CID000000006  196
    CID000000051  246
    CID000000085  196
    CID000000119  247
    CID000000134                            159
    CID000000135                            157
    CID000000174                            439

Here is my code:

    import pandas as pd

    # The file is tab-separated, so sep must be "\t".
    # header=0 treats the first row as the header; read_csv rejects header=True.
    df_rat = pd.read_csv("dt_matrix_rat.csv", sep="\t", header=0)
    df_rat.columns = ['chemicals', 'proteins', 'scores']
    # One row per chemical, one column per protein.
    df_rat1 = df_rat.pivot(index='chemicals', columns='proteins', values='scores')
    df_rat1.to_csv("rat_matrix.csv", sep='\t', index=True)

Solution
I think you need to count the non-null values in each column, sort the counts with sort_values, and take the index of the result as cols. Finally, select the columns with a subset:

    df1 = df.pivot(index='chemicals', columns='proteins', values='scores')
    # Count the non-null scores per protein, most frequent first.
    cols = df1.notnull().sum(axis=0).sort_values(ascending=False).index
    print(cols)
    Index(['10116.ENSRNOP00000003921', '10116.ENSRNOP00000008952',
           '10116.ENSRNOP00000004027', '10116.ENSRNOP00000001021'],
          dtype='object', name='proteins')

    print(df1[cols])
    proteins      10116.ENSRNOP00000003921  10116.ENSRNOP00000008952
    chemicals
    CID000000006                     196.0                       NaN
    CID000000051                     246.0                       NaN
    CID000000085                     196.0                       NaN
    CID000000119                     247.0                       NaN
    CID000000134                       NaN                     159.0
    CID000000135                       NaN                     157.0
    CID000000174                       NaN                     439.0
    CID000000175                       NaN                       NaN
    CID000000177                       NaN                       NaN

    proteins      10116.ENSRNOP00000004027  10116.ENSRNOP00000001021
    chemicals
    CID000000006                       NaN                       NaN
    CID000000051                       NaN                       NaN
    CID000000085                       NaN                       NaN
    CID000000119                       NaN                       NaN
    CID000000134                       NaN                       NaN
    CID000000135                       NaN                       NaN
    CID000000174                       NaN                       NaN
    CID000000175                       NaN                     858.0
    CID000000177                     760.0                       NaN

Or use reindex, which prints the same table:

    # reindex_axis was removed in pandas 1.0; reindex with axis=1 is the equivalent call.
    print(df1.reindex(cols, axis=1))

Note that the two proteins that each occur once are tied, so their relative order may differ from the output shown here.

(Editor: 李大同)
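For readers who want to try the approach without the original file, here is a minimal, self-contained sketch. It assumes the sample rows from the question, typed in by hand; the helper name `pivot_by_frequency` is hypothetical and not part of pandas. It uses `notna` (an alias of `notnull`) and a stable sort, so tied proteins should keep the column order produced by the pivot.

```python
import pandas as pd

# Sample rows copied from the question (chemical, protein, score).
ROWS = [
    ("CID000000006", "10116.ENSRNOP00000003921", 196),
    ("CID000000051", "10116.ENSRNOP00000003921", 246),
    ("CID000000085", "10116.ENSRNOP00000003921", 196),
    ("CID000000119", "10116.ENSRNOP00000003921", 247),
    ("CID000000134", "10116.ENSRNOP00000008952", 159),
    ("CID000000135", "10116.ENSRNOP00000008952", 157),
    ("CID000000174", "10116.ENSRNOP00000008952", 439),
    ("CID000000175", "10116.ENSRNOP00000001021", 858),
    ("CID000000177", "10116.ENSRNOP00000004027", 760),
]


def pivot_by_frequency(df):
    """Pivot to a chemicals x proteins table, most frequent protein first.

    Hypothetical helper written for this example.
    """
    wide = df.pivot(index="chemicals", columns="proteins", values="scores")
    # Number of non-null scores in each protein column.
    counts = wide.notna().sum(axis=0)
    # mergesort is stable, so tied proteins are not shuffled arbitrarily.
    order = counts.sort_values(ascending=False, kind="mergesort").index
    return wide.reindex(columns=order)


df = pd.DataFrame(ROWS, columns=["chemicals", "proteins", "scores"])
result = pivot_by_frequency(df)
print(result.columns.tolist())
```

If every row in the long table has a score, `df["proteins"].value_counts().index` gives the same most-frequent-first ordering without pivoting first, and it can be passed to `reindex(columns=...)` in the same way.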