python – Why are all elements NaN when constructing a MultiIndex DataFrame
Published: 2020-12-20 11:04:32 | Category: Python | Source: compiled from the web
Suppose I have a DataFrame like this, and I want to convert it to a two-level MultiIndex DataFrame.
             dt         st  close  volume
    0  20100101  000001.sz      1   10000
    1  20100101  000002.sz     10   50000
    2  20100101  000003.sz      5    1000
    3  20100101  000004.sz     15    7000
    4  20100101  000005.sz    100  100000
    5  20100102  000001.sz      2   20000
    6  20100102  000002.sz     20   60000
    7  20100102  000003.sz      6    2000
    8  20100102  000004.sz     20    8000
    9  20100102  000005.sz    110  110000

But when I try this code:

    import pandas as pd

    data = pd.read_csv('data/trial.csv')
    print(data)

    # Build a two-level index from every (dt, st) combination
    idx = pd.MultiIndex.from_product([data.dt.unique(), data.st.unique()],
                                     names=['dt', 'st'])
    col = ['close', 'volume']
    df = pd.DataFrame(data, idx, col)
    print(df)

I find that all elements are NaN:

                        close  volume
    dt       st
    20100101 000001.sz    NaN     NaN
             000002.sz    NaN     NaN
             000003.sz    NaN     NaN
             000004.sz    NaN     NaN
             000005.sz    NaN     NaN
    20100102 000001.sz    NaN     NaN
             000002.sz    NaN     NaN
             000003.sz    NaN     NaN
             000004.sz    NaN     NaN
             000005.sz    NaN     NaN

How should I handle this? Thanks.

Solution
You only need the index_col parameter of read_csv:
    # by positions of columns
    data = pd.read_csv('data/trial.csv', index_col=[0, 1])

Or:

    # by names of columns
    data = pd.read_csv('data/trial.csv', index_col=['dt', 'st'])
    print(data)

                        close  volume
    dt       st
    20100101 000001.sz      1   10000
             000002.sz     10   50000
             000003.sz      5    1000
             000004.sz     15    7000
             000005.sz    100  100000
    20100102 000001.sz      2   20000
             000002.sz     20   60000
             000003.sz      6    2000
             000004.sz     20    8000
             000005.sz    110  110000
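A minimal sketch of the index_col approach that runs without the original file. The CSV_TEXT string is a hypothetical in-memory stand-in for data/trial.csv, holding only a few of the rows shown in the question:

```python
import io

import pandas as pd

# Hypothetical stand-in for data/trial.csv (a subset of the question's rows).
CSV_TEXT = """dt,st,close,volume
20100101,000001.sz,1,10000
20100101,000002.sz,10,50000
20100102,000001.sz,2,20000
20100102,000002.sz,20,60000
"""

# index_col builds the two-level MultiIndex while the file is being parsed,
# so the values stay attached to their (dt, st) labels.
data = pd.read_csv(io.StringIO(CSV_TEXT), index_col=["dt", "st"])

print(data)
print(data.index.names)

# Rows can then be selected with a (dt, st) tuple.
print(data.loc[(20100102, "000002.sz"), "close"])
```

Because the index is created during parsing, no separate reindexing step is involved and no NaN values appear.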
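The problem is in the DataFrame constructor:

    df = pd.DataFrame(data, idx, col)

The DataFrame called data has a RangeIndex (0, 1, 2, ...), and when a DataFrame is passed to the constructor together with a new index, pandas aligns the existing rows to that index by label. None of the RangeIndex labels match the (dt, st) tuples of the new MultiIndex, so every cell comes back as NaN.

If each dt always has the same st values, in the same row order as the product index, a possible solution is to filter the DataFrame by column names and then convert it to a NumPy array, which drops the old index so the values are assigned by position. The index_col and set_index solutions are better, though:

    df = pd.DataFrame(data[col].values, idx, col)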
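The following sketch reproduces the alignment behaviour on a small frame and compares it with set_index, which the answer names but does not show. The four-row frame is a hypothetical reduction of the question's data; the variable names all_nan, fixed, and by_position are illustrative only:

```python
import pandas as pd

# Hypothetical four-row reduction of the data in the question.
data = pd.DataFrame({
    "dt": [20100101, 20100101, 20100102, 20100102],
    "st": ["000001.sz", "000002.sz", "000001.sz", "000002.sz"],
    "close": [1, 10, 2, 20],
    "volume": [10000, 50000, 20000, 60000],
})

idx = pd.MultiIndex.from_product(
    [data["dt"].unique(), data["st"].unique()], names=["dt", "st"]
)
col = ["close", "volume"]

# Passing a DataFrame plus a new index aligns by label: the labels 0..3
# never match the (dt, st) tuples, so every cell is NaN.
all_nan = pd.DataFrame(data, idx, col)
print(all_nan)

# set_index moves the existing columns into the index, keeping each row's values.
fixed = data.set_index(["dt", "st"])[col]
print(fixed)

# The .values workaround assigns by position instead of by label. It is only
# correct when the row order matches the order of the product index.
by_position = pd.DataFrame(data[col].values, idx, col)
print(by_position)
```

set_index is the safer of the two fixes because it does not depend on row order or on every dt having the same set of st values.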