Python Sklearn – RandomForest and missing values
Published: 2020-12-16 21:28:39   Category: Python   Source: compiled from the web
I'm trying to run a RandomForest on a dataset that contains missing values.
My dataset looks like this:

```
train_data = [['1' 'NaN' 'NaN' '0.0127034' '0.0435092']
 ['1' 'NaN' 'NaN' '0.0113187' '0.228205']
 ['1' '0.648' '0.248' '0.0142176' '0.202707']
 ...,
 ['1' '0.357' '0.470' '0.0328121' '0.255039']
 ['1' 'NaN' 'NaN' '0.00311825' '0.0381745']
 ['1' 'NaN' 'NaN' '0.0332604' '0.2857']]
```

To impute the "NaN" values, I'm using:

```python
from sklearn.preprocessing import Imputer
imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
imp.fit(train_data[0::, 1::])
new_train_data = imp.transform(train_data)
```

But I get the following error:

```
Traceback (most recent call last):
  File "./RandomForest.py", line 72, in <module>
    new_train_data = imp.transform(train_data)
  File "/home/aurore/.local/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py", line 388, in transform
    values = np.repeat(valid_statistics, n_missing)
  File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 343, in repeat
    return repeat(repeats, axis)
ValueError: a.shape[axis] != len(repeats)
```

So I did:

```python
new_train_data = imp.fit_transform(train_data)
```

and then I get this error:

```
Traceback (most recent call last):
  File "./RandomForest.py", line 82, in <module>
    forest = forest.fit(train_data[0::, 1::], train_data[0::, 0])
  File "/home/aurore/.local/lib/python2.7/site-packages/sklearn/ensemble/forest.py", line 224, in fit
    X, = check_arrays(X, dtype=DTYPE, sparse_format="dense")
  File "/home/aurore/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 283, in check_arrays
    _assert_all_finite(array)
  File "/home/aurore/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 43, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
```

Is something wrong with the package?

Solution
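To see where the first traceback comes from, here is a minimal sketch that reproduces it on the fully visible rows of `train_data` (the elided `...` rows are left out). It assumes a current scikit-learn (0.22 or later), where `sklearn.preprocessing.Imputer` no longer exists and `sklearn.impute.SimpleImputer` takes its place, with `missing_values=np.nan` and no `axis` parameter. The error message therefore differs from the Python 2.7 one above, but the cause is the same: the imputer learns statistics for 4 feature columns and is then asked to transform 5. It also assumes column 0 is the label, as the `forest.fit` call in the second traceback suggests.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # assumption: scikit-learn >= 0.22

# Only the fully visible rows from the question; the elided "..." rows are omitted.
train_data = np.array([
    ['1', 'NaN', 'NaN', '0.0127034', '0.0435092'],
    ['1', 'NaN', 'NaN', '0.0113187', '0.228205'],
    ['1', '0.648', '0.248', '0.0142176', '0.202707'],
    ['1', '0.357', '0.470', '0.0328121', '0.255039'],
    ['1', 'NaN', 'NaN', '0.00311825', '0.0381745'],
    ['1', 'NaN', 'NaN', '0.0332604', '0.2857'],
])

# The array holds strings; converting to float turns 'NaN' into np.nan.
data = train_data.astype(float)
X = data[:, 1:]  # feature columns (assumed: everything except column 0)
y = data[:, 0]   # label column (assumed from the forest.fit call)

print(np.isnan(X).sum(axis=0))  # missing values per feature column -> [4 4 0 0]

# Reproduce the first error: fit on the 4 feature columns, transform all 5.
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
imp.fit(X)
mismatch_error = None
try:
    imp.transform(data)
except ValueError as exc:
    mismatch_error = exc
    print("ValueError:", exc)
```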
You fit the imputer on columns `1::`, but then try to apply it to all columns. That doesn't work. Do this instead:

```python
new_train_data = imp.fit_transform(train_data)
```

(Editor: 李大同)
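The fix can be sketched under the same assumptions: scikit-learn 0.22 or later with `SimpleImputer` in place of `Imputer`, column 0 as the label, and a `RandomForestClassifier` standing in for the question's `forest`, whose type is not shown. Note also that the second traceback shows `forest.fit` still being called on `train_data` rather than `new_train_data`, so the imputed array never reaches the forest; the sketch passes the imputed features instead. Unlike the one-liner above, it fits the imputer on the feature columns only, which keeps the labels out of the imputation step.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier  # assumption: classifier, not regressor
from sklearn.impute import SimpleImputer             # assumption: scikit-learn >= 0.22
from sklearn.pipeline import make_pipeline

# Fully visible rows from the question, converted from strings to floats.
train_data = np.array([
    ['1', 'NaN', 'NaN', '0.0127034', '0.0435092'],
    ['1', 'NaN', 'NaN', '0.0113187', '0.228205'],
    ['1', '0.648', '0.248', '0.0142176', '0.202707'],
    ['1', '0.357', '0.470', '0.0328121', '0.255039'],
    ['1', 'NaN', 'NaN', '0.00311825', '0.0381745'],
    ['1', 'NaN', 'NaN', '0.0332604', '0.2857'],
]).astype(float)

X = train_data[:, 1:]  # features
y = train_data[:, 0]   # labels

# fit_transform learns the column means and fills the NaNs on the same columns.
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
new_X = imp.fit_transform(X)

# Train on the imputed array, not on the original train_data.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(new_X, y)

# Alternative: chain both steps so imputation always uses the columns it was fitted on.
model = make_pipeline(
    SimpleImputer(missing_values=np.nan, strategy='mean'),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X, y)
predictions = model.predict(X)  # NaNs in X are imputed inside the pipeline
```

Wrapping both steps in a pipeline ties the imputer's fit and transform to the same feature columns, which is exactly the mismatch behind the first error, and it also applies the same learned means to any data passed to `predict` later.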