python – AttributeError: 'numpy.ndarray' object has no attribute 'toarray'
Published: 2020-12-20 13:38:38  Category: Python  Source: compiled from the web
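I'm extracting features from a text corpus using scikit-learn's tf-idf vectorizer (TfidfVectorizer) and truncated singular value decomposition (TruncatedSVD). However, the algorithms I want to try require dense matrices, and the vectorizer returns sparse matrices, so I need to convert these matrices to dense arrays. But whenever I try to convert them, I get an error telling me that my numpy array object has no attribute "toarray". What am I doing wrong?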
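For reference, the premise of the question can be checked with a small sketch. It assumes a recent scikit-learn and uses a made-up three-document corpus: the `TfidfVectorizer` output is a SciPy sparse matrix, `.toarray()` turns it into a plain `numpy.ndarray`, and that ndarray has no `toarray` method of its own.

```python
# Minimal sketch: what TfidfVectorizer returns, and how .toarray() densifies it.
# The three-document corpus below is made up for illustration.
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]

vectorizer = TfidfVectorizer()
X_sparse = vectorizer.fit_transform(corpus)  # SciPy sparse (CSR) matrix
X_dense = X_sparse.toarray()                 # plain numpy.ndarray

# An ndarray has no .toarray() method, which is exactly the AttributeError above.
print(sp.issparse(X_sparse), type(X_dense).__name__, X_dense.shape)
# prints: True ndarray (3, 10)
```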
Function:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

def feature_extraction(train, train_test, test_set):
    vectorizer = TfidfVectorizer(min_df=3, strip_accents="unicode", analyzer="word",
                                 token_pattern=r'\w{1,}', ngram_range=(1, 2))
    print("fitting Vectorizer")
    vectorizer.fit(train)

    print("transforming text")
    train = vectorizer.transform(train)
    train_test = vectorizer.transform(train_test)
    test_set = vectorizer.transform(test_set)

    print("Dimensionality reduction")
    svd = TruncatedSVD(n_components=100)
    svd.fit(train)
    train = svd.transform(train)
    train_test = svd.transform(train_test)
    test_set = svd.transform(test_set)

    print("convert to dense array")
    # This is where the AttributeError below is raised
    train = train.toarray()
    test_set = test_set.toarray()
    train_test = train_test.toarray()

    print(train.shape)
    return train, train_test, test_set
```

Traceback:

```
Traceback (most recent call last):
  File "C:\Users\Anonymous\workspace\final_submission\src\linearSVM.py", line 24, in <module>
    x_train,x_test,test_set = feature_extraction(x_train,test_set)
  File "C:\Users\Anonymous\workspace\final_submission\src\Preprocessing.py", line 57, in feature_extraction
    train = train.toarray()
AttributeError: 'numpy.ndarray' object has no attribute 'toarray'
```

Update:

```
Traceback (most recent call last):
  File "C:\Users\Anonymous\workspace\final_submission\src\linearSVM.py", line 28, in <module>
    result = bayesian_ridge(x_train,y_train,y_test,test_set)
  File "C:\Users\Anonymous\workspace\final_submission\src\Algorithms.py", line 84, in bayesian_ridge
    algo = algo.fit(x_train,y_train[:,i])
  File "C:\Python27\lib\site-packages\sklearn\linear_model\bayes.py", line 136, in fit
    dtype=np.float)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 220, in check_arrays
    raise TypeError('A sparse matrix was passed, but dense '
TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.
```

Can anyone explain this?

UPDATE2

As requested, I'm providing all the code involved. Since it is spread across several files, I'll post it step by step. For clarity, I'm keeping all the module imports.
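This is how I preprocess the data:

```python
import os
import re

import nltk
import numpy as np
import pandas as p

# porter_stemmer and dir (the data directory) are defined elsewhere
# in the author's code and are not shown in the question.

def regexp(data):
    # Replace every run of non-word characters or underscores with one space
    for row in range(len(data)):
        data[row] = re.sub(r'[\W_]+', " ", data[row])
    return data

def clean_the_text(data):
    # Tokenize, lowercase, and strip trailing newlines from each token
    alist = []
    data = nltk.word_tokenize(data)
    for j in data:
        j = j.lower()
        alist.append(j.rstrip('\n'))
    alist = " ".join(alist)
    return alist

def loop_data(data):
    for i in range(len(data)):
        data[i] = clean_the_text(data[i])
    return data

if __name__ == "__main__":
    print("loading train")
    train_text = porter_stemmer(loop_data(regexp(list(np.array(p.read_csv(os.path.join(dir, "train.csv")))[:, 1]))))
    print("loading test_set")
    test_set = porter_stemmer(loop_data(regexp(list(np.array(p.read_csv(os.path.join(dir, "test.csv")))[:, 1]))))
```

After splitting train_set into x_train and x_test for cross-validation, I transform the data with the feature_extraction function above:

```python
x_train, x_test, test_set = feature_extraction(x_train, x_test, test_set)
```

Finally, I feed them into my algorithm:

```python
from sklearn import linear_model

def bayesian_ridge(x_train, x_test, y_train, y_test, test_set):
    result = []
    algo = linear_model.BayesianRidge()
    algo = algo.fit(x_train, y_train)
    pred = algo.predict(x_test)
    error = pred - y_test
    result.append(algo.predict(test_set))
    # cross_val is the author's own helper (not shown in the question)
    print("Bayes_error: ", cross_val(error))
    return result
```

Solution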
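TruncatedSVD.transform returns a dense numpy array, not a sparse matrix, so the `.toarray()` calls after the SVD step fail and can simply be removed. In fact, in the scikit-learn version current at the time, only the vectorizers return sparse matrices. The second traceback ("A sparse matrix was passed, but dense data is required") points to the opposite situation: whatever reached BayesianRidge.fit was still a sparse matrix, presumably the vectorizer output without the SVD step, and that is where `.toarray()` belongs.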
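A minimal sketch of the corrected pipeline follows. It assumes a recent scikit-learn and uses a made-up four-document corpus in place of the author's CSV data. It keeps the question's `feature_extraction` logic, drops the `.toarray()` calls after `svd.transform`, and returns all three arrays. `ensure_dense` is a hypothetical helper (not a scikit-learn API) that calls `.toarray()` only when its input really is sparse, which covers the BayesianRidge error from the update.

```python
# Minimal sketch of the fix, assuming a recent scikit-learn; the toy corpus and the
# ensure_dense helper are illustrative assumptions, not part of the original code.
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import BayesianRidge


def ensure_dense(X):
    """Hypothetical helper: densify X only if it is a SciPy sparse matrix."""
    return X.toarray() if sp.issparse(X) else np.asarray(X)


def feature_extraction(train, train_test, test_set, min_df=3, n_components=100):
    """Tf-idf features reduced with TruncatedSVD; returns three dense arrays."""
    vectorizer = TfidfVectorizer(min_df=min_df, strip_accents="unicode",
                                 analyzer="word", token_pattern=r"\w{1,}",
                                 ngram_range=(1, 2))
    vectorizer.fit(train)

    # Vectorizer output: SciPy sparse matrices.
    train = vectorizer.transform(train)
    train_test = vectorizer.transform(train_test)
    test_set = vectorizer.transform(test_set)

    svd = TruncatedSVD(n_components=n_components, random_state=0)
    svd.fit(train)

    # TruncatedSVD.transform already returns dense numpy.ndarray objects,
    # so the original .toarray() calls are simply dropped.
    train = svd.transform(train)
    train_test = svd.transform(train_test)
    test_set = svd.transform(test_set)
    return train, train_test, test_set


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "the dog sat on the log",
            "cats and dogs play", "a cat and a dog"]
    # min_df and n_components are lowered so the toy corpus is large enough.
    tr, tt, ts = feature_extraction(docs, docs[:2], docs[2:],
                                    min_df=1, n_components=2)
    print(type(tr).__name__, tr.shape, tt.shape, ts.shape)

    # Without the SVD step the features stay sparse; densify them before
    # BayesianRidge, which expects dense input (hence the TypeError in the update).
    X_sparse = TfidfVectorizer().fit_transform(docs)
    y = np.array([1.0, 2.0, 3.0, 4.0])
    model = BayesianRidge().fit(ensure_dense(X_sparse), y)
    print(model.predict(ensure_dense(X_sparse[:1])).shape)
```

Checking `scipy.sparse.issparse` instead of calling `.toarray()` unconditionally means the same code works whether or not the SVD step runs.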
(Editor: 李大同)