python – Finding the most frequent terms in a Scikit-learn classifier

Published: 2020-12-20 12:30:11 Category: Python Source: compiled from the web


See the English answer > List the words in a vocabulary according to occurrence in a text corpus, Scikit-Learn (2 answers)
I am following the example in the Scikit-learn docs, where CountVectorizer is applied to a dataset.

Question: count_vect.vocabulary_.viewitems() lists all the terms and their frequencies. How do you sort them by number of occurrences?

sorted(count_vect.vocabulary_.viewitems()) does not seem to work.

Solution

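vocabulary_.viewitems() (items() in Python 3) does not actually list terms and their frequencies; it lists the mapping from each term to its column index. That is why sorting it orders the terms by name or index rather than by count. The frequencies are returned, per document, by the fit_transform method as a sparse matrix (COO in the release the answer was written for; recent releases return CSR), where rows are documents and columns are words, and the column indices map to words through the vocabulary. For example, you can get the total frequencies with: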

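import numpy as np
matrix = count_vect.fit_transform(doc_list)
# column sums give total counts; flatten the 1 x N result to a 1-D array
totals = np.asarray(matrix.sum(axis=0)).ravel()
# get_feature_names_out() needs scikit-learn >= 1.0 (older releases: get_feature_names())
freqs = zip(count_vect.get_feature_names_out(), totals)
# sort from largest to smallest
print(sorted(freqs, key=lambda x: -x[1]))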
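
To make the difference between the term-to-index mapping and the actual counts concrete, here is a minimal end-to-end sketch. The three-sentence doc_list is a hypothetical toy corpus that does not come from the original question, and the names totals, term_counts and ranked are chosen only for illustration. The sketch looks the counts up through vocabulary_ itself, so it should not depend on which feature-name method a given scikit-learn release provides.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, used only for illustration
doc_list = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

count_vect = CountVectorizer()
matrix = count_vect.fit_transform(doc_list)  # rows: documents, columns: terms

# vocabulary_ maps each term to its column index, not to a count
vocab = count_vect.vocabulary_

# Sum every column to get the corpus-wide count of each term,
# then flatten the 1 x n_terms result into a 1-D array
totals = np.asarray(matrix.sum(axis=0)).ravel()

# Pair each term with the total found at its column index
term_counts = {term: int(totals[idx]) for term, idx in vocab.items()}

# Sort by count, largest first; ties are broken alphabetically
ranked = sorted(term_counts.items(), key=lambda kv: (-kv[1], kv[0]))

for term, count in ranked[:5]:
    print(term, count)
```

With the default CountVectorizer settings assumed here (lowercasing, no stop-word removal), "the" ranks first in this toy corpus. Sorting on the (-count, term) pair keeps the order deterministic when several terms share the same count.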

(Editor: 李大同)
