
Multi-label tweet classification with Python and NLTK

Published: 2020-12-20 13:09:10 | Category: Python | Source: compiled from the web
I have about 300k tweets, each of which has either no labels or up to four labels. For example:

1.] "I really sci-fi documentaries and movies" ; ["science","movies"]
2.] "The international politics scene is getting dirty"; ["politics"]
3.] "I dont know what to say"; [null]
4.] "I dont have any interest in national political debates on tv,I'd rather watch science shows like cosmos or sports like soccer,baseball"; ["sports","science","politics"]
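One way to hold such data in Python, as a minimal sketch: each tweet becomes a (text, set of labels) pair, with an empty set standing in for [null], and each label set can be turned into a 0/1 vector. The names `tweets`, `ALL_LABELS`, and `to_indicator` are illustrative assumptions and are not part of the original question.

```python
# Hypothetical in-memory representation of the example tweets.
# An empty set stands in for the "[null]" (no label) case.
tweets = [
    ("I really sci-fi documentaries and movies", {"science", "movies"}),
    ("The international politics scene is getting dirty", {"politics"}),
    ("I dont know what to say", set()),
    ("I dont have any interest in national political debates on tv,"
     "I'd rather watch science shows like cosmos or sports like soccer,baseball",
     {"sports", "science", "politics"}),
]

# Fixed label order so every tweet maps to a vector of the same length.
ALL_LABELS = sorted(set().union(*(labels for _, labels in tweets)))


def to_indicator(labels):
    """Turn a set of labels into a 0/1 vector aligned with ALL_LABELS."""
    return [1 if name in labels else 0 for name in ALL_LABELS]


# One row per tweet, one column per label.
indicator_rows = [to_indicator(labels) for _, labels in tweets]
```

With this layout, an unlabeled tweet is simply an all-zero row, so it needs no special "null" class.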

So far I have been using Naive Bayes, and during training I used only one label per tweet (rather than multiple labels):

1.] "I really sci-fi documentaries and movies" ; ["science"]
2.] "The international politics scene is getting dirty"; ["politics"]
3.] "I dont know what to say"; [null]
4.] "I dont have any interest in national political debates on tv,I'd rather watch science shows like cosmos or sports like soccer,baseball"; ["politics"]

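But as you can see, what I actually want is multi-label classification. I started with Naive Bayes because I found a great tutorial that was easy to follow, but I cannot find a Python tutorial anywhere that addresses my actual multi-label problem. All I can find are research papers or suggestions of algorithms (KNN, Multinomial NB, etc.). Can someone help me out?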

Solution

You can try the following perceptron-style approach, though it is still naive.

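# Initialize a weight matrix of size N x M; N is the number of classes and M the number of features.
# label is the set of label(s) associated with a tweet.
# extractFeatures and perceptron are placeholders that you have to write yourself.
for tweet, label in tweets:
    # Feature extraction function: tweet text -> feature vector of length M.
    features = extractFeatures(tweet)
    # Simple predict function that implements arg max over the dot product
    # of each class's weight row with the feature vector.
    predictions = perceptron.predict(features)
    for prediction in predictions:
        # Use a simple additive procedure to move the decision boundary by -1, +1.
        if prediction not in label:
            # Subtract the features from the weights of the wrongly predicted class.
            perceptron.weights[prediction] -= features
            # Add the features to the weights of each correct class in label.
            for correct in label:
                perceptron.weights[correct] += features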
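The idea above can be made runnable in a few lines. The following is only a sketch under stated assumptions, not the answerer's exact method: `extract_features` is an assumed bag-of-words tokenizer, `MultiLabelPerceptron` is a hypothetical class name, and instead of a single arg max it predicts every label whose score is positive, so that a tweet can receive anywhere from zero to several labels. It also adds weight for labels that were missed, not only when a wrong label was predicted.

```python
from collections import defaultdict


def extract_features(tweet):
    """Assumed bag-of-words features: lowercase token -> count."""
    features = defaultdict(int)
    for token in tweet.lower().replace(",", " ").split():
        features[token] += 1
    features["<bias>"] = 1  # constant feature so each label learns a bias
    return dict(features)


class MultiLabelPerceptron:
    """One sparse weight vector per label (the rows of the N x M matrix)."""

    def __init__(self, labels):
        self.weights = {label: defaultdict(float) for label in labels}

    def score(self, label, features):
        # Dot product between the label's weights and the feature counts.
        w = self.weights[label]
        return sum(w.get(name, 0.0) * value for name, value in features.items())

    def predict(self, features):
        # Assumption: a label is predicted when its score is positive.
        return {label for label in self.weights if self.score(label, features) > 0}

    def train(self, data, epochs=10):
        for _ in range(epochs):
            for tweet, gold in data:
                features = extract_features(tweet)
                predicted = self.predict(features)
                for label in predicted - gold:  # wrongly predicted: subtract
                    for name, value in features.items():
                        self.weights[label][name] -= value
                for label in gold - predicted:  # missed: add
                    for name, value in features.items():
                        self.weights[label][name] += value


# The four example tweets from the question; an empty set means "no label".
training_data = [
    ("I really sci-fi documentaries and movies", {"science", "movies"}),
    ("The international politics scene is getting dirty", {"politics"}),
    ("I dont know what to say", set()),
    ("I dont have any interest in national political debates on tv,"
     "I'd rather watch science shows like cosmos or sports like soccer,baseball",
     {"sports", "science", "politics"}),
]

model = MultiLabelPerceptron({"science", "movies", "politics", "sports"})
model.train(training_data)
```

After training, `model.predict(extract_features(text))` returns a set of labels, which may be empty. Four tweets are far too few to generalize; on the real 300k tweets you would hold out a test set and likely need better features than raw token counts.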

(Editor: 李大同)

