
Finding the most common words on a website in Python 3

Published: 2020-12-20 12:24:52 | Section: Python | Source: compiled from the web
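I need to use Python 3 code to find and copy the words that appear more than 5 times on a given website, and I don't know how to do it. I've looked through the Stack Overflow archives, but the other solutions rely on Python 2 code. Here is the little code I have so far: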

from urllib.request import urlopen
website = urlopen("http://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart")
html = website.read().decode("utf-8")  # page source as a str

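Does anyone have any suggestions on how to do this? I have NLTK installed, and I've looked at Beautiful Soup, but for the life of me I can't figure out how to install it correctly (I'm very green at Python)! Since I'm still learning, any explanation would also be much appreciated. Thanks :)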

Solution

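This isn't perfect, but it should give you an idea of how to get started with requests, BeautifulSoup and collections.Counter. Both third-party packages can be installed with: python3 -m pip install requests beautifulsoup4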

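import requests
from bs4 import BeautifulSoup
from collections import Counter
from string import punctuation

r = requests.get("http://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart")
r.raise_for_status()  # stop early on HTTP errors such as 403 or 404

# Name the parser explicitly; bs4 warns when none is given.
soup = BeautifulSoup(r.content, "html.parser")

# Text of every <p> element, one string per paragraph.
text = (p.get_text() for p in soup.find_all('p'))

# Split on whitespace, strip trailing punctuation, lowercase, and count.
c = Counter(x.rstrip(punctuation).lower() for y in text for x in y.split())
print(c.most_common())  # prints words starting with the most common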
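One weakness of split() plus rstrip(punctuation) is that leading punctuation survives, so a token such as "(mozart" is counted separately from "mozart". A regular-expression tokenizer is one alternative (a swapped-in technique, not part of the original answer). This is a minimal sketch; the names WORD_RE, tokenize and count_words are hypothetical.

```python
import re
from collections import Counter

# \w matches Unicode letters, digits and "_"; apostrophes are kept so
# that possessives such as "mozart's" stay one token.
WORD_RE = re.compile(r"[\w']+")


def tokenize(text):
    """Lowercase text and return its word tokens without stray quotes."""
    tokens = (t.strip("'") for t in WORD_RE.findall(text.lower()))
    return [t for t in tokens if t]


def count_words(paragraphs):
    """Count tokens across an iterable of paragraph strings."""
    return Counter(t for p in paragraphs for t in tokenize(p))
```

With the soup from the answer, c = count_words(p.get_text() for p in soup.find_all('p')) would replace the Counter line.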

[('the', 279), ('and', 192), ('in', 175), ('of', 168), ('his', 140), ('a', 124), ('to', 103), ('mozart', 82), ('was', 77), ('he', 70), ('with', 53), ('as', 50), ('for', 40), ("mozart's", 39), ('on', 35), ('from', 34), ('at', 31), ('by', ...), ('that', 26), ('is', 23), ('k.', 21), ('an', 20), ('had', ...), ('were', ...), ('but', 19), ('which', ...), ...]
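The top of this ranking is dominated by function words such as "the", "and" and "of". A common refinement, not part of the original answer, is to drop stopwords before ranking. The sketch below uses a small hand-picked STOPWORDS set, which is hypothetical; since the question mentions NLTK, one could instead pass set(stopwords.words("english")) from nltk.corpus after running nltk.download("stopwords").

```python
from collections import Counter

# Hypothetical, deliberately small stopword list for illustration.
STOPWORDS = {"the", "and", "in", "of", "his", "a", "to", "was", "he", "with",
             "as", "for", "on", "from", "at", "by", "that", "is", "an", "had",
             "were", "but", "which"}


def without_stopwords(counts, stopwords=STOPWORDS):
    """Return a new Counter with the given stopwords removed."""
    return Counter({w: n for w, n in counts.items() if w not in stopwords})
```

Calling without_stopwords(c).most_common() then ranks the remaining content words.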

print ([x for x in c if c.get(x) > 5]) # words appearing more than 5 times
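The comprehension above yields the words in the order the Counter first saw them. If the frequent words should come out ranked, one option is to filter most_common() instead. In this sketch, frequent_words and its min_count parameter are hypothetical names; > matches the question's "more than 5 times", and >= would mean "5 or more".

```python
from collections import Counter


def frequent_words(counts, min_count=5):
    """Words seen more than min_count times, most frequent first."""
    return [w for w, n in counts.most_common() if n > min_count]
```

For example, frequent_words(c) on the Counter from the answer gives the same words as the comprehension, sorted by count.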

['there', 'but', 'both', 'wife', 'for', 'musical', 'salzburg', 'it', 'more', 'first', 'this', 'symphony', 'wrote', 'one', 'during', 'mozart', 'vienna', 'joseph', 'in', 'later', 'salzburg,', 'other', 'such', 'last', 'needed]', 'only', 'their', 'including', 'by', 'music,', 'at', "mozart's", 'mannheim,', 'composer', 'and', 'are', 'became', 'four', 'premiered', 'time', 'did', 'the', 'not', 'often', 'is', 'have', 'began', 'some', 'success', 'court', 'that', 'performed', 'work', 'him', 'leopold', 'these', 'while', 'been', 'new', 'most', 'were', 'father', 'opera', 'as', 'who', 'classical', 'k.', 'to', 'of', 'has', 'many', 'was', 'works', 'which', 'early', 'three', 'family', 'on', 'a', 'when', 'had', 'december', 'after', 'he', 'no.', 'year', 'from', 'great', 'period', 'music', 'with', 'his', 'composed', 'minor', 'two', 'number', '1782', 'an', 'piano']
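Since the question mentions trouble installing Beautiful Soup, here is a minimal standard-library-only sketch of the same idea: urllib.request fetches the page, html.parser.HTMLParser collects the text inside <p> tags, and collections.Counter counts the words. The names ParagraphTextParser, count_paragraph_words and fetch_html, and the User-Agent string, are hypothetical; the sketch assumes the page is UTF-8 encoded and closes its <p> tags explicitly.

```python
from collections import Counter
from html.parser import HTMLParser
from string import punctuation
from urllib.request import Request, urlopen


class ParagraphTextParser(HTMLParser):
    """Collect the text that appears inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of currently open <p> tags
        self.chunks = []  # text fragments found inside <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1
            self.chunks.append(" ")  # keep paragraphs from running together

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def count_paragraph_words(html):
    """Return a Counter of lowercased words found in <p> text."""
    parser = ParagraphTextParser()
    parser.feed(html)
    parser.close()
    words = "".join(parser.chunks).split()
    # strip() removes punctuation from both ends, unlike rstrip().
    cleaned = (w.strip(punctuation).lower() for w in words)
    return Counter(w for w in cleaned if w)


def fetch_html(url):
    """Download a page, assuming it is UTF-8 encoded."""
    # Hypothetical User-Agent string; some sites reject urllib's default.
    req = Request(url, headers={"User-Agent": "word-count-example/0.1"})
    with urlopen(req) as response:
        return response.read().decode("utf-8")


# Usage (requires network access):
# counts = count_paragraph_words(
#     fetch_html("http://en.wikipedia.org/wiki/Wolfgang_Amadeus_Mozart"))
# print([w for w, n in counts.items() if n > 5])
```

HTMLParser is far less forgiving of malformed markup than Beautiful Soup, which is one reason the answer reaches for bs4.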

(Editor: 李大同)
