Summary of Python Methods for Counting English Word Occurrences in a Plain-Text File [Tested]
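This article shows several ways to count how many times each English word occurs in a plain-text file with Python, shared here for reference. The details follow.

Version 1: inefficient (the file is scanned one character at a time in a Python loop)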
#!python3
# -*- coding:utf-8 -*-
path = 'test.txt'
with open(path, encoding='utf-8', newline='') as f:
    word = []
    words_dict = {}
    for letter in f.read():
        if letter.isalnum():
            word.append(letter)
        elif letter.isspace():  # whitespace: space, \t, \n
            if word:
                word = ''.join(word).lower()  # convert to lowercase
                if word not in words_dict:
                    words_dict[word] = 1
                else:
                    words_dict[word] += 1
                word = []
        # any other character (punctuation) is skipped without ending the word
    # handle the last word (the file may not end with whitespace)
    if word:
        word = ''.join(word).lower()  # convert to lowercase
        if word not in words_dict:
            words_dict[word] = 1
        else:
            words_dict[word] += 1
        word = []
for k, v in words_dict.items():
    print(k, v)
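To make Version 1's tokenization rule easy to try without a file, here is a minimal sketch of the same character scan wrapped in a function that takes a string. The function name `count_words_chars` and the sample sentence are hypothetical. It shows one side effect of the rule: punctuation is dropped without ending the word, so "Don't" is counted as "dont".

```python
def count_words_chars(text):
    """Count words with the same character-scan rule as Version 1.

    Letters and digits extend the current word, whitespace ends it,
    and any other character (punctuation) is skipped without ending it.
    """
    counts = {}
    word = []
    for ch in text:
        if ch.isalnum():
            word.append(ch)
        elif ch.isspace():
            if word:
                w = ''.join(word).lower()
                counts[w] = counts.get(w, 0) + 1  # dict.get avoids the if/else
                word = []
    # flush the final word when the text does not end with whitespace
    if word:
        w = ''.join(word).lower()
        counts[w] = counts.get(w, 0) + 1
    return counts


print(count_words_chars("Hello, hello world! Don't stop."))
# {'hello': 2, 'world': 1, 'dont': 1, 'stop': 1}
```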
Version 2: drawback: the whole file has to be read into memory at once, so performance suffers on large files.
#!python3
# -*- coding:utf-8 -*-
import re
path = 'test.txt'
with open(path, 'r', encoding='utf-8') as f:
    data = f.read()
word_reg = re.compile(r'\w+')
# word_reg = re.compile(r'\w+\b')
word_list = word_reg.findall(data)
word_list = [word.lower() for word in word_list]  # convert to lowercase
word_set = set(word_list)  # avoid counting the same word more than once
# words_dict = {}
# for word in word_set:
#     words_dict[word] = word_list.count(word)
# more concise form:
words_dict = {word: word_list.count(word) for word in word_set}
for k, v in words_dict.items():
    print(k, v)
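Version 2's dictionary comprehension calls `word_list.count()` once per distinct word, and each call scans the entire list, so the work grows with (number of words) × (number of distinct words). A minimal single-pass sketch with the same `\w+` pattern, assuming the text is already a `str`; the name `count_words_regex` is hypothetical. Using `finditer` also avoids building the intermediate word list. Note that in Python 3 `\w` matches Unicode letters, digits and the underscore, so "don't" is split into "don" and "t".

```python
import re

WORD_RE = re.compile(r'\w+')  # same pattern as Version 2


def count_words_regex(text):
    """Count lowercased \\w+ matches in one pass instead of list.count() per word."""
    counts = {}
    for match in WORD_RE.finditer(text):
        w = match.group().lower()
        counts[w] = counts.get(w, 0) + 1
    return counts


print(count_words_regex("The cat and the hat. THE end"))
# {'the': 3, 'cat': 1, 'and': 1, 'hat': 1, 'end': 1}
```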
Version 3: reading the file line by line
#!python3
# -*- coding:utf-8 -*-
import re
path = 'test.txt'
with open(path, encoding='utf-8') as f:
    word_list = []
    word_reg = re.compile(r'\w+')
    for line in f:
        # line_words = word_reg.findall(line)
        # simpler than the regex above (but keeps punctuation attached and does not lowercase)
        line_words = line.split()
        word_list.extend(line_words)
word_set = set(word_list)  # avoid counting the same word more than once
words_dict = {word: word_list.count(word) for word in word_set}
for k, v in words_dict.items():
    print(k, v)
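Version 3 calls `line.split()` simpler than the regex, but the two do not produce the same words. A small comparison on a made-up sample line (the variable `line` is hypothetical):

```python
import re

line = "Hello, world! hello"

# str.split() breaks only on whitespace: punctuation stays attached and case is kept
print(line.split())
# ['Hello,', 'world!', 'hello']

# \w+ keeps only runs of word characters (letters, digits, underscore)
print(re.findall(r'\w+', line))
# ['Hello', 'world', 'hello']

# lowercasing afterwards merges "Hello" and "hello" into one word
print([w.lower() for w in re.findall(r'\w+', line)])
# ['hello', 'world', 'hello']
```

With `split()`, "Hello," and "hello" end up as two different keys in the count, which may or may not be what you want.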
Version 4: using collections.Counter
#!python3
# -*- coding:utf-8 -*-
import collections
path = 'test.txt'
with open(path, encoding='utf-8') as f:
    word_list = []
    for line in f:
        line_words = line.split()
        word_list.extend(line_words)
words_dict = dict(collections.Counter(word_list))  # count with Counter
for k, v in words_dict.items():
    print(k, v)
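Version 4 still collects every word into `word_list` before handing it to `Counter`. A sketch that instead feeds each line straight into `Counter.update()`, so memory grows with the number of distinct words rather than the total word count, and then uses `most_common()` to list the most frequent words. `count_words_stream` is a hypothetical name, `io.StringIO` stands in for the real `test.txt`, and lowercasing plus the `\w+` pattern are assumptions about the desired tokenization, not part of Version 4.

```python
import collections
import io
import re

WORD_RE = re.compile(r'\w+')


def count_words_stream(lines):
    """Count lowercased words from any iterable of lines, one line at a time."""
    counter = collections.Counter()
    for line in lines:
        counter.update(w.lower() for w in WORD_RE.findall(line))
    return counter


# io.StringIO stands in for open('test.txt', encoding='utf-8') in this sketch
sample = io.StringIO("the cat\nThe dog\nthe end\n")
counts = count_words_stream(sample)
for word, n in counts.most_common(2):  # the two most frequent words
    print(word, n)
```

Because a file object yields one line at a time, passing `open('test.txt', encoding='utf-8')` directly should work the same way.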
Note: all four versions read their input from the test file test.txt.
