bash – How do I create a frequency list of every word in a file?
Published: 2020-12-15 19:10:22 | Category: Security | Source: compiled from the web
I have a file like this:

    This is a file with many words.
    Some of the words appear more than once.
    Some of the words only appear one time.

I want to generate a two-column list. The first column shows the word and the second column shows how often it appears, for example:

    this@1
    is@1
    a@1
    file@1
    with@1
    many@1
    words@3
    some@2
    of@2
    the@2
    only@1
    appear@2
    more@1
    than@1
    one@1
    once@1
    time@1

To make this simpler, I will remove all punctuation and change all text to lowercase before processing the list.

So far, I have this:

    sed -i "s/ /\n/g" ./file1.txt # put all words on a new line
    while read line
    do
        count="$(grep -c $line file1.txt)"
        echo $line"@"$count >> file2.txt # add word and frequency to file
    done < ./file1.txt
    sort -u -d # remove duplicate lines

For some reason, this only shows "0" after each word.

How can I generate a list of every word that appears in the file, along with its frequency?
Not sed and grep, but tr, sort, uniq, and awk:
    % (tr ' ' '\n' | sort | uniq -c | awk '{print $2"@"$1}') <<EOF
    This is a file with many words.
    Some of the words appear more than once.
    Some of the words only appear one time.
    EOF
    a@1
    appear@2
    file@1
    is@1
    many@1
    more@1
    of@2
    once.@1
    one@1
    only@1
    Some@2
    than@1
    the@2
    This@1
    time.@1
    with@1
    words@2
    words.@1
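
The pipeline in the answer keeps case and punctuation, which is why "words" and "words." are counted separately in its output. The question says the text should be lowercased and stripped of punctuation first, so a minimal sketch of that variant might look like the following. The function name `word_freq` is hypothetical and is not part of the original answer; the extra `tr` and `sed` stages are an assumption about how to do the cleanup, not the answerer's method.

```shell
#!/usr/bin/env bash
# word_freq (hypothetical helper): reads text on stdin and prints one
# "word@count" line per distinct word, sorted alphabetically.
word_freq() {
  tr '[:upper:]' '[:lower:]' |   # fold everything to lowercase
    tr -d '[:punct:]' |          # delete punctuation characters
    tr -s '[:space:]' '\n' |     # put each word on its own line
    sed '/^$/d' |                # drop any empty lines
    sort |                       # group identical words together
    uniq -c |                    # prefix each word with its count
    awk '{print $2 "@" $1}'      # reorder to word@count
}

# Example usage (file1.txt is the file name used in the question):
#   word_freq < file1.txt > file2.txt
```

Run against the sample text from the question, this should report `words@3` and `some@2`, matching the counts the question asks for. Unlike the `sed -i` attempt, it reads the input file without modifying it.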