李大同 (https://www.lidatong.com.cn/)


Published: 2020-12-15 21:46:43 | Category: Big Data | Source: compiled from the web
I'm new to Perl, and for one of my assignments I came up with this solution:
    #wordcount.pl FILE

    #if no filename is given, print help and exit
    if (length($ARGV[0]) < 1)
    {
        print "Usage is : words.pl word filename\n";
        exit;
    }

    my $file = $ARGV[0];          #filename given in commandline

    open(FILE, $file) or die "Cannot open $file: $!\n";   #open the mentioned filename
    while (<FILE>)                #continue reading until the file ends
    {
        tr/A-Z/a-z/;              #convert all upper case words to lower case
        tr/.,:;!?"(){}//d;        #remove some common punctuation symbols
        #We are creating a hash with the word as the key.
        #Each time a word is encountered, its hash is incremented by 1.
        #If the count for a word is 1, it is a new distinct word.
        #We keep track of the number of words parsed so far.
        #We also keep track of the no. of words of a particular length.

        foreach $wd (split)
        {
            $count{$wd}++;
            if ($count{$wd} == 1)
            {
                $dcount++;
            }
            $wcount++;
            $lcount{length($wd)}++;
        }
    }
    close(FILE);

    #To print the distinct words and their frequency,
    #we iterate over the hash containing the words and their count.
    print "\nThe words and their frequency in the text is:\n";
    foreach $w (sort keys %count)
    {
        print "$w : $count{$w}\n";
    }

    #For the word length and frequency we use the word length hash
    print "The word length and frequency in the given text is:\n";
    foreach $w (sort keys %lcount)
    {
        print "$w : $lcount{$w}\n";
    }

    print "There are $wcount words in the file.\n";
    print "There are $dcount distinct words in the file.\n";

    $ttratio = ($dcount/$wcount)*100;       #Calculating the type-token ratio.

    print "The type-token ratio of the file is $ttratio.\n";


Sample output (abridged):

    The words and their frequency in the text is:
    1949 : 1
    a : 1
    adopt : 1
    all : 2
    among : 1
    and : 8
    assembly : 1
    assuring : 1
    belief : 1
    citizens : 1
    constituent : 1
    constitute : 1
    The word length and frequency in the given text is:
    1 : 1
    10 : 5
    11 : 2
    12 : 2
    2 : 15
    3 : 18
    There are 85 words in the file.
    There are 61 distinct words in the file.
    The type-token ratio of the file is 71.7647058823529.
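
In this output the word lengths appear as 1, 10, 11, 12, 2, 3 because sort keys %lcount compares the keys as strings, not as numbers. A short sketch, using the length counts printed above, shows what a numeric comparator changes:

```perl
use strict;
use warnings;

# Word-length counts copied from the sample output above.
my %lcount = (1 => 1, 2 => 15, 3 => 18, 10 => 5, 11 => 2, 12 => 2);

# Plain sort compares strings character by character, so "10" sorts before "2".
my @lexical = sort keys %lcount;

# The <=> operator compares the keys numerically.
my @numeric = sort { $a <=> $b } keys %lcount;

print "lexical: @lexical\n";    # lexical: 1 10 11 12 2 3
print "numeric: @numeric\n";    # numeric: 1 2 3 10 11 12
```

Using sort { $a <=> $b } keys %lcount in the script's word-length loop would print the lengths in natural order.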




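> Include use strict and use warnings in your Perl scripts.
> Every rule has exceptions, but it is usually a good habit to assign the return value of <> to a named variable instead of relying on $_. This is especially true if the code inside the loop might need one of the Perl constructs that also rely on $_ (for example map, grep, or postfix for loops).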

    while (my $line = <>){
        ...
    }
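
A minimal sketch of both tips in practice (an illustration, not code from the answer). The string eval is there only so the strict failure can be shown without aborting the script; $wcount mirrors an undeclared global from the question's script, and the input text is made up and read from an in-memory filehandle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Under "use strict", an undeclared global such as the question's $wcount
# is a compile-time error. A string eval compiles the snippet at run time,
# so the error ends up in $@ instead of aborting this script.
my $ok = eval q{
    use strict;
    $wcount = 0;
    1;
};
my $strict_error = $ok ? '' : $@;
print "strict rejected it: $strict_error" if $strict_error;

# A named loop variable instead of $_: inside grep, $_ is one word at a
# time, while the whole line is still available in $line.
my $text = "The cat sat\n\nthe end\n";       # made-up sample input
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my %count;
while (my $line = <$fh>) {
    chomp $line;
    # keep only tokens that contain at least one word character
    my @words = grep { /\w/ } split ' ', $line;
    next unless @words;                       # skip blank lines
    printf "%-12s -> %d word(s)\n", $line, scalar @words;
    $count{$_}++ for @words;
}
close $fh;
```

Note that "The" and "the" are still counted separately here; that is what the lowercasing step addresses.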

> Perl provides a built-in function for lowercasing strings (lc).

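    my %words;
    while (my $line = <>){
        # split ' ' splits on runs of whitespace and, unlike /\s+/,
        # does not return an empty first field for indented lines
        $words{$_}++ for split ' ', $line;
    }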
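
A minimal sketch combining the lc tip with the counting line above, assuming the same punctuation set the question strips with tr///d. count_words is a hypothetical helper name, and the input is an in-memory string rather than a file so the example runs on its own. Besides reading more clearly, lc also lowercases non-ASCII letters in decoded character strings, which tr/A-Z/a-z/ never touches.

```perl
use strict;
use warnings;

# Hypothetical helper: reads lines from a filehandle and returns word => count.
sub count_words {
    my ($fh) = @_;
    my %words;
    while (my $line = <$fh>) {
        $line = lc $line;                  # built-in lowercasing replaces tr/A-Z/a-z/
        $line =~ tr/.,:;!?"(){}//d;        # drop the same punctuation as the question
        $words{$_}++ for split ' ', $line;
    }
    return %words;
}

my $text = "The cat, the hat!\nTHE END.\n";    # made-up sample input
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
my %words = count_words($fh);
close $fh;

print "$_ : $words{$_}\n" for sort keys %words;
# cat : 1
# end : 1
# hat : 1
# the : 3
```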


    my %lengths;
    $lengths{length $_} += $words{$_} for keys %words;
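
Once %words exists, the question's separate counters are not strictly needed: the number of distinct words is the number of keys, and the total word count is the sum of the values. A sketch under that assumption, with made-up counts standing in for real input, that also prints the lengths in numeric order:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Made-up counts standing in for the %words hash built by the loop above.
my %words = (a => 2, the => 3, cat => 1, sat => 1, on => 1, mat => 1);

# Per-length token counts, computed after the fact as in the answer.
my %lengths;
$lengths{length $_} += $words{$_} for keys %words;

my $distinct = keys %words;                  # scalar context: number of keys
my $total    = sum(values %words) // 0;      # sum() returns undef for an empty list
my $ttr      = $total ? 100 * $distinct / $total : 0;   # type-token ratio in percent

print "$_ : $lengths{$_}\n" for sort { $a <=> $b } keys %lengths;
print "There are $total words in the file.\n";
print "There are $distinct distinct words in the file.\n";
printf "The type-token ratio of the file is %.2f.\n", $ttr;
```

With these counts it prints lengths 1 : 2, 2 : 1 and 3 : 6, then 9 words, 6 distinct words and a ratio of 66.67. The $total check avoids the division-by-zero error the original script would hit on an empty file.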


