How can I do this in a more Perl-like way?

Published: 2020-12-15 21:46:43 | Category: Big Data | Source: compiled from the web


I am new to Perl, and for one of my assignments I came up with this solution:

#wordcount.pl FILE
#

#if no filename is given, print help and exit
if (length($ARGV[0]) < 1)
{
    print "Usage is : words.pl word filename\n";
    exit;
}

my $file = $ARGV[0];          #filename given in commandline

open(FILE, $file);            #open the mentioned filename
while (<FILE>)                #continue reading until the file ends
{
    chomp;
    tr/A-Z/a-z/;              #convert all upper case words to lower case
    tr/.,:;!?"(){}//d;        #remove some common punctuation symbols
    #We are creating a hash with the word as the key.
    #Each time a word is encountered, its hash is incremented by 1.
    #If the count for a word is 1, it is a new distinct word.
    #We keep track of the number of words parsed so far.
    #We also keep track of the no. of words of a particular length.

    foreach $wd (split)
    {
        $count{$wd}++;
        if ($count{$wd} == 1)
        {
            $dcount++;
        }
        $wcount++;
        $lcount{length($wd)}++;
    }
}

#To print the distinct words and their frequency,
#we iterate over the hash containing the words and their count.
print "\nThe words and their frequency in the text is:\n";
foreach $w (sort keys %count)
{
    print "$w : $count{$w}\n";
}

#For the word length and frequency we use the word length hash
print "The word length and frequency in the given text is:\n";
foreach $w (sort keys %lcount)
{
    print "$w : $lcount{$w}\n";
}

print "There are $wcount words in the file.\n";
print "There are $dcount distinct words in the file.\n";

$ttratio = ($dcount/$wcount)*100;       #Calculating the type-token ratio.

print "The type-token ratio of the file is $ttratio.\n";

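I have explained what it does in the comments. Essentially, I have to compute word counts from a given text file. The output of the program above looks like this: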

The words and their frequency in the text is: 
1949 : 1
a : 1
adopt : 1
all : 2
among : 1
and : 8
assembly : 1
assuring : 1
belief : 1
citizens : 1
constituent : 1
constitute : 1
.
.
.
The word length and frequency in the given text is:
1 : 1
10 : 5
11 : 2
12 : 2
2 : 15
3 : 18
There are 85 words in the file. 
There are 61 distinct words in the file. 
The type-token ratio of the file is 71.7647058823529.

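With some help from Google I was able to find a solution for my assignment. But I think that code using the real power of Perl would be smaller and more concise. Can anyone give me a Perl solution with fewer lines of code?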

Solution

Here are a few suggestions:

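> Include use strict and use warnings in your Perl scripts.
> Your argument validation does not test what it should test: (1) whether there is exactly 1 item in @ARGV, and (2) whether that item is a valid file name.
> Although every rule has exceptions, it is generally good practice to assign the return value of <> to a named variable rather than relying on $_. This is especially true if the code inside the loop might need one of the Perl constructs that also rely on $_ (for example map, grep, or postfix for loops).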

while (my $line = <>){
    ...
}
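
The first two suggestions (strictures and argument validation) might be sketched as follows. The helper name check_args and its messages are hypothetical, not part of the original answer, and "valid file name" is interpreted here as "an existing, readable plain file".

```perl
use strict;
use warnings;

# Hypothetical helper: returns an error message if the argument list is
# unusable, or undef if it looks fine.
sub check_args {
    my @args = @_;
    return "Usage: $0 FILE" unless @args == 1;            # exactly one item
    return "Not a readable file: $args[0]"
        unless -f $args[0] && -r $args[0];                # plain, readable file
    return;                                               # no problem found
}

# In a real script this would be check_args(@ARGV); the empty list here
# just demonstrates the usage message.
my $error = check_args();
print "$error\n" if defined $error;
```

A real script would typically die with the message instead of printing it.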

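> Perl provides a built-in function for lowercasing strings (lc).
> You are performing unnecessary calculations inside the line-reading loop. If you simply build up a tally of the words, you will have all the information you need. Also note that Perl offers a one-line (postfix) form for most of its control structures (for, while, if, and so on), as shown below.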

while (my $line = <>){
    ...
    $words{$_} ++ for split /\s+/, $line;
}
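
Putting lc, a named loop variable, and the postfix for together, the tallying step could look roughly like this. The tally_words helper and the sample lines are hypothetical. The punctuation set is copied from the question, and split ' ' is used instead of /\s+/ as a small deviation, so that leading whitespace does not produce an empty first field.

```perl
use strict;
use warnings;

# Hypothetical helper: build a word => count tally from a list of lines.
sub tally_words {
    my @lines = @_;
    my %words;
    for my $line (@lines) {
        $line = lc $line;                # built-in lowercasing
        $line =~ tr/.,:;!?"(){}//d;      # same punctuation set as the question
        $words{$_}++ for split ' ', $line;
    }
    return %words;
}

# Sample input standing in for lines read from a file.
my %words = tally_words("The cat, the dog.\n", "A cat!\n");
print "$_ : $words{$_}\n" for sort keys %words;
```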

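> You can then use the word tallies to compute the other information you need. For example, the number of unique words is simply the number of keys in the hash, and the total number of words is the sum of the hash values.
> The distribution of word lengths can be computed like this: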

my %lengths;
$lengths{length $_} += $words{$_} for keys %words;
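
As a rough illustration of how every figure in the question's output can be derived from the tally alone, here is a minimal sketch. The summarize helper and the sample hash are hypothetical. The type-token ratio follows the question's definition (distinct words divided by total words, times 100).

```perl
use strict;
use warnings;

# Hypothetical helper: derive the summary figures from a word => count tally.
sub summarize {
    my ($tally) = @_;
    my $distinct = keys %$tally;            # key count (scalar context)
    my $total = 0;
    $total += $_ for values %$tally;        # sum of all counts
    my %lengths;
    $lengths{length $_} += $tally->{$_} for keys %$tally;
    return ($total, $distinct, \%lengths);
}

# Sample tally standing in for the %words hash built while reading the file.
my %words = (the => 2, cat => 2, dog => 1, a => 1);
my ($total, $distinct, $lengths) = summarize(\%words);

print "There are $total words in the file.\n";
print "There are $distinct distinct words in the file.\n";
printf "The type-token ratio of the file is %.2f.\n",
    $total ? 100 * $distinct / $total : 0;  # guard against an empty file
print "$_ : $lengths->{$_}\n" for sort { $a <=> $b } keys %$lengths;
```

The numeric sort ({ $a <=> $b }) is a design choice: the default string sort would list length 10 before length 2, as in the question's output.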

(Editor: 李大同)
