Perl notes: reading and writing files, and counting frequencies with a hash and sorting them

Published: 2020-12-16 00:10:34  Category: Big Data  Source: collected from the web


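I rarely use Perl, but I needed it this time, so I am writing down what I learned about reading and writing files and about counting frequencies with a hash and sorting the result.

1. Reading and writing files

Reading and writing files in Perl is simple. First, reading: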

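# Open the file (here $_ holds the file name)
open(FILE_NAME, '<', $_) || die "can't open part-m file: $!";

Read it line by line and process each line:

while (<FILE_NAME>)
{
    chomp;              # strip the trailing newline
    print $_, "\n";
    ###
    # process the line here
    ###
}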
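The read loop above can also be written with a lexical filehandle, which is closed automatically when it goes out of scope. The following is a minimal sketch, not the author's code: `read_lines` and the sample file name are hypothetical names chosen for illustration.

```perl
use strict;
use warnings;

# Read a text file line by line and return its lines without trailing newlines.
# read_lines is a hypothetical helper name used only for this illustration.
sub read_lines {
    my ($path) = @_;
    # Three-argument open with a lexical handle; $! carries the OS error text.
    open(my $fh, '<', $path) or die "can't open $path: $!";
    my @lines;
    while (my $line = <$fh>) {
        chomp $line;
        push @lines, $line;
    }
    close($fh);
    return @lines;
}

# Demo: create a small sample file, read it back, then remove it.
my $sample = "sample_input.txt";    # hypothetical file name
open(my $out, '>', $sample) or die "can't create $sample: $!";
print $out "first\nsecond\nthird\n";
close($out);

print "$_\n" for read_lines($sample);
unlink $sample;
```

Keeping the mode (`'<'`) separate from the file name means that a name containing characters such as `>` or `|` cannot change how the file is opened.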

Then, writing:

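# Open the output file: it is created if it does not exist and emptied if it does; changing ">" to ">>" appends to it instead
open(OUTFILE1_LIST, '>', "$resultpath/SpecificResult.txt") || die "can't open file SpecificResult.txt: $!";

# Write one line at a time
print OUTFILE1_LIST ("$lineTxt\n");

# Close the file when done
close(OUTFILE1_LIST);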
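To make the difference between ">" and ">>" concrete, here is a small sketch that writes one line in truncate mode, adds a second line in append mode, and reads the file back. The file name `mode_demo.txt` is a hypothetical name used only for this example.

```perl
use strict;
use warnings;

my $file = "mode_demo.txt";    # hypothetical file name

# '>' creates the file, or empties it if it already exists.
open(my $out, '>', $file) or die "can't open $file for writing: $!";
print $out "line 1\n";
close($out) or die "can't close $file: $!";

# '>>' keeps the existing content and writes at the end.
open(my $app, '>>', $file) or die "can't open $file for appending: $!";
print $app "line 2\n";
close($app) or die "can't close $file: $!";

# Read the file back to confirm that both lines are present.
open(my $in, '<', $file) or die "can't open $file for reading: $!";
my @lines = <$in>;
close($in);
print @lines;
unlink $file;
```

Checking the return value of `close` on an output handle is worthwhile because buffered data is flushed at that point, so a full disk may only be reported there.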

2. Counting frequencies with a hash and sorting

# Initialize the hash
my %hash=();

# Count how many times each word occurs
$hash{$lineTxt}++;

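# Sort by value
my @keys = sort { $hash{$b} <=> $hash{$a} } keys %hash;  # @keys holds the keys ordered by their counts, largest first

# Sort by key (an alternative to the line above)
@keys = sort { $b <=> $a } keys %hash;  # @keys holds the keys in descending numeric order; use "cmp" instead of "<=>" when the keys are strings

foreach (@keys)
{
    print OUTFILE2_LIST ("$_" . "\t" . "$hash{$_}" . "\n");
}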
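The counting and sorting steps can be tried on a small in-memory list. The sketch below is an illustration under my own assumptions: `count_items` and `sorted_by_count` are hypothetical helper names, and the alphabetical tie-breaker is an addition that the original code does not have.

```perl
use strict;
use warnings;

# Count how often each item occurs; returns a reference to the count hash.
# count_items and sorted_by_count are hypothetical helper names.
sub count_items {
    my %count;
    $count{$_}++ for @_;
    return \%count;
}

# Keys ordered by descending count. Ties are broken alphabetically so that
# the order is reproducible (hash key order is otherwise unpredictable).
sub sorted_by_count {
    my ($count) = @_;
    return sort { $count->{$b} <=> $count->{$a} or $a cmp $b } keys %$count;
}

my $count = count_items(qw(apple pear apple fig pear apple));
for my $key (sorted_by_count($count)) {
    print "$key\t$count->{$key}\n";
}
```

With this input the loop prints apple (3), then pear (2), then fig (1), one tab-separated pair per line.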

Below is the source code I wrote to parse the production network data and sort the results:

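#!/usr/bin/perl -W
##################
# File:
# Author:
# License:

use strict;
use warnings;
binmode(STDOUT, ':encoding(gbk)');   # the system default encoding is GBK; this replaces "use encoding 'gbk'", which is deprecated and no longer supported as of Perl 5.26
use open IN=>':encoding(utf8)';   # decode the input files as UTF-8
use Encode;
use File::Path;
use Tie::File;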

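# Read the directory that holds the production data to parse; it is passed in as the only argument
my $dirpath="";
if(@ARGV == 1)
{
    $dirpath = $ARGV[0];
}else
{
    print "Usage: $0 <directory containing the production data to parse>\n";
    exit(0);
}
print "dir path: ${dirpath}\n";
#$dirpath="E:\\video_network_data";
my @filearray=(); # absolute path of every part-m file
my $filecount = 0; # number of part-m files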
###### Walk the directory tree #####
sub parse_env {
    my $path = $_[0];
    my $subpath;
    my $handle;
    if (-d $path) { # is the current path a directory?
        if (opendir($handle,$path)) {
            # defined() keeps an entry named "0" from ending the loop early
            while (defined($subpath = readdir($handle))) {
                # skip the "." and ".." entries
                if (!($subpath =~ m/^\.$/) and !($subpath =~ m/^(\.\.)$/)) {
                    my $p = $path."/$subpath";
                    if (-d $p) {
                        parse_env($p); # recurse into subdirectories
                    }
                    else {
                        if($p=~m/part-m/) {
                            push(@filearray,$p);
                            $filecount++;
                        }
                    }
                }
            }
            closedir($handle); # only close a handle that was opened successfully
        }
    }
    return $filecount;
}

my %hash=();
my $filenum=parse_env($dirpath);
if($filenum > 0) # at least one part-m file exists
{
    print "There are $filenum part-m files!!","\n";
    my $resultpath=$dirpath."/parse-result";
    mkdir($resultpath) unless(-d $resultpath); # create the directory that will hold the results
    open(OUTFILE1_LIST, '>', "$resultpath/SpecificResult.txt")||die "can't open file SpecificResult.txt!";
    open(OUTFILE2_LIST, '>', "$resultpath/FrequencyResult.txt")||die "can't open file FrequencyResult.txt!";
    foreach(@filearray)
    {
        #print $_,"\n";
        open(FILE_NAME, '<', $_)||die "can't open part-m file";
        while (<FILE_NAME>)
        {
            chomp; # drop the newline so that it cannot end up in the last field
            my @strlist=split("\t",$_);
            if(($#strlist+1)>=4) # the line has at least four tab-separated fields
            {
                my $lineTxt=$strlist[3];
                print OUTFILE1_LIST ("$lineTxt\n");
                $hash{$lineTxt}++;
            }
        }
        close(FILE_NAME);
    }
    my @keys = sort { $hash{$b} <=> $hash{$a} } keys %hash; # keys ordered by count, largest first
    foreach(@keys)
    {
        print OUTFILE2_LIST ("$_"."\t"."$hash{$_}"."\n");
    }

    close(OUTFILE1_LIST);
    close(OUTFILE2_LIST);
    print "-------FINISH!!!--------\n";
}
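As an aside, the recursive directory walk done by parse_env in the script above can also be expressed with the core File::Find module. This is a sketch of a swapped-in technique, not the author's method: `find_part_files` is a hypothetical helper name, the demo directory tree is made up, and unlike parse_env the pattern is matched against the file name only rather than the whole path.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Collect every regular file under $dir whose name contains "part-m".
# find_part_files is a hypothetical helper name.
sub find_part_files {
    my ($dir) = @_;
    my @found;
    find(sub {
        # Inside the callback, $_ is the bare file name and
        # $File::Find::name is the path including the directory.
        push @found, $File::Find::name if -f $_ && /part-m/;
    }, $dir);
    return sort @found;
}

# Demo on a throwaway directory tree that is removed when the script exits.
my $root = tempdir(CLEANUP => 1);
make_path("$root/a/b");
for my $name ("a/part-m-00000", "a/b/part-m-00001", "a/notes.txt") {
    open(my $fh, '>', "$root/$name") or die "can't create $root/$name: $!";
    close($fh);
}
print "$_\n" for find_part_files($root);
```

Returning the list from the helper, instead of filling a global array and counter as parse_env does, lets the caller get the file count with `scalar(@files)`.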

(Editor: 李大同)
