Perl: finding valid pairs of lines in different cases
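I have HTTP header request and reply data in tab-delimited form, with each GET/POST and each reply on a separate line. A single TCP flow can contain multiple GET, POST and REPLY lines. I need to select only the first valid GET - REPLY pair in each of these cases. A (simplified) example is: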
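    ID  Source  Dest  Bytes  Type   Content-Length  host                 lines
    ....
    1   A       B     10     GET    NA              yahoo.com            2
    1   A       B     10     REPLY  10              NA                   2
    2   C       D     40     GET    NA              google.com           4
    2   C       D     40     REPLY  20              NA                   4
    2   C       D     40     GET    NA              google.com           4
    2   C       D     40     REPLY  30              NA                   4
    3   A       B     250    POST   NA              mail.yahoo.com       5
    3   A       B     250    REPLY  NA              NA                   5
    3   A       B     250    REPLY  15              NA                   5
    3   A       B     250    GET    NA              yimg.com             5
    3   A       B     250    REPLY  35              NA                   5
    4   G       H     415    REPLY  10              NA                   6
    4   G       H     415    POST   NA              facebook.com         6
    4   G       H     415    REPLY  NA              NA                   6
    4   G       H     415    REPLY  NA              NA                   6
    4   G       H     415    GET    NA              photos.facebook.com  6
    4   G       H     415    REPLY  50              NA                   6
    ....

So, basically, I need to get one request - reply pair for each ID and write the pairs to a new file.

For '1' there is just one pair, so that is easy. But there are also bad cases where both lines are GET, POST or REPLY. Such cases are ignored.

For '2' I would choose the first GET - REPLY pair.

For '3' I would choose the first request (a POST here) but the second REPLY, because Content-Length is absent in the first REPLY (which makes the subsequent REPLY a better candidate).

For '4' I would choose the first POST (or GET), because the first header cannot be a REPLY. Even though the content length is missing in the replies after the POST, I would not choose the REPLY after the second GET, because that REPLY comes after the second request. So I would just choose the first REPLY.

After choosing the best request and reply for each ID, I need to combine them into a single line. For example, the output would be:

    ID  Source  Dest  Bytes  Type  Content-Length  host
    ....
    1   A       B     10     GET   10              yahoo.com
    2   C       D     40     GET   20              google.com
    3   A       B     250    POST  15              mail.yahoo.com
    4   G       H     415    POST  NA              facebook.com

The actual data contains many other headers, but this example shows pretty much what I need. How can I do this in Perl? I am stuck almost at the start, so all I have managed is to read the file one line at a time.

    open F, "<", "file.txt" or die "Cannot open file.txt: $!";
    while (<F>) {
        chomp;
        my @line = split /\t/;
        # get the valid pairs for cases with multiple request - replies
        # get the paired up data together
    }
    close(F);

*Edit: I have added an extra column giving the number of HTTP header lines for each ID. This may help in knowing how many subsequent lines to check. I have also modified ID '4' so that its first header line is a REPLY.*

Solution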
The program below does what I think you need.
It is commented, and I think it is fairly clear. Please ask if anything is unclear.

    use strict;
    use warnings;

    use List::Util 'max';

    my $file = $ARGV[0] // 'file.txt';

    open my $fh, '<', $file or die qq(Unable to open "$file" for reading: $!);

    # Read the field names from the first line to index the hashes
    # Remember where the data in the file starts so we can get back here
    #
    my @fields = split ' ', <$fh>;
    my $start = tell $fh;

    # Build a format to print the accumulated data
    # Create a hash that relates column headers to their widths
    #
    my @headers = qw/ ID Source Dest Bytes Type Content-Length host /;
    my %len = map { $_ => length } @headers;

    # Read through the file to find the maximum data width for each column
    #
    while (<$fh>) {
        my %data;
        @data{@fields} = split;
        next unless $data{ID} =~ /^\d/;
        $len{$_} = max($len{$_}, length $data{$_}) for @headers;
    }

    # Build a format string using the values calculated
    #
    my $format = join ' ', map sprintf('%%%ds', $_), @len{@headers};
    $format .= "\n";

    # Go back to the start of the data
    # Print the column headers
    #
    seek $fh, $start, 0;
    printf $format, @headers;

    # Build transaction data hashes into $record and print them
    # Ignore any events before the first request
    # Ignore the second request and anything after it
    # Update the stored Content-Length field if a value other than NA appears
    #
    my $record;
    my $nreq = 0;

    while (<$fh>) {

        my %data;
        @data{@fields} = split;
        my ($id, $type) = @data{ qw/ ID Type / };
        next unless $id =~ /^\d/;

        # A new ID means the previous record is complete, so print it
        if ($record and $id ne $record->{ID}) {
            printf $format, @{$record}{@headers};
            undef $record;
            $nreq = 0;
        }

        if ($type eq 'GET' or $type eq 'POST') {
            $record = \%data if $nreq == 0;
            $nreq++;
        }
        elsif ($nreq == 1) {
            if ($record->{'Content-Length'} eq 'NA' and $data{'Content-Length'} ne 'NA') {
                $record->{'Content-Length'} = $data{'Content-Length'};
            }
        }
    }

    printf $format, @{$record}{@headers} if $record;

Output

With the data given in the question, this program produces

    ID Source Dest Bytes  Type Content-Length                host
     1      A    B    10   GET             10           yahoo.com
     2      C    D    40   GET             20          google.com
     3      A    B   250  POST             15      mail.yahoo.com
     4      G    H   415  POST             NA        facebook.com
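The selection rule in the second loop can also be tried in isolation, without a data file. The following is a minimal sketch, not part of the original answer: `select_pairs` is a hypothetical helper name, and the rows are the ID 3 and ID 4 lines from the question cut down to three columns. It assumes rows arrive grouped by ID, as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not from the answer above):
# reduce parsed rows (hash refs keyed by column name) to one record per ID,
# using the same rule as the second loop of the program.
sub select_pairs {
    my @rows = @_;
    my (@out, $record);
    my $nreq = 0;    # number of requests seen so far for the current ID

    for my $row (@rows) {

        # A change of ID closes the record that is being built
        if ($record and $row->{ID} ne $record->{ID}) {
            push @out, $record;
            undef $record;
            $nreq = 0;
        }

        my $type = $row->{Type};
        if ($type eq 'GET' or $type eq 'POST') {
            # Keep a copy of the first request only; later requests just count
            $record = { %$row } if $nreq == 0;
            $nreq++;
        }
        elsif ($nreq == 1) {
            # A reply to the first request: store its Content-Length
            # unless a value other than NA has already been stored
            if ($record->{'Content-Length'} eq 'NA' and $row->{'Content-Length'} ne 'NA') {
                $record->{'Content-Length'} = $row->{'Content-Length'};
            }
        }
    }

    push @out, $record if $record;
    return @out;
}

# ID 3 and ID 4 from the question, reduced to ID, Type and Content-Length
my @rows = map { my %r; @r{qw/ID Type Content-Length/} = split; \%r } (
    '3 POST NA', '3 REPLY NA', '3 REPLY 15', '3 GET NA', '3 REPLY 35',
    '4 REPLY 10', '4 POST NA', '4 REPLY NA', '4 REPLY NA', '4 GET NA', '4 REPLY 50',
);

# Prints "3 POST 15" and then "4 POST NA"
print join(' ', @{$_}{qw/ID Type Content-Length/}), "\n" for select_pairs(@rows);
```

Replies seen before the first request (the leading REPLY of ID 4) and everything from the second request onwards are skipped, which is why ID 4 keeps NA even though later replies carry a length.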