
Perl Huge XML Solution (1): Split Files and Multiple Threads

Published: 2020-12-15 23:42:13 | Category: Big Data | Source: collected from the web
1. Upgrade Perl

> sudo yum install cpan
> sudo cpan
cpan> install Bundle::CPAN
cpan> reload cpan
cpan> upgrade

The upgrade did not work and failed with the error message:
make NO isa perl

Solution:
> sudo yum install perl-Config*

That still did not get the Perl upgrade working, but I could install the modules one by one:
cpan> install Time::Piece
cpan> install Path::Class
cpan> install autodie
cpan> install Thread::Queue

2. Split the File

split_hero.pl

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Time::Piece;
use Path::Class;
use autodie; # die if problem reading or writing a file

my $OutputSize   = 0;
my $OutputCount  = 0;
my $MaxSize      = 100_000_000; # size limit per output file, in characters read
my $HugeFileName = "data/728";  # reads data/728.xml, writes into data/728/ (must already exist)

print localtime->strftime('%Y-%m-%d %X') . "\n";

my $out;
open(my $in, '<', $HugeFileName . '.xml') or die "input: $!\n";
while (<$in>) {
    if (!$out) {
        # start the next output file
        $OutputCount++;
        $OutputSize = 0;
        open($out, '>', $HugeFileName . "/output$OutputCount.xml") or die "output: $!\n";
        # the first file keeps the header lines of the original file
        unless ($OutputCount == 1) {
            print $out qq{<?xml version='1.0' encoding='UTF-8'?>\n};
            print $out qq{<source>\n};
        }
    }
    print $out $_;
    $OutputSize += length($_);
    # only cut right after a closing </job> tag
    if (m|</job>|i) {
        if ($OutputSize > $MaxSize) {
            print $out "</source>\n";
            close($out);
            $out = undef;
        }
    }
}
close($in);
close($out) if $out; # close the last output file

# append the names of all generated files to data/728/file_list
my @files = glob($HugeFileName . "/*.xml");
my $dir = dir($HugeFileName);
my $list_file = $dir->file("file_list");
my $list_file_handle = $list_file->open('>>') or die "file_list: $!\n";
foreach my $file (@files) {
    $list_file_handle->print($file . "\n");
    print "$file\n";
}
$list_file_handle->close;
print localtime->strftime('%Y-%m-%d %X') . "\n";

3. Multiple Threads on Perl

#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $nthreads  = 5;
my $process_q = Thread::Queue->new();
my $failed_q  = Thread::Queue->new();

# This is a subroutine, but one that runs 'as a thread'.
# When it starts, it inherits the program state 'as is'. E.g.
# the variable declarations above all apply - but changes to
# values within the program are 'thread local' unless the
# variable is defined as 'shared'.
# Behind the scenes - Thread::Queue are 'shared' arrays.
sub worker {
    # NB - this will sit in a loop indefinitely, until you close the queue
    # using $process_q->end.
    # We do this once we've queued all the things we want to process,
    # and the sub completes and exits neatly.
    # However, if you _don't_ end it, this will sit waiting forever.
    while ( my $server = $process_q->dequeue() ) {
        chomp($server);
        print threads->self()->tid() . ": pinging $server\n";
        my $result = `/sbin/ping -c 1 $server`;
        if ($?) { $failed_q->enqueue($server) }
        print $result;
    }
}

# Insert tasks into the thread queue.
open( my $input_fh, "<", "server_list" ) or die $!;
# note: this prints the file handle itself (GLOB(...)), not the task list
print("what is the task list = " . $input_fh . "\n");
$process_q->enqueue(<$input_fh>);
close($input_fh);

# We 'end' process_q - when we do, no more items may be inserted,
# and 'dequeue' returns 'undefined' when the queue is emptied.
# This means our worker threads (in their 'while' loop) will then exit.
$process_q->end();

# Start some threads.
for ( 1 .. $nthreads ) {
    threads->create( \&worker );
}

# Wait for threads to all finish processing.
foreach my $thr ( threads->list() ) {
    $thr->join();
}

# Collate results. ('synchronise' operation)
while ( my $server = $failed_q->dequeue_nb() ) {
    print "$server failed to ping\n";
}

I changed that a little bit to call PHP:
my $result = `php src/import.php 728 $server`;

4. Test Result

Splitting the huge XML (4.5G) on a machine with a 2-core CPU and 4G memory took 00:02:05 (04:17:24 to 04:19:29).
Sending to Redis/SQS on a machine with a 2-core CPU and 4G memory took 00:03:12 (04:23:46 to 04:26:58).

References:
http://sillycat.iteye.com/blog/1017590 file handler
http://sillycat.iteye.com/blog/2193773
Perl 1,2,3,4,6
http://sillycat.iteye.com/blog/1012882
http://sillycat.iteye.com/blog/1012923
http://sillycat.iteye.com/blog/1012940
http://sillycat.iteye.com/blog/1016428
http://sillycat.iteye.com/blog/1017632 string
http://sillycat.iteye.com/blog/1021197 web
http://sillycat.iteye.com/blog/1027282 queue client
http://sillycat.iteye.com/blog/1073593 browser info

Split XML File
http://stackoverflow.com/questions/11313852/split-one-file-into-multiple-files-based-on-delimiter
http://stackoverflow.com/questions/15503980/split-file-by-xml-tag
http://www.experts-exchange.com/Programming/Languages/Scripting/Perl/Q_24760607.html
https://metacpan.org/pod/XML::Twig#xml_split---cut-a-big-XML-file-into-smaller-chunks
http://code.izzid.com/2008/01/21/How-to-move-back-a-line-with-reading-a-perl-filehandle.html

Perl threads
http://stackoverflow.com/questions/26296206/perl-daemonize-with-child-daemons/26297240#26297240
http://stackoverflow.com/questions/6556976/how-to-use-perl-to-run-the-same-php-script-parallel

Perl Zip the File
http://perldoc.perl.org/IO/Compress/Zip.html
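A note on the cutting rule in split_hero.pl: an output file is only closed right after a line containing </job>, and only once the size limit has been passed, so a job element is never cut in half. The following is a minimal in-memory sketch of that rule. The function name split_on_job_boundary and the sample lines are hypothetical and not part of the original script, and the file handling and the <source> wrapper are left out on purpose.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: group lines into chunks, closing a chunk only
# after a line that contains </job> and only when the chunk has grown
# past $max_size characters.
sub split_on_job_boundary {
    my ($lines, $max_size) = @_;
    my @chunks;
    my @current;
    my $size = 0;
    for my $line (@$lines) {
        push @current, $line;
        $size += length $line;
        if ($line =~ m{</job>}i && $size > $max_size) {
            push @chunks, [@current];
            @current = ();
            $size    = 0;
        }
    }
    # whatever is left over becomes the last chunk
    push @chunks, [@current] if @current;
    return \@chunks;
}

# Made-up sample input: two jobs of three lines each.
my @sample_lines = (
    "<job>\n", "<id>1</id>\n", "</job>\n",
    "<job>\n", "<id>2</id>\n", "</job>\n",
);
my $sample_chunks = split_on_job_boundary(\@sample_lines, 10);
print "chunks: " . scalar(@$sample_chunks) . "\n";
```

With a limit of 10 characters, the limit is already exceeded in the middle of the first job, but the chunk is still only closed at its </job> line. This is also why the real output files can end up somewhat larger than $MaxSize.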

(Editor: 李大同)

