加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 大数据 > 正文

哪些情况从Perl的学习中受益?

发布时间:2020-12-15 21:21:09 所属栏目:大数据 来源:网络整理
导读:我正在玩 study,一个Perl功能来检查一个字符串,使后续的正则表达式可能更快: while( ) { study; $count++ if /PATTERN/; $count++ if /OTHER/; $count++ if /PATTERN2/; } 关于哪些情况将从中受益匪浅?你可以从the docs中挑选出几件事情: 具有恒定字符
我正在玩 study,一个Perl功能来检查一个字符串,使后续的正则表达式可能更快:
while( <> ) {
    study;
    $count++ if /PATTERN/;
    $count++ if /OTHER/;
    $count++ if /PATTERN2/;
    }

关于哪些情况将从中受益匪浅?你可以从the docs中挑选出几件事情:

>具有恒定字符串的模式
>多种模式
>较短的目标字符串可能会更好(需要较少的学习时间)

我正在寻找具体的案例,我不仅可以表现出很大的优势,而且还可以稍微调整以弥补这一优势。 the docs年的警告之一是您应该对个别情况进行基准测试。我想找到一些边缘情况,一个字符串(或模式)的小差异在性能上有很大差异。

如果没有使用study,请不要回答。我宁愿拥有正确的答案,而不是快速猜测。这里没有紧迫性,这并不妨碍任何工作。

而且,作为一个奖励,我一直在使用比较两个NYTProf运行的基准测试工具,我宁愿使用通常的基准测试工具。如果我想出了一种自动化的方法,我也会分享。

解决方法

谷歌提出了这个 lovely test scenario:
#!/usr/bin/perl
# 
#  Exercise 7.8 
# 
# This is a more difficult exercise. The study function in Perl may speed up searches 
# for motifs in DNA or protein. Read the Perl documentation on this function. Its use 
# is simple: given some sequence data in a variable $sequence,type:
# 
# study $sequence;
# 
# before doing the searches. Do you think study will speed up searches in DNA or 
# protein,based on what you've read about it in the documentation?
# 
# For lots of extra credit! Now read the Perl documentation on the standard module 
# Benchmark. (Type perldoc Benchmark,or visit the Perl home page at http://www.
# perl.com.) See if your guess is right by writing a program that benchmarks motif 
# searches of DNA and of protein,with and without study.
#
# Answer to Exercise 7.8

use strict;
use warnings;

use Benchmark;

my $dna = join ('',qw(
agatggcggcgctgaggggtcttgggggctctaggccggccacctactgg
tttgcagcggagacgacgcatggggcctgcgcaataggagtacgctgcct
gggaggcgtgactagaagcggaagtagttgtgggcgcctttgcaaccgcc
tgggacgccgccgagtggtctgtgcaggttcgcgggtcgctggcgggggt
cgtgagggagtgcgccgggagcggagatatggagggagatggttcagacc
cagagcctccagatgccggggaggacagcaagtccgagaatggggagaat
gcgcccatctactgcatctgccgcaaaccggacatcaactgcttcatgat
cgggtgtgacaactgcaatgagtggttccatggggactgcatccggatca
ctgagaagatggccaaggccatccgggagtggtactgtcgggagtgcaga
gagaaagaccccaagctagagattcgctatcggcacaagaagtcacggga
gcgggatggcaatgagcgggacagcagtgagccccgggatgagggtggag
ggcgcaagaggcctgtccctgatccagacctgcagcgccgggcagggtca
gggacaggggttggggccatgcttgctcggggctctgcttcgccccacaa
atcctctccgcagcccttggtggccacacccagccagcatcaccagcagc
agcagcagcagatcaaacggtcagcccgcatgtgtggtgagtgtgaggca
tgtcggcgcactgaggactgtggtcactgtgatttctgtcgggacatgaa
gaagttcgggggccccaacaagatccggcagaagtgccggctgcgccagt
gccagctgcgggcccgggaatcgtacaagtacttcccttcctcgctctca
ccagtgacgccctcagagtccctgccaaggccccgccggccactgcccac
ccaacagcagccacagccatcacagaagttagggcgcatccgtgaagatg
agggggcagtggcgtcatcaacagtcaaggagcctcctgaggctacagcc
acacctgagccactctcagatgaggaccta
));

my $protein = join('',qw(
MNIDDKLEGLFLKCGGIDEMQSSRTMVVMGGVSGQSTVSGELQD
SVLQDRSMPHQEILAADEVLQESEMRQQDMISHDELMVHEETVKNDEEQMETHERLPQ
GLQYALNVPISVKQEITFTDVSEQLMRDKKQIR
));

my $count = 1000;

print "DNA pattern matches without 'study' function:n";
timethis($count,' for(my $i=1 ; $i < 10000; ++$i) {
        $dna =~ /aggtc/;
        $dna =~ /aatggccgt/;
        $dna =~ /gatcgatcagctagcat/;
        $dna =~ /gtatgaac/;
        $dna =~ /[ac][cg][gt][ta]/;
        $dna =~ /ccccccccc/;
    } '
);

print "nDNA pattern matches with 'study' function:n";
timethis($count,' study $dna;
    for(my $i=1 ; $i < 10000; ++$i) {
        $dna =~ /aggtc/;
        $dna =~ /aatggccgt/;
        $dna =~ /gatcgatcagctagcat/;
        $dna =~ /gtatgaac/;
        $dna =~ /[ac][cg][gt][ta]/;
        $dna =~ /ccccccccc/;
    } '
);

print "nProtein pattern matches without 'study' function:n";
timethis($count,' for(my $i=1 ; $i < 10000; ++$i) {
        $protein =~ /PH.EI/;
        $protein =~ /KFTEQGESMRLY/;
        $protein =~ /[YAL][NVP][ISV][KQE]/;
        $protein =~ /DKKQIR/;
        $protein =~ /[MD][VT][HQ][ER]/;
        $protein =~ /NVPISVKQEITFTDVSEQL/;
    } '
);

print "nProtein pattern matches with 'study' function:n";
timethis($count,' study $protein;
    for(my $i=1 ; $i < 10000; ++$i) {
        $protein =~ /PH.EI/;
        $protein =~ /KFTEQGESMRLY/;
        $protein =~ /[YAL][NVP][ISV][KQE]/;
        $protein =~ /DKKQIR/;
        $protein =~ /[MD][VT][HQ][ER]/;
        $protein =~ /NVPISVKQEITFTDVSEQL/;
    } '
);

请注意,对于最有利可图的情况(蛋白质匹配),报告的收益仅为约2%:

#  $ perl exer07.08
# On my computer,this is the output I get: your results probably vary.

#  DNA pattern matches without 'study' function:
#  timethis 1000: 29 wallclock secs (29.25 usr +  0.00 sys = 29.25 CPU) @ 34.19/s (n=1000)
#  
#  DNA pattern matches with 'study' function:
#  timethis 1000: 30 wallclock secs (29.21 usr +  0.15 sys = 29.36 CPU) @ 34.06/s (n=1000)
#  
#  Protein pattern matches without 'study' function:
#  timethis 1000: 32 wallclock secs (29.47 usr +  0.04 sys = 29.51 CPU) @ 33.89/s (n=1000)
#  
#  Protein pattern matches with 'study' function:
#  timethis 1000: 30 wallclock secs (28.97 usr +  0.02 sys = 28.99 CPU) @ 34.49/s (n=1000)
#

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读