如何从Perl中的DNA序列中提取起始和结束密码子?
发布时间:2020-12-16 06:21:51 所属栏目:大数据 来源:网络整理
导读:我有一个代码,试图确定给定DNA序列的起始和结束密码子的位置. 我们将起始密码子定义为ATG序列,并将末端密码子定义为TGA,TAA,TAG序列. 我遇到的问题是下面的代码只适用于前两个序列(DM208659和AF038953),但不适用于其余的序列. 我的方法下面有什么问题? 此代
我有一个代码,试图确定给定DNA序列的起始和结束密码子的位置.
我们将起始密码子定义为ATG序列,并将末端密码子定义为TGA,TAA,TAG序列. 我遇到的问题是下面的代码只适用于前两个序列(DM208659和AF038953),但不适用于其余的序列. 我的方法下面有什么问题? 此代码可以从here复制粘贴. #!/usr/bin/perl -w while (<DATA>) { chomp; print "$_n"; my ($id,$rna_sq) = split(/s+/,$_); local $_ = $rna_sq; while (/atg/g) { my $start = pos() - 2; if (/tga|taa|tag/g) { my $stop = pos(); my $gene = substr( $_,$start - 1,$stop - $start + 1 ),$/; my $genelen = length($gene); my $ct = "$id $start $stop $gene $genelen"; print "t$ctn"; } } } __DATA__ DM208659 gtgggcctcaaatgtggagcactattctgatgtccaagtggaaagtgctgcgacatttgagcgtcac AF038953 gatcccagacctcggcttgcagtagtgttagactgaagataaagtaagtgctgtttgggctaacaggatctcctcttgcagtctgcagcccaggacgctgattccagcagcgccttaccgcgcagcccgaagattcactatggtgaaaatcgccttcaatacccctaccgccgtgcaaaaggaggaggcgcggcaagacgtggaggccctcctgagccgcacggtcagaactcagatactgaccggcaaggagctccgagttgccacccaggaaaaagagggctcctctgggagatgtatgcttactctcttaggcctttcattcatcttggcaggacttattgttggtggagcctgcatttacaagtacttcatgcccaagagcaccatttaccgtggagagatgtgcttttttgattctgaggatcctgcaaattcccttcgtggaggagagcctaacttcctgcctgtgactgaggaggctgacattcgtgaggatgacaacattgcaatcattgatgtgcctgtccccagtttctctgatagtgaccctgcagcaattattcatgactttgaaaagggaatgactgcttacctggacttgttgctggggaactgctatctgatgcccctcaatacttctattgttatgcctccaaaaaatctggtagagctctttggcaaactggcgagtggcagatatctgcctcaaacttatgtggttcgagaagacctagttgctgtggaggaaattcgtgatgttagtaaccttggcatctttatttaccaactttgcaataacagaaagtccttccgccttcgtcgcagagacctcttgctgggtttcaacaaacgtgccattgataaatgctggaagattagacacttccccaacgaatttattgttgagaccaagatctgtcaagagtaagaggcaacagatagagtgtccttggtaataagaagtcagagatttacaatatgactttaacattaaggtttatgggatactcaagatatttactcatgcatttactctattgcttatgccgtaaaaaaaaaaaaaaaaaaaaaaaaaaaaa BC021011 ggggagtccggggcggcgcctggaggcggagccgcccgctgggctaaatggggcagaggccgggaggggtgggggttccccgcgccgcagccatggagcagcttcgcgccgccgcccgtctgcagattgttctg DM208660 gggatactcaaaatgggggcgctttcctttttgtctgtactgggaagtgcttcgattttggggtgtccc AF038954 ggacccaagggggccttcgaggtgccttaggccgcttgccttgctctcagaatcgctgccgccatggctagtcagtctcaggggattcagcagctgctgcaggccgagaagcgggcagccgagaaggtgtccgaggcccgcaaaagaaagaaccggaggctgaagcaggccaaagaagaagctcaggctgaaattgaacagtaccgcctgcagagggagaaagaattcaaggccaaggaagctgcggcattgggatcccgtggcagttgcagcactgaagtggagaaggagacccaggagaagatgaccatcctccagacatacttccggcagaacagggatgaagtcttggacaacctcttggcttttgtctgtgacattcggccagaaatccatgaaaactaccgcataaatggatagaagagagaagcacctgtgctgtggagtggcattttagatgccctcacgaatatggaagcttagcacagctctagttacattcttaggagatggccattaaattatttccatatattataagagaggtccttccactttttggagagtagccaatctagctttttggtaacagacttagaaattagcaaagatgtccagctttttaccacagattcctgagggattttagatgggtaaatagagtcagactttgaccaggttttgggcaaagcacatgtatatcagtgtggacttttcctttcttagatctagtttaaaaaaaaaaaccccttaccattctttgaagaaaggaggggattaaataattttttcccctaacactttcttgaaggtcaggggctttatctatgaaaagttagtaaatagttctttgtaacctgtgtgaagcagcagccagccttaaagtagtccattcttgctaatggttagaacagtgaatactagtggaattgtttgggctgcttttagtttctcttaatcaaaattactagatgatagaattcaagaacttgttacatgtattacttggtgtatcgataatcatttaaaagtaaagactctgtcatgcaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 解决方法
我删除了$_的使用(当你本地化时我特别颤抖 – 你这样做是正确的,但为什么要强迫自己担心如果其他一些函数要破坏$_,而不是使用已经可用的$rna_sq?
另外我修正了$start和$stop为基于0的索引到字符串中(这使得数学的其余部分更加直接),并且提前计算了$genelen,因此可以直接在substr操作中使用. (或者,您可以本地化$[1以使用基于1的数组索引,请参阅perldoc perlvar.) use strict; use warnings; while (my $line = <DATA>) { chomp $line; print "processing $linen"; my ($id,$line); while ($rna_sq =~ /atg/g) { # $start and $stop are 0-based indexes my $start = pos($rna_sq) - 3; # back up to include the start sequence # discard remnant if no stop sequence can be found last unless $rna_sq =~ /tga|taa|tag/g; my $stop = pos($rna_sq); my $genelen = $stop - $start; my $gene = substr($rna_sq,$start,$genelen); print "t" . join(' ',$id,$start+1,$stop,$gene,$genelen) . "n"; } } (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |