submit assembly to NCBI
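II. Submit assembly to NCBI

1. Prepare data

Start from the assembly in FASTA format (uncompressed, not .gz). All of the processing below works on this file. It looks like this:

    >Scaffold633
    TCATTTCTCCACTCTCGATGAACAAATCTGGAGGGATTTTTTTTCATTCC
    ACTCAATAGGTTGTCTATAAAGGTGTGATTCGTGGAACTTCTTCACACAG
    CAGCTAGTCTATATAATACAGAAGATCG
    >Scaffold553
    AAAAAATTTTTTTTTTAAACTATCATCTCATGGATCAGCAGCAATTCTGA
    GTGTAACGTCTTCATTAAATGCGTATATAAATTTGCATAAAGATATGCGA
    CCAATATTGAGCCTGGAAATATATGCGCAGAGTGCAAAATTGTGTTTTTT
    GATCGGTTAATTAAAGG
    >Scaffold641
    GTTTCCCAGTAGGTCTCTCCCGCTACGGCGTCCGCACGAACGCGATCTGC
    CCTCGTGCCCGCACCGCCATGACGGCAGAAGCCTTCGGCGAGAACAACAC
    CGGCGTCGTCGGCCTCGATCCGCTTGCACCCGAGCGCGTCGCGACCCTGG
    TCAGCTACCTCGCATCCCCCGATTCCGACGAGATCAACGGACAGGTCTTC
    GTCGTCTACGGCAAGATGGTGGCGTTGATGGAAGCACCCAAGGTCGAGAA
    CCGTTTCGACGCAGCCGGATCCGCGTTCACCGTCGAAGAACTCGGTGGCC
    AGCTCTCGTCTTACTTCTCCGGCCGTGGGCCGTACGAGACCTACTGGGAA
    AC

2. Process the data

This takes several steps.

(1) Generate the .greater, short.list and ZERO_BASE_COUNT files:

    perl ../scaf_filter_2k.pl Ascaris_suum.scaf.fa

The script writes three files. short.list gets the ID and length of every sequence shorter than 200 bp. ZERO_BASE_COUNT gets the ID and length of every sequence of 200 bp or more in which A, C, G or T never occurs. Ascaris_suum.scaf.fa.greater gets the remaining sequences, each on a single line.

scaf_filter_2k.pl code:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Usage: perl scaf_filter_2k.pl <assembly.fa>
    my $file = shift or die "usage: $0 <assembly.fa>\n";
    my $outfile  = "short.list";        # sequences shorter than 200 bp
    my $outfile2 = "$file.greater";     # sequences that pass both checks
    my $outfile3 = "ZERO_BASE_COUNT";   # sequences missing A, C, G or T

    open IN,   "< $file"     or die $!;
    open OUT,  "> $outfile"  or die $!;
    open OUT1, "> $outfile2" or die $!;
    open OUT2, "> $outfile3" or die $!;

    $/ = '>'; <IN>; $/ = "\n";          # skip everything up to the first '>'
    while (<IN>) {                      # $_ is the header line
        chomp;
        my ($id) = /^(\S+)/;            # first word of the header is the ID
        $/ = '>';
        my $seq = <IN>;                 # read up to the next '>'
        chomp($seq);                    # drop the trailing '>'
        $/ = "\n";
        $seq =~ s/\s//g;                # join the sequence lines
        my $len = length($seq);
        if ($len < 200) {
            print OUT "$id\t$len\n";
            next;
        }
        my $na = ($seq =~ tr/aA//);     # count each base, either case
        my $nt = ($seq =~ tr/tT//);
        my $nc = ($seq =~ tr/cC//);
        my $ng = ($seq =~ tr/gG//);
        if ($na == 0 || $nc == 0 || $ng == 0 || $nt == 0) {
            print OUT2 "$id\t$len\n";
            next;
        }
        print OUT1 ">$id\n$seq\n";
    }
    close IN; close OUT; close OUT1; close OUT2;

(2) Generate the .Nchange file:

    perl ../info_N_plus.pl Ascaris_suum.scaf.fa.greater \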
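        > Ascaris_suum.scaf.fa.greater.Nchange

info_N_plus.pl normalizes the gaps in each sequence. First, every run of 1 to 9 Ns lying between two bases is widened to <len> Ns (default 10) by overwriting the bases that follow it, so the sequence length does not change. Next, any stretch of 1 to 49 bases lying between two Ns is replaced by N. Finally, an N run at either end of a sequence is removed, together with any fragment shorter than 50 bp beyond it. Each removal is reported on STDERR as ID, start and end. The cleaned sequences go to STDOUT.

info_N_plus.pl code:

    #!/usr/bin/perl -w
    use strict;

    sub usage {
        print STDERR <<USAGE;
    ############################################
    Version 1.1 by Wing-L 2011.07.15
    usage: $0 <sequence.fa> <len> >STDOUT
    ############################################
    USAGE
        exit;
    }
    &usage if (@ARGV < 1);
    my ($fa, $len) = @ARGV;
    $len ||= 10;                        # gaps shorter than this are widened to it

    open IN, "<$fa" or die("$!\n");
    $/ = '>'; <IN>; $/ = "\n";
    while (my $line = <IN>) {
        my ($tag) = $line =~ /^(\S+)/;  # sequence ID
        $/ = '>';
        my $seq = <IN>;
        chomp $seq;
        $/ = "\n";
        $seq =~ s/\s//g;
        my $chr_length = length $seq;

        # widen each run of 1-9 Ns to $len Ns by overwriting the bases after it
        while ($seq =~ /[^N]N{1,9}[^N]/g) {
            my $from = $-[0] + 1;
            substr($seq, $from, $len) =~ s/\S/N/g;
            pos($seq) = $from;          # continue scanning from this gap
        }
        # mask stretches of 1-49 bases that lie between two Ns
        while ($seq =~ /N([^N]{1,49})N/g) {
            my ($from, $tlen) = ($-[1], length $1);
            substr($seq, $from, $tlen) =~ s/\S/N/g;
            pos($seq) = $from + $tlen;  # the closing N may open the next match
        }
        # trim a leading N run plus any fragment of <50 bp before it
        my $trim5 = 0;
        if ($seq =~ /^N?[^N]{0,49}N+/) {
            $trim5 = $+[0];
            print STDERR "$tag\t1\t$trim5\n";
            substr($seq, 0, $trim5) = '';
        }
        # trim a trailing N run plus any fragment of <50 bp after it;
        # positions are reported 1-based on the untrimmed sequence
        if ($seq =~ /N+[^N]{0,49}N{0,}$/) {
            print STDERR "$tag\t", $trim5 + $-[0] + 1, "\t$chr_length\n";
            substr($seq, $-[0]) = '';
        }
        print ">$tag\n$seq\n";
    }
    close IN;

(3) Generate the split (scaftig) file:

    perl ../get_scaftig.pl Ascaris_suum.scaf.fa.greater.Nchange > Ascaris_suum.scaf.fa.greater.Nchange.split

get_scaftig.pl code:

    #!/usr/bin/perl -w
    #
    #Author: Ruan Jue <ruanjue@genomics.org.cn>
    #
    use warnings;
    use strict;

    my $min_length = 0;
    my $name = '';
    my $seq = '';
    while (<>) {
        if (/^>(\S+)/) {
            &print_scafftig($name, $seq) if ($seq);
            $name = $1;
            $seq = '';
        } else {
            chomp;
            $seq .= $_;
    ...

(Editor: 李大同)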
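The three-way decision made by scaf_filter_2k.pl in step (1) can be pulled out and tested on its own. Below is a minimal sketch that assumes the same thresholds as that script: a minimum length of 200 bp, and each of A, C, G and T must occur at least once, in either case. It reads FASTA from a string rather than from a file. The helper names classify_record and parse_fasta are hypothetical and do not come from the original scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns 'short' (would go to short.list),
# 'zero_base' (would go to ZERO_BASE_COUNT) or 'keep' (would go to .greater),
# assuming the 200 bp cutoff used by scaf_filter_2k.pl.
sub classify_record {
    my ($seq, $min_len) = @_;
    $min_len //= 200;
    return 'short' if length($seq) < $min_len;
    for my $base (qw(A C G T)) {
        my $n = () = $seq =~ /$base/gi;    # case-insensitive count of this base
        return 'zero_base' if $n == 0;
    }
    return 'keep';
}

# Hypothetical helper: split FASTA text held in a string into [id, seq] pairs.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $chunk (split /^>/m, $text) {
        next unless length $chunk;         # skip the empty field before the first '>'
        my ($header, @lines) = split /\n/, $chunk;
        my ($id) = $header =~ /^(\S+)/;    # first word of the header is the ID
        my $seq = join '', @lines;
        $seq =~ s/\s//g;
        push @records, [$id, $seq];
    }
    return @records;
}

my $demo = ">s1\n" . ('ACGT' x 60) . "\n>s2\nACGTACGT\n>s3\n"
         . ('A' x 150) . ('C' x 150) . "\n";
my %class;
$class{ $_->[0] } = classify_record($_->[1]) for parse_fasta($demo);
print "$_\t$class{$_}\n" for sort keys %class;
# prints s1 keep, s2 short, s3 zero_base (one per line, tab-separated)
```

Keeping the rule in a subroutine makes it easy to check edge cases, such as lowercase (soft-masked) bases, before running the real script on a full assembly.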
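The regular expressions in info_N_plus.pl (step 2) are dense. The sketch below restates its four gap rules as one subroutine that can be tested in isolation. It reflects our reading of that script, not the author's code. Like the script, it assumes a default target gap length of 10 and treats 49 bp as the largest fragment that gets masked or trimmed. normalize_gaps is a hypothetical name. It returns the trimmed ranges (1-based, in original coordinates) instead of printing them to STDERR.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper restating the info_N_plus.pl rules as we read them.
# Returns (cleaned sequence, array ref of [start, end] trimmed ranges).
sub normalize_gaps {
    my ($seq, $len) = @_;
    $len ||= 10;
    $seq =~ s/\s//g;
    my $orig_len = length $seq;

    # Rule 1: widen each run of 1-9 Ns that sits between two bases to $len Ns
    # by overwriting the bases after it, so the total length is unchanged.
    while ($seq =~ /[^N]N{1,9}[^N]/g) {
        my $start = $-[0] + 1;
        my $n = $len;
        $n = length($seq) - $start if $start + $n > length($seq);
        substr($seq, $start, $n) = 'N' x $n;
        pos($seq) = $start + $n;    # resume scanning after the masked block
    }

    # Rule 2: mask any stretch of 1-49 bases that lies between two Ns.
    while ($seq =~ /N([^N]{1,49})N/g) {
        my ($start, $n) = ($-[1], length $1);
        substr($seq, $start, $n) = 'N' x $n;
        pos($seq) = $start + $n;    # the closing N may open the next match
    }

    my @trimmed;
    my $offset = 0;

    # Rule 3: trim a leading N run together with any <50 bp fragment before it.
    if ($seq =~ /^N?[^N]{0,49}N+/) {
        $offset = $+[0];
        push @trimmed, [1, $offset];
        substr($seq, 0, $offset) = '';
    }

    # Rule 4: trim a trailing N run together with any <50 bp fragment after it.
    if ($seq =~ /N+[^N]{0,49}N*$/) {
        push @trimmed, [$offset + $-[0] + 1, $orig_len];
        substr($seq, $-[0]) = '';
    }
    return ($seq, \@trimmed);
}

my ($out, $trims) = normalize_gaps(
    ('ACGT' x 5) . ('N' x 15) . ('A' x 100) . ('N' x 12) . ('T' x 10));
print length($out), "\n";                                  # 100
print join(',', map { join(' ', @$_) } @$trims), "\n";     # 1 35,136 157
```

In the demo, the 20 bp before the first gap and the 10 bp after the last gap are shorter than 50 bp, so both are trimmed along with their N runs. Only the 100 bp core survives. The sketch also sets pos() explicitly after each edit, so every scan keeps moving forward. This is a design choice of the sketch. The original script instead relies on the search position being reset when the string is modified.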
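The get_scaftig.pl listing in step (3) breaks off before its print_scafftig subroutine, so its exact output format is not known here. A scaftig is an N-free stretch of a scaffold. The following is only a hypothetical sketch of that idea: it cuts a sequence at every run of N (either case). The subroutine name split_scaftigs, the <name>_<k> piece names and the start/end coordinates in the header are all assumptions, not the original script's format. $min_length mirrors the variable of the same name in get_scaftig.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch: cut one scaffold into scaftigs at every run of N/n.
# Piece names (<name>_<k>) and 1-based start/end are illustrative assumptions.
sub split_scaftigs {
    my ($name, $seq, $min_length) = @_;
    $min_length //= 0;              # same default as $min_length in get_scaftig.pl
    my @pieces;
    my $k = 0;
    while ($seq =~ /([^Nn]+)/g) {
        next if length($1) < $min_length;    # skip pieces below the cutoff
        $k++;
        push @pieces, [ "${name}_$k", $-[1] + 1, $+[1], $1 ];
    }
    return @pieces;
}

my $scaffold = 'ACGT' . ('N' x 10) . 'GGCC' . 'NN' . 'TTA';
for my $p (split_scaftigs('Scaffold1', $scaffold)) {
    my ($id, $start, $end, $piece) = @$p;
    print ">$id $start $end\n$piece\n";
}
# prints >Scaffold1_1 1 4 / ACGT, >Scaffold1_2 15 18 / GGCC, >Scaffold1_3 21 23 / TTA
```

Keeping the original coordinates in each header makes it possible to map a scaftig back to its position in the scaffold. Whether the real get_scaftig.pl does this cannot be confirmed from the truncated listing.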