submit assembly to NCBI
II. Submit assembly to NCBI

1. Prepare data

Start with the assembly in uncompressed FASTA format (not .gz); every later step works on this file. The format looks like this:

>Scaffold633
TCATTTCTCCACTCTCGATGAACAAATCTGGAGGGATTTTTTTTCATTCC
ACTCAATAGGTTGTCTATAAAGGTGTGATTCGTGGAACTTCTTCACACAG
CAGCTAGTCTATATAATACAGAAGATCG
>Scaffold553
AAAAAATTTTTTTTTTAAACTATCATCTCATGGATCAGCAGCAATTCTGA
GTGTAACGTCTTCATTAAATGCGTATATAAATTTGCATAAAGATATGCGA
CCAATATTGAGCCTGGAAATATATGCGCAGAGTGCAAAATTGTGTTTTTT
GATCGGTTAATTAAAGG
>Scaffold641
GTTTCCCAGTAGGTCTCTCCCGCTACGGCGTCCGCACGAACGCGATCTGC
CCTCGTGCCCGCACCGCCATGACGGCAGAAGCCTTCGGCGAGAACAACAC
CGGCGTCGTCGGCCTCGATCCGCTTGCACCCGAGCGCGTCGCGACCCTGG
TCAGCTACCTCGCATCCCCCGATTCCGACGAGATCAACGGACAGGTCTTC
GTCGTCTACGGCAAGATGGTGGCGTTGATGGAAGCACCCAAGGTCGAGAA
CCGTTTCGACGCAGCCGGATCCGCGTTCACCGTCGAAGAACTCGGTGGCC
AGCTCTCGTCTTACTTCTCCGGCCGTGGGCCGTACGAGACCTACTGGGAA
AC

2. Process the data

Processing takes several steps:

(1) Generate the .greater, short.list and ZERO_BASE_COUNT files

perl ../scaf_filter_2k.pl Ascaris_suum.scaf.fa

The script writes the IDs and lengths of sequences shorter than 200 bp to short.list, those of sequences that lack any one of A, C, G or T to ZERO_BASE_COUNT, and all remaining sequences to <input>.greater.

Code of scaf_filter_2k.pl:

#!/usr/bin/perl
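use strict;
use warnings;
my $file=shift;
#my $cutoff=shift;
my $outfile="short.list";        # ID and length of sequences shorter than 200 bp
my $outfile2="$file.greater";    # sequences that pass both filters
my $outfile3="ZERO_BASE_COUNT";  # ID and length of sequences missing A, C, G or T
open IN,"< $file" or die $!;
open OUT,"> $outfile" or die $!;
open OUT1,"> $outfile2" or die $!;
open OUT2,"> $outfile3" or die $!;
$/='>';<IN>;$/="\n";             # skip everything up to the first '>'
while(<IN>){                     # header line of one record
    chomp;
    my ($id)=/^(\S+)/;           # sequence ID = first word of the header
    $/='>';
    my $seq=<IN>;                # sequence body, up to the next '>'
    chomp($seq);
    $/="\n";
    $seq=~s/\s//g;               # join the wrapped sequence lines
    my $len=length($seq);
    if ($len < 200){
        print OUT "$id\t$len\n";
        next;
    }
    else{
        # tr/// with identical lists counts characters without changing $seq
        my $a=$seq=~tr/aA/aA/;
        my $t=$seq=~tr/tT/tT/;
        my $c=$seq=~tr/cC/cC/;
        my $g=$seq=~tr/gG/gG/;
        if($a==0 || $c==0 || $g==0 || $t==0){
            print OUT2 "$id\t$len\n";
            next;
        }
        print OUT1 ">$id\n$seq\n";
    }
}
close IN;
close OUT;
close OUT1;
close OUT2;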
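
The filter rule in scaf_filter_2k.pl can be tried on small inputs with a standalone sketch. The helper name classify_scaffold is hypothetical and not part of the original script. It assumes the same 200 bp cutoff and the same case-insensitive base counting, and only returns a label instead of writing the three output files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of scaf_filter_2k.pl): label one sequence
# as 'short', 'zero_base' or 'keep' using the rules of the script above.
sub classify_scaffold {
    my ($seq) = @_;
    $seq =~ s/\s//g;                        # drop line breaks and spaces
    return 'short' if length($seq) < 200;   # would be listed in short.list
    for my $base (qw(A C G T)) {
        my $count = () = $seq =~ /$base/gi; # case-insensitive count of one base
        return 'zero_base' if $count == 0;  # would be listed in ZERO_BASE_COUNT
    }
    return 'keep';                          # would be written to <input>.greater
}

print classify_scaffold('ACGT' x 10), "\n";  # 40 bp
print classify_scaffold('AC' x 150), "\n";   # 300 bp, no G or T
print classify_scaffold('ACGT' x 60), "\n";  # 240 bp, all four bases present
```

Checking the length before the base counts mirrors the order in the script, so a short sequence that also lacks a base is reported only as short.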
(2) Generate the .Nchange file

perl ../info_N_plus.pl Ascaris_suum.scaf.fa.greater > Ascaris_suum.scaf.fa.greater.Nchange

Code of info_N_plus.pl:

#!/usr/bin/perl -w
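use strict;
#use Getopt::Long;
sub usage{
print STDERR <<USAGE;
############################################
Version 1.1 by Wing-L 2011.07.15
usage: $0 <sequence.fa> <len> >STDOUT
############################################
USAGE
exit;
}
&usage if(@ARGV <1);
my ($fa,$len)=@ARGV;
$len||=10;                       # default <len> is 10
open IN,"<$fa" or die("$!\n");
$/='>';<IN>;$/="\n";             # skip everything up to the first '>'
while(my $line=<IN>){
my @block;
$line=~/^\S+/;
my $tag=$&;                      # sequence ID = first word of the header
$/='>';
my $seq=<IN>;
chomp $seq;
$/="\n";
$seq=~s/\s//g;
my $chr_length=length $seq;
# a run of 1-9 N between bases: overwrite $len characters from its first N with N
while($seq=~/[^N]N{1,9}[^N]/g){
substr ($seq,$-[0]+1,$len)=~s/\S/N/g;
}
# a block of 1-49 bases between N: mask it with N
# (the start offset $-[1] is an assumption; the source listing shows only the length)
while($seq=~/N([^N]{1,49})N/g){
my $tlen=length $1;
substr ($seq,$-[1],$tlen)=~s/\S/N/g;
}
# leading N or a short leading block followed by N: report it on STDERR and trim it
# (the $+[0] offsets in this block are assumed; they are garbled in the source listing)
if($seq=~/^N?[^N]{0,49}N+/){
print STDERR "$tag\t1\t$+[0]\n";
substr($seq,$-[0],$+[0]-$-[0])='';
}
# trailing N, optionally followed by a short block: report it on STDERR and trim it
if($seq=~/N+[^N]{0,49}N{0,}$/){
print STDERR "$tag\t$-[0]\t$chr_length\n";
substr($seq,$-[0],$chr_length-$-[0])='';
}
print ">$tag\n$seq\n";
}
close IN;
#open IN,"<" or die("$!\n");
#while(my $line=<IN>){}
#foreach my $e (){}
#(split /\s+/,$line)[0]
#open OUT,">" or die("$!\n");
############### sub ###############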
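
Before or after running this step, it can help to list where the runs of N sit in a sequence. The following is a minimal sketch and not part of info_N_plus.pl; the helper name n_runs and the demo sequence are hypothetical. It relies on Perl's @- and @+ match-offset arrays, which the script above also uses, and reports 1-based inclusive coordinates. The threshold of 10 is taken from the default <len> of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return [start, end, length] for every run of N,
# with 1-based inclusive coordinates.
sub n_runs {
    my ($seq) = @_;
    my @runs;
    while ($seq =~ /N+/g) {
        # $-[0] is the 0-based start of the match,
        # $+[0] is the 0-based offset just past its end
        push @runs, [ $-[0] + 1, $+[0], $+[0] - $-[0] ];
    }
    return @runs;
}

# Demo sequence: 4 bases, 3 N, 3 bases, 12 N, 2 bases
my $demo = 'ACGT' . 'N' x 3 . 'ACG' . 'N' x 12 . 'AC';
for my $run (n_runs($demo)) {
    my ($start, $end, $length) = @$run;
    my $note = $length < 10 ? 'shorter than 10' : '10 or longer';
    print "$start\t$end\t$length\t$note\n";
}
```

For the demo sequence this reports one run at positions 5 to 7 (length 3) and one at positions 11 to 22 (length 12). Like the script, the sketch matches only uppercase N.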
(3) Generate the split file

perl ../get_scaftig.pl Ascaris_suum.scaf.fa.greater.Nchange > Ascaris_suum.scaf.fa.greater.Nchange.split

Code of get_scaftig.pl:

#!/usr/bin/perl -w
#
#Author: Ruan Jue <ruanjue@genomics.org.cn>
#
use warnings;
use strict;
my $min_length = 0;
my $name = '';
my $seq = '';
while(<>){
if(/^>(\S+)/){
&print_scafftig($name,$seq) if($seq);
$name = $1;
$seq = '';
} else {
chomp;
$seq .=
(Editor: 李大同)
