基因数据处理1之mapping_to_cram
基因数据处理1之mapping_to_cram 参考资料: A Worked ExampleObtain some public dataWe will use the first 100,000 read-pairs from a yeast data set. curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz > yeast.fasta Prepare the BWA indicesWe need to ensure there exists a .fai fasta index and also indices for whichever aligner we are using (Bwa-mem in this example). samtools faidx yeast.fasta bwa index yeast.fasta Produce the alignmentsThe aligner is likely to output SAM in the same order or similar order to the input fastq files. It won’t be outputting in chromosome position order,so the output is typically not well suited to CRAM. bwa mem -R '@RGtID:footSM:bartLB:library1' yeast.fasta y1.fastq y2.fastq > yeast.sam The -R option adds a read-group line and applies that read-group to all aligned sequence records. It is not necessary,but a recommended practice. Sort into chromosome/positon orderIdeally at this point we would be outputting CRAM directly,but at present samtools 1.0 does not have a way to indicate the reference on the command line. We can output to BAM instead and convert (below),or modify the SAM @SQ header to include MD5 sums in the M5: field. samtools sort -O bam -T /tmp -l 0 -o yeast.bam yeast.sam The “-l 0” indicates to use no compression in the BAM file,as it is transitory and will be replaced by CRAM soon. We may wish to use -l 1 if disk space is short and we wish to reduce temporary file size. Convert to CRAM formatsamtools view -T yeast.fasta -C -o yeast.cram yeast.bam Note that since the BAM file did not have M5 tags for the reference sequences,they are computed by Samtools and added to the CRAM. In a production environment,this step can be avoided by ensuring that the M5 tags are already in the SAM/BAM header. The last 3 steps can be combined into a pipeline to reduce disk I/O: bwa mem yeast.fasta y1.fastq y2.fastq | samtools sort -O bam -l 0 -T /tmp - | samtools view -T yeast.fasta -C -o yeast.cram - Viewing in alignment and pileup formatSee the variant calling workflow for more advanced examples. samtools view yeast.cram samtools mpileup -f yeast.fasta yeast.cram 有问题: 改正:原下载没有夹gzip -d curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d > y1.fastq curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d > y2.fastq curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz|gzip -d > yeast.fasta 参考【1】中第三个没有加 |gzip -d 运行记录: hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 537M 0 842k 0 0 5753 0 27:13:40 0:02:30 27:11:10 23013 curl: (23) Failed writing body (1496 != 4344) hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 559M 0 932k 0 0 6937 0 23:28:57 0:02:17 23:26:40 193k curl: (23) Failed writing body (11536 != 16384) hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz > yeast.fasta % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 3782k 100 3782k 0 0 18197 0 0:03:32 0:03:32 --:--:-- 47280 hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 537M 0 848k 0 0 6525 0 24:00:23 0:02:13 23:58:10 188k curl: (23) Failed writing body (1000 != 1448) hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 559M 0 937k 0 0 6989 0 23:18:28 0:02:17 23:16:11 196k curl: (23) Failed writing body (6872 != 16384) hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz|gzip -d > yeast.fasta % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 3782k 100 3782k 0 0 26203 0 0:02:27 0:02:27 --:--:-- 254k hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ls y1.fastq y2.fastq yeast.fasta hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ cd ../ hadoop@Mcnode6:~/cloud/adam/xubo$ ls 1000genomes autoconf-archive bwa-0.7.12 htslib resources samtools yeast yeast201603101121 yeast201603101125 hadoop@Mcnode6:~/cloud/adam/xubo$ cp yeast yeast/ yeast201603101121/ yeast201603101125/ hadoop@Mcnode6:~/cloud/adam/xubo$ cp yeast201603101125/ yeast0 cp: omitting directory ‘yeast201603101125/’ hadoop@Mcnode6:~/cloud/adam/xubo$ cp -r yeast201603101125 yeast yeast/ yeast201603101121/ yeast201603101125/ hadoop@Mcnode6:~/cloud/adam/xubo$ cp -r yeast201603101125 yeast0 hadoop@Mcnode6:~/cloud/adam/xubo$ ls 1000genomes autoconf-archive bwa-0.7.12 htslib resources samtools yeast yeast0 yeast201603101121 yeast201603101125 hadoop@Mcnode6:~/cloud/adam/xubo$ cd yeast201603101125 hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ls y1.fastq y2.fastq yeast.fasta hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h total 18M drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 11:53 ./ drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../ -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools faidx yeast.fasta hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ bwa index yeast.fasta [bwa_index] Pack FASTA... 0.11 sec [bwa_index] Construct BWT for the packed sequence... [bwa_index] 7.49 seconds elapse. [bwa_index] Update BWT... 0.11 sec [bwa_index] Pack forward-only FASTA... 0.07 sec [bwa_index] Construct SA from BWT and Occ... 2.94 sec [main] Version: 0.7.12-r1039 [main] CMD: bwa index yeast.fasta [main] Real time: 13.239 sec; CPU: 10.718 sec hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h total 38M drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 11:58 ./ drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../ -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta -rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb -rw-rw-r-- 1 hadoop hadoop 1.4K 3月 10 11:58 yeast.fasta.ann -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:58 yeast.fasta.bwt -rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai -rw-rw-r-- 1 hadoop hadoop 2.9M 3月 10 11:58 yeast.fasta.pac -rw-rw-r-- 1 hadoop hadoop 5.8M 3月 10 11:58 yeast.fasta.sa hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ bwa mem -R '@RGtID:footSM:bartLB:library1' yeast.fasta y1.fastq y2.fastq > yeast.sam [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::process] read 50000 sequences (1800000 bp)... [M::mem_pestat] # candidate unique pairs for (FF,FR,RF,RR): (103,4007,13219,91) [M::mem_pestat] analyzing insert size distribution for orientation FF... [M::mem_pestat] (25,50,75) percentile: (240,2467,3114) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1,8862) [M::mem_pestat] mean and std.dev: (1893.02,1414.42) [M::mem_pestat] low and high boundaries for proper pairs: (1,11736) [M::mem_pestat] analyzing insert size distribution for orientation FR... [M::mem_pestat] (25,75) percentile: (243,275,314) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (101,456) [M::mem_pestat] mean and std.dev: (278.34,50.30) [M::mem_pestat] low and high boundaries for proper pairs: (30,527) [M::mem_pestat] analyzing insert size distribution for orientation RF... [M::mem_pestat] (25,75) percentile: (3111,3320,3565) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (2203,4473) [M::mem_pestat] mean and std.dev: (3352.46,321.09) [M::mem_pestat] low and high boundaries for proper pairs: (1749,4927) [M::mem_pestat] analyzing insert size distribution for orientation RR... [M::mem_pestat] (25,75) percentile: (291,2611,3188) [M::mem_pestat] low and high boundaries for computing mean and std.dev: (1,8982) [M::mem_pestat] mean and std.dev: (2036.04,1501.65) [M::mem_pestat] low and high boundaries for proper pairs: (1,11879) [M::mem_pestat] skip orientation FF [M::mem_pestat] skip orientation RR [M::mem_process_seqs] Processed 50000 reads in 7.764 CPU sec,8.119 real sec [main] Version: 0.7.12-r1039 [main] CMD: bwa mem -R @RGtID:footSM:bartLB:library1 yeast.fasta y1.fastq y2.fastq [main] Real time: 8.523 sec; CPU: 7.867 sec hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll total 47020 drwxrwxr-x 2 hadoop hadoop 4096 3月 10 11:59 ./ drwxrwxr-x 12 hadoop hadoop 4096 3月 10 11:57 ../ -rw-rw-r-- 1 hadoop hadoop 3051773 3月 10 11:27 y1.fastq -rw-rw-r-- 1 hadoop hadoop 3051773 3月 10 11:29 y2.fastq -rw-rw-r-- 1 hadoop hadoop 12360755 3月 10 11:55 yeast.fasta -rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb -rw-rw-r-- 1 hadoop hadoop 1341 3月 10 11:58 yeast.fasta.ann -rw-rw-r-- 1 hadoop hadoop 12157188 3月 10 11:58 yeast.fasta.bwt -rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai -rw-rw-r-- 1 hadoop hadoop 3039278 3月 10 11:58 yeast.fasta.pac -rw-rw-r-- 1 hadoop hadoop 6078608 3月 10 11:58 yeast.fasta.sa -rw-rw-r-- 1 hadoop hadoop 8366688 3月 10 11:59 yeast.sam hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h total 46M drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 11:59 ./ drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../ -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta -rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb -rw-rw-r-- 1 hadoop hadoop 1.4K 3月 10 11:58 yeast.fasta.ann -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:58 yeast.fasta.bwt -rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai -rw-rw-r-- 1 hadoop hadoop 2.9M 3月 10 11:58 yeast.fasta.pac -rw-rw-r-- 1 hadoop hadoop 5.8M 3月 10 11:58 yeast.fasta.sa -rw-rw-r-- 1 hadoop hadoop 8.0M 3月 10 11:59 yeast.sam hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools sort -O bam -T /tmp -l 0 -o yeast.bam yeast.sam hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h total 53M drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 12:00 ./ drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../ -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq -rw-rw-r-- 1 hadoop hadoop 6.6M 3月 10 12:00 yeast.bam -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta -rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb -rw-rw-r-- 1 hadoop hadoop 1.4K 3月 10 11:58 yeast.fasta.ann -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:58 yeast.fasta.bwt -rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai -rw-rw-r-- 1 hadoop hadoop 2.9M 3月 10 11:58 yeast.fasta.pac -rw-rw-r-- 1 hadoop hadoop 5.8M 3月 10 11:58 yeast.fasta.sa -rw-rw-r-- 1 hadoop hadoop 8.0M 3月 10 11:59 yeast.sam hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view -T yeast.fasta -C -o yeast.cram yeast.bam hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h total 54M drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 12:01 ./ drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../ -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq -rw-rw-r-- 1 hadoop hadoop 6.6M 3月 10 12:00 yeast.bam -rw-rw-r-- 1 hadoop hadoop 945K 3月 10 12:01 yeast.cram -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta -rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb -rw-rw-r-- 1 hadoop hadoop 1.4K 3月 10 11:58 yeast.fasta.ann -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:58 yeast.fasta.bwt -rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai -rw-rw-r-- 1 hadoop hadoop 2.9M 3月 10 11:58 yeast.fasta.pac -rw-rw-r-- 1 hadoop hadoop 5.8M 3月 10 11:58 yeast.fasta.sa -rw-rw-r-- 1 hadoop hadoop 8.0M 3月 10 11:59 yeast.sam hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ bwa mem yeast.fasta y1.fastq y2.fastq | > samtools sort -O bam -l 0 -T /tmp - | > samtools view -T yeast.fasta -C -o yeast.cram - [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::process] read 50000 sequences (1800000 bp)... [M::mem_pestat] # candidate unique pairs for (FF,11879) [M::mem_pestat] skip orientation FF [M::mem_pestat] skip orientation RR [M::mem_process_seqs] Processed 50000 reads in 7.820 CPU sec,8.093 real sec [main] Version: 0.7.12-r1039 [main] CMD: bwa mem yeast.fasta y1.fastq y2.fastq [main] Real time: 8.217 sec; CPU: 7.886 sec hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h total 54M drwxrwxr-x 2 hadoop hadoop 4.0K 3月 10 12:01 ./ drwxrwxr-x 12 hadoop hadoop 4.0K 3月 10 11:57 ../ -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:27 y1.fastq -rw-rw-r-- 1 hadoop hadoop 3.0M 3月 10 11:29 y2.fastq -rw-rw-r-- 1 hadoop hadoop 6.6M 3月 10 12:00 yeast.bam -rw-rw-r-- 1 hadoop hadoop 945K 3月 10 12:03 yeast.cram -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:55 yeast.fasta -rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb -rw-rw-r-- 1 hadoop hadoop 1.4K 3月 10 11:58 yeast.fasta.ann -rw-rw-r-- 1 hadoop hadoop 12M 3月 10 11:58 yeast.fasta.bwt -rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai -rw-rw-r-- 1 hadoop hadoop 2.9M 3月 10 11:58 yeast.fasta.pac -rw-rw-r-- 1 hadoop hadoop 5.8M 3月 10 11:58 yeast.fasta.sa -rw-rw-r-- 1 hadoop hadoop 8.0M 3月 10 11:59 yeast.sam hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |wc -l 50000 hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |head -10 SRR507778.19213 147 I 62 60 36M = 3183 3087 ATCCTAACACTACCCTAACACAGCCCTAATCTAACC 15=@9:@C3<CBGGGDGDBGDFCC?>GGG<GGGDGG AS:i:36 XS:i:19 MD:Z:36 NM:i:0 SRR507778.12312 147 I 205 60 36M = 3626 3387 CCACTCACCCACCGTTACCCTCCAATTACCCATATC GD<B?B>DGGBGGGGGCIIIIGIIIIIEIIGIIIII AS:i:36 XS:i:26 MD:Z:36 NM:i:0 SRR507778.11604 83 I 402 60 36M = 3869 3433 CTCACTTGTATACTGATTTTACGTACGCACACGGAT GHGHIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.10609 83 I 2661 40 36M = 6131 3436 TGAATTCGTACAACATTAAACGTGTGTTGGGAGTCG IIIGFIIGIIHIIIIGIIIIIIIIIIHIIIIIIGII AS:i:36 XS:i:36 MD:Z:36 NM:i:0 SRR507778.6249 147 I 2925 60 36M = 6404 3445 TTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTT GGGGADIGIHIIHHHIEHIHIHI<DIIIIIIIGIIF AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.14609 129 I 3048 60 36M IV 1525643 0 AAAAGTAGCCGTTCATTTCCCTTCCGATTTCATTCC >5833+?=8>B@FBF?9B7AGGGB<G@BGGGGEGD> AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.20233 83 I 3132 60 36M = 6388 3222 TATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGA IIIIIIIGGIIIIIIIIIIIIIIIIIIIIIIBIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.19213 99 I 3183 60 36M = 62 -3087 ATTTTCTTCATAAAGAAGCTTTCAAGATATAAGATA HHHGHGDAHHHHHEHHHHHHHHGHEGBDGGGGG<GE AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.20882 73 I 3259 60 36M = 3259 0 CAAAAAGGAAAGCATGGAGGGAAACAGTAAACAGTG @GGGGGG>GGBD4DDGGEDGDDG@GAA1CBEEEE3D AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.20882 133 I 3259 0 * = 3259 0 GTGGTGTGTGTGGGTGAGGTGTGGGTGTGGGGAGGG EGBG8GCB8BBBB####################### AS:i:0 XS:i:0 hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |wc -l [mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000 1381594 hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |head -10 [mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000 I 62 a 1 ^],1 I 63 T 1,5 I 64 C 1,= I 65 C 1,@ I 66 T 1,9 I 67 A 1,: I 68 A 1,@ I 69 C 1,C I 70 A 1,3 I 71 C 1,< hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll total 54728 drwxrwxr-x 2 hadoop hadoop 4096 3月 10 12:01 ./ drwxrwxr-x 12 hadoop hadoop 4096 3月 10 11:57 ../ -rw-rw-r-- 1 hadoop hadoop 3051773 3月 10 11:27 y1.fastq -rw-rw-r-- 1 hadoop hadoop 3051773 3月 10 11:29 y2.fastq -rw-rw-r-- 1 hadoop hadoop 6919225 3月 10 12:00 yeast.bam -rw-rw-r-- 1 hadoop hadoop 967382 3月 10 12:03 yeast.cram -rw-rw-r-- 1 hadoop hadoop 12360755 3月 10 11:55 yeast.fasta -rw-rw-r-- 1 hadoop hadoop 14 3月 10 11:58 yeast.fasta.amb -rw-rw-r-- 1 hadoop hadoop 1341 3月 10 11:58 yeast.fasta.ann -rw-rw-r-- 1 hadoop hadoop 12157188 3月 10 11:58 yeast.fasta.bwt -rw-rw-r-- 1 hadoop hadoop 415 3月 10 11:58 yeast.fasta.fai -rw-rw-r-- 1 hadoop hadoop 3039278 3月 10 11:58 yeast.fasta.pac -rw-rw-r-- 1 hadoop hadoop 6078608 3月 10 11:58 yeast.fasta.sa -rw-rw-r-- 1 hadoop hadoop 8366688 3月 10 11:59 yeast.sam hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |wc -l 50000 hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |head -10 SRR507778.19213 147 I 62 60 36M = 3183 3087 ATCCTAACACTACCCTAACACAGCCCTAATCTAACC 15=@9:@C3<CBGGGDGDBGDFCC?>GGG<GGGDGG AS:i:36 XS:i:19 MD:Z:36 NM:i:0 SRR507778.12312 147 I 205 60 36M = 3626 3387 CCACTCACCCACCGTTACCCTCCAATTACCCATATC GD<B?B>DGGBGGGGGCIIIIGIIIIIEIIGIIIII AS:i:36 XS:i:26 MD:Z:36 NM:i:0 SRR507778.11604 83 I 402 60 36M = 3869 3433 CTCACTTGTATACTGATTTTACGTACGCACACGGAT GHGHIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.10609 83 I 2661 40 36M = 6131 3436 TGAATTCGTACAACATTAAACGTGTGTTGGGAGTCG IIIGFIIGIIHIIIIGIIIIIIIIIIHIIIIIIGII AS:i:36 XS:i:36 MD:Z:36 NM:i:0 SRR507778.6249 147 I 2925 60 36M = 6404 3445 TTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTT GGGGADIGIHIIHHHIEHIHIHI<DIIIIIIIGIIF AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.14609 129 I 3048 60 36M IV 1525643 0 AAAAGTAGCCGTTCATTTCCCTTCCGATTTCATTCC >5833+?=8>B@FBF?9B7AGGGB<G@BGGGGEGD> AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.20233 83 I 3132 60 36M = 6388 3222 TATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGA IIIIIIIGGIIIIIIIIIIIIIIIIIIIIIIBIIII AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.19213 99 I 3183 60 36M = 62 -3087 ATTTTCTTCATAAAGAAGCTTTCAAGATATAAGATA HHHGHGDAHHHHHEHHHHHHHHGHEGBDGGGGG<GE AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.20882 73 I 3259 60 36M = 3259 0 CAAAAAGGAAAGCATGGAGGGAAACAGTAAACAGTG @GGGGGG>GGBD4DDGGEDGDDG@GAA1CBEEEE3D AS:i:36 XS:i:0 MD:Z:36 NM:i:0 SRR507778.20882 133 I 3259 0 * = 3259 0 GTGGTGTGTGTGGGTGAGGTGTGGGTGTGGGGAGGG EGBG8GCB8BBBB####################### AS:i:0 XS:i:0 hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |wc -l [mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000 1381594 hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |head -10 [mpileup] 1 samples in 1 input files <mpileup> Set max per-file depth to 8000 I 62 a 1 ^],< 【1】?http://www.htslib.org/workflow/#mapping_to_cram 【2】?http://www.htslib.org/doc/samtools.html 下载文件: 指令:
<span style="font-family:monospace;">curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz|gzip -d > yeast.fasta</span> 参考【1】中第三个没有加 |gzip -d 运行: hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 537M 0 842k 0 0 5753 0 27:13:40 0:02:30 27:11:10 23013 curl: (23) Failed writing body (1496 != 4344) hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 0 559M 0 932k 0 0 6937 0 23:28:57 0:02:17 23:26:40 193k curl: (23) Failed writing body (11536 != 16384) hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz > yeast.fasta % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 3782k 100 3782k 0 0 18197 0 0:03:32 0:03:32 --:--:-- 47280 (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |