加入收藏 | 设为首页 | 会员中心 | 我要投稿 李大同 (https://www.lidatong.com.cn/)- 科技、建站、经验、云计算、5G、大数据,站长网!
当前位置: 首页 > 大数据 > 正文

基因数据处理1之mapping_to_cram

发布时间:2020-12-14 02:00:41 所属栏目:大数据 来源:网络整理
导读:基因数据处理1之mapping_to_cram 参考资料: A Worked Example Obtain some public data We will use the first 100,000 read-pairs from a yeast data set. curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head

基因数据处理1之mapping_to_cram

参考资料:

A Worked Example

Obtain some public data

We will use the first 100,000 read-pairs from a yeast data set.

curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq
curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq
curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz > yeast.fasta

Prepare the BWA indices

We need to ensure there exists a .fai fasta index and also indices for whichever aligner we are using (Bwa-mem in this example).

samtools faidx yeast.fasta
bwa index yeast.fasta

Produce the alignments

The aligner is likely to output SAM in the same order or similar order to the input fastq files. It won’t be outputting in chromosome position order,so the output is typically not well suited to CRAM.

bwa mem -R '@RGtID:footSM:bartLB:library1' yeast.fasta y1.fastq y2.fastq > yeast.sam

The -R option adds a read-group line and applies that read-group to all aligned sequence records. It is not necessary,but a recommended practice.

Sort into chromosome/positon order

Ideally at this point we would be outputting CRAM directly,but at present samtools 1.0 does not have a way to indicate the reference on the command line. We can output to BAM instead and convert (below),or modify the SAM @SQ header to include MD5 sums in the M5: field.

samtools sort -O bam -T /tmp -l 0 -o yeast.bam yeast.sam

The “-l 0” indicates to use no compression in the BAM file,as it is transitory and will be replaced by CRAM soon. We may wish to use -l 1 if disk space is short and we wish to reduce temporary file size.

Convert to CRAM format

samtools view -T yeast.fasta -C -o yeast.cram yeast.bam

Note that since the BAM file did not have M5 tags for the reference sequences,they are computed by Samtools and added to the CRAM. In a production environment,this step can be avoided by ensuring that the M5 tags are already in the SAM/BAM header.

The last 3 steps can be combined into a pipeline to reduce disk I/O:

bwa mem yeast.fasta y1.fastq y2.fastq | 
samtools sort -O bam -l 0 -T /tmp - | 
samtools view -T yeast.fasta -C -o yeast.cram -

Viewing in alignment and pileup format

See the variant calling workflow for more advanced examples.

samtools view yeast.cram
samtools mpileup -f yeast.fasta yeast.cram


有问题:

改正:原下载没有夹gzip -d

curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d   > y1.fastq
curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d  > y2.fastq
curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz|gzip -d > yeast.fasta


 
参考【1】中第三个没有加  
|gzip -d 


运行记录:

hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  537M    0  842k    0     0   5753      0 27:13:40  0:02:30 27:11:10 23013
curl: (23) Failed writing body (1496 != 4344)
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  559M    0  932k    0     0   6937      0 23:28:57  0:02:17 23:26:40  193k
curl: (23) Failed writing body (11536 != 16384)
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz > yeast.fasta
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3782k  100 3782k    0     0  18197      0  0:03:32  0:03:32 --:--:-- 47280

hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  537M    0  848k    0     0   6525      0 24:00:23  0:02:13 23:58:10  188k
curl: (23) Failed writing body (1000 != 1448)
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  559M    0  937k    0     0   6989      0 23:18:28  0:02:17 23:16:11  196k
curl: (23) Failed writing body (6872 != 16384)
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz|gzip -d > yeast.fasta
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3782k  100 3782k    0     0  26203      0  0:02:27  0:02:27 --:--:--  254k
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ls
y1.fastq  y2.fastq  yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ cd ../
hadoop@Mcnode6:~/cloud/adam/xubo$ ls
1000genomes  autoconf-archive  bwa-0.7.12  htslib  resources  samtools  yeast  yeast201603101121  yeast201603101125
hadoop@Mcnode6:~/cloud/adam/xubo$  cp yeast
yeast/             yeast201603101121/ yeast201603101125/ 
hadoop@Mcnode6:~/cloud/adam/xubo$  cp yeast201603101125/ yeast0
cp: omitting directory ‘yeast201603101125/’
hadoop@Mcnode6:~/cloud/adam/xubo$ cp -r yeast201603101125 yeast
yeast/             yeast201603101121/ yeast201603101125/ 
hadoop@Mcnode6:~/cloud/adam/xubo$ cp -r yeast201603101125 yeast0
hadoop@Mcnode6:~/cloud/adam/xubo$ ls
1000genomes  autoconf-archive  bwa-0.7.12  htslib  resources  samtools  yeast  yeast0  yeast201603101121  yeast201603101125
hadoop@Mcnode6:~/cloud/adam/xubo$ cd yeast201603101125
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ls
y1.fastq  y2.fastq  yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 18M
drwxrwxr-x  2 hadoop hadoop 4.0K  3月 10 11:53 ./
drwxrwxr-x 12 hadoop hadoop 4.0K  3月 10 11:57 ../
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:27 y1.fastq
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:29 y2.fastq
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:55 yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools faidx yeast.fasta
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ bwa index yeast.fasta
[bwa_index] Pack FASTA... 0.11 sec
[bwa_index] Construct BWT for the packed sequence...
[bwa_index] 7.49 seconds elapse.
[bwa_index] Update BWT... 0.11 sec
[bwa_index] Pack forward-only FASTA... 0.07 sec
[bwa_index] Construct SA from BWT and Occ... 2.94 sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa index yeast.fasta
[main] Real time: 13.239 sec; CPU: 10.718 sec
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 38M
drwxrwxr-x  2 hadoop hadoop 4.0K  3月 10 11:58 ./
drwxrwxr-x 12 hadoop hadoop 4.0K  3月 10 11:57 ../
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:27 y1.fastq
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:29 y2.fastq
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:55 yeast.fasta
-rw-rw-r--  1 hadoop hadoop   14  3月 10 11:58 yeast.fasta.amb
-rw-rw-r--  1 hadoop hadoop 1.4K  3月 10 11:58 yeast.fasta.ann
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:58 yeast.fasta.bwt
-rw-rw-r--  1 hadoop hadoop  415  3月 10 11:58 yeast.fasta.fai
-rw-rw-r--  1 hadoop hadoop 2.9M  3月 10 11:58 yeast.fasta.pac
-rw-rw-r--  1 hadoop hadoop 5.8M  3月 10 11:58 yeast.fasta.sa
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ bwa mem -R '@RGtID:footSM:bartLB:library1' yeast.fasta y1.fastq y2.fastq > yeast.sam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 50000 sequences (1800000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF,FR,RF,RR): (103,4007,13219,91)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25,50,75) percentile: (240,2467,3114)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1,8862)
[M::mem_pestat] mean and std.dev: (1893.02,1414.42)
[M::mem_pestat] low and high boundaries for proper pairs: (1,11736)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25,75) percentile: (243,275,314)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (101,456)
[M::mem_pestat] mean and std.dev: (278.34,50.30)
[M::mem_pestat] low and high boundaries for proper pairs: (30,527)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25,75) percentile: (3111,3320,3565)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (2203,4473)
[M::mem_pestat] mean and std.dev: (3352.46,321.09)
[M::mem_pestat] low and high boundaries for proper pairs: (1749,4927)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25,75) percentile: (291,2611,3188)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1,8982)
[M::mem_pestat] mean and std.dev: (2036.04,1501.65)
[M::mem_pestat] low and high boundaries for proper pairs: (1,11879)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 50000 reads in 7.764 CPU sec,8.119 real sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa mem -R @RGtID:footSM:bartLB:library1 yeast.fasta y1.fastq y2.fastq
[main] Real time: 8.523 sec; CPU: 7.867 sec
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll
total 47020
drwxrwxr-x  2 hadoop hadoop     4096  3月 10 11:59 ./
drwxrwxr-x 12 hadoop hadoop     4096  3月 10 11:57 ../
-rw-rw-r--  1 hadoop hadoop  3051773  3月 10 11:27 y1.fastq
-rw-rw-r--  1 hadoop hadoop  3051773  3月 10 11:29 y2.fastq
-rw-rw-r--  1 hadoop hadoop 12360755  3月 10 11:55 yeast.fasta
-rw-rw-r--  1 hadoop hadoop       14  3月 10 11:58 yeast.fasta.amb
-rw-rw-r--  1 hadoop hadoop     1341  3月 10 11:58 yeast.fasta.ann
-rw-rw-r--  1 hadoop hadoop 12157188  3月 10 11:58 yeast.fasta.bwt
-rw-rw-r--  1 hadoop hadoop      415  3月 10 11:58 yeast.fasta.fai
-rw-rw-r--  1 hadoop hadoop  3039278  3月 10 11:58 yeast.fasta.pac
-rw-rw-r--  1 hadoop hadoop  6078608  3月 10 11:58 yeast.fasta.sa
-rw-rw-r--  1 hadoop hadoop  8366688  3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 46M
drwxrwxr-x  2 hadoop hadoop 4.0K  3月 10 11:59 ./
drwxrwxr-x 12 hadoop hadoop 4.0K  3月 10 11:57 ../
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:27 y1.fastq
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:29 y2.fastq
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:55 yeast.fasta
-rw-rw-r--  1 hadoop hadoop   14  3月 10 11:58 yeast.fasta.amb
-rw-rw-r--  1 hadoop hadoop 1.4K  3月 10 11:58 yeast.fasta.ann
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:58 yeast.fasta.bwt
-rw-rw-r--  1 hadoop hadoop  415  3月 10 11:58 yeast.fasta.fai
-rw-rw-r--  1 hadoop hadoop 2.9M  3月 10 11:58 yeast.fasta.pac
-rw-rw-r--  1 hadoop hadoop 5.8M  3月 10 11:58 yeast.fasta.sa
-rw-rw-r--  1 hadoop hadoop 8.0M  3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools sort -O bam -T /tmp -l 0 -o yeast.bam yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 53M
drwxrwxr-x  2 hadoop hadoop 4.0K  3月 10 12:00 ./
drwxrwxr-x 12 hadoop hadoop 4.0K  3月 10 11:57 ../
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:27 y1.fastq
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:29 y2.fastq
-rw-rw-r--  1 hadoop hadoop 6.6M  3月 10 12:00 yeast.bam
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:55 yeast.fasta
-rw-rw-r--  1 hadoop hadoop   14  3月 10 11:58 yeast.fasta.amb
-rw-rw-r--  1 hadoop hadoop 1.4K  3月 10 11:58 yeast.fasta.ann
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:58 yeast.fasta.bwt
-rw-rw-r--  1 hadoop hadoop  415  3月 10 11:58 yeast.fasta.fai
-rw-rw-r--  1 hadoop hadoop 2.9M  3月 10 11:58 yeast.fasta.pac
-rw-rw-r--  1 hadoop hadoop 5.8M  3月 10 11:58 yeast.fasta.sa
-rw-rw-r--  1 hadoop hadoop 8.0M  3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view -T yeast.fasta -C -o yeast.cram yeast.bam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 54M
drwxrwxr-x  2 hadoop hadoop 4.0K  3月 10 12:01 ./
drwxrwxr-x 12 hadoop hadoop 4.0K  3月 10 11:57 ../
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:27 y1.fastq
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:29 y2.fastq
-rw-rw-r--  1 hadoop hadoop 6.6M  3月 10 12:00 yeast.bam
-rw-rw-r--  1 hadoop hadoop 945K  3月 10 12:01 yeast.cram
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:55 yeast.fasta
-rw-rw-r--  1 hadoop hadoop   14  3月 10 11:58 yeast.fasta.amb
-rw-rw-r--  1 hadoop hadoop 1.4K  3月 10 11:58 yeast.fasta.ann
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:58 yeast.fasta.bwt
-rw-rw-r--  1 hadoop hadoop  415  3月 10 11:58 yeast.fasta.fai
-rw-rw-r--  1 hadoop hadoop 2.9M  3月 10 11:58 yeast.fasta.pac
-rw-rw-r--  1 hadoop hadoop 5.8M  3月 10 11:58 yeast.fasta.sa
-rw-rw-r--  1 hadoop hadoop 8.0M  3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ bwa mem yeast.fasta y1.fastq y2.fastq | 
> samtools sort -O bam -l 0 -T /tmp - | 
> samtools view -T yeast.fasta -C -o yeast.cram -
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 50000 sequences (1800000 bp)...
[M::mem_pestat] # candidate unique pairs for (FF,11879)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 50000 reads in 7.820 CPU sec,8.093 real sec
[main] Version: 0.7.12-r1039
[main] CMD: bwa mem yeast.fasta y1.fastq y2.fastq
[main] Real time: 8.217 sec; CPU: 7.886 sec
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll -h
total 54M
drwxrwxr-x  2 hadoop hadoop 4.0K  3月 10 12:01 ./
drwxrwxr-x 12 hadoop hadoop 4.0K  3月 10 11:57 ../
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:27 y1.fastq
-rw-rw-r--  1 hadoop hadoop 3.0M  3月 10 11:29 y2.fastq
-rw-rw-r--  1 hadoop hadoop 6.6M  3月 10 12:00 yeast.bam
-rw-rw-r--  1 hadoop hadoop 945K  3月 10 12:03 yeast.cram
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:55 yeast.fasta
-rw-rw-r--  1 hadoop hadoop   14  3月 10 11:58 yeast.fasta.amb
-rw-rw-r--  1 hadoop hadoop 1.4K  3月 10 11:58 yeast.fasta.ann
-rw-rw-r--  1 hadoop hadoop  12M  3月 10 11:58 yeast.fasta.bwt
-rw-rw-r--  1 hadoop hadoop  415  3月 10 11:58 yeast.fasta.fai
-rw-rw-r--  1 hadoop hadoop 2.9M  3月 10 11:58 yeast.fasta.pac
-rw-rw-r--  1 hadoop hadoop 5.8M  3月 10 11:58 yeast.fasta.sa
-rw-rw-r--  1 hadoop hadoop 8.0M  3月 10 11:59 yeast.sam
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |wc -l
50000
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |head -10
SRR507778.19213	147	I	62	60	36M	=	3183	3087	ATCCTAACACTACCCTAACACAGCCCTAATCTAACC	15=@9:@C3<CBGGGDGDBGDFCC?>GGG<GGGDGG	AS:i:36	XS:i:19	MD:Z:36	NM:i:0
SRR507778.12312	147	I	205	60	36M	=	3626	3387	CCACTCACCCACCGTTACCCTCCAATTACCCATATC	GD<B?B>DGGBGGGGGCIIIIGIIIIIEIIGIIIII	AS:i:36	XS:i:26	MD:Z:36	NM:i:0
SRR507778.11604	83	I	402	60	36M	=	3869	3433	CTCACTTGTATACTGATTTTACGTACGCACACGGAT	GHGHIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.10609	83	I	2661	40	36M	=	6131	3436	TGAATTCGTACAACATTAAACGTGTGTTGGGAGTCG	IIIGFIIGIIHIIIIGIIIIIIIIIIHIIIIIIGII	AS:i:36	XS:i:36	MD:Z:36	NM:i:0
SRR507778.6249	147	I	2925	60	36M	=	6404	3445	TTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTT	GGGGADIGIHIIHHHIEHIHIHI<DIIIIIIIGIIF	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.14609	129	I	3048	60	36M	IV	1525643	0	AAAAGTAGCCGTTCATTTCCCTTCCGATTTCATTCC	>5833+?=8>B@FBF?9B7AGGGB<G@BGGGGEGD>	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.20233	83	I	3132	60	36M	=	6388	3222	TATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGA	IIIIIIIGGIIIIIIIIIIIIIIIIIIIIIIBIIII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.19213	99	I	3183	60	36M	=	62	-3087	ATTTTCTTCATAAAGAAGCTTTCAAGATATAAGATA	HHHGHGDAHHHHHEHHHHHHHHGHEGBDGGGGG<GE	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.20882	73	I	3259	60	36M	=	3259	0	CAAAAAGGAAAGCATGGAGGGAAACAGTAAACAGTG	@GGGGGG>GGBD4DDGGEDGDDG@GAA1CBEEEE3D	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.20882	133	I	3259	0	*	=	3259	0	GTGGTGTGTGTGGGTGAGGTGTGGGTGTGGGGAGGG	EGBG8GCB8BBBB#######################	AS:i:0	XS:i:0
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |wc -l
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
1381594
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |head -10
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
I	62	a	1	^],1
I	63	T	1,5
I	64	C	1,=
I	65	C	1,@
I	66	T	1,9
I	67	A	1,:
I	68	A	1,@
I	69	C	1,C
I	70	A	1,3
I	71	C	1,<
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ ll
total 54728
drwxrwxr-x  2 hadoop hadoop     4096  3月 10 12:01 ./
drwxrwxr-x 12 hadoop hadoop     4096  3月 10 11:57 ../
-rw-rw-r--  1 hadoop hadoop  3051773  3月 10 11:27 y1.fastq
-rw-rw-r--  1 hadoop hadoop  3051773  3月 10 11:29 y2.fastq
-rw-rw-r--  1 hadoop hadoop  6919225  3月 10 12:00 yeast.bam
-rw-rw-r--  1 hadoop hadoop   967382  3月 10 12:03 yeast.cram
-rw-rw-r--  1 hadoop hadoop 12360755  3月 10 11:55 yeast.fasta
-rw-rw-r--  1 hadoop hadoop       14  3月 10 11:58 yeast.fasta.amb
-rw-rw-r--  1 hadoop hadoop     1341  3月 10 11:58 yeast.fasta.ann
-rw-rw-r--  1 hadoop hadoop 12157188  3月 10 11:58 yeast.fasta.bwt
-rw-rw-r--  1 hadoop hadoop      415  3月 10 11:58 yeast.fasta.fai
-rw-rw-r--  1 hadoop hadoop  3039278  3月 10 11:58 yeast.fasta.pac
-rw-rw-r--  1 hadoop hadoop  6078608  3月 10 11:58 yeast.fasta.sa
-rw-rw-r--  1 hadoop hadoop  8366688  3月 10 11:59 yeast.sam



 

hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |wc -l
50000
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools view yeast.cram |head -10
SRR507778.19213	147	I	62	60	36M	=	3183	3087	ATCCTAACACTACCCTAACACAGCCCTAATCTAACC	15=@9:@C3<CBGGGDGDBGDFCC?>GGG<GGGDGG	AS:i:36	XS:i:19	MD:Z:36	NM:i:0
SRR507778.12312	147	I	205	60	36M	=	3626	3387	CCACTCACCCACCGTTACCCTCCAATTACCCATATC	GD<B?B>DGGBGGGGGCIIIIGIIIIIEIIGIIIII	AS:i:36	XS:i:26	MD:Z:36	NM:i:0
SRR507778.11604	83	I	402	60	36M	=	3869	3433	CTCACTTGTATACTGATTTTACGTACGCACACGGAT	GHGHIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.10609	83	I	2661	40	36M	=	6131	3436	TGAATTCGTACAACATTAAACGTGTGTTGGGAGTCG	IIIGFIIGIIHIIIIGIIIIIIIIIIHIIIIIIGII	AS:i:36	XS:i:36	MD:Z:36	NM:i:0
SRR507778.6249	147	I	2925	60	36M	=	6404	3445	TTTCTAAGTGGGATTTTTCTTAATCCTTGGATTCTT	GGGGADIGIHIIHHHIEHIHIHI<DIIIIIIIGIIF	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.14609	129	I	3048	60	36M	IV	1525643	0	AAAAGTAGCCGTTCATTTCCCTTCCGATTTCATTCC	>5833+?=8>B@FBF?9B7AGGGB<G@BGGGGEGD>	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.20233	83	I	3132	60	36M	=	6388	3222	TATTTGTGTCCCATTCTCGTAGATAAAATTCTTGGA	IIIIIIIGGIIIIIIIIIIIIIIIIIIIIIIBIIII	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.19213	99	I	3183	60	36M	=	62	-3087	ATTTTCTTCATAAAGAAGCTTTCAAGATATAAGATA	HHHGHGDAHHHHHEHHHHHHHHGHEGBDGGGGG<GE	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.20882	73	I	3259	60	36M	=	3259	0	CAAAAAGGAAAGCATGGAGGGAAACAGTAAACAGTG	@GGGGGG>GGBD4DDGGEDGDDG@GAA1CBEEEE3D	AS:i:36	XS:i:0	MD:Z:36	NM:i:0
SRR507778.20882	133	I	3259	0	*	=	3259	0	GTGGTGTGTGTGGGTGAGGTGTGGGTGTGGGGAGGG	EGBG8GCB8BBBB#######################	AS:i:0	XS:i:0
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ 

hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |wc -l
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
1381594
hadoop@Mcnode6:~/cloud/adam/xubo/yeast201603101125$ samtools mpileup -f yeast.fasta yeast.cram |head -10
[mpileup] 1 samples in 1 input files
<mpileup> Set max per-file depth to 8000
I	62	a	1	^],<



参考

【1】?http://www.htslib.org/workflow/#mapping_to_cram

【2】?http://www.htslib.org/doc/samtools.html

下载文件:

指令:

<span style="font-family:monospace;">curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq
curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq
curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz|gzip -d > yeast.fasta</span>

参考【1】中第三个没有加
|gzip -d


运行:

hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_1.fastq.gz|gzip -d | head -100000 > y1.fastq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  537M    0  842k    0     0   5753      0 27:13:40  0:02:30 27:11:10 23013
curl: (23) Failed writing body (1496 != 4344)
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR507/SRR507778/SRR507778_2.fastq.gz|gzip -d | head -100000 > y2.fastq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0  559M    0  932k    0     0   6937      0 23:28:57  0:02:17 23:26:40  193k
curl: (23) Failed writing body (11536 != 16384)
hadoop@Mcnode6:~/cloud/adam/xubo/1000genomes/CEU/NA12878/alignment$ curl ftp://ftp.ensembl.org/pub/current_fasta/saccharomyces_cerevisiae/dna/Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz > yeast.fasta
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3782k  100 3782k    0     0  18197      0  0:03:32  0:03:32 --:--:-- 47280

(编辑:李大同)

【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容!

    推荐文章
      热点阅读