Genomic Data Processing 11: The SAM File Format

Published: 2020-12-14 01:59:57 | Category: Big Data | Source: compiled from the web

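SAM stands for Sequence Alignment/Map format. BAM is the binary version of SAM (the B stands for binary). Each alignment line in a SAM file consists of 11 mandatory tab-separated fields, followed by any number of optional fields: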
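1. QNAME: the read name
2. FLAG: the bitwise SAM flag (for example, 0x4 means unmapped and 0x10 means aligned to the reverse strand)
3. RNAME: the reference sequence name, typically the chromosome
4. POS: the 1-based leftmost mapping position on the reference (this is the read's 5′ end only when the read aligns to the forward strand)
5. MAPQ: mapping quality, a Phred-scaled estimate of the probability that the mapping position is wrong; the higher the value, the more uniquely the read is placed, and 255 means the value is unavailable
6. CIGAR string: records alignment matches and mismatches (both written as M), insertions (I), deletions (D), soft clipping (S), and skipped reference regions (N) such as splice junctions
7. RNEXT: the reference sequence name of the mate's alignment, which carries the mate-pair information ("=" means the same as RNAME, "*" means unavailable)
8. PNEXT: the mapping position of the mate
9. TLEN: the observed template length
10. SEQ: the read sequence
11. QUAL: the read base qualities
12. Optional fields: aligner-specific tags in TAG:TYPE:VALUE form, such as NM, MD, AS and XS in the example below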
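
The field list above can be turned into a small parser. The following is a minimal Python sketch, not a replacement for a real SAM library: the function name parse_sam_line and the short demo record are made up for illustration, and only the two flag bits 0x4 and 0x10 are decoded.

```python
from typing import Any, Dict

# Names of the 11 mandatory SAM columns, in file order.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
# Mandatory columns that hold integers.
INT_COLUMNS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Split one alignment line into named fields (hypothetical helper)."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(parts)}")

    record: Dict[str, Any] = {}
    for name, value in zip(SAM_COLUMNS, parts[:11]):
        record[name] = int(value) if name in INT_COLUMNS else value

    # Optional fields look like TAG:TYPE:VALUE; type "i" is an integer.
    tags: Dict[str, Any] = {}
    for field in parts[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            try:
                value = int(value)
            except ValueError:
                pass  # keep malformed values as plain strings
        tags[tag] = value
    record["TAGS"] = tags

    # Two commonly used FLAG bits.
    record["UNMAPPED"] = bool(record["FLAG"] & 0x4)
    record["REVERSE"] = bool(record["FLAG"] & 0x10)
    return record


if __name__ == "__main__":
    # Made-up record for illustration only; it is not taken from a real file.
    demo = "\t".join(["read1", "16", "chr1", "100", "60", "4S6M", "*", "0", "0",
                      "TCAGACGTAC", "IIIIIIIIII", "NM:i:1", "MD:Z:3A2"])
    rec = parse_sam_line(demo)
    print(rec["RNAME"], rec["POS"], rec["REVERSE"], rec["TAGS"])
```

Header lines, which start with "@", are not alignment records and would need to be skipped before calling this function.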

Example:

hadoop@Master:~/cloud/adam/xubo/data/test20160310$ samtools view SRR003161h20.sam 
SRR003161.1	0	chr1	143217889	0	4S35M85S	*	0	0	TCAGATGCAATCATCGAATGGTCTCGAATGGAATCNTCTANAGAGATGGAATGTATCNCTCGCCANACGACACNCGAACAGGGNAAGGCAAGCAGNAGGNAGNNNANNNNNNNNNNNNNNNNNN	AAAAAAAAAAAAAAAA:::BAAFAABAAB?>>=44!39=<!:866699888220862!08:8002!0200000!022200800!20660000600!000!06!!!6!!!!!!!!!!!!!!!!!!	NM:i:1	MD:Z:31A3	AS:i:3XS:i:33	XA:Z:chr10,+42092546,4S35M85S,1;chr1,+143217421,+143239587,-143252938,85S35M4S,+143220601,+143219665,-143210830,1;chr10,+42075371,+42101425,+143272381,-143204112,+143189975,+42080829,+42067652,+143236600,+42071261,+143202568,+143262016,+42094445,+143229991,+143194906,+42098197,+143229325,+143273144,+143236132,1;chr3,-196898795,-125173710,+42074903,+143193143,+143190443,+42085796,+143224622,+143267943,+42103854,+143225093,-143249828,+143231300,-143256486,-143209440,+143228021,+143185063,-41852367,-143251629,+143233540,+42093977,+143200517,+143194441,+42070793,+143206914,+143237811,+143227553,-143255189,+143231768,+143271341,+42080361,+143213870,+42074435,+143263324,+42097745,+42090276,-125180284,+143240055,+143265756,+143216113,-125169985,+143219197,+143192675,+42095848,+143195374,+143214338,+143270772,-125166285,+143275099,+143226451,+42104319,+143232233,+143211626,+143220133,+143215645,+42100036,-41846998,-125168084,-125179816,+143240523,+143264771,+143212094,4S34M86S,-41845898,86S34M4S,+143191375,4S31M89S,0;chr1,-125182919,89S31M4S,+143221908,+143190911,0;chr10,-41843753,0;chr10_KI270824v1_alt,+1080,1;chr10_KI270824v1_alt,+615,1;
SRR003161.2	0	chr7	41381016	60	4S153M1D132M1D5M1D28M1D73M3I12M1I40M54S	*	0	0	TCAGTTTGAGATGGAGTTTCATTCTTGTTGCCCAGGCTGGAGTGCAATGGCGCAATCTCAGCTCACAGCAACCTCCGCCTCCCGGGTTCAAGCGATTCTCCTGCCTCAGCCTCTCGAGTAGCTGGGATTACAGGCATGCACCATCACGCCCAGCTAATTTGCATTTTTTATTAGAGATGGGGTTTCTCCACATTGGTCAGGCTGATCTCGAACTCCTGACCTCAGGTGATCTGCCTGCCTTGGCCTCCCAAAGTGCTGGGATTACAGGCATGAGCCTGAGCCCAACCTATTTACTTTCAATCCATCTTTTCAATAACTTAAATACAAGTGTCAATATATACAATCTTTTCCTCCCTGGTTATCAAGCTTTCTAATATATATGGATGTATCTTCCAAGGTTTTTGATCCCATTTTACTTTACAGGCTCACTGCTGTGGAACCCAGAGAGCAGTCTCTTTTCAAGGNGGGCTGAGACNCGCAACAGGGGATTAGGCCAAGGCNCAGG	CCCCCCCCCCCCCCCC@@@CCCFEEEFEEG888EEEFFEEEEFGGGGGGCCCCCCCCCCCCCCCCCCCCCCCCCCCCCA<777@@CCCBCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCAAACCCCCCCCCCCCCCCCCCCCCCC:93339@A>77//39AC666666C22CAAAA93333///7-0017>9999>>A???ACCCCCCC2239322>9977<?????CCCCCCCCC877777777222221::::5555:555:::::::::;:555:;;::::0040-----***--467::::;;;;;;:::511155555:555:::;::::::7777744-------///245::;;;::::::;;;;;;;;:55554774----------44-----064---------6---522451115247644255-----,4---24464422---------!,4464224!11:::7:::222221--7777---!----	NM:i:1MD:Z:153^T40T91^T5^T28^G73G23C0G26	AS:i:379	XS:i:88
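
To make the CIGAR column of the two records above concrete, here is a short Python sketch that splits a CIGAR string into operations and sums how many read bases and reference bases it covers. The helper names parse_cigar, cigar_lengths and reference_end are made up for this example; the rule for which operations consume the read or the reference follows the SAM specification, and the code assumes a mapped read with a valid CIGAR.

```python
import re
from typing import List, Tuple

CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the read (query) and along the reference.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Return a list of (length, operation) pairs; "*" means no CIGAR."""
    if cigar == "*":
        return []
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"invalid CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_TOKEN.findall(cigar)]


def cigar_lengths(cigar: str) -> Tuple[int, int]:
    """Return (read bases covered, reference bases covered)."""
    query_len = ref_len = 0
    for length, op in parse_cigar(cigar):
        if op in CONSUMES_QUERY:
            query_len += length
        if op in CONSUMES_REFERENCE:
            ref_len += length
    return query_len, ref_len


def reference_end(pos: int, cigar: str) -> int:
    """1-based position of the last reference base covered by the alignment."""
    _, ref_len = cigar_lengths(cigar)
    return pos + ref_len - 1


if __name__ == "__main__":
    # POS and CIGAR values copied from the two records shown above.
    print(cigar_lengths("4S35M85S"))             # (124, 35)
    print(reference_end(143217889, "4S35M85S"))  # 143217923
    long_cigar = "4S153M1D132M1D5M1D28M1D73M3I12M1I40M54S"
    print(cigar_lengths(long_cigar))             # (505, 447)
    print(reference_end(41381016, long_cigar))   # 41381462
```

For the first record, only 35 of the 124 read bases are aligned; the remaining 4 + 85 bases are soft-clipped, which fits its MAPQ of 0. According to the SAM specification, the read-side total should equal the length of the SEQ field.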



Reference: http://www.bbioo.com/lifesciences/40-113338-1.html

(Editor: 李大同)
