Selecting the first row of a matrix based on the values of columns 1, 8 and 9 with awk or sed
I have a file in which many rows have roughly the same values in columns 1, 8 and 9; in total there are more than 60K rows. I would like to reduce the data so that, among rows whose columns 1, 8 and 9 are identical, only the first row is kept.
Input file:

chr exon_start exon_end cnv tumor_DOC control_DOC rationormalized_after_smoothing CNV_start CNV_end seg_mean
chr1 762097 762270 3 821 717 1.456610215 762097 6706109 1.297328502
chr1 861281 861490 3 101 117 1.29744744 762097 6706109 1.297328502
chr1 7868860 7869039 2 78 119 1.123385189 7796356 8921423 1.088752407
chr1 7869841 7870041 2 140 169 1.123385189 7796356 8921423 1.088752407
chr1 7870411 7870596 2 83 163 1.123385189 7796356 8921423 1.088752407
chr1 7879297 7879467 2 290 360 1.024742732 7796356 8921423 1.088752407
chr1 21012415 21012609 3 89 135 1.230421209 19536504 21054539 1.247494175
chr1 21013924 21014512 3 234 219 1.359224182 19536504 21054539 1.247494175
chr1 21016588 21016803 3 172 179 1.230421209 19536504 21054539 1.247494175
chr1 21024895 21025101 3 147 120 1.230421209 19536504 21054539 1.247494175
chr14 20920169 20920704 3 211 214 1.254261327 20840851 20923828 1.288877208
chr14 20922716 20922919 3 253 262 1.228396526 20840851 20923828 1.288877208
chr14 20923634 20923828 3 188 201 1.206226522 20840851 20923828 1.288877208
chr14 20924141 20924329 2 244 344 0.902299535 20924141 21465086 1.088234038
chr14 20924787 20925701 2 314 306 1.305351797 20924141 21465086 1.088234038
chr14 20926636 20926836 2 134 136 1.206226522 20924141 21465086 1.088234038

Desired output:

chr exon_start exon_end cnv tumor_DOC control_DOC rationormalized_after_smoothing CNV_start CNV_end seg_mean
chr1 762097 762270 3 821 717 1.456610215 762097 6706109 1.297328502
chr1 7869841 7870041 2 140 169 1.123385189 7796356 8921423 1.088752407
chr1 21024895 21025101 3 147 120 1.230421209 19536504 21054539 1.247494175
chr14 20922716 20922919 3 253 262 1.228396526 20840851 20923828 1.288877208
chr14 20924141 20924329 2 244 344 0.902299535 20924141 21465086 1.088234038

For each distinct combination of column 1, column 8 and column 9 I want to keep only one row, preferably the first row each time the combination changes. How can I achieve this in awk, sed or R?

Solution
Read the data into R (you can point read.table at your own file instead of the inline text):
DF <- read.table(text = "chr exon_start exon_end cnv tumor_DOC control_DOC rationormalized_after_smoothing CNV_start CNV_end seg_mean
chr1 762097 762270 3 821 717 1.456610215 762097 6706109 1.297328502
chr1 861281 861490 3 101 117 1.29744744 762097 6706109 1.297328502
chr1 7868860 7869039 2 78 119 1.123385189 7796356 8921423 1.088752407
chr1 7869841 7870041 2 140 169 1.123385189 7796356 8921423 1.088752407
chr1 7870411 7870596 2 83 163 1.123385189 7796356 8921423 1.088752407
chr1 7879297 7879467 2 290 360 1.024742732 7796356 8921423 1.088752407
chr1 21012415 21012609 3 89 135 1.230421209 19536504 21054539 1.247494175
chr1 21013924 21014512 3 234 219 1.359224182 19536504 21054539 1.247494175
chr1 21016588 21016803 3 172 179 1.230421209 19536504 21054539 1.247494175
chr1 21024895 21025101 3 147 120 1.230421209 19536504 21054539 1.247494175
chr14 20920169 20920704 3 211 214 1.254261327 20840851 20923828 1.288877208
chr14 20922716 20922919 3 253 262 1.228396526 20840851 20923828 1.288877208
chr14 20923634 20923828 3 188 201 1.206226522 20840851 20923828 1.288877208
chr14 20924141 20924329 2 244 344 0.902299535 20924141 21465086 1.088234038
chr14 20924787 20925701 2 314 306 1.305351797 20924141 21465086 1.088234038
chr14 20926636 20926836 2 134 136 1.206226522 20924141 21465086 1.088234038",
header = TRUE)

Then keep the rows whose combination of columns 1, 8 and 9 has not already appeared in an earlier row, i.e. the first row of each group:

DF[!duplicated(DF[, c(1, 8, 9)]), ]
# chr exon_start exon_end cnv tumor_DOC control_DOC rationormalized_after_smoothing CNV_start CNV_end seg_mean
#1 chr1 762097 762270 3 821 717 1.4566102 762097 6706109 1.297329
#3 chr1 7868860 7869039 2 78 119 1.1233852 7796356 8921423 1.088752
#7 chr1 21012415 21012609 3 89 135 1.2304212 19536504 21054539 1.247494
#11 chr14 20920169 20920704 3 211 214 1.2542613 20840851 20923828 1.288877
#14 chr14 20924141 20924329 2 244 344 0.9022995 20924141 21465086 1.088234
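The question also asks about awk. The original answer covers only R, but the same first-occurrence filter can be written as an awk one-liner; this is a minimal sketch that assumes the table is whitespace-separated and uses placeholder file names (input.txt, filtered.txt):

# keep the header, then only the first row of each distinct
# (column 1, column 8, column 9) combination
awk 'NR == 1 || !seen[$1, $8, $9]++' input.txt > filtered.txt

Here seen[$1, $8, $9]++ evaluates to 0 (false) the first time a combination appears and to a non-zero count afterwards, so !seen[...]++ passes only the first row of each group, while NR == 1 keeps the header line regardless.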