差异区域筛选(区间比较)
发布时间:2020-12-15 23:55:29 所属栏目:大数据 来源:网络整理
导读:原帖入口 文本处理前: data1.txt:range1:1-6range1:7-14range1:15-18range1:19-23range2:2-7range2:8-11range2:12-16range2:18-20data2.txt:range1:2-4range1:6-9range1:10-17range2:1-6range2:7-12range2:13-14range2:17-19 定义: 差异区域:如果两个集
原帖入口 文本处理前: data1.txt: range1:1-6 range1:7-14 range1:15-18 range1:19-23 range2:2-7 range2:8-11 range2:12-16 range2:18-20 data2.txt: range1:2-4 range1:6-9 range1:10-17 range2:1-6 range2:7-12 range2:13-14 range2:17-19 定义: 差异区域:如果两个集合的交集范围小于等于2,就认为是差异区域。如果两个集合是父子集的关系,不管它们的交集范围是多少,都不算差异区域。 目标: 根据这2个文件生成3个文件,分别是data1_diff.txt、data2_diff.txt和data_comm.txt。 data1_diff.txt用于存放data1.txt与data2.txt的差异区域。 data2_diff.txt用于存放data2.txt与data1.txt的差异区域。 data_comm.txt用于存放非差异区域。 文本处理后: data1_diff.txt: range1:15-18 range1:19-23 range2:18-20 data2_diff.txt: range1:6-9 range2:17-19 data_comm.txt: range1:1-6??2-4 range1:7-14??10-17 range2:2-7??1-6 range2:8-11??7-12 range2:12-16??13-14 我的脚本: #!/usr/bin/perl use?warnings; use?strict; use?5.010; sub?JiaoJi?{ ????????return?2?if?$_[0][0]?<=?$_[1][0]?&&?$_[0][1]?>=?$_[1][1]?||?$_[0][0]?>=?$_[1][0]?&&?$_[0][1]?<=?$_[1][1]; ????????return?($_[0][1]?-?$_[1][0]?>?2)???2?:?0?if?$_[0][0]?<=?$_[1][0]?&&?$_[0][1]?<=?$_[1][1]; ????????return?($_[1][1]?-?$_[0][0]?>?2)???2?:?1?if?$_[0][0]?>=?$_[1][0]?&&?$_[0][1]?>=?$_[1][1]; } open(FILE1,?'data1.txt')?or?die; open(FILE2,?'data2.txt')?or?die; my(%hash1,?%hash2); while(<FILE1>){ ????????chomp; ????????@_?=?split?/[:-]/; ????????push(@{$hash1{$_[0]}},?[$_[1],?$_[2]]); } while(<FILE2>){ ????????chomp; ????????@_?=?split?/[:-]/; ????????push(@{$hash2{$_[0]}},?$_[2]]); } close(FILE1); close(FILE2); for?my?$key?(keys?%hash1){ ????????while(@{$hash1{$key}}){ ????????????????while(@{$hash2{$key}}){ ????????????????????????unless(@{$hash1{$key}}){ ????????????????????????????????open(OUTPUT_FILE,?'>>data2_diff.txt'); ????????????????????????????????printf?OUTPUT_FILE?"%s:%d-%dn",?$key,?$_->[0],?$_->[1]?for?@{$hash2{$key}}; ????????????????????????????????undef?@{$hash2{$key}}; ????????????????????????????????close(OUTPUT_FILE); ????????????????????????????????last; ????????????????????????} ????????????????????????given(JiaoJi($hash1{$key}[0],?$hash2{$key}[0])){ ????????????????????????????????when(0){ ????????????????????????????????????????open(OUTPUT_FILE,?'>>data1_diff.txt'); ????????????????????????????????????????say?OUTPUT_FILE?"$key:$hash1{$key}[0][0]-$hash1{$key}[0][1]"; ????????????????????????????????????????shift?@{$hash1{$key}}; ????????????????????????????????} ????????????????????????????????when(1){ ????????????????????????????????????????open(OUTPUT_FILE,?'>>data2_diff.txt'); ????????????????????????????????????????say?OUTPUT_FILE?"$key:$hash2{$key}[0][0]-$hash2{$key}[0][1]"; ????????????????????????????????????????shift?@{$hash2{$key}}; ????????????????????????????????} ????????????????????????????????when(2){ ????????????????????????????????????????open(OUTPUT_FILE,?'>>data_comm.txt'); ????????????????????????????????????????say?OUTPUT_FILE?"$key:$hash1{$key}[0][0]-$hash1{$key}[0][1]?$hash2{$key}[0][0]-$hash2{$key}[0][1]"; ????????????????????????????????????????shift?@{$hash1{$key}}; ????????????????????????????????????????shift?@{$hash2{$key}}; ????????????????????????????????} ????????????????????????} ????????????????????????close(OUTPUT_FILE); ????????????????} ????????????????open(OUTPUT_FILE,?'>>data1_diff.txt'); ????????????????printf?OUTPUT_FILE?"%s:%d-%dn",?$_->[1]?for?@{$hash1{$key}}; ????????????????undef?@{$hash1{$key}}; ????????????????close(OUTPUT_FILE); ????????} } (编辑:李大同) 【声明】本站内容均来自网络,其相关言论仅代表作者个人观点,不代表本站立场。若无意侵犯到您的权利,请及时与联系站长删除相关内容! |