bash - How to gsub multiple columns at once, each with a different gsub condition?

Published: 2020-12-15 21:19:21 | Category: Security | Source: compiled from the web


I have a file containing the following data:

Input:

A B C D E F
A B B B B B
C A C D E F
A B D E F A
A A A A A F
A B C B B B

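From row 2 onward, any letter that matches the letter in the same column of row 1 should be changed to 1. Basically, I am trying to find out how similar each row is to the first row.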

Expected output:

1 1 1 1 1 1
1 1 B B B B
C A 1 1 1 1
1 1 D E F A
1 A A A A 1
1 1 1 B B B

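The first row becomes all 1s because it is identical to itself (obviously). In the second row, the first and second columns are the same as in the first row (A B), so they become 1 1. The same applies to the remaining rows.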

I wrote the following code to perform this transformation:

for seq in {1..1} ; #Iterate over the rows (in this case just row 1)
do 
    for position in {1..6} ; #Iterate over the columns
    do 
        #Define the letter in the first row with which I'm comparing the rest of the rows
        aa=$(awk -v pos=$position -v line=$seq 'NR == line {print $pos}' f) 
        #If it matches,gsub it to 1 
        awk -v var=$aa -v pos=$position '{gsub (var,"1",$pos)} 1' f > temp
        #Save this intermediate file and now act on this
        mv temp f 
    done 
done

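As you can imagine, this is very slow, because the nested loops are expensive: every column costs two awk runs over the file plus a rewrite of the file. My real data is a 60×10000 matrix, and this program takes about 2 hours to run on it.

I am hoping you can help me get rid of the inner loop so that I can do all 6 gsubs in a single step. Maybe by putting them in an array of their own? My awk skills are not great yet.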

Solution

Input

$ cat f
A B C D E F
A B B B B B
C A C D E F
A B D E F A
A A A A A F
A B C B B B

Expected output

$ awk 'FNR==1{split($0,a)}{for(i=1;i<=NF;i++)if (a[i]==$i) $i=1}1' f
1 1 1 1 1 1
1 1 B B B B
C A 1 1 1 1
1 1 D E F A
1 A A A A 1
1 1 1 B B B

Explanation

> FNR==1{..}

When awk reads the first record of the current file, it executes the action inside the braces.
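To illustrate why the condition uses FNR rather than NR, here is a minimal sketch with two small throwaway files (the file names f1 and f2 and their contents are made up for this example). FNR restarts at 1 for each input file, while NR keeps counting across files:

```shell
# Hypothetical sample files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'A B C\nA B B\n' > "$tmpdir/f1"
printf 'X Y Z\nX Q Z\n' > "$tmpdir/f2"

# The action fires on the first record of EACH file, because FNR resets per file.
fnr_out=$(awk 'FNR==1 {print "NR=" NR, "FNR=" FNR, $0}' "$tmpdir/f1" "$tmpdir/f2")
printf '%s\n' "$fnr_out"

rm -rf "$tmpdir"
```

With a single input file, as in this answer, FNR==1 and NR==1 behave the same.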

split(string,array [,fieldsep [,seps ] ])

Divide string into pieces separated by fieldsep and store the pieces
in array and the separator strings in the seps array.

> split($0,a)

Split the current record (row) $0 into pieces on the field separator (whitespace by default, since no third argument is supplied) and store the pieces in array a.
Array a therefore contains the data from the first row:

a[1] = A
a[2] = B
a[3] = C
a[4] = D
a[5] = E
a[6] = F
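The contents of the array can be checked directly. The following is a small sketch that feeds only the first row to awk and prints each element; the variable n and the output format are choices made for this illustration:

```shell
# split() returns the number of pieces; with no third argument it splits on FS.
split_out=$(printf 'A B C D E F\n' | awk '{
    n = split($0, a)             # a[1]..a[n] now hold the fields of the row
    for (i = 1; i <= n; i++)     # iterate by index so the order is preserved
        printf "a[%d] = %s\n", i, a[i]
    print "pieces:", n
}')
printf '%s\n' "$split_out"
```

Iterating with a numeric index (rather than `for (k in a)`) matters here, because awk does not guarantee any particular order for `for ... in`.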

> for(i=1;i<=NF;i++)

Loop over all fields of the current record. This block has no condition, so it runs for every record until the end of the file.

> if (a[i]==$i) $i=1

If the first row's value at the current index (i) equals the value in the same column of the current row, set that column to 1 (that is, modify the current field in place).
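One side effect of assigning to a field is worth knowing: awk rebuilds $0 and joins the fields with the output field separator OFS, which defaults to a single space. That is harmless for the space-separated sample above, but if the real matrix were comma- or tab-delimited (an assumption, the question does not say), OFS would need to be set to match. A small sketch:

```shell
# Assigning to $2 rebuilds the record using OFS (a single space by default).
default_ofs=$(printf 'A,B,C\n' | awk -F, '{ $2 = 1; print }')

# Setting OFS to match the input separator keeps the original delimiter.
matching_ofs=$(printf 'A,B,C\n' | awk -F, -v OFS=, '{ $2 = 1; print }')

printf '%s\n%s\n' "$default_ofs" "$matching_ofs"
```

The first command prints `A 1 C` and the second prints `A,1,C`.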

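Now that the column values have been modified, all that remains is to print the modified row.

> }1

1 always evaluates to true, so awk performs the default action {print $0}.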
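The trailing 1 is simply a pattern with no action. As a quick check, the sketch below compares it with an explicit print and with a pattern that is always false (the two-line input is made up for the illustration):

```shell
# A true pattern with no action prints every record.
with_one=$(printf 'x\ny\n' | awk '1')

# Equivalent explicit form.
with_print=$(printf 'x\ny\n' | awk '{ print $0 }')

# A false pattern with no action prints nothing.
with_zero=$(printf 'x\ny\n' | awk '0')

printf '[%s]\n[%s]\n[%s]\n' "$with_one" "$with_print" "$with_zero"
```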

Update requested in the comments

Same question here. I have a second part of the program that adds up
the numbers in each row, i.e. you would get 6, 2, 4, 2, 2, 3 for this
output. Can your program be tweaked to produce these values in the same
step?

$ awk 'FNR==1{split($0,a)}{s=0;for(i=1;i<=NF;i++)if(a[i]==$i)s+=$i=1;print $0,s}' f
1 1 1 1 1 1 6
1 1 B B B B 2
C A 1 1 1 1 4
1 1 D E F A 2
1 A A A A 1 2
1 1 1 B B B 3
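For readability, the same logic can be written out over several lines. This is a sketch rather than a different method: the names ref and score and the temporary file are choices made here, and the data is the sample matrix from the question.

```shell
# Recreate the sample input in a temporary file.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
A B C D E F
A B B B B B
C A C D E F
A B D E F A
A A A A A F
A B C B B B
EOF

scored=$(awk '
FNR == 1 { split($0, ref) }          # remember the first row as the reference
{
    score = 0                        # matches found in the current row
    for (i = 1; i <= NF; i++) {
        if ($i == ref[i]) {          # equality test, not a regex match like gsub
            $i = 1                   # mark the matching column
            score++
        }
    }
    print $0, score                  # modified row followed by its match count
}' "$tmpfile")
printf '%s\n' "$scored"

rm -f "$tmpfile"
```

Because everything happens in a single pass over the file, the work grows with the number of cells in the matrix instead of requiring two awk runs and a file rewrite per column, as the original nested shell loops did.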

(Editor: 李大同)
