unix - Awk to compare All the files in a directory & display the frequency of occurrence


Suppose a directory has 3 files: file 1, file 2 and file 3, all with the same header names. Is it possible to compare them in awk and write out the frequency of occurrence of the differences?

    file 1
    c1  c2  c3  c4
    d     d   d     d   d     d

    file 2
    c1  c2  c3  c4
    d     d   v     d   d     d

    file 3
    c1  c2  c3  c4
    d   r   d   f     d   d     d

Step 1: compare file 1 and file 2.

    temp.output
    c1  c2  c3  c4
    0   0   0   0
    0   1   0   0
    0   0   0   0

Then compare file 2 and file 3 and overwrite the frequencies in temp.output:

    final.output
    c1  c2  c3  c4
    0   0   1   0
    0   2   0   0
    0   0   0   0

The original directory may contain multiple files, and I want each of them processed in order, i.e. file1.txt with file2.txt, then file2.txt with file3.txt.

Let me suggest that you convert the input files to lines. With this, you can apply awk easily.

The paste -s <file> command is your ally. Below you can see how to sort the files and, once sorted, convert them to lines:

    $ cat file1.txt
    c1  c2  c3  c4
    d     d   d     d   d     d
    $ ls
    file1.txt  file2.txt  file3.txt
    $ ls | sort
    file1.txt
    file2.txt
    file3.txt
    $ ls | sort | xargs -L 1 -I {} /bin/bash -c 'echo -n {}" "; paste -s {}'
    file1.txt c1  c2  c3  c4      d     d     d     d     d     d
    file2.txt c1  c2  c3  c4      d     d     v     d     d     d
    file3.txt c1  c2  c3  c4      d   r   d     f     d     d     d
    $
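As a variation on the pipeline above, the same flattening can be sketched with a plain shell loop, which avoids substituting file names into a bash -c string. The function name flatten_files, the *.txt glob and the demo data are assumptions made for this illustration. The shell expands the glob in sorted order, so no explicit sort is needed.

```shell
#!/bin/sh
# flatten_files DIR: print one line per *.txt file in DIR, in the form
# "<filename> <all lines of the file joined by tabs>".
# flatten_files is a hypothetical helper name, not part of the original answer.
flatten_files() {
    (
        cd "$1" || exit 1
        # The glob expands in sorted order (the collation depends on the locale).
        for f in *.txt; do
            [ -f "$f" ] || continue   # skip if the glob matched nothing
            printf '%s ' "$f"
            paste -s "$f"             # join the lines of the file into one line
        done
    )
}

# Made-up demo data in a temporary directory.
demo_dir=$(mktemp -d)
printf 'c1\tc2\n1\t2\n' > "$demo_dir/file1.txt"
printf 'c1\tc2\n1\t3\n' > "$demo_dir/file2.txt"
flatten_files "$demo_dir"
rm -r "$demo_dir"
```

The output lines have the same shape as those produced by the xargs pipeline, so they can be piped into awk in the same way.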

Once you have the lines, you can use awk to iterate over the fields (NF tells you how many there are). We will use several rules.

For every line, compare whether the field at position i is different from the previously saved value, and increment the result accordingly. The (NR != 1) selector skips the comparison for the first line.

    (NR != 1) { for (i = 1; i <= NF; i++) { if (last[i] != $i) { result[i]++; } } }

In the same awk call, include a rule that updates the array keeping the last values:

    { for (i = 1; i <= NF; i++) { last[i] = $i } }

Finally, print out the file name and the current status of the results:

    { printf("%s", $1); for (i = 1; i <= NF; i++) { printf(" %d", result[i]) } print "" }
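Taken together, the three rules can be tried on a small made-up input before running them on real files. This is only a sketch: the function name count_changes and the three sample lines are assumptions for illustration, while the awk program itself is the three rules described above.

```shell
#!/bin/sh
# count_changes: read flattened lines ("<name> <field> <field> ...") on stdin and
# print, per line, the name followed by the running change count of each field.
# count_changes is a hypothetical name used only for this illustration.
count_changes() {
    awk '
    # Rule 1: from the second line on, count fields that differ from the previous line.
    NR != 1 { for (i = 1; i <= NF; i++) if (last[i] != $i) result[i]++ }
    # Rule 2: remember the current fields for the next comparison.
    { for (i = 1; i <= NF; i++) last[i] = $i }
    # Rule 3: print the name and the accumulated counts (the first count is for the name).
    { printf("%s", $1); for (i = 1; i <= NF; i++) printf(" %d", result[i]); print "" }
    '
}

# Made-up sample: the second field changes twice, the third field once.
printf '%s\n' 'f1 a b' 'f2 x b' 'f3 y c' | count_changes
```

For this sample the function prints "f1 0 0 0", "f2 1 1 0" and "f3 2 2 1": the first count on each line belongs to the name field itself, which differs on every line.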

Here is the whole command:

    $ ls | sort | xargs -L 1 -I {} /bin/bash -c 'echo -n {}" "; paste -s {}' | awk '(NR != 1) { for (i = 1; i <= NF; i++) { if (last[i] != $i) { result[i]++; } } } { for (i = 1; i <= NF; i++) { last[i] = $i } } { printf("%s", $1); for (i = 1; i <= NF; i++) { printf(" %d", result[i]) } print "" }'
    file1.txt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    file2.txt 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
    file3.txt 2 0 0 0 0 0 0 1 0 0 2 0 0 0 0 0 0
    $

This output starts with the filename, followed by the accumulated differences in:

  • The filename (always different)
  • The column headers (c1, c2, c3 and c4 are always the same)
  • Then the 12 values. The relevant data starts at field 7.

You can format it again with awk, inserting new lines where appropriate:

    awk '{ print ""; printf("%s", $1); for (i = 7; i <= NF; i++) { if (((i - 7) % 4) == 0) print "" ; printf(" %d", $i) } print "" }'
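To see what the formatting step does on its own, it can be fed a single line of counts. The sketch below wraps the same logic in a function and lifts the two constants into awk variables. The names format_counts, first and width are assumptions for this illustration; the sample line is the file2.txt line from the output of the whole command.

```shell
#!/bin/sh
# format_counts: read the one-line-per-file counts on stdin and print each file name
# followed by its counts, "width" values per row, starting at field "first".
# format_counts, first and width are hypothetical names chosen for this sketch.
format_counts() {
    awk -v first=7 -v width=4 '
    {
        print ""                                    # blank line between files
        printf("%s", $1)                            # the file name
        for (i = first; i <= NF; i++) {
            if ((i - first) % width == 0) print ""  # start a new row
            printf(" %d", $i)
        }
        print ""
    }'
}

# Sample line copied from the output of the whole command.
printf '%s\n' 'file2.txt 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0' | format_counts
```

Fields 2 to 6 (the counts for the filename and the four headers) are skipped, and the remaining 12 counts are printed as three rows of four.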

Here you have a complete run:

    $ ls | sort | xargs -L 1 -I {} /bin/bash -c 'echo -n {}" "; paste -s {}' | awk '(NR != 1) { for (i = 1; i <= NF; i++) { if (last[i] != $i) { result[i]++; } } } { for (i = 1; i <= NF; i++) { last[i] = $i } } { printf("%s", $1); for (i = 1; i <= NF; i++) { printf(" %d", result[i]) } print "" }' | awk '{ print ""; printf("%s", $1); for (i = 7; i <= NF; i++) { if (((i - 7) % 4) == 0) print "" ; printf(" %d", $i) } print "" }'

    file1.txt
     0 0 0 0
     0 0 0 0
     0 0 0 0

    file2.txt
     0 0 0 0
     0 1 0 0
     0 0 0 0

    file3.txt
     0 0 1 0
     0 2 0 0
     0 0 0 0
    $
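If only the final table is needed, a different technique is possible: a single awk call that reads the files directly and compares each cell with the same cell of the previous file, without flattening. This is not the method used in the answer, just a minimal sketch. It assumes whitespace-separated cells, no empty cells and one header row per file; the name compare_files and the demo data are made up for the example.

```shell
#!/bin/sh
# compare_files FILE...: compare each file with the one before it, cell by cell,
# and print how often every cell changed. Hypothetical helper, not from the answer.
# Assumes whitespace-separated cells, no empty cells, one header row per file.
compare_files() {
    awk '
    FNR == 1 { if (nfiles++ == 0) header = $0; next }   # keep the first header, skip the rest
    {
        for (i = 1; i <= NF; i++) {
            # From the second file on, count cells that differ from the previous file.
            if (nfiles > 1 && prev[FNR, i] != $i) count[FNR, i]++
            prev[FNR, i] = $i
        }
        if (FNR > rows) rows = FNR
        if (NF > cols) cols = NF
    }
    END {
        print header
        for (r = 2; r <= rows; r++) {
            for (c = 1; c <= cols; c++) printf("%s%d", (c > 1 ? " " : ""), count[r, c])
            print ""
        }
    }
    ' "$@"
}

# Made-up demo data: three 3x4 tables that differ in two cells.
demo=$(mktemp -d)
printf 'c1 c2 c3 c4\na a a a\na a a a\na a a a\n' > "$demo/file1.txt"
printf 'c1 c2 c3 c4\na a a a\na b a a\na a a a\n' > "$demo/file2.txt"
printf 'c1 c2 c3 c4\na a c a\na d a a\na a a a\n' > "$demo/file3.txt"
compare_files "$demo"/file*.txt
rm -r "$demo"
```

The files are compared in the order in which they are passed, so a sorted glob such as file*.txt gives the file1, file2, file3 sequence. For the demo data the function prints the header followed by "0 0 1 0", "0 2 0 0" and "0 0 0 0".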
