Suppose a directory has 3 files, file 1, file 2 and file 3, all with the same header names. Is it possible in awk to compare them and write the frequency of occurrence of differences?
file 1: c1 c2 c3 c4 d d d d d d
file 2: c1 c2 c3 c4 d d v d d d
file 3: c1 c2 c3 c4 d r d f d d d
Step 1: compare file 1 and file 2.

temp.output:
c1 c2 c3 c4 0 0 0 0 0 1 0 0 0 0 0 0

Step 2: compare file 2 and file 3, and overwrite temp.output with the updated frequencies.

final.output:
c1 c2 c3 c4 0 0 1 0 0 2 0 0 0 0 0 0
The original directory may contain many files, and I want each of them processed in order, i.e. file1.txt with file2.txt, then file2.txt with file3.txt.
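A minimal sketch of that pairing order, assuming a sort that supports `-V` (version sort, available in GNU coreutils). `list_pairs` is a hypothetical helper, not part of any tool: it prints each pair of consecutive files. Version sort keeps file2.txt ahead of file10.txt, which a plain lexical sort would not.

```shell
# Print each pair of consecutive file names, in version-sorted order.
# Assumes sort -V is available and file names contain no newlines.
list_pairs() {
  printf '%s\n' "$@" | sort -V | {
    prev=
    while IFS= read -r f; do
      # Skip the first name: it has no predecessor to pair with.
      if [ -n "$prev" ]; then
        printf '%s %s\n' "$prev" "$f"
      fi
      prev=$f
    done
  }
}

list_pairs file1.txt file3.txt file2.txt
# file1.txt file2.txt
# file2.txt file3.txt
```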
Let me suggest converting each input file into a single line. Once that is done, you can apply awk easily.
The paste -s <file> command does exactly that. Below you can see how to sort the files and convert each of them into a line:
$ cat file1.txt
c1 c2 c3 c4 d d d d d d
$ ls
file1.txt file2.txt file3.txt
$ ls | sort
file1.txt
file2.txt
file3.txt
$ ls | sort | xargs -I {} /bin/bash -c 'echo -n {}" "; paste -s {}'
file1.txt c1 c2 c3 c4 d d d d d d
file2.txt c1 c2 c3 c4 d d v d d d
file3.txt c1 c2 c3 c4 d r d f d d d
$
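For comparison, the same flattening can be done with a plain shell loop, without parsing ls or starting a bash per file. `flatten_files` is a hypothetical helper and the sample contents are made up; `-d ' '` makes paste join lines with spaces instead of tabs, which awk's default field splitting treats the same way. Note that the glob expands in lexical order, so file10.txt would come before file2.txt.

```shell
# Print each file as one line: its name, then all of its lines joined.
flatten_files() {
  for f in "$@"; do
    printf '%s ' "$f"
    paste -s -d ' ' "$f"   # -s: serialize the file into one line
  done
}

# Demo with hypothetical sample data in a temporary directory.
dir=$(mktemp -d)
printf 'c1 c2 c3 c4\nd d d d\n' > "$dir/file1.txt"
printf 'c1 c2 c3 c4\nd v d d\n' > "$dir/file2.txt"
(cd "$dir" && flatten_files file*.txt)
rm -r "$dir"
# file1.txt c1 c2 c3 c4 d d d d
# file2.txt c1 c2 c3 c4 d v d d
```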
Once the files are lines, you can use awk to iterate over the fields (NF tells you how many there are). Use several rules.
For every line, check whether field i differs from the previously saved value and, if so, increment the corresponding result. Skip the comparison on the first line with the (NR != 1)
selector.
(NR != 1) { for (i = 1; i <= NF; i++) { if (last[i] != $i) { result[i]++; } } }
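To see why the (NR != 1) selector is needed, here is a small sketch on made-up two-line input. `count_changes` is a hypothetical wrapper whose `skip_first` variable switches the guard on or off; it already includes the rule that saves the last values, described next, and prints the counts once at the end. Without the guard, line 1 is compared against an empty last[] and every non-empty field counts as a change.

```shell
# skip_first=1 keeps the (NR != 1) guard; skip_first=0 drops it.
count_changes() {
  awk -v skip_first="$1" '
    (!skip_first || NR != 1) {
      for (i = 1; i <= NF; i++) if (last[i] != $i) result[i]++
    }
    # Remember this line for the next comparison.
    { for (i = 1; i <= NF; i++) last[i] = $i; n = NF }
    END {
      for (i = 1; i <= n; i++) printf("%s%d", (i == 1 ? "" : " "), result[i])
      print ""
    }'
}

printf '%s\n' 'a b' 'a x' | count_changes 1   # 0 1
printf '%s\n' 'a b' 'a x' | count_changes 0   # 1 2
```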
In the same awk
call, include a rule that updates the array holding the last values:
{ for (i = 1; i <= NF; i++) { last[i] = $i } }
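The order of these two rules matters, because awk runs its rules top to bottom for each input line. This sketch (hypothetical variable and function names, made-up input) assembles the same rules in both orders: when the update runs first, last[] already equals the current line, so nothing is ever counted.

```shell
compare='(NR != 1) { for (i = 1; i <= NF; i++) if (last[i] != $i) result[i]++ }'
update='{ for (i = 1; i <= NF; i++) last[i] = $i; n = NF }'
report='END { for (i = 1; i <= n; i++) printf("%s%d", (i == 1 ? "" : " "), result[i]); print "" }'

# Join the given rules, one per line, into a single awk program.
run_rules() {
  awk "$(printf '%s\n' "$@")"
}

data='a b
a x
a y'
printf '%s\n' "$data" | run_rules "$compare" "$update" "$report"   # 0 2
printf '%s\n' "$data" | run_rules "$update" "$compare" "$report"   # 0 0
```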
Finally, print out the filename and the status of the results:
{ printf("%s", $1); for (i = 1; i <= NF; i++) { printf(" %d", result[i]) } print "" }
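If only the totals are wanted, as in the question's final.output, the printing can move into an END block instead. A sketch, assuming the flattened layout used here (filename, then the header names, then the data) and a hypothetical `header_fields` count, which would be 4 for c1 to c4; the demo input is made up.

```shell
# Print the header names from the last line, then one count per data field.
final_counts() {
  awk -v header_fields="$1" '
    (NR != 1) { for (i = 1; i <= NF; i++) if (last[i] != $i) result[i]++ }
    { for (i = 1; i <= NF; i++) last[i] = $i; n = NF }
    END {
      out = ""
      # Fields 2 .. header_fields+1 hold the header names.
      for (i = 2; i <= header_fields + 1; i++) out = out (i == 2 ? "" : " ") last[i]
      # The remaining fields hold the data; append their change counts.
      for (i = header_fields + 2; i <= n; i++) out = out " " (result[i] + 0)
      print out
    }'
}

printf '%s\n' 'f1 h1 h2 d d d' 'f2 h1 h2 d v d' 'f3 h1 h2 r d d' | final_counts 2
# h1 h2 1 2 0
```

With the real files this would be fed by the paste pipeline and called as `final_counts 4`.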
Here is the whole command:
$ ls | sort | xargs -I {} /bin/bash -c 'echo -n {}" "; paste -s {}' | awk '(NR != 1) { for (i = 1; i <= NF; i++) { if (last[i] != $i) { result[i]++; } } } { for (i = 1; i <= NF; i++) { last[i] = $i } } { printf("%s", $1); for (i = 1; i <= NF; i++) { printf(" %d", result[i]) } print "" }'
file1.txt 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
file2.txt 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
file3.txt 2 0 0 0 0 0 0 1 0 0 2 0 0 0 0 0 0
$
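The same pipeline can also be wrapped in a function that takes the file names as arguments, which avoids parsing ls and works with a glob or an explicit list. This is a sketch with a hypothetical function name and made-up sample files in a temporary directory; the awk program is the one above, spread over several lines.

```shell
count_changes_in_files() {
  for f in "$@"; do
    printf '%s ' "$f"
    paste -s -d ' ' "$f"   # one line per file: name, then its contents
  done | awk '
    (NR != 1) { for (i = 1; i <= NF; i++) if (last[i] != $i) result[i]++ }
    { for (i = 1; i <= NF; i++) last[i] = $i }
    { printf("%s", $1); for (i = 1; i <= NF; i++) printf(" %d", result[i]); print "" }'
}

dir=$(mktemp -d)
printf 'c1 c2\nd d\n' > "$dir/file1.txt"
printf 'c1 c2\nd v\n' > "$dir/file2.txt"
printf 'c1 c2\nr d\n' > "$dir/file3.txt"
(cd "$dir" && count_changes_in_files file1.txt file2.txt file3.txt)
rm -r "$dir"
# file1.txt 0 0 0 0 0
# file2.txt 1 0 0 0 1
# file3.txt 2 0 0 1 2
```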
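This output starts with the filename, followed by the accumulated differences in:
- the filename (always different)
- the columns (c1, c2, c3, c4 are the same in every file)
- then the 12 data values. Field 1 is the name, field 2 the filename count and fields 3 to 6 the column counts, so the relevant data starts at field 7.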
You can format it again with awk,
inserting new lines where appropriate:
awk '{ print ""; printf("%s", $1); for (i = 7; i <= NF; i++) { if (((i - 7) % 4) == 0) print ""; printf(" %d", $i) } print "" }'
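A variant of this formatting step, with the start field and the row width passed in as awk variables instead of the hard-coded 7 and 4. `regroup` is a hypothetical name; the sample line is the file3.txt line from the output of the whole command.

```shell
# regroup START WIDTH: print the name, then fields START..NF in rows of WIDTH.
regroup() {
  awk -v start="$1" -v width="$2" '{
    print ""
    printf("%s", $1)
    for (i = start; i <= NF; i++) {
      if ((i - start) % width == 0) print ""   # begin a new row
      printf(" %d", $i)
    }
    print ""
  }'
}

echo 'file3.txt 2 0 0 0 0 0 0 1 0 0 2 0 0 0 0 0 0' | regroup 7 4
# prints a blank line, then:
# file3.txt
#  0 0 1 0
#  0 2 0 0
#  0 0 0 0
```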
Here is the complete run:
$ ls | sort | xargs -I {} /bin/bash -c 'echo -n {}" "; paste -s {}' | awk '(NR != 1) { for (i = 1; i <= NF; i++) { if (last[i] != $i) { result[i]++; } } } { for (i = 1; i <= NF; i++) { last[i] = $i } } { printf("%s", $1); for (i = 1; i <= NF; i++) { printf(" %d", result[i]) } print "" }' | awk '{ print ""; printf("%s", $1); for (i = 7; i <= NF; i++) { if (((i - 7) % 4) == 0) print ""; printf(" %d", $i) } print "" }'

file1.txt
 0 0 0 0
 0 0 0 0
 0 0 0 0

file2.txt
 0 0 0 0
 0 1 0 0
 0 0 0 0

file3.txt
 0 0 1 0
 0 2 0 0
 0 0 0 0
$