we like it when it is windy: uniq: the -f, --skip-fields=N option and --group=both // removing chunks of samey same stuff from logfiles

Wednesday, 4 September 2019


Awkward log parsing question.
How do you remove big chunks of similar/identical content from a log
. . . but leave the first few and last few lines,
. . and show some information about the chunk that was removed?

Another question: can we do a grep -v, but instead of silently dropping the matching lines, show something like "1094 lines not shown"?

e.g. grep -Ev "annoying message|other annoying message|other frequent annoying messages"
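As far as I know grep -v has no built-in option for this. Here is a minimal sketch of one way to fake it with awk, assuming a POSIX awk. `grep_v_summary` is a made-up name, not a real command. It takes the same extended regex you would give grep -Ev and replaces each run of hidden lines with a count.

```shell
#!/usr/bin/env bash
# grep_v_summary (hypothetical helper, not a real command): like grep -Ev PATTERN,
# but each run of hidden lines is replaced by a "[N lines not shown]" marker.
# Assumes PATTERN is an ERE that awk understands (same syntax as grep -E,
# except that awk processes backslash escapes in -v strings).
grep_v_summary() {
  local pattern=$1
  shift
  awk -v re="$pattern" '
    $0 ~ re { hidden++; next }          # matching line: count it, do not print it
    {
      if (hidden) {                     # report the run of lines just skipped
        print "[" hidden " lines not shown]"
        hidden = 0
      }
      print
    }
    END { if (hidden) print "[" hidden " lines not shown]" }
  ' "$@"                                # files as arguments, or stdin if none
}

# Usage:
# grep_v_summary "annoying message|other annoying message" logfile
```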

Similar question here . . .

https://unix.stackexchange.com/questions/244650/remove-partial-duplicates-consecutive-lines-but-keep-first-and-last/538960#538960

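uniq is (sort of) the perfect tool for this: by default uniq keeps/shows the first line of each set of repeated lines, but not the last.

uniq has a -f flag which lets you skip the first few fields when comparing lines. That is handy here, because the first field looks like a Unix timestamp and differs on every line.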

From man uniq:

   -f, --skip-fields=N
          avoid comparing the first N fields

   -s, --skip-chars=N
          avoid comparing the first N characters

   A field is a run of blanks (usually spaces and/or TABs), then non-blank characters.  Fields are skipped before chars.
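A tiny self-contained check of what -f 1 does. The sample lines are made up, standing in for a timestamped log: the first field differs on every line, but once it is skipped the first two lines compare equal, so uniq keeps only the first of them.

```shell
#!/usr/bin/env bash
# Made-up sample: field 1 is a "timestamp", the rest is the message.
printf '%s\n' \
  '100 disk ok' \
  '101 disk ok' \
  '102 disk FULL' |
  uniq -f 1
# prints:
# 100 disk ok
# 102 disk FULL
# Without -f 1 all three lines differ (different timestamps), so uniq prints all three.
```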

Example with uniq -c, which prefixes each line with its count, to see what uniq is doing:

-bash-4.2$ uniq -c -f 1 original_file
  1 1447790360      99999   99999   20.25   20.25   20.25   20.50
  9 1447790362      20.25   20.25   20.25   20.25   20.25   20.50
  1 1447790388      20.25   20.25   99999   99999   99999   99999
  1 1447790389      99999   99999   20.25   20.25   20.25   20.50
  1 1447790391      20.00   20.25   20.25   20.25   20.25   20.50
  3 1447790394      20.25   20.25   20.25   20.25   20.25   20.50

Not bad. Pretty close to what is wanted, and easy to do. But it is missing the last matching line in each group . . .
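If you only need the last line of each group instead of the first, a common trick (assuming GNU coreutils, which provides tac) is to reverse the input, let uniq keep the "first" line of each group, and reverse it back. It still only gives one end of each group, though. The sample input is made up.

```shell
#!/usr/bin/env bash
# Keep the LAST line of each run of lines that match after skipping field 1.
# tac reverses the line order, so uniq's "keep first" becomes "keep last";
# the second tac restores the original order.
printf '%s\n' '1 a' '2 b' '3 b' '4 b' '5 c' |
  tac | uniq -f 1 | tac
# prints:
# 1 a
# 4 b
# 5 c
```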

The grouping options in uniq are also interesting for this question . . .

   --group[=METHOD]
          show all items, separating groups with an empty line
          METHOD={separate(default),prepend,append,both}

   -D, --all-repeated[=METHOD]
          print all duplicate lines
          groups can be delimited with an empty line
          METHOD={none(default),prepend,separate}
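To see exactly where each METHOD puts the empty lines, here is a quick loop over a made-up three-line input. It assumes GNU coreutils uniq, since --group and --all-repeated are not in POSIX uniq; the sed only makes empty lines visible.

```shell
#!/usr/bin/env bash
# Made-up input: two groups once field 1 is skipped ("a" twice, then "b").
input=$(printf '%s\n' '1 a' '2 a' '3 b')

for m in separate prepend append both; do
  echo "== uniq -f 1 --group=$m"
  printf '%s\n' "$input" | uniq -f 1 --group="$m" | sed 's/^$/<empty>/'
done

# --all-repeated only prints groups with more than one line,
# so the lone "3 b" line is dropped here.
echo "== uniq -f 1 --all-repeated=separate"
printf '%s\n' "$input" | uniq -f 1 --all-repeated=separate | sed 's/^$/<empty>/'
```

separate puts an empty line only between groups, prepend adds one before the first group, append adds one after the last, and both does both, which is what the next example relies on.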

Example, uniq by group . . .

-bash-4.2$ uniq --group=both -f 1 original_file

1447790360      99999   99999   20.25   20.25   20.25   20.50

1447790362      20.25   20.25   20.25   20.25   20.25   20.50
1447790365      20.25   20.25   20.25   20.25   20.25   20.50
1447790368      20.25   20.25   20.25   20.25   20.25   20.50
1447790371      20.25   20.25   20.25   20.25   20.25   20.50
1447790374      20.25   20.25   20.25   20.25   20.25   20.50
1447790377      20.25   20.25   20.25   20.25   20.25   20.50
1447790380      20.25   20.25   20.25   20.25   20.25   20.50
1447790383      20.25   20.25   20.25   20.25   20.25   20.50
1447790386      20.25   20.25   20.25   20.25   20.25   20.50

1447790388      20.25   20.25   99999   99999   99999   99999

1447790389      99999   99999   20.25   20.25   20.25   20.50

1447790391      20.00   20.25   20.25   20.25   20.25   20.50

1447790394      20.25   20.25   20.25   20.25   20.25   20.50
1447790397      20.25   20.25   20.25   20.25   20.25   20.50
1447790400      20.25   20.25   20.25   20.25   20.25   20.50

Then grep for the line before and after every empty line, and strip out the blank lines plus the "--" separators that grep prints between non-adjacent context groups:

-bash-4.2$ uniq --group=both -f 1 original_file | grep -B1 -A1 '^$' | grep -Ev '^$|^--$'
1447790360      99999   99999   20.25   20.25   20.25   20.50
1447790362      20.25   20.25   20.25   20.25   20.25   20.50
1447790386      20.25   20.25   20.25   20.25   20.25   20.50
1447790388      20.25   20.25   99999   99999   99999   99999
1447790389      99999   99999   20.25   20.25   20.25   20.50
1447790391      20.00   20.25   20.25   20.25   20.25   20.50
1447790394      20.25   20.25   20.25   20.25   20.25   20.50
1447790400      20.25   20.25   20.25   20.25   20.25   20.50

Tah dahhh! Pretty good.
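The pipeline above keeps the first and last line of each group but doesn't say how much was removed, which was the other half of the original question. Here is a minimal sketch of one way to add that, swapping in awk instead of uniq. Assumptions: `collapse_runs` is a made-up name; the "same message" rule mimics uniq -f 1 (ignore the first blank-separated field); the summary line format is invented. Because it doesn't use empty lines as group markers, it also won't be thrown off by blank lines already in the log.

```shell
#!/usr/bin/env bash
# collapse_runs (hypothetical helper): collapse runs of lines that are identical
# apart from field 1, keeping the first and last line of each run and printing
# how many lines in between were dropped.
collapse_runs() {
  awk '
    function flush() {                   # print the run collected so far
      if (n == 0) return
      print first
      if (n > 2) print "    ... " (n - 2) " similar lines removed ..."
      if (n > 1) print last
    }
    {
      key = $0
      sub(/^[ \t]*[^ \t]+/, "", key)     # drop the first field, like uniq -f 1
      if (n > 0 && key == prev) { last = $0; n++; next }
      flush()                            # key changed: emit the previous run
      first = $0; last = $0; prev = key; n = 1
    }
    END { flush() }
  ' "$@"                                 # files as arguments, or stdin if none
}

# Usage:
# collapse_runs original_file
```

On the sample data above, the nine-line 1447790362 to 1447790386 group would come out as its first line, a "7 similar lines removed" note, and its last line.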
