How to remove big chunks in log with similar/same content
. . . but leave first few and last few lines,
. . show information on the chunk removed.
Another question, can we do grep -v but when we match and do not show the lines can we show something like "1094 lines not shown" ?
e.g. grep -Ev "annoying message|other annoying message|other frequent annoying messages"
Similar question here . . .
https://unix.stackexchange.com/questions/244650/remove-partial-duplicates-consecutive-lines-but-keep-first-and-last/538960#538960
uniq is (sort of) the perfect tool for this, by default in uniq you can keep/show the first but not last line in set.
uniq has a -f flag which allows you to skip the first few fields.
From man uniq:
-f, --skip-fields=N
avoid comparing the first N fields
-s, --skip-chars=N
avoid comparing the first N characters
A field is a run of blanks (usually spaces and/or TABs), then non-blank characters. Fields are skipped before chars.
Example with uniq -c to show count see what uniq is doing:
-bash-4.2$ uniq -c -f 1 original_file
1 1447790360 99999 99999 20.25 20.25 20.25 20.50
9 1447790362 20.25 20.25 20.25 20.25 20.25 20.50
1 1447790388 20.25 20.25 99999 99999 99999 99999
1 1447790389 99999 99999 20.25 20.25 20.25 20.50
1 1447790391 20.00 20.25 20.25 20.25 20.25 20.50
3 1447790394 20.25 20.25 20.25 20.25 20.25 20.50
Not bad. Pretty close to what is wanted. And easy to do. But missing the last matching line in group . . . .
The grouping options in uniq are also interesting for this question . . .
--group[=METHOD]
show all items, separating groups with an empty line METHOD={separate(default),prepend,append,both}
-D, --all-repeated[=METHOD]
print all duplicate lines groups can be delimited with an empty line METHOD={none(default),prepend,separate}
Example, uniq by group . . .
-bash-4.2$ uniq --group=both -f 1 original_file
1447790360 99999 99999 20.25 20.25 20.25 20.50
1447790362 20.25 20.25 20.25 20.25 20.25 20.50
1447790365 20.25 20.25 20.25 20.25 20.25 20.50
1447790368 20.25 20.25 20.25 20.25 20.25 20.50
1447790371 20.25 20.25 20.25 20.25 20.25 20.50
1447790374 20.25 20.25 20.25 20.25 20.25 20.50
1447790377 20.25 20.25 20.25 20.25 20.25 20.50
1447790380 20.25 20.25 20.25 20.25 20.25 20.50
1447790383 20.25 20.25 20.25 20.25 20.25 20.50
1447790386 20.25 20.25 20.25 20.25 20.25 20.50
1447790388 20.25 20.25 99999 99999 99999 99999
1447790389 99999 99999 20.25 20.25 20.25 20.50
1447790391 20.00 20.25 20.25 20.25 20.25 20.50
1447790394 20.25 20.25 20.25 20.25 20.25 20.50
1447790397 20.25 20.25 20.25 20.25 20.25 20.50
1447790400 20.25 20.25 20.25 20.25 20.25 20.50
Then grep for line before and after every empty line and strip blank lines:
-bash-4.2$ uniq --group=both -f 1 original_file |grep -B1 -A1 ^$ |grep -Ev "^$|^--$"
1447790360 99999 99999 20.25 20.25 20.25 20.50
1447790362 20.25 20.25 20.25 20.25 20.25 20.50
1447790386 20.25 20.25 20.25 20.25 20.25 20.50
1447790388 20.25 20.25 99999 99999 99999 99999
1447790389 99999 99999 20.25 20.25 20.25 20.50
1447790391 20.00 20.25 20.25 20.25 20.25 20.50
1447790394 20.25 20.25 20.25 20.25 20.25 20.50
1447790400 20.25 20.25 20.25 20.25 20.25 20.50
Tah dahhh! Pretty good.
No comments:
Post a Comment