This prevented a jenkins job from starting.
The cvs log failed when it hit a certain file.
$ cvs log dir/dir/file
cvs [server aborted]: unexpected '\x49' reading revision number in RCS file cvsroot/module/dir/dir/file,v
On the cvs server the file had cvs version/log details at start and latest text of file after but a section of binary junk in the middle.
So to check for corruption in other cvs files . . .
https://stackoverflow.com/questions/3001177/how-do-i-grep-for-all-non-ascii-characters-in-unix
Searching for non-ASCII chars just involves eliminating any char > 0x80.
Searching for non-printable chars.is more useful.
Useful grep:
grep -c -P -n "[\x00-\x08\x0E-\x1F\x80-\xFF]" *
breakdown:
\x00-\x08 - non-printable control chars 0 - 7 decimal
\x0E-\x1F - more non-printable control chars 14 - 31 decimal
\x80-1xFF - non-printable chars > 128 decimal
-c - print count of matching lines instead of lines
-P - perl style regexps
Instead of -c you may prefer to use -n (and optionally -b) or -l
-n, --line-number
-b, --byte-offset
-l, --files-with-matches
E.g. practical example of use find to grep all files under current directory:
find . -type f -exec grep -c -P -n "[\x00-\x08\x0E-\x1F\x80-\xFF]" {} +
You may wish to adjust the grep at times. e.g. BS(0x08 - backspace) char used in some printable files or to exclude VT(0x0B - vertical tab). The BEL(0x07) and ESC(0x1B) chars can also be deemed printable in some cases.
Non-Printable ASCII Chars
** marks PRINTABLE but CONTROL chars that is useful to exclude sometimes
Dec Hex Ctrl Char description Dec Hex Ctrl Char description
0 00 ^@ NULL 16 10 ^P DATA LINK ESCAPE (DLE)
1 01 ^A START OF HEADING (SOH) 17 11 ^Q DEVICE CONTROL 1 (DC1)
2 02 ^B START OF TEXT (STX) 18 12 ^R DEVICE CONTROL 2 (DC2)
3 03 ^C END OF TEXT (ETX) 19 13 ^S DEVICE CONTROL 3 (DC3)
4 04 ^D END OF TRANSMISSION (EOT) 20 14 ^T DEVICE CONTROL 4 (DC4)
5 05 ^E END OF QUERY (ENQ) 21 15 ^U NEGATIVE ACKNOWLEDGEMENT (NAK)
6 06 ^F ACKNOWLEDGE (ACK) 22 16 ^V SYNCHRONIZE (SYN)
7 07 ^G BEEP (BEL) 23 17 ^W END OF TRANSMISSION BLOCK (ETB)
8 08 ^H BACKSPACE (BS)** 24 18 ^X CANCEL (CAN)
9 09 ^I HORIZONTAL TAB (HT)** 25 19 ^Y END OF MEDIUM (EM)
10 0A ^J LINE FEED (LF)** 26 1A ^Z SUBSTITUTE (SUB)
11 0B ^K VERTICAL TAB (VT)** 27 1B ^[ ESCAPE (ESC)
12 0C ^L FF (FORM FEED)** 28 1C ^\ FILE SEPARATOR (FS) RIGHT ARROW
13 0D ^M CR (CARRIAGE RETURN)** 29 1D ^] GROUP SEPARATOR (GS) LEFT ARROW
14 0E ^N SO (SHIFT OUT) 30 1E ^^ RECORD SEPARATOR (RS) UP ARROW
15 0F ^O SI (SHIFT IN) 31 1F ^_ UNIT SEPARATOR (US) DOWN
There was just one corrupt file on cvs server.
Remove file from local workarea, remove file from cvs, remove Attic file on server, move file back in workarea, cvs add and commit. Log and version history and tags are lost, just have latest version of file.