Wednesday, 9 November 2016

Use the --foreground option with timeout command and docker exec - prevents hang

TLDR; 1. the timeout command is a very useful command in scripts to make sure they don't get stuck on a hanging command

TLDR; 2. Using the --foreground option in timeout command which calls docker exec or other commands might prevent the command hanging.

e.g. timeout --foreground 5 docker exec -i su - -c ' '


https://github.com/moby/moby/issues/28207



docker exec hang when run using timeout command (and docker v1.12.2) #28207

 Open
Gaoithe opened this issue on Nov 9, 2016 · 11 comments

Comments

@Gaoithe
 
 

Gaoithe commented on Nov 9, 2016  

There are a few issues open around docker exec hanging but this one seems quite specific to running docker exec under the timeout command and to latest docker version 1.12.2 . . .
Have scripts which run docker exec like this:
timeout 1m docker exec -i $id sh -c "$command"
They had been working ok. After upgrade to docker 1.12.2 the scripts hang on those docker exec commands.
Interestingly, if I run the command on command-line (bash shell from ssh) then no hangs are seen.
If the command is run from inside a #!/bin/bash script then the hang happens all the time.

BUG REPORT INFORMATION

Description
docker exec command hangs when run in a bash script using timeout command and
same command using timeout doesn't hang from command-line
Steps to reproduce the issue:
  1. make a script like this called ugh.sh and run it:
#!/bin/bash
timeout 1m docker exec -it $contid sh -c "ls"
  1. chmod 755 ugh.sh; ./ugh.sh
NOTE that the very same command run in shell doesn't hang.



LATER:


 

pixelb commented on Nov 14, 2016

Have you tried the --foreground option to timeout? If docker is behaving as a monitor itself then this should timeout all children and be the least invasive way to run docker
@Gaoithe
 
 
Author

Gaoithe commented on Nov 14, 2016

Thank you @pixelb. That works. Interesting! Using timeout --foreground 1m docker exec . . . etc the commands now do not hang (when running in bash script from bash shell which is when we saw hang before.). The hang was not and is not seen when the timeout and docker exec was called interactively from shell. I realise now (slightly more) the significance of that.
That looks like a good workaround for our scripts.


https://stackpointer.io/unix/unix-linux-run-command-with-timeout/500/

USAGE:

$ timeout 1s sleep 10
$ echo $?
124

$ timeout 10s sleep 1
$ echo $?
0

$ timeout --help
Usage: timeout [OPTION] DURATION COMMAND [ARG]...
  or:  timeout [OPTION]
Start COMMAND, and kill it if still running after DURATION.

Mandatory arguments to long options are mandatory for short options too.
      --preserve-status
                 exit with the same status as COMMAND, even when the
                   command times out
      --foreground
                 when not running timeout directly from a shell prompt,
                   allow COMMAND to read from the TTY and get TTY signals;
                   in this mode, children of COMMAND will not be timed out
  -k, --kill-after=DURATION
                 also send a KILL signal if COMMAND is still running
                   this long after the initial signal was sent
  -s, --signal=SIGNAL
                 specify the signal to be sent on timeout;
                   SIGNAL may be a name like 'HUP' or a number;
                   see 'kill -l' for a list of signals
      --help     display this help and exit
      --version  output version information and exit

DURATION is a floating point number with an optional suffix:
's' for seconds (the default), 'm' for minutes, 'h' for hours or 'd' for days.

If the command times out, and --preserve-status is not set, then exit with
status 124.  Otherwise, exit with the status of COMMAND.  If no signal
is specified, send the TERM signal upon timeout.  The TERM signal kills
any process that does not block or catch that signal.  It may be necessary
to use the KILL (9) signal, since this signal cannot be caught, in which
case the exit status is 128+9 rather than 124.

GNU coreutils online help:
For complete documentation, run: info coreutils 'timeout invocation'



Thursday, 3 November 2016

rsync from android MTP . . freezy/hangy problems. s/rsync/crude script/

Samsung Galaxy Nexus.
3.8G of photos in internal storage.
Options are MTP "mounting" or PTP.
Ubuntu 14.xx

1.

First attempt copying using file manager . . . started out okay. Reported 3.8G will be copied over in 7 mins . . 15 mins . . later 1 hour and x mins. And it was getting stuck / hanging up (looked like at the same point).

Unplug/replug USB did not bring back MTP connection after hang. Needed to go to phone USB computer connection Settings and un-check and then re-check MTP to recover it.

2.

rsync
cd /run/user/1000/gvfs/
cd mtp* # e.g. mtp::host=%5Busb%3A001%2C017%5D/
cd "Internal storage/DCIM"
rsync -avzhP ./ ~/Pictures/201611FionnOldMobilePhone/DCIM/
see errors like this:

          1.59M 100%    1.67MB/s    0:00:00 (xfr#1834, to-chk=957/2800)
Camera/IMG_20130808_123846.jpg
          1.98M 100%    1.25MB/s    0:00:01 (xfr#1835, to-chk=956/2800)
rsync: read errors mapping "/run/user/1000/gvfs/mtp:host=%5Busb%3A001%2C017%5D/Internal storage/DCIM/Camera/IMG_20130808_123846.jpg": No data available (61)
Camera/IMG_20130808_123902.jpg
          1.43M 100%    1.57MB/s    0:00:00 (xfr#1836, to-chk=955/2800)
rsync: read errors mapping "/run/user/1000/gvfs/mtp:host=%5Busb%3A001%2C017%5D/Internal storage/DCIM/Camera/IMG_20130808_123902.jpg": No data available (61)
Camera/IMG_20130808_124320.jpg
          1.58M 100%    1.21MB/s    0:00:01 (xfr#1837, to-chk=954/2800)
rsync: read errors mapping "/run/user/1000/gvfs/mtp:host=%5Busb%3A001%2C017%5D/Internal storage/DCIM/Camera/IMG_20130808_124320.jpg": No data available (61)
Camera/IMG_20130808_124435.jpg
          1.54M 100%    2.00MB/s    0:00:00 (xfr#1838, to-chk=953/2800)
rsync: read errors mapping "/run/user/1000/gvfs/mtp:host=%5Busb%3A001%2C017%5D/Internal storage/DCIM/Camera/IMG_20130808_124435.jpg": No data available (61)
Camera/IMG_20130808_124530.jpg
          1.90M 100%    1.54MB/s    0:00:01 (xfr#1839, to-chk=952/2800)
rsync: read errors mapping "/run/user/1000/gvfs/mtp:host=%5Busb%3A001%2C017%5D/Internal storage/DCIM/Camera/IMG_20130808_124530.jpg": No data available (61)
Camera/IMG_20130808_124625.jpg
         32.77K   1%  126.98kB/s    0:00:17 
. . restarted . .
. . stuck again . . 

. . seemed to be getting stuck on the same files . . 
 
These errors seem to be encountered if rsyncing from files which get changed while sync happens.  I think android should not be changing them. I don't think MTP would expose corrupt disk problem ? Something like antivirus locking the files should not be problem. Anyway, MTP connection a bit flaky . . maybe something on the android side is locking files or MTP drivers are a bit funny.
 
3.
 
Tried rsync with --remove-source-files on whole and on file selections (e.g. Camera/IMG_201606*) still got stuck. Each time after hang needed to go into droid unselect MTP and select again to connect again. At least files to be checked/transferred are being reduced now by removing them after they are synced.



    rsync -avzhP --remove-source-files ./ ~/Pictures/201611FionnOldMobilePhone/DCIM/

4.
 
rsync was getting stuck on the same file all the time. cp seemed to have less of a hanging problem. more tolerant of read errors? So made a script to do copy and remove if copy successful.

    # Script using cp -p (and remove files as we go)
    DEST=/home/jamesc/Pictures/201611FionnOldMobilePhone/DCIM/Camera/
    cd /run/user/1000/gvfs/mtp*/Intern*/DCIM
    FILES=$(ls Camera/)
    for f in $FILES; do echo -n $f; cp -p Camera/$f $DEST/ && rm Camera/$f && echo success; done

This was run in a while true loop periodically recovering MTP connection when it hung.
 
      while true; do ~/bin/sync_flaky_mtp.sh; sleep 1; done

The script saw some funny errors e.g. :
 
    IMG_20130814_113108.jpgsuccess
    IMG_20130815_145354.jpgsuccess
    IMG_20130815_145455.jpgcp: error reading ‘Camera/IMG_20130815_145455.jpg’: Input/output error
    cp: failed to extend ‘/home/jamesc/Pictures/201611FionnOldMobilePhone/DCIM/Camera/IMG_20130815_145455.jpg’: Input/output error
    IMG_20130815_145500.jpgsuccess
    IMG_20130815_151552.jpgsuccess
    . . .
    IMG_20130824_095526.jpgsuccess
    IMG_20130824_095537.jpg

And it would hang . . . periodically recovered MTP connection as already described.
 
5.
 
The script evolved . . . 

    Binary files Camera/IMG_20160104_095626.jpg and /home/jamesc/Pictures/201611FionnOldMobilePhone/DCIM/Camera//IMG_20160104_095626.jpg differ
    IMG_20160104_100128.jpgalready
    RM
    IMG_20160104_100157.jpgalready

       #!/bin/bash
DEST=/home/jamesc/Pictures/201611FionnOldMobilePhone/DCIM/Camera/
cd /run/user/1000/gvfs/mtp*/Intern*/DCIM
FILES=$(ls Camera/)
#echo FILES=$FILES
ls Camera/ |wc -l
for f in $FILES; do
   echo -n $f;
   if [[ ! -e $DEST/$f ]]; then
      # the timeout doesn't work on frozen mtp file
             # we run this full script in a while true; do ~/bin/sync_flaky_mtp.sh; sleep 1; done loop
             # and periodically visit, if stuck unlock phone, deselect MTP and select it again
             # (to interrupt and restart MTP connection)
             # simply unplug & replug doesn't bring MTP connection back ok
      timeout 20s cp -p Camera/$f $DEST/ && rm Camera/$f && echo success;
   else
      echo already done by earlier script which did not tidy up;
      diff -u Camera/$f $DEST/$f
      if [[ $? == 0 ]]; then
         rm Camera/$f && echo RM
      else
         # doing this copy on the weird files causes nasty freezy freezes.
         #cp -p Camera/$f $DEST/$f && rm Camera/$f && echo success;
                # ALOT of files found to be a few bytes different.
                # another copy caused hanging . . . so if dest file kindof looks ok we just remove source
         identify $DEST/$f && rm Camera/$f && echo RM
      fi
   fi;
done

beep

    # a few bytes in size different
    ls -al Camera/IMG_20150117_171621.jpg $DEST/Camera/IMG_20150117_171621.jpg
      -rw------- 1 jamesc jamesc 1277901 Jan 17  2015 Camera/IMG_20150117_171621.jpg
      -rwxr----- 1 jamesc jamesc 1275306 Nov  3 01:09 $DEST/Camera/IMG_20150117_171621.jpg
 
    # copy again ? results in same size difference, date/time fixed up though
    cp -p Camera/IMG_20150117_171621.jpg $DEST/Camera/IMG_20150117_171621.jpg
    ls -al Camera/IMG_20150117_171621.jpg $DEST/Camera/IMG_20150117_171621.jpg
      -rw------- 1 jamesc jamesc 1277901 Jan 17  2015 Camera/IMG_20150117_171621.jpg
      -rw------- 1 jamesc jamesc 1275306 Jan 17  2015 $DEST/Camera/IMG_20150117_171621.jpg
    diff -u Camera/IMG_20150117_171621.jpg $DEST/Camera/IMG_20150117_171621.jpg
      Binary files Camera/IMG_20150117_171621.jpg and $DEST/Camera/IMG_20150117_171621.jpg differ

So EVENTUALLY by removing files we had already copied we had less and less to do and got through all files. Phew. And yay.

    du -h $DEST/.. --max-depth=0
      3.6G    /home/jamesc/Pictures/201611FionnOldMobilePhone/DCIM/Camera//..


Various other reading . . .