
Docker for Mac hangs constantly #1835

Closed

ryanfb opened this issue Jul 7, 2017 · 139 comments

Comments

@ryanfb

ryanfb commented Jul 7, 2017

Expected behavior

Docker for Mac doesn't hang.

Actual behavior

Docker for Mac hangs.

Information

  • Full output of the diagnostics from "Diagnose & Feedback" in the menu
Docker for Mac: version: 17.06.0-ce-mac17 (4cdec4294a50b2233146b09469b49937dabdebdd)
macOS: version 10.11.6 (build: 15G1421)
logs: /tmp/BED37CCE-B2F9-41B9-B9E6-72EFEBE30091/20170707-134513.tar.gz
failure: docker ps failed: (Failure "docker ps: timeout after 10.00s")
[OK]     db.git
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     db
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[ERROR]  docker-cli
         docker ps failed
[OK]     menubar
[OK]     disk

Diagnostic ID: BED37CCE-B2F9-41B9-B9E6-72EFEBE30091

Steps to reproduce the behavior

  1. Clone https://github.com/dcthree/dclp-docker
  2. Run docker-compose up --force-recreate
  3. Get output:
Creating network "dockercompose_default" with the default driver
Creating volume "dockercompose_repo" with default driver
Creating volume "dockercompose_maven" with default driver
Creating dockercompose_fuseki_1 ...
Creating dockercompose_xsugar_1 ...
Creating dockercompose_repo_clone_1 ...
Creating dockercompose_repo_clone_1
Creating dockercompose_xsugar_1
Creating dockercompose_xsugar_1 ... done
Creating dockercompose_navigator_1 ...
Creating dockercompose_sosol_1 ...
Creating dockercompose_navigator_1
Creating dockercompose_sosol_1 ... done
ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
  4. Docker for Mac hangs until I quit/restart it. I can't run e.g. docker ps or any other command that interacts with Docker for Mac.

I've tried using docker system prune, using "Reset" in the Docker for Mac GUI, increasing RAM/CPU allocations (currently 16GB/6 CPU), and upgrading Docker for Mac to edge. I still encounter this problem multiple times per day, and now regularly/reliably when trying to start this Compose file.
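
For reference, the timeout in that Compose error can be raised via an environment variable before rerunning the command - a workaround sketch only, since it masks rather than fixes the underlying hang:

$ export COMPOSE_HTTP_TIMEOUT=300   # seconds; the default is 60
$ docker-compose up --force-recreate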

@simonbh

simonbh commented Jul 21, 2017

I am also having this same issue. Diag 11EED9F3-181B-41BC-A99D-BEF7DDC1580E

@bcully

bcully commented Aug 9, 2017

Same issue. I have 4 CPUs/6GB RAM allocated.

@ryanfb
Author

ryanfb commented Aug 25, 2017

Still seeing this non-stop. It's sufficient to run docker-compose up indexer after cloning/updating https://github.com/dcthree/dclp-docker to make the crash happen (since there are some unversioned config files for some of the other services).

@mickaelperrin

Sadly, I can confirm that for the past few weeks Docker for Mac has been far less stable than it used to be.

It randomly crashes, and only a Mac reboot brings the service back. Manually restarting the service doesn't work.

@westover

westover commented Oct 2, 2017

Also having this issue. I have noticed that starting a new Diagnose process tends to unblock things, though I am using a custom container that I build myself. Restarting the app does help, but you have to kill both the hyperkit process and the qcow-tool process. A53027AC-01D5-42AC-BBEB-2B7C58218846
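
A rough sketch of that manual cleanup, assuming the default process names of a standard Docker for Mac install:

$ pgrep -fl hyperkit          # confirm the VM process is still around
$ pkill -f com.docker.hyperkit
$ pkill -f qcow-tool

then relaunch Docker.app.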

@sudhagarc

Installed Docker on a new MBP, and after the Mac woke from sleep I noticed this issue. I had not noticed this on my previous MBP.

  1. Docker restart hung.
  2. Activity Monitor showed hyperkit taking about 100% of CPU.
  3. Had to force quit both Docker and hyperkit to recover the situation.
Docker for Mac: version: 17.09.0-ce-mac35 (69202b202f497d4b6e627c3370781b9e4b51ec78)
macOS: version 10.12.6 (build: 16G1036)
logs: /tmp/D06DD1FC-D953-476E-9381-80A47EB055D7/20171207-115540.tar.gz
failure: docker ps failed: (Failure "docker ps: timeout after 10.00s")
[OK]     db.git
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     db
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[ERROR]  docker-cli
         docker ps failed
[OK]     menubar
[OK]     disk

Diagnostic ID: D06DD1FC-D953-476E-9381-80A47EB055D7

@akimd
Contributor

akimd commented Jan 18, 2018

Please, try a more recent version of Docker for Mac.

@akimd akimd closed this as completed Jan 18, 2018
@ryanfb
Author

ryanfb commented Jan 19, 2018

@akimd I just tried again with the latest edge, 18.01.0-ce-mac48 (220004). The exact same thing still happens.

Docker for Mac: version: 18.01.0-ce-mac48 (d1778b704353fa5b79142a2055a2c11c8b48a653)
macOS: version 10.12.6 (build: 16G1114)
logs: /tmp/230E6503-3092-4DE1-BC76-47C03F92A4D5/20180119-121833.tar.gz
failure: docker ps failed: (Failure "docker ps: timeout after 10.00s")
[OK]     db.git
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     kubernetes
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     slirp
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[ERROR]  docker-cli
         docker ps failed
[OK]     menubar
[OK]     disk

Diagnostic ID: 230E6503-3092-4DE1-BC76-47C03F92A4D5

@bailaohe

bailaohe commented Feb 8, 2018

I had the same issue with Docker 18.02-ce. After running for a while, some subcommands such as docker rmi hung forever.

@brymon68

Having this same issue. Do we need to open another issue?

@akimd
Contributor

akimd commented Feb 21, 2018

Hi guys,

No, there's no need for another issue, thanks! We need to understand what is going on here and fix it. Currently we're busy preparing the next releases; we will be back to this issue as soon as possible.

Thanks for your help!

One question though: are you running qcow or raw images? What does

$ ls -l ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.*

give?

@akimd akimd reopened this Feb 21, 2018
@ryanfb
Author

ryanfb commented Feb 21, 2018

Thanks for reopening this issue!

For me, that command shows a Docker.qcow2 file. Should I try switching to raw images, and would that happen automatically if (and only if) I upgrade to High Sierra + APFS for my ~/Library volume?

@akimd
Contributor

akimd commented Feb 21, 2018

You don't need to update to raw. As a matter of fact I was asking because we found raw disks to be less reliable so far, and I was wondering if it could be related to your problems.

No, we don't migrate from qcow2 to raw, it's only when starting anew that raw might be chosen.

@iainbryson

+1 ... I'm having the same issue. Docker for Mac is a better experience than VirtualBox was (no intermediate VM and all), but this makes it torture. docker rmi has about a 50% chance of hanging, so when I run out of space it's an endless cycle of rmi, HANG, stop Docker, start Docker, rmi, rmi, HANG...

Of course, when it's hanging, diagnostics doesn't work. But this is what it says when everything's working:

Docker for Mac: version: 17.12.0-ce-mac55 (18467c0ae7afb7a736e304f991ccc1a61d67a4ab)
macOS: version 10.13.3 (build: 17D102)
logs: /tmp/A25F9E34-E0DC-43CA-A70F-CFD8467AF87C/20180228-081116.tar.gz
[OK]     vpnkit
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     app
[OK]     virtualization VT-X
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     kubernetes
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     moby-console
[OK]     osxfs
[OK]     logs
[OK]     docker-cli
[OK]     disk

Diagnostic ID: A25F9E34-E0DC-43CA-A70F-CFD8467AF87C

(Incidentally, rmi and rm do seem to be the main triggers; otherwise it's pretty solid.)

The ls command above gives:

iainbryson@Iains-MacBook-Pro (devel) $ ls -l ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.*
-rw-r--r--@ 1 iainbryson  staff  47927853056 Feb 28 08:15 /Users/iainbryson/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/Docker.qcow2

And this is an APFS volume, if that matters.

@ryanfb
Author

ryanfb commented Mar 1, 2018

Still seeing this in 18.03.0-ce-rc1-mac54 (23022).

Usually when this happens, trying to restart or quit-then-start Docker via the Docker for Mac GUI will hang in the "Docker is starting" state, and I have to force quit the com.docker.hyperkit process then open Docker for Mac to get Docker into a usable state again.

Looking around for other solutions, I came across docker/compose#3633, which points (at the end) to moby/moby#35933. This may not be the same issue as what I'm experiencing since people there report that rolling back to 17.09 fixes the issue for them, while I was already experiencing this problem in 17.06. I don't believe that I'm seeing this due to tty, resource, or network issues either.

@akimd
Contributor

akimd commented Mar 7, 2018

@ryanfb Thanks for the pointers, these issues are interesting.

I can't reproduce the behaviour. Could you submit a Diagnostic using the latest Edge (mac54 is good)?

@ryanfb
Author

ryanfb commented Mar 7, 2018

Diagnostic ID: 230E6503-3092-4DE1-BC76-47C03F92A4D5

I've been trying to dig deeper into this over the last couple of days - one thing I've been checking is what's going on inside Docker, by using screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty and watching the kernel log messages when this behavior occurs. The first time I did this I was seeing NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [containerd:906]. Googling took me to #1950, where I tried disabling trim as suggested, but I still kept getting hangs, of the form:

[ 1660.441275] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1660.441744]  2-...: (1 GPs behind) idle=467/140000000000000/0 softirq=35675/35675 fqs=12882
[ 1660.442492]  (detected by 0, t=60142 jiffies, g=24859, c=24858, q=266)
[ 1660.443055] Task dump for CPU 2:
[ 1660.443329] scsi_eh_0       R  running task        0   284      2 0x00000008
[ 1660.443976]  0000000000000000 ffffffff955754cd 0000000000000000 ffff9b719da95000
[ 1660.444684]  ffffaf78c091be00 ffff9b719db68000 ffffaf78c091be78 ffff9b719dda44c0
[ 1660.445518]  0000000000000246 ffffffff9557588c ffff9b7193df1218 ffff9b719d02a658
[ 1660.446342] Call Trace:
[ 1660.446548]  [<ffffffff955754cd>] ? ata_scsi_port_error_handler+0x228/0x544
[ 1660.447211]  [<ffffffff9557588c>] ? ata_scsi_error+0xa3/0xdb
[ 1660.447678]  [<ffffffff9554526d>] ? scsi_error_handler+0xaf/0x472
[ 1660.448198]  [<ffffffff950fe5b3>] ? finish_task_switch+0x115/0x18b
[ 1660.448813]  [<ffffffff957f53e3>] ? __schedule+0x36c/0x465
[ 1660.449383]  [<ffffffff955451be>] ? scsi_eh_get_sense+0xdd/0xdd
[ 1660.449974]  [<ffffffff950f7b56>] ? kthread+0xb4/0xbc
[ 1660.450446]  [<ffffffff950f7aa2>] ? init_completion+0x1d/0x1d
[ 1660.450937]  [<ffffffff957f8261>] ? ret_from_fork+0x41/0x50

Since then I've done a few factory resets - after the first of these, I could no longer disable trim per the instructions in this comment, since ~/Library/Containers/com.docker.docker/Data/database/ no longer existed. When Docker hangs I'm still seeing INFO: rcu_sched detected stalls on CPUs/tasks with a similar backtrace, and/or e.g. NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [containerd:906], as in this screenshot:

[screenshot: kernel log showing the soft lockup and rcu_sched stall messages]
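
In case anyone else wants to watch the VM console the same way, the session can be attached and detached like this (a sketch; Ctrl-a d detaches the screen session and screen -r reattaches it):

$ screen ~/Library/Containers/com.docker.docker/Data/com.docker.driver.amd64-linux/tty
# kernel messages stream here; press Ctrl-a d to detach, then screen -r to return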

@febeling

febeling commented Mar 8, 2018

On my machine docker --version reliably takes about 30s.

$ time docker --version                                                                                          
Docker version 17.12.0-ce, build c97c6d6
docker --version  0.01s user 0.01s system 0% cpu 30.035 total

@eexit

eexit commented Mar 9, 2018

Hello,

I'm also having many issues with the latest Docker version. It takes about 2 minutes to start, although only one small dnsmasq service is configured to start along with Docker.

When the Mac wakes up, docker-compose times out when stopping or restarting a stack. The only "fix" is to restart Docker, which is another 3-minute wait...

@ryanfb
Author

ryanfb commented Mar 9, 2018

Almost by accident, I think I've discovered a workaround for my particular case. I decided to try running the same docker-compose setup on my (relatively under-resourced) MacBook Pro instead of my iMac, to see what happened with the latest Docker edge (previously I had encountered the same behavior on both machines). Since the internal (SSD) drive on the MBP was running out of space, I decided to try moving the Docker disk image location (the Docker.qcow2 file) to an external USB drive, using the Docker for Mac UI. Miraculously, I was able to do everything I needed to do without Docker crashing or becoming completely unresponsive.

This gave me the idea to try the same thing on my iMac today - and after moving the Docker disk image off the internal 3TB Fusion Drive to an external USB drive, I seem to be able to do everything I need to do without having Docker crash or become completely unresponsive.

All drives (internal and external) are formatted Mac OS Extended Journaled (case-insensitive), and the internal drives report a S.M.A.R.T. status of "Verified" with no other programs appearing to have issues using them.

Perhaps this is consistent with the ATA/SCSI errors causing a CPU/task stall in my logs above, though I'm not sure what the root cause or error is. The thing consistent across both machines is that the problematic drive for Docker is an internal SSD or Fusion Drive.

@rn

rn commented Mar 12, 2018

@ryanfb thanks for the logs above. I extracted the logs from the diagnostics and, for now, have just added them here for completeness, as there actually were some relevant error messages before the hung task messages:

[  446.797918] br-69957a06e2c6: port 4(veth8a39e2d) entered blocking state
[  446.798583] br-69957a06e2c6: port 4(veth8a39e2d) entered forwarding state
[ 1051.734506] ata1.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x6 frozen
[ 1051.735271] ata1.00: cmd 61/00:00:00:04:51/01:00:01:00:00/40 tag 0 ncq dma 131072 out
[ 1051.735271]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.736914] ata1.00: cmd 61/00:08:00:05:51/01:00:01:00:00/40 tag 1 ncq dma 131072 out
[ 1051.736914]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.738570] ata1.00: cmd 61/00:10:00:06:51/01:00:01:00:00/40 tag 2 ncq dma 131072 out
[ 1051.738570]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.740175] ata1.00: cmd 61/00:18:00:07:51/01:00:01:00:00/40 tag 3 ncq dma 131072 out
[ 1051.740175]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.741829] ata1.00: cmd 61/00:20:00:08:51/01:00:01:00:00/40 tag 4 ncq dma 131072 out
[ 1051.741829]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.743430] ata1.00: cmd 61/00:28:00:09:51/01:00:01:00:00/40 tag 5 ncq dma 131072 out
[ 1051.743430]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.745077] ata1.00: cmd 61/00:30:00:0a:51/01:00:01:00:00/40 tag 6 ncq dma 131072 out
[ 1051.745077]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.746778] ata1.00: cmd 61/00:38:00:0b:51/01:00:01:00:00/40 tag 7 ncq dma 131072 out
[ 1051.746778]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.748412] ata1.00: cmd 61/00:40:00:0c:51/01:00:01:00:00/40 tag 8 ncq dma 131072 out
[ 1051.748412]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.750128] ata1.00: cmd 61/00:48:00:0d:51/01:00:01:00:00/40 tag 9 ncq dma 131072 out
[ 1051.750128]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.751764] ata1.00: cmd 61/00:50:00:0e:51/01:00:01:00:00/40 tag 10 ncq dma 131072 out
[ 1051.751764]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.753277] ata1.00: cmd 61/00:58:00:0f:51/01:00:01:00:00/40 tag 11 ncq dma 131072 out
[ 1051.753277]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.755023] ata1.00: cmd 61/00:60:00:10:51/01:00:01:00:00/40 tag 12 ncq dma 131072 out
[ 1051.755023]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.756799] ata1.00: cmd 61/00:68:00:11:51/01:00:01:00:00/40 tag 13 ncq dma 131072 out
[ 1051.756799]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.758374] ata1.00: cmd 61/00:70:00:f3:50/01:00:01:00:00/40 tag 14 ncq dma 131072 out
[ 1051.758374]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.760058] ata1.00: cmd 61/00:78:00:f4:50/01:00:01:00:00/40 tag 15 ncq dma 131072 out
[ 1051.760058]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.761975] ata1.00: cmd 61/00:80:00:f5:50/01:00:01:00:00/40 tag 16 ncq dma 131072 out
[ 1051.761975]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.763722] ata1.00: cmd 61/00:88:00:f6:50/01:00:01:00:00/40 tag 17 ncq dma 131072 out
[ 1051.763722]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.765457] ata1.00: cmd 61/00:90:00:f7:50/01:00:01:00:00/40 tag 18 ncq dma 131072 out
[ 1051.765457]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.767301] ata1.00: cmd 61/00:98:00:f8:50/01:00:01:00:00/40 tag 19 ncq dma 131072 out
[ 1051.767301]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.769089] ata1.00: cmd 61/00:a0:00:f9:50/01:00:01:00:00/40 tag 20 ncq dma 131072 out
[ 1051.769089]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.770706] ata1.00: cmd 61/00:a8:00:fa:50/01:00:01:00:00/40 tag 21 ncq dma 131072 out
[ 1051.770706]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.772230] ata1.00: cmd 61/00:b0:00:fb:50/01:00:01:00:00/40 tag 22 ncq dma 131072 out
[ 1051.772230]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.773659] ata1.00: cmd 61/00:b8:00:fc:50/01:00:01:00:00/40 tag 23 ncq dma 131072 out
[ 1051.773659]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.775117] ata1.00: cmd 61/00:c0:00:fd:50/01:00:01:00:00/40 tag 24 ncq dma 131072 out
[ 1051.775117]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.776791] ata1.00: cmd 61/00:c8:00:fe:50/01:00:01:00:00/40 tag 25 ncq dma 131072 out
[ 1051.776791]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.778550] ata1.00: cmd 61/00:d0:00:ff:50/01:00:01:00:00/40 tag 26 ncq dma 131072 out
[ 1051.778550]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.780251] ata1.00: cmd 61/00:d8:00:00:51/01:00:01:00:00/40 tag 27 ncq dma 131072 out
[ 1051.780251]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.781892] ata1.00: cmd 61/00:e0:00:01:51/01:00:01:00:00/40 tag 28 ncq dma 131072 out
[ 1051.781892]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.783683] ata1.00: cmd 61/00:e8:00:02:51/01:00:01:00:00/40 tag 29 ncq dma 131072 out
[ 1051.783683]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.785207] ata1.00: cmd 61/00:f0:00:03:51/01:00:01:00:00/40 tag 30 ncq dma 131072 out
[ 1051.785207]          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 1051.786715] ata1: hard resetting link
[ 1119.329182] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1119.329707] 	2-...: (1 GPs behind) idle=467/140000000000000/0 softirq=35675/35675 fqs=1572
[ 1119.330499] 	(detected by 1, t=6003 jiffies, g=24859, c=24858, q=45)
[ 1119.331166] Task dump for CPU 2:
[ 1119.331517] scsi_eh_0       R  running task        0   284      2 0x00000008
[ 1119.332222]  0000000000000000 ffffffff955754cd 0000000000000000 ffff9b719da95000
[ 1119.333034]  ffffaf78c091be00 ffff9b719db68000 ffffaf78c091be78 ffff9b719dda44c0
[ 1119.333808]  0000000000000246 ffffffff9557588c ffff9b7193df1218 ffff9b719d02a658
[ 1119.334685] Call Trace:
[ 1119.334915]  [<ffffffff955754cd>] ? ata_scsi_port_error_handler+0x228/0x544
[ 1119.335638]  [<ffffffff9557588c>] ? ata_scsi_error+0xa3/0xdb
[ 1119.336229]  [<ffffffff9554526d>] ? scsi_error_handler+0xaf/0x472
[ 1119.336844]  [<ffffffff950fe5b3>] ? finish_task_switch+0x115/0x18b
[ 1119.337453]  [<ffffffff957f53e3>] ? __schedule+0x36c/0x465
[ 1119.338027]  [<ffffffff955451be>] ? scsi_eh_get_sense+0xdd/0xdd
[ 1119.338591]  [<ffffffff950f7b56>] ? kthread+0xb4/0xbc
[ 1119.339057]  [<ffffffff950f7aa2>] ? init_completion+0x1d/0x1d
[ 1119.339734]  [<ffffffff957f8261>] ? ret_from_fork+0x41/0x50
[ 1300.536129] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1300.536665] 	2-...: (1 GPs behind) idle=467/140000000000000/0 softirq=35675/35675 fqs=8009
[ 1300.537446] 	(detected by 1, t=24133 jiffies, g=24859, c=24858, q=122)
[ 1300.538090] Task dump for CPU 2:
[ 1300.538404] scsi_eh_0       R  running task        0   284      2 0x00000008
[ 1300.539118]  0000000000000000 ffffffff955754cd 0000000000000000 ffff9b719da95000
[ 1300.540048]  ffffaf78c091be00 ffff9b719db68000 ffffaf78c091be78 ffff9b719dda44c0
[ 1300.540880]  0000000000000246 ffffffff9557588c ffff9b7193df1218 ffff9b719d02a658
[ 1300.541617] Call Trace:
[ 1300.541863]  [<ffffffff955754cd>] ? ata_scsi_port_error_handler+0x228/0x544
[ 1300.542524]  [<ffffffff9557588c>] ? ata_scsi_error+0xa3/0xdb
[ 1300.543058]  [<ffffffff9554526d>] ? scsi_error_handler+0xaf/0x472
[ 1300.543643]  [<ffffffff950fe5b3>] ? finish_task_switch+0x115/0x18b
[ 1300.544317]  [<ffffffff957f53e3>] ? __schedule+0x36c/0x465
[ 1300.544813]  [<ffffffff955451be>] ? scsi_eh_get_sense+0xdd/0xdd
[ 1300.545376]  [<ffffffff950f7b56>] ? kthread+0xb4/0xbc
[ 1300.545888]  [<ffffffff950f7aa2>] ? init_completion+0x1d/0x1d
[ 1300.546431]  [<ffffffff957f8261>] ? ret_from_fork+0x41/0x50

So this indicates some hang/timeout in the hyperkit block device layer (or further down in the qcow2 code). We had related issues in the past (see moby/hyperkit#94).

@ryanfb you mentioned that your MBP is relatively under-resourced and may have been close to running out of disk space, but you also said your iMac has a 3TB drive and you had the same issue there. Is the drive in the iMac also close to full?

I've not been able to repro this locally with the two cases you mentioned above.

@ryanfb
Author

ryanfb commented Mar 12, 2018

The 3TB drive in the iMac has/had about 400GB+ free when I was encountering this problem. In the Docker for Mac UI the disk image was sized to ~200GB with ~20GB used on disk. The MBP is under-resourced in terms of memory/CPU - only 8GB (vs. 32GB in the iMac) and a slower/older CPU.

@raliste

raliste commented Mar 27, 2018

+1 Couldn't even get a diagnostic.

@akimd
Contributor

akimd commented Mar 28, 2018

Can someone reproduce it with 18.03 (stable or edge) and post diagnostics please?

@rclarkburns

@akimd
Diagnostic ID: 3AE6953B-DEE7-4441-A4C9-11E944ECB248

Docker for Mac: version: 18.03.0-ce-mac59 (dd2831d4b7421cf559a0881cc7a5fdebeb8c2b98)
macOS: version 10.12.6 (build: 16G1212)
logs: /tmp/3AE6953B-DEE7-4441-A4C9-11E944ECB248/20180328-122732.tar.gz
failure: docker ps failed: (Failure "docker ps: timeout after 10.00s")
[OK]     vpnkit
[OK]     vmnetd
[OK]     dns
[OK]     driver.amd64-linux
[OK]     virtualization VT-X
[OK]     app
[OK]     moby
[OK]     system
[OK]     moby-syslog
[OK]     kubernetes
[OK]     files
[OK]     env
[OK]     virtualization kern.hv_support
[OK]     osxfs
[OK]     moby-console
[OK]     logs
[ERROR]  docker-cli
         docker ps failed
[OK]     disk

@ptadros

ptadros commented Sep 1, 2020

I confirm I have the same issue as @dhinojosa on the same version above (the latest version currently). It seems related to Kubernetes, as kubectl cluster-info dump returns an internal error:

Error from server (InternalError): an error on the server ("") has prevented the request from succeeding

I also reset the Kubernetes cluster settings, but the issue remains.
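
For anyone hitting the same wall, a quick way to check whether it's kubectl or the apiserver connection that stalls (a sketch; docker-desktop is the context name a default Docker Desktop install creates):

$ kubectl config use-context docker-desktop
$ kubectl get nodes -v=6    # -v=6 logs each HTTP round trip, showing where the request hangs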

@ptadros

ptadros commented Sep 1, 2020

There is a magic solution here: #4239 (comment), until it gets fixed properly.

@dhinojosa

dhinojosa commented Sep 30, 2020

BTW, the latest version works like a dream. Version 2.3.0.5 (48029). I no longer have the issue and I can start and stop Kubernetes with ease.


@docker docker deleted a comment from daveisfera Sep 30, 2020
@fdutey

fdutey commented Dec 15, 2020

Updated to 3.0.1 this morning and now everything is completely stuck. Had to reset to factory settings and rebuild all my images just to be able to run one. After a very short while, I cannot run commands and they hang.
And every time, the only solution is to restart the whole computer, since "Restart Docker" is not working. Even quitting Docker and relaunching is not working.

@pcjmfranken

pcjmfranken commented Feb 24, 2021

We've been experiencing these types of freezes at an increasing rate. It's gotten so bad that it almost feels like we're running some unstable nightly build!

Suggested things tried, unfortunately to no effect:

  • Fresh Docker install after a complete and thorough wipe
  • Disabled Kubernetes
  • Created some new containers using different base images, built without cache, ran without bind-mounts, volumes, no port mappings, etc.
  • Disabled the experimental gRPC FUSE file sharing and limited the volumes available for bind-mounting to the bare minimum
  • Disabled all code editor plugins/extensions that hook into Docker

Some observations:

  • These freezes are quite common across a mix of recent and well-specced MBPs and Mac Minis (no M1s) used by different devs from different companies, on both Catalina and Big Sur.
  • Plenty of system resources are available to Docker, but there doesn't seem to be any sort of spike in resource demand when these hangs occur anyway.
  • Command complexity in and of itself does not seem to be a factor, as these freezes trigger even for something as silly as running ls in an empty, non-mounted directory during an interactive session.

Perhaps related:

  • Even outside of these freezes, the GUI/Dashboard more often than not takes several seconds to respond to user input, or even just to appear when opened through the menu. Just now it took over 5 seconds for the About window to finally appear.
  • The GUI/Dashboard doesn't always seem to pick up on lifecycle events that were triggered through other interfaces such as the command line. Due to this, its container and image lists almost never display correct information.

What can we do to help troubleshoot this?

@stephen-turner
Contributor

@pcjmfranken We are not getting widespread reports of this, so I would suspect something specific to your environment. Maybe some corporate security software is interrupting the network path for example?

@pcjmfranken

@stephen-turner I'm genuinely happy to know that it seems to be working just fine for most!

Someone from ops will be helping us investigate early next week. Findings will be reported back here.

@stephen-turner
Contributor

Thanks, I look forward to hearing the results.

@samuelhalff

samuelhalff commented Apr 15, 2021

@pcjmfranken any updates? It seems I encountered the same issues and reverted to v2.3. Now I'm kind of scared to upgrade again.

@wlrd

wlrd commented Apr 29, 2021

@pcjmfranken any updates? I am running into the same issue.

@samuelhalff

@wlrd I haven't seen any updates here. I had reverted to 2.3; it worked overall but was often slow. I tried upgrading again and am running into similar issues again. Getting tired of this; I have to restart Docker several times a day.

@rfay

rfay commented May 1, 2021

A more recent related issue, specific to the Mac M1, happens quite regularly: #5590. That one has nothing to do with Kubernetes, though.

@pcjmfranken

Just discovered some of my mail-in replies were never actually posted!

Never managed to pinpoint any causes, but these problems persisted across all environments. The Docker dev setup has since been abandoned by the majority of these projects, as the additional overhead that it brought was becoming too costly.

@dpdornseifer

Same behaviour as @samuelhalff; reverted to v2.3 as well.

@bacheson

bacheson commented Jul 7, 2021

Same issue here on 3.5.1.

@rfay

rfay commented Jul 8, 2021

If you're having trouble on the Mac M1 and can't run docker run --rm busybox ls, then your issue is probably M1-specific: #5590

@samuelhalff

> Just discovered some of my mail-in replies were never actually posted!
>
> Never managed to pinpoint any causes, but these problems persisted across all environments. The Docker dev setup has since been abandoned by the majority of these projects, as the additional overhead that it brought was becoming too costly.

@pcjmfranken which container tech have you moved to?

@smailpouri

Same issues on 3.5.2, macOS Catalina.
The only solution is to restart Docker.

@pcjmfranken

@samuelhalff Good old VMs.

This issue persists in the projects that still use a Docker-based development environment.

@Caesurus

Caesurus commented Aug 31, 2021

Recently upgraded to this (on Mac):

✗ docker version
Client:
 Cloud integration: 1.0.17
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:55:20 2021
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:10 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

This is now a problem again, sigh. I can mount a volume and access it, but as soon as something changes on the mounted volume and I try to access anything on that mount in the container... it just hangs. I have to restart Docker completely to be able to do anything again.

EDIT: toggling Preferences -> General -> "Use gRPC FUSE for file sharing" actually fixed it, and I was able to use volume mounts normally again.

Hope that helps someone looking for answers.

@docker-robott
Collaborator

Issues go stale after 90 days of inactivity.
Mark the issue as fresh with /remove-lifecycle stale comment.
Stale issues will be closed after an additional 30 days of inactivity.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle stale

@b4ti1c

b4ti1c commented Jan 11, 2022

This issue needs to be reopened. Still experiencing the same issue on 4.1.0.

@docker-robott
Collaborator

Closed issues are locked after 30 days of inactivity.
This helps our team focus on active issues.

If you have found a problem that seems similar to this, please open a new issue.

Send feedback to Docker Community Slack channels #docker-for-mac or #docker-for-windows.
/lifecycle locked

@docker docker locked and limited conversation to collaborators Feb 10, 2022