Error removing intermediate container 20dbfe8b8d9d: Driver devicemapper failed to remove root filesystem #12923
Docker version on that test is:
We have not bumped it in a while (we have had it locked to
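For reference, a quick way to confirm the daemon version and storage driver on an affected host (standard docker CLI, nothing OpenShift-specific; the failure mode here is specific to the devicemapper graph driver):

```bash
# Confirm the daemon version and the storage driver in use on the host.
docker version
docker info 2>/dev/null | grep -i 'storage driver'
```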
@stevekuznetsov There is a new docker build coming that will add the ID of the device that cannot be deleted. Using that ID and a script from https://bugzilla.redhat.com/show_bug.cgi?id=1391665 we will be able to determine which application is keeping that device busy at the time of the attempted delete.
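For context, a minimal sketch of the kind of check such a script might perform -- the real script is attached to the bugzilla above, and the /proc scan below is only an assumption about its approach:

```bash
#!/bin/bash
# Sketch only: find processes whose mount namespace still references a
# devicemapper thin device. DEVICE_ID is the ID docker reports on the
# failed delete. The real script is attached to the bugzilla above.
DEVICE_ID="$1"
for pid_dir in /proc/[0-9]*; do
    pid="${pid_dir#/proc/}"
    # mountinfo lists every mount visible to the process; a hit here
    # means this process is what keeps the device busy.
    if grep -q "$DEVICE_ID" "$pid_dir/mountinfo" 2>/dev/null; then
        printf 'pid %s (%s) still references %s\n' \
            "$pid" "$(cat "$pid_dir/comm" 2>/dev/null)" "$DEVICE_ID"
    fi
done
```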
@jwhonce these failures happen on ephemeral test VMs, so it will be impossible for someone to come in and run the script. Are you suggesting that we need to create an exit check for the
We actually got our single-node cluster (ha!) stuck with this issue a couple of days ago (probably due to some upgrade, or sheer coincidence), and we are not able to run some builds. Any help on how to debug this issue is more than welcome. I've tried running the script after the fact, but I always get "No Pid" as output.
A bit of differential analysis between a build "type" that succeeds and one that fails, in case it is of any help:
It could be that nodejs, npm, or webpack fails to release some kind of resource, which in turn causes issues for docker or openshift.
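One way to test that theory, assuming the stock devicemapper tooling is present on the host: docker's thin devices are visible to dmsetup, and a nonzero open count left behind after a build would point at a leaked reference.

```bash
# List docker's thin devices that still have a nonzero open count.
# Device names look like docker-<major>:<minor>-<inode>-<device id>.
dmsetup info -c -o name,open 2>/dev/null | awk 'NR > 1 && $2 > 0 && /^docker-/'
```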
@nanomad You are able to reproduce the failure consistently with the build that uses
@stevekuznetsov We can even go a step further and provide SSH access to RH / OpenShift developers if needed. The machine is not a PROD environment to begin with. If you would like access, just send me a message on GitHub. The installation details are the following:
Cool. @jwhonce would be interested in SSH access, I believe. Thank you for your help. Having a reproducer is huge -- I don't think we had one before.
@stevekuznetsov The more failures we can capture, the faster we can zero in on the root cause. If it's possible to automate the capture, even better! Thanks.
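A rough sketch of what automated capture could look like on those VMs (hypothetical: the journald unit name and the exact error text would need checking against the actual hosts):

```bash
#!/bin/bash
# Hypothetical capture loop: watch the docker journal for the failed
# devicemapper remove and snapshot diagnostics before the VM is recycled.
journalctl -u docker.service -f -o cat |
while read -r line; do
    [[ "$line" == *"failed to remove root filesystem"* ]] || continue
    out="/var/log/devmapper-capture-$(date +%Y%m%dT%H%M%S)"
    mkdir -p "$out"
    echo "$line" > "$out/trigger.log"
    dmsetup info > "$out/dmsetup.txt" 2>&1
    # Copy each process's mountinfo for offline analysis
    # (cat, not tar: /proc files stat as size 0).
    for m in /proc/[0-9]*/mountinfo; do
        pid="${m#/proc/}"
        cat "$m" > "$out/mountinfo.${pid%/mountinfo}" 2>/dev/null
    done
done
```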
@stevekuznetsov, @jwhonce: you should receive an email from me shortly.
@jwhonce - Should this enter the same MODIFIED/believed-to-be-fixed state as https://bugzilla.redhat.com/show_bug.cgi?id=1391665? (Or are we uncertain whether this is tracking the same issue?) cc: @stevekuznetsov
Re-assigned to @bparees, as he provided the fix for the bug.
We still see these issues during normal docker builds, when docker is trying to clean up intermediate containers; however, they do not appear to cause failures. I'll close this for now (since the failure mode should be fixed by the s2i changes that allow the commit to take longer) and open a new issue for the docker build errors we're seeing.
as seen in:
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_future/93/consoleFull#-62719347577f0ce7e4b0b14b5836ce6d
We previously had #9548 and #9490 in this space, but they got pretty messy and were ultimately closed. It's not clear to me whether we think this should be working at this point or not (or maybe we have a bad docker in our AMIs), so I'm starting the conversation here. Is the situation that:
- our AMI used to contain the fix and has since regressed for some reason?
@stevekuznetsov @jwhonce @runcom