Attempt to mimic changes from upstream PR#27467 #212
Conversation
Problem: if a process that's been started for 'docker exec' exits fast enough, the daemon can receive a 'process exited' update from containerd before it starts passing stdio data back and forth, losing its output.

Why it happens: the 'process exited' state change message is only processed after acquiring a lock for the container, and while most of the exec() setup is done while holding that lock, the lock is freed when libcontainerd calls back to set up the passing of stdio data.

Proposed change: call getExecConfig() before AddProcess(), and modify AddProcess() to take a daemon-supplied callback that remembers the exec.Config value that was retrieved. The effect should be the same as that of the changes in moby#27467, but without as much refactoring.

Signed-off-by: Nalin Dahyabhai <[email protected]>
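For illustration, here is a minimal sketch of the proposed control flow. All identifiers here (execConfig, client, daemon, StartCallback) are simplified placeholders, not the actual moby/libcontainerd types; the sketch only shows the ordering: look the exec config up first, then let AddProcess() run the daemon's stdio-attach callback while still holding the libcontainerd client lock, before any 'process exited' event can be handled.

```go
// Sketch only -- placeholder names, not the real daemon/libcontainerd code.
package main

import (
	"fmt"
	"sync"
)

// execConfig stands in for the daemon's exec.Config.
type execConfig struct {
	ID string
}

// StartCallback is the callback the daemon hands to AddProcess; it is
// invoked while the client lock is held, so stdio gets wired up before
// a 'process exited' event for the new process can be processed.
type StartCallback func(*execConfig)

// client stands in for the libcontainerd client.
type client struct {
	mu sync.Mutex // the libcontainerd client lock
}

// AddProcess registers the exec'd process and runs the daemon-supplied
// callback under the client lock.
func (c *client) AddProcess(containerID, processID string, ec *execConfig, cb StartCallback) error {
	c.mu.Lock()
	defer c.mu.Unlock()
	// (real code would ask containerd to start the process here)
	cb(ec) // attach stdio while still holding the lock
	return nil
}

type daemon struct {
	mu    sync.Mutex // stands in for the Container lock
	execs map[string]*execConfig
}

// getExecConfig is called *before* AddProcess, so the callback can close
// over the result instead of re-locking the container to look it up.
func (d *daemon) getExecConfig(id string) *execConfig {
	d.mu.Lock()
	defer d.mu.Unlock()
	return d.execs[id]
}

func main() {
	d := &daemon{execs: map[string]*execConfig{
		"exec-1": {ID: "exec-1"},
	}}
	c := &client{}

	ec := d.getExecConfig("exec-1") // retrieved up front
	_ = c.AddProcess("ctr-1", "exec-1", ec, func(cfg *execConfig) {
		// In the real daemon this attaches the stdio streams.
		fmt.Println("stdio attached for", cfg.ID)
	})
}
```

Because the callback closes over the already-retrieved exec config, the stdio setup no longer needs to re-acquire the Container lock, which is what previously let the exit event race ahead of the attach.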
@ncdc could you test this out?
Can we get a brew/koji build?
@ncdc for which releases?
@nalind I have Fedora 24, RHEL 7.2, and RHEL 7.3 VMs. You pick? 😄
Try http://koji.fedoraproject.org/koji/taskinfo?taskID=16242921 for f24 scratch builds. Thanks!
retest this please
@rh-atomic-bot retest this please
#209, a completely different PR from 3 days ago, produces the same fatal error in testing (I believe)
@jwhonce tests are busted in RH CI; I usually run them locally before merging PRs here
@runcom So we have this busted-ness and the containers/image busted-ness. This is being tested elsewhere. Do you know if there is any magic to get the tests to re-run other than adding a new commit?
@jwhonce containers/image isn't busted anymore (your PR over there just requires a rebase now and it will be good to go). As for the tests here, they are busted for various reasons, such as the test virtual machines running out of memory, cross-compilation broken by some of our patches, and whatnot. I would suggest having a Trello card to track a fix to our testing for projectatomic/docker and in the meantime just running tests locally, as I do every time.
FYI I wasn't able to begin testing this until just now. I'll keep the infinite loop running as long as it will go and report if/when I see it break.
Force-pushed from 1b53c8b to 63187af
So far so good. Still running. No flakes.
(F24)
I'm not sure exactly how long it took, but my reproducer was able to reproduce the issue using the F24 RPMs listed above.
Taking a closer look at the F24 candidate builds here: http://koji.fedoraproject.org/koji/taskinfo?taskID=16242921, I'm afraid it looks like they do not actually contain this fix. This commit is included in the SRPM as a patch but is not applied during %prep. I'm re-running a build with these patches included and applied on top of docker-latest-1.12.3 and will update with a link if it is successful.
The x86_64 portion of the build has now finished successfully here: http://koji.fedoraproject.org/koji/taskinfo?taskID=16262294. You can confirm that the attach patch was applied in the build log: https://kojipkgs.fedoraproject.org//work/tasks/2294/16262294/build.log. Apologies @ncdc, but can you restart testing with this build?
My apologies. I mistakenly assumed that the Fedora .spec used %autosetup, like the EL .spec does, and didn't double-check it.
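For anyone unfamiliar with the packaging detail behind the miss above: with %autosetup, every PatchN: tag listed in a spec is applied automatically during %prep, while a classic %setup requires an explicit %patch line per patch, so a listed patch can silently go unapplied. A hedged illustration (not the actual Fedora or EL docker .spec):

```spec
# Illustrative fragment only -- not the real docker.spec.

# EL-style %prep: %autosetup unpacks the source and applies
# every PatchN: tag automatically.
%prep
%autosetup -p1

# Classic Fedora-style %prep (shown commented out): each patch must
# be applied by hand, so a Patch1: tag with no matching %patch1 line
# is silently inert -- which is what happened with this fix.
#%prep
#%setup -q
#%patch0 -p1
# (missing: %patch1 -p1)
```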
@nalind - No worries
@ncdc - Is your reproducer something sufficiently well documented and/or scripted that others of us could run it locally?
Yes. See the Docker GitHub issue I opened.
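The reproducer itself lives in the upstream issue rather than in this thread, but from the problem statement it amounts to hammering a fast-exiting docker exec and watching for lost output. A hypothetical stand-in (the container name and command are made up, and this is not the actual script from the issue):

```go
// Hypothetical reproducer sketch: repeatedly exec a fast-exiting
// command in a running container and flag any run that loses its
// output. Not the actual script from the upstream issue.
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

func main() {
	const container = "test" // assumes a long-running container named "test"
	for i := 0; ; i++ {
		out, err := exec.Command("docker", "exec", container, "echo", "hello").CombinedOutput()
		if err != nil {
			fmt.Printf("run %d: exec failed: %v\n", i, err)
			return
		}
		if strings.TrimSpace(string(out)) != "hello" {
			// With the race, the exec's stdout can be lost entirely.
			fmt.Printf("run %d: output lost: %q\n", i, out)
			return
		}
	}
}
```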
I just kicked off a new test using the new F24 RPM.
Any update? |
|
I wasn't able to get it to flake.
It failed for me last night on RHEL 7.3.0 and 1.12.3-4. Here is my setup and load:
I commented on the bz. This is a different failure condition.
Problem: if a process that's been started for 'docker exec' exits fast enough, the daemon can receive a 'process exited' update from containerd before it starts passing stdio data back and forth, losing its output.
Why it happens: the 'process exited' state change message is only processed after acquiring a lock for the container, and while most of the exec() setup is done while holding that lock, the lock is freed when libcontainerd calls back to set up the passing of stdio data.
Proposed change: call getExecConfig() before AddProcess(), and modify AddProcess() to take a daemon-supplied callback that remembers the exec.Config value that was retrieved.
The effect should be comparable to the changes in moby#27467 (in that we now hold the libcontainerd client lock, and make sure that we don't obtain the Container object's lock by calling getExecConfig()), but without additional refactoring.