-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[supervisor] set pod failure reason when supervisor is reaped #20318
Conversation
1267b7c
to
b706c36
Compare
b706c36
to
1b6bde8
Compare
New and removed dependencies detected. Learn more about Socket for GitHub ↗︎
🚮 Removed packages: golang/github.com/ramr/[email protected] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good, a brief suite of tests also didn't turn up anything unexpected. Left one relevant comment.
Nice work!
select { | ||
case <-ctx.Done(): // timeout | ||
case exitCode := <-handledByReaper: | ||
handleSupervisorExit(exitCode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sweet!
@filiptronicek Thanks for your review! I will revert debug commit later today and 🚢 |
} | ||
log.WithError(err).Error("supervisor run error") | ||
return | ||
} | ||
}() | ||
// start the reaper to clean up zombie processes | ||
reaper.Reap() | ||
reaper.Start(reaper.Config{ | ||
Pid: -1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this config mean? Would it make sense to add a comment for that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This config comes from default reaper.Reap()
https://github.com/ramr/go-reaper/blob/d197ec784a27195ab4144ba839f450af47909776/reaper.go#L89-L98
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome find @mustard-mh ! 🚀 ✨
Test worked 10/10 ✔️
This reverts commit 1b6bde8.
Description
Root problem of CLC-877:
There can be a racing between
reaper.Reap()
andrunCommand.Wait()
. Our logic can processrunCommand.Wait()
well but once it's reaped byreaper.Reap()
we know nothing (like exitCode). And the pod will exit without message or proper exitCode and then reach this line to update pod failed reasongitpod/components/ws-manager-mk2/controllers/status.go
Line 386 in 1267b7c
We created a fork gitpod-io/go-reaper#1, temporary add a function ourselves, and after upstream https://github.com/ramr/go-reaper/tree/notifier to release, we need a follow-up PR
Related Issue(s)
Fixes CLC-877
How to test
Note
Please use https://github.com/gitpod-io/empty to open workspaces to save space usage of preview env
xxx timed out to start after 2 minutes
(as we don't know when the racing will happen, but it's very frequently with this case actually, ten times should be fine -> you could start 4 workspaces at the same time)gp timeout set 1m
, to see it doesn't break timed out casegp stop
Documentation
Preview status
Gitpod was successfully deployed to your preview environment.
Build Options
Build
Run the build with werft instead of GHA
Run Leeway with
--dont-test
Publish
Installer
Add desired feature flags to the end of the line above, space separated
Preview Environment / Integration Tests
If enabled this will build
install/preview
If enabled this will create the environment on GCE infra
Saves cost. Untick this only if you're really sure you need a non-preemtible machine.
Valid options are
all
,workspace
,webapp
,ide
,jetbrains
,vscode
,ssh
. If enabled,with-preview
andwith-large-vm
will be enabled./hold