Make sure Machine#Wait() guarantees that the underlying Firecracker process is stopped #182
👍
machine_test.go
func isProcessAlive(pid int) (bool, error) {
// Using kill(2) with signal=0 to check the existence of the process,
// because os.FindProcess always returns a process, regardless of whether the process is
// alive or not.
Can you elaborate on this?
As far as I can see, Go's FindProcess does the same:
https://golang.org/src/os/exec_unix.go#L88
https://golang.org/src/os/exec_unix.go#L73
and it's expected to receive the error "os: process already finished".
Clarification: you can use it like p, _ := os.FindProcess(x); err := p.Signal(syscall.Signal(0))
The error from Process#Signal() is not typed. The only way to check the type of the error is string comparison, so I'm not sure that it is reliable over time.
I don't think you need to cast the error. It would be nice to get rid of the unneeded if block.
https://github.com/golang/go/blob/753d56d3642eb83848aa39e65982a9fc77e722d7/src/os/exec_unix.go#L72-L74
kill(2) returns EINVAL, EPERM or ESRCH as an error. Go's Signal() distinguishes them, but doesn't provide a programmatic way of checking which one occurred.
Would it be better to do something like the following rather than using syscall directly? Personally, I don't want to depend on the string representation because it might change.
if err.Error() == "os: process already finished" {
...
}
Thoughts?
According to Signal, it only returns the "os: process already finished" error when syscall.ESRCH occurs, so I don't think it matters either way. The string comparison is uglier than checking against ESRCH, but I am fine with either solution.
Edit: apparently the done flag is set by Wait, and once it is set, Signal returns the already-finished error as well, which complicates the checks. We would have to know whether or not Wait has been called, and if it has, we cannot rely on ESRCH. So the simpler approach is string comparison of the error. :/
> Apparently it sets done based on Wait, which may return the already finished error, which complicates the checks.

And all of this happens inside os.Process. I think syscall.Kill is much simpler.
I'd use os.Process, but I agree that string comparison is ugly. Unfortunately, sometimes it's hard to get away from it (for instance, there is a similar checker in containerd: https://github.com/containerd/containerd/blob/b0821c801dc2225bb7478f91e967888a353fb60a/pkg/process/utils.go#L131).
Okay. The last revision uses os.Process instead.
machine.go
m.fatalErr = err
}
<-ctx.Done()
m.stopVMM()
Should we be ignoring this error?
You mean the error from stopVMM()? Yeah, I think we should not ignore it. Let me fix that.
The last commit logs the error. Passing that to errCh seems harder than I thought, since the channel will be closed after the underlying process is finished.
The branch was updated from 77dd681 to 05aa740.
The method should guarantee that the underlying process is dead, but it doesn't. Signed-off-by: Kazuyoshi Kato <[email protected]>
The context is used to let the SDK kill its underlying Firecracker process. Closing the channel has to happen after the actual termination of the process. Signed-off-by: Kazuyoshi Kato <[email protected]>
Regardless of how the machine is stopped, Wait() should guarantee that the process is no longer there. Signed-off-by: Kazuyoshi Kato <[email protected]>
VMCommandBuilder#Build()'s context directly controls exec.Cmd. To test the goroutines inside Machine#startVMM(), we cannot use VMCommandBuilder. Signed-off-by: Kazuyoshi Kato <[email protected]>
The fix below is needed to fix the issue. While the SDK version is 0.20.x, it should work with Firecracker 0.19.x as well. firecracker-microvm/firecracker-go-sdk#182 Signed-off-by: Kazuyoshi Kato <[email protected]>
Integrate fccontrol plugin with shim.
The method should guarantee that the underlying process is dead,
but it didn't.
Signed-off-by: Kazuyoshi Kato [email protected]
Issue #, if available:
Description of changes:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.