Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure Machine#Wait() gurantee that the underlying Firecracker process is stopped #182

Merged
merged 4 commits into from
Feb 4, 2020

Conversation

kzys
Copy link
Contributor

@kzys kzys commented Jan 31, 2020

The method should guarantee that the underlying process is dead,
but it didn't.

Signed-off-by: Kazuyoshi Kato [email protected]

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@kzys kzys marked this pull request as ready for review January 31, 2020 18:51
@kzys kzys changed the title Test that Machine#Wait() with context cancellation Make sure Machine#Wait() gurantee that the underlying Firecracker process is stopped Jan 31, 2020
Copy link
Contributor

@smira smira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@kzys kzys requested review from xibz, IRCody and mxpv February 3, 2020 19:02
machine_test.go Outdated
func isProcessAlive(pid int) (bool, error) {
// Using kill(2) with signal=0 to check the existence of the process,
// because os.FindProcess always returns a process, regardless of whether the process is
// alive or not.
Copy link
Contributor

@mxpv mxpv Feb 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on this?
As far as I see go's FindProcess does the same:
https://golang.org/src/os/exec_unix.go#L88
https://golang.org/src/os/exec_unix.go#L73
and its expected to receive err os: process already finished

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification: you can use it like err := os.FindProcess(x).Signal(0)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error from Process#Signal() is not typed. The only way to check the type of the error is string comparison. So I'm unsure that is reliable over time.

https://github.com/golang/go/blob/753d56d3642eb83848aa39e65982a9fc77e722d7/src/os/exec_unix.go#L53

Copy link
Contributor

@xibz xibz Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think you need to cast the error. Will be nice to get rid of the unneeded if block
https://github.com/golang/go/blob/753d56d3642eb83848aa39e65982a9fc77e722d7/src/os/exec_unix.go#L72-L74

Copy link
Contributor Author

@kzys kzys Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kill(2) returns EINVAL, EPERM or ESRCH as an error. Go's Signal() distinguish them, but doesn't provide a programmatic way of checking the type.

Would it be better to do something like below rather than using syscall directly? Personally I don't want to depend on the string representation because it might change.

if err.Error() == "os: process already finished" {
  ...
}

Thoughts?

Copy link
Contributor

@xibz xibz Feb 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Signal it will only return os: process already finished error when syscall.ESRCH occurs. So, to me I think it doesn't matter either way. I think the string comparison is more ugly than checking against ESRCH, but I am fine with either solution.

edit: Apparently it sets done based on Wait, which may return the already finished error, which complicates the checks. We have to know whether or not Wait has been called and if it has, we cannot rely on ESRCH. So a more simple approach is string comparison of the error. :/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently it sets done based on Wait, which may return the already finished error, which complicates the checks.

And all of these are happening inside os.Process. syscall.Kill is much simpler I think.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use os.Process, but I agree that strings comparison is ugly. Unfortunately sometimes its hard to get away from it (for instance there is a similar checker in containerd: https://github.com/containerd/containerd/blob/b0821c801dc2225bb7478f91e967888a353fb60a/pkg/process/utils.go#L131)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. The last revision uses os.Process instead.

machine.go Outdated
m.fatalErr = err
}
<-ctx.Done()
m.stopVMM()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be ignoring this error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error from stopVMM(). Yeah, I think we should not ignore the error. Let me fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last commit logs the error. Passing that to errCh seems harder than I thought, since the channel will be closed after the underlying process is finished.

@kzys kzys force-pushed the wait-and-cancel branch 2 times, most recently from 77dd681 to 05aa740 Compare February 4, 2020 01:03
The method should guarantee that the underlying process is dead,
but it doesn't.

Signed-off-by: Kazuyoshi Kato <[email protected]>
The context is used to let the SDK kill its underlying Firecracker
process. Closing the channel has to be happening after the actual
termination of the process.

Signed-off-by: Kazuyoshi Kato <[email protected]>
Regardless of how the machine is stopped, Wait() should gurantee that
the process is not there.

Signed-off-by: Kazuyoshi Kato <[email protected]>
machine_test.go Outdated Show resolved Hide resolved
VMCommandBuilder#Build()'s context directly controls exec.Cmd.

To test the goroutines inside Machine#startVMM(), we cannot use
VMCommandBuilder.

Signed-off-by: Kazuyoshi Kato <[email protected]>
Copy link
Contributor

@xibz xibz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@mxpv mxpv merged commit c569328 into firecracker-microvm:master Feb 4, 2020
kzys added a commit to kzys/firecracker-containerd that referenced this pull request Feb 10, 2020
kzys added a commit to kzys/firecracker-containerd that referenced this pull request Feb 11, 2020
The fix below is needed to fix the issue. While the SDK version is
0.20.x, it should work with Firecracker 0.19.x as well.

firecracker-microvm/firecracker-go-sdk#182

Signed-off-by: Kazuyoshi Kato <[email protected]>
kzys added a commit to kzys/firecracker-containerd that referenced this pull request Feb 11, 2020
The fix below is needed to fix the issue. While the SDK version is
0.20.x, it should work with Firecracker 0.19.x as well.

firecracker-microvm/firecracker-go-sdk#182

Signed-off-by: Kazuyoshi Kato <[email protected]>
kzys added a commit to kzys/firecracker-containerd that referenced this pull request Feb 11, 2020
The fix below is needed to fix the issue. While the SDK version is
0.20.x, it should work with Firecracker 0.19.x as well.

firecracker-microvm/firecracker-go-sdk#182

Signed-off-by: Kazuyoshi Kato <[email protected]>
kzys added a commit to kzys/firecracker-containerd that referenced this pull request Feb 12, 2020
The fix below is needed to fix the issue. While the SDK version is
0.20.x, it should work with Firecracker 0.19.x as well.

firecracker-microvm/firecracker-go-sdk#182

Signed-off-by: Kazuyoshi Kato <[email protected]>
kzys added a commit to kzys/firecracker-containerd that referenced this pull request Feb 12, 2020
The fix below is needed to fix the issue. While the SDK version is
0.20.x, it should work with Firecracker 0.19.x as well.

firecracker-microvm/firecracker-go-sdk#182

Signed-off-by: Kazuyoshi Kato <[email protected]>
pendo324 pushed a commit to pendo324/firecracker-go-sdk that referenced this pull request Aug 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants