Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix unstable vz driver network issue #1543

Merged
merged 1 commit into from
May 19, 2023
Merged

Conversation

balajiv113
Copy link
Member

This should hopefully fix the network freeze when using vz driver

#1200 (comment)

I didn't add fix issue id's mainly to try and see if this really fixes or not.

What is happening, why this change ?
When we use file.FD(), whenever the file gets closed or garbage collected (finaliser closing file) the FD give before becomes invalid.

In our case, we use file.FD() for both client and server connections. Since these are DGRAM connection, if we try to read on a closed FD we won't get any error instead it get stuck indefinitely. This could be the reason for the freeze of vz network

Before this change, i verified that that file getting garbage collected in ~2mins. So logically FD would become invalid at some point.

@AkihiroSuda AkihiroSuda added this to the v0.16.0 milestone May 17, 2023
AkihiroSuda
AkihiroSuda previously approved these changes May 17, 2023
Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@balajiv113
Copy link
Member Author

Lets hold the merge.

Unfortunately this doesn't fix it :( I tried a long running instance in M1 and the freeze still happened

@AkihiroSuda AkihiroSuda marked this pull request as draft May 17, 2023 10:02
@AkihiroSuda AkihiroSuda modified the milestones: v0.16.0, v1.0 (tentative) May 17, 2023
Signed-off-by: Balaji Vijayakumar <[email protected]>
@balajiv113
Copy link
Member Author

@AkihiroSuda
I have done a additional change to wrap gvisor-tap-vsock connection with UDPFileConn.go

In this we are calling setReadDeadline before every read. This will make sure if the UDP connection is not closed. Before trying to read. Reading on closed UDP connection will end up getting stuck in that read

We can merge these current set of changes if it looks good mainly to get more info when error happens again

@balajiv113
Copy link
Member Author

I am testing in my M1 from yesterday, with both long running VM and as well as load testing network with multiple torrent download. Unfortunately freeze didn't till now.

@AkihiroSuda
Copy link
Member

Thanks

Unfortunately freeze didn't till now.

s/Unfortunately/Fortunately/ ? 🙂

@balajiv113 balajiv113 marked this pull request as ready for review May 19, 2023 09:43
@balajiv113
Copy link
Member Author

s/Unfortunately/Fortunately/ ?

I was expecting it to freeze so unfortunate :D

// Check if the connection has been closed
if err := conn.SetReadDeadline(time.Time{}); err != nil {
if opErr, ok := err.(*net.OpError); ok && opErr.Err.Error() == "use of closed network connection" {
return 0, errors.New("UDPFileConn connection closed")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we wrap the original err?

Suggested change
return 0, errors.New("UDPFileConn connection closed")
return 0, fmt.Errorf"the UDPFileConn connection closed: %w", err)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could but since we are already checking if error is equal to use of closed network connection

I thought we won't be getting anything new out of error object

@AkihiroSuda AkihiroSuda modified the milestones: v1.0 (tentative), v0.16.0 May 19, 2023
@AkihiroSuda AkihiroSuda merged commit 95767ad into lima-vm:master May 19, 2023
@balajiv113 balajiv113 deleted the fix-nw branch May 20, 2023 03:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants