Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove deprecated gRPC Dial calls #2676

Merged
merged 7 commits into from
Jul 29, 2024
Merged

Remove deprecated gRPC Dial calls #2676

merged 7 commits into from
Jul 29, 2024

Conversation

mtardy
Copy link
Member

@mtardy mtardy commented Jul 16, 2024

Fixes #2675.

Consider also that now with those patches, the retries flag is no longer doing anything, since it was only retrying on the initial connection (while timeout now properly works on all calls). This could be a follow-up PR I think: remove it or add a helper to perform retry on the actual call in case of gRPC errors (and not the actual underlying method error)?

@mtardy mtardy added the release-note/misc This PR makes changes that have no direct user impact. label Jul 16, 2024
@mtardy mtardy changed the title Pr/mtardy/tetra client g rpc Remove deprecated gRPC Dial calls Jul 16, 2024
@mtardy mtardy force-pushed the pr/mtardy/tetra-client-gRPC branch 2 times, most recently from 064de73 to e409abd Compare July 16, 2024 09:56
Copy link

netlify bot commented Jul 16, 2024

Deploy Preview for tetragon ready!

Name Link
🔨 Latest commit 06b85c9
🔍 Latest deploy log https://app.netlify.com/sites/tetragon/deploys/66a75d2b48d50800083039fa
😎 Deploy Preview https://deploy-preview-2676--tetragon.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@mtardy mtardy force-pushed the pr/mtardy/tetra-client-gRPC branch from e409abd to dca9dcc Compare July 16, 2024 16:22
@mtardy mtardy requested a review from willfindlay July 16, 2024 16:23
@mtardy mtardy marked this pull request as ready for review July 16, 2024 16:23
@mtardy mtardy requested a review from a team as a code owner July 16, 2024 16:23
@willfindlay
Copy link
Contributor

Nice, taking a look shortly.

Copy link
Contributor

@willfindlay willfindlay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DialContext got deprecated, so we switch to NewClient that does not
connect to the server, it waits for the first RPC to be called to
establish a connection, thus the retry mechanism at connect no longer
make sense. It could be reimplemented as a retry helper for the RPC
call.

This makes sense to me. Assuming this is true and we establish a new connection on the first RPC call (and not a new connection on every subsequent call), why don't we have the client call a healthcheck RPC or similar (I think we already have one) and put the old retry logic in there?

Comment on lines 59 to 78
func ResolveServerAddress() (string, error) {
if ServerAddress == "" {
sa, err := readActiveServerAddressFromFile(defaults.InitInfoFile)

if err != nil {
logger.GetLogger().WithError(err).WithField(
"defaultServerAddress", defaultServerAddress,
).Debug("failed to resolve server address reading init info file, using default value")
return defaultServerAddress, nil
}

logger.GetLogger().WithFields(logrus.Fields{
"InitInfoFile": defaults.InitInfoFile,
"ServerAddress": sa,
}).Debug("resolved server address using info file")
return sa, nil
}

return ServerAddress, nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're always returning nil here. Perhaps we could consider removing the error return in the type signature?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed thanks

@mtardy
Copy link
Member Author

mtardy commented Jul 19, 2024

why don't we have the client call a healthcheck RPC or similar (I think we already have one) and put the old retry logic in there

I would not do RPC checks that are not asked by the user, it would slow down and would be unexpected if it breaks no? I would maybe do the retry on the request itself or just not do it and let people use retry mechanism on tetra with external utility, using the status code returned for example.

@willfindlay
Copy link
Contributor

why don't we have the client call a healthcheck RPC or similar (I think we already have one) and put the old retry logic in there

I would not do RPC checks that are not asked by the user, it would slow down and would be unexpected if it breaks no? I would maybe do the retry on the request itself or just not do it and let people use retry mechanism on tetra with external utility, using the status code returned for example.

I don't have super strong feelings one way or another, but I do think it would be nice to support retries. Especially since we've had a lot of problems with connection timeouts in tetra in the past.

@mtardy mtardy force-pushed the pr/mtardy/tetra-client-gRPC branch from dca9dcc to c811061 Compare July 19, 2024 18:04
@mtardy
Copy link
Member Author

mtardy commented Jul 19, 2024

why don't we have the client call a healthcheck RPC or similar (I think we already have one) and put the old retry logic in there

I would not do RPC checks that are not asked by the user, it would slow down and would be unexpected if it breaks no? I would maybe do the retry on the request itself or just not do it and let people use retry mechanism on tetra with external utility, using the status code returned for example.

I don't have super strong feelings one way or another, but I do think it would be nice to support retries. Especially since we've had a lot of problems with connection timeouts in tetra in the past.

This is an idea on how it could be done, c811061. It would need a rewrite for those using CliRun that don't return the error out of the closure.

I'll see if I can make work gRPC build-in retry mechanism before trying to extend this. However this work as expected and works along interrupt and timeout which is nice (it wasn't the case before).

@mtardy mtardy force-pushed the pr/mtardy/tetra-client-gRPC branch from c811061 to a696b32 Compare July 23, 2024 11:33
@mtardy
Copy link
Member Author

mtardy commented Jul 23, 2024

but I do think it would be nice to support retries. Especially since we've had a lot of problems with connection timeouts in tetra in the past.

Okay! I spent some amount of time to satisfy you on this one, I deep dive into their built-in backoff mechanism, see a696b32. Now it should work and it's transparent!

@mtardy mtardy requested a review from willfindlay July 23, 2024 11:35
@willfindlay
Copy link
Contributor

willfindlay commented Jul 26, 2024

Now it should work and it's transparent!

That's amazing 😮

@mtardy mtardy force-pushed the pr/mtardy/tetra-client-gRPC branch from a696b32 to 06b85c9 Compare July 29, 2024 09:13
mtardy and others added 7 commits July 29, 2024 11:17
The goal is to remove the connect function later on when dropping
the deprecated DialContext usage in favor of new NewClient.

Signed-off-by: Mahe Tardy <[email protected]>
Switch from grpc.DialContext to grpc.NewClient which does not perform
any I/O and wait lazily for the first RPC call to be made to connect.
This is the way to go as DialContext is getting deprecated and the gRPC
team has been writing that using the other is an anti-pattern.

https://github.com/grpc/grpc-go/blob/master/Documentation/anti-patterns.md

Signed-off-by: Mahe Tardy <[email protected]>
DialContext got deprecated, so we switch to NewClient that does not
connect to the server, it waits for the first RPC to be called to
establish a connection, thus the retry mechanism at connect no longer
make sense. It could be reimplemented as a retry helper for the RPC
call.

Signed-off-by: Mahe Tardy <[email protected]>
Reuse the machinery created in tetra/common to use grpc.NewClient
instead of the deprecated DialContext.

Signed-off-by: Mahe Tardy <[email protected]>
Here we keep the previous behavior by waiting for the connection with
WaitForStateChange instead of connecting at a later stage when
performing a RPC.

Signed-off-by: Mahe Tardy <[email protected]>
Signed-off-by: cilium-renovate[bot] <134692979+cilium-renovate[bot]@users.noreply.github.com>
gRGC A6 - gRPC Retry Design (a.k.a. built in backoff retry)
https://github.com/grpc/proposal/blob/master/A6-client-retries.md was
implemented by grpc/grpc-go#2111 but unusable
for a long time since maxAttempts was limited to hardcoded 5
(grpc/grpc-go#4615), recent PR fixed that
grpc/grpc-go#7229.

It's transparent to the user, to see it in action, make sure the gRPC
server is unreachable (do not start tetragon for example), run tetra
with: GRPC_GO_LOG_SEVERITY_LEVEL=warning <tetra cmd>

Note that logs don't always have the time to be pushed before exit so
output might be a bit off but the number of retries is respected (you
can debug or synchronously print in the grpc/stream.c:shouldRetry or
:withRetry to verify).

Also note that the final backoff duration is completely random and
chosen between 0 and the final duration that was computed via to the
params: https://github.com/grpc/grpc-go/blob/v1.65.0/stream.go#L702

Signed-off-by: Mahe Tardy <[email protected]>
@mtardy mtardy force-pushed the pr/mtardy/tetra-client-gRPC branch from 06b85c9 to f5a92b6 Compare July 29, 2024 09:18
@mtardy mtardy merged commit ebae77f into main Jul 29, 2024
40 checks passed
@mtardy mtardy deleted the pr/mtardy/tetra-client-gRPC branch July 29, 2024 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/misc This PR makes changes that have no direct user impact.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace deprecated grpc.DialContext
2 participants