Add support for otel tracing of grpc calls #140
k8s-ci-robot merged 5 commits into kubernetes-csi:master
/ok-to-test
/cc
Hey @gnufied, I see you recently opened PR #143 to allow disabling metrics on the gRPC connection. You did something fairly similar to what I wanted to do here to add otel tracing. This made me realise that this approach lacks flexibility: for instance, we won't be able to have a gRPC connection with tracing but without metrics, unless we write a separate function for each case, which does not seem like a good idea. So I think this part needs to be refactored. I was thinking about using the existing `Option` mechanism:

    func Connect(address string, metricsManager metrics.CSIMetricsManager, options ...Option) (*grpc.ClientConn, error) {
    	if metricsManager != nil {
    		options = append(options, withMetrics(metricsManager))
    	}
    	return connect(address, []grpc.DialOption{grpc.WithTimeout(time.Second * 30)}, options)
    }

For your use case, instead of doing:
Sure, feel free to refactor your PR to the design you proposed. And let's make sure this handles the tracing changes you are introducing here too.
Extend the existing `Option` functionality for connect to be more flexible. Configuring how `connect` behaves relies on new functions like `withMetrics` or `withTimeout` that need to be passed as options at the caller level. This will make it easier to add new configuration options without breaking the api.
Create a new `WithOtelTracing` option to be used by gRPC client when connecting to the CSI server with `Connect`. This function adds a gRPC interceptor that will provide basic tracing of the gRPC calls.
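To illustrate the option mechanism these commit messages describe, here is a minimal, standard-library-only sketch. The `options` struct, its fields, and the `resolve` helper are assumptions made for this example and are not the actual csi-lib-utils code; `WithMetrics` takes no manager here purely to keep the sketch free of dependencies. The 30 second default mirrors the timeout passed in `Connect` in the discussion above.

```go
package main

import "time"

// options is a hypothetical stand-in for the unexported settings struct.
// The field names are assumptions based on the discussion in this PR.
type options struct {
	timeout           time.Duration
	metricsEnabled    bool
	enableOtelTracing bool
}

// Option mutates the settings before the connection is established.
type Option func(*options)

// WithTimeout overrides the default connection timeout.
func WithTimeout(d time.Duration) Option {
	return func(o *options) { o.timeout = d }
}

// WithMetrics turns metrics on. The real option would carry a metrics
// manager; a boolean is used here only to keep the sketch self-contained.
func WithMetrics() Option {
	return func(o *options) { o.metricsEnabled = true }
}

// WithOtelTracing turns on tracing of gRPC calls.
func WithOtelTracing() Option {
	return func(o *options) { o.enableOtelTracing = true }
}

// resolve (hypothetical helper) sets the defaults first and applies the
// caller's options afterwards, so an explicit option always wins.
func resolve(opts ...Option) options {
	o := options{timeout: 30 * time.Second}
	for _, apply := range opts {
		apply(&o)
	}
	return o
}
```

With this shape, `resolve(WithOtelTracing(), WithTimeout(5*time.Second))` would produce a 5 second timeout with tracing on and metrics off, which is the "tracing without metrics" combination the refactor is meant to allow.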
@gnufied I just pushed the refactor with the otel tracing on top if you want to take a look :)
    // ConnectWithoutMetrics behaves exactly like Connect except no metrics are recorded.
    func ConnectWithoutMetrics(address string, options ...Option) (*grpc.ClientConn, error) {
    	return connect(address, nil, []grpc.DialOption{grpc.WithTimeout(time.Second * 30)}, options)
I wouldn't remove ConnectWithoutMetrics yet, because it means at least one user of the library will be broken.
Yes sure, I had not realised a new version was released with this change available so I thought nobody could use it. I will put it back with a deprecation warning.
Put the function back :)
    		options = append(options, withMetrics(metricsManager))
    	}
    	return connect(address, options)
    }
I was thinking, a builder pattern might be more suitable for building options to connect. So something like:

    type ConnectionBuilder struct {
    	address           string
    	metricsManager    metrics.CSIMetricsManager
    	timeout           time.Duration
    	enableOtelTracing bool
    }

    func NewConnectionBuilder(address string) *ConnectionBuilder {
    	return &ConnectionBuilder{
    		address: address,
    	}
    }

    func (b *ConnectionBuilder) WithMetrics(mm metrics.CSIMetricsManager) *ConnectionBuilder {
    	b.metricsManager = mm
    	return b
    }

    func (b *ConnectionBuilder) WithTimeout(td time.Duration) *ConnectionBuilder {
    	b.timeout = td
    	return b
    }

    func (b *ConnectionBuilder) WithOtelTracing() *ConnectionBuilder {
    	b.enableOtelTracing = true
    	return b
    }

    func (b *ConnectionBuilder) Connect() (*grpc.ClientConn, error) {
    	return connect(b.address, b.metricsManager,....)
    }

So a caller would do something like:

    connectionBuilder := NewConnectionBuilder("unix:////blah").WithMetrics(xxx).WithTimeout(xxx)
    connection, err := connectionBuilder.Connect()

This might be easier to extend in the future. I do realize these are new interface functions, but it should be fine to use them. We can eventually deprecate both Connect and ConnectWithoutMetrics.
What do you think?
I do not have a strong opinion on this but here are some thoughts:
- I am not a big fan of breaking the interface because it means people will have to update their code even if they do not want to change the behaviour of `Connect`. Considering `csi-lib-utils` is used in all sidecars and some csi-drivers, that makes a fair number of 'clients' that will have to update their code.
- Initially, I stuck to the `Options` pattern because I saw similar patterns in other CSI repos (metrics in csi-lib-utils, aws-ebs-csi-driver, azuredisk-csi-driver, ...), so I felt like sticking to the same pattern was a good idea for code consistency.
- On the other hand, I did not see any other builder pattern in CSI repos (if you have examples, feel free to share them), so it does not feel very consistent with the existing code.
But if you feel like a builder pattern is truly the way to go here, I'm fine with implementing it.
Also, maybe we can ask other folks' opinion on this? I'm not too familiar with who would be the right person though :/
Okay, I can probably live with the function(option) pattern. I have a minor question below. Another thing: we are about to start cutting CSI sidecars. Were you hoping to merge this in the sidecars and start adding a CLI option? Do you want to do this after this release or in the current release? (I pinged you on Slack with the same question.)
Btw, supplying a variadic number of arguments for setting options and using a builder pattern are basically two sides of the same coin and not all that different. So basically:

    foo = Connect("foo", withBar(), withBaz(), withFoo())

vs

    foo = NewConnect("foo").WithBar().WithBaz().WithFoo()

I am not a fan of functions that take too many parameters. Also, it may just be me, but functions that return functions only to set a field in an options struct seem slightly too complicated for their own good (it takes me a minute to parse those), whereas a builder seems more straightforward.
I agree that we shouldn't break existing interfaces and we can leave Connect as it is, but I do tend to prefer the API of a builder over functions that take options.
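To make the "two sides of the same coin" point concrete, the following sketch builds the same settings value both ways. Everything in it is hypothetical and only loosely modelled on this thread: the `settings` struct, `newSettings`, and `Build` are names invented for the example, and no real gRPC connection is made.

```go
package main

import "time"

// settings is a hypothetical stand-in for what connect would consume.
type settings struct {
	address           string
	timeout           time.Duration
	enableOtelTracing bool
}

// --- Style 1: variadic functional options ---

// Option sets one field on the settings.
type Option func(*settings)

func WithTimeout(d time.Duration) Option {
	return func(s *settings) { s.timeout = d }
}

func WithOtelTracing() Option {
	return func(s *settings) { s.enableOtelTracing = true }
}

// newSettings applies each option in order on top of the address.
func newSettings(address string, opts ...Option) settings {
	s := settings{address: address}
	for _, apply := range opts {
		apply(&s)
	}
	return s
}

// --- Style 2: builder with chained methods ---

// ConnectionBuilder accumulates the same fields through method calls.
type ConnectionBuilder struct {
	s settings
}

func NewConnectionBuilder(address string) *ConnectionBuilder {
	return &ConnectionBuilder{s: settings{address: address}}
}

func (b *ConnectionBuilder) WithTimeout(d time.Duration) *ConnectionBuilder {
	b.s.timeout = d
	return b
}

func (b *ConnectionBuilder) WithOtelTracing() *ConnectionBuilder {
	b.s.enableOtelTracing = true
	return b
}

// Build returns the accumulated settings; a real builder would instead
// expose a Connect method that dials with them.
func (b *ConnectionBuilder) Build() settings {
	return b.s
}
```

Under these assumptions, `newSettings(addr, WithTimeout(d), WithOtelTracing())` and `NewConnectionBuilder(addr).WithTimeout(d).WithOtelTracing().Build()` return equal values. The practical difference is mostly in the API surface: the option style keeps the existing `Connect` signature working, while the builder adds new exported types and methods.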
    	return func(o *options) {
    		o.enableOtelTracing = true
    	}
    }
Btw, why are withTimeout and withMetrics unexported but WithOtelxxx is exported?
Mainly because they were already set when calling Connect in the previous implementation, and I did not want to let the user override them since there was no need until now. If the need were to arise, the functions could simply be exported.
But I can also directly export them and let them override the default I set in Connect if you prefer.
I exported the functions and let them override the defaults set in Connect.
…m/otel-tracing-grpc
To not break the interface for existing clients using the function.
If they are set, they will override the default values that are set when calling `Connect`.
I think I addressed all your comments @gnufied
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: Fricounet, xing-yang
Can we use a version of go.opentelemetry.io metrics that is compatible among component-base, csi-lib-utils and external-provisioner?
Hey @sunnylovestiramisu, from what I see, the versions used are as follows:
How do you want to go about it? Personally, I'd prefer all versions to be upgraded to v0.38.0; however, I can try to raise a PR to downgrade the version to v0.31.0 in this repo if you prefer.
Looking into whether I can downgrade the version to v0.31.0 here, I fear that won't be possible. The dependency on
What type of PR is this?
/kind feature
What this PR does / why we need it:
- Refactor `Connect` to rely more on Options to configure how the gRPC client connects to the server. Configuring how `connect` behaves relies on new functions like `withMetrics` or `withTimeout` that need to be passed as options at the caller level. This will make it easier to add new configuration options without breaking the API in the future.
- Create a new `WithOtelTracing` option to be used by the gRPC client when connecting to the CSI server with `Connect`. This function adds a gRPC interceptor that will provide basic tracing of the gRPC calls.
Once it is merged and released, I will go through the different CSI sidecars to allow the use of this option behind a feature flag `--enable-otel-tracing`.
Below is an example of what it looks like when using this change on a CSI sidecar. I have otel tracing activated on the CSI driver (server) and the sidecar (client). We can see a trace composed of 2 spans with their duration.

Which issue(s) this PR fixes:
Fixes #139
Special notes for your reviewer:
I split the PR in 2 commits (dependency change first and code change next) to make it easier to review. If you prefer to only merge one commit, I can squash commits once the review is done.
Does this PR introduce a user-facing change?: