Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add CacheSchemas option to Serializer #1151

Merged
merged 1 commit into from
Mar 15, 2024

Conversation

fzmoment
Copy link
Contributor

This adds code to the Serializer that caches the SchemaInfo for each FileDescriptor, which improves performance by over an order of magnitude by caching the string manipulation bottleneck (see flamegraph here). In order to do this, there was also some refactoring to isolate the "cachable" code (getting SchemaInfo from a proto).

Benchmark results:

go test -bench . -benchmem
goos: darwin
goarch: arm64
pkg: github.com/confluentinc/confluent-kafka-go/v2/schemaregistry/serde/protobuf
BenchmarkProtobufSerWithReference-8         	   30640	     38625 ns/op	  21778 B/op	     696 allocs/op
BenchmarkProtobufSerWithReferenceCached-8   	 1000000	      1047 ns/op	   1089 B/op	      10 allocs/op
BenchmarkProtobufDeserWithReference-8       	 2469417	       490.1 ns/op	    304 B/op	       8 allocs/op
PASS

This is analogous to the approach take in #1128, but the main difference is that this caching is gated behind a config which defaults to false (i.e. default no behavior change). I believe this optimization can be used by most use cases, but there is a specific scenario where it is not appropriate which is why it is gated behind a config:

  • The caching uses the file descriptor as key, which reflects the state of the protobuf file a message comes from
  • Thus, the exact same protobuf file/message with a different version of a dependency will result in a duplicate key
  • So caching should not be used for applications that are not creating new protobuf messages themselves (e.g. proxying proto messages from some data source), or that explicitly include multiple versions of the same set of bindings

Since this caching can be enabled for most users (generally users are generating one set of protobuf bindings and creating messages based on that and thus there won't be multiple versions of protobufs within one run of the application) I think it's appropriate to have this upstreamed into the confluent-kafka-go library

Test: ran existing tests with config enabled; added benchmarks

@rayokota
Copy link
Member

Thanks @fzmoment , I'll take a look

Copy link
Member

@rayokota rayokota left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fzmoment , LGTM!

@rayokota rayokota merged commit f1230c0 into confluentinc:master Mar 15, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants