Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CASSGO-1 CASSGO-30 Native Protocol 5 Support #1822

Open
wants to merge 1 commit into
base: trunk
Choose a base branch
from

Conversation

worryg0d
Copy link
Contributor

@worryg0d worryg0d commented Sep 16, 2024

Overview

C* 4.0 introduced Native Protocol 5, so we should make gocql to support it.
This PR consists of other my PRs, which also provides support for the v5 version of the protocol.

This PR provides support of Native Protocol 5 features as:

@worryg0d worryg0d force-pushed the native-proto-5-support-all-merged branch 4 times, most recently from d67fcb1 to 93933c6 Compare September 19, 2024 12:49
@worryg0d worryg0d force-pushed the native-proto-5-support-all-merged branch 3 times, most recently from 47a3ae9 to b617638 Compare October 1, 2024 07:03
@joao-r-reis joao-r-reis added the protocol_v5 Tickets related to supporting protocol version 5 label Oct 1, 2024
@worryg0d
Copy link
Contributor Author

worryg0d commented Oct 1, 2024

This PR is merged from #1778, #1773, #1782, and #1781, so we could close them in favor of this one

@joao-r-reis
Copy link
Contributor

What about #1783

Copy link
Member

@lukasz-antoniak lukasz-antoniak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied unsolved comments fromCopied unresolved comments from previous PRs.

@lukasz-antoniak
Copy link
Member

What about #1783

@worryg0d, IMHO changes in #1783 are still needed and improve code maintainability.

@worryg0d
Copy link
Contributor Author

worryg0d commented Oct 9, 2024

Thanks a lot for reaching out!

This PR is lacking changes from #1783. I will ask @testisnullus if we could merge it into this PR.

@testisnullus
Copy link

@worryg0d @lukasz-antoniak I think we can keep the current implementation and merge the #1783 PR into this one.

@worryg0d worryg0d force-pushed the native-proto-5-support-all-merged branch 2 times, most recently from 3f50a90 to fbdaa2b Compare October 10, 2024 12:14
@worryg0d
Copy link
Contributor Author

I merged changes from #1783 into this one.

Copy link
Contributor

@joao-r-reis joao-r-reis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass review. I'll need some extra time to properly review the envelope/message format changes.

@worryg0d
Copy link
Contributor Author

Hey, @joao-r-reis. Thanks a lot for your feedback! I'm pleased this PR is moving. I will take a look at your comments and suggestions tomorrow. I hope everything will be fine here.

@worryg0d
Copy link
Contributor Author

Hello, @joao-r-reis. I added some changes following your comments and suggestions. Could you please take a look at this one again? Also, I have a question regarding to cache of prepared statements. Could you please look at this one too?

Thanks a lot.

@ribaraka
Copy link

@joao-r-reis
Copy link
Contributor

I'm reviewing the protocol/envelope changes for the new protocol v5 frame format and one thing I'm noticing is that the name frame can either reference the "old frame" (called envelope in v5) or the "new frame" and you can't really tell easily which one it is when reading a particular function that references that name (one has to read the "surrounding" code to understand where it is being called and if it refers to the protocol v5 format or not).

In the java driver this was avoided by calling the new v5 frame a "segment" while "frame" essentially always refers to the envelope (so "frame" remains the same as a concept). It's not ideal to use a word that is not even referenced in the protocol spec but I think having this clear distinction between the two would be very helpful to make the code easier to read (also prevents bugs by someone conflating the two while modifying the code).

@worryg0d
Copy link
Contributor Author

I'm reviewing the protocol/envelope changes for the new protocol v5 frame format and one thing I'm noticing is that the name frame can either reference the "old frame" (called envelope in v5) or the "new frame" and you can't really tell easily which one it is when reading a particular function that references that name (one has to read the "surrounding" code to understand where it is being called and if it refers to the protocol v5 format or not).

In the java driver this was avoided by calling the new v5 frame a "segment" while "frame" essentially always refers to the envelope (so "frame" remains the same as a concept). It's not ideal to use a word that is not even referenced in the protocol spec but I think having this clear distinction between the two would be very helpful to make the code easier to read (also prevents bugs by someone conflating the two while modifying the code).

I agree because I had the same problems distinguishing frames and envelopes. I wanted to call it another way, but according to proto spec, it should be called as frames so I decided to leave this part until it got reviewed. Then we could call it Segment instead of frame. Also, it makes it easy for other contributors (e.a. from java-driver) to distinguish them.

Copy link
Contributor

@joao-r-reis joao-r-reis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, added a few comments. Still working on the v5 frame format review (I'm reading up on the format spec and also reading some java driver code for comparison)

// 2.3.1 Initial Handshake
// In order to support both v5 and earlier formats, the v5 framing format is not
// applied to message exchanges before an initial handshake is completed.
startupCompleted := &atomic.Bool{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause the minimum Go version to be 1.19 (instead of the current 1.17 according to my experiments). I raised this on the ASF slack channel to see if people think it's ok otherwise we might have to use atomic.Value instead...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I missed it. It's kinda a shame that we can't use more distinguishing atomic types instead of generic Value... However, the code base already has some similar things, so changing it is not a very big problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'd rather use atomic.Bool and 1.19 is 3 versions behind what we currently officially support anyway so personally I don't think it is a big deal to bump the version on go.mod to 1.19 but let's see if others chime in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I'll leave it by the time when we receive some feedback on it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to keep this one open until we have a second reviewer to look at this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think forcing 1.19 is fine.

Copy link
Contributor

@joao-r-reis joao-r-reis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of my comments are NITs related to code readability. Feel free to close any of these specific ones that you don't agree with. The implementation itself looks very solid 👍

@worryg0d
Copy link
Contributor Author

@joao-r-reis I adjusted the CI in this PR to run integration tests over proto 5 and the lz4 compressor. Could you please approve running the workflow to see if nothing breaks?

@worryg0d
Copy link
Contributor Author

Hello. This PR has already been floating for quite a while. Is there anything I can do to push it forward?

@joao-r-reis
Copy link
Contributor

Hello. This PR has already been floating for quite a while. Is there anything I can do to push it forward?

I'd like to get a +1 from another committer before merging this one because of its size. I think @jameshartig will take a look soon

go.mod Outdated
gopkg.in/yaml.v3 v3.0.1 // indirect
)

replace github.com/gocql/gocql/lz4 => ./lz4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed. Go should know what the module is based on the module line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I was trying to run tests over lz4 pkg with changes I faced compilation errors. It looks like go uses vendored cassandra-gocql-driver/lz4 instead of local, so I decided to add this. There is PR #1842 that moves lz4 to gocql module. If it is merged first, It will enable me to remove this replace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we should merge #1842 first, can you take a quick look @jameshartig ? It is very small

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be good to merge

@jameshartig
Copy link
Contributor

@worryg0d apologies for the delay. I only had a few comments. I'm mostly concerned about not testing the old protocol and compressor for anyone not using proto 5.

@worryg0d
Copy link
Contributor Author

@jameshartig Thanks a lot for the feedback! I'll try to update the CI matrix to run tests on proto 4 + snappy and proto 5 + lz4

@worryg0d worryg0d force-pushed the native-proto-5-support-all-merged branch 2 times, most recently from 80e9056 to f6b73d9 Compare February 20, 2025 13:44
@worryg0d
Copy link
Contributor Author

Hello @jameshartig @joao-r-reis. I added to the CI matrix proto version and snappy compressor to run tests over both versions and compressors.

I also squashed all commits and changed a commit message.

One thing I left is replacing in go.mod until #1842 is merged.

Could you please review the CI changes and approve the workflow run? Thanks!

Comment on lines +3259 to +3262
// Forcing to run this test without any compression.
// If compressor is presented, then CI will fail when snappy compression is enabled, since
// protocol v5 doesn't support it.
cluster.Compressor = nil
Copy link

@ribaraka ribaraka Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've exluded the snappy from proto v5 in the CI, then everyting should work fine. This block of code is likely unnecessary, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test verifies the protocol discovery mechanism by explicitly setting cluster.ProtoVersion = 0, so it means that this test doesn't depend on the CI matrix.

The idea behind protocol discovery is to send a query to C* with the newest protocol version that the driver supports, which in this case is proto 5. For instance, the latest proto version supported by Cassandra 3 is proto 4, but the driver supports 5. The driver will send a query with proto 5 version, and C* 3 will return an error with the lowest and greatest proto versions. Then the driver picks the greatest one, which is v4 for C* 3.

The issue comes when CI runs this test with matrix.proto_version = 4 and matrix.compressor = snappy. Proto version is ignored, but not the compressor. So, the driver will send queries with max proto version of 5 and snappy compressor, which causes Cassandra to return an error about unsupported compressor type for protocol v5.

Comment on lines 3523 to 3535
func TestPrepareExecuteMetadataChangedFlag(t *testing.T) {
// This test ensures that the whole Metadata_changed flow
// is handled properly.
//
// To trigger C* to return Metadata_changed we should do:
// 1. Create a table
// 2. Prepare stmt which uses the created table
// 3. Change the table schema in order to affect prepared stmt (e.g. add a column)
// 4. Execute prepared stmt. As a result C* should return RESULT/ROWS response with
// Metadata_changed flag, new metadata id and updated metadata resultset.
//
// The driver should handle this by updating its prepared statement inside the cache
// when it receives RESULT/ROWS with Metadata_changed flag

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move descriptions above the function declaration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ribaraka
Copy link

The updated CI looks goot to me.

@worryg0d worryg0d force-pushed the native-proto-5-support-all-merged branch from f6b73d9 to 549f36c Compare February 21, 2025 12:42
Copy link
Contributor

@jameshartig jameshartig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and kicked off CI

@worryg0d
Copy link
Contributor Author

Thanks a lot!

I also noticed that CI doesn't run tests without any compression. Should I add it to the matrix? With proto v5 it makes sense to add them because we have different segment format for compressed and uncompressed segments

Native Protocol 5 was introduced with the release of C* 4.0. This PR provides full support
for a newer version including new format frames (segments), and new fields for QUERY, BATCH, and EXECUTE messages.
Also, this PR brings changes to the Compressor interface to follow an append-like design.
One more thing, it bumps Go version to the newer 1.19.

Patch by Bohdan Siryk; Reviewed by João Reis, James Hartig for CASSGO-1 CASSGO-30
@worryg0d worryg0d force-pushed the native-proto-5-support-all-merged branch from 549f36c to a7aa3c5 Compare February 27, 2025 15:22
@worryg0d
Copy link
Contributor Author

Fixed conflicts caused by #1842 and deleted replace in go.mod

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
protocol_v5 Tickets related to supporting protocol version 5 semver-major
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants