-
Notifications
You must be signed in to change notification settings - Fork 91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[compat] [server] [client] [test] Global RT DIV: Chunking Support #1385
base: main
Are you sure you want to change the base?
Conversation
7ee8cd9
to
d3ae029
Compare
internal/venice-common/src/main/java/com/linkedin/venice/writer/VeniceWriter.java
Outdated
Show resolved
Hide resolved
internal/venice-common/src/test/java/com/linkedin/venice/writer/VeniceWriterUnitTest.java
Outdated
Show resolved
Hide resolved
@@ -1479,7 +1482,7 @@ protected void updateOffsetMetadataInOffsetRecord(PartitionConsumptionState part | |||
upstreamTopic = versionTopic; | |||
} | |||
if (upstreamTopic.isRealTime()) { | |||
offsetRecord.resetUpstreamOffsetMap(partitionConsumptionState.getLatestProcessedUpstreamRTOffsetMap()); | |||
offsetRecord.mergeUpstreamOffsets(partitionConsumptionState.getLatestProcessedUpstreamRTOffsetMap()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you do this change? Personally I think this makes less explicit about the offset map we are tracking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I feel that reset()
is not the right word for it. It's not clearing the state or replacing the map entirely. It's updating the existing map with the values of the new map and retaining the values missing from the new map.
I can rename it to updateUpstreamOffsets()
.
@@ -21,7 +21,8 @@ | |||
*/ | |||
public enum MessageType implements VeniceEnumValue { | |||
PUT(0, Constants.PUT_KEY_HEADER_BYTE), DELETE(1, Constants.PUT_KEY_HEADER_BYTE), | |||
CONTROL_MESSAGE(2, Constants.CONTROL_MESSAGE_KEY_HEADER_BYTE), UPDATE(3, Constants.UPDATE_KEY_HEADER_BYTE); | |||
CONTROL_MESSAGE(2, Constants.CONTROL_MESSAGE_KEY_HEADER_BYTE), UPDATE(3, Constants.UPDATE_KEY_HEADER_BYTE), | |||
GLOBAL_RT_DIV(4, Constants.GLOBAL_RT_DIV_KEY_HEADER_BYTE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if it is ok to not add it into KME. I am thinking for example, Da Vinci and CDC, if the code is not upgraded, how can it be handling the global rt div? Or should we completely mute them from processing RT div message?
Will this controlled by store config or just server config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, Davinci and CDC don't need to process the RT div messages. Only the next leader has to read and process it, right. They can simply ignore them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's controlled by server config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, with this code change, we will eventually enable the feature to produce RT DIV snapshot to VT topic. This will be processed by follower, which also applicable to DVC. I know DVC won't be leader, but will the existing code handle unknown new msg type gracefully?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. The reader of the msg type (to be able to ignore it) has to be deployed before enabling this feature. i don't know if there is other way to walk around it, feel free to suggest alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't look very deep, but I think that the existing code will throw an exception when encountering an unknown message type.
Handling an unknown message type will be implemented in a future PR. Since we control the feature flag, we can choose not to enable the feature flag until all server instances support the new message type so there won't be any messages with this message type floating around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to deploy the changes to all VT consumers before enabling this feature, which includes:
- Server.
- DaVinci.
- ETL.
- CDC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think it is better to treat the new KME type as a Control Message
, and I would like to learn why if we can't do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to call out another item we didn't discuss in the review meeting:
- Cleanup the Global RT DIV messages from VT.
This new type of message can be large and with chunking support, we are leaking the data chunks as chunk id is unique.
I was thinking some strategy to let follower to send out a Kafka delete or Kafka message with empty value for the previous key after consuming a new Global RT DIV message
, and if we don't do the cleanup, the size of Kafka topic might grow a lot depending on the sending frequency.
Maybe we don't need to implement such cleanup in the MVP, but I think eventually, we need some way to clean them up from the version topics.
...ient/src/main/java/com/linkedin/davinci/kafka/consumer/LeaderFollowerStoreIngestionTask.java
Show resolved
Hide resolved
...ts/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/StoreIngestionTask.java
Outdated
Show resolved
Hide resolved
...ts/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/StoreIngestionTask.java
Outdated
Show resolved
Hide resolved
...ts/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/StoreIngestionTask.java
Outdated
Show resolved
Hide resolved
...ts/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/StoreIngestionTask.java
Outdated
Show resolved
Hide resolved
...ts/da-vinci-client/src/main/java/com/linkedin/davinci/kafka/consumer/StoreIngestionTask.java
Show resolved
Hide resolved
...inci-client/src/main/java/com/linkedin/davinci/kafka/consumer/StoreIngestionTaskFactory.java
Outdated
Show resolved
Hide resolved
@@ -21,7 +21,8 @@ | |||
*/ | |||
public enum MessageType implements VeniceEnumValue { | |||
PUT(0, Constants.PUT_KEY_HEADER_BYTE), DELETE(1, Constants.PUT_KEY_HEADER_BYTE), | |||
CONTROL_MESSAGE(2, Constants.CONTROL_MESSAGE_KEY_HEADER_BYTE), UPDATE(3, Constants.UPDATE_KEY_HEADER_BYTE); | |||
CONTROL_MESSAGE(2, Constants.CONTROL_MESSAGE_KEY_HEADER_BYTE), UPDATE(3, Constants.UPDATE_KEY_HEADER_BYTE), | |||
GLOBAL_RT_DIV(4, Constants.GLOBAL_RT_DIV_KEY_HEADER_BYTE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to deploy the changes to all VT consumers before enabling this feature, which includes:
- Server.
- DaVinci.
- ETL.
- CDC.
@@ -21,7 +21,8 @@ | |||
*/ | |||
public enum MessageType implements VeniceEnumValue { | |||
PUT(0, Constants.PUT_KEY_HEADER_BYTE), DELETE(1, Constants.PUT_KEY_HEADER_BYTE), | |||
CONTROL_MESSAGE(2, Constants.CONTROL_MESSAGE_KEY_HEADER_BYTE), UPDATE(3, Constants.UPDATE_KEY_HEADER_BYTE); | |||
CONTROL_MESSAGE(2, Constants.CONTROL_MESSAGE_KEY_HEADER_BYTE), UPDATE(3, Constants.UPDATE_KEY_HEADER_BYTE), | |||
GLOBAL_RT_DIV(4, Constants.GLOBAL_RT_DIV_KEY_HEADER_BYTE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I think it is better to treat the new KME type as a Control Message
, and I would like to learn why if we can't do that.
"type": "map", | ||
"doc": "A map that maps producer GUID -> producer state for realtime data.", | ||
"values": { | ||
"name": "ProducerPartitionState", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to reference the existing definition in PartitionState.avsc
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sushantmane
I recalled you did some experiment regarding referring existing schema definition, and how does that go?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At compile time, everything worked fine, but at runtime, I ran into schema not found exceptions. I didn’t look into it further and chose to stick with the existing pattern.
b27282d
to
ad77fc1
Compare
* Renamed `resetUpstreamOffsetMap()` to `mergeUpstreamOffsets()`. 😮💨 * Minor refactor around `validateMessage()` 🦯🦯 * Copied `KafkaMessageEnvelope.avsc` for a new protocol version. 🥣 * Created `GlobalRtDiv` new `MessageType` that is based on the `Put` messages, and renamed `GlobalDivState` avro object to `GlobalRtDivState`. 🌯 * Unified the code paths of `bufferAndAssembleRecord()` with deserialization . 🥓 * Revised `GlobalRtDiv` chunking support in `VeniceWriter` and adjacent. 🍭 * Reusing `Put` instead of creating new message type `GlobalRtDiv`. 🫐 1. Fixed `shouldProcessRecord()` condition in LFSIT. 🍊 2. Split `ChunkAssembler` for RT DIV into its own object. 🍐 3. `GlobalRtDiv` serializer is per-message to be safe, because it doesn't seem to be thread-safe. 🍋🟩 4. Fixed spotbugs. 🌶️ 5. Fixed `divChunkAssembler` for the SIT unit test. 🫨 * The `MessageType` for the KME needs to be `PUT`. Only the `KafkaKey` will have `GLOBAL_RT_DIV` as the `MessageType`. 🏯 * Minor cleanup in the `VeniceWriterUnitTest` and `Put` rather than `Object` in the `sendMessageFunction`. 🪀
ad77fc1
to
1a22b2f
Compare
Summary
Continuation from #1257.
This PR mainly focuses on adding chunking support for DIV messages when they are produced to Kafka topics, as the size of the DIV message can surpass the ~1MB Kafka message limit. The existing chunking mechanism is reused, including the
CHUNK
andCHUNKED_VALUE_MANIFEST
values in the message'sschemaId
:Every DIV message has
GLOBAL_RT_DIV
for the header byte in itsKafkaKey
. The correspondingKafkaMessageEnvelope
has aPut
payload utilizing theputValue
field containing theGlobalRtDiv
data, and which has the followingschemaId
:schemaId
is set to the current protocol version ofGLOBAL_RT_DIV
.schemaId
is set toCHUNK
.schemaId
is set toCHUNKED_VALUE_MANIFEST
. TheschemaId
of theChunkedValueManifest
will be the current protocol version ofGLOBAL_RT_DIV
.ChunkAssembler
is adapted, on the receiver side, to buffer, assemble, and deserialize DIV messages (chunked/non-chunked).Changes
MessageType
calledGlobalRtDiv
, which reuses thePut
message type format and objects. When the Venice server encounters a message withKafkaKey
containing theGlobalRtDiv
header byte, it will know to process this message differently from a regularPut
.GlobalRtDiv
message type is the header byte inKafkaKey
. Otherwise, it's identical to a regularPut
.KafkaMessageEnvelope.avsc
will not be updated to avoid the unnecessary risk of incompatible avro formats when upgrading the cluster.GlobalRtDiv
message type in KME is that theGlobalRtDiv
objects will be processed as user records and stored in the storage engine, which seems to be much less scary than a cluster upgrade issue.bufferAndAssembleRecord()
inChunkAssembler
to use a deserializationFunction
instead of aRecordDeserializer
. This is becauseRecordDeserializer
is on a separate inheritance path to theInternalAvroSpecificSerializer
.GlobalRtDiv
messages should not be processed if they originate from remote VT and RT, because those are invalid scenarios. These two conditions are checked.Minor Changes
resetUpstreamOffsetMap()
tomergeUpstreamOffsets()
inOffsetRecord
.toString()
inKafkaKey
, which incorrectly assumed all messages would beControlMessage
,Put
, orDelete
. This missesUpdate
messages and the newGlobalRtDiv
message that is being added.buildPutPayload()
andbuildManifestPayload()
) inVeniceWriter
for creating thePut
payloads and when chunking is involved.Testing
testGlobalRtDivChunking()
inVeniceWriterUnitTest
testShouldProcessRecordForGlobalRtDivMessage()
inStoreIngestionTaskTest
testProcessGlobalRtDivMessage()
inStoreIngestionTaskTest
testChunkedDiv()
inTestGlobalRtDiv
Does this PR introduce any user-facing changes?