[HUDI-1648] Added custom kafka meta fields and custom kafka avro decoder. #2598
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2598 +/- ##
============================================
- Coverage 51.55% 9.18% -42.38%
+ Complexity 3282 48 -3234
============================================
Files 445 55 -390
Lines 20317 2037 -18280
Branches 2102 251 -1851
============================================
- Hits 10475 187 -10288
+ Misses 8976 1837 -7139
+ Partials 866 13 -853
Flags with carried forward coverage won't be shown.
@vburenin thanks for this. Can we create a JIRA for this?
The goals here all look good to me. Please give me and @nsivabalan some time to review.
It is in progress; I may need to create multiple tickets. What I am really interested in right now is a code review, so that I do not spend too much time going forward writing unit tests and then end up rolling it all back.
result = bytes;
} else {
  int start = buffer.position() + buffer.arrayOffset();
  DatumReader reader = new GenericDatumReader(schema, sourceSchema);
This looks good.
May I know what's the plan for keeping this file in sync with AbstractKafkaAvroDeserializer across version upgrades?
Does this work?

@Override
protected Object deserialize(
    boolean includeSchemaAndVersion,
    String topic,
    Boolean isKey,
    byte[] payload,
    Schema readerSchema)
    throws SerializationException {
  // pass sourceSchema as the last arg instead of readerSchema
  return super.deserialize(includeSchemaAndVersion, topic, isKey, payload, sourceSchema);
}
It actually should work. There is some history behind this change that predates me, and as far as I can currently see it ended up touching only sourceSchema. So yes, this will drastically simplify things. Thanks for catching it!
One more clarification @vburenin: is this approach (passing sourceSchema instead of readerSchema) going to be useful for everyone (Kafka source users) in assisting with schema evolution, even if they are not interested in the meta fields? If yes, I wonder whether we should make this custom deserializer the default Kafka deserializer for Hudi going forward.
Can you throw some light here?
It is definitely going to be useful for everybody. The Kafka fields were added after this change was introduced. We had to keep up with the schema changes in the registry, and this was the way to do that. The primary assumption, most likely true for every user, is that the schema is evolving... so this is the way to keep up with schema evolution.
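The schema-resolution behavior being discussed can be sketched with plain Avro (a minimal illustration, not this PR's code; the `Event` schema and its fields are made-up examples). Passing both the writer schema and the evolved reader schema to `GenericDatumReader` lets Avro fill in defaulted fields that the producer did not write:

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaEvolutionSketch {
  // Writer schema: what the producer serialized with.
  static final Schema WRITER = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"string\"}]}");

  // Reader (source) schema: evolved with a new field that has a default.
  static final Schema READER = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
          + "{\"name\":\"id\",\"type\":\"string\"},"
          + "{\"name\":\"source\",\"type\":\"string\",\"default\":\"unknown\"}]}");

  public static GenericRecord roundTrip() throws Exception {
    // Serialize a record with the old (writer) schema.
    GenericRecord old = new GenericData.Record(WRITER);
    old.put("id", "evt-1");
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(WRITER).write(old, enc);
    enc.flush();

    // Passing (writerSchema, readerSchema) resolves the old bytes
    // against the evolved schema, filling in defaulted fields.
    GenericDatumReader<GenericRecord> reader =
        new GenericDatumReader<>(WRITER, READER);
    return reader.read(null,
        DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
  }

  public static void main(String[] args) throws Exception {
    GenericRecord rec = roundTrip();
    System.out.println(rec.get("id") + " " + rec.get("source"));
  }
}
```

The decoded record carries the new `source` field with its default value, which is what makes reading old records against a newer registry schema safe.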
👍
...tilities/src/main/java/org/apache/hudi/utilities/sources/helpers/AvroKafkaSourceHelpers.java
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaRegistryProvider.java
public static final String INJECT_KAFKA_FIELDS = "hoodie.deltastreamer.source.inject_kafka_fields";
public static final String KAFKA_PARTITION = "_hudi_kafka_partition";
If we plan to provide this as a generic solution for everyone, let's have consensus on the meta field naming. @vinothchandar.
+1, we can also make it configurable.
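A configurable naming scheme could look like the following sketch. This is hypothetical, not this PR's code: the property keys and the `_hudi_kafka_offset` default are illustrative assumptions; only `_hudi_kafka_partition` appears in the PR.

```java
import java.util.Properties;

public class KafkaMetaFieldConfig {
  // Hypothetical config keys; the actual keys may differ.
  static final String PARTITION_FIELD_PROP =
      "hoodie.deltastreamer.source.kafka.partition_field";
  static final String OFFSET_FIELD_PROP =
      "hoodie.deltastreamer.source.kafka.offset_field";

  private final String partitionField;
  private final String offsetField;

  public KafkaMetaFieldConfig(Properties props) {
    // Fall back to the default meta field names when not overridden.
    this.partitionField =
        props.getProperty(PARTITION_FIELD_PROP, "_hudi_kafka_partition");
    this.offsetField =
        props.getProperty(OFFSET_FIELD_PROP, "_hudi_kafka_offset");
  }

  public String getPartitionField() { return partitionField; }
  public String getOffsetField() { return offsetField; }
}
```

With this shape, users who never set the properties keep the agreed-upon defaults, while users whose schemas already contain conflicting field names can rename the injected meta fields.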
hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaRegistryProvider.java
… with proper parameters. Renamed KAFKA_FIELD to KAFKA_META_FIELD
…ters is null, forced to add another constructor.
This is being split into multiple PRs. Closing.
Hello. I noticed that "[HUDI-1650] Custom avro kafka deserializer." has been merged. Is there another PR related to the custom kafka meta fields? I am looking forward to this feature.
There is no related PR yet. I've been planning to contribute this feature upstream in Q2; so far that seems possible.
What is the purpose of the pull request
This PR adds the ability to:
Brief change log
Verify this pull request
This change added tests and can be verified as follows:
Committer checklist
Has a corresponding JIRA in PR title & commit
Commit message is descriptive of the change
CI is green
Necessary doc changes done or have another open PR
For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.