Conversation

@pratyakshsharma
Contributor

What is the purpose of the pull request

  • When we read data from Kafka, we want to always read with the latest schema.
  • This lets us assume throughout the rest of the pipeline that every record has the same schema.
  • We create a custom KafkaAvroDecoder that uses the latest schema as the reader schema.
  • This does not work with all SchemaProviders yet.
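The "latest schema as reader schema" idea above relies on standard Avro schema resolution: a record encoded with the writer schema can be decoded against a newer reader schema, as long as the added fields carry defaults. A minimal, self-contained sketch using plain Avro (the record and field names here are illustrative, not taken from the PR):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import java.io.ByteArrayOutputStream;

public class LatestSchemaReadDemo {
    public static void main(String[] args) throws Exception {
        // Schema the producer wrote with.
        Schema writer = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":"
            + "[{\"name\":\"id\",\"type\":\"int\"}]}");
        // "Latest" schema adds a field with a default, so older records still resolve.
        Schema latest = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"int\"},"
            + "{\"name\":\"note\",\"type\":\"string\",\"default\":\"n/a\"}]}");

        // Encode a record with the writer schema.
        GenericRecord rec = new GenericData.Record(writer);
        rec.put("id", 7);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writer).write(rec, enc);
        enc.flush();

        // Key idea of the PR: decode with (writerSchema, readerSchema = latest),
        // so every record downstream conforms to one schema.
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(writer, latest);
        GenericRecord decoded = reader.read(null,
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        System.out.println(decoded.get("id") + " " + decoded.get("note"));
    }
}
```

A Confluent-style deserializer would do the same resolution internally, obtaining the writer schema from the schema registry by ID and the reader schema from the configured SchemaProvider.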

Brief change log

  • Implemented HoodieAvroKafkaDeserializer for supplying the readerSchema as per the user's needs.
  • Introduced a property to configure value.deserializer for AvroKafkaSource.
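Wired into a properties file, the change might look like the fragment below. The property keys and package name here are placeholders for illustration; the exact names depend on the final implementation:

```properties
# Hypothetical keys for illustration; check the merged code for the exact names.
# Tell AvroKafkaSource which Kafka value deserializer to use.
value.deserializer=org.apache.hudi.utilities.deser.HoodieAvroKafkaDeserializer
# The deserializer still needs the registry to fetch the latest (reader) schema.
schema.registry.url=http://localhost:8081
```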

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@pratyakshsharma
Contributor Author

Thinking of writing test cases for this, but unable to simulate it because AbstractKafkaAvroDeserializer expects a working schema-registry URL. Not sure how to mock it here since it is a library class.

@vinothchandar
Member

@afilipchik @umehrot2 help review this? :)

@vinothchandar vinothchandar self-assigned this Apr 27, 2020
@vinothchandar
Member

@afilipchik interested in reviewing this?

@pratyakshsharma
Contributor Author

@vinothchandar @afilipchik Let us close this? :)

Member

@vinothchandar vinothchandar left a comment


Can we add a test around this? I was somewhat surprised that reading the Avro records with the latest schema does not already happen with the existing deserializer.

Does everyone out there write this code themselves? Or is there a deserializer we can use already?

@vinothchandar
Member

Not sure of how to mock the same here since it is library class.

We can just mock the response it would send, via a test SchemaProvider. We need not mock the SchemaRegistry itself.
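The suggestion above can be sketched as follows: hide the schema lookup behind the SchemaProvider abstraction and substitute a fixed-schema test double, so no live registry is needed. This is a self-contained illustration; the interface below merely stands in for Hudi's actual SchemaProvider class, and all names are hypothetical:

```java
// Stand-in for Hudi's SchemaProvider abstraction (simplified to a String
// schema here so the sketch has no external dependencies).
interface SchemaProvider {
    String getSourceSchema();
}

public class MockSchemaProviderDemo {
    // Test double: returns a canned schema and never touches the network.
    static class MockSchemaProvider implements SchemaProvider {
        private final String schemaJson;
        MockSchemaProvider(String schemaJson) { this.schemaJson = schemaJson; }
        @Override public String getSourceSchema() { return schemaJson; }
    }

    public static void main(String[] args) {
        SchemaProvider provider = new MockSchemaProvider(
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":"
            + "[{\"name\":\"id\",\"type\":\"int\"}]}");
        // The deserializer under test would be constructed with this provider
        // instead of a schema-registry URL.
        System.out.println(provider.getSourceSchema().contains("\"name\":\"R\""));
    }
}
```

The deserializer then only depends on the interface, so the unit test exercises the reader-schema logic without a running schema registry.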

@vinothchandar
Member

@n3nash can you review this and take it home?

@vinothchandar vinothchandar removed their assignment May 14, 2020
@pratyakshsharma
Contributor Author

@n3nash please take a look. It is good to merge now. All the comments are addressed.

@pratyakshsharma
Contributor Author

@n3nash got a chance to look at this? :)

@pratyakshsharma
Contributor Author

@n3nash @vinothchandar I guess we can merge this? :)

@pratyakshsharma
Contributor Author

@n3nash Please take a pass.

@vinothchandar vinothchandar added the area:schema Schema evolution and data types label Oct 9, 2020
@vinothchandar vinothchandar added the priority:critical Production degraded; pipelines stalled label Feb 11, 2021
@vinothchandar
Member

Closing this in favor of #2619

kroushan-nit pushed a commit to kroushan-nit/hudi-oss-fork that referenced this pull request Nov 13, 2025