Conversation

@pratyakshsharma
Contributor

What is the purpose of the pull request

  • When we read data from Kafka, we want to always read with the latest schema.
  • This lets us assume throughout the rest of the pipeline that every record has the same schema.
  • We create a custom KafkaAvroDecoder that uses the latest schema as the reader schema.
  • This does not work with all SchemaProviders yet.
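The "latest schema as reader schema" idea above relies on standard Avro schema resolution: a record encoded with the writer schema can be decoded against a newer reader schema, as long as the added fields carry defaults. A minimal, self-contained sketch using plain Avro (the record and field names here are illustrative, not taken from the PR):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import java.io.ByteArrayOutputStream;

public class LatestSchemaReadDemo {
    public static void main(String[] args) throws Exception {
        // Schema the producer wrote with.
        Schema writer = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":"
            + "[{\"name\":\"id\",\"type\":\"int\"}]}");
        // "Latest" schema adds a field with a default, so older records still resolve.
        Schema latest = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"int\"},"
            + "{\"name\":\"note\",\"type\":\"string\",\"default\":\"n/a\"}]}");

        // Encode a record with the writer schema.
        GenericRecord rec = new GenericData.Record(writer);
        rec.put("id", 7);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writer).write(rec, enc);
        enc.flush();

        // Key idea of the PR: decode with (writerSchema, readerSchema = latest),
        // so every record downstream conforms to one schema.
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(writer, latest);
        GenericRecord decoded = reader.read(null,
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
        System.out.println(decoded.get("id") + " " + decoded.get("note"));
    }
}
```

A Confluent-style deserializer would do the same resolution internally, obtaining the writer schema from the schema registry by ID and the reader schema from the configured SchemaProvider.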

Brief change log

  • Implemented HoodieAvroKafkaDeserializer for supplying the readerSchema as per the user's needs.
  • Introduced a property to configure value.deserializer for AvroKafkaSource.
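Wired into a properties file, the change might look like the fragment below. The property keys and package name here are placeholders for illustration; the exact names depend on the final implementation:

```properties
# Hypothetical keys for illustration; check the merged code for the exact names.
# Tell AvroKafkaSource which Kafka value deserializer to use.
value.deserializer=org.apache.hudi.utilities.deser.HoodieAvroKafkaDeserializer
# The deserializer still needs the registry to fetch the latest (reader) schema.
schema.registry.url=http://localhost:8081
```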

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@pratyakshsharma
Contributor Author

Thinking of writing test cases for this, but unable to simulate it because AbstractKafkaAvroDeserializer expects a working schema-registry URL. Not sure how to mock it here since it is a library class.

@vinothchandar
Member

@afilipchik @umehrot2 help review this? :)

@vinothchandar vinothchandar self-assigned this Apr 27, 2020
@vinothchandar
Member

@afilipchik interested in reviewing this?

@pratyakshsharma
Contributor Author

@vinothchandar @afilipchik Let us close this? :)

Member

@vinothchandar vinothchandar left a comment


Can we add a test around this? I was somewhat surprised that reading the Avro records with the latest schema does not already happen with the existing deserializer.

Does everyone out there write this code themselves? Or is there a deserializer we can use already?

@vinothchandar
Member

Not sure of how to mock the same here since it is library class.

We can just mock the response it would send, via a test SchemaProvider. We need not mock the SchemaRegistry itself.
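The suggestion above can be sketched as follows: hide the schema lookup behind the SchemaProvider abstraction and substitute a fixed-schema test double, so no live registry is needed. This is a self-contained illustration; the interface below merely stands in for Hudi's actual SchemaProvider class, and all names are hypothetical:

```java
// Stand-in for Hudi's SchemaProvider abstraction (simplified to a String
// schema here so the sketch has no external dependencies).
interface SchemaProvider {
    String getSourceSchema();
}

public class MockSchemaProviderDemo {
    // Test double: returns a canned schema and never touches the network.
    static class MockSchemaProvider implements SchemaProvider {
        private final String schemaJson;
        MockSchemaProvider(String schemaJson) { this.schemaJson = schemaJson; }
        @Override public String getSourceSchema() { return schemaJson; }
    }

    public static void main(String[] args) {
        SchemaProvider provider = new MockSchemaProvider(
            "{\"type\":\"record\",\"name\":\"R\",\"fields\":"
            + "[{\"name\":\"id\",\"type\":\"int\"}]}");
        // The deserializer under test would be constructed with this provider
        // instead of a schema-registry URL.
        System.out.println(provider.getSourceSchema().contains("\"name\":\"R\""));
    }
}
```

The deserializer then only depends on the interface, so the unit test exercises the reader-schema logic without a running schema registry.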

@vinothchandar
Member

@n3nash can you review this and take it home?

@vinothchandar vinothchandar removed their assignment May 14, 2020
@pratyakshsharma
Contributor Author

@n3nash please take a look. It is good to merge now. All the comments are addressed.

@pratyakshsharma
Contributor Author

@n3nash got a chance to look at this? :)

@pratyakshsharma
Contributor Author

@n3nash @vinothchandar I guess we can merge this? :)

@pratyakshsharma
Contributor Author

@n3nash Please take a pass.

@vinothchandar vinothchandar added the area:schema Schema evolution and data types label Oct 9, 2020
@vinothchandar vinothchandar added the priority:critical Production degraded; pipelines stalled label Feb 11, 2021
@vinothchandar
Member

Closing this in favor of #2619

kroushan-nit pushed a commit to kroushan-nit/hudi-oss-fork that referenced this pull request Nov 13, 2025