Skip to content

Conversation

@veenaypatil
Copy link
Contributor

@veenaypatil veenaypatil commented Jun 16, 2021

What is the purpose of the pull request

This change commits the offset to Kafka after a successful Hudi commit, this will help in monitoring the consumer lag

Brief change log

  • Added commitOffsetToKafka function in KafkaOffsetGen
  • Added config KafkaOffsetGen.Config.ENABLE_KAFKA_COMMIT_OFFSET to be set in properties file

Verify this pull request

This change added tests and can be verified as follows:

  • Added testCommitOffsetToKafka test in TestKafkaSource

Testing on Actual Environment

  • Tested this change by building from this branch and deployed the job on YARN in continuous mode
  • Added the following config in kafka-source.properties file for delta streamer
hoodie.deltastreamer.source.enable.kafka.commit.offset=true
  • Pushed 9M records to Kafka topic, in this case offset will be committed in two different batches.

Screenshot 2021-06-20 at 9 48 25 PM

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@veenaypatil
Copy link
Contributor Author

@n3nash can you pls review and provide initial thoughts on the approach. I am yet to add test cases, facing few issues in testing this out locally after running TestKafkaSource

12571 [pool-18-thread-3] ERROR hive.log  - Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anonymous, access=WRITE, inode="/var/folders/pd/pnhdkpt97c58vvm2y7_x73dh0000gp/T/1623828588133-3088184389848693840/testdb1.db":vinaypatil:supergroup:drwxr-xr-x

@hudi-bot
Copy link
Collaborator

hudi-bot commented Jun 16, 2021

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run travis re-run the last Travis build
  • @hudi-bot run azure re-run the last Azure build

@n3nash
Copy link
Contributor

n3nash commented Jun 16, 2021

@veenaypatil High level lgtm

@codecov-commenter
Copy link

codecov-commenter commented Jun 20, 2021

Codecov Report

Merging #3092 (f3cc39f) into master (f73bedd) will decrease coverage by 0.20%.
The diff coverage is 82.75%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3092      +/-   ##
============================================
- Coverage     46.36%   46.15%   -0.21%     
+ Complexity     5390     5368      -22     
============================================
  Files           920      921       +1     
  Lines         39939    39953      +14     
  Branches       4309     4288      -21     
============================================
- Hits          18517    18440      -77     
- Misses        19543    19631      +88     
- Partials       1879     1882       +3     
Flag Coverage Δ
hudicli 39.95% <ø> (ø)
hudiclient 30.45% <ø> (ø)
hudicommon 47.56% <ø> (-0.02%) ⬇️
hudiflink 59.86% <ø> (-1.64%) ⬇️
hudihadoopmr 51.29% <ø> (ø)
hudisparkdatasource 67.06% <ø> (ø)
hudisync 54.05% <ø> (ø)
huditimelineservice 64.36% <ø> (ø)
hudiutilities 58.40% <82.75%> (-0.69%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...apache/hudi/utilities/sources/AvroKafkaSource.java 0.00% <0.00%> (ø)
...java/org/apache/hudi/utilities/sources/Source.java 87.50% <ø> (-2.50%) ⬇️
...hudi/utilities/sources/helpers/KafkaOffsetGen.java 87.09% <90.00%> (-2.96%) ⬇️
.../hudi/utilities/callback/SourceCommitCallback.java 100.00% <100.00%> (ø)
...apache/hudi/utilities/deltastreamer/DeltaSync.java 70.94% <100.00%> (-0.34%) ⬇️
...i/utilities/deltastreamer/SourceFormatAdapter.java 86.84% <100.00%> (ø)
...apache/hudi/utilities/sources/JsonKafkaSource.java 100.00% <100.00%> (ø)
...org/apache/hudi/utilities/HoodieClusteringJob.java 61.42% <0.00%> (-1.59%) ⬇️
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f73bedd...f3cc39f. Read the comment docs.

@veenaypatil veenaypatil changed the title [HUDI-1910] [WIP] Commit Offset to Kafka after successful Hudi commit [HUDI-1910] Commit Offset to Kafka after successful Hudi commit Jun 20, 2021
@veenaypatil
Copy link
Contributor Author

@n3nash @vinothchandar all the changes are done and tested, can you please review

@vinothchandar vinothchandar self-assigned this Jun 21, 2021
@vinothchandar vinothchandar added the priority:blocker Production down; release blocker label Jun 21, 2021
Copy link
Member

@vinothchandar vinothchandar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass comments. Will take a deeper pass.

@veenaypatil veenaypatil requested a review from n3nash June 25, 2021 14:04
@veenaypatil
Copy link
Contributor Author

@hudi-bot run travis

Copy link
Contributor

@n3nash n3nash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I'll let @leesf review and land it

@leesf leesf merged commit 039aeb6 into apache:master Jun 28, 2021
@sbernauer
Copy link
Contributor

Hi together, this PR broke/overwrote the new change introduced in #3111

      return !prop.toString().startsWith("hoodie.")
              // We need to pass some properties to kafka client so that KafkaAvroSchemaDeserializer can use it
              || prop.toString().startsWith("hoodie.deltastreamer.source.kafka.value.deserializer.");

We need to pass the hoodie.deltastreamer.source.kafka.value.deserializer.* configs to the kafka client.
Happily i noticed it ;) I opened #3172 that will revert your change to include the needed properties again, so no further action needed besides looking at my PR ;)

@veenaypatil
Copy link
Contributor Author

@sbernauer ohh, my bad 😞

@sbernauer
Copy link
Contributor

@veenaypatil no problem ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:blocker Production down; release blocker

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants