@nsivabalan nsivabalan commented Jan 28, 2023

Change Logs

Multi-writer capability was added to Spark streaming in #7383.
That patch introduced an identifier config (hoodie.datasource.write.streaming.checkpoint.identifier) that each writer should set. For a single writer, we should apply a default rather than ask the user to set the config explicitly. As of now there is no default, so even a single writer has to set a value for it if they want to ensure that duplicate records are not ingested into Hudi.

For multi-writer scenarios, it's fair to expect users to set a different identifier for each writer.

Also added validation: in a multi-writer setup, the user must explicitly set a value for hoodie.datasource.write.streaming.checkpoint.identifier.
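The default-plus-validation behavior described above can be sketched roughly as follows. This is an illustrative Python sketch, not the code in this patch: the function name `resolve_checkpoint_identifier` and the default value `single_writer_default` are hypothetical placeholders, and detecting multi-writer mode through `hoodie.write.concurrency.mode` is an assumption about how the check is wired.

```python
from typing import Mapping

IDENTIFIER_KEY = "hoodie.datasource.write.streaming.checkpoint.identifier"
CONCURRENCY_MODE_KEY = "hoodie.write.concurrency.mode"
MULTI_WRITER_MODE = "optimistic_concurrency_control"

# Hypothetical placeholder: the actual default chosen by the patch is not
# stated in this description.
ASSUMED_SINGLE_WRITER_DEFAULT = "single_writer_default"


def resolve_checkpoint_identifier(options: Mapping[str, str]) -> str:
    """Return the checkpoint identifier a streaming writer should use.

    Assumption: a writer counts as multi-writer when the concurrency mode
    is optimistic concurrency control.
    """
    explicit = options.get(IDENTIFIER_KEY)
    mode = options.get(CONCURRENCY_MODE_KEY, "single_writer")
    is_multi_writer = mode.lower() == MULTI_WRITER_MODE

    if is_multi_writer:
        # Multi-writer: each writer must name itself explicitly.
        if not explicit:
            raise ValueError(
                f"{IDENTIFIER_KEY} must be set explicitly for multi-writer"
            )
        return explicit

    # Single writer: honor an explicit value, otherwise fall back to a default.
    return explicit or ASSUMED_SINGLE_WRITER_DEFAULT
```

With this shape, a single writer that sets nothing still gets a stable identifier, while a multi-writer job that omits the config fails fast instead of silently sharing one.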

Impact

Simplifies usage of Spark streaming for a single writer.
For multi-writer setups, users are still expected to set hoodie.datasource.write.streaming.checkpoint.identifier explicitly. For a single writer, a default value is deduced internally.
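From the user's side, the difference might look like the option sets below. This is a hedged sketch: the table name and the identifiers `writer_a` and `writer_b` are made up for illustration, and a real multi-writer setup needs further settings (for example a lock provider) that are omitted here.

```python
IDENTIFIER_KEY = "hoodie.datasource.write.streaming.checkpoint.identifier"

# Single writer: no identifier needed, a default is deduced internally.
single_writer_options = {
    "hoodie.table.name": "example_table",  # hypothetical table name
}

# Multi-writer: every writer sets its own, distinct identifier.
multi_writer_common = {
    "hoodie.table.name": "example_table",
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
}
writer_a_options = {**multi_writer_common, IDENTIFIER_KEY: "writer_a"}
writer_b_options = {**multi_writer_common, IDENTIFIER_KEY: "writer_b"}

# Sanity check: identifiers must not collide across writers.
identifiers = [opts[IDENTIFIER_KEY] for opts in (writer_a_options, writer_b_options)]
if len(set(identifiers)) != len(identifiers):
    raise ValueError("each writer needs a distinct checkpoint identifier")
```

In PySpark, a dict like this would typically be passed as `df.writeStream.format("hudi").options(**writer_a_options)`.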

Risk level (write none, low, medium or high below)

low.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change

  • The config description must be updated if new configs are added or the default values of configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instructions to make changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@nsivabalan nsivabalan added the priority:blocker Production down; release blocker label Jan 28, 2023
@nsivabalan nsivabalan force-pushed the singleWriterStreamingCheckpoint branch from c22aa63 to ce62ebd Compare January 28, 2023 20:44
@nsivabalan nsivabalan force-pushed the singleWriterStreamingCheckpoint branch from ce62ebd to 29a2f75 Compare January 28, 2023 22:21
@nsivabalan
Contributor Author

@hudi-bot run azure

@hudi-bot
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@codope codope merged commit a1ba929 into apache:master Jan 29, 2023
yihua pushed a commit that referenced this pull request Jan 30, 2023
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Jan 31, 2023
fengjian428 pushed a commit to fengjian428/hudi that referenced this pull request Apr 5, 2023
