Conversation


@danny0405 danny0405 commented Feb 15, 2022

…exist in metadata table active timeline

What is the purpose of the pull request

After this change, when the compaction commit to the metadata table succeeds but the state transition of the dataset commit fails, the metadata table may bookkeep compaction files that have already been rolled back (removed). But the odds of this are far lower than the case where neither the metadata table commit nor the dataset commit ever happens.

And the compaction files are actually idempotent.

I have no good solution to fix this completely; maybe we should check the archived timeline to accurately determine whether the instant to roll back has been archived.

Finally, I add a new filtering condition for metadata table archiving to address this problem.
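The filtering condition can be sketched roughly as follows. This is an illustrative sketch only — the class and method names (`MetadataArchiveFilter`, `instantsSafeToArchive`) are hypothetical, not the actual Hudi API: the idea is that a metadata table instant is archived only if it is strictly older than the earliest instant the dataset timeline still retains.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of the extra archiving guard (hypothetical names).
public class MetadataArchiveFilter {

  // Hudi instant timestamps are lexicographically ordered strings, so a
  // plain string comparison orders instants correctly.
  static List<String> instantsSafeToArchive(List<String> metadataInstants,
                                            String earliestDatasetInstantToRetain) {
    return metadataInstants.stream()
        .filter(ts -> ts.compareTo(earliestDatasetInstantToRetain) < 0)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Only instants older than the retained boundary are archivable.
    System.out.println(instantsSafeToArchive(
        List.of("20220215T01", "20220215T02", "20220215T03"), "20220215T03"));
  }
}
```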

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@nsivabalan nsivabalan added the priority:critical Production degraded; pipelines stalled label Feb 16, 2022
@danny0405 danny0405 force-pushed the HUDI-3435 branch 2 times, most recently from 1e329eb to b4f50a9 Compare February 18, 2022 03:02
if (config.isMetadataTableEnabled()) {
  try (HoodieTableMetadata tableMetadata = HoodieTableMetadata.create(table.getContext(), config.getMetadataConfig(),
      config.getBasePath(), FileSystemViewStorageConfig.SPILLABLE_DIR.defaultValue())) {
    Option<String> latestCompactionTime = tableMetadata.getLatestCompactionTime();
Contributor Author

Hello @nsivabalan, I see that you put the code here; is there any special reason that the dataset timeline archival should be kept in front of the metadata table's latest compaction time?

Contributor

yes.
CC @prashantwason
We have documented the reasoning here https://issues.apache.org/jira/browse/HUDI-2458

Contributor

With the current metadata table design, we validate which deltacommits to read for the metadata table based on which completed commits are present in the dataset. So the dataset should never archive instants before compaction on the metadata table.
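The reader-side validation described here can be sketched as follows. This is a hypothetical illustration of the guard, not the actual Hudi reader code — names like `validDeltaCommitsToRead` are invented: only metadata-table deltacommits whose instant is also completed on the dataset timeline are considered valid to read.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Illustrative sketch (hypothetical names): a metadata-table deltacommit is
// trusted only if the same instant is completed on the dataset timeline.
public class MdtDeltaCommitGuard {

  static Set<String> validDeltaCommitsToRead(Set<String> mdtDeltaCommits,
                                             Set<String> datasetCompletedCommits) {
    return mdtDeltaCommits.stream()
        .filter(datasetCompletedCommits::contains)
        .collect(Collectors.toCollection(TreeSet::new));
  }

  public static void main(String[] args) {
    // c5 succeeded on the MDT but never completed on the dataset: readers ignore it.
    System.out.println(validDeltaCommitsToRead(
        Set.of("c1", "c3", "c5"), Set.of("c1", "c3")));
  }
}
```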

Contributor Author

Thanks, I read the code, and this patch does not break the original constraints. Because the metadata table commits before the dataset, with the new patch the instants on the dataset timeline must also be on the metadata table timeline, so we can still check for dataset instant existence.

Contributor Author

Hello @nsivabalan, can you help confirm this?

Contributor Author

There can be a case where the deltacommit on the metadata table succeeds but the commit on the dataset fails (the inflight -> complete transition fails).

When the job restarts, the last commit would be rolled back first, right? The view of the metadata table would then be fixed, so this should not be a problem.

In any case, the metadata table is read with the latest filesystem view, which includes the delta commit log files; I don't know why we address metadata compaction here, because compaction does not affect the table records.

Member

When the job restarts, the last commit would be rollbacked first right ?

The last commit failed on the dataset but succeeded on the metadata table. So yes, it will be rolled back on the dataset eventually, depending on the settings (EAGER vs. LAZY rollbacks).

Also, we need to support the readers: they need to ignore the deltacommit. There can be a delay between the failed job and the retry, and readers should read consistent data during that time.

Contributor Author

Also we need to support the readers - they need to ignore the deltacommit

That is not the case for the current metadata table reader, I guess, and personally I think this restriction is too limiting. Doesn't the failed write's rollback already fix the metadata table, once the rollback metadata is synced?

Contributor

There can be a case where the deltacommit on the metadata table succeeds but the commit on the dataset fails (the inflight -> complete transition fails).

When the job restarts, the last commit would be rolled back first, right? The view of the metadata table would then be fixed, so this should not be a problem.

The last commit may not be rolled back immediately in the scenarios of a single writer with async table services, or multi-writer, since hoodie.cleaner.policy.failed.writes should be set to LAZY there.

Contributor Author

Sorry, I don't understand why the cleaner policy affects this. If the rollback metadata is synced, the metadata file listing can be trusted, right? The corrupt files should then be ignored automatically.

@danny0405
Contributor Author

@hudi-bot run azure

@danny0405
Contributor Author

@nsivabalan The PR is ready; please review when you have time ~

@danny0405
Contributor Author

@hudi-bot run azure

Contributor

@nsivabalan nsivabalan left a comment

CC @yihua: I guess this patch is addressing the issue you hit during testing.


Contributor

@nsivabalan nsivabalan left a comment

left comments

@danny0405
Contributor Author

Hello @nsivabalan, I think after this patch the compaction instant constraint can be removed; can you double-check this?

.archiveCommitsWith(3, 4)
.retainCommits(1)
.build())
.withMarkersType("DIRECT")
Contributor

What's the reason for adding this?

Contributor Author

Mis-added; it can be removed.

@nsivabalan
Contributor

@danny0405: sorry, I don't think we can remove the data table's reliance on metadata table compaction. Can you help me understand? @yihua and I did jam on this.

Here is our claim:
In the metadata table, we filter out any additional commits that are complete in the MDT but not yet complete in the data table. So if we loosen the dependency, there could be a partially failed commit in the data table which gets archived. In that case, when we inspect the commit in the MDT, we might assume that it is already committed in the data table as well, which could lead to wrong results.

If I am missing something, let me know.

@danny0405
Contributor Author

there could be a partially failed commit in data table which gets archived

By 'partially failed commit' do you mean the commit in the dataset table? Then it should not reach the archiving step, I think; we archive after a successful dataset commit.

And when we have a commit t1 that commits to the metadata table successfully but fails for the dataset table, then when the job restarts, t1 would be rolled back in both the dataset table and the metadata table, so what's the problem here?

@nsivabalan
Contributor

@danny0405: here is the scenario.
Let's say multi-writer is enabled and hence rollbacks are lazy. There is a commit C5 which got committed to the MDT, but the writer crashed before committing to the data table, and the user restarts the pipeline. Due to multi-writer, more commits get added, but the rollback will be triggered lazily by the cleaner. Let's say the cleaner configs are very lax (say 100 commits). So by this time, archival could clean up the partially failed commit.

@danny0405
Contributor Author

@danny0405 : here is the scenario. Lets say multi-writer is enabled and hence rollbacks are lazy. there is a commit C5 which got committed to MDT, but crashed before committing to data table. and the user restarts the pipeline. due to multi-writer, there are more commits added, but rollback will be triggered lazily by cleaner. Lets say cleaner configs are very easy (say 100 commits). So, by this time, archival could clean up the partially failed commit.

Can we disable the lazy cleaner for restarts/bootstrap then? Lazy cleaning makes sense for normal commits, but it just makes things complex for bootstrap/restarts and does not even gain much.

@yihua
Contributor

yihua commented Mar 10, 2022

@danny0405 : here is the scenario. Lets say multi-writer is enabled and hence rollbacks are lazy. there is a commit C5 which got committed to MDT, but crashed before committing to data table. and the user restarts the pipeline. due to multi-writer, there are more commits added, but rollback will be triggered lazily by cleaner. Lets say cleaner configs are very easy (say 100 commits). So, by this time, archival could clean up the partially failed commit.

Can we disable the lazy cleaner for restarts/bootstrap then? Lazy cleaning makes sense for normal commits, but it just makes things complex for bootstrap/restarts and does not even gain much.

For the multi-writer scenario, we must have lazy cleaning, since a job cannot tell whether an inflight commit is a failed write or an actual inflight commit from another writer. So the job relies on the heartbeat timeout to determine failed writes and lazily cleans up failed commits later. This is the whole point of the guard we are discussing.
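The heartbeat-based detection described here can be sketched as follows. This is an illustrative sketch with hypothetical names and times, not the actual Hudi heartbeat API: an inflight commit is treated as failed only once its writer's heartbeat has expired; until then it may simply be another writer's in-progress commit.

```java
// Illustrative sketch of heartbeat-based failed-write detection (hypothetical names).
public class HeartbeatSketch {

  static boolean isLikelyFailedWrite(long lastHeartbeatEpochMs, long nowEpochMs,
                                     long heartbeatTimeoutMs) {
    // An inflight commit whose writer stopped heartbeating past the timeout
    // is considered failed and eligible for lazy cleanup.
    return nowEpochMs - lastHeartbeatEpochMs > heartbeatTimeoutMs;
  }

  public static void main(String[] args) {
    // No heartbeat for 2 minutes with a 1-minute timeout: treat as failed.
    System.out.println(isLikelyFailedWrite(0L, 120_000L, 60_000L));
  }
}
```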

Contributor

@yihua yihua left a comment

LGTM. Thanks for the critical fix!

@danny0405
Contributor Author

@hudi-bot run azure

Contributor

@nsivabalan nsivabalan left a comment

thanks for the fix. LGTM

@apache apache deleted a comment from hudi-bot Mar 24, 2022
@apache apache deleted a comment from hudi-bot Mar 25, 2022
@danny0405
Contributor Author

@hudi-bot run azure

@danny0405 danny0405 force-pushed the HUDI-3435 branch 2 times, most recently from 5f5a782 to 2f1368d Compare March 26, 2022 01:00
@hudi-bot
Collaborator

CI report:


@danny0405 danny0405 merged commit 0c09a97 into apache:master Mar 26, 2022
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
