Skip to content

Conversation

@danny0405
Copy link
Contributor

…tic concurrency

Tips

What is the purpose of the pull request

(For example: This pull request adds quick-start document.)

Brief change log

(for example:)

  • Modify AnnotationLocation checkstyle rule in checkstyle.xml

Verify this pull request

(Please pick either of the following options)

This pull request is a trivial rework / code cleanup without any test coverage.

(or)

This pull request is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end.
  • Added HoodieClientWriteTest to verify the change.
  • Manually verified the change by running a job locally.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

Map<String, Option<HoodiePendingRollbackInfo>> pendingRollbacks = getPendingRollbackInfos(metaClient);
instantsToRollback.forEach(entry -> pendingRollbacks.putIfAbsent(entry, Option.empty()));
Map<String, Option<HoodiePendingRollbackInfo>> pendingRollbacks = getPendingRollbackInfos(metaClient);
instantsToRollback.forEach(entry -> pendingRollbacks.putIfAbsent(entry, Option.empty()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pre-upgrade rollback was introduced in #2374, with a pre-condition that when the write concurrency mode is based on optimistic lock.

While in #4114, the pre-condition was removed, which i think is a regression because we already do the rollback when starting new instant/commit before write. In single writer mode, there is no need to do redundant rollback again before upgrade.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@n3nash : change is ok with me. Do you see any issues as such.
CC @alexeykudinkin

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One case that I can think of is when the CLEAN_FAILED_FILES policy is set to LAZY and writer config is set to a single writer - I believe there can be older inflight files lying around. In this case if the rollback is done only for Optimistic Concurrency, some older inflight metadata files may be lying around.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some older inflight metadata files may be lying around

That's true, but what is the problem here in single writer upgrade ? It seems that the problem is not about upgrade but about the metadata table rollback, but shouldn't the metadata table also has MVCC/SI transaction semantics here ? The metadata table reader can not see these inflight meta logs right ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danny0405 I'm not fully following your comment. From this PR, what I gather is that we want to special case rollback for Optimistic Concurrency only since for Single Writer this will be redundant given rollback is supposed to happen at the start of the commit. I'm merely saying that's not necessarily the case due to the LAZY cleaning policy, does that make sense ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean for LAZY cleaning strategy, we do not rollback failed commits at the start of commits. I totally agree with that.

I'm actually confused here whether there are any issues here for metadata table rollback for single-writer mode, if there is no problem, then this patch is an improvement ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @danny0405 : I am confused as to whats the regression here. I thought this is more of an optimization. From your first comment "While in #4114, the pre-condition was removed, which i think is a regression because we already do the rollback when starting new instant/commit before write. In single writer mode, there is no need to do redundant rollback again before upgrade."

can you help me understand why would be do redundant rollback. once we rolledback a given commit, next time if we check if something needs to be rolled back, the same partially failed commit should not appear right. I mean, if the rollback was complete the first time, it should be be part of the list of commits to rollback the 2nd time right. or am I missing something here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nsivabalan , i agree with your point, but at least, there is no need for redundant metadata checking if the rollback strategy is EAGER.

And still, no one answers the question why we need the rollback (for data set files and metadata files) before upgrade ? What is exactly the purpose here ?

@n3nash say that it is because we need to rollback the metadata table if it is enabled, but we can not see any clues that the rollback is only for metadata. And even if we do not rollback the metadata, the content read from the metadata should still be correct based on that the metadata table itself is a Hudi table and has SI visibility ?

Seems it is clear that the rollback here is not for data set files, correct me if i'm wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @nsivabalan @n3nash , i have refactored the code a little to

  1. move the failed writes cleaning all together to when starting the new commit
  2. the cleaning strategy for commit actions as tweaked: force to clean if metadata table is enabled in case we commit metadata table successfully but data set table failed

Please review again if you have time, thanks so much in advance ~

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I analyze the code a little and find that we would do the metadata table sync when it is initialized, so for the case "commit metadata successfully but data set table failed", it should be fine.

Also i didn't find any upgrade/downgrade code that needs the data set/metadata files, from my perspective, the rollback is totally unnecessary, so i just remove it here.

@danny0405
Copy link
Contributor Author

@manojpec @n3nash Can you please confirm this change ?

@danny0405
Copy link
Contributor Author

Also cc @alexeykudinkin , can you review if you have time :)

Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets wait to hear from nishith

@nsivabalan nsivabalan added the priority:critical Production degraded; pipelines stalled label May 11, 2022
@danny0405 danny0405 force-pushed the HUDI-4062 branch 2 times, most recently from 30ac93b to 35a1fed Compare May 19, 2022 07:52
@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@alexeykudinkin
Copy link
Contributor

@danny0405 can you please update PR description to capture the intent of this change. Adding context to the description helps tremendously in the review and beyond to understand the original intent.

@alexeykudinkin
Copy link
Contributor

@danny0405 let's take a step back here: can you please outline what's the problem we're trying to solve for here?

Do you see any issues introduced by #4114 or is this purely targeting an optimization?
If it's an optimization, can you please outline what's the end goal of this optimization?

@danny0405
Copy link
Contributor Author

It is a optimization, but i think i made the logic more complicated, so i will close this PR.

@danny0405 danny0405 closed this May 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:critical Production degraded; pipelines stalled

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants