
[HUDI-52] Enabling savepoint and restore for MOR table#4507

Merged
codope merged 2 commits into apache:master from nsivabalan:restoreMORTable on Jan 6, 2022

@nsivabalan
Contributor

@nsivabalan nsivabalan commented Jan 5, 2022

What is the purpose of the pull request

Adding/Enabling restore for MOR table

Brief change log

  • Enabling restore for MOR table.

Verify this pull request

This change added tests and can be verified as follows:

  • TestHoodieSparkMergeOnReadTableRollback.testMORTableRestore

  • Verified the below cases manually (without metadata enabled) via hudi-cli
    a. insert, upsert, upsert, savepoint, upsert, rollback. validate records.
    b. insert, upsert, upsert, savepoint, upsert, upsert, rollback. validate records.
    c. insert, upsert, upsert, savepoint, upsert, insert, rollback. validate records.
    d. insert, upsert, upsert, savepoint, upsert, upsert, compact, rollback. validate records.
    e. insert, upsert, upsert, savepoint, upsert, upsert, compact, upsert, rollback. validate records. upsert, validate records.
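
As a rough illustration of what a case like (e) checks, here is a toy in-memory model. `ToyTable` and all of its methods are hypothetical and are not Hudi or hudi-cli APIs; the sketch only assumes that restoring to a savepoint discards every commit made after it, and that compaction leaves the visible records unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a table; not a Hudi API."""
    commits: list = field(default_factory=list)    # [(instant, {key: value})]
    savepoints: set = field(default_factory=set)   # savepointed instants
    next_instant: int = 1

    def records(self):
        # Latest snapshot, or an empty table before the first commit.
        return dict(self.commits[-1][1]) if self.commits else {}

    def write(self, changes):
        # Insert or upsert: new values replace old ones by key.
        snapshot = self.records()
        snapshot.update(changes)
        instant = self.next_instant
        self.next_instant += 1
        self.commits.append((instant, snapshot))
        return instant

    def compact(self):
        # Assumption: compaction rewrites files but leaves records unchanged.
        return self.write({})

    def savepoint(self, instant):
        if all(i != instant for i, _ in self.commits):
            raise ValueError(f"no completed commit at instant {instant}")
        self.savepoints.add(instant)

    def restore(self, instant):
        if instant not in self.savepoints:
            raise ValueError(f"instant {instant} is not savepointed")
        # Drop every commit made after the savepoint.
        self.commits = [(i, s) for i, s in self.commits if i <= instant]


def scenario_e():
    t = ToyTable()
    t.write({"k1": "a", "k2": "a"})       # insert
    t.write({"k1": "b"})                  # upsert
    sp = t.write({"k2": "c"})             # upsert
    t.savepoint(sp)
    at_savepoint = t.records()
    t.write({"k1": "d"})                  # upsert
    t.write({"k3": "e"})                  # upsert
    t.compact()                           # compact
    t.write({"k2": "f"})                  # upsert
    t.restore(sp)                         # roll back to the savepoint
    restored_ok = t.records() == at_savepoint
    t.write({"k1": "g"})                  # upsert after the restore
    return restored_ok, t.records()
```

The first returned value says whether the records after the restore match the records captured at the savepoint; the second is the table after the follow-up upsert, which is the final "validate records" step in case (e).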

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@nsivabalan nsivabalan changed the title from "[HUDI-52] Enabling restore for MOR table" to "[HUDI-52] Enabling savepoint and restore for MOR table" on Jan 5, 2022
@nsivabalan nsivabalan added the priority:critical Production degraded; pipelines stalled label Jan 6, 2022
Member

@codope codope left a comment

Change looks good to me. Can you please check the CI failure? The same test failed even after retrying.

@hudi-bot
Collaborator

hudi-bot commented Jan 6, 2022

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@codope codope merged commit 2954027 into apache:master Jan 6, 2022
@xushiyan xushiyan removed the priority:critical Production degraded; pipelines stalled label Jan 11, 2022
@vinishjail97 vinishjail97 mentioned this pull request Jan 24, 2022
5 tasks
vingov pushed a commit to vingov/hudi that referenced this pull request Jan 26, 2022
* Enabling restore for MOR table

* Fixing savepoint for compaction commits in MOR
nsivabalan added a commit to onehouseinc/hudi that referenced this pull request Jan 28, 2022
* Enabling restore for MOR table

* Fixing savepoint for compaction commits in MOR
liusenhua pushed a commit to liusenhua/hudi that referenced this pull request Mar 1, 2022
* Enabling restore for MOR table

* Fixing savepoint for compaction commits in MOR
Member

@vinothchandar vinothchandar left a comment

I was kind of (pleasantly) surprised how simple this was.
How about these scenarios?

  • If we perform a savepoint from the CLI, while there are inflight writes or pending table services?
  • In line 92, we only work with the base file view? how would this correctly restore the log data?

@vinothchandar
Member

Key thing to verify here is that - compaction or cleaning does not affect the file slice that is part of the save point

@nsivabalan
Contributor Author

If we perform a savepoint from the CLI, while there are inflight writes or pending table services?
A: Do you mean "savepoint" or "restore" in the question above? If it's "savepoint": a savepoint can be taken only on a completed commit, so it should not matter whether any new writes are in flight. If you meant "restore": as we know, restore is a destructive operation, and users are advised to stop all pipelines before they trigger it, or else expect all queries to fail when they trigger restore.
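
To make the "completed commit only" rule concrete, here is a minimal sketch. The `State` enum, the dict-shaped timeline, and both function names are made up for illustration and do not mirror Hudi's actual timeline classes.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    INFLIGHT = "inflight"
    COMPLETED = "completed"


def can_savepoint(timeline, instant):
    """timeline: hypothetical dict mapping instant time -> State."""
    return timeline.get(instant) is State.COMPLETED


def create_savepoint(timeline, savepoints, instant):
    # Reject unknown, requested, and inflight instants alike.
    if not can_savepoint(timeline, instant):
        raise ValueError(f"instant {instant} is not a completed commit")
    savepoints.add(instant)
    return savepoints
```

With a timeline such as `{"001": State.COMPLETED, "002": State.INFLIGHT}`, a savepoint on "001" is accepted while "002" is rejected, so an inflight write does not change which instants can be savepointed.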

In line 92, we only work with the base file view? how would this correctly restore the log data?
A: Yes, even I was surprised. It's mainly because of the way our cleaning and rollback work: both operate at the file slice level. That is, the cleaner cleans up only entire file slices (when the entire file slice is eligible to be cleaned), and rollback deletes data and log files only if the entire file slice is expected to be rolled back. If not, rollback just appends log blocks. So we don't need any special handling for log files with respect to savepoint and restore.
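
A simplified sketch of that file-slice-level behavior, for one file group. `FileSlice`, `rollback`, and `clean` are hypothetical names, and the `savepointed` set (base instants of slices that some savepoint references) is an assumption made for the example rather than how Hudi tracks savepoints internally.

```python
from dataclasses import dataclass, field


@dataclass
class FileSlice:
    base_instant: str                                 # instant that created the base file
    base_file: str
    log_blocks: list = field(default_factory=list)    # [(instant, kind)]


def rollback(slices, instant):
    """Roll back one instant across the slices of a file group."""
    kept = []
    for s in slices:
        if s.base_instant == instant:
            # The whole slice belongs to the rolled-back commit:
            # drop the base file together with its log files.
            continue
        if any(i == instant and kind == "data" for i, kind in s.log_blocks):
            # Only part of the slice is affected: append a rollback block
            # instead of deleting any file.
            s.log_blocks.append((instant, "rollback"))
        kept.append(s)
    return kept


def clean(slices, retain_latest, savepointed):
    """Remove whole old slices only; slices are ordered oldest to newest."""
    cutoff = len(slices) - retain_latest
    return [
        s for idx, s in enumerate(slices)
        if idx >= cutoff or s.base_instant in savepointed
    ]
```

Because both functions either keep or drop a slice as a unit, the base file and its log files always stay together, which is why the base file view is enough to decide what survives.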

@nsivabalan
Contributor Author

Key thing to verify here is that - compaction or cleaning does not affect the file slice that is part of the save point
A: I understand that cleaning should not affect the file slice that is part of the savepoint, but I don't quite get why compaction should not. Can you please clarify?

vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
* Enabling restore for MOR table

* Fixing savepoint for compaction commits in MOR
@nsivabalan
Contributor Author

Tested savepoint/restore for all the cases I can think of. Documented the cases here: https://issues.apache.org/jira/browse/HUDI-3705

5 participants