Skip to content

Conversation

@watermelon12138
Copy link
Contributor

Change Logs

The result of filter will be NOT. Therefore, an instant whose value is less than firstSavepoint is selected for archiving. The expected result is that an instant greater than firstSavepoint is selected for archiving.

Impact

When a savepoint exists, the archiving condition is incorrect.

Risk level: none | low | medium | high

Choose one. If medium or high, explain what verification was done to mitigate the risks.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@hudi-bot
Copy link
Collaborator

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan
Copy link
Contributor

@watermelon12138 : sorry. I am not sure I understand the fix.
archival is expected to stop at the first savepointed commit.
i.e.

C1, C2, C3, C4, SP_C4, C5, C6, C7.

if archival kicks in, only C1 -> C3 can be archived. we can't proceed further until the savepoint is deleted. this is the expected behavior.

@nsivabalan nsivabalan self-assigned this Sep 26, 2022
@nsivabalan nsivabalan added priority:high Significant impact; potential bugs status:in-progress Work in progress labels Sep 26, 2022
@watermelon12138
Copy link
Contributor Author

@nsivabalan
Thank you for your reply. SP_C4 will retain the state formed from C1 to C4. Why is C1->C3 still can be archived? Archiving C1->C3 may affect clean.
For the MOR table, at hudi 0.11, I tested that it could not delete the savepoint, which would result in the mor table never being archived.

@nsivabalan
Copy link
Contributor

by SP_C4, what it essentially means is that, if at C4, if someone does snaphot query, hudi serves data from N data files. those N data files will be tracked by SP_C4 and cleaner will never touch those files.
so even if archival cleans up C1 -> C3 timeline files, we should still be able to perform point in time query for C4. should not be an issue.

wrt MOR delete savepoint, yes, we had a bug which we fixed it later. #6744

@nsivabalan
Copy link
Contributor

@watermelon12138 : let me know if the patch still makes sense. I can take another look.

@nsivabalan nsivabalan added the release-0.12.2 Patches targetted for 0.12.2 label Dec 6, 2022
@codope codope removed the release-0.12.2 Patches targetted for 0.12.2 label Dec 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

priority:high Significant impact; potential bugs status:in-progress Work in progress

Projects

Status: 👤 User Action
Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

4 participants