-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix live WALs purged while file deletions disabled #3341
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
btw the diff looks much simpler on internal phabricator. |
@ajkr has updated the pull request. View: changes, changes since last import |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
there's a problem where files scheduled for archiving will be skipped, then there'd be a gap in the returned WALs between the live WAL files and the archived WAL files :( |
have another idea: how about we track unmatched calls to sent out #3350 to make this possible |
2af2e46
to
4c60749
Compare
@ajkr has updated the pull request. |
I updated the PR and description with a new approach now that #3350 has landed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
// files are going to be purged. Additional purges won't be scheduled as | ||
// long as deletions are disabled (so the below loop must terminate). | ||
InstrumentedMutexLock l(&mutex_); | ||
while (disable_delete_obsolete_files_ > 0 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need to also make sure bg_cv_ is signaled when disable_delete_obsolete_files_ goes to zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@ajkr has updated the pull request. View: changes, changes since last import |
09ac69b
to
5722768
Compare
@ajkr has updated the pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajkr is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: When calling `DisableFileDeletions` followed by `GetSortedWalFiles`, we guarantee the files returned by the latter call won't be deleted until after file deletions are re-enabled. However, `GetSortedWalFiles` didn't omit files already planned for deletion via `PurgeObsoleteFiles`, so the guarantee could be broken. We fix it by making `GetSortedWalFiles` wait for the number of pending purges to hit zero if file deletions are disabled. This condition is eventually met since `PurgeObsoleteFiles` is guaranteed to be called for the existing pending purges, and new purges cannot be scheduled while file deletions are disabled. Once the condition is met, `GetSortedWalFiles` simply returns the content of DB and archive directories, which nobody can delete (except for deletion scheduler, for which I plan to fix this bug later) until deletions are re-enabled. Closes facebook#3341 Differential Revision: D6681131 Pulled By: ajkr fbshipit-source-id: 90b1e2f2362ea9ef715623841c0826611a817634
Summary: facebook#3341 guaranteed that upon return of `GetSortedWalFiles` after `DisableFileDeletions`, all pending purges of previously obsolete WAL files will have finished. However, the addition of avoid_unnecessary_blocking_io in #XXXX opened a hole in the code making that assurance, which can lead to files to be copied for checkpoint or backup going missing before being copied, with that option enabled. This change patches the hole. Test Plan: apparent fix to backups in crash test observed. Will work on a unit test for another commit
…7369) Summary: #3341 guaranteed that upon return of `GetSortedWalFiles` after `DisableFileDeletions`, all pending purges of previously obsolete WAL files will have finished. However, the addition of avoid_unnecessary_blocking_io in #5043 opened a hole in the code making that assurance, which can lead to files to be copied for checkpoint or backup going missing before being copied, with that option enabled. This change patches the hole. Pull Request resolved: #7369 Test Plan: apparent fix to backups in crash test observed. Will work on a unit test for another commit Reviewed By: ajkr Differential Revision: D23620258 Pulled By: pdillinger fbshipit-source-id: bea36b461a5b719c3e3ef802f967bc3e8ae71614
…acebook#7369) Summary: facebook#3341 guaranteed that upon return of `GetSortedWalFiles` after `DisableFileDeletions`, all pending purges of previously obsolete WAL files will have finished. However, the addition of avoid_unnecessary_blocking_io in facebook#5043 opened a hole in the code making that assurance, which can lead to files to be copied for checkpoint or backup going missing before being copied, with that option enabled. This change patches the hole. Pull Request resolved: facebook#7369 Test Plan: apparent fix to backups in crash test observed. Will work on a unit test for another commit Reviewed By: ajkr Differential Revision: D23620258 Pulled By: pdillinger fbshipit-source-id: bea36b461a5b719c3e3ef802f967bc3e8ae71614
When calling
DisableFileDeletions
followed byGetSortedWalFiles
, we guarantee the files returned by the latter call won't be deleted until after file deletions are re-enabled. However,GetSortedWalFiles
didn't omit files already planned for deletion viaPurgeObsoleteFiles
, so the guarantee could be broken.We fix it by making
GetSortedWalFiles
wait for the number of pending purges to hit zero if file deletions are disabled. This condition is eventually met sincePurgeObsoleteFiles
is guaranteed to be called for the existing pending purges, and new purges cannot be scheduled while file deletions are disabled. Once the condition is met,GetSortedWalFiles
simply returns the content of DB and archive directories, which nobody can delete (except for deletion scheduler, for which I plan to fix this bug later) until deletions are re-enabled.Test Plan: