[Remote Translog] Deleting remote translog considering latest remote metadata #5869
gbbafna wants to merge 5 commits into opensearch-project:main from
Conversation
Gradle Check (Jenkins) Run Completed with:
Codecov Report
@@ Coverage Diff @@
## main #5869 +/- ##
============================================
+ Coverage 70.88% 70.91% +0.02%
- Complexity 58720 58776 +56
============================================
Files 4768 4768
Lines 280575 280584 +9
Branches 40514 40516 +2
============================================
+ Hits 198881 198971 +90
+ Misses 65334 65327 -7
+ Partials 16360 16286 -74
server/src/main/java/org/opensearch/index/translog/RemoteFsTranslog.java
public void trimUnreferencedReaders() throws IOException {
    ...
    logger.trace("delete remote translog generation file [{}], not referenced by metadata anymore", generation);
    deleteRemoteGeneration(generation);
} else {
    break;
}
Why are we breaking out of the for loop?
I can imagine we may want to retry the uploads; in that case, should we start the for loop from the minimum generation number that exists?
Retrying here would increase the flush times to a very high number. The cleanup can be taken up as an async job on its own.
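The break-versus-retry trade-off discussed above can be sketched with a toy model. This is an illustrative sketch, not the actual OpenSearch implementation: `uploadedGenerations` stands in for `fileTransferTracker`, `remoteStore` stands in for the remote blob store, and the names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class TrimSketch {
    // Generations whose upload this node has tracked (stand-in for fileTransferTracker).
    final Set<Long> uploadedGenerations = new HashSet<>();
    // Stand-in for the remote store contents.
    final Set<Long> remoteStore = new HashSet<>();

    void upload(long generation) {
        uploadedGenerations.add(generation);
        remoteStore.add(generation);
    }

    // Delete remote generations strictly below the minimum generation the
    // latest metadata references, walking downward and stopping at the first
    // untracked generation: everything older was already cleaned up by an
    // earlier trim pass, so breaking avoids scanning all history on every flush.
    void trimUnreferencedGenerations(long minReferencedGeneration) {
        for (long gen = minReferencedGeneration - 1; gen >= 0; gen--) {
            if (uploadedGenerations.contains(gen)) {
                remoteStore.remove(gen);        // stand-in for deleteRemoteGeneration(gen)
                uploadedGenerations.remove(gen);
            } else {
                break;
            }
        }
    }
}
```

The `break` keeps the flush path bounded; a retry-from-minimum strategy would make the loop length proportional to the full remote history, which is the flush-time concern raised above.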
String translogFilename = Translog.getFilename(generation);
if (fileTransferTracker.uploaded(translogFilename)) {
    logger.trace("delete remote translog generation file [{}], not referenced by metadata anymore", generation);
    deleteRemoteGeneration(generation);
The deleteRemoteGeneration method uses the current primaryTerm. This probably would not be handled when there is a failover and the primary term has increased. Could we rely on metadata to fetch the right generation-to-primary-term mapping for cleaning up the files?
That's a great catch @ashking94. The issue here is that the latest metadata doesn't know the right primary term for this generation. I will create a backlog item for cleaning up the older primary term translog files. This can be done on every failover or as an async job. I don't want to complicate the usual deletion flow for this corner case.
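The corner case above can be made concrete with a small sketch. This is illustrative only (the object naming and method names are hypothetical, not the OpenSearch API): remote translog objects are keyed by primary term as well as generation, so a delete issued with the current term cannot see a file uploaded under an older term.

```java
import java.util.HashSet;
import java.util.Set;

class PrimaryTermSketch {
    // Stand-in for the remote blob store's object names.
    final Set<String> remoteObjects = new HashSet<>();

    // Remote translog objects are keyed by primary term and generation
    // (hypothetical naming scheme for illustration).
    static String objectName(long primaryTerm, long generation) {
        return "translog-" + primaryTerm + "-" + generation + ".tlog";
    }

    void upload(long primaryTerm, long generation) {
        remoteObjects.add(objectName(primaryTerm, generation));
    }

    // Deleting with only the current primary term, as in the flow under
    // review; returns false when the object was uploaded under an older term.
    boolean deleteWithCurrentTerm(long currentTerm, long generation) {
        return remoteObjects.remove(objectName(currentTerm, generation));
    }
}
```

After a failover bumps the term, the old-term file is simply missed rather than deleted, which is why a separate cleanup (on failover or async) is proposed instead of complicating the hot deletion path.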
Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
Signed-off-by: Ashish Singh <ssashish@amazon.com>
Created PR #6086 for the rest of the development due to an access issue.
Signed-off-by: Gaurav Bafna gbbafna@amazon.com
Description
We should delete remote translog files considering the latest metadata file uploaded. The tlog/ckp files referenced by that metadata file cannot be deleted; by keeping them, we remain able to restore the translog from that metadata.
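The invariant described above can be sketched as follows. This is a minimal illustration with hypothetical names, not the PR's code: the latest uploaded metadata references a range of generations, and only generations strictly below the minimum referenced one are safe to delete, so a restore from that metadata always finds every tlog/ckp file it references.

```java
import java.util.ArrayList;
import java.util.List;

class MetadataSketch {
    // Generations strictly below the minimum generation referenced by the
    // latest remote metadata are safe to delete; everything the metadata
    // references must survive so a restore from it can find its files.
    static List<Long> deletableGenerations(long oldestRemoteGen, long minReferencedGen) {
        List<Long> deletable = new ArrayList<>();
        for (long gen = oldestRemoteGen; gen < minReferencedGen; gen++) {
            deletable.add(gen);
        }
        return deletable;
    }
}
```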
Issues Resolved
#5845
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.