HADOOP-18242. ABFS Rename Failure when tracking metadata is in an incomplete state #4331
CC: @mukund-thakur @steveloughran
@snvijaya, please provide some info on this if you could. Thanks.
steveloughran
left a comment
Made some suggestions. To really test that resilience, you will need to use Mockito to inject a failure.
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
    }

    @Test
    public void testRenameWithNoDestinationParentDir() throws Exception {
Add a similar test case for the resilient rename API, here or in ITestAbfsManifestStoreOperations.
snvijaya
left a comment
Please have a look at the code comment given in AbfsClient.
As mentioned by Steve, the driver test code uses Mockito to recreate negative test cases.
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
Thanks for the review @snvijaya. I had some pending changes; I'll address your comments in the next commit, alongside the change required to the httpStatus code condition in the metadata-incomplete scenario.
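The httpStatus condition for the metadata-incomplete scenario can be sketched as a predicate that requires both the 404 status and the "RenameDestinationParentPathNotFound" error code named in the commit message. This is a minimal sketch; the class and method names are hypothetical and are not the actual AbfsClient code.

```java
/**
 * Hypothetical helper (not the real AbfsClient API): decides whether a failed
 * rename matches the "tracking metadata incomplete" response from this PR.
 */
final class MetadataIncompleteCheck {
  // Values taken from the failure description: HTTP 404 with this error code.
  static final int HTTP_NOT_FOUND = 404;
  static final String RENAME_DEST_PARENT_NOT_FOUND = "RenameDestinationParentPathNotFound";

  private MetadataIncompleteCheck() {
  }

  /** True only when both the HTTP status and the service error code match. */
  static boolean isMetadataIncompleteFailure(int httpStatus, String errorCode) {
    return httpStatus == HTTP_NOT_FOUND
        && RENAME_DEST_PARENT_NOT_FOUND.equals(errorCode);
  }
}
```

Matching on both values keeps unrelated 404 responses (for example, a missing source path) from triggering the HEAD-and-retry path.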
steveloughran
left a comment
Reviewed. The less we use Mockito, the less there is to break, so use new() over mocking where possible.
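Combining the two review suggestions (inject a failure, but prefer new() to Mockito), a fault-injecting test double can be a plain subclass of a real implementation that overrides a single method. A minimal sketch: InMemoryStore, FailFirstRenameStore, RecoveringRename and StoreFailure are hypothetical stand-ins for the ABFS client, not Hadoop classes. Only the 404 / RenameDestinationParentPathNotFound response comes from this PR; the other error codes are illustrative.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical exception carrying the HTTP status and service error code. */
class StoreFailure extends IOException {
  private static final long serialVersionUID = 1L;
  final int status;
  final String errorCode;

  StoreFailure(int status, String errorCode) {
    super(status + " " + errorCode);
    this.status = status;
    this.errorCode = errorCode;
  }
}

/** Hypothetical in-memory store with real rename behaviour. */
class InMemoryStore {
  final Map<String, String> files = new HashMap<>();
  int headCalls;

  void head(String path) throws IOException {
    headCalls++;
    if (!files.containsKey(path)) {
      throw new StoreFailure(404, "PathNotFound"); // illustrative error code
    }
  }

  void rename(String src, String dst) throws IOException {
    String data = files.remove(src);
    if (data == null) {
      throw new StoreFailure(404, "SourcePathNotFound"); // illustrative error code
    }
    files.put(dst, data);
  }
}

/** Test double built with new(): overrides one method to inject a single failure. */
class FailFirstRenameStore extends InMemoryStore {
  private boolean failed;

  @Override
  void rename(String src, String dst) throws IOException {
    if (!failed) {
      failed = true;
      throw new StoreFailure(404, "RenameDestinationParentPathNotFound");
    }
    super.rename(src, dst); // afterwards, behave like the real store
  }
}

/** Hypothetical code under test: HEAD the source once, then retry the rename. */
final class RecoveringRename {
  private RecoveringRename() {
  }

  static void rename(InMemoryStore store, String src, String dst) throws IOException {
    try {
      store.rename(src, dst);
    } catch (StoreFailure e) {
      if (e.status != 404 || !"RenameDestinationParentPathNotFound".equals(e.errorCode)) {
        throw e;
      }
      store.head(src);        // resolve the incomplete metadata state
      store.rename(src, dst); // single retry; a second failure propagates
    }
  }
}
```

Because the double inherits the real rename behaviour, a test can assert on the resulting state (the file moved, exactly one HEAD issued) rather than on recorded mock interactions.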
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
...ools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClientResult.java
...ools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClientResult.java
...ls/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/TestAbfsRenameRetryRecovery.java
...ls/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/TestAbfsRenameRetryRecovery.java
...ls/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/TestAbfsRenameRetryRecovery.java
...ls/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/TestAbfsRenameRetryRecovery.java
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemRename.java
steveloughran
left a comment
+1
Happy with it. Mehakmeet, is this ready to go in?
...ools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClientResult.java
mukund-thakur
left a comment
A few minor bits of feedback.
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AbfsStatistic.java
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsClient.java
...hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemRename.java
...-azure/src/test/java/org/apache/hadoop/fs/azurebfs/services/TestAbfsRenameRetryRecovery.java
@steveloughran
steveloughran
left a comment
LGTM. +1
How about we merge this and see whether it is enough to eliminate the problem? If we see different failures with this fix in place, we can extend it.
...tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java
steveloughran
left a comment
Was about to merge, did a final review, and realised that we should pick up the source etag and use it in the inner rename call.
    // Doing a HEAD call resolves the incomplete metadata state and
    // then we can retry the rename operation.
    getPathStatus(source, false, tracingContext);
I've had one more thought here. The path status contains the etag, doesn't it? So if sourceEtag was null, we can now set it. That way, if the rename failure is followed immediately by the rename-failure-but-it-really-happened event of HADOOP-18163, we are lined up for recovery.
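The etag suggestion can be sketched as follows: the HEAD call that resolves the incomplete metadata state also returns the source etag, so when the caller passed no etag, the one from the HEAD response is used for the retried rename. This is a sketch under assumptions; the Client interface and its headEtag/rename methods are hypothetical stand-ins, not the real AbfsClient or getPathStatus signatures.

```java
import java.io.IOException;

/** Hypothetical sketch: reuse the etag from the recovery HEAD call for the retried rename. */
final class EtagAwareRename {

  /** Hypothetical client surface; these are not the real AbfsClient signatures. */
  interface Client {
    /** HEAD on the path, returning the etag the store reports for it. */
    String headEtag(String path) throws IOException;

    /** Rename, passing the expected source etag (may be null). */
    void rename(String source, String destination, String sourceEtag) throws IOException;
  }

  private EtagAwareRename() {
  }

  /**
   * Recovery step after a metadata-incomplete failure: issue the HEAD call,
   * fill in the source etag from its response if the caller supplied none,
   * then retry the rename once. Returns the etag used for the retry.
   */
  static String retryAfterHead(Client client, String source, String destination,
      String sourceEtag) throws IOException {
    String etagFromHead = client.headEtag(source);
    String etagForRetry = (sourceEtag != null) ? sourceEtag : etagFromHead;
    client.rename(source, destination, etagForRetry);
    return etagForRetry;
  }
}
```

A caller-supplied etag is kept in preference to the HEAD result, so an etag captured earlier is not replaced. The sketch only lines the retry up with an etag; it does not model the HADOOP-18163 recovery itself.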
Really close to merging; I just had one final thought before we hit the merge button. I have already updated the PR text in preparation for the merge (I was writing it up while looking at the code).
🎊 +1 overall
This message was automatically generated.
Due-diligence question: what did you test against?
+1, merging. Tested myself against Azure Cardiff; there was a transient failure of TestAbfsClientThrottlingAnalyzer in the parallel run, which went away on a standalone run. That test is too brittle.
…omplete state (apache#4331)
ABFS rename fails intermittently when the Storage-blob tracking
metadata is in an incomplete state. This surfaces as the error code
404 and an error message of "RenameDestinationParentPathNotFound".
To mitigate this issue, when a request fails with this response,
the ABFS client issues a HEAD call on the source file
and then retries the rename operation.
ABFS filesystem statistics track when this occurs with new counters:
rename_recovery
metadata_incomplete_rename_failures
rename_path_attempts
This is a very rare occurrence and appears to be triggered under certain
heavy load conditions, just as HADOOP-18163 is.
Contributed by Mehakmeet Singh.
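The counter updates described in the commit message might look like this in a simplified client. Assumptions: the counter names come from the commit message, but the exact points where the real code increments them (for example, whether the retry counts as a second rename_path_attempts) are guesses, and rename_recovery is left out because the message does not say which event it counts. The Store interface and StoreException type are hypothetical.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical rename wrapper that keeps named counters, as in the commit message. */
final class RenameStatisticsSketch {

  /** Hypothetical exception carrying the HTTP status and service error code. */
  static final class StoreException extends IOException {
    private static final long serialVersionUID = 1L;
    final int status;
    final String errorCode;

    StoreException(int status, String errorCode) {
      super(status + " " + errorCode);
      this.status = status;
      this.errorCode = errorCode;
    }
  }

  /** Hypothetical store operations; not the real AbfsClient signatures. */
  interface Store {
    void rename(String src, String dst) throws IOException;

    void head(String path) throws IOException;
  }

  private final Map<String, Long> counters = new TreeMap<>();

  private void increment(String name) {
    counters.merge(name, 1L, Long::sum);
  }

  /** Current value of a counter, or 0 if it was never incremented. */
  long get(String name) {
    return counters.getOrDefault(name, 0L);
  }

  /**
   * Rename with one HEAD-then-retry recovery. Assumption: every call to the
   * store's rename counts as one rename_path_attempts increment.
   */
  void rename(Store store, String src, String dst) throws IOException {
    increment("rename_path_attempts");
    try {
      store.rename(src, dst);
    } catch (StoreException e) {
      boolean metadataIncomplete = e.status == 404
          && "RenameDestinationParentPathNotFound".equals(e.errorCode);
      if (!metadataIncomplete) {
        throw e;
      }
      increment("metadata_incomplete_rename_failures");
      store.head(src);                   // resolve the incomplete metadata state
      increment("rename_path_attempts");
      store.rename(src, dst);            // single retry; a second failure propagates
    }
  }
}
```

With a store that fails the first rename with the metadata-incomplete response and then succeeds, this sketch ends with rename_path_attempts at 2 and metadata_incomplete_rename_failures at 1.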