HDDS-10704. Do not fail read of EC block if the last chunk is empty #6540
Thanks @sodonnel for the patch, LGTM.
> If the last chunk is empty, this should not stop the block from being empty.
I assume this should be: "... should not stop the block from being read".
@adoroszlai Yes, you are correct. I have updated the PR description to fix that typo. Thanks for the review!
…pache#6540) (cherry picked from commit 4f9b86e)
What changes were proposed in this pull request?
Due to HDDS-10682, some EC blocks in a cluster could have an empty final chunk. Reads of these blocks fail, which can make data unavailable even though it is still present on disk.
If the last chunk is empty, this should not stop the block from being read.
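To illustrate the intended behaviour, here is a minimal sketch under simplified assumptions. `ChunkInfo`, `EmptyLastChunkSketch`, `readableChunks` and `readableLength` are hypothetical names, not Apache Ozone classes or methods, and the log line is illustrative only. The sketch drops a zero-length chunk when it is the final chunk of the block instead of failing the read. Treating an empty chunk in any other position as an error is an assumption made here to contrast the two cases; it is not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of a block's chunk list.
 * None of these names are Apache Ozone APIs.
 */
public class EmptyLastChunkSketch {

    /** A chunk described only by its offset within the block and its length in bytes. */
    record ChunkInfo(long offset, long length) { }

    /**
     * Returns the chunks that should be read. A zero-length final chunk is
     * tolerated and skipped. ASSUMPTION: a zero-length chunk elsewhere is
     * still treated as an error in this sketch.
     */
    static List<ChunkInfo> readableChunks(List<ChunkInfo> chunks) {
        List<ChunkInfo> result = new ArrayList<>();
        for (int i = 0; i < chunks.size(); i++) {
            ChunkInfo chunk = chunks.get(i);
            boolean isLast = i == chunks.size() - 1;
            if (chunk.length() == 0) {
                if (isLast) {
                    // Skip the empty final chunk instead of failing the whole read.
                    System.err.println("Ignoring empty last chunk at offset " + chunk.offset());
                    continue;
                }
                throw new IllegalStateException("Empty chunk at index " + i);
            }
            result.add(chunk);
        }
        return result;
    }

    /** Total number of bytes readable from the block after filtering. */
    static long readableLength(List<ChunkInfo> chunks) {
        long total = 0;
        for (ChunkInfo chunk : readableChunks(chunks)) {
            total += chunk.length();
        }
        return total;
    }

    public static void main(String[] args) {
        List<ChunkInfo> chunks = List.of(
            new ChunkInfo(0, 4096),
            new ChunkInfo(4096, 1024),
            new ChunkInfo(5120, 0)); // empty final chunk, as produced by HDDS-10682
        System.out.println(readableLength(chunks)); // prints 5120
    }
}
```

Filtering the chunk list before reading keeps the tolerance in one place, so the normal read and reconstruction paths could both rely on it; whether the actual patch is structured this way is not stated in this PR.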
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10704
How was this patch tested?
Adding a unit test for this issue is not feasible in a reasonable amount of time.
I tested this manually in a Docker cluster.
First, I created a block with the problem in a build without the fix for HDDS-10682.
Then I attempted to read the block in a docker-compose cluster and verified that the log message was produced:
I removed the Docker containers for datanode-2 and datanode-5 and allowed reconstruction to happen (this creates the zero-length final chunk).
Then I read the block; note that the added log message is produced:
Finally, I removed the Docker containers for datanode-1 and datanode-3 to force reconstruction using the blocks with zero-length chunks. Previously, reconstruction would have failed indefinitely in this situation. The containers were reconstructed and the expected log message was present on the datanodes: