HDDS-10704. Do not fail read of EC block if the last chunk is empty #6540

sodonnel · 2024-04-16T15:35:27Z

What changes were proposed in this pull request?

Due to HDDS-10682 some EC blocks in a cluster could have an empty final chunk. These blocks will fail to read and could cause data to become unavailable, even though it is still present on disk.

If the last chunk is empty, this should not stop the block from being read.
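For readers who want the idea in code, the tolerance described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual `BlockInputStream` change. `ChunkInfo`, `readable_chunks` and the exact condition are hypothetical. They are inferred from the log message this patch adds, which describes an empty last chunk whose offset equals the block length.

```python
import logging
from dataclasses import dataclass
from typing import List

LOG = logging.getLogger("sketch.BlockInputStream")


@dataclass
class ChunkInfo:
    """Hypothetical stand-in for a chunk's metadata (offset and length in bytes)."""
    offset: int
    length: int


def readable_chunks(chunks: List[ChunkInfo], block_length: int) -> List[ChunkInfo]:
    """Return the chunks that should be read for a block.

    Assumption: a trailing chunk with length 0 whose offset equals the block
    length carries no data, so it is dropped with a warning instead of
    failing the read. Any other empty chunk is still treated as an error.
    """
    usable = list(chunks)
    if usable and usable[-1].length == 0 and usable[-1].offset == block_length:
        LOG.warning("The last chunk is empty; ignoring it")
        usable.pop()
    for chunk in usable:
        if chunk.length <= 0:
            raise ValueError(f"Unexpected empty chunk at offset {chunk.offset}")
    return usable


if __name__ == "__main__":
    one_mib = 1024 * 1024
    block = [ChunkInfo(0, one_mib), ChunkInfo(one_mib, 0)]
    # Only the first chunk is kept; the empty trailing chunk is skipped.
    print(readable_chunks(block, one_mib))
```

The check is deliberately narrow in this sketch: only an empty chunk at the very end of the block is skipped, so an empty chunk anywhere else still surfaces as an error.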

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10704

How was this patch tested?

Adding a unit test for this issue is not easy within a sensible time.

I tested this manually in a Docker cluster.

First, I created a block with the problem in a build without the fix for HDDS-10682.

Then I attempted to read the block in a docker-compose cluster and validated that the log message was produced:

bash-4.2$ dd if=/dev/random of=4mb bs=1024 count=4096
4096+0 records in
4096+0 records out
4194304 bytes (4.2 MB) copied, 0.401662 s, 10.4 MB/s
bash-4.2$ 
bash-4.2$ ozone sh volume create /vol1
bash-4.2$ ozone sh bucket create /vol1/bucket1
bash-4.2$ 
bash-4.2$ ozone sh key put --type=EC --replication=rs-3-2-1024k /vol1/bucket1/4mb 4mb
bash-4.2$ 
bash-4.2$ ozone admin container close 1
bash-4.2$ 
bash-4.2$ ozone admin container info 1
Container id: 1
Pipeline id: 4a6259e9-b22b-422f-9e53-f43df1f4596c
Write PipelineId: 32e2dd3a-d3fe-44bd-918c-78efd1e7afab
Write Pipeline State: OPEN
Container State: CLOSED
Datanodes: [6d9b61b2-60f1-47e0-b33c-31e5fb82c0f9/ozone-datanode-4.ozone_default,
5bf013b0-8bf9-49c5-bec1-a69c702c6764/ozone-datanode-2.ozone_default,
f11ff464-6007-4bac-b79c-33b784e53cc3/ozone-datanode-5.ozone_default,
2261af6f-ecd4-4a18-8eb1-7988155cfc63/ozone-datanode-1.ozone_default,
ae9b7d62-2eb2-43c3-ad68-aa67d8deab70/ozone-datanode-3.ozone_default]
Replicas: [State: CLOSED; ReplicaIndex: 1; Origin: 6d9b61b2-60f1-47e0-b33c-31e5fb82c0f9; Location: 6d9b61b2-60f1-47e0-b33c-31e5fb82c0f9/ozone-datanode-4.ozone_default,
State: CLOSED; ReplicaIndex: 2; Origin: ae9b7d62-2eb2-43c3-ad68-aa67d8deab70; Location: ae9b7d62-2eb2-43c3-ad68-aa67d8deab70/ozone-datanode-3.ozone_default,
State: CLOSED; ReplicaIndex: 3; Origin: f11ff464-6007-4bac-b79c-33b784e53cc3; Location: f11ff464-6007-4bac-b79c-33b784e53cc3/ozone-datanode-5.ozone_default,
State: CLOSED; ReplicaIndex: 4; Origin: 2261af6f-ecd4-4a18-8eb1-7988155cfc63; Location: 2261af6f-ecd4-4a18-8eb1-7988155cfc63/ozone-datanode-1.ozone_default,
State: CLOSED; ReplicaIndex: 5; Origin: 5bf013b0-8bf9-49c5-bec1-a69c702c6764; Location: 5bf013b0-8bf9-49c5-bec1-a69c702c6764/ozone-datanode-2.ozone_default]

I removed the docker containers for datanode-2 and datanode-5 and allowed reconstruction to happen (this creates the zero-length final chunk).
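As background for why replicas of this key differ in length, the per-replica data lengths for a 4 MiB key under `rs-3-2-1024k` can be worked out with a small sketch. It assumes a standard striped EC layout in which the key fits in a single block group and parity cells in a partial stripe are as long as the first data cell. `ec_replica_lengths` is a hypothetical helper, not an Ozone API. Any connection between the shorter replicas and the empty final chunk from HDDS-10682 is an inference, not something stated in this PR.

```python
from typing import List


def ec_replica_lengths(key_len: int, data: int, parity: int, cell: int) -> List[int]:
    """Bytes stored per replica index (data indexes first, then parity).

    Assumes the key fits in one block group and that a parity cell in a
    partial stripe is as long as the first data cell of that stripe.
    """
    stripe = data * cell
    full_stripes, remainder = divmod(key_len, stripe)
    lengths = []
    for i in range(data):
        # Bytes of the partial last stripe that land in data cell i.
        last_cell = min(max(remainder - i * cell, 0), cell)
        lengths.append(full_stripes * cell + last_cell)
    parity_len = full_stripes * cell + min(remainder, cell)
    lengths.extend([parity_len] * parity)
    return lengths


if __name__ == "__main__":
    # 4 MiB key, rs-3-2-1024k: 3 data, 2 parity, 1 MiB cells.
    print(ec_replica_lengths(4 * 1024 * 1024, 3, 2, 1024 * 1024))
    # [2097152, 1048576, 1048576, 2097152, 2097152]
```

Under these assumptions, replica indexes 2 and 3 hold one 1 MiB cell each, while index 1 and both parity replicas hold two.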

Then read the block - note the added log message is produced:

bash-4.2$ export OZONE_ROOT_LOGGER=INFO,console
bash-4.2$ ozone sh key get /vol1/bucket1/4mb 4mb_copy5
2024-04-16 15:23:14,426 [main] INFO protocolPB.OmTransportFactory: Loading OM transport implementation org.apache.hadoop.ozone.om.protocolPB.Hadoop3OmTransportFactory as specified by configuration.
2024-04-16 15:23:15,055 [main] INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
2024-04-16 15:23:15,099 [main] INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2024-04-16 15:23:15,100 [main] INFO impl.MetricsSystemImpl: XceiverClientMetrics metrics system started
2024-04-16 15:23:15,481 [main] WARN storage.BlockInputStream: The last chunk is empty for container/block 1/113750153625600001 with an offset of the block length. Likely due to HDDS-10682. This is safe to ignore.
2024-04-16 15:23:15,502 [main] WARN storage.BlockInputStream: The last chunk is empty for container/block 1/113750153625600001 with an offset of the block length. Likely due to HDDS-10682. This is safe to ignore.

Finally, I removed the docker containers for datanode-1 and datanode-3 to force reconstruction using the blocks with zero-length chunks. Previously, reconstruction would have failed forever in this situation. The containers were reconstructed and the expected log message was present on the datanodes:

ozone % docker-compose logs | grep "last chunk is empty for"
datanode-9   | 2024-04-16 15:24:52,181 [b434f715-8b74-4b12-a6c5-ee583a27a087-ec-reconstruct-reader-TID-2] WARN storage.BlockInputStream: The last chunk is empty for container/block 1/113750153625600001 with an offset of the block length. Likely due to HDDS-10682. This is safe to ignore.
datanode-9   | 2024-04-16 15:24:52,181 [b434f715-8b74-4b12-a6c5-ee583a27a087-ec-reconstruct-reader-TID-1] WARN storage.BlockInputStream: The last chunk is empty for container/block 1/113750153625600001 with an offset of the block length. Likely due to HDDS-10682. This is safe to ignore.

adoroszlai

Thanks @sodonnel for the patch, LGTM.

> If the last chunk is empty, this should not stop the block from being empty.

I assume this should be: "... should not stop the block from being read".

sodonnel · 2024-04-17T08:25:25Z

> I assume this should be: "... should not stop the block from being read".

@adoroszlai Yea, you are correct. I have updated the PR description to fix that typo. Thanks for the review!

S O'Donnell added 2 commits April 16, 2024 16:31

HDDS-10704. Do not fail read of EC block if the last chunk is empty

cdf6704

Make variable final

1b12b88

adoroszlai reviewed Apr 16, 2024

View reviewed changes

sodonnel merged commit 4f9b86e into apache:master Apr 17, 2024

sodonnel added a commit to sodonnel/hadoop-ozone that referenced this pull request Apr 17, 2024

HDDS-10704. Do not fail read of EC block if the last chunk is empty (a…

962a72d

…pache#6540) (cherry picked from commit 4f9b86e)

xichen01 pushed a commit to xichen01/ozone that referenced this pull request Apr 17, 2024

HDDS-10704. Do not fail read of EC block if the last chunk is empty (a…

63e91b9

…pache#6540) (cherry picked from commit 4f9b86e)

xichen01 pushed a commit to xichen01/ozone that referenced this pull request Apr 18, 2024

HDDS-10704. Do not fail read of EC block if the last chunk is empty (a…

c05ce79

…pache#6540) (cherry picked from commit 4f9b86e)

xichen01 pushed a commit to xichen01/ozone that referenced this pull request Apr 18, 2024

HDDS-10704. Do not fail read of EC block if the last chunk is empty (a…

333b142

…pache#6540) (cherry picked from commit 4f9b86e)

xichen01 mentioned this pull request Apr 18, 2024

[DO NOT MERGE] Backport some fixes from master to ozone-1.4 #6553

Merged

sodonnel added a commit to sodonnel/hadoop-ozone that referenced this pull request Apr 18, 2024

HDDS-10704. Do not fail read of EC block if the last chunk is empty (a…

1f16f43

…pache#6540) (cherry picked from commit 4f9b86e) (cherry picked from commit 962a72d)

jojochuang pushed a commit to jojochuang/ozone that referenced this pull request May 29, 2024

HDDS-10704. Do not fail read of EC block if the last chunk is empty (a…

c8eebeb

…pache#6540) (cherry picked from commit 4f9b86e)

swamirishi pushed a commit to swamirishi/ozone that referenced this pull request Jun 10, 2024

CDPD-68924. HDDS-10704. Do not fail read of EC block if the last chun…

86dfe71

…k is empty (apache#6540) (cherry picked from commit 4f9b86e) Change-Id: If2d87bdea7be0292c2dde1d5556dccc0ff1c30ff
