
HDFS-16544. EC decoding failed due to invalid buffer #4179

Merged
tasanuma merged 1 commit into apache:trunk from liubingxing:HDFS-16544
Apr 20, 2022

@liubingxing
Contributor

@liubingxing liubingxing commented Apr 15, 2022

In HDFS-16538, we found an EC file decoding bug that occurs when reads of more than one data block fail.

We have now found another bug, triggered by #StatefulStripeReader.decode.

The following error occurs when reading an EC file that is longer than one stripe and has one corrupted data block plus a corrupted first parity block:

org.apache.hadoop.HadoopIllegalArgumentException: Invalid buffer found, not allowing null
    at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkOutputBuffers(ByteBufferDecodingState.java:132)
    at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:48)
    at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
    at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
    at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:435)
    at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
    at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:392)
    at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:315)
    at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:408)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:918)
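To make the trigger condition concrete: with 6 data units, one stripe holds 6 * cellSize bytes, so any file longer than that forces the client to read a second stripe, which again needs data block[0]. This is a hypothetical sketch, not HDFS client code; it assumes a 1 MiB cell size (as in the RS-6-3-1024k policy) and plain cell-by-cell round-robin striping.

```java
// Hypothetical helper illustrating the striped layout of a 6-data-unit policy.
// Assumes a 1 MiB cell size; uses no HDFS classes.
class StripeGeometry {
  static final int DATA_BLK_NUM = 6;
  static final int CELL_SIZE = 1024 * 1024;
  static final long STRIPE_SIZE = (long) DATA_BLK_NUM * CELL_SIZE;

  // Index of the stripe that contains the given file offset.
  static long stripeIndex(long offset) {
    return offset / STRIPE_SIZE;
  }

  // Index (0..5) of the data block that holds the given file offset.
  static int dataBlockIndex(long offset) {
    return (int) ((offset / CELL_SIZE) % DATA_BLK_NUM);
  }

  // Number of stripes spanned by a file of the given length (ceiling division).
  static long stripeCount(long fileLength) {
    return (fileLength + STRIPE_SIZE - 1) / STRIPE_SIZE;
  }

  public static void main(String[] args) {
    long fileLength = STRIPE_SIZE + 1; // just over one stripe
    // prints "stripes spanned: 2"
    System.out.println("stripes spanned: " + stripeCount(fileLength));
    // prints "last byte is in stripe 1, data block 0"
    System.out.println("last byte is in stripe " + stripeIndex(fileLength - 1)
        + ", data block " + dataBlockIndex(fileLength - 1));
  }
}
```

A file only one byte longer than a stripe already places its last byte on block[0] of the second stripe, which is exactly where the already-failed readers are consulted again.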

For example, suppose we use ec(6+3) and data block[0] and the first parity block[6] are corrupted.

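  1. The readers for block[0] and block[6] are closed after the first stripe of the EC file has been read.
  2. When the client reads the second stripe, #prepareParityChunk is called for block[6].
  3. decodeInputs[6] is not constructed, because the reader for block[6] was already closed (its shouldSkip flag is set), so the method marks the chunk MISSING and returns early.
  4. As a result, block[6] is treated as an erased unit during decoding, but its buffer in decodeInputs[6] is null, so ByteBufferDecodingState.checkOutputBuffers throws the "Invalid buffer found, not allowing null" exception shown above.
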
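The steps above can be modelled in a small, self-contained sketch. Everything here is hypothetical: decodeInputs, shouldSkip and the index layout echo the PR text, but the class is not Hadoop code. It also assumes, consistent with the checkOutputBuffers frame in the stack trace, that every unit marked MISSING, parity included, is handed to the decoder as an output buffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the stateful read path for ec(6+3) with
// block[0] and block[6] corrupted. Names echo the PR text; this is not Hadoop code.
class NullOutputDemo {
  static final int DATA_BLK_NUM = 6;
  static final int TOTAL = 9;       // 6 data + 3 parity units
  static final int CELL_SIZE = 16;  // tiny cell size, for illustration only

  // Reads one stripe. shouldSkip outlives the call, like per-block reader state;
  // returns the MISSING units that have no decode buffer.
  static List<Integer> readStripe(boolean[] shouldSkip) {
    ByteBuffer[] decodeInputs = new ByteBuffer[TOTAL];
    boolean[] missing = new boolean[TOTAL];

    // Data-unit buffers are always prepared, so block[0] gets an output buffer.
    for (int i = 0; i < DATA_BLK_NUM; i++) {
      decodeInputs[i] = ByteBuffer.allocate(CELL_SIZE);
    }
    missing[0] = true;     // block[0] is corrupted in every stripe
    shouldSkip[0] = true;

    // Mirrors the quoted prepareParityChunk(6): early return for a closed reader.
    int index = DATA_BLK_NUM;
    if (shouldSkip[index]) {
      missing[index] = true;        // MISSING, but decodeInputs[6] stays null
    } else {
      decodeInputs[index] = ByteBuffer.allocate(CELL_SIZE);
      missing[index] = true;        // the read of block[6] then fails
      shouldSkip[index] = true;     // so its reader is closed for later stripes
    }

    // Assumption: every MISSING unit, data or parity, becomes a decoder output.
    List<Integer> nullOutputs = new ArrayList<>();
    for (int i = 0; i < TOTAL; i++) {
      if (missing[i] && decodeInputs[i] == null) {
        nullOutputs.add(i);
      }
    }
    return nullOutputs;
  }

  public static void main(String[] args) {
    boolean[] shouldSkip = new boolean[TOTAL];
    System.out.println("stripe 1: " + readStripe(shouldSkip)); // stripe 1: []
    System.out.println("stripe 2: " + readStripe(shouldSkip)); // stripe 2: [6]
  }
}
```

The first stripe succeeds because the parity buffer is created before the read of block[6] fails; only the second stripe, where the early return runs first, leaves a null output.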
boolean prepareParityChunk(int index) {
  Preconditions.checkState(index >= dataBlkNum
      && alignedStripe.chunks[index] == null);
  if (readerInfos[index] != null && readerInfos[index].shouldSkip) {
    alignedStripe.chunks[index] = new StripingChunk(StripingChunk.MISSING);
    // we have failed the block reader before
    return false;
  }
  final int parityIndex = index - dataBlkNum;
  ByteBuffer buf = dfsStripedInputStream.getParityBuffer().duplicate();
  buf.position(cellSize * parityIndex);
  buf.limit(cellSize * parityIndex + (int) alignedStripe.range.spanInBlock);
  decodeInputs[index] =
      new ECChunk(buf.slice(), 0, (int) alignedStripe.range.spanInBlock);
  alignedStripe.chunks[index] =
      new StripingChunk(decodeInputs[index].getBuffer());
  return true;
} 
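One way to avoid the null output buffer is to build decodeInputs[index] before the shouldSkip early return, so a MISSING parity unit still has a buffer for the decoder to write into. The sketch illustrates that reordering in a standalone model. It is an assumption about a possible fix, not a copy of the change merged as 76bbd17; spanInBlock stands in for alignedStripe.range.spanInBlock, and Preconditions.checkState is replaced by a plain IllegalStateException.

```java
import java.nio.ByteBuffer;

// Hypothetical, standalone model of prepareParityChunk in which the decode
// buffer is sliced out before the shouldSkip check. Field names mirror the
// excerpt; chunk states are reduced to a boolean "missing" flag.
class PrepareParityChunkSketch {
  final int dataBlkNum = 6;
  final int parityBlkNum = 3;
  final int cellSize = 16; // tiny cell size, for illustration only
  final ByteBuffer parityBuffer = ByteBuffer.allocate(cellSize * parityBlkNum);
  final ByteBuffer[] decodeInputs = new ByteBuffer[dataBlkNum + parityBlkNum];
  final boolean[] missing = new boolean[dataBlkNum + parityBlkNum];
  final boolean[] shouldSkip = new boolean[dataBlkNum + parityBlkNum];

  boolean prepareParityChunk(int index, int spanInBlock) {
    if (index < dataBlkNum || missing[index]) {
      throw new IllegalStateException("expected an unprepared parity index");
    }
    final int parityIndex = index - dataBlkNum;
    // Slice this parity unit's region out of the shared parity buffer first,
    // so a decode buffer exists even if the unit ends up MISSING.
    ByteBuffer buf = parityBuffer.duplicate();
    buf.position(cellSize * parityIndex);
    buf.limit(cellSize * parityIndex + spanInBlock);
    decodeInputs[index] = buf.slice();
    if (shouldSkip[index]) {
      // The block reader failed before: mark MISSING but keep the buffer,
      // so the decoder can write the reconstructed unit into it.
      missing[index] = true;
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    PrepareParityChunkSketch s = new PrepareParityChunkSketch();
    s.shouldSkip[6] = true; // reader for block[6] was closed on an earlier stripe
    boolean readable = s.prepareParityChunk(6, 10);
    // prints "false true 10"
    System.out.println(readable + " " + s.missing[6] + " " + s.decodeInputs[6].capacity());
  }
}
```

Compared with the quoted method, the model changes only the ordering: the early return still reports the unit as unreadable, but it no longer leaves decodeInputs[index] unset.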

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 55s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 40s Maven dependency ordering for branch
+1 💚 mvninstall 28m 15s trunk passed
+1 💚 compile 6m 41s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 6m 17s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 17s trunk passed
+1 💚 mvnsite 2m 26s trunk passed
+1 💚 javadoc 1m 46s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 12s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 15s trunk passed
+1 💚 shadedclient 25m 52s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 22s Maven dependency ordering for patch
+1 💚 mvninstall 2m 8s the patch passed
+1 💚 compile 6m 34s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 6m 34s the patch passed
+1 💚 compile 6m 6s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 6m 6s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 8s the patch passed
+1 💚 mvnsite 2m 17s the patch passed
+1 💚 javadoc 1m 30s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 1m 59s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 6s the patch passed
+1 💚 shadedclient 25m 44s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 18s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 359m 26s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 42s The patch does not generate ASF License warnings.
511m 13s
Reason Tests
Failed junit tests hadoop.hdfs.TestReadStripedFileWithDecoding
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4179/1/artifact/out/Dockerfile
GITHUB PR #4179
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 6eebe1778dfc 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f1cd9cd8bafc2688da218dfd34504cf73d12fa55
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4179/1/testReport/
Max. process+thread count 2098 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4179/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@jojochuang
Contributor

@tasanuma @ferhui fyi

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 54s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 42s Maven dependency ordering for branch
+1 💚 mvninstall 28m 21s trunk passed
+1 💚 compile 6m 50s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 6m 27s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 31s trunk passed
+1 💚 mvnsite 2m 50s trunk passed
+1 💚 javadoc 2m 9s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 32s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 36s trunk passed
+1 💚 shadedclient 25m 48s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 26s Maven dependency ordering for patch
+1 💚 mvninstall 2m 16s the patch passed
+1 💚 compile 6m 41s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 6m 41s the patch passed
+1 💚 compile 6m 21s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 6m 21s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 16s the patch passed
+1 💚 mvnsite 2m 24s the patch passed
+1 💚 javadoc 1m 42s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 9s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 24s the patch passed
+1 💚 shadedclient 25m 54s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 27s hadoop-hdfs-client in the patch passed.
-1 ❌ unit 370m 32s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 1m 6s The patch does not generate ASF License warnings.
527m 12s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
hadoop.hdfs.TestClientProtocolForPipelineRecovery
hadoop.hdfs.TestReplaceDatanodeFailureReplication
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4179/2/artifact/out/Dockerfile
GITHUB PR #4179
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 542720fc08b8 4.15.0-166-generic #174-Ubuntu SMP Wed Dec 8 19:07:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 2b6adbc
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4179/2/testReport/
Max. process+thread count 2053 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4179/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Member

@tasanuma tasanuma left a comment

@liubingxing Thanks for finding and fixing the issue. I went into the source code, and the fix makes sense to me. +1.

@tasanuma tasanuma merged commit 76bbd17 into apache:trunk Apr 20, 2022
tasanuma pushed a commit that referenced this pull request Apr 20, 2022
tasanuma pushed a commit that referenced this pull request Apr 20, 2022
@liubingxing
Contributor Author

Thanks @tasanuma for merging, and thanks @jojochuang.

HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
…che#4179)

(cherry picked from commit 76bbd17)
(cherry picked from commit 4fed01e)
Change-Id: I061978cbd2ba29f09c2cadceb046e51bfa13e612
LiuGuH pushed a commit to LiuGuH/hadoop that referenced this pull request Mar 26, 2024
LiuGuH pushed a commit to LiuGuH/hadoop that referenced this pull request Mar 29, 2024