
HDFS-16538. EC decoding failed due to not enough valid inputs #4167

Merged
tasanuma merged 1 commit into apache:trunk from liubingxing:HDFS-16538
Apr 19, 2022

@liubingxing liubingxing commented Apr 13, 2022

We hit the error below when more than one block read fails in StripeReader#readStripe().
Our cluster uses the EC policy ec(6+3).

Caused by: org.apache.hadoop.HadoopIllegalArgumentException: No enough valid inputs are provided, not recoverable
        at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.checkInputBuffers(ByteBufferDecodingState.java:119)
        at org.apache.hadoop.io.erasurecode.rawcoder.ByteBufferDecodingState.<init>(ByteBufferDecodingState.java:47)
        at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:86)
        at org.apache.hadoop.io.erasurecode.rawcoder.RawErasureDecoder.decode(RawErasureDecoder.java:170)
        at org.apache.hadoop.hdfs.StripeReader.decodeAndFillBuffer(StripeReader.java:462)
        at org.apache.hadoop.hdfs.StatefulStripeReader.decode(StatefulStripeReader.java:94)
        at org.apache.hadoop.hdfs.StripeReader.readStripe(StripeReader.java:406)
        at org.apache.hadoop.hdfs.DFSStripedInputStream.readOneStripe(DFSStripedInputStream.java:327)
        at org.apache.hadoop.hdfs.DFSStripedInputStream.readWithStrategy(DFSStripedInputStream.java:420)
        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:892)
        at java.base/java.io.DataInputStream.read(DataInputStream.java:149)
        at java.base/java.io.DataInputStream.read(DataInputStream.java:149)

while (!futures.isEmpty()) {
  try {
    StripingChunkReadResult r = StripedBlockUtil
        .getNextCompletedStripedRead(service, futures, 0);
    dfsStripedInputStream.updateReadStats(r.getReadStats());
    DFSClient.LOG.debug("Read task returned: {}, for stripe {}",
        r, alignedStripe);
    StripingChunk returnedChunk = alignedStripe.chunks[r.index];
    Preconditions.checkNotNull(returnedChunk);
    Preconditions.checkState(returnedChunk.state == StripingChunk.PENDING);

    if (r.state == StripingChunkReadResult.SUCCESSFUL) {
      returnedChunk.state = StripingChunk.FETCHED;
      alignedStripe.fetchedChunksNum++;
      updateState4SuccessRead(r);
      if (alignedStripe.fetchedChunksNum == dataBlkNum) {
        clearFutures();
        break;
      }
    } else {
      returnedChunk.state = StripingChunk.MISSING;
      // close the corresponding reader
      dfsStripedInputStream.closeReader(readerInfos[r.index]);

      final int missing = alignedStripe.missingChunksNum;
      alignedStripe.missingChunksNum++;
      checkMissingBlocks();

      readDataForDecoding();
      readParityChunks(alignedStripe.missingChunksNum - missing);
    }
    // ... (remainder of the try block and loop omitted from this excerpt)

This error can be triggered by StatefulStripeReader#decode.

The reason is as follows:

  1. If more than one data block read fails, readDataForDecoding is called multiple times.
  2. Each call re-initializes the decodeInputs array.
  3. The parity data that readParityChunks previously placed in the decodeInputs array is therefore set to null.
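
To make the three steps above concrete, here is a minimal, self-contained sketch of the failure mode. It is not the Hadoop code: the class `DecodeInputsSketch`, its methods, and the `guardInit` flag are hypothetical stand-ins, and the null-check guard is only one possible way to avoid the re-initialization (the actual patch may differ). It assumes ec(6+3), that each failed data read triggers one rebuild of the input array followed by one parity read, and that decoding needs at least 6 non-null inputs.

```java
import java.nio.ByteBuffer;

// Hypothetical model of the decode-input bookkeeping; not the real StripeReader.
class DecodeInputsSketch {
  static final int DATA = 6;    // data units in ec(6+3)
  static final int PARITY = 3;  // parity units in ec(6+3)

  private ByteBuffer[] decodeInputs;
  private final boolean guardInit;  // assumption: allocate the array only once
  private int nextParity = DATA;    // index of the next parity chunk to read

  DecodeInputsSketch(boolean guardInit) {
    this.guardInit = guardInit;
  }

  // Stand-in for readDataForDecoding: prepares the array and fills healthy data chunks.
  void prepareInputs(boolean[] dataMissing) {
    if (!guardInit || decodeInputs == null) {
      // Without the guard, every call discards parity chunks read earlier.
      decodeInputs = new ByteBuffer[DATA + PARITY];
    }
    for (int i = 0; i < DATA; i++) {
      decodeInputs[i] = dataMissing[i] ? null : ByteBuffer.allocate(1);
    }
  }

  // Stand-in for readParityChunks(num): reads the next num parity chunks.
  void readParity(int num) {
    for (int k = 0; k < num && nextParity < DATA + PARITY; k++) {
      decodeInputs[nextParity++] = ByteBuffer.allocate(1);
    }
  }

  // Number of non-null inputs that a decoder would see.
  int validInputs() {
    int count = 0;
    for (ByteBuffer b : decodeInputs) {
      if (b != null) {
        count++;
      }
    }
    return count;
  }

  // Simulates data-block read failures arriving one at a time.
  static int simulate(boolean guardInit, int... failedDataBlocks) {
    DecodeInputsSketch s = new DecodeInputsSketch(guardInit);
    boolean[] missing = new boolean[DATA];
    for (int idx : failedDataBlocks) {
      missing[idx] = true;
      s.prepareInputs(missing);
      s.readParity(1);
    }
    return s.validInputs();
  }

  public static void main(String[] args) {
    System.out.println("without guard: " + simulate(false, 0, 1));
    System.out.println("with guard: " + simulate(true, 0, 1));
  }
}
```

With two failed data blocks, the unguarded run ends with 5 valid inputs (4 data + 1 parity), one short of the 6 needed, which corresponds to the "No enough valid inputs" condition. The guarded run keeps both parity chunks and ends with 6.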

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 42s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 38m 25s trunk passed
+1 💚 compile 1m 2s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 54s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 34s trunk passed
+1 💚 mvnsite 1m 0s trunk passed
+1 💚 javadoc 0m 50s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 40s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 2m 46s trunk passed
+1 💚 shadedclient 22m 25s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 49s the patch passed
+1 💚 compile 0m 54s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 54s the patch passed
+1 💚 compile 0m 47s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 47s the patch passed
+1 💚 blanks 0m 1s The patch has no blanks issues.
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 50s the patch passed
+1 💚 javadoc 0m 34s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 32s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 2m 28s the patch passed
+1 💚 shadedclient 21m 47s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 22s hadoop-hdfs-client in the patch passed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
100m 29s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4167/1/artifact/out/Dockerfile
GITHUB PR #4167
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux f5521b8832b5 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 8846746dd03b9b54a7db1d7d79f2835eb1c6adb6
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4167/1/testReport/
Max. process+thread count 543 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4167/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 45s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 39m 1s trunk passed
+1 💚 compile 1m 2s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 0m 55s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 0m 35s trunk passed
+1 💚 mvnsite 1m 1s trunk passed
+1 💚 javadoc 0m 50s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 40s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 2m 47s trunk passed
+1 💚 shadedclient 22m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 0m 50s the patch passed
+1 💚 compile 0m 53s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 0m 53s the patch passed
+1 💚 compile 0m 45s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 0m 45s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 19s the patch passed
+1 💚 mvnsite 0m 48s the patch passed
+1 💚 javadoc 0m 34s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 0m 32s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 2m 30s the patch passed
+1 💚 shadedclient 21m 49s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 23s hadoop-hdfs-client in the patch passed.
+1 💚 asflicense 0m 38s The patch does not generate ASF License warnings.
101m 35s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4167/2/artifact/out/Dockerfile
GITHUB PR #4167
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux 6a5ad419e32a 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 0507620c617b7868361a484773d3f74f0a1dd8dc
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4167/2/testReport/
Max. process+thread count 548 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client U: hadoop-hdfs-project/hadoop-hdfs-client
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4167/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

@liubingxing
Contributor Author

@tasanuma Please take a look at this.

@hadoop-yetus

🎊 +1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 1m 0s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ trunk Compile Tests _
+0 🆗 mvndep 15m 57s Maven dependency ordering for branch
+1 💚 mvninstall 27m 48s trunk passed
+1 💚 compile 6m 37s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 compile 6m 16s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 checkstyle 1m 25s trunk passed
+1 💚 mvnsite 2m 43s trunk passed
+1 💚 javadoc 2m 0s trunk passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 17s trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 40s trunk passed
+1 💚 shadedclient 25m 54s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 31s Maven dependency ordering for patch
+1 💚 mvninstall 2m 15s the patch passed
+1 💚 compile 6m 46s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javac 6m 46s the patch passed
+1 💚 compile 6m 9s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 javac 6m 9s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 10s the patch passed
+1 💚 mvnsite 2m 18s the patch passed
+1 💚 javadoc 1m 36s the patch passed with JDK Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04
+1 💚 javadoc 2m 7s the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
+1 💚 spotbugs 6m 26s the patch passed
+1 💚 shadedclient 26m 7s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 2m 27s hadoop-hdfs-client in the patch passed.
+1 💚 unit 234m 43s hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 51s The patch does not generate ASF License warnings.
389m 33s
Subsystem Report/Notes
Docker ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4167/3/artifact/out/Dockerfile
GITHUB PR #4167
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell
uname Linux a7b6b3da85bb 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 22359a9
Default Java Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.14.1+1-Ubuntu-0ubuntu1.20.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4167/3/testReport/
Max. process+thread count 3058 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs-client hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4167/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

This message was automatically generated.

Member

@tasanuma tasanuma left a comment

@liubingxing Thanks for finding the bug and fixing it. +1. I'm a bit surprised that we didn't cover this common situation.

@tasanuma tasanuma merged commit 52e152f into apache:trunk Apr 19, 2022
tasanuma pushed a commit that referenced this pull request Apr 19, 2022
Co-authored-by: liubingxing <liubingxing@bigo.sg>
(cherry picked from commit 52e152f)
tasanuma pushed a commit that referenced this pull request Apr 19, 2022
Co-authored-by: liubingxing <liubingxing@bigo.sg>
(cherry picked from commit 52e152f)
@liubingxing
Contributor Author

liubingxing commented Apr 19, 2022

@tasanuma Thanks for the review and the merge. I found another bug related to EC decoding in HDFS-16544; please take a look. Thank you very much.

Xushaohong pushed a commit to Xushaohong/hadoop that referenced this pull request Nov 10, 2022
…nputs apache#4167 (merge request !715)

Squash merge branch 'THADOOP-402' into 'release-3.2.1-tq-0.2'
HarshitGupta11 pushed a commit to HarshitGupta11/hadoop that referenced this pull request Nov 28, 2022
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
…puts (apache#4167)

Co-authored-by: liubingxing <liubingxing@bigo.sg>
(cherry picked from commit 52e152f)
(cherry picked from commit d993d22)
Change-Id: I7e6b6054110dc4fcbcd959819c727e7f5b0d9eec
LiuGuH pushed a commit to LiuGuH/hadoop that referenced this pull request Mar 26, 2024
LiuGuH pushed a commit to LiuGuH/hadoop that referenced this pull request Mar 29, 2024