HDFS-17374. EC: StripedBlockReader#newConnectedPeer should set SO_TIMEOUT and SO_KEEPALIVE #6536

Open: wants to merge 2 commits into base: trunk

Conversation

@hfutatzhanghb (Contributor) commented Feb 7, 2024

Description of PR

Refer to HDFS-17374.
We hit a strange and serious problem on one of our production clusters that uses EC.
The problem reproduces every time we write a large volume of data into this EC cluster while the network card is saturated. After the writes finish, many half-open connections remain and are never released until we restart the DataNode.
After digging through logs and code, we suspect the cause is that StripedBlockReader#newConnectedPeer does not enable TCP keepalive on the socket.
This problem is very serious because it can exhaust the DataNode's available ports.
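
For readers unfamiliar with the two socket options named in the title, here is a minimal sketch of what setting them looks like with plain `java.net`. It is not the patch itself: the class `PeerSocketSketch`, the method `connect`, and the timeout value are hypothetical names chosen for illustration, and the real change would go through Hadoop's own peer and socket helpers.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class PeerSocketSketch {

    /**
     * Hypothetical helper: opens a TCP connection with SO_KEEPALIVE enabled
     * and SO_TIMEOUT set, so a dead peer is eventually detected instead of
     * leaving a half-open connection behind.
     */
    static Socket connect(InetSocketAddress addr, int timeoutMs) throws IOException {
        Socket sock = new Socket();
        boolean success = false;
        try {
            // SO_KEEPALIVE: the OS probes an idle connection and reports an
            // error if the peer no longer answers.
            sock.setKeepAlive(true);
            sock.connect(addr, timeoutMs);
            // SO_TIMEOUT: a blocking read fails with SocketTimeoutException
            // after timeoutMs instead of waiting forever.
            sock.setSoTimeout(timeoutMs);
            success = true;
            return sock;
        } finally {
            if (!success) {
                sock.close(); // do not leak the descriptor on failure
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // A local listener stands in for the remote DataNode.
        try (ServerSocket server = new ServerSocket(0);
             Socket sock = connect(
                 new InetSocketAddress(InetAddress.getLoopbackAddress(),
                                       server.getLocalPort()),
                 5000)) {
            System.out.println("keepalive=" + sock.getKeepAlive()
                + " soTimeoutMs=" + sock.getSoTimeout());
        }
    }
}
```

Note that keepalive probe timing is controlled by the operating system rather than by Java. On Linux the first probe is sent only after `tcp_keepalive_time` seconds of idleness (7200 by default), so the read timeout is usually what bounds a stuck read in practice.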


@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 41m 38s trunk passed
+1 💚 compile 1m 19s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 1m 11s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 1m 9s trunk passed
+1 💚 mvnsite 1m 18s trunk passed
+1 💚 javadoc 1m 5s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 35s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 3m 14s trunk passed
+1 💚 shadedclient 34m 20s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 7s the patch passed
+1 💚 compile 1m 11s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 1m 11s the patch passed
+1 💚 compile 1m 5s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 1m 5s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 0m 57s the patch passed
+1 💚 mvnsite 1m 9s the patch passed
+1 💚 javadoc 0m 52s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 28s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 3m 16s the patch passed
+1 💚 shadedclient 34m 40s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 261m 3s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 42s The patch does not generate ASF License warnings.
395m 31s
Reason Tests
Failed junit tests hadoop.hdfs.TestWriteRead
hadoop.hdfs.TestStripedFileAppend
hadoop.hdfs.TestReadStripedFileWithDecodingCorruptData
hadoop.hdfs.TestDFSStripedOutputStream
hadoop.hdfs.TestErasureCodingPolicyWithSnapshotWithRandomECPolicy
hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy
hadoop.hdfs.TestDFSStripedInputStream
hadoop.hdfs.server.datanode.TestDirectoryScanner
hadoop.hdfs.TestFileAppend2
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6536/1/artifact/out/Dockerfile
GITHUB PR #6536
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 6495c596dca5 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 8d2e864
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6536/1/testReport/
Max. process+thread count 3435 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6536/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hfutatzhanghb
Contributor Author

@Hexiaoqiao @zhangshuyan0 @tomscut @tasanuma Sir, PTAL when you have free time, thanks a lot~

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 32s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 44m 27s trunk passed
+1 💚 compile 1m 22s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 1m 18s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 1m 11s trunk passed
+1 💚 mvnsite 1m 23s trunk passed
+1 💚 javadoc 1m 5s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 31s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 3m 13s trunk passed
+1 💚 shadedclient 35m 45s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 15s the patch passed
+1 💚 compile 1m 13s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 1m 13s the patch passed
+1 💚 compile 1m 13s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 1m 13s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 2s the patch passed
+1 💚 mvnsite 1m 13s the patch passed
+1 💚 javadoc 0m 51s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 28s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 3m 16s the patch passed
+1 💚 shadedclient 36m 37s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 242m 52s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 45s The patch does not generate ASF License warnings.
384m 23s
Reason Tests
Failed junit tests hadoop.hdfs.server.namenode.TestDefaultBlockPlacementPolicy
hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor
hadoop.hdfs.server.namenode.top.window.TestRollingWindow
hadoop.hdfs.TestDFSStripedOutputStream
hadoop.hdfs.TestReconstructStripedFileWithRandomECPolicy
hadoop.hdfs.TestReadStripedFileWithDNFailure
hadoop.hdfs.server.namenode.TestAddStripedBlocks
hadoop.hdfs.TestErasureCodingPolicies
hadoop.hdfs.TestDecommissionWithBackoffMonitor
hadoop.hdfs.TestFileChecksum
hadoop.hdfs.server.namenode.TestCacheDirectivesWithViewDFS
hadoop.hdfs.server.namenode.TestFSEditLogLoader
hadoop.hdfs.TestReconstructStripedFile
hadoop.hdfs.server.namenode.TestHostsFiles
hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6536/3/artifact/out/Dockerfile
GITHUB PR #6536
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux b1487e605ff0 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / f775e5a
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6536/3/testReport/
Max. process+thread count 3292 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6536/3/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 32s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 43m 15s trunk passed
+1 💚 compile 1m 18s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 1m 13s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 checkstyle 1m 6s trunk passed
+1 💚 mvnsite 1m 20s trunk passed
+1 💚 javadoc 1m 2s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 31s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 3m 19s trunk passed
+1 💚 shadedclient 35m 37s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 1m 9s the patch passed
+1 💚 compile 1m 9s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 1m 9s the patch passed
+1 💚 compile 1m 6s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 1m 6s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 checkstyle 1m 3s the patch passed
+1 💚 mvnsite 1m 15s the patch passed
+1 💚 javadoc 0m 51s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 1m 28s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 spotbugs 3m 30s the patch passed
+1 💚 shadedclient 36m 20s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 268m 5s /patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt hadoop-hdfs in the patch passed.
+1 💚 asflicense 0m 48s The patch does not generate ASF License warnings.
407m 16s
Reason Tests
Failed junit tests hadoop.hdfs.TestDecommissionWithStriped
hadoop.hdfs.server.namenode.TestQuotaWithStripedBlocksWithRandomECPolicy
hadoop.hdfs.TestDistributedFileSystemWithECFile
hadoop.hdfs.server.namenode.TestRefreshBlockPlacementPolicy
hadoop.hdfs.tools.offlineEditsViewer.TestOfflineEditsViewer
hadoop.hdfs.TestDFSStripedInputStream
hadoop.hdfs.tools.TestDFSZKFailoverController
hadoop.hdfs.TestReconstructStripedFile
hadoop.hdfs.server.namenode.TestBlockUnderConstruction
Subsystem Report/Notes
Docker ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6536/2/artifact/out/Dockerfile
GITHUB PR #6536
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 6935c41c5d00 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / e89126c
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6536/2/testReport/
Max. process+thread count 3211 (vs. ulimit of 5500)
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6536/2/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@tasanuma
Member

Thanks for the PR, @hfutatzhanghb. It makes sense to me. Did you solve the problem in your cluster with this PR?

Although I don't think that the failed tests are related, just to be sure, I reran CI.

@hfutatzhanghb
Contributor Author

@tasanuma Hi, sir. Thanks for reviewing. I cannot yet conclude whether it solves the problem on our cluster, because throughput is low during the Spring Festival. Please allow me to reply in a few days. Thanks a lot, sir.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
_ Prechecks _
+1 💚 dupname 0m 01s No case conflicting files found.
+0 🆗 spotbugs 0m 00s spotbugs executables are not available.
+0 🆗 codespell 0m 00s codespell was not available.
+0 🆗 detsecrets 0m 00s detect-secrets was not available.
+1 💚 @author 0m 01s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 00s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 93m 48s trunk passed
+1 💚 compile 6m 30s trunk passed
+1 💚 checkstyle 5m 09s trunk passed
+1 💚 mvnsite 7m 06s trunk passed
+1 💚 javadoc 6m 30s trunk passed
+1 💚 shadedclient 154m 24s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+1 💚 mvninstall 4m 59s the patch passed
+1 💚 compile 3m 42s the patch passed
+1 💚 javac 3m 42s the patch passed
+1 💚 blanks 0m 00s The patch has no blanks issues.
+1 💚 checkstyle 2m 34s the patch passed
+1 💚 mvnsite 4m 43s the patch passed
+1 💚 javadoc 3m 47s the patch passed
+1 💚 shadedclient 167m 33s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 asflicense 5m 52s The patch does not generate ASF License warnings.
444m 11s
Subsystem Report/Notes
GITHUB PR #6536
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname MINGW64_NT-10.0-17763 1b55e27aca49 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys
Build tool maven
Personality /c/hadoop/dev-support/bin/hadoop.sh
git revision trunk / f775e5a
Default Java Azul Systems, Inc.-1.8.0_332-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6536/1/testReport/
modules C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6536/1/console
versions git=2.44.0.windows.1
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
