@JiaLiangC
Contributor

Description of PR

https://issues.apache.org/jira/browse/HDFS-17287

Hadoop Parallel Compilation Submission Logic

  1. Reasons for Parallel Compilation Failure

    • In a sequential build, modules are compiled one at a time in order, so every module is already built by the time a later module needs it, and no errors occur.
    • In a parallel build, modules are compiled concurrently, and the order is determined only by the declared inter-module dependencies. If Module A depends on Module B, then Module B is compiled before Module A.

    When Hadoop is built in parallel, for example when building hadoop-yarn-project, the compile-time dependencies between modules are correct. The problem arises in the dist packaging stage, which packages the output of all the other compiled modules.

    Behavior of hadoop-yarn-project in Serial Compilation:

    • A serial build compiles the modules listed in the pom one by one, in sequence, and builds hadoop-yarn-project itself only after all of them are done. In the prepare-package phase, maven-assembly-plugin runs and repackages all of the module outputs as described in hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml.

    Behavior of hadoop-yarn-project in Parallel Compilation:

    • A parallel build orders modules only by the dependencies they declare; modules that do not declare a <dependency> on each other are compiled concurrently. Based on the dependencies declared in the pom of hadoop-yarn-project, Maven builds those dependencies first and then builds hadoop-yarn-project, which runs its maven-assembly-plugin execution.
    • However, not all of the modules whose files are packaged by hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml are declared as dependencies of hadoop-yarn-project. As a result, when hadoop-yarn-project is built and maven-assembly-plugin runs, some of the required modules may not have been built yet, which causes the parallel build to fail.
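
The ordering problem described above can be illustrated with a small Python sketch. It is a simplified model, not Maven's actual scheduler: modules are grouped into "waves" in which every module has all of its declared dependencies built. The module names (`api`, `common`, `server`, `dist`) and the helper names are hypothetical and do not correspond to real Hadoop modules.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def build_waves(deps):
    """Group modules into waves; each module's declared deps are in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def not_yet_built(deps, module, needed):
    """Return the modules in `needed` that are not finished when `module` starts."""
    built = set()
    for wave in build_waves(deps):
        if module in wave:
            return sorted(set(needed) - built)
        built.update(wave)
    raise KeyError(module)


# Hypothetical graph: "dist" packages api, common and server,
# but only declares a dependency on api.
DECLARED = {"api": set(), "common": {"api"}, "server": {"common"}, "dist": {"api"}}
NEEDED = {"api", "common", "server"}

# Same graph after declaring everything the assembly needs as a dependency.
FIXED = dict(DECLARED, dist=set(NEEDED))

print(build_waves(DECLARED))
print(not_yet_built(DECLARED, "dist", NEEDED))
print(build_waves(FIXED))
print(not_yet_built(FIXED, "dist", NEEDED))
```

In the first graph, `dist` is scheduled alongside `common` and before `server`, so its packaging step would run against incomplete output. Once all packaged modules are declared, `dist` moves to the last wave.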

    Solution:

    • The solution is relatively straightforward: collect all of the modules referenced in hadoop-assemblies/src/main/resources/assemblies/hadoop-yarn-dist.xml and declare them as dependencies in the pom of hadoop-yarn-project.
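
One way to find the modules that still need to be declared is to compare the assembly descriptor against the pom. The Python sketch below does this on small inline XML strings. It assumes, purely for illustration, that the descriptor lists modules as `<include>groupId:artifactId</include>` entries; the real hadoop-yarn-dist.xml may reference modules differently (for example through file set directories), and the `org.example` coordinates and function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace prefix such as '{http://...}dependency'."""
    return tag.rsplit("}", 1)[-1]


def assembly_includes(assembly_xml):
    """Artifact ids named in <include>groupId:artifactId</include> entries."""
    found = set()
    for el in ET.fromstring(assembly_xml).iter():
        if _local(el.tag) == "include" and el.text and ":" in el.text:
            found.add(el.text.strip().split(":")[1])
    return found


def declared_dependencies(pom_xml):
    """Artifact ids of every <dependency> element in the pom."""
    found = set()
    for dep in ET.fromstring(pom_xml).iter():
        if _local(dep.tag) != "dependency":
            continue
        for child in dep:
            if _local(child.tag) == "artifactId" and child.text:
                found.add(child.text.strip())
    return found


def missing_dependencies(assembly_xml, pom_xml):
    """Modules the assembly packages but the pom does not declare."""
    return sorted(assembly_includes(assembly_xml) - declared_dependencies(pom_xml))


# Hypothetical toy inputs, not the real Hadoop files.
ASSEMBLY = """<assembly><moduleSets><moduleSet><includes>
  <include>org.example:mod-api</include>
  <include>org.example:mod-server</include>
</includes></moduleSet></moduleSets></assembly>"""

POM = """<project xmlns="http://maven.apache.org/POM/4.0.0"><dependencies>
  <dependency><groupId>org.example</groupId><artifactId>mod-api</artifactId></dependency>
</dependencies></project>"""

print(missing_dependencies(ASSEMBLY, POM))
```

Note that this sketch also counts entries under `<dependencyManagement>` as declared, so a real check would need to restrict the search to the project's direct `<dependencies>` section.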

How was this patch tested?

Manually tested on CentOS 8.

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 30s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+0 🆗 mvndep 13m 54s Maven dependency ordering for branch
+1 💚 mvninstall 33m 57s trunk passed
+1 💚 compile 16m 56s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 16m 41s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 mvnsite 5m 4s trunk passed
+1 💚 javadoc 4m 59s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 4m 17s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 shadedclient 132m 4s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 30s Maven dependency ordering for patch
+1 💚 mvninstall 3m 59s the patch passed
+1 💚 compile 17m 50s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 17m 50s the patch passed
+1 💚 compile 17m 30s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 17m 30s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 mvnsite 4m 50s the patch passed
+1 💚 javadoc 5m 3s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 4m 20s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 shadedclient 53m 31s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 239m 50s hadoop-yarn-project in the patch passed.
+1 💚 unit 161m 34s hadoop-mapreduce-project in the patch passed.
+1 💚 asflicense 1m 14s The patch does not generate ASF License warnings.
633m 7s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/1/artifact/out/Dockerfile
GITHUB PR #6373
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint
uname Linux 603f9313796a 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 9410696
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/1/testReport/
Max. process+thread count 2776 (vs. ulimit of 5500)
modules C: hadoop-yarn-project hadoop-mapreduce-project U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/1/console
versions git=2.25.1 maven=3.6.3
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@Hexiaoqiao changed the title from "HDFS-17287: Parallel Maven Build Support for Apache Hadoop" to "HADOOP-19019: Parallel Maven Build Support for Apache Hadoop" on Dec 29, 2023
@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 32s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 40s Maven dependency ordering for branch
+1 💚 mvninstall 35m 22s trunk passed
+1 💚 compile 16m 58s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 15m 4s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 mvnsite 4m 48s trunk passed
+1 💚 javadoc 4m 57s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 4m 23s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 shadedclient 133m 44s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 31s Maven dependency ordering for patch
+1 💚 mvninstall 3m 43s the patch passed
+1 💚 compile 17m 23s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 17m 23s the patch passed
+1 💚 compile 17m 6s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 17m 6s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 mvnsite 5m 12s the patch passed
+1 💚 javadoc 5m 2s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 4m 21s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 shadedclient 54m 22s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 238m 49s /patch-unit-hadoop-yarn-project.txt hadoop-yarn-project in the patch passed.
+1 💚 unit 161m 3s hadoop-mapreduce-project in the patch passed.
+1 💚 asflicense 1m 14s The patch does not generate ASF License warnings.
632m 56s
Reason Tests
Failed junit tests hadoop.yarn.server.applicationhistoryservice.webapp.TestAHSWebServices
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/2/artifact/out/Dockerfile
GITHUB PR #6373
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint
uname Linux d64615eda07c 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / 9410696
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/2/testReport/
Max. process+thread count 2703 (vs. ulimit of 5500)
modules C: hadoop-yarn-project hadoop-mapreduce-project U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/2/console
versions git=2.25.1 maven=3.6.3
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

@JiaLiangC
Contributor Author

@Hexiaoqiao Could you help review this PR?

@Hexiaoqiao
Contributor

@JiaLiangC Thanks for your work and for involving me here. This is a very interesting improvement. I would like to know how much time is saved by switching to a parallel build. Also, besides the hadoop-yarn module, do any other modules need their dependencies set explicitly? Thanks again.

@JiaLiangC
Contributor Author

JiaLiangC commented Jan 4, 2024

@Hexiaoqiao
Test environment: CentOS 8 x86_64, 16GB RAM, SSD.
Tested on Hadoop 3.3.6.
The initial serial compilation took almost 3 hours because of slow dependency downloads. With parallel compilation (-T 2C), the initial compilation took about 1 hour, approximately 2 times faster.
For subsequent compilations, with dependencies already downloaded locally, the overall parallel compilation time for Hadoop was 13 minutes, while serial compilation took 37 minutes.
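
As a quick check of the warm-cache numbers above, the speedup is simply the serial time divided by the parallel time. The helper name below is hypothetical; the two inputs are the 37-minute and 13-minute figures quoted in this comment.

```python
def speedup(serial_minutes, parallel_minutes):
    """Ratio of serial to parallel wall-clock time."""
    if parallel_minutes <= 0:
        raise ValueError("parallel time must be positive")
    return serial_minutes / parallel_minutes


warm_cache = speedup(37, 13)
print(f"warm-cache speedup: {warm_cache:.2f}x")  # prints "warm-cache speedup: 2.85x"
```

This covers only the runs with dependencies already downloaded; the first-run figures are dominated by download time and are less comparable.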

@Hexiaoqiao
Contributor

Great! Thanks @JiaLiangC. Let's wait and see whether anyone else would like to review this.

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 0m 34s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 0s codespell was not available.
+0 🆗 detsecrets 0m 0s detect-secrets was not available.
+0 🆗 xmllint 0m 0s xmllint was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+0 🆗 mvndep 14m 50s Maven dependency ordering for branch
+1 💚 mvninstall 35m 22s trunk passed
+1 💚 compile 18m 15s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 compile 16m 39s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 mvnsite 4m 48s trunk passed
+1 💚 javadoc 4m 49s trunk passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 4m 20s trunk passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 shadedclient 135m 36s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
+0 🆗 mvndep 0m 31s Maven dependency ordering for patch
+1 💚 mvninstall 4m 7s the patch passed
+1 💚 compile 17m 52s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javac 17m 52s the patch passed
+1 💚 compile 16m 9s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 javac 16m 9s the patch passed
+1 💚 blanks 0m 0s The patch has no blanks issues.
+1 💚 mvnsite 4m 42s the patch passed
+1 💚 javadoc 4m 51s the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 4m 27s the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~20.04-b08
+1 💚 shadedclient 48m 41s patch has no errors when building and testing our client artifacts.
_ Other Tests _
+1 💚 unit 241m 44s hadoop-yarn-project in the patch passed.
+1 💚 unit 162m 20s hadoop-mapreduce-project in the patch passed.
+1 💚 asflicense 1m 14s The patch does not generate ASF License warnings.
633m 12s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/3/artifact/out/Dockerfile
GITHUB PR #6373
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient codespell detsecrets xmllint
uname Linux 08a9c77cafe4 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / b9656c7
Default Java Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~20.04-b08
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/3/testReport/
Max. process+thread count 2702 (vs. ulimit of 5500)
modules C: hadoop-yarn-project hadoop-mapreduce-project U: .
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6373/3/console
versions git=2.25.1 maven=3.6.3
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.

Contributor

@steveloughran steveloughran left a comment

+1 from me.

Your compile time is still pretty slow; does this include the test runs, or is it the hadoop client and javadocs taking the time?

@steveloughran
Contributor

Who is going to merge this? @Hexiaoqiao?

@Hexiaoqiao
Contributor

If there are no further concerns, I will commit this PR to trunk shortly. @steveloughran

Contributor

@Hexiaoqiao Hexiaoqiao left a comment

LGTM. +1.

@Hexiaoqiao Hexiaoqiao merged commit b2fac14 into apache:trunk Jan 23, 2024
@Hexiaoqiao
Contributor

Committed to trunk. Thanks @JiaLiangC and @steveloughran.

jiajunmao pushed a commit to jiajunmao/hadoop-MLEC that referenced this pull request Feb 6, 2024
…6373). Contributed by JiaLiangC.

Signed-off-by: Steve Loughran <[email protected]>
Signed-off-by: He Xiaoqiao <[email protected]>
ayushtkn pushed a commit to ayushtkn/hadoop that referenced this pull request Sep 2, 2024
…6373). Contributed by JiaLiangC.

Signed-off-by: Steve Loughran <[email protected]>
Signed-off-by: He Xiaoqiao <[email protected]>
ayushtkn pushed a commit to ayushtkn/hadoop that referenced this pull request Sep 4, 2024
…6373). Contributed by JiaLiangC.

Signed-off-by: Steve Loughran <[email protected]>
Signed-off-by: He Xiaoqiao <[email protected]>