@pan3793
Member

@pan3793 pan3793 commented Jun 9, 2025

What changes were proposed in this pull request?

Upgrade Hadoop to 3.4.2 (see the Hadoop 3.4.2 release notes).

Why are the changes needed?

Keep the Hadoop client up to date.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass GHA.

Was this patch authored or co-authored using generative AI tooling?

No.

Member Author

Introduced by HADOOP-19348

@github-actions github-actions bot added the CORE label Jun 9, 2025
@dongjoon-hyun
Member

Thank you, @pan3793 .

@pan3793
Member Author

pan3793 commented Jun 10, 2025

No surprises so far; waiting for the next Hadoop 3.4.2 RC.

@pan3793 pan3793 changed the title Test Hadoop 3.4.2 [SPARK-51168][BUILD] Test Hadoop 3.4.2 Jun 17, 2025
@dongjoon-hyun
Member

Is there any update, @pan3793 ?

@pan3793
Member Author

pan3793 commented Jun 24, 2025

still waiting for the next RC

@pan3793 pan3793 force-pushed the hadoop-3.4.2 branch 3 times, most recently from e54e614 to c0d49b6 Compare August 18, 2025 06:06
@dongjoon-hyun
Member

Thank you for testing RC3, @pan3793 . The results look good, right? Could you re-trigger the flaky ones?

@pan3793
Member Author

pan3793 commented Aug 28, 2025

@dongjoon-hyun RC3 is almost a repackage of RC2 without code changes. The current test results are good, except that the K8s IT fails consistently, likely because it cannot download the package from the Maven staging repo (I haven't investigated the root cause).

@dongjoon-hyun
Member

Ya, I agree with you. I cast +1 for Apache Hadoop 3.4.2 RC3 after testing with Apache ORC 2.3.0-SNAPSHOT, too.

Member

@dongjoon-hyun dongjoon-hyun left a comment

@github-actions github-actions bot removed the CORE label Aug 28, 2025
@pan3793 pan3793 changed the title [SPARK-51168][BUILD] Test Hadoop 3.4.2 [SPARK-51168][BUILD] Upgrade to Hadoop 3.4.2 Aug 28, 2025
aliyun-java-sdk-kms/2.11.0//aliyun-java-sdk-kms-2.11.0.jar
aliyun-java-sdk-ram/3.1.0//aliyun-java-sdk-ram-3.1.0.jar
aliyun-sdk-oss/3.13.2//aliyun-sdk-oss-3.13.2.jar
analyticsaccelerator-s3/1.2.1//analyticsaccelerator-s3-1.2.1.jar
Member Author

This won't be present in the official release artifacts, so we don't need to change the LICENSE/NOTICE files.

Member

@dongjoon-hyun dongjoon-hyun Aug 28, 2025

What do you mean? If we don't want to change LICENSE/NOTICE, we need to exclude this explicitly.

Member Author

@pan3793 pan3793 Aug 28, 2025

My understanding is: the LICENSE/NOTICE should match the content of the artifact.

For the source release tarball, in addition to Spark code itself, the LICENSE/NOTICE only reflects the source code we included from outside of the Spark project.

For the binary release tarball, the LICENSE-binary/NOTICE-binary should only reflect the included Spark and third-party libs, so LICENSE/NOTICE of testing deps and other optional libs is not necessary.

I'm not an expert in this area; the above is what I learned from several incubating projects.

Member Author

$ dev/make-distribution.sh -Pyarn -Pkubernetes -Phadoop-3 -Phive -Phive-thriftserver
...
$ ls dist/jars | grep analyticsaccelerator
<no output>
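
For readers who want to repeat this kind of check for other artifacts, here is a minimal sketch. It assumes only that bundled jars sit as files in a single directory such as `dist/jars`; the helper name `jar_in_dist` and the demo file name are hypothetical and not part of Spark's tooling. It matches on file names only, so it would not detect shaded or relocated classes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): succeeds if any jar in the given
# directory has a file name that starts with the given artifact prefix.
jar_in_dist() {
  local dist_dir="$1" prefix="$2"
  local f
  for f in "$dist_dir"/"$prefix"*.jar; do
    # An unmatched glob stays literal, so confirm the file really exists.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Demo against a throwaway directory standing in for dist/jars.
demo_dir="$(mktemp -d)"
touch "$demo_dir/hadoop-client-api-3.4.2.jar"   # placeholder file for the demo

if jar_in_dist "$demo_dir" "analyticsaccelerator"; then
  echo "analyticsaccelerator: bundled"
else
  echo "analyticsaccelerator: not bundled"
fi
rm -rf "$demo_dir"
```

Against a real build, the first argument would be `dist/jars` after running `dev/make-distribution.sh` with the profiles of interest.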

Member

Now I get your meaning. So, you mean we found a bug in dev/test-dependencies.sh, right? Could you file a bug JIRA issue for this independently?

Member Author

@pan3793 pan3793 Aug 28, 2025

dev/test-dependencies.sh also gathers deps from optional modules. For example, the official release tarball does not include the hadoop-cloud module, so it won't pull in those transitive deps.
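
The gap between the dependency manifest and a built distribution can be listed with a small script. This is a sketch under assumptions, not how `dev/test-dependencies.sh` works internally: the function name `manifest_only_jars` is hypothetical, the manifest format is inferred from the `name/version//name-version.jar` lines quoted earlier in this review, and `example-lib` is a made-up entry for the demo.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Spark): print the jar name of every
# manifest entry that has no matching file in the given jars directory.
# Manifest lines are assumed to look like "name/version//name-version.jar".
manifest_only_jars() {
  local manifest="$1" jars_dir="$2"
  local line jar
  while IFS= read -r line; do
    [ -n "$line" ] || continue        # skip blank lines
    jar="${line##*/}"                 # keep the text after the last slash
    [ -e "$jars_dir/$jar" ] || echo "$jar"
  done < "$manifest"
}

# Demo with a throwaway manifest and jars directory.
work="$(mktemp -d)"
mkdir "$work/jars"
printf '%s\n' \
  'aliyun-sdk-oss/3.13.2//aliyun-sdk-oss-3.13.2.jar' \
  'analyticsaccelerator-s3/1.2.1//analyticsaccelerator-s3-1.2.1.jar' \
  'example-lib/1.0//example-lib-1.0.jar' > "$work/manifest"
touch "$work/jars/example-lib-1.0.jar"

# Lists the two jars that appear in the manifest but not in jars/.
manifest_only_jars "$work/manifest" "$work/jars"
rm -rf "$work"
```

Entries listed this way are candidates for "recorded in the manifest but not shipped in the binary tarball", which is the case discussed here for the LICENSE-binary/NOTICE-binary files.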

Member

@dongjoon-hyun dongjoon-hyun Aug 28, 2025

Got it~ It makes sense.

@pan3793 pan3793 marked this pull request as ready for review August 28, 2025 17:15
@pan3793
Member Author

pan3793 commented Aug 28, 2025

@dongjoon-hyun thanks for the information, updated.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM (Pending CIs).

@dongjoon-hyun
Member

Document is published too.

@dongjoon-hyun
Member

Merged to master for Apache Spark 4.1.0. Thank you so much, @pan3793 .

dongjoon-hyun added a commit that referenced this pull request Oct 14, 2025
…Scala 2.13.17

### What changes were proposed in this pull request?

This PR aims to regenerate benchmark results after upgrading to Scala 2.13.17.

### Why are the changes needed?

Since the last update, we have changed important libraries: not only Scala, but also the Hadoop, ORC, and ZSTD libraries. This PR aims to bring the benchmark results up to date as a way to detect any performance regression.

- #52509
- #51127
- #52478
- #52591

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #52600 from dongjoon-hyun/SPARK-53893.

Lead-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: dongjoon-hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025