@pan3793 pan3793 commented Nov 4, 2025

Second attempt of #52874. This fixes the grpcio dependency version for Python 3.14, which was missed in the previous attempt.

### What changes were proposed in this pull request?

Bump gRPC from 1.67 to 1.76, with additional Python package upgrades for consistency:

- `googleapis-common-protos==1.71.0`
- `protobuf==6.33.0`

And `buf v33.0`
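
To confirm that a local Python environment matches the pins listed above, a minimal sketch along these lines may help. The helper name `find_pin_mismatches` and the `EXPECTED_PINS` table are hypothetical and not part of the Spark build. Only the two pins stated in this description are included, since the exact grpcio patch version is not given here.

```python
from importlib import metadata

# Pins copied from the PR description. This table is illustrative only;
# the authoritative pins live in Spark's own requirement files.
EXPECTED_PINS = {
    "googleapis-common-protos": "1.71.0",
    "protobuf": "6.33.0",
}


def find_pin_mismatches(expected, get_version=metadata.version):
    """Return {package: (wanted, installed)} for every package whose
    installed version differs from the pin. A package that is not
    installed is reported with installed=None."""
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, installed) in find_pin_mismatches(EXPECTED_PINS).items():
        print(f"{package}: expected {wanted}, found {installed}")
```

The version lookup is passed in as a parameter so the comparison can be exercised without installing the packages.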

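Fix the shading leaks of the `spark-connect` jar, i.e. classes bundled into the jar outside the `org/apache/spark` and `org/sparkproject` namespaces: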

before
```
$ jar tf spark-connect_2.13-4.1.0-preview3.jar | grep '.class$' | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
javax/annotation/Generated.class
...
javax/ejb/EJB.class
...
javax/persistence/PersistenceContext.class
...
javax/xml/ws/WebServiceRef.class
...
com/google/shopping/type/Price$Builder.class
...
com/google/apps/card/v1/Widget$DataCase.class
...
```

after

```
$ jar tf spark-connect_2.13-4.2.0-SNAPSHOT.jar | grep '.class$' | grep -v 'org/apache/spark' | grep -v 'org/sparkproject' | grep -v 'META-INF'
<no-output>
```
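
The manual check above could also be scripted, for example as a regression test. The following is a rough sketch that assumes the same three filters as the `grep -v` calls. The function names are hypothetical, and the matching is by substring to mirror `grep -v`, with a literal `.class` suffix check.

```python
import sys
import zipfile

# Path fragments that mark an entry as expected inside the jar,
# mirroring the three `grep -v` filters in the manual check.
ALLOWED_FRAGMENTS = ("org/apache/spark", "org/sparkproject", "META-INF")


def find_leaked_classes(entry_names, allowed_fragments=ALLOWED_FRAGMENTS):
    """Return the .class entries that match none of the allowed fragments."""
    return [
        name
        for name in entry_names
        if name.endswith(".class")
        and not any(fragment in name for fragment in allowed_fragments)
    ]


def leaked_classes_in_jar(jar_path):
    """Open a jar (a zip archive) and list its leaked class entries."""
    with zipfile.ZipFile(jar_path) as jar:
        return find_leaked_classes(jar.namelist())


if __name__ == "__main__" and len(sys.argv) > 1:
    leaked = leaked_classes_in_jar(sys.argv[1])
    for name in leaked:
        print(name)
    # A non-zero exit code lets CI fail when anything leaks.
    sys.exit(1 if leaked else 0)
```

Run against a correctly shaded jar, the script prints nothing and exits with status 0, matching the empty output shown in the "after" case.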

### Why are the changes needed?

For Python:

- [grpcio v1.75.1](https://github.com/grpc/grpc/releases/tag/v1.75.1) adds official Python 3.14 support
- googleapis-common-protos v1.71.0 adds official Python 3.14 support, see googleapis/google-cloud-python#14699

For Java:

- grpc-java v1.74 removes the dependency on Tomcat's annotation API, see grpc/grpc-java#9179

Check full release notes at: https://github.com/grpc/grpc/releases

### Does this PR introduce _any_ user-facing change?

Maybe. It reduces the risk of conflicts between Spark and user classes.

### How was this patch tested?

Passes GHA, plus the manual checks shown in the sections above.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#52874 from pan3793/SPARK-54177.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

pan3793 commented Nov 4, 2025

cc @dongjoon-hyun. The diff from the previous attempt is 6fcab50.

pan3793 commented Nov 4, 2025

I confirmed the issue is gone by building the image locally.

$ docker build -f dev/spark-test-image/python-314/Dockerfile .
[+] Building 554.5s (11/11) FINISHED                                                                                                                                                                  docker:orbstack
 => [internal] load build definition from Dockerfile                                                                                                                                                             0.0s
 => => transferring dockerfile: 3.00kB                                                                                                                                                                           0.0s
 => [internal] load metadata for docker.io/library/ubuntu:jammy-20240911.1                                                                                                                                       4.2s
 => [internal] load .dockerignore                                                                                                                                                                                0.0s
 => => transferring context: 2B                                                                                                                                                                                  0.0s
 => [1/7] FROM docker.io/library/ubuntu:jammy-20240911.1@sha256:0e5e4a57c2499249aafc3b40fcd541e9a456aab7296681a3994d631587203f97                                                                                 3.9s
 => => resolve docker.io/library/ubuntu:jammy-20240911.1@sha256:0e5e4a57c2499249aafc3b40fcd541e9a456aab7296681a3994d631587203f97                                                                                 0.0s
 => => sha256:0e5e4a57c2499249aafc3b40fcd541e9a456aab7296681a3994d631587203f97 6.69kB / 6.69kB                                                                                                                   0.0s
 => => sha256:7c75ab2b0567edbb9d4834a2c51e462ebd709740d1f2c40bcd23c56e974fe2a8 424B / 424B                                                                                                                       0.0s
 => => sha256:981912c48e9a89e903c89b228be977e23eeba83d42e2c8e0593a781a2b251cba 2.31kB / 2.31kB                                                                                                                   0.0s
 => => sha256:a186900671ab62e1dea364788f4e84c156e1825939914cfb5a6770be2b58b4da 27.36MB / 27.36MB                                                                                                                 3.2s
 => => extracting sha256:a186900671ab62e1dea364788f4e84c156e1825939914cfb5a6770be2b58b4da                                                                                                                        0.6s
 => [2/7] RUN apt-get update && apt-get install -y     build-essential     ca-certificates     curl     gfortran     git     gnupg     libcurl4-openssl-dev     libfontconfig1-dev     libfreetype6-dev     l  256.7s
 => [3/7] RUN add-apt-repository ppa:deadsnakes/ppa                                                                                                                                                             10.0s
 => [4/7] RUN apt-get update && apt-get install -y     python3.14     && apt-get autoremove --purge -y     && apt-get clean     && rm -rf /var/lib/apt/lists/*                                                  28.0s
 => [5/7] RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.14                                                                                                                                         5.2s
 => [6/7] RUN python3.14 -m pip install --ignore-installed blinker>=1.6.2 # mlflow needs this                                                                                                                    1.4s
 => [7/7] RUN python3.14 -m pip install numpy pyarrow>=22.0.0 six==1.16.0 pandas==2.3.3 scipy plotly<6.0.0 coverage matplotlib openpyxl memory-profiler>=0.61.0 scikit-learn>=1.3.2 unittest-xml-reporting gr  240.2s
 => exporting to image                                                                                                                                                                                           4.7s
 => => exporting layers                                                                                                                                                                                          4.6s
 => => writing image sha256:c1e8723daec426d2799cf829a747cb5e08087d8b3326ed8edb8fa639628275a1

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Merged to master only for now. Thank you, @pan3793.

@dongjoon-hyun

I'll monitor the master branch CI.

pan3793 commented Nov 5, 2025

@dongjoon-hyun, thank you for taking care of this. Once you think it's stable, I can create the backport PR to branch-4.1.

dongjoon-hyun pushed a commit that referenced this pull request Nov 5, 2025
Closes #52879 from pan3793/SPARK-54177-2.

Authored-by: Cheng Pan <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit 83e49b7)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun

Thank you. I cherry-picked it directly. :)

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025