
Conversation


@dongjoon-hyun dongjoon-hyun commented Nov 23, 2025

What changes were proposed in this pull request?

This PR aims to remove a broken and misleading build_python_connect35.yml GitHub Action job.

Note that this doesn't mean the two versions are incompatible; this is purely a GitHub Actions infrastructure issue.

Why are the changes needed?

  1. `build_python_connect35` has been broken for over 4 months, since 2025-07-23.

    (Screenshot of the failing workflow runs, taken 2025-11-22 at 20:03:46)
  2. The root cause is that the PySpark 4.1 and PySpark 3.5 requirements differ, especially for `pyarrow`, `grpcio`, and `googleapis-common-protos`.

    • PySpark 3.5.7

    | Package | Supported version | Note |
    | --- | --- | --- |
    | `py4j` | >=0.10.9.7 | Required |
    | `pandas` | >=1.0.5 | Required for pandas API on Spark and Spark Connect; Optional for Spark SQL |
    | `pyarrow` | >=4.0.0,<13.0.0 | Required for pandas API on Spark and Spark Connect; Optional for Spark SQL |
    | `numpy` | >=1.15 | Required for pandas API on Spark and MLLib DataFrame-based API; Optional for Spark SQL |
    | `grpcio` | >=1.48,<1.57 | Required for Spark Connect |
    | `grpcio-status` | >=1.48,<1.57 | Required for Spark Connect |
    | `googleapis-common-protos` | ==1.56.4 | Required for Spark Connect |

    • PySpark 4.1.0-preview4

    | Package | Supported version | Note |
    | --- | --- | --- |
    | `pandas` | >=2.2.0 | Required for Spark Connect |
    | `pyarrow` | >=15.0.0 | Required for Spark Connect |
    | `grpcio` | >=1.76.0 | Required for Spark Connect |
    | `grpcio-status` | >=1.76.0 | Required for Spark Connect |
    | `googleapis-common-protos` | >=1.71.0 | Required for Spark Connect |
    | `zstandard` | >=0.25.0 | Required for Spark Connect |
    | `graphviz` | >=0.20 | Optional for Spark Connect |

  3. Since there is no way to install both sets of dependencies in a single Python environment, we had better remove this job for now and look for an alternative approach.
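Item 3 can be illustrated with a small sketch (the `pyarrow` bounds are taken from the tables above; the helper names are hypothetical): the two branches' version ranges are disjoint, so no single installed `pyarrow` can satisfy both.

```python
# Hypothetical check that the pyarrow ranges from the two tables are disjoint.
# Versions are modeled as (major, minor, patch) tuples for simple comparison.

def ok_for_pyspark_35(v):
    # PySpark 3.5 requires pyarrow >=4.0.0,<13.0.0
    return (4, 0, 0) <= v < (13, 0, 0)

def ok_for_pyspark_41(v):
    # PySpark 4.1 requires pyarrow >=15.0.0
    return v >= (15, 0, 0)

candidates = [(12, 0, 1), (13, 0, 0), (15, 0, 0), (21, 0, 0)]
compatible_with_both = [v for v in candidates
                        if ok_for_pyspark_35(v) and ok_for_pyspark_41(v)]
print(compatible_with_both)  # -> [] : no version satisfies both ranges
```

This is why `pip` cannot resolve both requirement sets into one environment; the same disjointness applies to `grpcio` (`<1.57` vs `>=1.76.0`) and `googleapis-common-protos` (`==1.56.4` vs `>=1.71.0`).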

Does this PR introduce any user-facing change?

No behavior change, because this removes an already-broken CI job. This CI can only be triggered in the `apache/spark` repository:

```yaml
if: github.repository == 'apache/spark'
```

How was this patch tested?

Manual review.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

cc @HyukjinKwon , @zhengruifeng

@dongjoon-hyun
Member Author

Let me merge this PR to remove the misleading CI from the master branch until we find a correct way to do this for Apache Spark 4.2.0.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-54465 branch November 23, 2025 05:57
@HyukjinKwon
Member

I think we should add a new job that tests compatibility between 4.0 and the latest branch now. Let me take a quick look at adding it.

@dongjoon-hyun
Member Author

Thank you so much, @HyukjinKwon !

huangxiaopingRD pushed a commit to huangxiaopingRD/spark that referenced this pull request Nov 25, 2025
…n job

Closes apache#53178 from dongjoon-hyun/SPARK-54465.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>