[SPARK-45990][SPARK-45987][PYTHON][CONNECT] Upgrade protobuf to 4.25.1 to support Python 3.11#43885

Closed
dongjoon-hyun wants to merge 1 commit into apache:master from dongjoon-hyun:SPARK-45990

Conversation

@dongjoon-hyun
Member

@dongjoon-hyun dongjoon-hyun commented Nov 18, 2023

What changes were proposed in this pull request?

This PR upgrades protobuf from 3.20.3 to 4.25.1 to fix PySpark failures in Python 3.11 environments for Apache Spark 4.0.0.

Why are the changes needed?

Currently, the Python 3.11 Daily CI is failing.

protobuf v4.25.0 is the first release that officially supports Python 3.11.
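
To make the version requirement concrete, here is a minimal sketch of a guard that compares a protobuf version against the 4.25.0 threshold stated above. The helper names `parse_version` and `protobuf_ok_for_python` are hypothetical and not part of Spark or protobuf, and the threshold is taken from this PR description only.

```python
from typing import Tuple

# Threshold taken from the PR description: 4.25.0 is described as the first
# protobuf release with official Python 3.11 support (an assumption here).
MIN_PROTOBUF_FOR_PY311 = (4, 25, 0)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse a plain dotted version such as '4.25.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def protobuf_ok_for_python(protobuf_version: str, python_version: Tuple[int, int]) -> bool:
    """Hypothetical guard: on Python 3.11+, require protobuf >= 4.25.0."""
    if python_version < (3, 11):
        return True  # the PR makes no claim about older interpreters
    return parse_version(protobuf_version) >= MIN_PROTOBUF_FOR_PY311
```

At runtime, the inputs could come from `importlib.metadata.version("protobuf")` and `sys.version_info[:2]`. The parser only handles plain dotted numeric versions, not pre-release or local version suffixes.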

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Pass the CIs.

I also verified this manually.

$ python/run-tests --modules=pyspark-connect --parallelism=1 --python-executables=python3.11
...
Finished test(python3.11): pyspark.sql.connect.window (6s)
Tests passed in 780 seconds
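
For repeating this manual check against other interpreters, the command above can be assembled programmatically. This is only a sketch: `run_tests_command` is a hypothetical helper, and it uses just the three flags that appear in the command above.

```python
import shlex
from typing import List


def run_tests_command(module: str, python_executable: str, parallelism: int = 1) -> List[str]:
    """Build the argv for python/run-tests using the flags shown in the PR."""
    return [
        "python/run-tests",
        f"--modules={module}",
        f"--parallelism={parallelism}",
        f"--python-executables={python_executable}",
    ]


cmd = run_tests_command("pyspark-connect", "python3.11")
# Show the shell-equivalent form of the argv list.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the root of a Spark checkout.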

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun
Member Author

cc @grundprinzip and @HyukjinKwon

 grpcio>=1.48,<1.57
 grpcio-status>=1.48,<1.57
-protobuf==3.20.3
+protobuf==4.25.1
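
A pin change like this one can be checked mechanically. The sketch below parses `name==version` lines and reports which exact pins differ between two versions of a requirements file. `parse_pins` and `changed_pins` are hypothetical helpers, and they only understand exact `==` pins, so range specifiers such as the `grpcio` lines are ignored.

```python
from typing import Dict, Tuple


def parse_pins(requirements: str) -> Dict[str, str]:
    """Collect exact pins ('name==version') from requirements-style text."""
    pins = {}
    for line in requirements.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip().lower()] = version.strip()
    return pins


def changed_pins(old: str, new: str) -> Dict[str, Tuple[str, str]]:
    """Return {name: (old_version, new_version)} for pins that differ."""
    old_pins, new_pins = parse_pins(old), parse_pins(new)
    return {
        name: (old_pins[name], new_pins[name])
        for name in old_pins.keys() & new_pins.keys()
        if old_pins[name] != new_pins[name]
    }


OLD = """grpcio>=1.48,<1.57
grpcio-status>=1.48,<1.57
protobuf==3.20.3
"""
NEW = OLD.replace("protobuf==3.20.3", "protobuf==4.25.1")

print(changed_pins(OLD, NEW))  # {'protobuf': ('3.20.3', '4.25.1')}
```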
Member

Hmm, I remember we pinned protobuf to 3.x because of a compatibility problem. Do you remember, @grundprinzip?

Member

From #38693

Member Author

@dongjoon-hyun dongjoon-hyun Nov 19, 2023

If so, we may need to officially drop Python 3.11 support from the Spark Connect module.

Member Author

We had better align the server and client on 4.25.1 starting with Apache Spark 4.0.0. Otherwise, it will be difficult to support Python 3.11, 3.12, and later versions.

@dongjoon-hyun
Member Author

WDYT about upgrading the Java part too in this PR, @grundprinzip and @HyukjinKwon?

Member

@HyukjinKwon HyukjinKwon left a comment

I'm good with this change. I just vaguely remember that there was a compatibility problem, so I would defer to @grundprinzip.

@HyukjinKwon
Member

I'm fine with upgrading the Java side too, but I don't have very good insight into compatibility. @hvanhovell, I'll cc you here too, Herman.

@HyukjinKwon
Member

Oh, btw, we should probably regenerate the Python protobuf code (https://github.com/apache/spark/blob/master/dev/connect-gen-protos.sh).

@dongjoon-hyun
Member Author

Thank you, @HyukjinKwon .

For now, the PySpark protobuf upgrade appears to pass all PySpark Connect CIs.

Let me tackle the issues step by step. I'll handle the Java part very soon, with the regenerated code.

I'm working in the context of the umbrella JIRA.

I hope to cover Python 3.8/3.9/3.10/3.11/3.12 in the community Daily CI for the Spark Connect project in Apache Spark 4.0.0, @HyukjinKwon, @hvanhovell, @grundprinzip. If we need to abandon Python 3.11 and 3.12 to keep protobuf compatible with Spark 3.5.x or 3.4.0, we can officially revert this.

@dongjoon-hyun
Member Author

Merged to master.

@dongjoon-hyun dongjoon-hyun deleted the SPARK-45990 branch November 19, 2023 06:54
@grundprinzip
Contributor

Thanks for doing this.
