[SPARK-37244][PYTHON] Build and run tests on Python 3.10 #34526
Conversation
Test build #145011 has finished for PR 34526 at commit
Could you review this PR, @HyukjinKwon?

Kubernetes integration test starting

Kubernetes integration test status failure
Looks good. @xinrong-databricks, mind double-checking this please, since you're investigating Python 3.10 support?
Thank you, @HyukjinKwon.
No... for some reason, the memory setting doesn't work on Mac. I think the tests previously passed because a fake memory-limit value was returned. I think we should run the tests only on Linux. BTW, the limitation was documented at #23664.
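For context, here is a minimal sketch of the kind of platform guard being discussed; the test and method names are illustrative, not the actual PySpark test code. The assumption is that `resource.RLIMIT_AS` is not reliably enforced outside Linux, so the memory-limit test is skipped elsewhere.

```python
import platform
import resource
import unittest


class WorkerMemoryTest(unittest.TestCase):
    """Illustrative only; the real PySpark test lives elsewhere."""

    @unittest.skipIf(
        platform.system() != "Linux",
        "RLIMIT_AS is not reliably enforced outside Linux",
    )
    def test_memory_limit(self):
        # Lower the address-space soft limit to 2 GiB; on Linux this is
        # enforced, so an oversized allocation afterwards would fail.
        two_gib = 2 * 1024 * 1024 * 1024
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (two_gib, hard))
        soft, _ = resource.getrlimit(resource.RLIMIT_AS)
        self.assertEqual(soft, two_gib)


if __name__ == "__main__":
    unittest.main()
```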
Thank you for the confirmation. I'll make a PR to ignore it on Mac.

Thank you, @dongjoon-hyun! LGTM

Thank you for your reviews and testing, @xinrong-databricks. I'll update the PR description about the missing PyArrow tests.

Thank you, @HyukjinKwon and @xinrong-databricks.
Sure, LGTM2 |
### What changes were proposed in this pull request?

This PR is a follow-up of #34526 to adjust one `pyspark.rdd` doctest additionally.

```python
- >>> b''.join(result).decode('utf-8')
+ >>> ''.join([r.decode('utf-8') if isinstance(r, bytes) else r for r in result])
```

### Why are the changes needed?

**Python 3.8/3.9**

```python
Using Python version 3.8.12 (default, Nov 8 2021 17:15:19)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1636432954207).
SparkSession available as 'spark'.
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> from glob import glob
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> result
[b'bar\n', b'foo\n']
```

**Python 3.10**

```python
Using Python version 3.10.0 (default, Oct 29 2021 14:35:18)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1636433378727).
SparkSession available as 'spark'.
>>> from tempfile import NamedTemporaryFile
>>> tempFile3 = NamedTemporaryFile(delete=True)
>>> tempFile3.close()
>>> codec = "org.apache.hadoop.io.compress.GzipCodec"
>>> sc.parallelize(['foo', 'bar']).saveAsTextFile(tempFile3.name, codec)
>>> from fileinput import input, hook_compressed
>>> from glob import glob
>>> result = sorted(input(glob(tempFile3.name + "/part*.gz"), openhook=hook_compressed))
>>> result
['bar\n', 'foo\n']
```

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

```
$ python/run-tests --testnames pyspark.rdd
```

Closes #34529 from dongjoon-hyun/SPARK-37244-2.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
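As a standalone illustration of that doctest fix (this helper is hypothetical, not part of the PR): `fileinput.hook_compressed` yields `bytes` records on Python 3.8/3.9 but `str` records on Python 3.10, so the joined result has to tolerate both shapes.

```python
def join_records(records):
    # Hypothetical helper mirroring the doctest fix: decode bytes records
    # (Python <= 3.9 behavior) and pass str records (Python 3.10) through.
    return ''.join(
        r.decode('utf-8') if isinstance(r, bytes) else r
        for r in records
    )

assert join_records([b'bar\n', b'foo\n']) == 'bar\nfoo\n'  # 3.8/3.9 shape
assert join_records(['bar\n', 'foo\n']) == 'bar\nfoo\n'    # 3.10 shape
```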
### What changes were proposed in this pull request?

This PR fixes `setup.py` to note that PySpark works with Python 3.10.

### Why are the changes needed?

To officially support Python 3.10.

### Does this PR introduce _any_ user-facing change?

Yes, it officially supports Python 3.10.

### How was this patch tested?

It has been tested in #34526. Arrow related features are technically optional.

Closes #34533 from HyukjinKwon/SPARK-37257.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
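For illustration, a `setup.py` change of this kind usually amounts to adding a trove classifier. This is a hedged sketch with a placeholder package name, not Spark's actual `setup.py`; the classifier strings themselves are real PyPI trove classifiers.

```python
from setuptools import setup

setup(
    name='example-package',  # placeholder name, not the real package
    version='0.0.1',
    classifiers=[
        'Programming Language :: Python :: 3.8',
        'Programming Language :: Python :: 3.9',
        # The line a "support Python 3.10" PR typically adds:
        'Programming Language :: Python :: 3.10',
    ],
)
```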
The `pyspark.rdd` doctest follow-up (apache#34529, dongjoon-hyun/SPARK-37244-2) was also cherry-picked from commit 47ceae4 with an otherwise identical message. Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?
This PR aims to support building and running tests on Python 3.10.
Python 3.10 added many new features and breaking changes.
- https://docs.python.org/3/whatsnew/3.10.html
For example, the following blocks building and testing PySpark on Python 3.10.
(Collapsed example outputs under Python 3.9.7 and Python 3.10.0 were attached here.)
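As a concrete illustration of such a breaking change (this particular one is an assumption for illustration, not necessarily the exact failure hit here): Python 3.10 removed the deprecated ABC aliases from `collections`.

```python
import collections.abc

# Works on every supported Python version:
print(isinstance([1, 2, 3], collections.abc.Iterable))  # True

# Up to Python 3.9 the deprecated alias `collections.Iterable` also worked;
# Python 3.10 removed it, so the following would raise AttributeError there:
# isinstance([1, 2, 3], collections.Iterable)
```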
### Why are the changes needed?

(BEFORE and AFTER CI results were attached here.)
Note that PyArrow-related tests are not covered, as they are beyond the scope of this PR.

### Does this PR introduce _any_ user-facing change?
Yes, this adds official support for Python 3.10.
### How was this patch tested?
Pass the CIs and manually run Python tests on Python 3.10.
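For example, assuming a `python3.10` binary on `PATH`, the suite can be pointed at a specific interpreter with the `--python-executables` option of Spark's `python/run-tests` script:

```
$ python/run-tests --python-executables=python3.10
```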