[SPARK-38563][PYTHON] Upgrade to Py4J 0.10.9.4 #35871
HyukjinKwon wants to merge 1 commit into apache:master
cc @WeichenXu123 FYI

Py4J 0.10.9.4 has a resource leak issue?

Oops, I meant 0.10.9.3. Just fixed.

Thanks!
### What changes were proposed in this pull request?

This PR upgrades Py4J to 0.10.9.4, with relevant documentation changes.

### Why are the changes needed?

Py4J 0.10.9.3 has a resource leak issue when pinned thread mode is enabled, and that mode is enabled by default in PySpark as of 41af409. We worked around this by requiring users to use `InheritableThread` or `inheritable_thread_target`. After upgrading, we no longer need to require this because resources are cleaned up automatically; see also py4j/py4j#471.

### Does this PR introduce _any_ user-facing change?

Yes, users no longer have to use `InheritableThread` or `inheritable_thread_target` to avoid the resource leak.

### How was this patch tested?

CI in this PR should test it out.

Closes #35871 from HyukjinKwon/SPARK-38563.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 8193b40)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

Merged to master, branch-3.3 and branch-3.2.
dongjoon-hyun
left a comment
+1, LGTM. Thank you, @HyukjinKwon and @wangyum .
dongjoon-hyun
left a comment
Oh, @HyukjinKwon .
Is Py4J 0.10.9.4 tested with Python 3.10.x? This commit seems to break the Python 3.10.x environment, as shown below.
apache-spark-master-python/python/lib/py4j-0.10.9.4-src.zip/py4j/clientserver.py", line 12, in <module>
06:58:13 ImportError: cannot import name 'Callable' from 'collections' (/Users/dongjoon/.pyenv/versions/3.10.1/lib/python3.10/collections/__init__.py)
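
For context on the error above: the abstract base class aliases in `collections` (such as `collections.Callable`) were removed in Python 3.10, so an import written that way fails there. The snippet below is a minimal sketch of a version-tolerant import, not necessarily the exact change made in Py4J; the `describe` helper is a hypothetical name used only for illustration.

```python
import sys

try:
    # Abstract base classes have lived in collections.abc since Python 3.3.
    from collections.abc import Callable
except ImportError:
    # Fallback for very old interpreters; shown only for completeness.
    from collections import Callable


def describe(obj):
    """Return a short label saying whether obj is callable (hypothetical helper)."""
    return "callable" if isinstance(obj, Callable) else "not callable"


print(sys.version_info[:2], describe(len), describe(42))
```

Importing from `collections.abc` first keeps the same code working on Python 3.10 and on earlier Python 3 releases.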

It seems that Py4J has no test coverage for Python 3.10.

Sorry but let me create reverting PRs for this.

For Apache Spark 3.2, I'm fine because we didn't support Python 3.10 before.

Yeah I'm making another fix and release now. We can revert it for now too. Thanks for checking this!

I made a fix, and fixing CI now (py4j/py4j#477). I will make another release, and upgrade it back soon.

Thank you!

### What changes were proposed in this pull request?

This PR is a retry of #35871, bumping the Py4J version to 0.10.9.5. The original PR was reverted because it broke Python 3.10, which Py4J did not officially support at the time. Py4J 0.10.9.5 fixes the issue (py4j/py4j#475) and officially adds Python 3.10 support, with CI set up (py4j/py4j#477).

### Why are the changes needed?

See #35871

### Does this PR introduce _any_ user-facing change?

See #35871

### How was this patch tested?

Py4J now runs CI against Python 3.10, and I manually tested PySpark with Python 3.10 with this patch:

```bash
./bin/pyspark
```

```
import py4j
py4j.__version__
spark.range(10).show()
```

```
Using Python version 3.10.0 (default, Mar 3 2022 03:57:21)
Spark context Web UI available at http://172.30.5.50:4040
Spark context available as 'sc' (master = local[*], app id = local-1647571387534).
SparkSession available as 'spark'.
>>> import py4j
>>> py4j.__version__
'0.10.9.5'
>>> spark.range(10).show()
+---+
| id|
+---+
...
```

Closes #35907 from HyukjinKwon/SPARK-38563-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 97335ea)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

### What changes were proposed in this pull request?

This PR upgrades Py4J to 0.10.9.4, with relevant documentation changes.

### Why are the changes needed?

Py4J 0.10.9.3 has a resource leak issue when pinned thread mode is enabled, and that mode is enabled by default in PySpark as of 41af409.
We worked around this by requiring users to use `InheritableThread` or `inheritable_thread_target`. After upgrading, we no longer need to require this because resources are cleaned up automatically; see also py4j/py4j#471.

### Does this PR introduce any user-facing change?

Yes, users no longer have to use `InheritableThread` or `inheritable_thread_target` to avoid the resource leak.

### How was this patch tested?

CI in this PR should test it out.
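
To illustrate the kind of leak described above, here is a standard-library-only sketch. It assumes a simplified model in which each thread registers a per-thread resource that nobody releases unless the thread class does it on exit. `open_connections`, `acquire_connection`, `release_connection`, and `CleaningThread` are hypothetical names and are not Py4J or PySpark APIs.

```python
import threading

# Hypothetical registry standing in for per-thread connections.
open_connections = {}
registry_lock = threading.Lock()


def acquire_connection():
    """Register a resource for the current thread."""
    with registry_lock:
        open_connections[threading.current_thread()] = object()


def release_connection():
    """Drop the current thread's resource, if any."""
    with registry_lock:
        open_connections.pop(threading.current_thread(), None)


class CleaningThread(threading.Thread):
    """Thread that releases its per-thread resource when its target finishes."""

    def run(self):
        try:
            super().run()
        finally:
            release_connection()


def job():
    acquire_connection()


# A plain thread leaves its entry behind after it exits.
plain = threading.Thread(target=job)
plain.start()
plain.join()
leaked = len(open_connections)

# The cleaning thread removes its own entry on the way out.
cleaning = CleaningThread(target=job)
cleaning.start()
cleaning.join()
remaining = len(open_connections)

print(leaked, remaining)
```

In this model the plain thread's entry stays in the registry while the cleaning thread's entry does not, which is roughly the role a cleanup-aware thread wrapper plays when the underlying library does not clean up by itself.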