Conversation

@HyukjinKwon
Member

What changes were proposed in this pull request?

This PR proposes to drop Python 3.8 in PySpark.

Why are the changes needed?

Python 3.8 reaches end of life this October (https://devguide.python.org/versions/). Given the release schedule for Spark 4, I think it's better to drop it now rather than after the branch cut.

This also fixes the broken scheduled build (https://github.com/apache/spark/actions/runs/8818780802)

Does this PR introduce any user-facing change?

Yes, it drops Python 3.8 support.
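
Dropping a runtime version generally amounts to raising the minimum-version floor in a check like the following. This is a hedged sketch only; the constant and function names are hypothetical, not the actual PySpark code:

```python
import sys

# Assumed new floor after dropping Python 3.8 (hypothetical constant name).
MIN_PYTHON = (3, 9)

def require_minimum_python_version() -> None:
    """Fail fast with a clear message when the interpreter is too old."""
    if sys.version_info[:2] < MIN_PYTHON:
        raise RuntimeError(
            "This version of PySpark requires Python %d.%d or newer; "
            "found %d.%d." % (MIN_PYTHON + sys.version_info[:2])
        )

require_minimum_python_version()
print("Python version OK")
```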

How was this patch tested?

The changes will be tested through CI.

Was this patch authored or co-authored using generative AI tooling?

No.

Member

@dongjoon-hyun dongjoon-hyun left a comment

+1 for this decision.

@dongjoon-hyun
Member

dongjoon-hyun commented Apr 25, 2024

BTW, I have two questions.

  1. Do we need a vote for this because it's still alive, @HyukjinKwon?
  2. I'm not sure a ModuleNotFoundError can be the reason for dropping a Python version. Do we have any other reason that rules out Python 3.8?
```
ModuleNotFoundError: No module named 'zoneinfo'
```

@viirya
Member

viirya commented Apr 25, 2024

+1 for dropping Python 3.8 support

@dongjoon-hyun
Member

Could you take a look at the CI failure, @HyukjinKwon ?

@nchammas
Contributor

zoneinfo is new to Python 3.9. It seems something in the build expects Python 3.9 or newer.

Looks like it's this line, which was added in #46122 a few days ago.
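
For context, `zoneinfo` entered the standard library in Python 3.9, so importing it on 3.8 raises exactly the `ModuleNotFoundError` quoted above. A minimal reproduction of the availability check (a sketch, not code from this PR):

```python
import sys

try:
    from zoneinfo import ZoneInfo  # standard library since Python 3.9
except ModuleNotFoundError:
    # This is the failure mode seen on Python 3.8 in the scheduled build.
    ZoneInfo = None

print("Python:", sys.version_info[:2], "zoneinfo available:", ZoneInfo is not None)
```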

@HyukjinKwon HyukjinKwon marked this pull request as draft April 25, 2024 21:52
@HyukjinKwon
Member Author

Will fix all up soon.

@HyukjinKwon HyukjinKwon marked this pull request as ready for review April 26, 2024 06:13
@HyukjinKwon
Member Author

Merged to master.

LuciferYang pushed a commit that referenced this pull request May 13, 2024
…3.8 dropped

### What changes were proposed in this pull request?

This PR is a followup of #46228 that updates migration guide about Python 3.8 being dropped.

### Why are the changes needed?

To guide end users about the migration to Spark 4.0.

### Does this PR introduce _any_ user-facing change?

Yes, it fixes the documentation.

### How was this patch tested?

CI in this PR.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46545 from HyukjinKwon/SPARK-47993-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Jun 4, 2024
### What changes were proposed in this pull request?

This PR aims to fix `Black` target version to `Python 3.9`.

### Why are the changes needed?

Since SPARK-47993 officially dropped Python 3.8 support in Apache Spark 4.0.0, we had better update the target version to `Python 3.9`.

- #46228

`py39` is the `Black` target-version identifier for `Python 3.9`.
```
$ black --help  | grep target
  -t, --target-version [py33|py34|py35|py36|py37|py38|py39|py310|py311|py312]
```
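
The same pin can also be expressed in configuration. A sketch of the relevant `pyproject.toml` fragment, assuming Black's standard `[tool.black]` table (the actual Spark configuration file may carry other settings):

```toml
[tool.black]
target-version = ['py39']
```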

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with Python linter.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #46867 from dongjoon-hyun/SPARK-48531.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Sep 11, 2024
…m `try_simplify_traceback`

### What changes were proposed in this pull request?

Apache Spark 4.0.0 supports only Python 3.9+.
- #46228

### Why are the changes needed?

To simplify and clarify the logic. I manually confirmed that this is the last remaining check involving `sys.version_info` and `(3, 7)`.

```
$ git grep 'sys.version_info' | grep '(3, 7)'
python/pyspark/util.py:    if sys.version_info[:2] < (3, 7):
python/pyspark/util.py:    if "pypy" not in platform.python_implementation().lower() and sys.version_info[:2] >= (3, 7):
```
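
The kind of cleanup this enables can be sketched as below (hypothetical function names; the real logic lives in `python/pyspark/util.py`):

```python
import platform
import sys

# Before: gated on both interpreter version and implementation.
def can_simplify_traceback_old() -> bool:
    if sys.version_info[:2] < (3, 7):
        return False  # dead branch once the floor is Python 3.9
    return "pypy" not in platform.python_implementation().lower()

# After: the version check is unconditionally true and can be dropped.
def can_simplify_traceback_new() -> bool:
    return "pypy" not in platform.python_implementation().lower()

print(can_simplify_traceback_old() == can_simplify_traceback_new())
```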

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48078 from dongjoon-hyun/SPARK-49600.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Sep 13, 2024
…e `pypy3.9` links

### What changes were proposed in this pull request?

This PR aims to fix two Dockerfiles to create `pypy3.9` symlinks instead of `pypy3.8`.

https://github.com/apache/spark/blob/d2d293e3fb57d6c9dea084b5fe6707d67c715af3/dev/create-release/spark-rm/Dockerfile#L97

https://github.com/apache/spark/blob/d2d293e3fb57d6c9dea084b5fe6707d67c715af3/dev/infra/Dockerfile#L91

### Why are the changes needed?

Apache Spark 4.0 dropped `Python 3.8` support. We should make sure that `pypy3.8` is not used at all.
- #46228

### Does this PR introduce _any_ user-facing change?

No. This is a dev-only change.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48095 from dongjoon-hyun/SPARK-49620.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Sep 13, 2024
…ndas_with_day_time_interval` in PyPy3.9

### What changes were proposed in this pull request?

This PR aims to re-enable `test_create_dataframe_from_pandas_with_day_time_interval` in PyPy3.9.

### Why are the changes needed?

This test was disabled on PyPy with Python 3.8, but Python 3.8 support has been dropped and the test passes with PyPy 3.9.
- #46228

**BEFORE: Skipped with `Fails in PyPy Python 3.8, should enable.` message**
```
$ python/run-tests.py --python-executables pypy3 --testnames pyspark.sql.tests.test_creation
Running PySpark tests. Output is in /Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['pypy3']
Will test the following Python tests: ['pyspark.sql.tests.test_creation']
pypy3 python_implementation is PyPy
pypy3 version is: Python 3.9.19 (a2113ea87262, Apr 21 2024, 05:41:07)
[PyPy 7.3.16 with GCC Apple LLVM 15.0.0 (clang-1500.1.0.2.5)]
Starting test(pypy3): pyspark.sql.tests.test_creation (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/58e26724-5c3e-4451-80f8-cabdb36f0901/pypy3__pyspark.sql.tests.test_creation__n448ay57.log)
Finished test(pypy3): pyspark.sql.tests.test_creation (6s) ... 3 tests were skipped
Tests passed in 6 seconds

Skipped tests in pyspark.sql.tests.test_creation with pypy3:
    test_create_dataframe_from_pandas_with_day_time_interval (pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped 'Fails in PyPy Python 3.8, should enable.'
    test_create_dataframe_required_pandas_not_found (pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped 'Required Pandas was found.'
    test_schema_inference_from_pandas_with_dict (pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped '[PACKAGE_NOT_INSTALLED] PyArrow >= 10.0.0 must be installed; however, it was not found.'
```

**AFTER**
```
$ python/run-tests.py --python-executables pypy3 --testnames pyspark.sql.tests.test_creation
Running PySpark tests. Output is in /Users/dongjoon/APACHE/spark-merge/python/unit-tests.log
Will test against the following Python executables: ['pypy3']
Will test the following Python tests: ['pyspark.sql.tests.test_creation']
pypy3 python_implementation is PyPy
pypy3 version is: Python 3.9.19 (a2113ea87262, Apr 21 2024, 05:41:07)
[PyPy 7.3.16 with GCC Apple LLVM 15.0.0 (clang-1500.1.0.2.5)]
Starting test(pypy3): pyspark.sql.tests.test_creation (temp output: /Users/dongjoon/APACHE/spark-merge/python/target/1f0db01f-0beb-4ee2-817f-363eb2f2804d/pypy3__pyspark.sql.tests.test_creation__2w4gy9u1.log)
Finished test(pypy3): pyspark.sql.tests.test_creation (13s) ... 2 tests were skipped
Tests passed in 13 seconds

Skipped tests in pyspark.sql.tests.test_creation with pypy3:
    test_create_dataframe_required_pandas_not_found (pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped 'Required Pandas was found.'
    test_schema_inference_from_pandas_with_dict (pyspark.sql.tests.test_creation.DataFrameCreationTests) ... skipped '[PACKAGE_NOT_INSTALLED] PyArrow >= 10.0.0 must be installed; however, it was not found.'
```
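
The conditional skip removed here follows a standard `unittest.skipIf` pattern. A self-contained sketch with a hypothetical condition name and a placeholder test body, not the exact PySpark decorator:

```python
import platform
import sys
import unittest

# Condition that used to guard the test: PyPy on Python 3.8.
is_pypy_38 = (
    platform.python_implementation().lower() == "pypy"
    and sys.version_info[:2] == (3, 8)
)

class DataFrameCreationSketch(unittest.TestCase):
    @unittest.skipIf(is_pypy_38, "Fails in PyPy Python 3.8, should enable.")
    def test_day_time_interval(self):
        # Placeholder body; the real test builds a DataFrame from pandas.
        self.assertTrue(True)

# Run via a loader instead of unittest.main() so the interpreter keeps going.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(DataFrameCreationSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print("skipped:", len(result.skipped))
```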

### Does this PR introduce _any_ user-facing change?

No, this is a test-only change.

### How was this patch tested?

Manual tests with PyPy3.9.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48097 from dongjoon-hyun/SPARK-43354.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun added a commit that referenced this pull request Sep 14, 2024
### What changes were proposed in this pull request?

This PR aims to update Spark documentation landing page (`docs/index.md`) for Apache Spark 4.0.0-preview2 release.

### Why are the changes needed?

- [SPARK-45314 Drop Scala 2.12 and make Scala 2.13 by default](https://issues.apache.org/jira/browse/SPARK-45314)
- #46228
- #47842
- [SPARK-45923 Spark Kubernetes Operator](https://issues.apache.org/jira/browse/SPARK-45923)

### Does this PR introduce _any_ user-facing change?

No because this is a documentation-only change.

### How was this patch tested?

Manual review.

<img width="927" alt="Screenshot 2024-09-13 at 16 01 55" src="https://github.com/user-attachments/assets/bdbd0e61-d71a-41ca-aa1b-1b0805813a45">

<img width="911" alt="Screenshot 2024-09-13 at 16 02 09" src="https://github.com/user-attachments/assets/e13a6bba-2149-48fa-983d-c5399defdc70">

<img width="820" alt="Screenshot 2024-09-13 at 16 02 38" src="https://github.com/user-attachments/assets/721c7760-bc2e-444c-9209-174e3119c2b4">

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #48113 from dongjoon-hyun/SPARK-49649.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
HyukjinKwon pushed a commit that referenced this pull request Dec 2, 2024
### What changes were proposed in this pull request?

This is a follow-up of
- #46228

### Why are the changes needed?

To update the RDD programming guide consistently.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #49035 from dongjoon-hyun/SPARK-47993.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
