Upgrade Apache Spark to 3.5.0 #1995

Merged
mathbunnyru merged 4 commits into jupyter:main from bjornjorgensen:spark3.5.0
Sep 16, 2023

Conversation

@bjornjorgensen
Contributor

@bjornjorgensen bjornjorgensen commented Sep 16, 2023

Describe your changes

Upgrade Apache Spark from 3.4.1 to 3.5.0
Release notes

The pandas API on Spark now pins 'pandas<=2.0.3':
https://github.com/apache/spark/blob/9c0b803ba124a6e70762aec1e5559b0d66529f4d/dev/infra/Dockerfile#L67C40-L67C40

Issue ticket if applicable

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

@mathbunnyru
Member

The pandas API on Spark now pins 'pandas<=2.0.3':
https://github.com/apache/spark/blob/9c0b803ba124a6e70762aec1e5559b0d66529f4d/dev/infra/Dockerfile#L67C40-L67C40

Does this mean the latest pandas, 2.1.0, has issues with Spark?
It would be really nice to remove the pandas pin and its test completely, if everything works fine.

@bjornjorgensen
Contributor Author

The pandas API on Spark does not support pandas 2.1.0:
apache/spark#42793
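
To make the pin concrete, here is a minimal sketch of how a version string could be checked against the 'pandas<=2.0.3' bound quoted above. The names `release_tuple`, `within_pin` and `PANDAS_UPPER_BOUND` are hypothetical and only for illustration; this is not the test used in this repository, and it ignores pre-release suffixes rather than implementing full version-specifier semantics.

```python
import re

# Upper bound taken from the pin quoted in this thread ('pandas<=2.0.3').
PANDAS_UPPER_BOUND = "2.0.3"


def release_tuple(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '2.1.0rc1' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def within_pin(version: str, upper: str = PANDAS_UPPER_BOUND) -> bool:
    """True if `version` is at or below `upper` (pre-release suffixes are ignored)."""
    a, b = release_tuple(version), release_tuple(upper)
    # Pad the shorter tuple with zeros so '2.0' and '2.0.0' compare as equal.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a <= b
```

In an image test this could be called as `within_pin(pandas.__version__)` after importing pandas; with the pin in place, 2.0.3 passes and 2.1.0 does not.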

@mathbunnyru
Member

Maybe #1937 will be fixed by this PR?

@bjornjorgensen
Contributor Author

Maybe #1937 will be fixed by this PR?

No.

There are some problems with Hadoop 3.3.6; see apache/hadoop#5706 and the mailing-list thread below.

https://lists.apache.org/thread/o7ockmppo5yqk2cm7f1kvo7plfgx6xnc

Co-authored-by: Ayaz Salikhov <mathbunnyru@users.noreply.github.com>
Member

@mathbunnyru mathbunnyru left a comment

Thanks @bjornjorgensen!

@mathbunnyru mathbunnyru merged commit 52a999a into jupyter:main Sep 16, 2023
@bjornjorgensen bjornjorgensen deleted the spark3.5.0 branch September 16, 2023 17:09