ARROW-7717: [CI] Have nightly integration test for Spark's latest release #6316
Conversation
@ursabot crossbow submit test-conda-python-3.6-spark-2.4.4

AMD64 Conda Crossbow Submit (#88674) builder has succeeded. Revision: 0f4de52129403022aaee438b118132679e19a6a8. Submitted crossbow builds: ursa-labs/crossbow @ ursabot-468
@ursabot crossbow submit test-conda-python-3.6-spark-2.4.4

AMD64 Conda Crossbow Submit (#88694) builder has succeeded. Revision: 03a4cfd685c03aaa11924730c08e4f90b0167c14. Submitted crossbow builds: ursa-labs/crossbow @ ursabot-469
@BryanCutler neither Spark 2.4.4 nor 2.4.5-rc1 seems compatible with Arrow > 0.15. Is Arrow 0.16 going to be supported only for Spark > 3?

@kszucs Spark versions 2.4.x need to have the env variable ARROW_PRE_0_15_IPC_FORMAT=1 set to work with pyarrow >= 0.15.
Thanks Bryan, I’ll try that then.
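For reference, a minimal sketch of what setting that variable looks like when driving PySpark 2.4.x from Python with a newer pyarrow (local mode only; the session settings and data here are illustrative assumptions, not the project's test setup):

```python
import os

# ARROW_PRE_0_15_IPC_FORMAT=1 makes pyarrow >= 0.15 write the legacy
# Arrow IPC format that the Arrow Java version bundled with Spark 2.4.x
# still expects. It has to be set before the session (and its Python
# workers) start up.
os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[2]")
    .config("spark.sql.execution.arrow.enabled", "true")
    # on a real cluster this would go into conf/spark-env.sh instead
    .config("spark.executorEnv.ARROW_PRE_0_15_IPC_FORMAT", "1")
    .getOrCreate()
)

# round-trip a small DataFrame through Arrow to confirm the flag takes effect
print(spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"]).toPandas())
spark.stop()
```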
Force-pushed from ca545f9 to ce20605
@ursabot crossbow submit test-conda-python-3.6-spark-2.4.4 test-conda-python-3.7-spark-master

AMD64 Conda Crossbow Submit (#90452) builder has succeeded. Revision: ce2060550fdab643808cc8e4dfba4ae052771832. Submitted crossbow builds: ursa-labs/crossbow @ ursabot-505
@kszucs I did some local testing and I think there are a number of issues with getting the Spark 2.4.4 tests working with the latest pyarrow and pandas >= 1.0.0. It might be better to set up a test for the upcoming Spark 3.0.0 release; the branch has been cut here https://github.com/apache/spark/tree/branch-3.0, and that should work fine.
SGTM

@BryanCutler @kszucs what do you think about setting up a Spark 3.x integration test?
Definitely, I'm going to roll up this PR and create a JIRA for Spark 3.0 integration testing.

Or just try to add it in this PR.
@github-actions crossbow submit spark

Revision: e91d5c3. Submitted crossbow builds: ursa-labs/crossbow @ actions-297
@BryanCutler could you please fix the build errors? For the master/3.0 one, the build error might need to be fixed on Spark's side.
@github-actions crossbow submit test-conda-python-3.6-spark-2.4.6

Revision: a914eea. Submitted crossbow builds: ursa-labs/crossbow @ actions-298
Yeah, it would definitely be good to test against Spark 3.0. I'll have to work on a patch to handle some of the recent changes to Arrow Java to fix the build.

Thank you! Although Spark 2 seems to fail as well.
Yup, the Spark 2 issues are probably due to the same changes - I'll take a look.

@kszucs I submitted a patch to fix Java compilation with Spark master and branch-3.0, and tested locally with the latest pyarrow, so the Spark integration tests should pass for these as of this morning. For Spark branch-2.4, it will require applying a different patch with the testing script, but maybe we should rethink this strategy a little. Spark 2.4.x will not upgrade Arrow Java anymore, so it would be more correct to take the latest Spark 2.4.6 release out of the box and test that with pyarrow from master. Is that possible to set up, wdyt?
Thanks Bryan!
I'm not sure that I understand you correctly :)
I mean the current process for integration tests with the master branch is to build Spark with Arrow Java master, then run the Java and Python tests. That process is good for Spark master and branch-3.0, but for Spark branch-2.4 it doesn't make much sense to rebuild Spark with Arrow Java master. This is because the dependency on Arrow Java is pretty much frozen for branch-2.4 (unless there is a serious enough bug), but different versions of pyarrow can be used. It would be best to take the latest Spark 2.4.6 release out of the box and run the pyspark tests from the script with pyarrow from master. We are not really concerned about Java compatibility since it can't be upgraded anyway, but that still allows us to flush out any Python problems.
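To illustrate, such a branch-2.4 job would essentially exercise the Arrow code paths of a stock PySpark install against a development pyarrow. A rough sketch of the kind of check involved (assuming pyspark 2.4.6 from PyPI and pyarrow built from master; the UDF and data below are made-up examples, not the actual pyspark test suite):

```python
import os

os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"  # stock Spark 2.4.x still needs this

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, pandas_udf, PandasUDFType

spark = (
    SparkSession.builder
    .master("local[2]")
    .config("spark.sql.execution.arrow.enabled", "true")
    .getOrCreate()
)

df = spark.range(100).withColumn("v", (col("id") % 3).cast("double"))

@pandas_udf("double", PandasUDFType.SCALAR)
def plus_one(v):
    # runs in the Python workers; input and output travel as Arrow batches
    return v + 1

result = df.withColumn("v2", plus_one(col("v"))).toPandas()
assert (result["v2"] == result["v"] + 1).all()
spark.stop()
```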
@kszucs the nightlies against Spark master have been passing. Do you think you could update this to just add the test against branch-3.0 and remove branch-2.4 for now? I'm not sure if the bot will pick it up from this PR, but I'll try to kick off a test run.
@github-actions crossbow submit test-conda-python-3.7-spark-branch-3.0 |
BryanCutler left a comment

LGTM, I think we could skip branch-2.4 for now and branch-3.0 should be working.
The review comments below refer to these lines from the task configuration:

    # use the master branch of spark, so prevent reusing any layers
    run: --no-leaf-cache --env ARROW_PRE_0_15_IPC_FORMAT=1 conda-python-spark

    test-conda-python-3.7-spark-branch-3.0:
looks like this should be named python-3.8 also
Fixing it.
Revision: a914eea. Submitted crossbow builds: ursa-labs/crossbow @ actions-371
@github-actions crossbow submit test-spark
Revision: 55d9411. Submitted crossbow builds: ursa-labs/crossbow @ actions-372
kszucs left a comment
+1, thanks @BryanCutler!
This is great, thanks @kszucs!