@kszucs
Member

kszucs commented Jan 29, 2020

No description provided.

@kszucs
Member Author

kszucs commented Jan 29, 2020

@ursabot crossbow submit test-conda-python-3.6-spark-2.4.4

@kszucs
Member Author

kszucs commented Jan 29, 2020

AMD64 Conda Crossbow Submit (#88674) builder has been succeeded.

Revision: 0f4de52129403022aaee438b118132679e19a6a8

Submitted crossbow builds: ursa-labs/crossbow @ ursabot-468

Task Status
test-conda-python-3.6-spark-2.4.4 CircleCI

@kszucs
Member Author

kszucs commented Jan 29, 2020

@ursabot crossbow submit test-conda-python-3.6-spark-2.4.4

@kszucs
Member Author

kszucs commented Jan 29, 2020

AMD64 Conda Crossbow Submit (#88694) builder has been succeeded.

Revision: 03a4cfd685c03aaa11924730c08e4f90b0167c14

Submitted crossbow builds: ursa-labs/crossbow @ ursabot-469

Task Status
test-conda-python-3.6-spark-2.4.4 CircleCI

@kszucs
Member Author

kszucs commented Jan 29, 2020

@BryanCutler neither Spark 2.4.4 nor 2.4.5-rc1 seems compatible with Arrow > 0.15.

Is Arrow 0.16 going to be supported only on Spark 3 and later?

@BryanCutler
Member

BryanCutler commented Jan 29, 2020

> @BryanCutler neither Spark 2.4.4 nor 2.4.5-rc1 seems compatible with Arrow > 0.15.
> Is Arrow 0.16 going to be supported only on Spark 3 and later?

@kszucs Spark 2.4.x versions need the environment variable ARROW_PRE_0_15_IPC_FORMAT=1 set in order to work with pyarrow >= 0.15. See also https://github.com/apache/spark/blob/master/docs/sql-pyspark-pandas-with-arrow.md#compatibility-setting-for-pyarrow--0150-and-spark-23x-24x
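
The compatibility rule described above can be sketched as a small version check. This is only an illustration, not code from Spark or Arrow: the helper names (`needs_legacy_ipc`, `legacy_ipc_env`) are hypothetical, and the affected range (Spark 2.3.x and 2.4.x with pyarrow 0.15 or newer) is taken from the comment and the title of the linked Spark guide.

```python
import re

# Name of the environment variable mentioned in the comment above.
LEGACY_IPC_ENV = "ARROW_PRE_0_15_IPC_FORMAT"


def _major_minor(version: str) -> tuple:
    """Extract (major, minor) from strings such as '2.4.5-rc1' or '0.16.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_legacy_ipc(spark_version: str, pyarrow_version: str) -> bool:
    """Hypothetical helper: True for Spark 2.3.x/2.4.x with pyarrow >= 0.15."""
    spark = _major_minor(spark_version)
    arrow = _major_minor(pyarrow_version)
    return spark in {(2, 3), (2, 4)} and arrow >= (0, 15)


def legacy_ipc_env(spark_version: str, pyarrow_version: str) -> dict:
    """Environment variables to export before launching PySpark (may be empty)."""
    if needs_legacy_ipc(spark_version, pyarrow_version):
        return {LEGACY_IPC_ENV: "1"}
    return {}
```

A test harness could merge the returned mapping into the environment of the process that launches PySpark. In a real deployment the variable also has to reach the Python workers, which the linked Spark guide covers.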

@kszucs
Member Author

kszucs commented Jan 29, 2020

Thanks Bryan, I’ll try that then.

kszucs force-pushed the spark-2 branch 2 times, most recently from ca545f9 to ce20605 (February 12, 2020 14:10)
@kszucs
Member Author

kszucs commented Feb 12, 2020

@ursabot crossbow submit test-conda-python-3.6-spark-2.4.4 test-conda-python-3.7-spark-master

@kszucs
Member Author

kszucs commented Feb 12, 2020

AMD64 Conda Crossbow Submit (#90452) builder has been succeeded.

Revision: ce2060550fdab643808cc8e4dfba4ae052771832

Submitted crossbow builds: ursa-labs/crossbow @ ursabot-505

Task Status
test-conda-python-3.6-spark-2.4.4 CircleCI
test-conda-python-3.7-spark-master CircleCI

@BryanCutler
Member

@kszucs I did some local testing, and I think there are a number of issues with getting the Spark 2.4.4 tests working with the latest pyarrow and pandas >= 1.0.0. It might be better to set up a test for the upcoming Spark 3.0.0 release. The branch has been cut at https://github.com/apache/spark/tree/branch-3.0, and that should work fine.

@kszucs
Member Author

kszucs commented Feb 12, 2020

SGTM

@wesm
Member

wesm commented Jun 6, 2020

@BryanCutler @kszucs what do you think about setting up a Spark 3.x integration test?

@kszucs
Member Author

kszucs commented Jun 8, 2020

Definitely, I'm going to roll up this PR and create a Jira for Spark 3.0 integration testing.

@kszucs
Member Author

kszucs commented Jun 8, 2020

Or just try to add it in this PR.

@kszucs
Member Author

kszucs commented Jun 8, 2020

@github-actions crossbow submit spark

@github-actions

github-actions bot commented Jun 8, 2020

Revision: e91d5c3

Submitted crossbow builds: ursa-labs/crossbow @ actions-297

Task Status
test-conda-python-3.6-spark-2.4.6 Github Actions
test-conda-python-3.7-spark-branch-3.0 Github Actions
test-conda-python-3.8-spark-master Github Actions

@kszucs
Member Author

kszucs commented Jun 8, 2020

@BryanCutler could you please fix the build errors? For master and branch-3.0, the build error might need to be fixed on Spark's side.

@kszucs
Member Author

kszucs commented Jun 8, 2020

@github-actions crossbow submit test-conda-python-3.6-spark-2.4.6

@github-actions

github-actions bot commented Jun 8, 2020

Revision: a914eea

Submitted crossbow builds: ursa-labs/crossbow @ actions-298

Task Status
test-conda-python-3.6-spark-2.4.6 Github Actions

@BryanCutler
Member

Yeah, definitely would be good to test against Spark 3.0. I'll have to work on a patch to handle some of the recent changes to Arrow Java to fix the build.

@kszucs
Member Author

kszucs commented Jun 8, 2020

Thank you! Although Spark 2 seems to fail as well.

@BryanCutler
Member

Yup, the Spark 2 issues are probably due to the same changes - I'll take a look

@BryanCutler
Member

@kszucs I submitted a patch to fix Java compilation with Spark master and branch-3.0, and tested locally with the latest pyarrow so Spark integration tests should pass for these as of this morning.

For Spark branch-2.4, it will require applying a different patch with the testing script, but maybe we should rethink this strategy a little. Spark 2.4.x will not upgrade Arrow Java anymore, so it would be more correct to take the latest Spark 2.4.6 release out of the box and test that with pyarrow from master. Is that possible to set up? WDYT?

@kszucs
Member Author

kszucs commented Jun 24, 2020

> @kszucs I submitted a patch to fix Java compilation with Spark master and branch-3.0, and tested locally with the latest pyarrow so Spark integration tests should pass for these as of this morning.

Thanks Bryan!

> For Spark branch-2.4, it will require applying a different patch with the testing script, but maybe we should rethink this strategy a little. Spark 2.4.x will not upgrade Arrow Java anymore, so it would be more correct to take the latest Spark 2.4.6 release out of the box and test that with pyarrow from master. Is that possible to set up? WDYT?

I'm not sure that I understand you correctly :)

  • keep arrow master <=> spark branch-3.0
  • keep arrow master <=> spark master
  • and keep arrow master <=> spark 2.4.6 but execute only the pyarrow tests?

@BryanCutler
Member

I mean the current process for integration tests with the master branch is to build Spark with Arrow Java master, then run Java and Python tests. That process is good for Spark master and branch-3.0, but with Spark branch-2.4, it doesn't make too much sense to rebuild Spark with Arrow Java master. This is because the dependency on Arrow Java is pretty much frozen for branch-2.4 (unless there is a serious enough bug), but different versions of pyarrow can be used. It would be best to take the latest Spark 2.4.6 release out of the box and run the pyspark tests from the script with pyarrow from master. We are not really concerned about Java compatibility since it can't be upgraded anyway, but that still allows us to flush out any Python problems.
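
The split proposed above can be written down as a small test plan per Spark ref. This is a sketch under stated assumptions, not the actual integration script: `SparkTestPlan` and `plan_for` are hypothetical names, and the rule for recognizing a 2.4 ref (a `branch-2.4` or `2.4` prefix) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SparkTestPlan:
    """Hypothetical description of what to run for one Spark ref."""
    spark_ref: str
    rebuild_with_arrow_java_master: bool
    run_java_tests: bool
    run_pyspark_tests: bool


def plan_for(spark_ref: str) -> SparkTestPlan:
    # Assumption: Spark 2.4 refs look like 'branch-2.4' or '2.4.6'.
    arrow_java_frozen = spark_ref.startswith(("branch-2.4", "2.4"))
    return SparkTestPlan(
        spark_ref=spark_ref,
        # Spark master and branch-3.0 are rebuilt against Arrow Java master;
        # the 2.4 line keeps the Arrow Java version it was released with.
        rebuild_with_arrow_java_master=not arrow_java_frozen,
        run_java_tests=not arrow_java_frozen,
        # The pyspark tests run against pyarrow from master in every case.
        run_pyspark_tests=True,
    )
```

For example, `plan_for("2.4.6")` skips the rebuild and the Java tests but still runs the pyspark tests, while `plan_for("master")` and `plan_for("branch-3.0")` keep the full process.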

@BryanCutler
Member

@kszucs the nightly builds against Spark master have been passing. Do you think you could update this to just add the test against branch-3.0 and remove branch-2.4 for now? I'm not sure if the bot will pick it up from this PR, but I'll try to kick off a test run.

@BryanCutler
Member

@github-actions crossbow submit test-conda-python-3.7-spark-branch-3.0

Member

BryanCutler left a comment

LGTM, I think we could skip branch-2.4 for now and branch-3.0 should be working

# use the master branch of spark, so prevent reusing any layers
run: --no-leaf-cache --env ARROW_PRE_0_15_IPC_FORMAT=1 conda-python-spark

test-conda-python-3.7-spark-branch-3.0:
Member

looks like this should be named python-3.8 also

Member Author

Fixing it.

@github-actions

github-actions bot commented Jul 1, 2020

Revision: a914eea

Submitted crossbow builds: ursa-labs/crossbow @ actions-371

Task Status
test-conda-python-3.7-spark-branch-3.0 Github Actions

@kszucs
Member Author

kszucs commented Jul 1, 2020

@github-actions crossbow submit test-spark

@github-actions

github-actions bot commented Jul 1, 2020

Revision: 55d9411

Submitted crossbow builds: ursa-labs/crossbow @ actions-372

Task Status
test-conda-python-3.7-spark-branch-3.0 Github Actions
test-conda-python-3.8-spark-master Github Actions

Member Author

kszucs left a comment

+1, thanks @BryanCutler!

kszucs closed this in 0d789ac on Jul 2, 2020
@BryanCutler
Member

This is great, thanks @kszucs !
