ARROW-6429: [Integration] Adding patch to fix Spark compilation for Integration tests #5465
Conversation
integration/spark/runtest.sh (outdated)
@kszucs I thought this would be better in the Dockerfile after checking out Spark, but for some reason the patch didn't seem to apply. I was trying this and it seemed to run, but the file wasn't patched:
COPY integration/spark/ARROW-6429.patch /tmp/
RUN patch -d /spark -p1 -i /tmp/ARROW-6429.patch
Any ideas?
Never mind, it seems to work now. I'll change it.
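For reference, a minimal shell sketch of the approach discussed above, with an illustrative check that the hunks actually landed (the git check is an assumption for illustration, not part of the PR):

```sh
# Apply the patch to the Spark checkout (paths follow the snippet above;
# -p1 assumes a git-formatted patch rooted at the Spark source tree).
cp integration/spark/ARROW-6429.patch /tmp/
patch -d /spark -p1 -i /tmp/ARROW-6429.patch

# Illustrative sanity check, assuming /spark is a git checkout:
git -C /spark diff --stat
```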
Force-pushed from 7c6d150 to 48b2eac
@ursabot crossbow submit docker-spark-integration

AMD64 Conda Crossbow Submit (#64318) builder has succeeded. Revision: dd2483f. Submitted crossbow builds: ursa-labs/crossbow @ ursabot-215
@BryanCutler seems like the build failed with a timeout. Any ideas how we could speed up the Spark integration test a bit?

@kszucs, I noticed the Java builds for Arrow and Spark are unusually slow. I'm not sure why, but I'll take a look at the settings. Also, the pyspark tests run twice, once against python and once against python3.6, which I think are the same in this image. So we can explicitly test just one, which should save a couple of minutes.
Once #5471 is merged, we should be able to get a pass here (as long as there's no timeout).
# installing java and maven
- ARG MAVEN_VERSION=3.5.4
+ ARG MAVEN_VERSION=3.6.2
This is the minimum version used by Spark, so setting this here will prevent Spark from downloading it during the build phase
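For illustration, a hedged shell sketch of what pinning Maven like this could look like, assuming a curl-based install into /opt (the install path is a placeholder, not the Dockerfile's actual layout):

```sh
# Sketch only: install a pinned Maven so the Spark build finds a new enough
# version and, per the comment above, doesn't download its own copy.
MAVEN_VERSION=3.6.2
curl -fsSL "https://archive.apache.org/dist/maven/maven-3/${MAVEN_VERSION}/binaries/apache-maven-${MAVEN_VERSION}-bin.tar.gz" \
  | tar -xzf - -C /opt
export PATH="/opt/apache-maven-${MAVEN_VERSION}/bin:$PATH"
mvn -version  # should report Apache Maven 3.6.2
```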
(echo "Testing PySpark:"; IFS=$'\n'; echo "${SPARK_PYTHON_TESTS[*]}")
- python/run-tests --testnames "$(IFS=,; echo "${SPARK_PYTHON_TESTS[*]}")"
+ python/run-tests --testnames "$(IFS=,; echo "${SPARK_PYTHON_TESTS[*]}")" --python-executables python
Spark will look for installed python versions and test against each one separately, so setting this makes sure the tests run only once, against the default python.
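A short sketch of the invocation (the test module names below are placeholders, not the exact list in runtest.sh):

```sh
# Placeholder test list; the real list lives in runtest.sh.
SPARK_PYTHON_TESTS=(
  "pyspark.sql.tests.test_arrow"
  "pyspark.sql.tests.test_pandas_udf"
)
# "$(IFS=,; echo "${SPARK_PYTHON_TESTS[*]}")" joins the array with commas,
# the format --testnames expects; --python-executables python limits the run
# to the default interpreter instead of every python Spark finds on the image.
python/run-tests \
  --testnames "$(IFS=,; echo "${SPARK_PYTHON_TESTS[*]}")" \
  --python-executables python
```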
@kszucs, I made a couple of small adjustments, but I think the reason for the timeout is that Spark can take a long time during assembly, which downloads and assembles all required dependencies. I don't think there is any way to avoid this since we need to test pyspark, but perhaps there is some way to cache better. Since this can take a long time and we limit the output during the build to just warnings, we end up hitting the "too long without output" timeout. Is it possible to increase this by setting no_output_timeout?
@BryanCutler seems like we can increase that timeout: https://support.circleci.com/hc/en-us/articles/360007188574-Build-has-hit-timeout-limit

I'm going to try to push and pull images before and after the build and run, to spare the docker-compose build time.
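For context, a minimal sketch of the push/pull layer-caching idea (image name and tag are placeholders; this is not the actual CI setup):

```sh
# Pull a previously pushed image (if any) and let docker reuse its layers as
# a build cache, then push the rebuilt image for the next run.
docker pull example/arrow-spark-integration:latest || true
docker build \
  --cache-from example/arrow-spark-integration:latest \
  -t example/arrow-spark-integration:latest \
  .
docker push example/arrow-spark-integration:latest
```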
Thanks @kszucs, so will you be able to adjust that timeout, or can I do it somewhere in this PR? I'm not sure where the config file is. I'll try to trigger another test, since it should pass now if it doesn't time out.
@ursabot crossbow submit docker-spark-integration

AMD64 Conda Crossbow Submit (#65162) builder has succeeded. Revision: 918ab91. Submitted crossbow builds: ursa-labs/crossbow @ ursabot-229
@BryanCutler #5485 should speed things up a bit, and sets no_output_timeout to an hour.

If you've tried it locally, then we can go ahead and merge this.
kszucs left a comment
The patch LGTM.
Merged to master, thanks for reviewing @kszucs!
The Arrow Java writer now requires an IpcOption for some APIs; this patch fixes the compilation so the Spark integration tests can run.