ARROW-1281: [C++/Python] Add Docker setup for testing HDFS IO in C++ and Python #895
Conversation
xhochy
left a comment
I made some initial comments, but it seems to be still quite rough, so I will wait to comment further until it has settled.
python/testing/hdfs/Dockerfile (outdated)
As you normally run as root inside of docker, sudo should not be required here.
Thanks. I had to change the user back to root from the base image
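For reference, switching back to root when a base image drops privileges is a one-line Dockerfile change. A minimal sketch, assuming a base image that ends on an unprivileged user (the image name and package are illustrative, not from this PR):

```
# Hypothetical base image that switches to an unprivileged user
FROM some-hadoop-base:latest

# Switch back to root so packages can be installed without sudo
USER root
RUN apt-get update && apt-get install -y build-essential
```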
Adding --rm to docker run should make this line redundant
fixed
I had some issues when I removed this line and then added --rm to the docker run command. Perhaps we can tweak in a subsequent PR when adding more features to these ad hoc Docker tests
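For context, `--rm` tells Docker to delete the container when it exits, which replaces a manual `docker rm` cleanup step. A sketch of the invocation being discussed (the image name matches this PR; the interactive flags are illustrative):

```
# Start the HDFS test container and auto-remove it on exit;
# without --rm, the stopped container lingers until `docker rm`.
docker run --rm -it arrow-hdfs-test bash
```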
wesm
left a comment
Thanks. I'm adding the Python HDFS tests and will remove the WIP
Change-Id: I40d3acd46802ecb2a37f4d83ed08a841645772ba
This is ready to go. This is good for one-off use, but we should see about refactoring our CI scripts to be able to share code more easily with this kind of thing. It's a little bit tricky because of the various interdependent environment variables. As part of ARROW-1213 I will add S3 testing to this setup so you can check things out locally against an access/secret key for a cloud bucket.
Change-Id: I9819fb4f79ae202164dc4cf41c8d35961cff2589
Change-Id: I820f8eb707df50c6d12602fe2d816c80b1402ee1
Change-Id: Ib247a679667a40365846507b6ea9795660226272
@xhochy let me know if you have any more comments on this. I'm going to look at the Parquet RC in the meantime.
+1; I'm going to keep adding some more ad hoc tests.
xhochy
left a comment
Added some comments about best practices I've gathered from developing a lot with Docker containers.
# under the License.

use_gcc() {
    export CC=gcc-4.9
It's better to set these variables in the Dockerfile via `ENV CC gcc-4.9`.
I wasn't sure about this, as the idea was to be able to easily switch between compilers depending on the test script.
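The per-script toggle described here can be sketched as a pair of shell functions. Only `use_gcc` with `CC=gcc-4.9` appears in this PR; `use_clang`, the `CXX` lines, and the clang version are illustrative assumptions:

```shell
# Compiler-toggle helpers: each test script picks its toolchain by
# calling one of these before building. Only use_gcc/CC=gcc-4.9 is
# from the PR; use_clang and the CXX exports are hypothetical.
use_gcc() {
    export CC=gcc-4.9
    export CXX=g++-4.9
}

use_clang() {
    export CC=clang-4.0
    export CXX=clang++-4.0
}

use_gcc
echo "building with $CC / $CXX"
```

A Dockerfile `ENV` bakes one choice into the image, while functions like these let each test script make its own choice at run time.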
set -e

export PATH="$MINICONDA/bin:$PATH"
conda update -y -q conda
In my Dockerfiles, I normally include the conda installation in the image; that way I get faster iteration times on repeated test runs.
It is also helpful to separate the base installation and the project-specific dependencies into different layers of the Docker image so they are shared between similar images.
Good point, I will do that in the next patch
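The layering idea can be sketched as a Dockerfile excerpt; the Miniconda download and the package names are illustrative, not taken from this PR:

```
# Layer 1: base toolchain, shared across all similar images
RUN apt-get update && apt-get install -y wget build-essential

# Layer 2: conda baked into the image, so repeated test runs
# skip the download and install
RUN wget -q https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && bash Miniconda3-latest-Linux-x86_64.sh -b -p /opt/conda

# Layer 3: project-specific dependencies; only this layer is
# rebuilt when the dependency list changes
RUN /opt/conda/bin/conda install -y numpy pandas
```

Docker caches each `RUN` as a separate layer, so keeping the stable steps above the frequently changing ones maximizes cache reuse.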
set -ex

docker build -t arrow-hdfs-test -f hdfs/Dockerfile .
bash hdfs/restart_docker_container.sh
For multiple Docker containers, have a look at docker-compose. It lets you start several containers and wire them together.
Will do, thank you
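A docker-compose sketch of what that could look like; the service split, entry scripts, and port are hypothetical, not from this PR (only the `arrow-hdfs-test` image name is):

```
version: '2'
services:
  hdfs-namenode:
    image: arrow-hdfs-test          # image built in this PR
    command: start-namenode.sh      # hypothetical entry script
    ports:
      - "50070:50070"               # HDFS NameNode web UI
  hdfs-datanode:
    image: arrow-hdfs-test
    command: start-datanode.sh      # hypothetical entry script
    links:
      - hdfs-namenode
```

With this in place, `docker-compose up` replaces the hand-rolled build/restart scripts for starting the cluster.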
We aren't testing this in Travis CI because spinning up an HDFS cluster is a bit heavyweight, but this will at least enable us to do easier ongoing validation that this functionality is working properly.