[CD] Adds python docker image pipeline #16435

perdasilva · 2019-10-11T05:58:12Z

Description

Extends the CD pipeline for publishing the python docker images

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

marcoabreu · 2019-10-11T12:19:27Z

cd/python/docker/Dockerfile

+ENV MXNET_COMMIT_ID=${MXNET_COMMIT_ID}
+
+RUN mkdir -p /mxnet
+COPY wheel_build/dist/*.whl /mxnet/.


You might want to put the copy behind the install instruction because of the cache layers
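To illustrate the cache-layer point, here is a simplified Python model of how Docker derives layer cache keys. It is a sketch, not Docker's actual implementation: each key chains the previous key with the instruction text, and a COPY also mixes in a hash of the copied content. The `INSTALL` instruction and the wheel hash values are hypothetical, since the install step is not shown in the quoted diff.

```python
import hashlib


def layer_keys(instructions, file_hashes):
    """Return one cache key per instruction.

    Each key chains the previous key with the instruction text. COPY
    instructions also mix in a hash of the copied content, so a changed
    file invalidates that layer and every layer after it.
    """
    keys, previous = [], ""
    for instruction in instructions:
        material = previous + "\n" + instruction
        if instruction.startswith("COPY "):
            source = instruction.split()[1]
            material += "\n" + file_hashes.get(source, "")
        previous = hashlib.sha256(material.encode()).hexdigest()
        keys.append(previous)
    return keys


def rebuilt_layers(instructions, old_files, new_files):
    """List the instructions whose cache key changes between two builds."""
    old = layer_keys(instructions, old_files)
    new = layer_keys(instructions, new_files)
    return [ins for ins, a, b in zip(instructions, old, new) if a != b]


# Hypothetical install step; the real one is not shown in the diff.
INSTALL = "RUN apt-get update && apt-get install -y python3"
COPY = "COPY wheel_build/dist/*.whl /mxnet/."

copy_first = ["FROM ${BASE_IMAGE}", COPY, INSTALL]
copy_last = ["FROM ${BASE_IMAGE}", INSTALL, COPY]

# Hypothetical content hashes standing in for two different wheel builds.
old_wheel = {"wheel_build/dist/*.whl": "wheel-build-1"}
new_wheel = {"wheel_build/dist/*.whl": "wheel-build-2"}

print(rebuilt_layers(copy_first, old_wheel, new_wheel))
print(rebuilt_layers(copy_last, old_wheel, new_wheel))
```

In this model, a new wheel forces the install layer to be rebuilt only when the COPY comes first. With the COPY last, the install layer keeps its cache key.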

marcoabreu · 2019-10-11T12:19:42Z

cd/python/docker/Dockerfile

+FROM ${BASE_IMAGE}
+
+ARG MXNET_COMMIT_ID
+ENV MXNET_COMMIT_ID=${MXNET_COMMIT_ID}


Move that down as well

marcoabreu · 2019-10-11T12:21:04Z

cd/python/docker/Dockerfile.test

+# specific language governing permissions and limitations
+# under the License.
+#
+# Python MXNet Dockerfile


What's the point of this file? It feels like you're replicating mechanisms we have in the regular CI dockerfiles

This is a base image for the tests. Because we mount the local fs we need to use that same user. Same problem as in CI. I'll use the ubuntu_adduser.sh script, though...

marcoabreu · 2019-10-11T12:22:06Z

cd/python/docker/python_images.sh

+
+# Builds runtime image
+build() {
+    docker build -t "${image_name}" --build-arg PYTHON_CMD=${python_cmd} --build-arg BASE_IMAGE="${base_image}" --build-arg MXNET_COMMIT_ID=${GIT_COMMIT} -f ${resources_path}/Dockerfile .


Raw docker access is highly discouraged. Please use our Python Docker wrapper. Otherwise you are going to run into all sorts of leaks

What will leak here? This assumes docker login has already been called

Docker build and docker run tend to leak resources and thus leave behind unclaimed containers. It took us a lot of time to get it right

I get it - but it's way too tightly integrated with the CI stuff. I don't think it should be done here.

As a compromise, I've added a trap to the script. This does basically what the build.py does.

We spent months on fixing all the problems that arose, so please spend a bit of time on integrating your stuff. We don't want the same issues to arise again - the current solution is proven to work.

Unfortunately, I don't have the time to refactor build.py and extract that functionality and make it general purpose. If you are willing, I'm happy to refactor this stuff once it is done...

The time saved now is paid by increased maintenance overhead and risk due to two separate solutions. So this is a time trade, where the increased maintenance overhead will then have to be handled by the community - sorry, but I don't like that stance.

marcoabreu · 2019-10-11T12:23:37Z

cd/utils/docker_login.py

+from botocore.exceptions import ClientError
+
+
+def docker_login():


We already have docker login code, please don't duplicate

Fair call - I'll refactor the login stuff out of the cache script and use that

marcoabreu · 2019-10-11T12:24:53Z

It feels like the wheel was reinvented a few times. Please use the existing mechanisms

perdasilva · 2019-10-11T13:25:37Z

@marcoabreu which mechanism should I be using?

marcoabreu · 2019-10-11T13:56:11Z

The build.py or docker cache functions (the stuff we use to generate our docker cache). In the end, you're doing a similar job as the docker cache generation job, so you could adopt that

perdasilva · 2019-10-11T14:07:16Z

Sorry, I misunderstood you. I equated wheel with wheel file =P

But I don't necessarily agree with you. I don't think they are solving the same problem. My understanding of CI is that, as much as possible, we don't want to rebuild the images. This isn't so much of an issue here given the low level of complexity of the images, i.e. we just install python and copy over the wheel.

If you feel like you can refactor the stuff in build and cache to be more generic and not so tightly integrated with CI, I'm happy to use it.

What seems like an issue to me though, are the zombie containers. As a mid term solution, I'm trapping the SIGTERM and SIGINT signals and cleaning up the container - which is exactly what the python code does anyway - except in a way that is tightly bound to CI...
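For readers unfamiliar with the approach, the signal-trapping cleanup described above might look roughly like this in Python. This is a sketch under assumptions, not the code from the PR's shell script or from build.py: the container name is hypothetical, and it assumes that force-removing the container with `docker rm -f` is sufficient cleanup.

```python
import signal
import subprocess
import sys


def remove_container(name, run=subprocess.run):
    """Force-remove a container by name; `run` is injectable for testing."""
    # `docker rm -f` stops the container if it is running, then removes it.
    return run(["docker", "rm", "-f", name], check=False)


def install_cleanup(name, run=subprocess.run,
                    signals=(signal.SIGTERM, signal.SIGINT)):
    """Register handlers that remove the container and then exit."""
    def handler(signum, frame):
        remove_container(name, run=run)
        # Conventional exit status for "terminated by signal N".
        sys.exit(128 + signum)

    for sig in signals:
        signal.signal(sig, handler)
    return handler
```

A script would call `install_cleanup("mxnet-python-test")` (hypothetical name) before starting the container. Note that handlers like this cover only SIGTERM and SIGINT; a SIGKILL or a host crash would still leave the container behind.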

marcoabreu · 2019-10-11T14:14:02Z

Sorry, I'm not following. Both build docker images and then publish them. If you're bothered by the usage of caching, you can disable incremental builds in the function.

Sorry, but I don't really see much reason to go with a hack if we already have a working solution that you just have to integrate into. You were aware of the need to integrate your stuff with the existing code from the beginning when you decided you'd like to work off-branch, so this request shouldn't be a surprise.

perdasilva · 2019-10-11T14:19:22Z

The problem is the tight integration with the CI stuff. At some point we had something, but it's from months ago and I don't have time to merge it with the current state of things.

Again, if you want to do it, I'm happy to hook up this stuff to whatever you come up with...

perdasilva · 2019-10-11T14:27:59Z

At the end of the day, I'm on my own time here and I don't have much of it. Are you going to work with me to push this through, or should I close the PR?

marcoabreu · 2019-10-11T14:31:43Z

Maybe you can find somebody else in the community who'd be able to assist you here. I'm just the reviewer here.

perdasilva · 2019-10-11T14:59:36Z

No worries. Understood. In the future, I would suggest being more flexible and working with the submitter to mitigate any issues as a follow-up, thus fostering actual community building. Now the community can spend time recreating this functionality and lose another contributor... Wish you guys all the best!

marcoabreu · 2019-10-11T15:14:44Z

Since I know of the priority shift and the allocated time budget for future work on this particular project, I don't believe that a follow-up would have happened. The risk this PR carries outweighs its benefit. No doubt this feature would have been quite helpful to MXNet, but we already went through a lot of trouble with regard to zombie containers and I consider the stability of the whole CI more important. I made sure to bring up this particular point (integrating the solution with the existing tools and interfaces) right from the inception of CD - if the time budget wasn't allocated properly to accommodate this, then it's a pity, but it's a hard requirement. Just as development of a feature is part of a project, integration is as well - but that's often forgotten. I consider a standalone solution (like the one presented here) a proof of concept, and the necessary next step is thus to align it with the existing infrastructure. If there's no time for that, then it has to stop at the POC stage. I'm generally open to incremental changes, but not when a PR brings a risk to the entire system.

perdasilva · 2019-10-11T15:33:37Z

I mitigated the risk in the PR. The integration could have been done in a follow-up PR refactoring the functionality out of the scripts... As I said, I've been working on this on my own time and would have continued to do so. But the consistent lack of flexibility and the constant trading of continuous improvement for delayed perfection make this a hard project to contribute to. Wish you guys all the best!

perdasilva requested review from aaronmarkham, marcoabreu and szha as code owners October 11, 2019 05:58

perdasilva force-pushed the cd_python_docker_pipeline branch 3 times, most recently from 0dba0ad to 60b5f31 Compare October 11, 2019 06:27

perdasilva changed the title ~~[WIP][CD] Adds python docker image pipeline~~ [CD] Adds python docker image pipeline Oct 11, 2019

perdasilva added 2 commits October 11, 2019 08:44

Adds docker utilities

59e7781

Adds python docker pipeline

8eafd39

perdasilva force-pushed the cd_python_docker_pipeline branch from 60b5f31 to 8eafd39 Compare October 11, 2019 06:44

marcoabreu suggested changes Oct 11, 2019

Fixes Dockerfile instruction order, build context, and re-uses existi…

2a5509e

…ng add user script

Refactors docker login functionality out of docker_cache script and r…

daaebf8

…e-uses it for CD

perdasilva force-pushed the cd_python_docker_pipeline branch from c42da77 to daaebf8 Compare October 11, 2019 13:36

Cleans up docker container on sigterm and sigint

57093a4

perdasilva closed this Oct 11, 2019

perdasilva mentioned this pull request Oct 19, 2019

[CD] Adds python docker pipeline #16547

Merged

5 tasks
