-
Notifications
You must be signed in to change notification settings - Fork 6.8k
[CD] Adds python docker image pipeline #16435
Conversation
0dba0ad
to
60b5f31
Compare
60b5f31
to
8eafd39
Compare
cd/python/docker/Dockerfile
Outdated
ENV MXNET_COMMIT_ID=${MXNET_COMMIT_ID} | ||
|
||
RUN mkdir -p /mxnet | ||
COPY wheel_build/dist/*.whl /mxnet/. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to put the copy behind the install instruction because of the cache layers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
cd/python/docker/Dockerfile
Outdated
FROM ${BASE_IMAGE} | ||
|
||
ARG MXNET_COMMIT_ID | ||
ENV MXNET_COMMIT_ID=${MXNET_COMMIT_ID} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move that down as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
# specific language governing permissions and limitations | ||
# under the License. | ||
# | ||
# Python MXNet Dockerfile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the point of this file? It feels like you're replicating mechanisms we have in the regular CI dockerfiles
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a base image for the tests. Because we mount the local fs we need to use that same user. Same problem as in CI. I'll use the ubuntu_adduser.sh script, though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
cd/python/docker/python_images.sh
Outdated
|
||
# Builds runtime image | ||
build() { | ||
docker build -t "${image_name}" --build-arg PYTHON_CMD=${python_cmd} --build-arg BASE_IMAGE="${base_image}" --build-arg MXNET_COMMIT_ID=${GIT_COMMIT} -f ${resources_path}/Dockerfile . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raw docker access is highly discouraged. Please use our Python Docker wrapper. Otherwise you are doing to run into all sorts of leaks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What will leak here? This assumes docker login has already been called
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docker build and docker run tend to leak resources and thus leave behind unclaimed containers. It took us a lot of time to get it right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get it - but it's way to tightly integrated with the CI stuff. I don't think it should be done here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a compromise, I've added a trap to the script. This does basically what the build.py does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We spent months on fixing all the problems that arose, so please spend a bit of time on integrating your stuff. We don't want the same issues to arise again - the current solution is proven to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I don't have the time to refactor build.py and extract that functionality and make it general purpose. If you are willing, I'm happy to refactor this stuff once it is done...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The time saved now is paid by increased maintenance overhead and risk due to two separate solutions. So this is a time trade, where the increased maintenance overhead will then have to be handled by the community - sorry, but I don't like that stance.
cd/utils/docker_login.py
Outdated
from botocore.exceptions import ClientError | ||
|
||
|
||
def docker_login(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have docker login code, please don't duplicate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair call - I'll refactor the login stuff out of the cache script and use that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
It feels like the wheel was reinvented a few times. Please use the existing mechanisms |
…ng add user script
@marcoabreu which mechanism should I be using? |
c42da77
to
daaebf8
Compare
The build.py or docker cache functions (the stuff we use to generate our docker cache). In the end, you're doing a similar job as the docker cache generation job, so you could adopt that |
Sorry, I misunderstood you. I equated wheel with wheel file =P But I don't necessarily agree with you. I don't think they are solving the same problem. My understanding of CI is that, as much as possible, we don't want to rebuild the images. This isn't so much of an issue here given the low level of complexity of the images, i.e. we just install python and copy over the wheel. If you feel like you can refactor the stuff in build and cache to be more generic and not so tightly integrated with CI, I'm happy to use it. What seems like an issue to me though, are the zombie containers. As a mid term solution, I'm trapping the SIGTERM and SIGINT signals and cleaning up the container - which is exactly what the python code does anyway - except in a way that is tightly bound to CI... |
Sorry I'm not following. Both build docker images and then publish them. If you're bothered by the usage of catching, you can disable incremental builds in the function. Sorry, but I don't really see much reason to go with a hack if we already got a working solution that you just have to integrate into. You were aware of the need to integrate your stuff with the existing code from the beginning when you decided you'd like to work off-branch, so this request shouldn't be a surprise. |
The problem is the tight integration with the CI stuff. At some point we had something, it's from months ago and I don't have time to merge it with the current state of things. Again, if you want to do it, I'm happy to hook up this stuff to whatever you come up with... |
At the end of the day, I'm on my own time here and I don't have much of it. Are you doing to work with me to push this through, or should I close the PR? |
Maybe you can find somebody else in the community who'd be able to assist you here. I'm just the reviewer here. |
No worries. Understood. In the future I would suggest being more flexible and working with the submitter to mitigate any issues as a follow up. Thus, fostering actual community building. Now, the community can spend time recreating this functionality + lose another contributer...wish you guys all the best! |
Since I know of the priority shift and the allocated time budget for future work on this particular project, I don't believe that a follow up would have happened. The risk this PR contains weighs over its benefit. No doubt that this feature would have been quite helpful to MXNet, but we already went through a lot of trouble with regards to zombie containers and I consider the stability of the whole CI more important. I made sure to bring up this particular point (integrating the solution with the existing tools and interfaces) right from the inception of CD - if the time budget wasn't allocated properly to accommodate this, then it's a pity, but it's a hard requirement. As development of a feature is part of a project, integration is as well - but that's often forgotten. I consider a standalone solution (like the one presented here) a proof of concept and the necessary next step is thus to align it with the existing infrastructure. If there's no time for that, then it has to stop at the POC stage. I'm generally open towards incremental changes, but not when a PR brings a risk to the entire system. |
I mitigated the risk in the PR. The integration could have been done in a follow up PR refactoring the functionality out of the scripts...as I said I've been working on this on my own time and would have continued to do so. But the consistent lack of flexibility and the constant trading of continuous improvement for delayed perfection makes this a hard project to contribute to. Wish you guys all the best! |
Description
Extends the CD pipeline for publishing the python docker images
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.