Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

CI: unify centos7_base.sh and centos7_core.sh #18093

Merged
merged 3 commits into from
Apr 20, 2020
Merged

Conversation

leezu
Copy link
Contributor

@leezu leezu commented Apr 17, 2020

Description

Unifying the files is a mechanism to ensure the same system dependencies are used on CI and CD. For example, previously openblas-devel was missing on CD.

@mxnet-bot
Copy link

Hey @leezu , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, clang, miscellaneous, windows-gpu, sanity, edge, unix-gpu, centos-gpu, centos-cpu, windows-cpu, unix-cpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu
Copy link
Contributor Author

leezu commented Apr 18, 2020

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

@leezu
Copy link
Contributor Author

leezu commented Apr 18, 2020

@mxnet-bot run ci [unix-gpu]

np truedivide

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

@leezu
Copy link
Contributor Author

leezu commented Apr 18, 2020

@mxnet-bot run ci [unix-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [unix-gpu]

@leezu leezu force-pushed the fixcd branch 2 times, most recently from 1d4434c to c234074 Compare April 18, 2020 18:17
These images are used for testing the static builds. The idea is to have as few
system packages present as possible to catch potential dependency issues in our
release artifacts.
Copy link
Contributor

@marcoabreu marcoabreu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid having instructions directly inside the docker files. Instead, follow the same pattern of she'll scripts.

I know the usual way is to do it Inline, but I'd appreciate if we could keep the same style here.

@leezu
Copy link
Contributor Author

leezu commented Apr 18, 2020

@marcoabreu The current style is broken. We need to get rid of it and replace it with a better style. Let me elaborate.

This practice of copying shell files into the container leads to constant docker cache invalidation and makes for a horrible development experience. We need to reconsider this strategy and find a better solution that actually works. Just try generating a docker image locally. It will never rely on the mxnetci docker cache. You can check a number of references about this problem

That's problem Nr 1 with copying shell scripts. It may be possible to workaround by .dockerignore files, but the references pointed out above seem inconclusive. In either case, I'm not convinced that these shell scripts are the best solution:

Each of our docker containers has a certain purpose, such as providing a build setup, providing a setup with cuda drivers, providing a minimal setup for testing of release artifacts etc. For each purpose we like to ensure / test it on a variety of platforms (CentOS7 and some newer Ubuntu distro for build; various older and newer Linux distros for testing release artifacts). This leads to a combinatorial number of Dockerfiles. The shell scripts are used to avoid duplicating the same install steps in each Dockerfile. But as the shell scripts are separate from the Dockerfiles they are used in, people get confused easily about which script is used for which purpose; some duplication occurs or some unnecessary dependencies are installed. The problem here is, that there is a disconnect between the content of each shell script and the places of use that finally leads to a hard to understand "mess". Looking at https://github.com/apache/incubator-mxnet/tree/37dbbd4/ci/docker/install it's not really easy to understand for any newcomer.

I suggest to firstly simplify the Dockerfiles. As you see in this PR, at least some of the shell scripts install way to many dependencies for the Dockerfiles in which they are included. Other shell scripts can be completely removed by switching from Ubuntu 16.04 base image to 20.04 base image and using system dependencies instead of compiling stuff from source (we can use a very recent version of Ubuntu, because we already ensure compatibility with old distros based on CentOS7). This allows us to greatly simplify each Dockerfile.

Secondly, instead of having a separate Dockerfile for every particular setting (Dockerfile.publish.centos7_gpu_cu102, Dockerfile.publish.centos7_gpu_cu101, ...), just have a single Dockerfile for a class of settings (Dockerfile.publish.centos7 in this example) and dynamically specify the base image.

Together, this will greatly simplify our setup and greatly improve the developer experience by fixing the cache and will avoid the need for the shell scripts. This PR doesn't implement it yet obviously, but is a first step.

@leezu leezu requested a review from marcoabreu April 18, 2020 21:50
@leezu
Copy link
Contributor Author

leezu commented Apr 19, 2020

@mxnet-bot run ci [unix-gpu, windows-gpu]

@mxnet-bot
Copy link

Jenkins CI successfully triggered : [windows-gpu, unix-gpu]

@leezu leezu merged commit 586c8ab into apache:master Apr 20, 2020
@leezu leezu deleted the fixcd branch April 20, 2020 00:56
AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020
* CI: Unify centos7_base.sh and centos7_core.sh (delete centos7_base.sh)

* Simplify centos7.publish.test Dockerfiles used for testing binary artifacts

publish.test Dockerfiles should have as few system packages present as possible to catch potential dependency issues in release artifacts.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants