This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix PyPI Packages and Python Docker Images nightly release #18222

Merged
merged 4 commits into apache:master from mseth10:fix_nightly_cd
May 5, 2020

Conversation

mseth10
Contributor

@mseth10 mseth10 commented May 2, 2020

Description

Nightly PyPI packages are currently not getting published to https://dist.mxnet.io/python/all

There is an error in the pipeline's Build stage for the GPU variants (link). The logs report the following error: "ERROR: No artifacts found for this configuration."

The binaries for all variants are present in S3, but the GPU builds cannot find them because they search at the wrong path. This happens because the S3 bucket path contains an OS version, which differs between the upload and download of the GPU binaries. This PR fixes the issue by hard-coding the OS to the same value for both upload and download.
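The mismatch can be sketched in a few lines of Python (the values and parameter names here are illustrative, not the actual CD code; the real logic lives in the get_s3_key_prefix helper discussed in this PR):

```python
# Illustrative sketch (hypothetical values): the S3 key prefix embeds the OS
# of the instance running the current stage, so stages running on different
# AMIs compute different prefixes for the same artifact.
def s3_key_prefix(git_sha, libtype, os_name, variant):
    # Mirrors the layout {git_sha}/{libtype}/{os}/{variant}/
    return "{}/{}/{}/{}/".format(git_sha, libtype, os_name, variant)

# The CPU build host (ubuntu18.04) uploads the cu* binaries here ...
upload_prefix = s3_key_prefix("abc123", "dynamic", "ubuntu18.04", "cu102")
# ... but the GPU release host (ubuntu16.04) looks for them here:
download_prefix = s3_key_prefix("abc123", "dynamic", "ubuntu16.04", "cu102")

print(upload_prefix)    # abc123/dynamic/ubuntu18.04/cu102/
print(download_prefix)  # abc123/dynamic/ubuntu16.04/cu102/
```

The two prefixes disagree, which is why the GPU download stage reports "No artifacts found for this configuration."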

This also resolves a similar error in the Python Docker images pipeline (link).

There is another error in both pipelines (link1, link2): a SyntaxError caused by an incorrect bash script trigger command. This PR resolves those bugs as well.

@mseth10 mseth10 requested a review from szha as a code owner May 2, 2020 13:38
@mxnet-bot

Hey @mseth10, thanks for submitting the PR.
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [centos-gpu, unix-gpu, windows-gpu, centos-cpu, miscellaneous, website, edge, unix-cpu, clang, sanity, windows-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@leezu
Contributor

leezu commented May 2, 2020

@mxnet-bot run ci [windows-gpu]

@mxnet-bot

Jenkins CI successfully triggered : [windows-gpu]

@mseth10 mseth10 force-pushed the fix_nightly_cd branch 2 times, most recently from 263e510 to 2d6949a Compare May 2, 2020 23:07
@mseth10 mseth10 changed the title Fix GPU PyPI package nightly release pipeline Fix PyPI packages and Python Docker Images nightly release May 2, 2020
@mseth10 mseth10 changed the title Fix PyPI packages and Python Docker Images nightly release Fix PyPI Packages and Python Docker Images nightly release May 2, 2020
@@ -396,7 +396,7 @@ def get_s3_key_prefix(args: argparse.Namespace, subdir: str = '') -> str:
:param subdir: An optional subdirectory in which to store the files. Post-pended to the end of the prefix.
:return: A string containing the S3 key prefix to be used to uploading and downloading files to the artifact repository
"""
-    prefix = "{git_sha}/{libtype}/{os}/{variant}/".format(**vars(args))
+    prefix = "{git_sha}/{libtype}/{variant}/".format(**vars(args))
Contributor

@marcoabreu marcoabreu May 2, 2020

I don't understand how this is a fix. The OS was put there for a reason apparently, so I'd first like to see a deep dive to understand the reasoning behind that parameter before removing it. Also, I'd like to understand how this "broke".

Contributor Author

@mseth10 mseth10 May 2, 2020

@marcoabreu that's a valid concern, thanks for raising it. Here's what I found after a deep dive:
The binaries for both CPU and GPU builds are pushed from a CPU ubuntu18.04 instance, so they land in the S3 bucket under the ubuntu18.04 folder. The release pipeline uses GPU instances, currently running ubuntu16.04, for the cu* variants, and thus fails to pull those artifacts.
The effort to upgrade the GPU AMIs to ubuntu18.04 is underway. Meanwhile, we need to get the nightly packages released due to significant user impact.

Contributor Author

Also, having the OS in the S3 bucket prefix does not make sense. Please let me know if I am missing something.

Contributor

@leezu leezu May 3, 2020

@marcoabreu this is broken because there is no guarantee that different stages of the CD pipeline run the exact same OS version in their AMI. Perhaps the author intended to take the OS used for building the binary into account, but that's not what happens: currently the S3 path depends on the AMI of whichever instance runs the stage. Thus it's broken, as the GPU and CPU AMIs run different OS versions at this time. Deleting that unneeded complexity seems to be the best path forward.

Contributor

@marcoabreu marcoabreu May 3, 2020

Creating a binary on Unix, Windows, ARM, etc. will result in different binaries, which have to be differentiated. It may be a valid point that the OS should not be the differentiator, but then it should be properly shown that these cases are still accounted for.

When removing something, it's up to the PR creator to prove that it will not cause any harm and that the initial design decisions are not broken, or otherwise to state that explicitly and make an alternative proposal. "Fix" on its own is not a valid argument or proposal.

Contributor

My concern here is that we'd be changing the directory structure in S3, which we would then have to fix later on.

I'd propose either introducing the platform identifier immediately, or replacing the OS parameter with a matching value (replace ubuntu1804 with ubuntu1604) as a hotfix. But I'd prefer that we not change the directory structure in the underlying system, as that's something we would have to clean up later.

Contributor

We can hardcode some value for {os} instead of deleting it, sure. Though this is an internal ARTIFACT_REPOSITORY_BUCKET S3 bucket, which is private and arguably should use object expiration, since access to old objects is not needed (a later step of the CD publishes the proper build artifacts to a public bucket). Thus I don't think we have to make any guarantees about a consistent structure over time.

I have some more concerns about the structure of the CD, but I'll open a separate issue to discuss them.

Contributor Author

As per our discussion, I have reverted the commit that removes {os} from the S3 path, and instead hard-coded the OS to ubuntu18.04 for both upload and download of binaries.
@marcoabreu can you please take a look?
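A minimal sketch of the hotfix described above, with the OS value pinned as stated in the comment (parameter names are illustrative; the actual change is in get_s3_key_prefix):

```python
# Illustrative sketch of the hotfix: pin the OS segment so that the upload
# and download stages compute the same S3 prefix regardless of which AMI
# happens to run them.
HARDCODED_OS = "ubuntu18.04"  # value taken from the discussion above

def s3_key_prefix(git_sha, libtype, variant):
    # The {os} segment stays in the path, but no longer depends on the host AMI.
    return "{}/{}/{}/{}/".format(git_sha, libtype, HARDCODED_OS, variant)

print(s3_key_prefix("abc123", "dynamic", "cu102"))  # abc123/dynamic/ubuntu18.04/cu102/
```

This keeps the existing directory structure intact (addressing the concern above) while making the prefix deterministic; the hard-coded value can be revisited once the GPU AMIs are upgraded.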

Member

@perdasilva may be able to clarify

@leezu leezu mentioned this pull request May 4, 2020
@szha
Member

szha commented May 5, 2020

@mseth10 thanks for posting the fix. @marcoabreu @leezu the CI check has passed. How shall we move forward?

@leezu leezu merged commit c329a52 into apache:master May 5, 2020
apeforest pushed a commit to apeforest/incubator-mxnet that referenced this pull request May 29, 2020
Fix PyPI Packages and Python Docker Images nightly release (#18222)

* remove OS from s3 library path

* fix bash script run commands

* Revert "remove OS from s3 library path"

This reverts commit 2665113.

* hardcode s3 path for upload/download of binaries

Co-authored-by: Ubuntu <[email protected]>
leezu pushed a commit that referenced this pull request May 29, 2020
AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this pull request Jul 6, 2020
ChaiBapchya pushed a commit to ChaiBapchya/mxnet that referenced this pull request Aug 15, 2020