Fix PyPI Packages and Python Docker Images nightly release #18222
Conversation
Hey @mseth10, thanks for submitting the PR.
CI supported jobs: [centos-gpu, unix-gpu, windows-gpu, centos-cpu, miscellaneous, website, edge, unix-cpu, clang, sanity, windows-cpu]
@mxnet-bot run ci [windows-gpu]
Jenkins CI successfully triggered: [windows-gpu]
Force-pushed from 263e510 to 2d6949a
cd/utils/artifact_repository.py (outdated)

```diff
@@ -396,7 +396,7 @@ def get_s3_key_prefix(args: argparse.Namespace, subdir: str = '') -> str:
     :param subdir: An optional subdirectory in which to store the files. Post-pended to the end of the prefix.
     :return: A string containing the S3 key prefix to be used to uploading and downloading files to the artifact repository
     """
-    prefix = "{git_sha}/{libtype}/{os}/{variant}/".format(**vars(args))
+    prefix = "{git_sha}/{libtype}/{variant}/".format(**vars(args))
```
I don't understand how this is a fix. The OS was put there for a reason apparently, so I'd first like to see a deep dive into the reasoning behind that parameter before removing it. I'd also like to understand how this "broke".
@marcoabreu that's a valid concern. Thanks for raising it. Here's what I found after a deep dive:
The binaries for both CPU and GPU builds are pushed using a CPU ubuntu18.04 instance, so they are present in the s3 bucket folder ubuntu18.04. The release pipeline uses GPU instances, currently running ubuntu16.04, for the cu* variants, and thus fails to pull those artifacts.
The efforts to upgrade the GPU AMIs to ubuntu18.04 are underway. Meanwhile, we need to get the nightly packages released due to significant user impact.
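To make the mismatch concrete, here is a minimal sketch (not the actual `artifact_repository.py` code; the function name, SHA, and variant values are illustrative) of why the download step finds nothing: the prefix embeds the OS of whichever AMI runs the step, and upload and download run on different AMIs.

```python
# Hypothetical sketch: the S3 key prefix depends on the OS of the running
# instance, so upload and download compute different prefixes.
def key_prefix(git_sha, libtype, os_name, variant):
    return "{}/{}/{}/{}/".format(git_sha, libtype, os_name, variant)

# Upload runs on a CPU ubuntu18.04 instance:
upload_prefix = key_prefix("abc123", "static", "ubuntu18.04", "cu102")
# Download runs on a GPU ubuntu16.04 instance:
download_prefix = key_prefix("abc123", "static", "ubuntu16.04", "cu102")

# The prefixes disagree, so the GPU stage looks in the wrong "folder"
# and reports "No artifacts found for this configuration."
assert upload_prefix != download_prefix
```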
Also, having OS in the s3 bucket prefix does not make sense. Please let me know if I am missing something.
@marcoabreu this is broken because there is no reason for different stages of the CD pipeline to run the exact same OS version in their AMIs. Perhaps the author intended to take the OS used for building the binary into account, but that's not what happens: currently the S3 path depends on the AMI running the step. Thus it breaks whenever GPU and CPU AMIs run different OS versions, as they do at this time. Deleting that unneeded complexity seems to be the best path forward.
Creating a binary on unix, windows, arm etc. will result in different binaries which have to be differentiated. It might be a valid point that the OS should not be the differentiator, but then it should be properly addressed (or shown) that the mentioned cases are properly considered.
When removing something, it's up to the PR creator to prove that it will not cause any harm and that the initial design decisions are not broken - or otherwise to explicitly state it and make an alternative proposal. "fix" is not a valid argument or proposal.
My concern here is that we'd be changing the directory structure in s3 - which we'd then have to fix later on.
I'd propose to either introduce the platform identifier immediately or replace the os parameter with a matching value (replace ubuntu18.04 with ubuntu16.04) as a hotfix. But I'd prefer that we do not change the directory structure in the underlying system, as that'd be something we have to clean up later.
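As a rough illustration of the "platform identifier" alternative suggested above (everything here is hypothetical: the mapping, variant names, and function are not from this PR), the prefix could be keyed on a stable identifier derived from the build target rather than on the OS of whatever AMI happens to run the step:

```python
# Hypothetical mapping from build variant to a stable platform identifier.
# The actual variants and identifiers would need to be defined by the CD design.
PLATFORMS = {
    "cu102": "linux-x86_64",
    "native": "linux-x86_64",
    "aarch64": "linux-aarch64",
}

def platform_key_prefix(git_sha, libtype, variant):
    # Falls back to linux-x86_64 for unknown variants; a real implementation
    # would probably raise instead of guessing.
    platform = PLATFORMS.get(variant, "linux-x86_64")
    return "{}/{}/{}/{}/".format(git_sha, libtype, platform, variant)
```

With this scheme, upload and download compute the same prefix regardless of which instance runs them, while binaries for different target platforms stay separated.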
We can hardcode some value for {os} instead of deleting it, sure. Though this is an internal ARTIFACT_REPOSITORY_BUCKET s3 bucket, which is private and arguably should use object expiration, as access to old objects is not needed (a later step of the CD publishes the proper build artifacts to a public bucket). Thus I don't think we have to make any guarantees about a consistent structure over time.
I have some more concerns about the structure of the CD, but I'll open a separate issue to discuss them.
I have some more concerns about the structure of the CD, but I'll open a separate issue to discuss them.
As per our discussion, I have reverted the commit that removes {os} from the s3 path. I have hard-coded os to ubuntu18.04 for both upload and download of binaries.
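The shape of that hotfix can be sketched as follows (a simplified illustration, not the actual code in `cd/utils/artifact_repository.py`; the function name and values are placeholders): pin the OS component so upload and download agree on the same S3 prefix.

```python
# Hedged sketch of the hotfix: instead of reading the OS from the running AMI,
# hard-code it so every stage of the pipeline computes the same prefix.
HARDCODED_OS = "ubuntu18.04"  # matches the CPU instances that upload binaries

def hardcoded_key_prefix(git_sha, libtype, variant, os_name=HARDCODED_OS):
    return "{}/{}/{}/{}/".format(git_sha, libtype, os_name, variant)

# Both the ubuntu18.04 upload stage and the ubuntu16.04 GPU download stage
# now resolve the same path:
print(hardcoded_key_prefix("abc123", "static", "cu102"))
# abc123/static/ubuntu18.04/cu102/
```

This keeps the existing s3 directory structure intact (addressing the review concern above) while decoupling the path from the AMI's actual OS.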
@marcoabreu can you please take a look.
@perdasilva may be able to clarify
@mseth10 thanks for posting the fix. @marcoabreu @leezu the CI check has passed. How shall we move forward?
Merged with the following commits (also cherry-picked in apache#18432):
* remove OS from s3 library path
* fix bash script run commands
* Revert "remove OS from s3 library path" (reverts commit 2665113)
* hardcode s3 path for upload/download of binaries
Co-authored-by: Ubuntu <[email protected]>
Co-authored-by: Manu Seth <[email protected]>
Description
Nightly PyPI packages are currently not getting published to https://dist.mxnet.io/python/all
There is an error in the pipeline's Build stage for GPU variants (link). The logs state the following error:
ERROR: No artifacts found for this configuration.
The binaries for all variants are present in s3, but the GPU builds couldn't find them because they search at the wrong path. This issue arises because the s3 bucket path contains the OS version, which differs between upload and download of GPU binaries. This PR fixes the issue by hard-coding the OS to be the same for both upload and download.
This also resolves a similar error in the Python docker images pipeline (link).
There is another error in both pipelines (link1, link2), due to a SyntaxError caused by an incorrect bash script trigger command. This PR also resolves these bugs.