
@humanzz (Contributor) commented Jul 10, 2023

Issue #, if available: #167

Description of changes:

  • update modules.install_requirements to check for the presence of CodeArtifact (CA_*-prefixed) environment variables
  • if such a variable is present, build the authenticated endpoint index URL and add it to the pip install command
  • update the sagemaker test dependency to support setting environment variables, and update tox to resolve conflicts with the newer sagemaker
  • add a commented-out integration test for installing requirements from CodeArtifact
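The first two bullets describe a small control flow: pip runs as usual unless a CodeArtifact variable is set, in which case an authenticated index URL is passed with -i. A minimal sketch, assuming CA_REPOSITORY_ARN (the variable used in the integration test in this thread) is the trigger; build_pip_install_cmd and the index_url_for callback are hypothetical names, not the toolkit's actual API.

```python
import os
import sys
from typing import Callable, List, Mapping, Optional

# Variable name taken from the integration test in this PR; other CA_* variables are not modeled.
CA_REPOSITORY_ARN_ENV = "CA_REPOSITORY_ARN"


def build_pip_install_cmd(
    requirements: str = "requirements.txt",
    env: Optional[Mapping[str, str]] = None,
    index_url_for: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Hypothetical helper: build a pip command, adding -i <index> when a CodeArtifact ARN is set."""
    env = os.environ if env is None else env
    cmd = [sys.executable, "-m", "pip", "install", "-r", requirements]
    repository_arn = env.get(CA_REPOSITORY_ARN_ENV)
    if repository_arn and index_url_for is not None:
        # Only when the variable is present is an authenticated index URL resolved and passed to pip.
        cmd.extend(["-i", index_url_for(repository_arn)])
    return cmd


# Without the variable: a plain "pip install -r requirements.txt".
print(build_pip_install_cmd(env={}))
# With the variable and a stand-in resolver (a real one would call CodeArtifact).
print(build_pip_install_cmd(
    env={CA_REPOSITORY_ARN_ENV: "arn:..."},
    index_url_for=lambda arn: "https://aws:TOKEN@example.com/simple/",
))
```

Passing the resolver in as a callable keeps the sketch testable without AWS credentials.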

Testing done:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I used the commit message format described in CONTRIBUTING
  • I have used the regional endpoint when creating S3 and/or STS clients (if appropriate)
  • I have updated any necessary documentation, including READMEs

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have checked that my tests are not configured for a specific region or account (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@humanzz (Contributor Author) commented Jul 10, 2023

This is related to the similar PR for sagemaker-inference-toolkit at aws/sagemaker-inference-toolkit#130

@humanzz (Contributor Author) commented Jul 10, 2023

I've been using the CodeArtifact-related code for a few months already. I was not able to add related integration tests, because such tests would require access to a fixed CodeArtifact repository and AWS credentials inside the container.

I'd appreciate guidance on whether such integration tests are needed here.

Additionally, does the README need to mention this CodeArtifact support?

@humanzz (Contributor Author) commented Jul 11, 2023

As an attempt to integration-test this (the code is not in a committable state), I made the following changes (git diff):

diff --git a/setup.py b/setup.py
index 9b30321..17aea4a 100644
--- a/setup.py
+++ b/setup.py
@@ -80,11 +80,11 @@ setuptools.setup(
     install_requires=required_packages,
     extras_require={
         "test": [
-            "tox==3.13.1",
+            "tox==4.6.4",
             "pytest==4.4.1",
             "pytest-cov",
             "mock",
-            "sagemaker[local]<2",
+            "sagemaker[local]===2.172.0",
             "black==22.3.0 ; python_version >= '3.7'",
         ]
     },
diff --git a/test/integration/local/test_dummy.py b/test/integration/local/test_dummy.py
index f4db94e..8bc21bd 100644
--- a/test/integration/local/test_dummy.py
+++ b/test/integration/local/test_dummy.py
@@ -38,15 +38,24 @@ def container():
 def test_install_requirements(capsys):
     estimator = Estimator(
         image_uri="sagemaker-training-toolkit-test:dummy",
-        role="SageMakerRole",
+        # role="SageMakerRole",
+        role="...",
         instance_count=1,
         instance_type="local",
+        environment={
+            "CA_REPOSITORY_ARN": "..."
+        }
     )

     estimator.fit()

     stdout = capsys.readouterr().out

+    print(stdout)
+
     assert "Installing collected packages: pyfiglet" in stdout
     assert "Successfully installed pyfiglet-0.8.post1" in stdout
     assert "Reporting training SUCCESS" in stdout

These changes:

  1. Update the container setup to include the CA_* environment variables. I had to update sagemaker to the latest version, because the previously pinned version did not support environment= for Estimator. I also had to update tox, because the older version conflicted with the newer sagemaker.
  2. Update test_install_requirements to print the container's stdout so I can see the pip install commands.

With that, the stdout looked like this:

iqe4fvsnb7-algo-1-0z7sx  | Looking in indexes: https://xxx:****@xxx-xxx.d.codeartifact.us-west-2.amazonaws.com/pypi/xxx/simple/
iqe4fvsnb7-algo-1-0z7sx  | Collecting pyfiglet==0.8.post1 (from -r requirements.txt (line 1))
iqe4fvsnb7-algo-1-0z7sx  |   Downloading https://xxx-xxx.d.codeartifact.us-west-2.amazonaws.com/pypi/xxx/simple/pyfiglet/0.8.post1/pyfiglet-0.8.post1-py2.py3-none-any.whl (865 kB)
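In the log above, the token is masked as **** in the "Looking in indexes" line, so the credential does not leak into training logs. A minimal sketch of that kind of redaction using only urllib.parse; redact_password is a hypothetical helper for illustration, not pip's own implementation.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_password(url: str) -> str:
    """Replace the password part of a URL's userinfo with '****' (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.password is None:
        return url  # no credentials to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = "{}:{}".format(host, parts.port)
    netloc = "{}:****@{}".format(parts.username, host)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(redact_password("https://aws:SECRET@example.com/pypi/repo/simple/"))
# → https://aws:****@example.com/pypi/repo/simple/
```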

@humanzz (Contributor Author) commented Jul 15, 2023

Based on #187 (comment), I've updated the PR to include a commented-out integration test, along with the required updates to sagemaker and tox.

token = auth_token_response["authorizationToken"]
endpoint_response = client.get_repository_endpoint(domain=domain, domainOwner=owner, repository=repository, format="pypi")
unauthenticated_index = endpoint_response["repositoryEndpoint"]
return re.sub("https://", "https://aws:{}@".format(token), re.sub("{}/?$".format(repository), "{}/simple/".format(repository), unauthenticated_index))
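The excerpt above fetches an authorization token, looks up the repository's PyPI endpoint, and rewrites it into an authenticated /simple/ index URL. The sketch below reproduces that flow against any client object exposing the boto3 CodeArtifact methods get_authorization_token and get_repository_endpoint. The function name codeartifact_index_url and the FakeCodeArtifactClient stand-in (with its assumed endpoint shape) are hypothetical, so the example runs without AWS credentials; it also swaps the excerpt's regex rewriting for plain string operations.

```python
def codeartifact_index_url(client, domain: str, owner: str, repository: str) -> str:
    """Build an authenticated pip index URL for a CodeArtifact repository (illustrative sketch)."""
    auth_token_response = client.get_authorization_token(domain=domain, domainOwner=owner)
    token = auth_token_response["authorizationToken"]
    endpoint_response = client.get_repository_endpoint(
        domain=domain, domainOwner=owner, repository=repository, format="pypi"
    )
    index = endpoint_response["repositoryEndpoint"]
    # Point pip at the "simple" index under the repository path.
    if index.rstrip("/").endswith(repository):
        index = index.rstrip("/") + "/simple/"
    # "aws" is sent as the username and the token as the password.
    return index.replace("https://", "https://aws:{}@".format(token), 1)


class FakeCodeArtifactClient:
    """Hypothetical stand-in returning canned responses shaped like boto3's CodeArtifact client."""

    def get_authorization_token(self, domain, domainOwner):
        return {"authorizationToken": "TOKEN"}

    def get_repository_endpoint(self, domain, domainOwner, repository, format):
        return {
            "repositoryEndpoint": "https://{}-{}.d.codeartifact.us-west-2.amazonaws.com/pypi/{}/".format(
                domain, domainOwner, repository
            )
        }


print(codeartifact_index_url(FakeCodeArtifactClient(), "my-domain", "111122223333", "my-repo"))
# → https://aws:TOKEN@my-domain-111122223333.d.codeartifact.us-west-2.amazonaws.com/pypi/my-repo/simple/
```

With real credentials, the client would instead come from boto3.client("codeartifact", region_name=...).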


Please ensure the https://aws prefix works in all regions. Some snow-forted regions may have a different prefix.

LGTM otherwise.

Contributor

+1. Have you tested this in the China regions? Are we sure it won't be something like https://aws.cn?

Contributor Author

As per https://aws.amazon.com/codeartifact/faq/, CodeArtifact is not available in every region:

In which AWS Regions is CodeArtifact available?

CodeArtifact is available in the following 13 [AWS Regions](https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/):

US East (N. Virginia)
US East (Ohio)
US West (Oregon)
EU (Ireland)
EU (London)
EU (Frankfurt)
EU (Stockholm)
EU (Milan)
EU (Paris)
Asia Pacific (Sydney)
Asia Pacific (Tokyo)
Asia Pacific (Mumbai)
Asia Pacific (Singapore).

Additionally, the aws: part is not about the partition; it's the username, e.g. https://username:password@example.com/somewhere

client.get_repository_endpoint returns the https://example.com/somewhere part; the code that follows adds the username and password and appends the /simple/ path to the URL.
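To make the username point concrete: in a URL of the form https://user:password@host/path, everything before the @ is parsed as userinfo, so the aws: prefix does not alter the host returned by get_repository_endpoint. A quick check with the standard library's urllib.parse, using placeholder host and token values:

```python
from urllib.parse import urlsplit

# Placeholder URL in the shape built by the code under review (host and token are made up).
url = "https://aws:TOKEN@example.com/somewhere/simple/"
parts = urlsplit(url)

print(parts.username)  # → aws
print(parts.password)  # → TOKEN
print(parts.hostname)  # → example.com
print(parts.path)      # → /somewhere/simple/
```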

@humanzz humanzz force-pushed the codeartifact branch 2 times, most recently from 9a75eee to 621ac15, July 21, 2023 10:06
@humanzz (Contributor Author) commented Jul 24, 2023

The inference-side change has been merged at

I'd appreciate it if you could take a look here.

assert "Reporting training SUCCESS" in stdout


# def test_install_requirements_from_codeartifact(capsys):
Contributor

Will you enable this test later?

Contributor Author

For this test to be enabled, the build infrastructure for this package would need access to a CodeArtifact repository, which I believe it doesn't have.
I've written the integration test so that it can be enabled if the build infrastructure gains such access.

So, at the moment, the answer is no: I don't think I can enable this on my own.

https://docs.aws.amazon.com/service-authorization/latest/reference/list_awscodeartifact.html#awscodeartifact-resources-for-iam-policies
:return: authenticated codeartifact index url
"""
repository_arn = os.getenv(CA_REPOSITORY_ARN_ENV)
Contributor

Please add a sample repository_arn in the comment.

Contributor Author

The second documentation link already describes the ARN format:

arn:${Partition}:codeartifact:${Region}:${Account}:repository/${DomainName}/${RepositoryName}

Also, keep in mind that this is a private method, but I'm happy to add the format above for clarity if you think the link is not enough.

Contributor

Yes, please add it. Thanks.

emeraldbay previously approved these changes Aug 2, 2023

@emeraldbay (Contributor) left a comment

LGTM

@humanzz humanzz linked an issue Aug 8, 2023 that may be closed by this pull request
@emeraldbay (Contributor) left a comment

LGTM

@humanzz humanzz merged commit 0ad97fd into aws:master Aug 8, 2023
emeraldbay pushed a commit to emeraldbay/sagemaker-training-toolkit that referenced this pull request Oct 12, 2023
emeraldbay pushed a commit to emeraldbay/sagemaker-training-toolkit that referenced this pull request Oct 13, 2023
emeraldbay pushed a commit to emeraldbay/sagemaker-training-toolkit that referenced this pull request Oct 14, 2023


Development

Successfully merging this pull request may close these issues.

Support CodeArtifact repositories for installing Python packages

4 participants