Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite uv-auth #2976

Merged
merged 13 commits into from
Apr 16, 2024
Merged

Rewrite uv-auth #2976

merged 13 commits into from
Apr 16, 2024

Conversation

@zanieb zanieb changed the title zb/refactor auth store Refactor uv-auth Apr 10, 2024
@zanieb
Copy link
Member Author

zanieb commented Apr 10, 2024

Lookee this demonstrates the fix to passwords with percent-encoded characters

diff --git a/crates/uv-auth/src/credentials.rs b/crates/uv-auth/src/credentials.rs
index 9c222cff..dd1c8af8 100644
--- a/crates/uv-auth/src/credentials.rs
+++ b/crates/uv-auth/src/credentials.rs
@@ -37,11 +37,7 @@ impl Credentials {
             username: urlencoding::decode(url.username())
                 .expect("An encoded username should always decode")
                 .into_owned(),
-            password: url.password().map(|password| {
-                urlencoding::decode(password)
-                    .expect("An encoded password should always decode")
-                    .into_owned()
-            }),
+            password: url.password().map(|password| password.to_string()),
         })
     }
 }
@@ -191,7 +187,7 @@ mod test {
             .clone();
         header.set_sensitive(false);
 
-        assert_debug_snapshot!(header, @r###""Basic dXNlcjpwYXNzd29yZD09""###);
-        assert_snapshot!(decode_basic_auth(header), @"Basic: user:password==");
+        assert_debug_snapshot!(header, @r###""Basic dXNlcjpwYXNzd29yZCUzRCUzRA==""###);
+        assert_snapshot!(decode_basic_auth(header), @"Basic: user:password%3D%3D");
     }
 }

crates/uv-auth/src/store.rs Outdated Show resolved Hide resolved
crates/uv-auth/src/store.rs Outdated Show resolved Hide resolved
@zanieb
Copy link
Member Author

zanieb commented Apr 10, 2024

Intentionally leaving this unlabeled because we need a changelog entry for the user-facing fix

@zanieb
Copy link
Member Author

zanieb commented Apr 10, 2024

❯ packse index build scenarios/example.json --no-hash
❯ npx http-server --username user --password foo== ./index
❯ cargo run -q -- pip install --extra-index-url http://user:[email protected]:8080/simple-html example-a --reinstall --no-cache
Resolved 2 packages in 60ms
Downloaded 2 packages in 4ms
Installed 2 packages in 3ms
 - example-a==1.0.0
 + example-a==1.0.0
 - example-b==2.0.0
 + example-b==2.0.0
❯ gcm
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
❯ cargo run -q -- pip install --extra-index-url http://user:[email protected]:8080/simple-html example-a --reinstall --no-cache
error: Failed to download: example-a==1.0.0
  Caused by: HTTP status client error (401 Unauthorized) for url (http://127.0.0.1:8080/files/example_a-1.0.0-py3-none-any.whl#sha256=05f52f5cb7b5a677b4810004ec487f6420fbee6a368038cf8cf8384de5be1939)

@charliermarsh
Copy link
Member

Sweet, I will review tonight.

Copy link
Member

@charliermarsh charliermarsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall ideas make sense to me, I defer to you on which feedback to address specifically prior to merging.

crates/uv-auth/src/keyring.rs Outdated Show resolved Hide resolved
crates/uv-auth/src/keyring.rs Outdated Show resolved Hide resolved
crates/uv-auth/src/middleware.rs Outdated Show resolved Hide resolved
/// See pip's implementation
/// <https://github.com/pypa/pip/blob/ae5fff36b0aad6e5e0037884927eaa29163c0611/src/pip/_internal/network/auth.py#L102>
fn fetch_subprocess(&self, url: &Url) -> Result<Option<String>, Error> {
let output = match Command::new("keyring")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is being refactored now anyway -- would it be possible to make this adjustable?

Suggested change
let output = match Command::new("keyring")
let keyring_command = std::env::var("UV_KEYRING_COMMAND").unwrap_or("keyring");
let output = match Command::new(keyring_command)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather consider this separately, sorry. This pull request is already huge.

} else {
debug!("No credentials found for already-seen URL: {url}");
trace!("Skipping keyring lookup for {url} with no username");
Copy link
Contributor

@BakerNet BakerNet Apr 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically a breaking change - since before, it would attempt keyring with a placeholder username for keyring backends which don't need a username (notably Google Artifact Registry which my company uses).

In the initial/current implementation, it would use the same placeholder username that Google Artifact Registry uses internally (oauth2accesstoken see here). Annoyingly, the keyring CLI command requires a username is provided, even though some backends don't even check the username at all (in linked Google code, you can see username is never read).

Here, you can see pip attempting keyring even when username is None when trying import strategy - which is first attempt by default for pip.

This change would be an annoyance for our use case, as we'd have to add usernames to all of our index urls in all of our projects. We could deal with it, of course, but would be nice not to

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for giving the pull request a look!

I'm pretty uncomfortable using a hard-coded default username for all queries. It doesn't make sense to me to default to a specific username. I'd be willing to consider adding a workaround like passing null to keyring but not using it in the credentials or perhaps providing the Google placeholder if we detect a Google Artifact Registry (via this logic).

pip reading the keyring with a None username there is valid (and seems like a separate use case) because the get_credentials API is designed to return both a username and password. I don't know why this API is not available via the CLI. cc @jaraco ?

Copy link
Member Author

@zanieb zanieb Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a change where we send an arbitrary username but don't populate the URL with it... #3048 — this seems more correct than main but does not match pip's subprocess keyring provider afaict

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty uncomfortable using a hard-coded default username for all queries.

Yeah, totally fair

perhaps providing the Google placeholder if we detect a Google Artifact Registry (via this logic).

Selfishly, I'd like this but also understand if not

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does #3048 work for you though? Or do they require that username to be in the request?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you will need to update your URLs then, unfortunately. We can suggest exposing this API in the keyring CLI though so we can support retrieval of credentials without an explicit username.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we can peek at the URL but I'm hesitant to add logic for specific providers.

Copy link
Contributor

@BakerNet BakerNet Apr 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaraco Made a quick and dirty PoC PR for exposing get_credential via cli: jaraco/keyring#678

The use case that demanded get_credential (jaraco/keyring#350) was API-only and it wasn't obvious if there should be a CLI interface to it, especially because get_credential was returning something requiring more structure than the other calls, something for which there was no obvious right answer.

Agreed with this - it would be nice to have it exposed via optional argument to keyring get but then the output is not consistent when providing username vs not. Which is why in my PoC PR it's a different operation

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zanieb would be good to mention in release notes that anyone using keyring subprocess with Google Artifact Registry should update index-urls with oauth2accesstoken@

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I'll note this as a breaking change.

@jenshnielsen
Copy link

@zanieb Did a quick test and can confirm that this works against Azure artifacts

@alelavml3
Copy link

Hi, I'm writing this comment because with this new release we are facing an authentication problems. We use a private Sonatype Nexus repository server which uses a private PyPi Repo to store our company private Python packages.
The command for compiling and install are the following:

export PYPI_INDEX_URL=https://$(NEXUS_USER):$(NEXUS_PWD)@$(NEXUS_SERVER_URL)
uv pip compile pyproject.toml -o requirements.txt --index-url $(PYPI_INDEX_URL)
uv pip sync requirements.txt --index-url $(PYPI_INDEX_URL)

The error we get is during uv pip sync:

error: Failed to download distributions
  Caused by: Failed to fetch wheel: google-cloud-pubsub==2.21.0
  Caused by: HTTP status client error (401 Unauthorized) for url (<NEXUS_SERVER_URL>/packages/google-cloud-pubsub/2.21.0/google_cloud_pubsub-2.21.0-py2.py3-none-any.whl#sha256=fabd19e08faa1f70081b0e5ea003a3c031a1d2c1b798cf1be5ded2dbf1d1dbef)

here the package is google-cloud-pubsub but it happens with any random package.

How can we fix this issue with this new version? For now, we are using v0.1.32 and everything works.

Thank you in advance for your answers!

@zanieb @jenshnielsen

@jenshnielsen
Copy link

@alelavml3 I think you should probably create a new issue to track this. That being said I have observed when fetching from Azure Artifacts that I get a 401 on the first attempt to install a package but retrying to install the same package works as expected. Perhaps your issue is the same? I have not had a chance to create an issue for this just yet

@zanieb
Copy link
Member Author

zanieb commented Apr 18, 2024

@alelavml3 Thanks for the report, a new issue would be greatly appreciated with RUST_LOG=trace uv pip sync -v ... logs. Be sure to redact your password.

zanieb added a commit that referenced this pull request Apr 19, 2024
This reverts commit c0efeed.

# Conflicts:
#	crates/uv-configuration/src/authentication.rs
#	crates/uv/src/cli.rs
#	crates/uv/src/main.rs
zanieb added a commit that referenced this pull request Apr 19, 2024
This reverts commit c0efeed.

# Conflicts:
#	crates/uv-configuration/src/authentication.rs
#	crates/uv/src/cli.rs
#	crates/uv/src/main.rs
zanieb added a commit that referenced this pull request Apr 19, 2024
This reverts commit c0efeed.

# Conflicts:
#	crates/uv-configuration/src/authentication.rs
#	crates/uv/src/cli.rs
#	crates/uv/src/main.rs
zanieb added a commit that referenced this pull request Apr 19, 2024
This reverts commit c0efeed.

# Conflicts:
#	crates/uv-configuration/src/authentication.rs
#	crates/uv/src/cli.rs
#	crates/uv/src/main.rs

# Conflicts:
#	crates/uv-auth/src/lib.rs
#	crates/uv/src/commands/pip_compile.rs
#	crates/uv/src/commands/pip_install.rs
#	crates/uv/src/commands/pip_sync.rs
#	crates/uv/src/commands/venv.rs
zanieb added a commit that referenced this pull request Apr 22, 2024
In #2976 I made some changes that led to regressions:

- We stopped tracking URLs that we had not seen credentials for in the
cache
- This means the cache no longer returns a value to indicate we've seen
a realm before
- We stopped seeding the cache with URLs 
- Combined with the above, this means we no longer had a list of
locations that we would never attempt to fetch credentials for
- We added caching of credentials found on requests
- Previously the cache was only populated from the seed or credentials
found in the netrc or keyring
- This meant that the cache was populated for locations that we
previously did not cache, i.e. GitHub artifacts(?)

Unfortunately this unveiled problems with the granularity of our cache.
We cache credentials per realm (roughly the hostname) but some realms
have mixed authentication modes i.e. different credentials per URL or
URLs that do not require credentials. Applying credentials to a URL that
does not require it can lead to a failed request, as seen in #3123 where
GitHub throws a 401 when receiving credentials.

To resolve this, the cache is expanded to supporting caching at two
levels:

- URL, cached URL must be a prefix of the request URL
- Realm, exact match required

When we don't have URL-level credentials cached, we attempt the request
without authentication first. On failure, we'll search for realm-level
credentials or fetch credentials from external services. This avoids
providing credentials to new URLs unless we know we need them.

Closes #3123
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking A breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants