Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uv-resolver: make source=git more structured #4631

Closed
wants to merge 1 commit into from

Conversation

BurntSushi
Copy link
Member

This splits out the git fields into an inline table instead of smushing
them into a single URL.

I'm unsure about this change, because it seems like we are using the git
URLs in other user facing ways. For example, some of the snapshot diffs
look like this:

     Installed 2 packages in [TIME]
      - project==0.1.0 (from file://[TEMP_DIR]/)
      + project==0.1.0 (from file://[TEMP_DIR]/)
-     + uv-public-pypackage==0.1.0 (from git+https://github.com/astral-test/uv-public-pypackage@0dacfd662c64cb4ceb16e6cf65a157a8b715b979?rev=0.0.1#0dacfd662c64cb4ceb16e6cf65a157a8b715b979)
+     + uv-public-pypackage==0.1.0 (from git+https://github.com/astral-test/uv-public-pypackage@0dacfd662c64cb4ceb16e6cf65a157a8b715b979)

which seems non-ideal. It should be possible to make the lock file use
structured fields while keeping the URL for terser output as above, but
I'm not sure that it's worth it.

This builds on top of #4627

@BurntSushi BurntSushi requested review from charliermarsh and konstin and removed request for charliermarsh June 28, 2024 17:19
Base automatically changed from ag/rejigger-source to main June 28, 2024 19:02
Copy link
Member

@konstin konstin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new snapshot looks more correct to me, losing the duplication it had

@charliermarsh charliermarsh self-assigned this Jul 1, 2024
@charliermarsh
Copy link
Member

I can own this one.

#[derive(Clone, Debug, Eq, Hash, PartialEq, PartialOrd, Ord, serde::Deserialize)]
enum GitSourceKind {
Tag(String),
Branch(String),
Rev(String),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was Rev removed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because, at least in this change, "rev" is no longer a distinct kind. It is strictly duplicative with the fact that the lock file always records a revision regardless of what the underlying kind is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But how does it record the revision that was requested by the user?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. I believe that is exactly the thing I am unsure about here. As in, I don't think I grok why we need to keep track of what the user requested specifically in the lock file, since it is presumably in pyproject.toml. But yeah, if we do need to keep track of what the user requested in the lock file, specifically for git (but notably not for other things), then we will indeed need to capture some extra state here somehow. Ideally without duplication the revision hash.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if the user has rev = "refs/foo" in their requirements, and we lock with that request, and then they re-lock with the same requirements... We want to know that the locked requirement matches rev = "refs/foo" even if the commit that maps to that ref has changed (unless the user passed --upgrade).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so my mistake here is assuming that "revision" and "precise sha1" were equivalent. They aren't, because a revision might be a named reference to something. I'll add this back.

@BurntSushi BurntSushi force-pushed the ag/rejigger-git-source branch 3 times, most recently from 32f3e96 to b4156f3 Compare July 9, 2024 13:32
@BurntSushi
Copy link
Member Author

BurntSushi commented Jul 9, 2024

@charliermarsh OK, I've updated this PR to include the revision in the lock file so that doesn't get dropped.

I think the main issue remaining here is that some of the snapshot diffs for uv pip compile output look wrong due to the URLs changing. I think the root cause is that, since the lock file no longer uses a single URL string to represent all of the git information, things external to the lock data structure can't reuse it. So we might need a step that flattens the new structured representation to the URL format for uv pip compile?

We could also just keep the URL representation in the lock file and not use structured data. I wouldn't mind that personally.

@@ -603,7 +603,7 @@ fn branching_urls_of_different_sources_disjoint() -> Result<()> {
[[distribution]]
name = "iniconfig"
version = "2.0.0"
source = { git = "https://github.com/pytest-dev/iniconfig?rev=93f5930e668c0d1ddf4597e38dd0dea4e2665e7a#93f5930e668c0d1ddf4597e38dd0dea4e2665e7a" }
source = { git = "https://github.com/pytest-dev/iniconfig", precise = "93f5930e668c0d1ddf4597e38dd0dea4e2665e7a", revision = "93f5930e668c0d1ddf4597e38dd0dea4e2665e7a" }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we omit the revision if they are the same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I am in favor of this, but I think it then becomes ambiguous. Specifically, I'm thinking about this part in the code:

Git {
git,
commit,
subdirectory,
tag,
branch,
revision,
} => {
let precise: GitSha =
commit
.parse()
.map_err(|err| LockErrorKind::InvalidGitSourceRevision {
given: commit.to_string(),
err,
})?;
let kind = match (tag, branch, revision) {
(None, None, None) => GitSourceKind::DefaultBranch,
(Some(tag), None, None) => GitSourceKind::Tag(tag),
(None, Some(branch), None) => GitSourceKind::Branch(branch),
(None, None, Some(revision)) => GitSourceKind::Rev(revision),
_ => {
let kind = LockErrorKind::InvalidGitSourceTooMany;
return Err(kind.into());
}
};

Namely, this is where we deserialize our structured git data back into a Source. Currently, when branch, tag and revision are all missing, we assume "default branch" semantics. Which means that if we drop revision = "..." when it's identical to commit = "...", we won't be able to distinguish between "user specified a specific revision" and "user asked for the default branch."

This splits out the git fields into an inline table instead of smushing
them into a single URL.

I'm unsure about this change, because it seems like we are using the git
URLs in other user facing ways. For example, some of the snapshot diffs
look like this:

         Installed 2 packages in [TIME]
          - project==0.1.0 (from file://[TEMP_DIR]/)
          + project==0.1.0 (from file://[TEMP_DIR]/)
    -     + uv-public-pypackage==0.1.0 (from git+https://github.com/astral-test/uv-public-pypackage@0dacfd662c64cb4ceb16e6cf65a157a8b715b979?rev=0.0.1#0dacfd662c64cb4ceb16e6cf65a157a8b715b979)
    +     + uv-public-pypackage==0.1.0 (from git+https://github.com/astral-test/uv-public-pypackage@0dacfd662c64cb4ceb16e6cf65a157a8b715b979)

which seems non-ideal. It should be possible to make the lock file use
structured fields while keeping the URL for terser output as above, but
I'm not sure that it's worth it.

This builds on top of #4627
@@ -186,7 +186,7 @@ fn add_git() -> Result<()> {
Installed 2 packages in [TIME]
- project==0.1.0 (from file://[TEMP_DIR]/)
+ project==0.1.0 (from file://[TEMP_DIR]/)
+ uv-public-pypackage==0.1.0 (from git+https://github.com/astral-test/uv-public-pypackage@0dacfd662c64cb4ceb16e6cf65a157a8b715b979?tag=0.0.1#0dacfd662c64cb4ceb16e6cf65a157a8b715b979)
+ uv-public-pypackage==0.1.0 (from git+https://github.com/astral-test/uv-public-pypackage@0dacfd662c64cb4ceb16e6cf65a157a8b715b979)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charliermarsh Other than preference, I think this was the main issue preventing me from merging this PR. I'm not sure that this is correct.

(I do kinda prefer the URL approach since it's consistent with what we do in other places.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's run with URL. It seems fine to me. Feel free to close.

@BurntSushi BurntSushi closed this Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants