Normalize extras in lockfile #3958

Merged
merged 2 commits into from
Jun 3, 2024

Conversation

charliermarsh
Member

@charliermarsh charliermarsh commented Jun 1, 2024

Summary

Previously, when we locked something like flask[dotenv], we created two separate distributions in the lockfile: one for flask, which included the base dependencies, and one for flask[dotenv], which included the base dependencies and the dotenv dependencies. This was easy to implement, but it meant that we were duplicating all of the distribution files for every extra, and duplicating all of the base dependencies for every extra.

This PR normalizes the data such that we now have one entry per distribution (i.e., ExtraName was removed from DistributionId), with an optional dependencies table with an entry per extra, like:

[[distribution]]
name = "project"
version = "0.1.0"
source = "editable+file://[TEMP_DIR]/"
sdist = { url = "file://[TEMP_DIR]/" }

[[distribution.dependencies]]
name = "anyio"
version = "3.7.0"
source = "registry+https://pypi.org/simple"

[distribution.optional-dependencies]

[[distribution.optional-dependencies.test]]
name = "iniconfig"
version = "2.0.0"
source = "registry+https://pypi.org/simple"

This requires a bit more work upfront, because we now need to merge multiple packages from the PetGraph representation when creating the lockfile.

Closes #3916.
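
The merge step described in the summary can be sketched roughly as follows. This is a simplified illustration, not the PR's code: `GraphNode`, `LockedDist`, and `merge_nodes` are hypothetical names, dependencies are reduced to bare package names, and each extra node is assumed to carry only the dependencies that the extra adds.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one node in the resolution graph: a package,
/// optionally qualified by an extra, plus that node's dependencies.
#[derive(Debug, Clone)]
pub struct GraphNode {
    pub name: String,
    pub extra: Option<String>,
    pub dependencies: Vec<String>,
}

/// Hypothetical stand-in for one normalized lockfile entry.
#[derive(Debug, Default, PartialEq)]
pub struct LockedDist {
    pub dependencies: Vec<String>,
    pub optional_dependencies: BTreeMap<String, Vec<String>>,
}

/// Fold every graph node into a single entry per package name.
pub fn merge_nodes(nodes: &[GraphNode]) -> BTreeMap<String, LockedDist> {
    let mut locked: BTreeMap<String, LockedDist> = BTreeMap::new();
    for node in nodes {
        // Reuse the existing entry for this package, or create an empty one.
        let entry = locked.entry(node.name.clone()).or_default();
        match &node.extra {
            // Base node: its dependencies go into the shared list.
            None => entry.dependencies.extend(node.dependencies.iter().cloned()),
            // Extra node: its dependencies are filed under the extra's name.
            Some(extra) => entry
                .optional_dependencies
                .entry(extra.clone())
                .or_default()
                .extend(node.dependencies.iter().cloned()),
        }
    }
    locked
}
```

Feeding in a base `project` node that depends on `anyio` and a `project[test]` node that depends on `iniconfig` yields one `project` entry shaped like the TOML example above.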

@charliermarsh charliermarsh marked this pull request as ready for review June 1, 2024 18:16
    pub(crate) source: Source,
}

impl DistributionId {
-    fn from_annotated_dist(annotated_dist: &AnnotatedDist) -> DistributionId {
+    pub(crate) fn from_annotated_dist(annotated_dist: &AnnotatedDist) -> DistributionId {
Member Author

Not thrilled that I'm making so many things pub(crate) here. Should I change ResolutionGraph::to_lock into an impl TryFrom<ResolutionGraph> for Lock?

Member

I kind of bias toward concrete and bespoke (but conventional) conversion routines like ResolutionGraph::to_lock unless there's a specific need for generic fallible conversions.

I think this is more of a stylistic choice, but I'd say it's just a specific manifestation of "don't go generic unless there's a reason to." And with specific conversion routines, it's straightforward to add more parameters if they ever become necessary.

Member

Ooooooooo, I completely misunderstood your question. You were suggesting the TryFrom impl so that the conversion would be defined in this module, and that would in turn prevent exposing stuff.

Yeah I think I'd do that. Although, following from my previous comment, I'd probably just define Lock::from_resolution_graph or something. And that might in turn require exposing more stuff from the graph, but maybe that's okay.

(I don't think we've really settled on a great balance here. I wonder, for example, whether it really makes sense to have a ResolutionGraph at all. But this gets to the "installation might want different types" idea that's been floating around. It's a bigger refactor for sure.)
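
To make the two options in this thread concrete, here is a minimal sketch of a bespoke `Lock::from_resolution_graph` next to a `TryFrom` impl that delegates to it. The `ResolutionGraph`, `Lock`, and `LockError` types below are hypothetical, heavily simplified stand-ins written for illustration; they are not uv's actual definitions. Either form lives in the lockfile module, so the lock-side helpers would not need to be `pub(crate)`.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical stand-in for the resolver output: (name, version) pairs.
pub struct ResolutionGraph {
    pub packages: Vec<(String, String)>,
}

/// Hypothetical stand-in for the lockfile; its field stays private.
#[derive(Debug, PartialEq)]
pub struct Lock {
    distributions: Vec<String>,
}

/// Hypothetical error type for a failed conversion.
#[derive(Debug, PartialEq)]
pub struct LockError(String);

impl fmt::Display for LockError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

impl std::error::Error for LockError {}

impl Lock {
    /// Bespoke conversion: a named constructor that can grow extra
    /// parameters later without changing a trait signature.
    pub fn from_resolution_graph(graph: &ResolutionGraph) -> Result<Lock, LockError> {
        let mut distributions = Vec::with_capacity(graph.packages.len());
        for (name, version) in &graph.packages {
            // Illustrative failure mode only: reject an empty version.
            if version.is_empty() {
                return Err(LockError(format!("missing version for `{name}`")));
            }
            distributions.push(format!("{name}=={version}"));
        }
        Ok(Lock { distributions })
    }

    /// Read-only view of the locked entries.
    pub fn distributions(&self) -> &[String] {
        &self.distributions
    }
}

/// Generic alternative: same logic, exposed through the standard trait.
impl TryFrom<&ResolutionGraph> for Lock {
    type Error = LockError;

    fn try_from(graph: &ResolutionGraph) -> Result<Self, Self::Error> {
        Lock::from_resolution_graph(graph)
    }
}
```

Both entry points share one body here, so the choice is mostly about the call site: `Lock::try_from(&graph)` fits generic code, while the named constructor is easier to extend with additional arguments.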

Member Author

Yeah I'm also not totally convinced that ResolutionGraph will be necessary in the long run.

@charliermarsh
Member Author

@BurntSushi - Ignoring the code, curious if you prefer this representation?


codspeed-hq bot commented Jun 1, 2024

CodSpeed Performance Report

Merging #3958 will not alter performance

Comparing charlie/ex (02f3ead) with main (362b00c)

Summary

✅ 13 untouched benchmarks

@charliermarsh charliermarsh added the preview Experimental behavior label Jun 1, 2024
Member

@ibraheemdev ibraheemdev left a comment

The new representation generally makes sense to me.

Would it make more sense to put optional dependencies under distributions.extras."name".dependencies? Could we ever want to put other information under distributions.extras."name"?

  let mut locked_dist = lock::Distribution::from_annotated_dist(dist)?;
  for neighbor in self.petgraph.neighbors(node_index) {
      let dependency_dist = &self.petgraph[neighbor];
      locked_dist.add_dependency(dependency_dist);
  }
- locked_dists.push(locked_dist);
+ if let Some(locked_dist) = locked_dists.insert(locked_dist.id.clone(), locked_dist) {
Member

Why could the previous code do an unconditional push here?
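
The thread does not record an answer, and the body of the new `if let` branch is not visible in the excerpt. As background for the question, the sketch below shows the standard-library behavior the new line relies on: `HashMap::insert` returns `Some(previous)` when the key already exists. One plausible reading, offered as an assumption, is that a `Vec::push` cannot notice a repeated identifier, whereas a keyed insert can. `DistId` and `insert_unique` are hypothetical names invented for this example.

```rust
use std::collections::HashMap;

/// Hypothetical simplified identifier. Per the PR summary, the extra name
/// is no longer part of the real `DistributionId`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct DistId {
    pub name: String,
    pub version: String,
}

/// Insert an entry keyed by `id`, reporting an error on a repeated key.
/// Treating a repeat as an error is an assumption made for illustration.
pub fn insert_unique(
    dists: &mut HashMap<DistId, Vec<String>>,
    id: DistId,
    deps: Vec<String>,
) -> Result<(), String> {
    // `insert` overwrites the stored value and hands back the old one,
    // so `Some(..)` here means the key was already present.
    if let Some(_previous) = dists.insert(id.clone(), deps) {
        return Err(format!("duplicate entry for `{}=={}`", id.name, id.version));
    }
    Ok(())
}
```

Note that `insert` has already replaced the stored value by the time the duplicate is detected, so code that wants to merge rather than reject would typically reach for the `entry` API instead.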

Member

@BurntSushi BurntSushi left a comment

I think I'd echo @ibraheemdev's question. Otherwise, this generally LGTM. If it's possible, I think I would prefer a way where we're not slapping pub(crate) on everything, but I don't feel too strongly at this point while we're still trying to figure out what the data types should be.

@charliermarsh
Member Author

I think I will leave the representation as-is for now because it closely mirrors the pyproject.toml schema, where you have an optional-dependencies map that's keyed on extra name. That was intentional, because I'm hoping to make it possible to reify the distribution metadata from the lockfile in the future. It's a very good question though.
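
For comparison, the two layouts discussed in this review can be sketched as Rust types. This is an illustrative model only: the type names are hypothetical, and dependencies are reduced to bare package names. The flat map mirrors the `optional-dependencies` table kept in this PR; the nested form mirrors the `extras."name".dependencies` alternative, which would leave room for other per-extra fields.

```rust
use std::collections::BTreeMap;

/// Layout kept in the PR: extra name -> dependency list, mirroring the
/// `optional-dependencies` map in pyproject.toml.
pub type OptionalDependencies = BTreeMap<String, Vec<String>>;

/// Alternative raised in review: one table per extra, which could later
/// hold fields other than `dependencies`.
#[derive(Debug, Default, PartialEq)]
pub struct ExtraTable {
    pub dependencies: Vec<String>,
}

/// Nested layout: extra name -> per-extra table.
pub type Extras = BTreeMap<String, ExtraTable>;

/// Convert the flat layout into the nested one. No information is lost,
/// which suggests the choice is about schema shape rather than content.
pub fn to_nested(flat: &OptionalDependencies) -> Extras {
    flat.iter()
        .map(|(extra, deps)| {
            (
                extra.clone(),
                ExtraTable {
                    dependencies: deps.clone(),
                },
            )
        })
        .collect()
}
```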

@charliermarsh charliermarsh force-pushed the charlie/ex branch 3 times, most recently from ec7ee43 to eaa1c91 Compare June 3, 2024 18:53
@charliermarsh charliermarsh enabled auto-merge (squash) June 3, 2024 18:55
@charliermarsh charliermarsh merged commit 10cd6b9 into main Jun 3, 2024
46 checks passed
@charliermarsh charliermarsh deleted the charlie/ex branch June 3, 2024 19:00
Labels
preview Experimental behavior
Development

Successfully merging this pull request may close these issues.

Normalize extra representation in lockfile
3 participants