Skip to content

Apply migrations based on migrator filename#2369

Draft
h-vetinari wants to merge 2 commits into
conda-forge:mainfrom
h-vetinari:slash_timestamp
Draft

Apply migrations based on migrator filename#2369
h-vetinari wants to merge 2 commits into
conda-forge:mainfrom
h-vetinari:slash_timestamp

Conversation

@h-vetinari

@h-vetinari h-vetinari commented Aug 22, 2025

Copy link
Copy Markdown
Member

Currently the logic of migration application takes into account migrator_ts in two places. For one, the ordering of how different migrators are applied

migration_variants.sort(key=lambda fn_v: (fn_v[1]["migrator_ts"], fn_v[0]))

which has become relevant for how we're sorting the migrations for 3.13t (nogil) and 3.14, due to the way how additional_zip_keys works. The second place is to determine whether migrations are applied (between local ones and the ones from the pinning) or deleted:
elif ts in migrations_in_cfp:

This caused #2367, conda-forge/conda-forge-pinning-feedstock#7686 but more importantly #2368: to base the application of migrations on these timestamps is just very confusing.

And it's also completely unnecessary, because we have a completely consistent model that we already use most of the time: that is, migrations get applied if the migrator file in .ci_support with the filename foo.yaml matches a migrator of that filename in the global pinning. There are overrides based on use_local and migration_number, but that's the baseline. So to avoid the effect of these timestamps on migrator application, just make that choice based on the migrator filename in all cases.

Closes #2368

Very likely also addresses #2336

@h-vetinari

Copy link
Copy Markdown
Member Author

This might actually also solve #2336

Comment on lines +7 to +8
* Avoid using migration timestamps to determine whether migrations should be applied. The logic for ``use_local`` and ``migration_number`` remains unchanged, but migrations are now uniformly applied based on the name of the migrator file, and timestamps are only used to order the application of migrators (#2368).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use both the filename and the timestamp. It's hard to keep track of which filenames were used before, that's why the timestamp was used as a unique identifier.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain how you envision this? To me it still sounds like an inconsistent (or at least very hard-to-follow) mental model. What I'd like is a process that's as simple as:

From what I can tell, we'd never need to care about filenames which occurred in the past, only those that are directly present in the current pinning.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'd never need to care about filenames which occurred in the past

Let's say we had a migrator foo.yaml applied in the past and still present in the feedstock. This migrator was closed and deleted in conda-forge-pinning-feedstock.

Later, another migrator with the same name foo.yaml is applied. Then conda-smithy will consider that old foo.yaml still present in the feedstock is the same as the new foo.yaml and use the new migrator as that file is newer and has updates.

@h-vetinari h-vetinari Aug 22, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But isn't that how it should work? The foo.yaml in the global pinning takes precedence, unless overridden by use_local: true.

If the feedstock had been rerendered between the deletion of the old foo.yaml and the creation of the new foo.yaml (both from the POV of the global pinning), the old one would have been deleted in the feedstock anyway. So the fact that it gets replaced with the new one (if it exists) to me sounds like a feature, not a bug

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a feature, yes, but it also means that we have to keep track of what was used before to not accidentally use this feature. My example was about two completely different migrators with the same name foo.yaml and they should not override each other.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The REST API will find up to 4,000 repositories that match your filters and return results from those repositories.

The check would fail on a single hit in exactly the same way as for 10'000 hits. The upper limit is irrelevant for the check I have in mind

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am almost 100% sure the text implies github won't search all of our repositories. It says

To keep the REST API fast for everyone, we limit the number of repositories a query will search through.

They are not limiting the results, but instead the number of repos searched.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your interpretation is incorrect. The limitation is not about searching (which is based on a computed index), but about how many results can (reasonably) be fetched.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It literally says "repositories a query will search through." I don't know how else to parse that very explicit statement about what is limited since it literally says they limit the number of repositories they search.

@h-vetinari h-vetinari Aug 22, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I parse this is that the first part is colloquial, and the second part actually reveals the limitation ("finding", not "searching"; that distinction is irrelevant in common parlance, but not here). That argument is bolstered by the fact that their search API even provides an explicit value for whether the results are complete

Got {
  "total_count": 0,
  "incomplete_results": false,      # <----
  "items": [
  ]
}

In your interpretation, "incomplete_results" would always have to be true in orgs with more than 4000 repos. This is not what's observed in practice.

Finally, my interpretation is consistent with the observation that searching a pre-computed index does not provide a way to tell the query "please stop at X repos" (because the index itself has no notion of individual repos, just an amalgamation of their contents), it only provides a way to limit how many results are returned.

Comment thread conda_smithy/configure_feedstock.py Outdated
@h-vetinari h-vetinari marked this pull request as draft August 22, 2025 12:07
@h-vetinari h-vetinari changed the title Apply migrations based on migrator filename; remove slashes from variant names Apply migrations based on migrator filename Aug 22, 2025
@chrisburr

Copy link
Copy Markdown
Member

@h-vetinari looks like there is still more to fix for windows: conda-forge/pycurl-feedstock#35

But hard to tell from my phone, maybe the comma?

@h-vetinari

Copy link
Copy Markdown
Member Author

But hard to tell from my phone, maybe the comma?

Yeah, that's why I had added 390bdd9 (now merged through #2370)

@h-vetinari h-vetinari force-pushed the slash_timestamp branch 2 times, most recently from bbfa27d to ff9025f Compare August 22, 2025 12:46
@beckermr

Copy link
Copy Markdown
Member

I'll cut a release after this PR is done.

@h-vetinari

Copy link
Copy Markdown
Member Author

I'll cut a release after this PR is done.

I think this PR will have to wait till we have a chance to do something like conda-forge/conda-forge-pinning-feedstock#7691.

If you want I can still do a reduced version of this that switches to determining uniqueness by a tuple of (filename, ts), but getting the timestamp out (my original goal) doesn't seem feasible right now.

@beckermr

Copy link
Copy Markdown
Member

I think actually we can close this pr and merge my pr that ensures timestamps are unique.

@h-vetinari

Copy link
Copy Markdown
Member Author

I think actually we can close this pr and merge my pr that ensures timestamps are unique.

We can merge the timestamp PR. I'll keep this open though - it's still a worthwhile change, just that we can't apply it yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

smithy applies migration not present in feedstock if timestamps match with local migration that should have already been deleted

4 participants