Skip to content

workflows/labels: manage stale & merge conflict labels#419481

Merged
wolfgangwalther merged 8 commits intoNixOS:masterfrom
wolfgangwalther:ci-labels-with-search
Jun 24, 2025
Merged

workflows/labels: manage stale & merge conflict labels#419481
wolfgangwalther merged 8 commits intoNixOS:masterfrom
wolfgangwalther:ci-labels-with-search

Conversation

@wolfgangwalther
Copy link
Contributor

@wolfgangwalther wolfgangwalther commented Jun 24, 2025

To manage 2.status: stale and 2.status: merge conflict labels, we need to restructure the code a bit: The "list all PRs and stop iterating after updated_at cutoff" approach won't work anymore, because we also need to look at much older PRs.

I thought about being smart and trying to filter out pull requests which certainly won't change, for example when they already have a merge conflict label or when they are already stale. But this quickly becomes quite complicated, considering all edge-cases. Thus, I opted for a simpler approach: We now scan all PRs twice a day. We still operate on the last updated PRs every 10 minutes, that doesn't change. But we additionally operate on a batch of 100 "other" PRs. This batch is rotated through across the 144 workflow runs each day. 7k PRs / 100 per page = 70 pages ~ 2x / day.

To label a merge conflict, the PR needs to be touched twice: The first time around the API GET request will trigger the creation of a new test merge commit. The second time around, this will have completed, the merge_commit_sha state be updated and the label applied. This means that stale labels are updated within 12 hours and merge conflict labels within 24 hours. Which is more than OK.

Of course, this will use a lot more API requests. My estimation is in the range of 3-4k / hour now. But this is not a problem at all, because with #419188 we have about 11k requests / hour to spare without any impact on other jobs. In fact, the "scan all PRs" approach has one more benefit: We don't need to manually backfill anything, the labeling will just fix those labels within 24 hours. Thus, I removed the "update within" input from workflow_dispatch - this trigger is now only useful from forks for testing.

With #419188 and this, the labeler should become much more hands-off and just do its thing. (Not that it was much hands-on so far anyway...).

To make those changes, I had to make a few refactors, which caused huge indentation changes. I tried to split them into separate commits, so that reviewing the PR with "hide whitespace" should work reasonably well. Don't even try to look at this PR with whitespace changes visible and/or at the full diff of all commits...

Refactors:

  • Using the /search/issues endpoint doesn't allow to find "a single PR", so the "pull request event" case is now handled differently.
  • Older PRs might not have the right workflow runs or artifacts, but still need to label. Handling this case gracefully now, which also works a bit towards Workflow "Label PR" fails for old PRs ("Artifact has expired") #416719.

Open items:

  • Some PRs might be hit twice in the same run, when they are both on the current page and recently updated.
  • Potentially for later: Issues should also be labeled as stale, so extend the logic to handle issues, too?

Note: The settings for stale labeling are taken from .github/stale.yml. It seems like the stale bot has not been working for some time anymore, at least https://github.com/probot/stale is archived and the app seems to be disabled.

Assuming this works well, this should close #100477.

Things done


Add a 👍 reaction to pull requests you find important.

@nix-owners nix-owners bot requested review from Mic92, philiptaron and zowoq June 24, 2025 07:50
@nixpkgs-ci nixpkgs-ci bot added 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux. 6.topic: continuous integration Affects continuous integration (CI) in Nixpkgs, including Ofborg and GitHub Actions 6.topic: policy discussion Discuss policies to work in and around Nixpkgs backport release-24.11 labels Jun 24, 2025
@MattSturgeon
Copy link
Contributor

Some PRs might be hit twice in the same run, when they are both on the current page and recently updated.

One approach would be to remember which PR numbers this run has looked at in a Set, Array, or Object; i.e. when either the "recent" or "backlog" code path starts work on a PR, it does its work within a java-style visitedPullRequests.computeIfAbsent(), or simply after an includes() check.

If checking for presence in the set and adding to the set aren't done in one atomic transaction, there's still potential for a race condition... But I guess that's pretty unlikely, and not too important; the worst that'd happen is a PR is checked twice.

@wolfgangwalther
Copy link
Contributor Author

wolfgangwalther commented Jun 24, 2025

One approach would be to remember which PR numbers this run has looked at in a Set, Array, or Object

This would work, but it could also be done without race condition: With the new approach, we don't need to iterate over the response in steps and eventually call done() - the search will return only those PRs that we want to look at anyway. Thus, I am fetching the whole search response at once before. At this stage, we could also filter out duplicates, before we even start looking at them.

I also considered an alternative: The list-of-updated PRs contains this search query updated:>=${cutoff.toISOString()}. It should be enough to add the reverse of that condition to the allOptions query, so that the big list doesn't even contain the duplicates to start with. But this would cause the big list to change more frequently from run-to-run and thus make it more likely to miss some PRs when going from page-to-page in different runs. This problem still exists when older PRs are merged or closed, the whole list will shift a bit and thus some PRs will be lost in the current iteration. But merging/closing happens on a much smaller scale than "every update", so I wouldn't want to do it the filter way.


Edit: I think something like this should do it:

               const items = [].concat(updatedItems, allItems)
+                .filter((thisItem, idx, arr) => idx == arr.findIndex(firstItem => firstItem.number == thisItem.number))

@wolfgangwalther
Copy link
Contributor Author

Edit: I think something like this should do it:

               const items = [].concat(updatedItems, allItems)
+                .filter((thisItem, idx, arr) => idx == arr.findIndex(firstItem => firstItem.number == thisItem.number))

Tested, this works well. Added it locally, but will only push after first round of review to save CI resources.

@MattSturgeon
Copy link
Contributor

But this would cause the big list to change more frequently from run-to-run and thus make it more likely to miss some PRs when going from page-to-page in different runs. This problem still exists when older PRs are merged or closed, the whole list will shift a bit and thus some PRs will be lost in the current iteration.

I was going to hold off asking until I'd looked at the diff... But I was curious how you were handling pagination here. Especially given we're dealing with a moving target and between each run it is likely that more PRs were "updated", therefore shifting their position in search results and changing whether they match the search query.

@wolfgangwalther
Copy link
Contributor Author

I was going to hold off asking until I'd looked at the diff... But I was curious how you were handling pagination here. Especially given we're dealing with a moving target and between each run it is likely that more PRs were "updated", therefore shifting their position in search results and changing whether they match the search query.

Yeah, to counter the effect as much as possible the big list is sorted by created in ascending order. That should remove all update-related changes, but as mentioned earlier, when is:open changes, that will change the list.

@wolfgangwalther
Copy link
Contributor Author

wolfgangwalther commented Jun 24, 2025

An alternative, to counter that problem a bit, would be to create some kind of overlap. For example instead of creating batches of 100 and running a single one, we could make batches of 25 and then run 5 batches instead of 4 per run, with one of them overlapping. This should then catch those PRs that would have moved out of the current batch, because some PRs created earlier were merged in the meantime.

Edit: Or maybe not even in those smaller batches, but still 100. And then run the previous page, too, overlapping. This would have the effect that merge conflicts would be labeled in 12 hours instead of 24, because any PR will be looked at after 10 minutes again - when the merge commit has been computed. Might not be the worst thing to do.

@MattSturgeon
Copy link
Contributor

Sounds reasonable. I wonder how much overlap buffer we need for this to be reliable?

@wolfgangwalther
Copy link
Contributor Author

Sounds reasonable. I wonder how much overlap buffer we need for this to be reliable?

If we do the simple approach of just running two pages with 100 items, so overlap=100, then we should have much more than needed, I think.

In general, the effect of a merged PR is bigger the older the PR is, because it shifts more pages. But it's also much less likely for a PR that is really old to be merged. I'd say the effect is negligible for PRs in the stale-range. It only becomes relevant for more recent PRs. But those don't need to be marked stale - and they are also less likely to be merge-conflicted.

I'd say we try it without overlap at first, see how many API requests this burns, see whether anybody notices any strange effect for more recent PRs - and only if this is actually relevant, introduce some overlap.

Copy link
Contributor

@MattSturgeon MattSturgeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 28458b3f8dfb58db9cebfa36f50601ae77c4a3eb's commit messages you mention we need to iterate over all PRs for the stale and conflict labels:

the much bigger one in being able to handle merge-conflict and stale labels properly later. For those, it's inevitable to eventually scan through all PRs.

This may be true for conflicts, because I'm not sure you can easily search for them. Even if you could, your comment here would apply:

The attribute merge_commit_sha keeps the old value of null or the hash until the new test merge commit has either successfuly been created or failed so.

For stale issues/PRs, this could also be achieved with the inverse search to our "recently updated" search; i.e. a "not updated since" search. It could even be a search for "not updated since XYZ and also doesn't have stable label" to only get the relevant results.

This probably isn't necessary, because by their nature it isn't important to mark stale items as stale immediately, but just pointing out that it is technically possible.

Even so, would handling adding the stable label separately simplify the main labeling step? I guess the main step would still need to remove the stale label, so maybe not...


I'd say we try it without overlap at first, see how many API requests this burns, see whether anybody notices any strange effect for more recent PRs - and only if this is actually relevant, introduce some overlap.

SGTM. Maybe add brief comments to the relevant parts of the code?

@wolfgangwalther
Copy link
Contributor Author

For stale issues/PRs, this could also be achieved with the inverse search to our "recently updated" search; i.e. a "not updated since" search. It could even be a search for "not updated since XYZ and also doesn't have stable label" to only get the relevant results.

This probably isn't necessary, because by their nature it isn't important to mark stale items as stale immediately, but just pointing out that it is technically possible.

Even so, would handling adding the stable label separately simplify the main labeling step? I guess the main step would still need to remove the stale label, so maybe not...

The problem is, that this simple search doesn't quite work: updated will also be changed when we add the stale label. That's why it's important to look at the timeline of events for the actually stale-relevant timestamp. We can't search for that. And this can very well be older than the updated, so we must look at all PRs, at least those not labeled stale already.

So essentially, we could only filter out those PRs that are already merge conflicted and stale. "only" is relative of course, because that's still a huge number. The problem is, we probably still need to run at least one go across all PRs, because the labels might be incorrectly set right now. Also, if we introduce new labels, we want them to appear on all PRs if possible, even those marked that way already.

Also consider the case of somebody adding the stale label manually - we'd suddenly skip looking at the PR, even though this PR might not actually be stale according to our rules at all.

So yeah, I think we really need to look at all PRs, even for stale.

Separate commit for better diff.
…ntext

We previously ran another list request in this case, but don't need to
anymore - we already have the `pull_request` context available.
This doesn't provide much value in itself, yet, but is much more
flexible in the next step, when also looking at much older PRs.
This replaces the manual dispatch trigger with a batched run through all
pull requests every day. This has the small benefit of not having to
worry about backfilling labeling after fixing bugs - and the much bigger
one in being able to handle merge-conflict and stale labels properly
later. For those, it's inevitable to eventually scan through all PRs.

At this stage, the vast majority of PRs will still be skipped, because
there won't be an eval run with artifact available. This will be
improved in the next step.

Technically, the workflow_dispatch trigger is kept to allow easily
testing this in forks, where the scheduled jobs are disabled. The
triggered job will behave similar to the scheduled job, though, and have
no special inputs.
We keep working through the PR, even though we don't have any eval
results. This will allow actually managing labels for much older PRs as
well. Most importantly, it will allow merge-conflict and stale-labeling
next.
This manages the `2. status: stale` label for pull requests only (not
issues, yet) with the following conditions:
- The last event on the timeline of the Pull Request counts.
- Labeling and unlabeling of any kind are ignored.
- Older than 180 days are stale.
- Security labeled PRs are never stale.

To handle this label correctly, it's important to go through all pull
requests. Any approach to limit the list of PRs via search are not going
to work:
- Filtering by `updated` is not going to work, because it includes the
last time that *a label was set* on the PR. To actually find out whether
a PR is stale or not, the timeline of events needs to be looked at.
- Filtering by an existing stale label is not going to work either,
because such a label might have been added manually and thus breaking
the rules we set up here. Thus any existing label needs to be confirmed
as well.
The code comments describe much better what we do then a commit message
could ever do.
Copy link
Contributor Author

@wolfgangwalther wolfgangwalther left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thansk for the great, thorough review (as always!). I addressed all your comments.

@wolfgangwalther wolfgangwalther force-pushed the ci-labels-with-search branch from 8d857cb to 36e9fe9 Compare June 24, 2025 12:48
@wolfgangwalther
Copy link
Contributor Author

wolfgangwalther commented Jun 24, 2025

I also gave it another test-run in my fork, still works, didn't break anything.

Potentially for later: Issues should also be labeled as stale, so extend the logic to handle issues, too?

Certainly for later, because it will increase the number of items to look at massively. Especially initially this might be relevant, because:

  • On the first few iterations, we are probably going to change labels on hundreds of PRs.
  • Those will appear in the "updated" category again on the next run.

So until we are through the whole thing twice (because of merge-conflict labels, too), this will run a lot more PRs each run. This should still fit in easily with the resource limits, but if we take issues initially as well... then it might get a lot closer to the limit.

Edit: Also.. when we go through all PRs, I wonder how many more edge-cases we're going to hit that we didn't account for, yet :D

@MattSturgeon
Copy link
Contributor

  • Those will appear in the "updated" category again on the next run.

I hadn't considered that. So for the first several runs the "recently updated" results will mostly consist of PRs updated by the prior run, instead of PRs updated by contributors.

This shouldn't be a problem, per se... We don't limit how many "recent" PRs we will process, so it shouldn't leave any PRs un-labelled. It just means the number of pages each run has to iterate through will be higher than usual.

Copy link
Contributor

@MattSturgeon MattSturgeon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diff LGTM.

🤞 we haven't missed any important edge-cases.
As always, let's be ready to revert if everything suddenly 💥

@wolfgangwalther wolfgangwalther merged commit 802980b into NixOS:master Jun 24, 2025
44 of 46 checks passed
@wolfgangwalther wolfgangwalther deleted the ci-labels-with-search branch June 24, 2025 13:39
@nixpkgs-ci
Copy link
Contributor

nixpkgs-ci bot commented Jun 24, 2025

Successfully created backport PR for release-24.11:

@nixpkgs-ci
Copy link
Contributor

nixpkgs-ci bot commented Jun 24, 2025

Successfully created backport PR for release-25.05:

@github-actions github-actions bot added the 8.has: port to stable This PR already has a backport to the stable release. label Jun 24, 2025
@wolfgangwalther
Copy link
Contributor Author

Stale labeling doesn't work as expected, yet in #238561. This should not be stale!

@wolfgangwalther
Copy link
Contributor Author

Merge conflict was properly set in #415587.

Notice: Processed 112 PRs, made 872 API requests and downloaded 103 artifacts.

This will be more in the range of 5k+ requests / hour - but still fine, I guess.

@wolfgangwalther
Copy link
Contributor Author

Uh, there's a bigger problem here:

Error: Unhandled error: HttpError: Only the first 1000 search results are available

I don't see a quick fix (if at all) for that, so reverting now...

@MattSturgeon
Copy link
Contributor

Stale labeling doesn't work as expected, yet in #238561. This should not be stale!

Would be interesting to know what latest_event_at was bound to for that PR...

Inspecting locally

gh api /repos/NixOS/nixpkgs/issues/238561/timeline

Produces: events.json

  • The latest event is your cross-reference
  • Then labeled created_at 2025-06-24T13:44:50Z by nixpkgs-ci[bot] (from the run)
  • Then another labeled created_at 2025-06-24T13:26:45Z by nixpkgs-ci[bot]
  • Then a reviewed; this doesn't have created_at, but instead has "submitted_at": "2025-06-24T13:10:35Z"

@wolfgangwalther
Copy link
Contributor Author

Inspecting locally

gh api /repos/NixOS/nixpkgs/issues/238561/timeline

Aha!

I was looking at /repos/NixOS/nixpkgs/issues/238561/events and didn't see anything that didn't have created_at. That explains it, then :)

@wolfgangwalther
Copy link
Contributor Author

Merge conflict was properly set in #415587.

Apparently the merge conflict label didn't work when opening a PR, though, as can be seen in #419574.

💥

@mdaniels5757

This comment was marked as outdated.

@mdaniels5757
Copy link
Member

Wait but #419600 still has no rebuild labels?

@wolfgangwalther
Copy link
Contributor Author

Wait but #419600 still has no rebuild labels?

That's indeed odd. Will look into it later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

6.topic: continuous integration Affects continuous integration (CI) in Nixpkgs, including Ofborg and GitHub Actions 6.topic: policy discussion Discuss policies to work in and around Nixpkgs 8.has: port to stable This PR already has a backport to the stable release. 10.rebuild-darwin: 0 This PR does not cause any packages to rebuild on Darwin. 10.rebuild-linux: 0 This PR does not cause any packages to rebuild on Linux.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Stale bot doesn't always remove the label after a comment

3 participants