Add `--exclude-flaky` flag to exclude flaky diagnostics from diagnostic summary tables by AlexWaygood · Pull Request #22 · astral-sh/ecosystem-analyzer

AlexWaygood · 2026-03-22T22:35:03Z

Summary

This PR adds an `--exclude-flaky` flag that excludes flaky diagnostics from the summary table posted as a comment on PRs. I don't much care if a PR added 40 `invalid-await` diagnostics if those diagnostics are on a known-flaky project and ecosystem-analyzer detected all 40 of them as flaky. It's annoying to have to click through to the HTML report to see that those 40 diagnostics were all flaky diagnostics on prefect.

We want to know if any diagnostics are newly flaky or newly not flaky -- but Doug's experiments showed that running a flaky project 10 times (what we do in CI for ty PRs) isn't generally enough to be able to establish that. And anyway, we only run flaky detection in PR CI for known-flaky projects.

I propose that we run ecosystem-analyzer in CI on PRs with this new `--exclude-flaky` flag. The weekly run of ecosystem-analyzer, where we run flaky-diagnostic detection across the whole ecosystem and run with `--flaky-runs=20`, can continue to be run without this flag, so that we still have a way of empirically establishing whether any flaky diagnostics were newly added or removed.

This is one of the ideas mentioned in #11

…table

Adds `--exclude-flaky/--no-exclude-flaky` to `generate-diff-statistics`. When enabled, flaky diagnostic diffs (added/removed/changed flaky locations) are excluded from the per-lint and per-project summary tables. Stable diagnostic changes from projects that also have flaky data are still counted. Closes #11 (partially; rerun skipping and short-circuiting to follow).

https://claude.ai/code/session_01M5AdQL4a1fFa4cngcDsYrM
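The filtering rule described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `DiagnosticDiff` record, the `summarize` and `build_parser` helpers, the `unresolved-attribute` lint used in the usage note, and the flag's default of `False` are all assumptions made for the example, and the real CLI may be built with a different argument-parsing library.

```python
from __future__ import annotations

import argparse
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticDiff:
    """Hypothetical record for one changed diagnostic location."""

    project: str
    lint: str
    change: str  # "added", "removed", or "changed"
    is_flaky: bool  # True if this specific location was detected as flaky


def summarize(
    diffs: list[DiagnosticDiff], exclude_flaky: bool
) -> tuple[Counter, Counter]:
    """Count diffs per lint and per project, optionally dropping flaky ones.

    Filtering is per diagnostic, not per project: a stable change in a
    project that also has flaky diagnostics is still counted.
    """
    per_lint: Counter = Counter()
    per_project: Counter = Counter()
    for diff in diffs:
        if exclude_flaky and diff.is_flaky:
            continue
        per_lint[diff.lint] += 1
        per_project[diff.project] += 1
    return per_lint, per_project


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing a paired on/off boolean flag."""
    parser = argparse.ArgumentParser(prog="generate-diff-statistics")
    parser.add_argument(
        "--exclude-flaky",
        action=argparse.BooleanOptionalAction,  # also adds --no-exclude-flaky
        default=False,  # assumed default; not stated in the PR
        help="Omit flaky diagnostic diffs from the summary tables.",
    )
    return parser
```

With two flaky `invalid-await` removals and one stable `unresolved-attribute` removal on prefect, `summarize(..., exclude_flaky=True)` would count only the stable removal, while `exclude_flaky=False` would count all three.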

The notice now also triggers when --exclude-flaky drops flaky diffs from the summary table (not only when the raw diff omits flaky projects), and the wording says "This PR summary" instead of "Raw diff output". https://claude.ai/code/session_01M5AdQL4a1fFa4cngcDsYrM
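One way to picture the new trigger condition for the notice is the sketch below. The function name, its parameters, and the exact notice text are assumptions: the text simply substitutes "This PR summary" for "Raw diff output" in the hint quoted later in this thread, and the actual implementation may differ.

```python
from typing import Optional

# Assumed wording, derived from the existing hint with the phrase swapped.
NOTICE = (
    "Changes in flaky projects detected. "
    "This PR summary excludes flaky projects; see the HTML report for details."
)


def flaky_notice(
    raw_diff_omits_flaky_projects: bool,
    exclude_flaky: bool,
    dropped_flaky_diffs: int,
) -> Optional[str]:
    """Return the notice if either source of omission applies, else None."""
    # New trigger: --exclude-flaky actually removed something from the table.
    summary_dropped = exclude_flaky and dropped_flaky_diffs > 0
    if raw_diff_omits_flaky_projects or summary_dropped:
        return NOTICE
    return None
```

Under these assumptions the notice stays silent when the flag is set but nothing was dropped, so it appears only when the comment really is hiding something.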

sharkdp · 2026-03-23T09:21:30Z

Did you see that we already exclude flaky diagnostics from the inline diff? Does this feature here do the same for the HTML report? Should the inline diff feature also be gated on this flag? Or should we just make this the default, if we agree that it's the desired behavior?

AlexWaygood · 2026-03-23T13:09:00Z

> Did you see that we already exclude flaky diagnostics from the inline diff?

Yes. But I find it odd that in astral-sh/ruff#24109 (comment), for example, the summary table still says that 40 invalid-await diagnostics are going away even though the raw diff section is empty. That's what this PR is trying to fix.

> Should the inline diff feature also be gated on this flag? Or should we just make this the default, if we agree that it's the desired behavior?

Making it the default for the PR comment probably makes sense, yeah. I suppose we probably want these flaky diagnostics included in the summary tables in the HTML report even with this option applied... so I guess it makes sense to remove the flag, and make sure the summary table in the HTML report is given different handling to the summary table in the PR comment?

sharkdp · 2026-03-23T14:36:13Z

> Yes. But I find it odd that in astral-sh/ruff#24109 (comment), for example, the summary table still says that 40 invalid-await diagnostics are going away even though the raw diff section is empty. That's what this PR is trying to fix.

That's why I added this hint 😄 "Changes in flaky projects detected. Raw diff output excludes flaky projects; see the HTML report for details."

> so I guess it makes sense to remove the flag, and make sure the summary table in the HTML report is given different handling to the summary table in the PR comment?

Yeah, not sure. I also opened a PR this morning that removes a bunch of diagnostics from prefect, and those were actual changes, even though the project is flaky. Since we only have flakiness information on the level of projects (?), I guess it's still valuable to include diagnostics from flaky projects somewhere in the PR comment summary.

But I don't care too much either way.

AlexWaygood · 2026-03-23T14:40:32Z

> That's why I added this hint 😄 "Changes in flaky projects detected. Raw diff output excludes flaky projects; see the HTML report for details."

Right, but that only creates more questions for me! If you know they're all flaky (and therefore, you know I won't care about them), why are you showing them to me at all 😆

> Yeah, not sure. I also opened a PR this morning that removes a bunch of diagnostics from prefect, and those were actual changes, even though the project is flaky. Since we only have flakiness information on the level of projects (?), I guess it's still valuable to include diagnostics from flaky projects somewhere in the PR comment summary.

What I wanted to implement was that non-flaky diagnostics from flaky projects would still be featured in the PR comment summary, and only flaky diagnostics from flaky projects would be excluded from it. I believe that's what the `test_exclude_flaky_omits_only_flaky_diffs_from_statistics` test on this PR checks?

claude added 2 commits March 22, 2026 22:17

AlexWaygood requested a review from sharkdp March 22, 2026 22:35

sharkdp approved these changes Mar 23, 2026

AlexWaygood wants to merge 2 commits into main from claude/exclude-flaky-diagnostics-OMa3r

3 participants
