
Add --exclude-flaky flag to exclude flaky diagnostics from diagnostic summary tables #22

Open
AlexWaygood wants to merge 2 commits into main from claude/exclude-flaky-diagnostics-OMa3r

Conversation

@AlexWaygood
Member

@AlexWaygood AlexWaygood commented Mar 22, 2026

Summary

This PR adds an --exclude-flaky flag that excludes flaky diagnostics from the summary table posted as comments on PRs. I don't much care if a PR added 40 invalid-await diagnostics if the diagnostics are on a known-flaky project and all 40 invalid-await diagnostics were in fact detected as being flaky by ecosystem-analyzer; it's annoying to have to click through to the HTML report to be able to see that those 40 diagnostics were in fact all flaky diagnostics on prefect.

We want to know if any diagnostics are newly flaky or newly not flaky -- but Doug's experiments showed that running a flaky project 10 times (what we do in CI for ty PRs) isn't generally enough to be able to establish that. And anyway, we only run flaky detection in PR CI for known-flaky projects.

I propose that we run ecosystem-analyzer in CI on PRs with this new --exclude-flaky flag. The weekly run of ecosystem-analyzer, where we run flaky-diagnostic detection across the whole ecosystem and run with --flaky-runs=20, can continue to be run without this flag, so that we can still have a way of empirically establishing whether any flaky diagnostics were newly added or removed.

This is one of the ideas mentioned in #11

claude added 2 commits March 22, 2026 22:17
…table

Adds `--exclude-flaky/--no-exclude-flaky` to `generate-diff-statistics`.
When enabled, flaky diagnostic diffs (added/removed/changed flaky
locations) are excluded from the per-lint and per-project summary tables.
Stable diagnostic changes from projects that also have flaky data are
still counted.

Closes #11 (partially — rerun skipping and short-circuiting to follow).

https://claude.ai/code/session_01M5AdQL4a1fFa4cngcDsYrM
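
For readers unfamiliar with paired on/off flags, the following is a minimal sketch of how an `--exclude-flaky/--no-exclude-flaky` option could be declared using only the Python standard library. It is an illustration, not the code in this PR: the CLI framework that ecosystem-analyzer actually uses is not shown in this thread, and the standalone parser, the help text, and the `False` default are all assumptions.

```python
import argparse

# Hypothetical standalone parser; in the real tool, generate-diff-statistics
# is a subcommand and may be wired up differently.
parser = argparse.ArgumentParser(prog="generate-diff-statistics")
parser.add_argument(
    "--exclude-flaky",
    # BooleanOptionalAction (Python 3.9+) also registers --no-exclude-flaky.
    action=argparse.BooleanOptionalAction,
    default=False,  # assumed default, not confirmed by the PR
    help="Omit flaky diagnostic diffs from the summary tables.",
)


def parse_exclude_flaky(argv):
    """Return the boolean value of the flag for the given argument list."""
    return parser.parse_args(argv).exclude_flaky
```

Passing neither form leaves the assumed default in place, which is what would let the weekly run keep its current behaviour without any change to its invocation.
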
The notice now also triggers when --exclude-flaky drops flaky diffs from
the summary table (not only when the raw diff omits flaky projects), and
the wording says "This PR summary" instead of "Raw diff output".

https://claude.ai/code/session_01M5AdQL4a1fFa4cngcDsYrM
@AlexWaygood AlexWaygood requested a review from sharkdp March 22, 2026 22:35
@sharkdp
Collaborator

sharkdp commented Mar 23, 2026

Did you see that we already exclude flaky diagnostics from the inline diff? Does this feature here do the same for the HTML report? Should the inline diff feature also be gated on this flag? Or should we just make this the default, if we agree that it's the desired behavior?

@AlexWaygood
Member Author

> Did you see that we already exclude flaky diagnostics from the inline diff?

Yes. But I find it odd that in astral-sh/ruff#24109 (comment), for example, the summary table still says that 40 invalid-await diagnostics are going away even though the raw diff section is empty. That's what this PR is trying to fix.

> Should the inline diff feature also be gated on this flag? Or should we just make this the default, if we agree that it's the desired behavior?

Making it the default for the PR comment probably makes sense, yeah. I suppose we probably want these flaky diagnostics included in the summary tables in the HTML report even with this option applied... so I guess it makes sense to remove the flag, and make sure the summary table in the HTML report is given different handling to the summary table in the PR comment?

@sharkdp
Collaborator

sharkdp commented Mar 23, 2026

> Yes. But I find it odd that in astral-sh/ruff#24109 (comment), for example, the summary table still says that 40 invalid-await diagnostics are going away even though the raw diff section is empty. That's what this PR is trying to fix.

That's why I added this hint 😄 "Changes in flaky projects detected. Raw diff output excludes flaky projects; see the HTML report for details."

> so I guess it makes sense to remove the flag, and make sure the summary table in the HTML report is given different handling to the summary table in the PR comment?

Yeah, not sure. I also opened a PR this morning that removes a bunch of diagnostics from prefect, and those were actual changes, even though the project is flaky. Since we only have flakiness information on the level of projects (?), I guess it's still valuable to include diagnostics from flaky projects somewhere in the PR comment summary.

But I don't care too much either way.

@AlexWaygood
Member Author

> That's why I added this hint 😄 "Changes in flaky projects detected. Raw diff output excludes flaky projects; see the HTML report for details."

Right, but that only creates more questions for me! If you know they're all flaky (and therefore, you know I won't care about them), why are you showing them to me at all 😆

> Yeah, not sure. I also opened a PR this morning that removes a bunch of diagnostics from prefect, and those were actual changes, even though the project is flaky. Since we only have flakiness information on the level of projects (?), I guess it's still valuable to include diagnostics from flaky projects somewhere in the PR comment summary.

What I wanted to implement was that non-flaky diagnostics from flaky projects would still be featured in the PR comment summary, and only flaky diagnostics from flaky projects would be excluded from it. I believe that's what the test_exclude_flaky_omits_only_flaky_diffs_from_statistics test on this PR checks?
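
The intended rule can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation in this PR: `DiagnosticDiff`, its `is_flaky` field, and `summarize` are hypothetical names, the lint names in the sample data are only examples, and the real ecosystem-analyzer data model is not shown in this thread.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosticDiff:
    """Hypothetical record for one added, removed, or changed diagnostic."""

    project: str
    lint: str
    is_flaky: bool  # assumed per-diagnostic flakiness marker


def summarize(diffs, exclude_flaky):
    """Count diffs per lint and per project, optionally dropping flaky ones.

    Filtering happens per diagnostic, not per project, so stable changes
    in a flaky project are still counted.
    """
    kept = [d for d in diffs if not (exclude_flaky and d.is_flaky)]
    per_lint = Counter(d.lint for d in kept)
    per_project = Counter(d.project for d in kept)
    return per_lint, per_project


# Illustrative data: two flaky diffs and one stable diff in the same project.
sample_diffs = [
    DiagnosticDiff("prefect", "invalid-await", is_flaky=True),
    DiagnosticDiff("prefect", "invalid-await", is_flaky=True),
    DiagnosticDiff("prefect", "unresolved-attribute", is_flaky=False),
]
filtered_lints, filtered_projects = summarize(sample_diffs, exclude_flaky=True)
all_lints, all_projects = summarize(sample_diffs, exclude_flaky=False)
```

With filtering enabled, the flaky project still appears in the per-project counts because of its one stable diff; only the flaky entries disappear from the per-lint counts.
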
