Skip to content

feat(reporters): group GitHub summary failures by exception type#5491

Merged
thomhurst merged 4 commits intomainfrom
feat/github-summary-group-failures-by-exception
Apr 10, 2026
Merged

feat(reporters): group GitHub summary failures by exception type#5491
thomhurst merged 4 commits intomainfrom
feat/github-summary-group-failures-by-exception

Conversation

@thomhurst
Copy link
Copy Markdown
Owner

Summary

  • Replaces the flat failure list in the GitHub Actions step summary with failures grouped by exception type (closes feat(reporters): group GitHub summary failures by exception type #5484)
  • Each group is a collapsible <details> section (Collapsible mode) or bold header (Full mode) with a markdown table of affected tests and a "Common error" block
  • Includes timeouts in the Quick Diagnosis one-liner
  • Caps at 50 tests per group with overflow indicator for size safety
  • Other non-passing tests (cancelled, in-progress) remain in a separate table

Test plan

  • 8 new tests covering: grouping by exception type, ordering by count, timeout grouping, collapsible/full styles, common error display, quick diagnosis with timeouts, separation of other non-passing tests
  • All 227 TUnit.Engine.Tests pass (128 succeeded, 99 platform-skipped)

Replace the flat failure list in the GitHub Actions step summary with
failures grouped by exception type. Each group is a collapsible section
(or bold header in Full mode) containing a markdown table of affected
tests and a "Common error" block from the first failure in the group.

- Add FailureEntry record and GetExceptionTypeName helper
- Group failures by exception type, ordered by count descending
- Cap at 50 tests per group with overflow indicator
- Include timeouts in Quick Diagnosis section
- Keep other non-passing tests (cancelled, in-progress) in separate table
- Add 8 new tests covering grouping, ordering, styles, and edge cases
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: feat(reporters): group GitHub summary failures by exception type

This is a well-motivated improvement — grouping failures by exception type makes CI summaries significantly more actionable when many tests share the same root cause. The design is clean and the test coverage is solid.

Issues Found

1. HTML Injection in <pre> Common Error Block (Bug)

File: TUnit.Engine/Reporters/GitHubReporter.cs — new code at line ~384

// Current — unencoded
stringBuilder.AppendLine($"<pre>{firstError}</pre>");

firstError comes from GetError() which returns raw exception messages. Assertion libraries commonly produce messages like:

  • "Expected <null> but got <string>"
  • "Expected value to be <42> but was <0>"

These contain < and > which break the HTML structure when embedded verbatim. GitHub's sanitizer will sometimes help, but it's not reliable — the output becomes malformed HTML in the rendered summary.

Better approach:

stringBuilder.AppendLine($"<pre>{System.Net.WebUtility.HtmlEncode(firstError)}</pre>");

Note: this same issue exists pre-PR in GetDetails()return $"<pre>{GetError(stateProperty)}</pre>";. Worth fixing both, but the PR introduces a new callsite so it's fair to flag here.


2. Non-Deterministic "Common Error" Selection

File: TUnit.Engine/Reporters/GitHubReporter.cs — line ~379

var firstError = entries[0].CommonError;

failureMessages is built by iterating last.Values which is a Dictionary<string, TestNodeUpdateMessage>. Dictionary iteration order in .NET is implementation-defined and not stable across runs. entries[0] is therefore an arbitrary test, not necessarily the most representative one.

The test acknowledges this with:

// The common error is from the first entry in the group (order not guaranteed)
(output.Contains("Object reference not set") || output.Contains("Different message"))
    .ShouldBeTrue(...)

Better approach: Find the most frequently occurring error within the group, which is actually more useful as a "common" error:

var commonError = entries
    .Where(e => !string.IsNullOrWhiteSpace(e.CommonError))
    .GroupBy(e => e.CommonError)
    .OrderByDescending(g => g.Count())
    .FirstOrDefault()
    ?.Key;

This also makes the output stable and makes the label semantically accurate — it's the error that appears most often in the group, not just a random one.


3. No Test for the 50-Test-Per-Group Cap

The PR description specifically mentions "Caps at 50 tests per group with overflow indicator for size safety" but there's no test for this behavior. If the cap logic were accidentally removed or the threshold changed, tests would still pass.

A simple test would be:

[Test]
public async Task AfterRunAsync_Caps_Group_At_50_Tests()
{
    var (reporter, outputFile) = await SetupReporter();
    var messages = Enumerable.Range(1, 55)
        .Select(i => CreateFailedTestMessage(i.ToString(), $"T{i}", "Svc", new NullReferenceException("n")))
        .ToArray();
    await FeedTestMessages(reporter, messages);
    await reporter.AfterRunAsync(1, CancellationToken.None);
    var output = await File.ReadAllTextAsync(outputFile);
    output.ShouldContain("...and 5 more");
}

4. Minor: GetError Could Be Static

GetExceptionTypeName is correctly marked static. GetError has no instance state dependencies either but isn't. For consistency:

private static string? GetError(IProperty? stateProperty) => ...

What's Done Well

  • FailureEntry record is a clean, focused abstraction — replacing the anonymous tuple is the right call.
  • GetExceptionTypeName static helper is well-extracted and reused correctly in both the Quick Diagnosis block and the failure detail section — no duplication.
  • Timeout classification is elegantly handled by including TimeoutTestNodeStateProperty in the failure iteration loop (isFailed check) and returning "Timeout" from GetExceptionTypeName.
  • Test infrastructure (SetupReporter, FeedTestMessages, Create*TestMessage helpers) is solid and makes tests easy to read and extend.
  • Temp file tracking with _tempFiles list and cleanup in [After(Test)] is the right pattern.
  • The collapsible vs. full mode branching is clean — single conditional per group rather than duplicating the whole rendering logic.

The HTML encoding fix is the most important change to address before merging.

@codacy-production
Copy link
Copy Markdown

codacy-production bot commented Apr 10, 2026

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 10 complexity

Metric Results
Complexity 10

View in Codacy

TIP This summary will be updated as you push new changes. Give us feedback

- HTML-encode common error in <pre> block to prevent injection from
  exception messages containing < and > characters
- Select most frequent error message in each group instead of arbitrary
  first entry, making "Common error" semantically accurate and stable
- Add test for the 50-test-per-group cap with overflow indicator
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: feat(reporters): group GitHub summary failures by exception type

Overall this is a well-structured improvement that makes GitHub step summaries much more actionable. The grouping-by-exception-type approach is a good UX decision, and the test coverage is solid. I have a few observations on correctness, design, and performance.


Correctness Issues

SkippedTestNodeStateProperty silently excluded from "Other non-passing tests"

After the refactor, isFailed is defined as:

var isFailed = stateProperty is FailedTestNodeStateProperty or ErrorTestNodeStateProperty
    or TimeoutTestNodeStateProperty;

But SkippedTestNodeStateProperty does not match isFailed, nor does it route into otherMessages — it falls into the else branch. The skipped-tests section is built separately above (from the skipped array), so skipped tests do still appear; however, they will also not appear in the otherMessages table, which is correct. But if a test is skipped and lands in last.Values, it will hit the else branch and get added to otherMessages as "Skipped / Unknown Test State"... wait, actually GetStatus returns "Skipped" and GetDetails returns the explanation, so this is handled correctly. But it's easy to miss — a comment clarifying intentional double-representation (or filtering skipped from last.Values earlier) would improve readability.

GetExceptionTypeName returns "Unknown" for ErrorTestNodeStateProperty with null exception

ErrorTestNodeStateProperty e => e.Exception?.GetType().Name ?? "Unknown",

This is fine semantically, but "Unknown" as a group key means tests with null exceptions get silently lumped together under "Unknown". Consider a more descriptive fallback like "Error (no exception)" or "ErrorTestNodeStateProperty" to aid diagnosis.


Design / Architectural Observations

Double-iteration over failed.Concat(timeout) for quick diagnosis

The quick-diagnosis block (lines ~267–288) iterates failed.Concat(timeout) a second time to produce the summary line, after failureMessages has already been built with the same data. These two code paths are doing essentially the same grouping — once inline, once later in the full rendering loop. Consider extracting the grouped failure data into a local variable that both sections share, to avoid the duplicated LINQ work and keep the grouping logic in one place.

entries.GroupBy(e => e.CommonError) allocates a new grouping per failure group

var commonError = entries
    .Where(e => !string.IsNullOrWhiteSpace(e.CommonError))
    .GroupBy(e => e.CommonError)
    .OrderByDescending(g => g.Count())
    .FirstOrDefault()
    ?.Key;

For the common error "winner", a simple frequency-count with a Dictionary<string, int> would avoid the intermediate grouping allocation and be clearer in intent. This isn't a hot path, but since the project values minimizing allocations:

var freq = new Dictionary<string, int>(StringComparer.Ordinal);
foreach (var e in entries)
    if (!string.IsNullOrWhiteSpace(e.CommonError))
        freq[e.CommonError!] = freq.GetValueOrDefault(e.CommonError!) + 1;
var commonError = freq.Count > 0 ? freq.MaxBy(kv => kv.Value).Key : null;

group.ToList() materializes each group unnecessarily

var entries = group.ToList();
var count = entries.Count;

Since count is needed for the label, the ToList() is necessary to avoid double-enumeration. This is fine. But consider naming it more clearly (groupEntries) given the outer variable group is the LINQ grouping.

maxTestsPerGroup = 50 is a magic constant with no configurability

The PR caps each group at 50 tests and notes this is "for size safety." Given the existing pattern of reading TUNIT_GITHUB_REPORTER_STYLE from the environment, a similar TUNIT_GITHUB_REPORTER_MAX_TESTS_PER_GROUP env var would make this tuneable for large test suites. At minimum the constant deserves a comment explaining the motivation.


Test Observations

#pragma warning disable TPEXP is broad

The #pragma warning disable TPEXP suppresses experimental API warnings for the entire test file. If only a few APIs trigger it, consider disabling it more narrowly around the specific usage site.

CreateCancelledTestMessage suppresses CS0618 with #pragma

This is the right pattern when working with an obsolete API intentionally. The suppress/restore pair is correctly scoped — good.

SetupReporter calls await reporter.IsEnabledAsync() but discards the result

IsEnabledAsync() is called for its side effect (setting _reporterStyle, reading the env var). If the env var isn't set correctly the reporter will silently behave as if disabled. Consider asserting the return value is true in the helper, so a misconfigured test fails loudly.


Minor Nits

  • The FailureEntry record field order (Name, SourceLink, Duration, ExceptionType, CommonError) doesn't match the usage pattern (ExceptionType and CommonError are populated at construction time before Name is rendered). The order isn't wrong, but grouping related fields (ExceptionType, CommonError together then Name, SourceLink, Duration) would be more readable.

  • System.Net.WebUtility.HtmlEncode is used inline without a using alias. Since this file already targets netN multi-targeting, this is fine, but a using static or local alias at the top of the method would reduce line length.

Overall the logic is sound and the feature is a meaningful improvement to the developer experience in GitHub Actions. The main actionable items are the double-iteration concern and the potential for an improved fallback label when the exception type is unknown.

Fail loudly if env vars are misconfigured instead of silently producing
empty output.
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review: feat(reporters): group GitHub summary failures by exception type

Reviewing the third commit (29d4d4d) against the previous two review cycles.


What Was Fixed — Good Work

All three issues from the first review are now addressed:

  • HTML encoding — is in place ✓
  • Non-deterministic common error — now selects the most-frequent message with ✓
  • 50-test cap testAfterRunAsync_Caps_Group_At_50_Tests covers the overflow indicator ✓

From the second review, the SetupReporter assertion is fixed — enabled.ShouldBeTrue(...) means a misconfigured test now fails loudly ✓


Still Open From Previous Reviews

1. Double-Iteration (Architectural — Second Review)

The quick-diagnosis block (line ~266) iterates , selects properties, and groups by exception type — then the rendering section (line ~330) iterates again, selects the same properties, and groups again. Both passes do structurally identical work.

The grouping could be computed once and shared:

grouped

This isn't just a performance concern — it's a correctness concern: if the two passes use slightly different logic, they can produce inconsistent labels. The current code is close enough to be safe, but the two code paths will drift as features are added.

2. Is Not Static (First Review — Minor)

GetExceptionTypeName is correctly . GetError at line 522 accesses no instance state but is declared . The annotation makes the compiler enforce this and documents the intent.

3. Suppresses Entire File (Second Review — Minor)

The file-level suppression means new experimental APIs could be added anywhere in the file without any warning. Narrowing it to the two sites that use experimental APIs (TestNodeUpdateMessage, TestNode constructors in tests, and any experimental property types in the reporter) would make the suppression more honest.


Remaining Nit

Magic constant maxTestsPerGroup = 50 has no comment explaining the rationale (file size cap? readability?). A single-line comment would help future readers understand why 50 and not 100 or 10.


Summary

The correctness issues from the first review are all fixed — the HTML encoding and most-frequent-error selection are good changes. The double-iteration concern remains the most architecturally significant open item; the others are minor. This is close to merge-ready.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review: feat(reporters): group GitHub summary failures by exception type

Reviewing the third commit (29d4d4d) against the previous two review cycles.


What Was Fixed — Good Work

All three issues from the first review are now addressed:

  • HTML encodingSystem.Net.WebUtility.HtmlEncode(commonError) is in place ✓
  • Non-deterministic common error — now selects the most-frequent message with GroupBy...OrderByDescending
  • 50-test cap testAfterRunAsync_Caps_Group_At_50_Tests covers the overflow indicator ✓

From the second review, the SetupReporter assertion is fixed — enabled.ShouldBeTrue(...) means a misconfigured test now fails loudly ✓


Still Open From Previous Reviews

1. Double-Iteration (Architectural — raised in second review)

The quick-diagnosis block (~line 266) iterates failed.Concat(timeout), selects properties, and groups by exception type. The rendering section (~line 330) then iterates last.Values again, selects the same properties, and groups again. Both passes do structurally identical work.

The grouping could be computed once from the already-built failureMessages list and shared with the quick-diagnosis section. This avoids the duplicated property-selection and grouping logic, and more importantly ensures the two sections can never silently diverge if one is updated without the other.

The current flow builds failureMessages after the quick-diagnosis block, which forces the duplication. Reordering to build failureMessages first (early in AfterRunAsync) would allow both sections to share the same data.

2. GetError Is Not Static (raised in first review — minor)

GetExceptionTypeName is correctly static. GetError at line 522 accesses no instance state but is not declared static. Adding static makes the compiler enforce that constraint and documents the intent.

3. #pragma warning disable TPEXP Suppresses Entire File (raised in second review — minor)

The file-level suppression in the test file means new experimental APIs could be added anywhere without generating a warning. Narrowing it to the specific callsites that use experimental APIs would be more precise.


Remaining Nit

const int maxTestsPerGroup = 50 — a one-line comment explaining why 50 (file-size cap? readability threshold?) would help future readers who might otherwise adjust it without understanding the intent.


Overall

The correctness issues from the first review are fully resolved. The double-iteration is the most architecturally significant remaining concern; the others are minor polish. This is close to merge-ready.

- Eliminate double-iteration: build failureMessages before quick
  diagnosis block and derive quick diagnosis from it, instead of
  iterating failed.Concat(timeout) separately with duplicate
  GetExceptionTypeName/grouping logic
- Make GetError static (no instance state accessed)
- Remove unused #pragma warning disable TPEXP (no warnings triggered)
- Add comment explaining maxTestsPerGroup = 50 rationale
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final Review: feat(reporters): group GitHub summary failures by exception type

Reviewing the 4th commit (40fbc04) against all previous review cycles.


All Open Issues Resolved

Every item from the previous three review rounds has been addressed:

Issue Status
HTML-encode common error in <pre> block Fixed in commit 2
Non-deterministic common error selection Fixed in commit 2 (most-frequent wins)
Missing 50-test-cap test Added in commit 2
SetupReporter silent failure on misconfigured env vars Fixed in commit 3
Double-iteration (quick diagnosis + rendering both grouping separately) Fixed in commit 4 — failureMessages built first, shared by both sections
GetError not static Fixed in commit 4
Broad #pragma warning disable TPEXP Removed in commit 4
maxTestsPerGroup = 50 magic constant with no explanation Comment added in commit 4

Final State

The architecture is clean:

  • Single pass collects all failure data into List<FailureEntry>
  • Quick diagnosis and the full grouped rendering both derive from the same list — no divergence risk
  • GetExceptionTypeName and GetError are both correctly static
  • FailureEntry record carries all the fields both sections need (ClassName addition in commit 4)
  • HTML encoding is in place for user-controlled content in the error display block
  • Test coverage is comprehensive and the test helpers are clean

This is ready to merge.

@thomhurst thomhurst disabled auto-merge April 10, 2026 18:24
@thomhurst thomhurst merged commit 81b3e75 into main Apr 10, 2026
13 of 15 checks passed
@thomhurst thomhurst deleted the feat/github-summary-group-failures-by-exception branch April 10, 2026 18:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(reporters): group GitHub summary failures by exception type

1 participant