Skip to content

fix: catch unhandled exceptions in ExecuteRequestAsync to prevent IDE RPC crashes#5271

Merged
thomhurst merged 4 commits intomainfrom
fix/before-session-hook-exception-handling
Mar 27, 2026
Merged

fix: catch unhandled exceptions in ExecuteRequestAsync to prevent IDE RPC crashes#5271
thomhurst merged 4 commits intomainfrom
fix/before-session-hook-exception-handling

Conversation

@thomhurst
Copy link
Copy Markdown
Owner

Summary

  • Stop re-throwing exceptions from ExecuteRequestAsync so they don't propagate to the MTP host and crash Rider's JSON-RPC channel (the error is still logged and reported via ReportUnhandledException)
  • The session now closes with IsSuccess = false via a unified SessionFailed flag (replaces the old AfterSessionHooksFailed)

Closes #5263

Test plan

  • TUnit.Engine.Tests pass (212 total, 114 succeeded, 98 skipped)
  • Manual: throw an exception in a [Before(TestSession)] hook, run from Rider — verify no JsonRpcInternalErrorException and the error is visible

… RPC crashes

Stop re-throwing exceptions from ExecuteRequestAsync so they don't
propagate to the MTP host and crash Rider's JSON-RPC channel. The error
is still logged and reported via ReportUnhandledException, and the
session now closes with IsSuccess = false.

Unified BeforeSessionHooksFailed/AfterSessionHooksFailed into a single
SessionFailed flag since both are only checked together.

Closes #5263
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Summary

This is a well-targeted fix for a real-world IDE integration issue. Swallowing the exception in and routing the error through is the correct MTP pattern — publishing an message to the bus is far preferable to letting the exception bubble up and break Rider's JSON-RPC channel.


What's Good

  • Root cause is correctly identified. MTP's contract is that failures should be reported via the message bus, not via exceptions from . The previous was violating this contract.
  • ** is already in place** — the exception is not silently discarded; it's published as an node, which is exactly how MTP surfaces unhandled failures to IDEs and CI.
  • The rename → is accurate. The old name was misleading since the flag is now also set by the catch in (which can catch failures far earlier than "after session hooks").
  • The comment on the line is a useful breadcrumb for future maintainers.

One Thing Worth Considering

The cancellation-exception branch (lines 70-84) still ends with :

catch (Exception e) when (IsCancellationException(e))
{
    // ...log...
    throw;  // still propagates
}

If Rider's JSON-RPC channel is sensitive to any exception escaping , this path could trigger the same crash on cancellation. It's possible MTP handles specially at the host level (many async frameworks do), but it may be worth verifying manually against Rider when cancellation occurs mid-run.

If MTP does not treat cancellation specially, the same fix should be applied here:

catch (Exception e) when (IsCancellationException(e))
{
    var msg = context.CancellationToken.IsCancellationRequested
        ? "The test run was cancelled."
        : "Test execution stopped due to fail-fast.";
    await GetOrCreateServiceProvider(context).Logger.LogErrorAsync(msg);
    serviceProvider.SessionFailed = true;  // don't re-throw
}

This is worth a conscious decision either way — if you've confirmed MTP/Rider handles gracefully, a short comment to that effect would help.


Minor Nit

is called twice in the cancellation path (lines 75 and 80) but the result is discarded both times. This is pre-existing code, but since you're already touching nearby lines, caching it in a local variable would be a cheap cleanup:

var sp = GetOrCreateServiceProvider(context);
await sp.Logger.LogErrorAsync(context.CancellationToken.IsCancellationRequested
    ? "The test run was cancelled."
    : "Test execution stopped due to fail-fast.");

Verdict

Approved as-is. The core fix is correct and the rename improves clarity. The cancellation-throw question is the only thing worth a follow-up look, but it doesn't block this PR.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Summary

This is a well-targeted fix for a real-world IDE integration issue. Swallowing the exception in ExecuteRequestAsync and routing the error through ReportUnhandledException is the correct MTP pattern — publishing an ErrorTestNodeStateProperty message to the bus is far preferable to letting the exception bubble up and break Rider's JSON-RPC channel.


What's Good

  • Root cause is correctly identified. MTP's contract is that failures should be reported via the message bus, not via exceptions from ExecuteRequestAsync. The previous throw was violating this contract.
  • ReportUnhandledException is already in place — the exception is not silently discarded; it's published as an ErrorTestNodeStateProperty node, which is exactly how MTP surfaces unhandled failures to IDEs and CI.
  • The rename AfterSessionHooksFailedSessionFailed is accurate. The old name was misleading since the flag is now also set by the catch in ExecuteRequestAsync (which can catch failures far earlier than "after session hooks").
  • The comment on the // Do NOT re-throw line is a useful breadcrumb for future maintainers.

One Thing Worth Considering

The cancellation-exception branch (lines 70-84) still ends with throw;:

catch (Exception e) when (IsCancellationException(e))
{
    // ...log...
    throw;  // still propagates
}

If Rider's JSON-RPC channel is sensitive to any exception escaping ExecuteRequestAsync, this path could trigger the same crash on cancellation. It's possible MTP handles OperationCanceledException specially at the host level (many async frameworks do), but it may be worth verifying manually against Rider when cancellation occurs mid-run.

If MTP does not treat cancellation specially, the same fix could apply here too — set SessionFailed = true instead of re-throwing. Either way, a short comment confirming the intent would help future readers.


Minor Nit

GetOrCreateServiceProvider(context) is called twice in the cancellation path (lines 75 and 80) but the result is discarded both times. This is pre-existing, but since you're already nearby, caching it in a local variable would be a cheap cleanup.


Verdict

Approved as-is. The core fix is correct and the rename improves clarity. The cancellation-throw question is the only thing worth a follow-up look, but it doesn't block this PR.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review

The minor nit from the previous review has been addressed — the second commit () correctly deduplicates the call in the cancellation handler. Good cleanup.

Still-open question: cancellation path

The primary concern from the previous review remains unaddressed. The cancellation catch block still re-throws:

catch (Exception e) when (IsCancellationException(e))
{
    var message = context.CancellationToken.IsCancellationRequested
        ? "The test run was cancelled."
        : "Test execution stopped due to fail-fast.";
    await GetOrCreateServiceProvider(context).Logger.LogErrorAsync(message);

    throw;  // ← could this also crash Rider's JSON-RPC channel?
}

This doesn't need to block the PR, but it would be worth one of:

  1. A code comment confirming that MTP/Rider handles OperationCanceledException specially and the re-throw is safe — this makes the intentional asymmetry explicit for future maintainers.
  2. A follow-up issue if it hasn't been tested manually against Rider during cancellation.

The asymmetry between the two catch blocks (one swallows + sets SessionFailed, one re-throws) will be confusing to the next person reading this file unless the reasoning is documented inline.

Everything else

The core fix is correct, the rename from AfterSessionHooksFailed to SessionFailed is a clear improvement, and ReportUnhandledException ensures the error is still surfaced to the user. Approved as-is.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review

The minor nit from the previous review has been addressed — the second commit deduplicates the GetOrCreateServiceProvider(context) call in the cancellation handler. Good cleanup.

Still-open question: cancellation path

The primary concern from the previous review remains unaddressed. The cancellation catch block still re-throws:

catch (Exception e) when (IsCancellationException(e))
{
    // ...log message...
    throw;  // could this also crash Rider's JSON-RPC channel?
}

This doesn't need to block the PR, but it would be worth one of:

  1. A code comment confirming that MTP/Rider handles OperationCanceledException specially and the re-throw is safe — this makes the intentional asymmetry explicit for future maintainers.
  2. A follow-up issue if it hasn't been tested manually against Rider during cancellation.

The asymmetry between the two catch blocks (one swallows + sets SessionFailed, one re-throws) will be confusing to the next person reading this file unless the reasoning is documented inline.

Everything else

The core fix is correct, the rename from AfterSessionHooksFailed to SessionFailed is a clear improvement, and ReportUnhandledException ensures the error is still surfaced to the user. Approved as-is.

Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This PR fixes a real bug (Rider JSON-RPC crash when [Before(TestSession)] throws) cleanly and correctly. Overall the changes are well-reasoned and the code is in good shape.

What works well

  • Swallowing the exception correctly: The original re-throw was the root cause of the Rider crash. The fix of catching, logging, reporting via ReportUnhandledException, and marking SessionFailed = true is the right approach — the error is still fully surfaced to the user without crashing the transport layer.

  • SessionFailed rename: Unifying AfterSessionHooksFailed (which was only set by TestSessionCoordinator) with the broader concept of any failure in ExecuteRequestAsync is a good abstraction. The old name was misleading since the flag is now also set for before-session failures and unhandled exceptions outside of hooks.

  • Cancellation path cleanup: The ternary refactor and deduplication of GetOrCreateServiceProvider in the cancellation handler is a nice, low-risk improvement.

One thing worth considering

GetOrCreateServiceProvider called inside the catch blocks (lines 75 and 82 in the new code):

catch (Exception e) when (IsCancellationException(e))
{
    await GetOrCreateServiceProvider(context).Logger.LogErrorAsync(message); // new service provider if try threw early
    throw;
}
catch (Exception e)
{
    var serviceProvider = GetOrCreateServiceProvider(context); // same issue
    ...
}

If an exception is thrown early in the try block — before GetOrCreateServiceProvider completes — the catch handlers call it again. Since TUnitServiceProvider's constructor is heavy (wires up most of the engine), a second invocation on an already-partially-set-up session could have unexpected side effects, and if the constructor itself was the source of the exception, both catch handlers would also throw, losing the original error.

A safer pattern would hoist the service provider reference:

TUnitServiceProvider? serviceProvider = null;
try
{
    serviceProvider = GetOrCreateServiceProvider(context);
    ...
}
catch (Exception e) when (IsCancellationException(e))
{
    await (serviceProvider?.Logger ?? /* fallback logger */).LogErrorAsync(message);
    throw;
}
catch (Exception e)
{
    await (serviceProvider?.Logger ?? /* fallback logger */).LogErrorAsync(e);
    ...
    serviceProvider?.SessionFailed = true; // only set if we had one
}

This avoids re-running the factory and makes the failure mode explicit. Whether this edge case is realistic depends on how often TUnitServiceProvider..ctor can fail — if it's essentially infallible in practice, the current approach is fine as a pragmatic simplification.

Summary

The fix is correct and the SessionFailed unification is a genuine improvement. The above is a minor defensive concern rather than a blocking issue. Happy to see this merged.

@thomhurst thomhurst enabled auto-merge (squash) March 27, 2026 19:46
Copy link
Copy Markdown
Contributor

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up Review (Round 4)

The core fix continues to be correct and the code is in good shape. Two items have been carried across previous review rounds — here is the current status of each.


Item 1: Cancellation path comment (previously raised ×3, now partially addressed)

The comment added on line 77 is a direct response to the previous reviews:

// Re-throw is safe here — MTP handles OperationCanceledException specially.
throw;

This is the minimum viable answer to the question raised in every prior review. The asymmetry between the two catch blocks is now documented inline, which was exactly what was asked for. The concern is resolved at the documentation level.

If the assertion in that comment is ever found to be wrong (i.e., Rider does crash on a cancelled run), the fix is the same pattern used in the non-cancellation catch block. Nothing blocks the PR on this.


Item 2: GetOrCreateServiceProvider called inside catch blocks (raised in the last review, not yet addressed)

This is the one remaining structural concern. Lines 75 and 82 both call GetOrCreateServiceProvider(context) inside catch handlers:

// cancellation handler (line 75)
await GetOrCreateServiceProvider(context).Logger.LogErrorAsync(message);

// general catch (line 82)
var serviceProvider = GetOrCreateServiceProvider(context);

Meanwhile, the try block already calls GetOrCreateServiceProvider on line 54 and stores it in a local variable. The catch handlers discard that variable and call the method again.

The actual risk is narrow but real: GetOrCreateServiceProvider uses ConcurrentDictionary.GetOrAdd, so if the try-block call completed successfully before the exception was thrown (the common case — the session was created, then something later threw), the catch will just return the same already-constructed instance from the dictionary, which is safe. However, if the exception was thrown during GetOrCreateServiceProvider itself (i.e., during TUnitServiceProvider's constructor), the catch handlers will retry that same failing constructor. The constructor is large (it wires up almost the entire engine graph), and a second invocation on a partial-state session could produce a misleading secondary failure that buries the original exception.

The straightforward fix is to hoist the variable outside the try/catch:

TUnitServiceProvider? serviceProvider = null;
try
{
    serviceProvider = GetOrCreateServiceProvider(context);
    // ... rest of try body using serviceProvider ...
}
catch (Exception e) when (IsCancellationException(e))
{
    await (serviceProvider?.Logger.LogErrorAsync(message) ?? Task.CompletedTask);
    throw;
}
catch (Exception e)
{
    if (serviceProvider is not null)
    {
        await serviceProvider.Logger.LogErrorAsync(e);
        await ReportUnhandledException(context, e);
        serviceProvider.SessionFailed = true;
    }
    else
    {
        // ServiceProvider construction itself failed — log via fallback or swallow
    }
}

This also has the benefit of making it clear what state is available in each catch handler.

Whether this is worth fixing now or in a follow-up is a judgment call — the failure-during-constructor path is unlikely in practice. But because this PR is explicitly about making error-handling more robust, it's worth calling out.


Everything else

  • The SessionFailed rename is correct and an improvement over the narrow AfterSessionHooksFailed.
  • TestSessionCoordinator.cs correctly uses SessionFailed = true in place of the old name.
  • CloseTestSessionAsync correctly reads the flag and sets IsSuccess = false.
  • The ReportUnhandledException path publishes an ErrorTestNodeStateProperty node, which is the right way to surface failures to IDEs and CI under MTP.

The PR is in good shape. The GetOrCreateServiceProvider-in-catch issue is the only thing worth a follow-up, but it does not block merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant