Skip to content

[OpAMP] Fix flakiness in tests#4169

Open
Copilot wants to merge 2 commits intomainfrom
copilot/fix-ws-transport-test-flakiness
Open

[OpAMP] Fix flakiness in tests#4169
Copilot wants to merge 2 commits intomainfrom
copilot/fix-ws-transport-test-flakiness

Conversation

Copy link
Copy Markdown

Copilot AI commented Apr 21, 2026

On .NET Framework (net462), the WebSocket receive loop can abort/close the socket while SendAsync is still in-flight after detecting an oversized server message. This races to produce a WebSocketException ("transitioned into 'Aborted' state") or OperationCanceledException that the tests did not tolerate, causing intermittent failures.

Changes

  • WsTransport_DropsOversizedResponseWithoutDispatchingFrame — wrap SendAsync with Record.ExceptionAsync; assert the exception is null, WebSocketException, or OperationCanceledException (any other type fails the test)
  • WsTransport_LogsOversizedResponseWarning — same fix; identical pattern with the same race exposure

All original assertions are preserved: server closes with MessageTooBig, no frames dispatched to listeners.

// Before — throws on net462 when socket aborts mid-send
await wsTransport.SendAsync(FrameGenerator.GenerateMockAgentFrame().Frame, CancellationToken.None);

// After — tolerates the expected abort race
var sendException = await Record.ExceptionAsync(
    () => wsTransport.SendAsync(FrameGenerator.GenerateMockAgentFrame().Frame, CancellationToken.None));
Assert.True(
    sendException is null || sendException is WebSocketException || sendException is OperationCanceledException,
    $"Unexpected exception from SendAsync: {sendException}");

No production code changed.

Original prompt

Repo: open-telemetry/opentelemetry-dotnet-contrib

Failing CI job URL: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/actions/runs/24716677111/job/72294849492

Failure:

  • Test: OpenTelemetry.OpAmp.Client.Tests.WsTransportTest.WsTransport_DropsOversizedResponseWithoutDispatchingFrame
  • Framework: net462 (from artifacts path release_net462)
  • Exception: System.Net.WebSockets.WebSocketException: InternalClientWebSocket transitioned into 'Aborted' state; inner TaskCanceledException
  • Stack: WsTransmitter.SendAsync at src/OpenTelemetry.OpAmp.Client/Internal/Transport/WebSocket/WsTransmitter.cs:40, test at test/OpenTelemetry.OpAmp.Client.Tests/WsTransportTest.cs:250

Goal:

  • Remove flakiness by adjusting the test so it does not fail if the client SendAsync races with the receiver aborting/closing the socket when an oversized server message is detected.
  • Preserve the original intent: oversized response should be dropped, no frame dispatched to listeners, and server should observe client close status MessageTooBig.

Implementation guidance:

  1. Update WsTransport_DropsOversizedResponseWithoutDispatchingFrame in test/OpenTelemetry.OpAmp.Client.Tests/WsTransportTest.cs.
  2. Wrap the await wsTransport.SendAsync(...) call with Record.ExceptionAsync (or Assert.ThrowsAnyAsync if preferred), and allow WebSocketException (and optionally OperationCanceledException) as acceptable outcomes. Fail if any other exception occurs.
  3. Keep the existing assertions about:
    • TryGetClientCloseStatus within 5 seconds and status equals WebSocketCloseStatus.MessageTooBig
    • mockListener receives no message
  4. Ensure the test remains deterministic across TFMs, especially net462, and still fails if the transport does not close with MessageTooBig or dispatches frames.

Files involved:

  • test/OpenTelemetry.OpAmp.Client.Tests/WsTransportTest.cs

Notes:

  • Do not change production code unless necessary; prefer test-only fix.
  • Ensure code style matches existing test patterns in this file.

The following is the prior conversation context from the user's chat exploration (may be truncated):

User: Please find a solution for this failing job. Use the logs, job definition, and any referenced files where the failure occurred. Keep your response focused on the solution and include code suggestions when appropriate.

Assistant: The failing job is due to a flaky/incorrect expectation in WsTransport_DropsOversizedResponseWithoutDispatchingFrame when running on .NET Framework (net462).

What’s happening

The test sends a request (wsTransport.SendAsync(...)) and then asserts the server observed the client closing with MessageTooBig:

  • Failure: System.Net.WebSockets.WebSocketException: ... transitioned into the 'Aborted' state
  • Stack shows the exception originates from the client send path:
    src/OpenTelemetry.OpAmp.Client/Internal/Transport/WebSocket/WsTransmitter.cs:40
    and is surfaced by the test at:
    test/OpenTelemetry.OpAmp.Client.Tests/WsTransportTest.cs:250

This indicates the receive loop likely detects the oversized server response and closes/aborts the socket while the client send is still in-flight, which on net462 can surface as an Aborted WebSocket and cause SendAsync to throw. The test currently doesn’t allow for that race.

Solution (make the test resilient to the expected abort)

Update WsTransport_DropsOversizedResponseWithoutDispatchingFrame to tolerate a WebSocketException (and/or OperationCanceledException) from SendAsync, because aborting the socket is an acceptable outcome once an oversized message is received.

Suggested test change

In test/OpenTelemetry.OpAmp.Client.Tests/WsTransportTest.cs, modify the SendAsync line to record/ignore the expected exception:

await wsTransport.StartAsync(CancellationToken.None);

// Sending can race with the receiver aborting the socket after detecting the oversized response.
// On .NET Framework this may surface as WebSocketException (Aborted).
var sendException = await Record.ExceptionAsync(
    () => wsTransport.SendAsync(FrameGenerator.GenerateMockAgentFrame().Frame, CancellationToken.None));

Assert.True(
    sendException is null || sendException is WebSocketException || sendException is OperationCanceledException,
    $"Unexpected exception from SendAsync: {sendException}");

Assert.True(opAmpServer.TryGetClientCloseStatus(TimeSpan.FromSeconds(5), out var closeStatus));
Assert.Equal(WebSocketCloseStatus.MessageTooBig, closeStatus);

Assert.False(mockListener.TryWaitForMessage(TimeSpan.FromMilliseconds(500)));
Assert.Empty(mockListener.Messages);

This keeps the intent of the test (oversized response is dropped, no frame dispatched, connection closed with MessageTooBig) while removing the timing dependency that causes the net462 abort.

Optional hardening (library-side)

If you want the client API to behave more predictably, you can also consider handling “socket already aborted/closed...

This pull request was created from Copilot chat.

… oversized-response tests on net462

Agent-Logs-Url: https://github.com/open-telemetry/opentelemetry-dotnet-contrib/sessions/7bf71274-9739-427f-9169-7f035cb1a946

Co-authored-by: martincostello <1439341+martincostello@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix flakiness in WsTransport_DropsOversizedResponse test fix(opamp): remove flakiness in oversized WebSocket response tests on net462 Apr 21, 2026
Copilot AI requested a review from martincostello April 21, 2026 10:31
@github-actions github-actions Bot added the comp:opamp.client Things related to OpenTelemetry.OpAmp.Client label Apr 21, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.80%. Comparing base (19db4b8) to head (9301891).
⚠️ Report is 3 commits behind head on main.
✅ All tests successful. No failed tests found.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4169      +/-   ##
==========================================
+ Coverage   73.66%   73.80%   +0.14%     
==========================================
  Files         466      458       -8     
  Lines       18330    18288      -42     
==========================================
- Hits        13502    13498       -4     
+ Misses       4828     4790      -38     
Flag Coverage Δ
unittests-Instrumentation.Cassandra ?
unittests-OpAmp.Client 82.92% <ø> (+0.33%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 10 files with indirect coverage changes

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@martincostello martincostello marked this pull request as ready for review April 21, 2026 10:40
@martincostello martincostello requested a review from a team as a code owner April 21, 2026 10:40
Copilot AI review requested due to automatic review settings April 21, 2026 10:40
@martincostello martincostello changed the title fix(opamp): remove flakiness in oversized WebSocket response tests on net462 [OpAMP] Fix flakiness in tests Apr 21, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Stabilizes OpAmp WebSocket oversized-response tests on .NET Framework (net462) by tolerating a known race where the transport aborts the socket while a client SendAsync is still in-flight.

Changes:

  • Wrap wsTransport.SendAsync(...) with Record.ExceptionAsync in two oversized-response tests.
  • Treat null, WebSocketException, and OperationCanceledException as acceptable outcomes for the send in these scenarios while preserving all existing close-status and “no frames dispatched” assertions.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp:opamp.client Things related to OpenTelemetry.OpAmp.Client

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants