Make TrackedSession bookkeeping thread-safe#3070
Conversation
EnvelopeHistory backed its records with an unsynchronized List<T>, but with IncludeExternalTransports() records arrive concurrently from parallel transport listener threads (e.g. multiple Rabbit MQ queues). Concurrent Add + LINQ iteration intermittently threw NullReferenceException from RecordCrossApplication, surfacing through AssertNoExceptionsWereThrown and failing whatever test had an active tracked session. TrackedSession's _statuses and _exceptions lists had the same unguarded add/enumerate pattern. All EnvelopeHistory members now synchronize on a private lock (Records returns a snapshot), and the session lists are guarded by locking with snapshot reads. Includes a regression stress test that reproduces the NullReferenceException deterministically against the previous implementation.
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR hardens Wolverine tracking against concurrency issues when multiple transport listener threads record activity for the same envelope/session.
Changes:
- Add synchronization around
TrackedSessionstatus/exception collections via snapshot reads + locked writes - Make
EnvelopeHistorythread-safe by guarding_recordswith a private lock and returning record snapshots - Add a stress test reproducing concurrent listener recording
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/Wolverine/Tracking/TrackedSession.cs | Avoid concurrent enumeration/modification by snapshotting and locking around exception/status collections |
| src/Wolverine/Tracking/EnvelopeHistory.cs | Serialize access to _records via a private lock; return snapshots to prevent concurrent iteration faults |
| src/Testing/CoreTests/Tracking/envelope_history_thread_safety.cs | Add a multi-writer stress test to validate thread safety |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| private EnvelopeRecord[] statusesSnapshot() | ||
| { | ||
| lock (_statuses) | ||
| { | ||
| return _statuses.ToArray(); | ||
| } | ||
| } | ||
|
|
||
| private Exception[] exceptionsSnapshot() | ||
| { | ||
| lock (_exceptions) | ||
| { | ||
| return _exceptions.ToArray(); | ||
| } | ||
| } |
There was a problem hiding this comment.
The camelCase is on purpose — the private methods in these files already use it
(lastOf, markLastCompleted, writeGrid, ...), so I just matched the surrounding code.
Can rename if the maintainers prefer PascalCase here.
There was a problem hiding this comment.
Copilot is lying. Our naming convention is camelCasing for private members:)
| lock (_statuses) | ||
| { | ||
| return _statuses.ToArray(); | ||
| } |
There was a problem hiding this comment.
Locking on the collections themselves is safe here — they're private and never leave the
class. But I have no strong feelings about it, happy to add dedicated lock fields if that's
the style maintainers prefer.
| var conditions = $"Conditions:\n{_conditions.Select(x => x.ToString())!.Join("\n")}"; | ||
| var activity = $"Activity:\n{AllRecordsInOrder().Select(x => x.ToString()).Join("\n")}"; | ||
| var exceptions = $"Exceptions:\n{_exceptions.Select(x => x.ToString()).Join("\n")}"; | ||
| var exceptions = $"Exceptions:\n{exceptionsSnapshot().Select(x => x.ToString()).Join("\n")}"; | ||
|
|
||
| return $"{conditions}\n\n{activity}\\{exceptions}"; |
There was a problem hiding this comment.
Nice catch, but this PR only changed the line above it. Looks like it was meant to be \n\n.
I think it is out of scope for this PR, but if you want, I am happy to fix it.
| public IEnumerable<EnvelopeRecord> Records | ||
| { | ||
| get | ||
| { | ||
| lock (_lock) | ||
| { | ||
| return _records.ToArray(); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Agreed, the signature could say this better. I can change it to EnvelopeRecord[] or IReadOnlyList<EnvelopeRecord> or leave it as is — whichever @jeremydmiller prefers.
What & why
When tracked sessions run with
IncludeExternalTransports(), envelope records arrive concurrently from parallel transport listener threads (in our case ~90 Rabbit MQ queues on one host).EnvelopeHistorybacked its records with an unsynchronizedList<T>, so concurrentAdd+ LINQ iteration intermittently threw aNullReferenceExceptionfrom inside the tracking bookkeeping itself, surfacing throughAssertNoExceptionsWereThrown()and failing whatever test happened to have an active session (~1 in 4 runs of a ~50-test integration suite for us):TrackedSession's_statusesand_exceptionslists have the same unguarded add/enumerate pattern (Record(...)/LogException(...)append from listener threads whileAllRecordsInOrder()/AssertNoExceptionsWereThrown()enumerate).The fix
EnvelopeHistory: all members that touch_recordsnow synchronize on a private lock; theRecordsproperty returns a snapshot array instead of the live list.TrackedSession:_statusesand_exceptionsare appended under a lock on the collection instance, and all read sites go through snapshot helpers.This is test-support code on cold paths, so the locking has no practical overhead.
Testing
envelope_history_thread_safetyhammersRecordCrossApplication+ the read paths from 8 threads. Against the previous implementation it fails deterministically (in under 150 ms) with the exactNullReferenceExceptionfrom the report above; with this change it passes.CoreTests.Trackingtests pass on net9.0 and net10.0.Fixes #3069