fix(test): lock AllItemsReceived state in batch_processing tests #2692

Merged
jeremydmiller merged 1 commit into main from fix-marten-batch-processing-race on May 7, 2026

Conversation

@jeremydmiller
Member

Summary

  • MartenTests.batch_processing.end_to_end_with_tenancy has been failing on every recent Marten CI run with an AggregateException wrapping NullReferenceExceptions (NREs) thrown from AllItemsReceived.IsCompleted. Root cause: in the multi-tenant variant, the "blue" and "green" tenant batches complete in parallel, and both call Record(EnvelopeRecord) from the tracking infrastructure on whichever thread the handler finishes on. The helper fed the records into a plain List<BatchItem> via AddRange, which is not thread-safe: concurrent callers race on the internal array's resize and writes, leaving null slots that surface as NREs when the polling thread reads r.Id in the IsCompleted predicate.
  • Locks both Record (writer) and IsCompleted (reader) so the polling thread sees a consistent snapshot.
  • No production code is touched — the lock lives entirely in the test helper class.
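
The locking pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual helper: `LockedItemTracker`, its constructor, the `_gate` field, and the shape of `BatchItem` are hypothetical stand-ins, since the real `AllItemsReceived` class and its `Record(EnvelopeRecord)` signature are not shown in this PR text. It only demonstrates the idea that the writer and the polling reader take the same lock around the shared `List<BatchItem>`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the test's BatchItem; the real type is not shown here.
public record BatchItem(Guid Id);

// Hypothetical helper mirroring the shape described above: writers append
// received items, and a polling reader checks whether every expected id arrived.
public class LockedItemTracker
{
    private readonly object _gate = new object();
    private readonly List<BatchItem> _received = new List<BatchItem>();
    private readonly HashSet<Guid> _expected;

    public LockedItemTracker(IEnumerable<Guid> expectedIds)
    {
        _expected = new HashSet<Guid>(expectedIds);
    }

    // Writer: may be called concurrently from whichever thread a handler finishes on.
    public void Record(IEnumerable<BatchItem> items)
    {
        lock (_gate)
        {
            _received.AddRange(items);
        }
    }

    // Reader: takes the same lock, so it never observes the list mid-resize.
    public bool IsCompleted()
    {
        lock (_gate)
        {
            return _expected.All(id => _received.Any(r => r.Id == id));
        }
    }
}
```

Holding one lock on both paths is the simplest option for a test helper; a concurrent collection would also work, but it changes more of the helper's code than a lock does.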

Repro / verification

  • Without fix: 1 of 10 local runs fails on a fast box; the test fails deterministically in CI on slower runners (the same NRE on every recent Marten run, each retried twice and still red; see e.g. runs 25434414670, 25391326016, 25380002472, 25373294046, 25492035820).
  • With fix: 15/15 pass locally for end_to_end_with_tenancy; sibling end_to_end_with_durable (which uses the same helper) also passes.

Test plan

  • dotnet test --filter FullyQualifiedName=MartenTests.batch_processing.end_to_end_with_tenancy — 15/15 passed
  • dotnet test --filter FullyQualifiedName~MartenTests.batch_processing — 2/2 passed
  • CI Marten run on this PR clears

Note

This PR is also serving to put the Marten CI under load while you watch for any timeout regression.

🤖 Generated with Claude Code

`MartenTests.batch_processing.end_to_end_with_tenancy` has been
failing on every recent Marten CI run with an `AggregateException`
of NREs out of `AllItemsReceived.IsCompleted`. Root cause: in the
multi-tenant variant the "blue" and "green" tenant batches complete
in parallel and both call `Record(EnvelopeRecord)` from the tracking
infrastructure on whichever thread the handler finishes on. The
helper fed the records into a plain `List<BatchItem>` via
`AddRange`, which is not thread-safe — concurrent callers race on
the internal array's resize and writes, leaving null slots that
surface as NRE when the polling thread reads `r.Id` in the
`IsCompleted` predicate.

Local reproduction: 1/10 fail without the fix on a fast box,
deterministically failing in CI on slower runners (consistent
failure across the last several Marten runs, retried twice and
still red). With this lock both Record (writer) and IsCompleted
(reader) see a consistent snapshot — verified locally 15/15 plus
the sibling `end_to_end_with_durable` test (which uses the same
helper) still passes.

No production code is touched; the lock lives entirely in the test
helper class.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>