Bump Polecat to 4.2.1 and unskip the three Polecat scheduled-cascade tests#2947

Merged: jeremydmiller merged 1 commit into main from chore-polecat-421 on May 28, 2026

@jeremydmiller
Member

Follow-up to #2943.

What

Polecat 4.2.1 ships polecat#161, the upstream companion fix to GH-2941. Its DocumentSessionBase.SaveChangesAsync no longer early-returns when only ITransactionParticipants are queued, so the StoreIncomingEnvelopeParticipant added via Session.StoreIncoming(...) inside PolecatEnvelopeTransaction.PersistIncomingAsync actually runs for scheduled cascades from [ReadAggregate] / [DocumentExists] handlers that don't write any documents.
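
The early-return behaviour described above can be illustrated with a small toy model. This is only a sketch in Python, not Polecat's code: `ToySession`, `save_changes_with_early_return`, `save_changes`, and `run` are hypothetical names invented for the illustration, and the real `DocumentSessionBase` is certainly more involved.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToySession:
    """Hypothetical stand-in for a document session; not Polecat's real API."""

    document_ops: List[str] = field(default_factory=list)
    participants: List[Callable[[], None]] = field(default_factory=list)

    def save_changes_with_early_return(self) -> None:
        # Pre-fix shape: no document writes queued, so return immediately.
        # Any queued transaction participants are silently skipped.
        if not self.document_ops:
            return
        self._commit()

    def save_changes(self) -> None:
        # Post-fix shape: only skip when nothing at all is queued.
        if not self.document_ops and not self.participants:
            return
        self._commit()

    def _commit(self) -> None:
        # Flush document writes, then run every queued participant once.
        self.document_ops.clear()
        for participant in self.participants:
            participant()
        self.participants.clear()


def run(save_method_name: str) -> List[str]:
    """Simulate a handler that writes no documents but queues one participant."""
    persisted: List[str] = []
    session = ToySession()
    session.participants.append(lambda: persisted.append("scheduled-envelope"))
    getattr(session, save_method_name)()
    return persisted
```

With the early-return variant, `run("save_changes_with_early_return")` leaves the list empty, which corresponds to the lost scheduled cascade. `run("save_changes")` records the envelope.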

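This PR:

  • Bumps Polecat to 4.2.1.
  • Unskips the three Polecat scheduled-cascade tests added in #2943.
  • Tightens the PolecatPersistenceFrameProvider.CanApply comment so it documents the pairing with polecat#161 / Polecat 4.2.1.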

Verification

  • Full dotnet build wolverine.slnx -c Release clean (0 warnings, 0 errors).
  • Local Marten Bugs sweep: 45/45 pass.
  • Polecat tests will be validated by CI — the local SQL Server 2025 image on Apple Silicon is too slow to run them (existing Polecat durable tests time out the same way against it; this is unrelated to the change here).

🤖 Generated with Claude Code

…scade tests

Polecat 4.2.1 ships polecat#161, the upstream companion fix to GH-2941. Its
DocumentSessionBase.SaveChangesAsync no longer early-returns when only
ITransactionParticipants are queued, so the StoreIncomingEnvelopeParticipant
added via Session.StoreIncoming(...) inside PolecatEnvelopeTransaction.
PersistIncomingAsync actually runs for scheduled cascades from
[ReadAggregate] / [DocumentExists] handlers that don't write any documents.

That lets the three previously [Skip]'d Polecat tests (added in #2943 with a
Skip reason pointing at polecat#161) exercise the end-to-end path on CI:
  - Bug_2941_read_aggregate_scheduled_cascade.read_aggregate_handler_schedules_its_cascading_message
  - Bug_2941_document_exists_scheduled_cascade.document_exists_handler_schedules_its_cascading_message
  - Bug_2941_document_exists_scheduled_cascade.document_does_not_exist_handler_schedules_its_cascading_message

Also tightens the PolecatPersistenceFrameProvider.CanApply comment so it
documents the pairing with polecat#161 / Polecat 4.2.1 rather than the
'necessary but not sufficient' caveat from the original commit.

Local: full wolverine.slnx -c Release builds clean (0 warnings, 0 errors).
Marten Bugs sweep 45/45 pass. Polecat tests will be validated by CI - the
local SQL Server 2025 image on Apple Silicon is too slow to run them
(existing Polecat durable tests time out the same way, unrelated to this
change).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jeremydmiller merged commit a544812 into main on May 28, 2026
23 of 24 checks passed
outofrange-consulting pushed a commit to outofrange-consulting/wolverine that referenced this pull request May 28, 2026
…JasperFx#2949)

Closes JasperFx#2949. The flake symptom on three consecutive PRs (JasperFx#2943, JasperFx#2947, JasperFx#2948
in May 2026) was:

  System.TimeoutException : Timed out waiting for expected response
  Wolverine.Runtime.Agents.AgentsStarted for original message <id>
  of type Wolverine.Runtime.Agents.StartAgents with a configured timeout
  of 10000 milliseconds

Tests affected: RavenDbTests.LeaderElection.leadership_election_compliance.
take_over_leader_ship_if_leader_becomes_stale and .leader_switchover_between_nodes.

Root cause: WolverineRuntime.Agents.InvokeAsync<T>(NodeDestination, IAgentCommand)
had an asymmetric timeout - same-node calls got 30s, remote-node calls got 10s.
The asymmetry is backwards: a remote request-reply traverses the control
endpoint + serialization + network, so it should have AT LEAST as much budget
as a same-node in-memory invocation, not less. Under load on shared GitHub
runners that 10s was a real timing race on the cross-node leadership-takeover
scenarios the failing tests exercise, where StartAgents -> AgentsStarted is
sent from the new leader to a target node and the runner can stall just long
enough for the reply not to land in time.

Fix: align remote with same-node at 30s. This is the production constant, not a
test-only setting - the same flake would (rarely) bite users on busy or
slow-network multi-node Wolverine clusters whose leadership-takeover scenario
hits the same StartAgents -> AgentsStarted ack. Same-node already accepts 30s
so the change is conservative and in line with existing precedent.
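
The effect of the timeout budget can be sketched with a toy request-reply in Python's asyncio. This is an illustration under stated assumptions, not Wolverine's implementation: the 10s and 30s values come from the text above, while the 12-second stall, the `scale` factor, and the names `request_reply` and `outcome` are hypothetical.

```python
import asyncio

REMOTE_TIMEOUT_BEFORE_S = 10.0  # remote-node budget before the fix (from the text)
REMOTE_TIMEOUT_AFTER_S = 30.0   # remote-node budget after the fix, same as same-node


async def request_reply(reply_delay_s: float, timeout_s: float) -> str:
    """Wait for a simulated ack; raises a timeout error if it arrives too late."""

    async def remote_ack() -> str:
        # Stands in for control endpoint + serialization + network latency.
        await asyncio.sleep(reply_delay_s)
        return "AgentsStarted"

    return await asyncio.wait_for(remote_ack(), timeout=timeout_s)


def outcome(reply_delay_s: float, timeout_s: float, scale: float = 0.01) -> str:
    """Run one exchange; `scale` shrinks seconds so the sketch finishes quickly."""
    try:
        return asyncio.run(request_reply(reply_delay_s * scale, timeout_s * scale))
    except asyncio.TimeoutError:
        return "timeout"
```

Assuming a hypothetical 12-second stall on a loaded runner, `outcome(12, REMOTE_TIMEOUT_BEFORE_S)` times out while `outcome(12, REMOTE_TIMEOUT_AFTER_S)` receives the ack.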

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jeremydmiller added a commit that referenced this pull request May 28, 2026
Bug-fix + feature release on top of 6.1.0 — 13 PRs.

Notable additions:
- Custom Result<T> handler-return-value support (Phases 0+1+2+3, #2952, refs #2221)
- DbContext abstractions for EF Core transaction middleware (#2919 + docs/tests #2954)
- Outgoing Envelope pooling at MessageRouter.RouteForPublish (#2956, closes #2955):
  a delta of about -504 B/op on transport-bound sends per the CritterStackScalability
  WolverineTransportBenchmarks harness

Bug fixes: scheduled-cascade loss from [ReadAggregate]/[DocumentExists]
handlers (#2941), ancillary-store inbox routing (#2944), Postgres queue-name
length (#2942), MySQL node-record quoting (#2940), Pulsar batched-partition
ack KeyNotFoundException (#2883/#2950), remote-node agent reply timeout
(#2949), and additional resource-disposal cleanup (#2894 from
dmytro-pryvedeniuk).

Polecat bumped 4.1.1 -> 4.2.1 (#2947); Marten + JasperFx families unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>