
Fault Events — Auto-Publish Fault<T> on Terminal Handler Failure#2695

Merged
jeremydmiller merged 21 commits into JasperFx:main from BlackChepo:feature/2658_Fault
May 8, 2026

@BlackChepo
Contributor

Fault Events — Auto-Publish Fault<T> on Terminal Handler Failure

Closes #2658.

Summary

Adds an opt-in mechanism that publishes a strongly-typed Fault<T> envelope whenever a handler for T terminally fails — i.e. retries are exhausted, the message is moved to the dead-letter queue, or (with explicit opt-in) discarded. Operators get a queryable, durable event stream of handler failures without scattering try/catch logic through every handler.

The feature is fully opt-in and additive. Hosts that don't enable it see no behaviour change. The hot-path cost when disabled is one Dictionary<Type, …> lookup.

Motivation

DLQ inspection alone forces operators into a generic, untyped store. Distributed consumers that want to react programmatically to failures (alerting, compensation, projections of "what failed and why") need a typed, queryable signal. This branch provides that signal as a regular Wolverine message: any handler can subscribe to Fault<T> for any T it cares about.

Public API surface

// Global opt-in (with optional redaction defaults).
opts.PublishFaultEvents(includeExceptionMessage: true, includeStackTrace: true);

// Per-type override.
opts.Policies.ForMessagesOfType<OrderPlaced>()
    .PublishFault(includeExceptionMessage: true, includeStackTrace: false);

opts.Policies.ForMessagesOfType<HighVolumeChatter>()
    .DoNotPublishFault();

// Per-type encryption auto-pairs Fault<T>.
opts.Policies.ForMessagesOfType<PaymentDetails>().Encrypt();
//   → Fault<PaymentDetails> is also encrypted on the wire and on receive.

public record Fault<T>(
    T Message,
    ExceptionInfo Exception,
    int Attempts,
    DateTimeOffset FailedAt,
    string? CorrelationId,
    Guid ConversationId,
    string? TenantId,
    string? Source,
    IReadOnlyDictionary<string, string?> Headers
) where T : class;

public static class FaultHeaders
{
    public const string AutoPublished = "wolverine.fault.auto";
    public const string OriginalId    = "wolverine.fault.original_id";
    public const string OriginalType  = "wolverine.fault.original_type";
}

Subscribers are normal Wolverine handlers — no special attribute, no opt-in registration:

public static class OrderPlacedFaultHandler
{
    public static void Handle(Fault<OrderPlaced> fault) { /* … */ }
}
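
A minimal sketch of what such a subscriber might do with the envelope fields. To compile without Wolverine, the block declares a local copy of the Fault<T> record from this PR; ExceptionInfoStandIn (its members are inferred from the redaction commit), OrderPlaced, and the Describe helper are hypothetical. Only the presence of the wolverine.fault.auto header is checked, since the header's value is not stated in this PR.

```csharp
using System;
using System.Collections.Generic;

// Stand-in for Wolverine's ExceptionInfo; the member list is an assumption.
public record ExceptionInfoStandIn(string Type, string Message, string? StackTrace);

// Local copy of the Fault<T> shape listed in this PR, using the stand-in above.
public record Fault<T>(
    T Message,
    ExceptionInfoStandIn Exception,
    int Attempts,
    DateTimeOffset FailedAt,
    string? CorrelationId,
    Guid ConversationId,
    string? TenantId,
    string? Source,
    IReadOnlyDictionary<string, string?> Headers
) where T : class;

// Hypothetical message type used only for illustration.
public record OrderPlaced(string OrderId);

public static class OrderPlacedFaultDescriber
{
    // Same literal as FaultHeaders.AutoPublished in this PR.
    public const string AutoPublishedHeader = "wolverine.fault.auto";

    // Builds a one-line summary an alerting handler could log or forward.
    public static string Describe(Fault<OrderPlaced> fault)
    {
        var origin = fault.Headers.ContainsKey(AutoPublishedHeader) ? "auto" : "manual";
        return $"{origin} fault for order {fault.Message.OrderId}: " +
               $"{fault.Exception.Type} after {fault.Attempts} attempt(s)";
    }
}
```

In a real handler, Handle(Fault<OrderPlaced> fault) could call a helper like this and pass the result to whatever alerting or projection code the application uses.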

For tests, auto-published faults are surfaced via ITrackedSession.AutoFaultsPublished. Hand-published bus.PublishAsync(new Fault<T>(...)) calls do not appear there, because the FaultHeaders.AutoPublished header distinguishes auto-published faults from manual ones.

What's covered (and what isn't)

Fault is published when:

  • Message moved to error queue (DLQ) — every retry policy that ends in DLQ.
  • Discarded — only when the failure rule was configured with discardWithFaultPublish: true.
  • Expired envelope — handler entry observes the envelope past its DeliverBy.

Fault is not published in these bypass paths (intentional):

  • Send-side failures (broker rejects the outbound publish).
  • Unknown message type at the receiver (cannot synthesize a T).
  • Pre-handler crypto failures (EncryptionPolicyViolationException, EncryptionMissingHeaderException, EncryptionDecryptionException).
  • Fault<T>-handler failures (recursion guard suppresses Fault<Fault<T>>; emits wolverine.fault.recursion_suppressed activity event instead).

Atomicity caveat (documented): fault publish is best-effort, not transactionally co-committed with the DLQ insert. The receive-side outbox does not enrol the fault enqueue in the DLQ row's transaction. Subscribers must be resilient to gaps; faults are not a strict audit log. This caveat is stated explicitly in the public XmlDoc on WolverineOptions.PublishFaultEvents.

Internal architecture

  • IFaultPublisher — registered via DI, resolved lazily through IWolverineRuntimeInternal to avoid circular dependency on WolverineRuntime.
  • FaultPublisher.PublishIfEnabledAsync — single entry point. Enforces a MUST NOT throw contract: try/catch wraps the entire path, with logger + counter + activity event + SetStatus(Error) on failure. No path can re-enter the failure pipeline.
  • FaultPublishingPolicy — per-type override store. Snapshotted into a FrozenDictionary at Freeze() time (called by WolverineRuntime.StartAsync) and read via Volatile.*, so cross-thread visibility of pre-Freeze override writes is guaranteed by the memory model rather than implicit host-startup synchronization.
  • Encryption pairing: Policies.ForMessagesOfType<T>().Encrypt() reflectively registers EncryptMessageTypeRule<Fault<T>> and adds typeof(Fault<T>) to the receive-side RequiredEncryptedTypes set. Skipped for value-type T (because Fault<T> requires T : class). wolverine.encryption.* headers are stripped from Fault<T>.Headers so encryption decisions are made fresh on the outbound fault hop.
  • Tracking integration: EnvelopeHistory records AutoFaultPublished events with IsComplete = true so tracked sessions complete instead of timing out.
  • DiscardEnvelope — fault publish runs before CompleteAsync (terminal-sweep race fix); DiscardedEnvelope tracking event is emitted from a finally block so it always fires, even when CompleteAsync throws.
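
The freeze-then-read pattern described for FaultPublishingPolicy can be sketched roughly as follows. This is not the PR's implementation: FaultPolicySketch, its bool-valued overrides, and the exception message are hypothetical, and the block assumes .NET 8 or later for System.Collections.Frozen.

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a per-type override store with a Freeze() step.
public sealed class FaultPolicySketch
{
    private readonly Dictionary<Type, bool> _builder = new();
    private FrozenDictionary<Type, bool>? _frozen;
    private readonly bool _globalEnabled;

    public FaultPolicySketch(bool globalEnabled) => _globalEnabled = globalEnabled;

    // Allowed only before Freeze(); afterwards the store is read-only.
    public void SetOverride(Type messageType, bool publish)
    {
        if (Volatile.Read(ref _frozen) is not null)
        {
            throw new InvalidOperationException(
                "Fault publishing overrides must be configured during bootstrapping.");
        }

        _builder[messageType] = publish;
    }

    // Snapshot the mutable builder; the volatile write publishes it to other threads.
    public void Freeze() => Volatile.Write(ref _frozen, _builder.ToFrozenDictionary());

    // Reads the frozen snapshot when present, otherwise the mutable builder.
    public bool ShouldPublish(Type messageType)
    {
        var frozen = Volatile.Read(ref _frozen);
        bool publish;
        bool found;

        if (frozen is not null)
        {
            found = frozen.TryGetValue(messageType, out publish);
        }
        else
        {
            found = _builder.TryGetValue(messageType, out publish);
        }

        return found ? publish : _globalEnabled;
    }
}
```

The point of the snapshot is that after startup every reader sees one immutable dictionary, so no lock is needed on the failure path and late configuration mistakes fail loudly instead of being silently ignored.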

Observability

| Activity event | Fires when |
| --- | --- |
| wolverine.fault.published | A fault is enqueued for routing. |
| wolverine.fault.no_route | No route exists for Fault<T>. Tagged with wolverine.fault.message_type. |
| wolverine.fault.recursion_suppressed | Recursion guard short-circuits a Fault<Fault<T>>. |
| wolverine.fault.publish_failed | The MUST NOT throw contract caught a publish-time exception. |

| Counter | What |
| --- | --- |
| wolverine.fault.events_published | Counter<int>, incremented per fault enqueued. Recursion-suppressed faults do not increment. |

Outbound Fault<T> envelopes inherit ConversationId, CorrelationId, and TraceParent from the failing envelope, so distributed traces stay connected across the failure → fault hop.
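
As a rough illustration of how the MUST NOT throw contract and the telemetry above fit together, here is a sketch built only on System.Diagnostics. SafeFaultPublisherSketch and its delegate-based signature are hypothetical; the event and counter names are taken from the tables above, and the real FaultPublisher also logs and checks routing, which this sketch omits.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

// Hypothetical wrapper: runs a publish delegate and converts any failure into telemetry.
public sealed class SafeFaultPublisherSketch
{
    private readonly Counter<int> _published;

    public SafeFaultPublisherSketch(Meter meter)
    {
        _published = meter.CreateCounter<int>("wolverine.fault.events_published");
    }

    // Returns true when the publish delegate completed; never throws.
    public async Task<bool> PublishAsync(Func<Task> publish, Activity? activity)
    {
        try
        {
            await publish();
            _published.Add(1);
            activity?.AddEvent(new ActivityEvent("wolverine.fault.published"));
            return true;
        }
        catch (Exception ex)
        {
            // Record the failure on the span instead of re-entering the failure pipeline.
            activity?.AddEvent(new ActivityEvent("wolverine.fault.publish_failed"));
            activity?.SetStatus(ActivityStatusCode.Error, ex.GetType().Name);
            return false;
        }
    }
}
```

Swallowing the exception here is deliberate: the caller is already handling a terminal failure, so a second failure is reported through the span and never propagated.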

Test coverage

Three layers, all green on dotnet test src/Testing/CoreTests/CoreTests.csproj (1617/1617 passing):

  • Unit tests: FaultPublisher, FaultPublishingPolicy, DiscardEnvelope tracking-on-throw, recursion guard, redaction recursion through inner exceptions, encryption header strip, frozen-policy contract.
  • Integration tests: PublishFaultEventsIntegrationTests, FaultRedactionIntegrationTests, FaultEncryptionRoundTripTests (byte-level wire-bytes assertion proves "ciphertext on the wire," not just "tagged for encryption"), FaultBypassTracingTests, FaultCryptoExceptionGuardTests (drives HandlerPipeline.TryDeserializeEnvelope directly with garbled bytes above the AEAD min-length so the AEAD failure path is exercised, not the early "too short" guard).
  • Compliance tests: DurableFaultPublishingCompliance (Marten/Postgres, SQL Server, RavenDB) and TransportFaultRoutingCompliance (RabbitMQ, Kafka). Per-test isolation via Guid.NewGuid() queue/topic names.

Documentation

  • New page: docs/guide/handlers/fault-events.md — canonical reference covering API, anatomy, delivery semantics, subscribing, per-type config, redaction, encryption pairing, observability, ITrackedSession integration, pitfalls. Wired into the Vitepress sidebar under Handlers between Error Handling and Rate Limiting.
  • Cross-link: existing docs/guide/runtime/encryption.md already had a ### Fault events subsection from the encryption-pairing work; the new page links into it for byte-level mechanics.
  • Sample: src/Samples/FaultEventsDemo/ — runnable single-process console demo modelled on EncryptionDemo. In-memory transport, no Docker required. Demonstrates global opt-in, per-type override (PublishFault / DoNotPublishFault), the Encrypt() ↔ Fault<T> auto-pairing, and a fault subscriber that reads FaultHeaders.AutoPublished.

Backwards compatibility

  • 100% additive. No schema migration, no API break, no behaviour change for hosts that don't call PublishFaultEvents().
  • MoveToErrorQueue and DiscardEnvelope continuations gain the fault-publish call but are no-ops when the global mode is None.

BlackChepo added 21 commits May 5, 2026 21:26
…iguration API

Introduces the public Fault<T> and ExceptionInfo records, an internal
FaultPublishingMode enum + FaultPublishingPolicy aggregate, and the user-facing
configuration entry points: WolverineOptions.PublishFaultEvents and
MessageTypePolicies<T>.PublishFault / DoNotPublishFault.

The Fault<T> envelope-derived fields (CorrelationId, Headers) match the
nullability of Envelope's corresponding properties so the publisher can copy
them mechanically.
…ilure

Adds the internal IFaultPublisher and wires it into MoveToErrorQueue and
DiscardEnvelope via a new IWolverineRuntime extension method. The publisher
attaches the FaultHeaders.AutoPublished header to the outgoing envelope using
DeliveryOptions, so Wolverine's existing publish pipeline handles routing,
serialization, and outbox enrolment uniformly.

When a durable receiver has an active outbox transaction on the inbound
MessageContext, the fault publish enrols in that transaction and is committed
together with the DLQ insert; otherwise it is a best-effort post-DLQ-move
publish. Value-type messages and null-message envelopes silently no-op.
Publish failures are logged and counted under the new
wolverine-faults-publish-failed counter on the runtime's canonical Meter; the
DLQ-move itself is never affected.
…edSession

Adds MessageEventType.AutoFaultPublished and ITrackedSession.AutoFaultsPublished.
TrackedSession detects the FaultHeaders.AutoPublished marker on outgoing Sent
envelopes and records an additional AutoFaultPublished event for assertion in
integration tests. IMessageTracker and the runtime trackers stay unchanged —
detection is header-driven and lives entirely inside the test-tracking layer.
…shing

Adds end-to-end coverage of the auto-publish failure scenarios against the
in-memory transport: globally-enabled publish, per-type opt-in/opt-out,
discard with and without includeDiscarded, header round-trip, and the
InvokeAsync exclusion. Surfaces a missed switch case in EnvelopeHistory
that was swallowing AutoFaultPublished tracking records inside the
publisher's try/catch — fixed in the same commit as a no-op marker arm.
Two corrections surfaced by the integrated review:

- Skip publishing when the failing message itself is a Fault<>. Without this
  guard, a globally-enabled feature plus a failing Fault<X> subscriber would
  emit Fault<Fault<X>> and recurse on every subsequent failure.

- Emit ActivityEvents fault.published / fault.publish_failed on the existing
  MovedToErrorQueue / EnvelopeDiscarded span so OpenTelemetry consumers can
  correlate the auto-publish outcome with the failure span. The publisher
  signature gains an Activity? parameter that the IWolverineRuntime extension
  forwards from the call sites.
The internal FaultPublisher is now registered as a singleton in
HostBuilderExtensions and resolved lazily by WolverineRuntime from its
container. Production behaviour is unchanged — the publisher receives the
same FaultPublishingPolicy, ILogger, and Meter — but tests can now substitute
a decorated implementation through IServiceCollection.

The lazy resolution avoids a circular dependency that would otherwise
deadlock host startup: the DI factory reads IWolverineRuntime to obtain the
Meter, and that singleton is mid-construction at the point where the runtime
itself wants the publisher.
…ult<T>

Adds two abstract test bases — DurableFaultPublishingCompliance (smoke +
atomicity) and TransportFaultRoutingCompliance (smoke + two-host routing) —
plus the shared message types, handlers, and a CrashingFaultPublisherDecorator
that backend/transport test projects can derive from. The atomicity test
substitutes the decorator through IServiceCollection to force a crash after
Fault<T> publish but before CompleteAsync, then verifies the durable store
shows neither the DLQ row nor the outgoing fault row.
…t<T>

Adds Marten/Postgres, SQL Server, and RavenDB derivations of
DurableFaultPublishingCompliance. Each backend exercises two checks: smoke
publish via TrackedSession and the happy-path durable assertion that exactly
one DLQ row and one outgoing fault row are persisted.

A rollback test is intentionally deferred. Wolverine's durable local-queue
persistence enqueues immediately rather than enrolling in the receive-side
MessageContext's outbox transaction, so a crash between MoveToDeadLetterQueueAsync
and CompleteAsync would leave both rows committed in the most natural test
topology. A meaningful rollback test requires routing Fault<T> through a
destination whose persistence enrols in the active outbox transaction (TCP
loopback with UseDurableInbox, or similar) and is left as a follow-up.
Adds RabbitMQ and Kafka derivations of TransportFaultRoutingCompliance. Each
test stands up a Sender host (which fails terminal on OrderPlaced) and a
Receiver host (which subscribes to Fault<OrderPlaced> on the configured queue
or topic), then verifies the subscriber receives the fault with intact body
and that the wolverine.fault.auto header survived the broker round-trip.
…ired path,

unblock tracking completion, document delivery semantics

Resolves the four mandatory semantic-bug findings from the 2026-05-06
code review of the Fault<T> auto-publishing feature.

- DiscardEnvelope is now a per-instance continuation that carries the
  real triggering exception, paired with a DiscardEnvelopeSource
  singleton on the IContinuationSource side. The previous singleton
  synthesized EnvelopeDiscardedException and threw away the actual
  exception — so Fault<T>.Exception.Type was always
  EnvelopeDiscardedException for policy-discarded messages, exactly
  the diagnostic value PublishFaultEvents(includeDiscarded: true) was
  meant to surface. Mirrors the MoveToErrorQueueSource /
  MoveToErrorQueue split. EnvelopeDiscardedException is removed (no
  remaining references).

- Expired envelopes now surface as EnvelopeExpiredException instead of
  reusing the discard sentinel. HandlerPipeline.executeAsync constructs
  the per-instance DiscardEnvelope with the new exception, so when the
  expired path is reached, the Fault<T> subscriber sees a semantically
  correct exception type and message.

- AutoFaultPublished tracking records now self-complete in both
  RecordLocally and RecordCrossApplication. Previously the cross-app
  per-UniqueNodeId sweep never matched the sender-side AutoFaultPublished
  record, leaving TrackedSession to time out at 30s. RabbitMQ
  TransportFaultRoutingCompliance.fault_is_received_by_subscriber_with_intact_payload
  now completes in ~1s (was masked by DoNotAssertOnExceptionsDetected
  skipping the timeout assertion).

- PublishFaultEvents and MessageTypePolicies<T>.PublishFault XmlDoc now
  document the at-most-once delivery caveat: the DLQ insert and the
  fault publish are not transactionally co-committed, so a crash
  between them loses the fault event. The full enroll-in-outbox fix
  is a larger RDBMS change tracked separately.
boundary, dedicated policy property, document scope

- FaultPublisher now pre-checks routing before delegating to
  lifecycle.PublishAsync. When no routes are configured for Fault<T>,
  the publisher logs at debug level and emits a new
  wolverine.fault.no_route ActivityEvent instead of the misleading
  wolverine.fault.published. The MessageBus.PublishAsync silent no-op
  on no-route stays unchanged; the misleading telemetry only existed
  because FaultPublisher emitted FaultPublished unconditionally. The
  routing pre-check duplicates the lookup MessageBus.PublishAsync does
  internally — acceptable on the failure path.

- The IWolverineRuntime → IFaultPublisher capability is now expressed
  via a typed internal interface, IWolverineRuntimeInternal, instead
  of a runtime-class-name pattern-match in the extension method.
  WolverineRuntime exposes FaultPublisher through an explicit interface
  implementation — internal interface members can't have implicit
  implementations with internal visibility. The silent fallback for
  non-Wolverine IWolverineRuntime mocks is preserved; the intent now
  lives in the type system.

- FaultPublishingPolicy moved out of WolverineOptions.RegisteredPolicies
  (which it was never invoked through) into a dedicated internal
  WolverineOptions.FaultPublishing property. The IWolverinePolicy
  marker on the class is dropped. PerTypeOverrides is now a private
  field accessed through SetOverride/Resolve methods — no public
  mutable Dictionary surface to maintain a concurrency contract for.
  FindOrCreateFaultPublishingPolicy() is removed.

- PublishFaultEvents and MessageTypePolicies<T>.PublishFault XmlDoc now
  document the two scenarios that bypass auto-fault-publishing entirely:
  send-side dead-letter movements (sender retries exhausted) and
  envelopes whose message-type name doesn't resolve to a known handler.
…ual,

trace-context regression guards

Persistence test hygiene:
- Marten and SqlServer compliance derivations now drop their target
  schema (Postgres CASCADE; SqlServer dynamic-SQL: drop foreign keys
  first, then drop all schema tables) before host startup via a
  separate connection, instead of calling ClearAllAsync after
  StartAsync. Eliminates the recovery-agent race on stale rows from a
  previous run. Marten's redundant CompletelyRemoveAllAsync call is gone.
- SqlServerFaultPublishingTests now has [Collection("sqlserver")],
  matching the convention used in EfCoreTests for shared-fixture
  isolation under xUnit parallel execution.
- All three durable-fault compliance derivations replace LIKE '%Fault%'
  / Contains("Fault") in their snapshot queries with parameterised
  typed equality on typeof(Fault<OrderPlaced>).ToMessageTypeName().
  Future message types containing the substring "Fault" can no longer
  pollute the snapshot count.
- All three durable-fault compliance derivations explicitly set
  Durability.KeepAfterMessageHandling = 5.Minutes() — defensive
  against future config changes that could shorten the value below
  the snapshot wall-clock and race with DeleteExpiredEnvelopesOperation.
  DurableFaultPublishingCompliance.BuildCleanHostAsync XmlDoc now
  documents the contract.
- RavenDbFaultPublishingTests comment corrected: "all three [Fact]s"
  → "both [Fact]s".
correlation headers, observable recursion guard

Five small telemetry-shape changes to the auto-Fault<T> publishing path:

- WolverineTracing.FaultPublishFailed now emits "wolverine.fault.publish.failed"
  (dot-separated). Other Wolverine ActivityEvent constants use dot-separated
  paths consistently — wolverine.envelope.discarded,
  wolverine.circuit.breaker.triggered, wolverine.fault.published,
  wolverine.fault.no_route. The C# constant identifier is unchanged; only
  the wire string moves to the established convention. Operator-facing
  rename — dashboards or alert rules keying on the literal old string
  need updating.

- FaultPublisher's publish-failed counter is now Counter<int> instead of
  Counter<long>, matching the rest of Wolverine's counters
  (MessagesSent, MessagesSucceeded, MessagesFailed, ...). OTel exporters
  accept both; this is a consistency tightening, not a behavior change.

- On publish failure (MessageBus.PublishAsync throws), the captured
  Activity now records ActivityStatusCode.Error with the exception type
  name as description, mirroring HandlerPipeline.cs. Operators filtering
  traces by status.code = ERROR catch fault-publish failures alongside
  other runtime faults. The no-route and recursion-suppression paths
  intentionally do NOT set status — those are configuration conditions,
  not runtime faults; their dedicated ActivityEvents carry the diagnostic.

- The Fault<Fault<T>> recursion guard, previously a silent return, now
  emits wolverine.fault.recursion_suppressed (new ActivityEvent constant)
  and a debug log. Almost always indicates a misconfigured recursive
  Fault<T> handler. No counter — recursion suppression is a defensive
  config-error path, not a runtime metric to chart.

- Auto-published Fault<T> envelopes now carry two new headers so trace
  consumers and header-filter routes can correlate without parsing the
  message body:
    wolverine.fault.original_id    — original envelope's Id (Guid string)
    wolverine.fault.original_type  — original message's wire-format
                                      type name (ToMessageTypeName())
  Naming matches the existing wolverine.fault.auto header in
  FaultHeaders. Type uses ToMessageTypeName() for consistency with
  envelope.MessageType elsewhere in Wolverine telemetry.
publish to eliminate session-completion race

DiscardEnvelope.ExecuteAsync previously fired
runtime.MessageTracking.DiscardedEnvelope() before
PublishFaultIfEnabledAsync. The Discarded MessageEventType triggers a
sweep that marks the original envelope's tracking records IsComplete.
With _executionComplete already set (the user lambda has returned),
TrackedSession.IsCompleted() observes the original envelope's history
fully complete and signals session completion before the auto-published
Fault<T> envelope's history exists. Consumers awaiting
PublishMessageAndWaitAsync resume on a thread-pool continuation that
races against the still-pending fault publish — so collector.Order /
ITrackedSession.AutoFaultsPublished can be observed empty even though
the fault is about to be (or just was) published.

The reorder mirrors MoveToErrorQueue.ExecuteAsync: publish runs first,
CompleteAsync commits inbound state, then the terminal tracking event
fires. After the publish the Fault<T> envelope's EnvelopeHistory exists
with incomplete records; the subsequent Discarded sweep touches only
the original envelope's history (the per-envelope-Id dictionary keeps
the two histories separate). TrackedSession.IsCompleted() correctly
waits for the fault handler before completing the session.

Telemetry impact: the wolverine.envelope.discarded ActivityEvent still
fires at method entry — unchanged for trace consumers. The internal
Discarded MessageEventType / counter now fires after the fault publish
instead of before; total event and metric counts are unchanged.
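
The ordering described above can be sketched with plain delegates. DiscardOrderingSketch and its three parameters are hypothetical stand-ins for the fault publish, IEnvelopeLifecycle.CompleteAsync, and the tracking call; the finally block reflects the later change in this PR that keeps the tracking event firing when CompleteAsync throws.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical sketch of the publish -> complete -> track ordering.
public static class DiscardOrderingSketch
{
    public static async Task ExecuteAsync(
        Func<Task> publishFault,
        Func<Task> complete,
        Action recordDiscarded)
    {
        try
        {
            // 1. Publish first, so the fault envelope's history exists
            //    before the original envelope is marked complete.
            await publishFault();

            // 2. Commit the inbound state.
            await complete();
        }
        finally
        {
            // 3. The terminal tracking event fires even if step 2 throws.
            recordDiscarded();
        }
    }
}
```

With this shape, an exception from the completion step still propagates to the caller, but a tracked session is not left waiting for a Discarded event that never arrives.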
…, drop wolverine.encryption.* from Fault<T>.Headers

Two confidentiality leaks on the auto-published Fault<T> path. Both
ship together because they are reachable only with PublishFaultEvents()
layered on top of per-type encryption, and a partial fix would leave
one of the leaks open.

Pairing: when Policies.ForMessagesOfType<T>().Encrypt() runs, also
register EncryptMessageTypeRule<Fault<T>> against the same encrypting
serializer and add typeof(Fault<T>) to RequiredEncryptedTypes. The
existing rule's runtime-type gate (CanBeCastTo<T>) is invariant in T,
so without the paired rule the auto-fault for an encrypted T was
serialized by the endpoint's default JSON serializer and reached the
broker as plaintext (with the original payload and the captured
exception in the body). Receive-side guard had a symmetric gap.
Skipped when T is a value type — Fault<T> requires T : class and the
publisher already silently no-ops on value-type messages. Reflective
construction keeps Encrypt<T>() callable for any T.

Headers: FaultPublisher.BuildFactory copied env.Headers verbatim into
the Fault<T> body, including wolverine.encryption.key-id and
wolverine.encryption.inner-content-type. Those are routing/AEAD
metadata for the original wire envelope, not application headers, and
they would mislead anyone reading a Fault and leak the active key-id.
Filter via a shared EncryptionHeaders.HeaderPrefix constant so the
rule stays consistent if more encryption headers are added later.
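
A prefix-based filter of this kind might look like the sketch below. FaultHeaderFilterSketch is hypothetical, and the prefix value "wolverine.encryption." is an assumption inferred from the two header names mentioned above; the actual value of EncryptionHeaders.HeaderPrefix is not shown in this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that drops encryption metadata before headers are copied into a fault.
public static class FaultHeaderFilterSketch
{
    // Assumed value; inferred from wolverine.encryption.key-id and
    // wolverine.encryption.inner-content-type.
    public const string EncryptionHeaderPrefix = "wolverine.encryption.";

    public static IReadOnlyDictionary<string, string?> StripEncryptionHeaders(
        IReadOnlyDictionary<string, string?> headers)
    {
        return headers
            .Where(kv => !kv.Key.StartsWith(EncryptionHeaderPrefix, StringComparison.Ordinal))
            .ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

Filtering on a shared prefix rather than a fixed list means any encryption header added later is excluded without touching the filter.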
… document RequireEncryption() listener scope

Two related concerns on the auto-published Fault<T> path: a captured
exception message can carry payload-derived plaintext (e.g. a handler
that throws "Card {card} declined" puts the card number into
Fault<T>.Exception.Message), and operators marking a listener with
RequireEncryption() reasonably assumed the marker also constrained
outbound republishes — it does not.

Redaction: PublishFaultEvents and per-type PublishFault<T> gain two
defaulted bool args, includeExceptionMessage and includeStackTrace.
Both default to true (today's behavior unchanged). Setting either to
false applies recursively through inner exceptions and
AggregateException.InnerExceptions; the Type field is always preserved.
Redacted Message becomes string.Empty, redacted StackTrace becomes null.
Per-type calls are fully specified — they store the values you pass and
do not inherit subsequent changes to the global defaults. Implementation
threads a small FaultPublishingDecision struct from FaultPublishingPolicy
through FaultPublisher into a new ExceptionInfo.From overload.
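
The recursive redaction rule can be illustrated with a small stand-alone sketch. RedactedExceptionInfo and ExceptionRedactionSketch are hypothetical and are not the PR's ExceptionInfo.From overload; the sketch only mirrors the stated rules (Type always kept, redacted Message becomes string.Empty, redacted StackTrace becomes null, applied through inner exceptions and AggregateException.InnerExceptions).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the captured exception shape.
public record RedactedExceptionInfo(
    string Type,
    string Message,
    string? StackTrace,
    IReadOnlyList<RedactedExceptionInfo> InnerExceptions);

public static class ExceptionRedactionSketch
{
    public static RedactedExceptionInfo From(
        Exception ex, bool includeExceptionMessage, bool includeStackTrace)
    {
        // Collect children: all inner exceptions of an aggregate, else the single inner one.
        var inners = new List<Exception>();
        if (ex is AggregateException aggregate)
        {
            inners.AddRange(aggregate.InnerExceptions);
        }
        else if (ex.InnerException is not null)
        {
            inners.Add(ex.InnerException);
        }

        return new RedactedExceptionInfo(
            ex.GetType().FullName ?? ex.GetType().Name,       // Type is always preserved
            includeExceptionMessage ? ex.Message : string.Empty,
            includeStackTrace ? ex.StackTrace : null,
            inners
                .Select(inner => From(inner, includeExceptionMessage, includeStackTrace))
                .ToList());
    }
}
```

Applying the same flags at every level matters because a payload-derived message is just as likely to sit in an inner exception as in the outermost one.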

Documentation: RequireEncryption() XmlDoc on both the listener interface
and implementation now spells out that scope is the inbound listener
only; outbound Fault<T> routing uses the global routing graph and the
outbound encryption knobs (per-type Encrypt() or per-endpoint
.Encrypted(), both of which auto-pair with Fault<T>). encryption.md
gains a "Fault events" subsection that brings together the auto-pairing
behavior, the new redaction knobs, and the listener-scope clarification.
…lope and pre-handler failure scope, with crypto-failure regression guards

Cleanup pass closing the remaining doc-only items on the auto-published
Fault<T> path: capture in-source the rationale for the post-reorder
publish-fault-then-CompleteAsync sequence in DiscardEnvelope.ExecuteAsync,
and surface in the PublishFaultEvents XmlDoc the three behaviors that
previously had to be reverse-engineered — that envelopes arriving expired
bypass fault publication (no T to wrap before deserialization), that
sensitive T should be paired with MessageTypePolicies<T>.Encrypt() so
Fault<T> travels encrypted, and that pre-handler crypto failures
(MessageDecryptionException, EncryptionKeyNotFoundException,
EncryptionPolicyViolationException) intentionally produce no fault event
because the message instance is unavailable.
…icy freeze, no-route tag, AutoFaultPublished ToString, ExceptionInfo depth cap, counter rename, null-route guard

Round of small polish on the auto-published Fault<T> path: telemetry
gaps, defensive hardening, and naming consistency. No behavior changes
to any existing path.

Telemetry:
  - Send-side and unknown-message-type DLQ moves now emit a one-line debug
    log + wolverine.fault.bypassed.{send_side,unknown_type} activity event,
    gated on PublishFaultEvents being globally enabled. Operators tracing a
    missing Fault<T> for envelope X can now correlate to the actual bypass
    path instead of dead trace data.
  - wolverine.fault.no_route activity event now carries a
    messaging.message_type tag.
  - EnvelopeRecord.ToString() formats AutoFaultPublished with a dedicated
    "Auto-published Fault for ..." line instead of falling through to the
    generic format.

Defensive hardening:
  - FaultPublishingPolicy.Freeze() runs at the end of WolverineRuntime
    startup; SetOverride after that point throws InvalidOperationException
    pointing the caller back to the bootstrap callback.
  - ExceptionInfo.From caps recursive inner-exception walks at depth 32 with
    a synthetic "__truncated__" marker entry; defends against pathological
    AggregateException graphs without throwing.
  - FaultPublisher's no-route check is null-defensive; a non-compliant
    IMessageRouter that returns null no longer NREs.

Naming:
  - MetricsConstants.FaultsPublishFailed renamed to FaultPublishFailures
    (metric value wolverine-fault-publish-failures). Singular fault, plural
    failures — describes the meter content, mirrors MessagesFailed's pattern.
…ejects unencrypted Fault<T> envelopes

Regression guard: a plaintext Fault<T> envelope arriving at a listener
marked with .RequireEncryption() goes through the same receive-side guard
as any other type and is routed to MoveToErrorQueue with
EncryptionPolicyViolationException. The behavior is already correct
(the guard at HandlerPipeline is type-agnostic) — this test prevents a
future refactor from accidentally adding Fault-specific handling that
would let unencrypted faults slip through.
…m Pass 3

- Emit the DiscardedEnvelope tracking event from a finally block in
  DiscardEnvelope.ExecuteAsync so it still fires when
  IEnvelopeLifecycle.CompleteAsync throws — TrackedSession no longer
  hangs on transient broker-commit failures.
- Snapshot FaultPublishingPolicy per-type overrides into a FrozenDictionary
  at Freeze() time, written/read via Volatile.* so cross-thread visibility
  of pre-Freeze override writes is guaranteed by the memory model rather
  than implicit host-startup synchronization. Pre-Freeze Resolve still
  reads from the mutable builder dictionary, preserving existing test
  surfaces.
- New guide page docs/guide/handlers/fault-events.md covering API,
  Fault<T> anatomy, delivery semantics and bypass paths, subscribing,
  per-type overrides, redaction, encryption pairing (cross-link to
  encryption.md), observability, ITrackedSession integration, and
  pitfalls. Wired into the Handlers section of the Vitepress sidebar
  between Error Handling and Rate Limiting.
- New src/Samples/FaultEventsDemo standalone console sample
  (Program.cs + DemoHandlers.cs + .csproj) modelled on EncryptionDemo.
  Single-process, in-memory transport, no Docker. Demonstrates global
  opt-in, per-type override (PublishFault / DoNotPublishFault), the
  Encrypt() ↔ Fault<T> auto-pairing, and a fault subscriber that reads
  FaultHeaders.AutoPublished.
@jeremydmiller jeremydmiller modified the milestone: 6.0 May 7, 2026
@jeremydmiller
Member

@BlackChepo Hey, I'm going to bump this back. Just out of time to make it for 5.38.

@BlackChepo
Contributor Author

Sure, no problem, there's no rush.

@jeremydmiller jeremydmiller merged commit 1e3d37a into JasperFx:main May 8, 2026
21 checks passed
jeremydmiller added a commit that referenced this pull request May 8, 2026
The Fault Events feature added in #2695 shipped as a standalone
docs/guide/handlers/fault-events.md page with its own sidebar entry
under "Message Handlers". On reflection, fault publishing is the tail
end of the same retry / requeue / DLQ pipeline the Error Handling page
already documents — splitting it across two pages forces readers
configuring DLQ semantics to context-switch into a sibling page for the
adjacent fault-publish behaviour.

Move the entire fault-events page in as a top-level `## Fault Events`
section at the bottom of `docs/guide/handlers/error-handling.md`,
demoting every interior heading by one level so the section structure
is preserved (Quickstart / Anatomy / Delivery semantics / Subscribing /
Per-type / Redaction / Encryption pairing / Observability / Testing /
Pitfalls / See also). Sub-section titles are scoped with "Fault" where
needed ("Per-type fault configuration", "Fault redaction", etc.) so
the in-page TOC reads cleanly when sitting beside the existing error-
handling sub-sections.

The "See also → Error Handling" line is dropped from the merged
content (we're on that page now); cross-links into the encryption page
anchor (`/guide/runtime/encryption#fault-events`) are preserved.

Sidebar entry for the standalone "Fault Events" link is removed from
docs/.vitepress/config.mts; the file
docs/guide/handlers/fault-events.md is deleted. No other docs page
linked into the standalone page directly, so nothing breaks.

Verified locally: `npm run docs:build` (vitepress build) completes
green in 17.55s, the merged Error Handling page contains the fault
content (5 grep hits for "Fault Events" / "fault.published" /
"FaultHeaders" in the rendered HTML), and dist has no orphan
fault-events.html.

Development

Successfully merging this pull request may close these issues.

Feature request: auto-publish Fault<T> events on terminal handler failure

2 participants