Message Delivery Assurances

Martin Thompson edited this page Mar 31, 2022 · 12 revisions

Aeron is described as a reliable messaging transport. You may ask, "What assurances does Aeron provide about the delivery of a message from the publisher to the subscriber?" Many messaging systems claim to guarantee that they will deliver a message once and only once, which is misleading at best and a downright lie at worst. First, it is important to distinguish between guaranteed delivery and guaranteed processing of a message. Consider a subscriber that crashes just after it consumes a message from the messaging system: the message was delivered, but there is no assurance that it was processed. To be clear, the only way to have guaranteed delivery and processing is to have application awareness and collaboration. This generally means the subscriber can process a message idempotently and knows up to which point it has processed. In conjunction, it requires an application-level protocol to reliably acknowledge processing back to the publisher.
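The idempotent processing described above can be sketched as follows. This is a minimal illustration and not part of the Aeron API: the class `IdempotentConsumer`, its methods, and the use of a stream position as the deduplication key are all assumptions made for the example. A real application would also need to persist the applied state and the position together, and acknowledge the position back to the publisher.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical application-level consumer that tolerates redelivery after a crash. */
final class IdempotentConsumer
{
    // Highest stream position whose message has been fully processed.
    private long lastProcessedPosition = -1;
    private final List<String> applied = new ArrayList<>();

    /**
     * Apply a message only if it lies beyond the last processed position.
     *
     * @return true if the message was applied, false if it was a duplicate.
     */
    boolean onMessage(final long position, final String payload)
    {
        if (position <= lastProcessedPosition)
        {
            return false; // already processed, e.g. replayed after a restart
        }

        applied.add(payload);
        lastProcessedPosition = position;
        return true;
    }

    /** The position the application could acknowledge back to the publisher. */
    long lastProcessedPosition()
    {
        return lastProcessedPosition;
    }

    List<String> applied()
    {
        return applied;
    }
}
```

Redelivering the message at a position that has already been processed leaves the applied state unchanged, which is what makes a replay after a crash safe.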

For the purposes of this documentation, we refer to the reliable transport of messages between Aeron media drivers; the hand-off from driver to subscriber is out of scope. The subscriber is responsible for controlled consumption, taking responsibility for the state of the messages in the archive or cluster.

The following modes are mutually exclusive on the same stream (channel and stream id pair) for the same media driver. If a later subscription conflicts with a pre-existing active subscription, then an error will be raised when the new subscription is added.

It is possible to get a report on observed loss via the System Counters and Loss Reporting tool.

Reliable Transport Subscription

Aeron's default mode is to reliably transport messages between media drivers. The assurances are similar to those of TCP in that messages will be transported in order between the sender and receiver, with loss detected and then NAK'ed for recovery. Also like TCP, Aeron transport sessions are valid for active temporal connections between publishers and subscribers. Reliable sessions exist only while both ends are active, i.e. Aeron does not provide store and forward semantics when the subscriber is not connected.

When loss is detected on a reliable transport, Aeron will NAK for the gap in the message stream to request a re-transmission. This NAK'ing will continue until the gap is re-transmitted, provided the sender is considered active.
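To make the gap detection concrete, here is a deliberately simplified model of a receiver that tracks which byte ranges have arrived and reports the first gap as a NAK. It is a sketch under stated assumptions, not Aeron's implementation: the class `LossDetectorModel`, the `Nak` type, and the rule that frames are re-transmitted with their original boundaries are all hypothetical, and timers, NAK back-off, and sender liveness are left out.

```java
import java.util.TreeMap;

/** Hypothetical receiver-side model of loss detection; not Aeron's actual code. */
final class LossDetectorModel
{
    /** A request to re-transmit length bytes starting at offset. */
    static final class Nak
    {
        final long offset;
        final long length;

        Nak(final long offset, final long length)
        {
            this.offset = offset;
            this.length = length;
        }
    }

    // Frames received beyond the contiguous position, keyed by offset with length as value.
    private final TreeMap<Long, Long> pending = new TreeMap<>();
    // Everything below this position has been received in order.
    private long contiguousPosition = 0;

    void onFrame(final long offset, final long length)
    {
        if (offset < contiguousPosition)
        {
            return; // duplicate of data already received
        }

        pending.put(offset, length);

        // Advance over every frame that is now contiguous.
        Long next;
        while ((next = pending.remove(contiguousPosition)) != null)
        {
            contiguousPosition += next;
        }
    }

    /** Returns a NAK covering the first gap, or null when no loss is outstanding. */
    Nak scanForGap()
    {
        if (pending.isEmpty())
        {
            return null;
        }

        return new Nak(contiguousPosition, pending.firstKey() - contiguousPosition);
    }

    long contiguousPosition()
    {
        return contiguousPosition;
    }
}
```

Calling `scanForGap()` repeatedly yields the same NAK until the missing range arrives, mirroring the way NAK'ing continues until the gap is re-transmitted.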

A stream can be explicitly requested as reliable when subscribing by adding a URI parameter of reliable=true to the channel URI. This is not required for a reliable subscription as this is the default.

    // e.g.
    String channelUri = "aeron:udp?endpoint=localhost:54325|reliable=true";

Recorded/Durable Streams

By using the Aeron Archive, streams of messages can be recorded to stable storage and replayed later. The Archive can be instructed to record a Publication, and the status of a recording can be monitored by subscribing to the recording events stream, locally or remotely, or locally via the more efficient RecordingPos counter.

Recordings can be replayed by requesting a replay from the Archive. This can be for a stream that has stopped recording or it can be a replay of a live stream being recorded where the replay will track the latest recorded position.

Recordings can be written to storage using regular IO writes, or the writes can optionally be synced for just data, or for data and metadata, as required.

A Subscriber can use a bounded controlled poll on an Image to consume only up to the position that the Archive has reported as recorded.
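The idea behind bounded consumption can be illustrated with a small stand-alone model. This is a sketch only and does not use the Aeron client: `BoundedPollModel`, `offer`, and `boundedPoll` are hypothetical names, and the recorded position is passed in directly rather than read from a counter.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Hypothetical model of consuming a stream only up to a recorded position. */
final class BoundedPollModel
{
    private static final class Fragment
    {
        final long endPosition; // stream position immediately after this fragment
        final String payload;

        Fragment(final long endPosition, final String payload)
        {
            this.endPosition = endPosition;
            this.payload = payload;
        }
    }

    private final Deque<Fragment> pending = new ArrayDeque<>();
    private long publishedPosition = 0;
    private long consumedPosition = 0;

    /** Simulate a fragment of the given length arriving on the stream. */
    void offer(final String payload, final int length)
    {
        publishedPosition += length;
        pending.add(new Fragment(publishedPosition, payload));
    }

    /**
     * Deliver fragments that end at or before limitPosition, up to fragmentLimit.
     *
     * @return the number of fragments delivered.
     */
    int boundedPoll(final Consumer<String> handler, final long limitPosition, final int fragmentLimit)
    {
        int count = 0;
        while (count < fragmentLimit && !pending.isEmpty() && pending.peek().endPosition <= limitPosition)
        {
            final Fragment fragment = pending.poll();
            handler.accept(fragment.payload);
            consumedPosition = fragment.endPosition;
            count++;
        }

        return count;
    }

    long consumedPosition()
    {
        return consumedPosition;
    }
}
```

Passing the latest recorded position as `limitPosition` means the subscriber never acts on a message that has not yet reached storage; fragments beyond the limit stay pending until the recorded position advances.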

Unreliable Transport Subscription

For some subscribers, especially on multicast, it is preferable to accept some loss and have the stream gap filled rather than take the latency hit of recovering the loss. Aeron supports a mode of operation in which a subscriber indicates that a stream may be unreliable. When loss is detected, the media driver fills the gap with padding, which the Subscription skips over automatically, without waiting for the NAK recovery protocol to take effect.
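The gap-fill behaviour can be modelled in a few lines. The sketch below is illustrative only and is not how the media driver is implemented: `GapFillModel`, its `Frame` type, and the decision to pad as soon as a later frame arrives are assumptions made to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of an unreliable stream where gaps are padded instead of recovered. */
final class GapFillModel
{
    private static final class Frame
    {
        final long offset;
        final long length;
        final String payload; // null marks a padding frame

        Frame(final long offset, final long length, final String payload)
        {
            this.offset = offset;
            this.length = length;
            this.payload = payload;
        }

        boolean isPadding()
        {
            return payload == null;
        }
    }

    private final List<Frame> log = new ArrayList<>();
    private long tail = 0;

    /** Driver side: append an arriving frame, padding over any gap in front of it. */
    void onFrame(final long offset, final long length, final String payload)
    {
        if (offset < tail)
        {
            return; // late or duplicate frame for a region that is already filled
        }

        if (offset > tail)
        {
            log.add(new Frame(tail, offset - tail, null)); // fill the gap rather than NAK
        }

        log.add(new Frame(offset, length, payload));
        tail = offset + length;
    }

    /** Subscription side: deliver data frames and silently skip padding. */
    List<String> poll()
    {
        final List<String> delivered = new ArrayList<>();
        for (final Frame frame : log)
        {
            if (!frame.isPadding())
            {
                delivered.add(frame.payload);
            }
        }
        log.clear();

        return delivered;
    }
}
```

In this model a lost frame is never seen by the application, and a copy that turns up late is ignored because its region has already been padded. That is the trade-off of the mode: lower latency in exchange for accepting loss.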

A stream can be requested as unreliable when subscribing by adding a URI parameter of reliable=false to the channel URI.

    // e.g.
    String channelUri = "aeron:udp?endpoint=localhost:54325|reliable=false";

IPC (Inter Process Communication) with shared memory (SHM)

For IPC on the same machine, delivery is always reliable. This also applies to spy subscriptions.