Skip to content

Latest commit

 

History

History
175 lines (108 loc) · 8.94 KB

77.md

File metadata and controls

175 lines (108 loc) · 8.94 KB

NIP-77

Negentropy Syncing

draft optional

This document describes a protocol extension for syncing events. It works for both client-relay and relay-relay scenarios. If both sides of the sync have events in common, then this protocol will use less bandwidth than transferring the full set of events (or even just their IDs).

It is a Nostr-friendly wrapper around the Negentropy protocol, which uses a technique called Range-Based Set Reconciliation.

Since Negentropy is a binary protocol, this wrapper hex-encodes its messages. The specification for Negentropy Protocol V1 is attached as an appendix to this NIP below.

High-Level Protocol Description

We're going to call the two sides engaged in the sync the client and the relay (even though the initiator could be another relay instead of a client).

  • (1) Client (initiator) chooses a filter, and retrieves the set of events that it has locally that match this filter (or uses a cache), and constructs an initial message.
  • (2) Client sends a NEG-OPEN message to the relay, which includes the filter and the initial message.
  • (3) Relay selects the set of events that it has locally that match the filter (or uses a cache).
  • (4) Relay constructs a response and returns it to the client in a NEG-MSG message.
  • (5) Client parses the message to learn about IDs it has (and relay needs) and IDs it needs (and relay has).
    • If client wishes to continue, then it constructs a new message and sends it to the relay in a NEG-MSG message. Goto step 4.
    • If client wishes to stop, then it sends a NEG-CLOSE message or disconnects the websocket.

The above protocol only results in the client learning about IDs it has/needs, and does not actually transfer events. Given these IDs, the client can upload events it has with EVENT, and/or download events it needs with REQ. This can be performed over the same websocket connection in parallel with subsequent NEG-MSG messages. If a client is only interested in determining the number of unique events (ie, reaction counts), it may choose to not download/upload at all.

Nostr Messages

Initial message (client to relay):

[
    "NEG-OPEN",
    <subscription ID string>,
    <filter>,
    <initialMessage, hex-encoded>
]
  • The subscription ID is used by each side to identify which query a message refers to. It only needs to be long enough to distinguish it from any other concurrent subscriptions on this websocket connection (an integer that increments once per NEG-OPEN is fine). Subscription IDs are in a separate namespace from REQ subscription IDs. If a NEG-OPEN is issued for a currently open subscription ID, the existing subscription is first closed.
  • The filter is as described in NIP-01.
  • initialMessage is the initial Negentropy binary message, hex-encoded. See appendix.

Error message (relay to client):

If a request cannot be serviced by the relay, an error is returned to the client:

[
    "NEG-ERR",
    <subscription ID string>,
    <reason code string>
]

Error reasons are the same format as in NIP-01. They should begin with a machine-readable single-word prefix, followed by a : and then a human-readable message with more information.

The current suggested error reasons are

  • blocked
    • Relays can optionally reject queries that would require them to process too many records, or records that are too old
    • The maximum number of records that can be processed can optionally be returned as the 4th element in the response
    • Example: blocked: this query is too big
  • closed
    • Because the NEG-OPEN queries may be stateful, relays may choose to time-out inactive queries to recover memory resources
    • Example: closed: you took too long to respond!

After a NEG-ERR is issued, the subscription is considered to be closed.

Subsequent messages (bidirectional):

Relay and client alternate sending each other NEG-MSGs:

[
    "NEG-MSG",
    <subscription ID string>,
    <message, hex-encoded>
]
  • message is a Negentropy binary message, hex-encoded. Both message directions use the same format. See appendix.

Close message (client to relay):

When finished, the client should tell the relay it can release its resources with a NEG-CLOSE:

[
    "NEG-CLOSE",
    <subscription ID string>
]

Appendix: Negentropy Protocol V1

Preparation

There are two protocol participants: Client and server. The client creates an initial message and transmits it to the server, which replies with its own message in response. The client continues querying the server until it is satisifed, and then terminates the protocol. Messages in either direction have the same format.

Each participant has a collection of records. A records consists of a 64-bit numeric timestamp and a 256-bit ID. Each participant starts by sorting their items according to timestamp, ascending. If two timestamps are equal then items are sorted lexically by ID, ascending by first differing byte. Items may not use the max uint64 value (2**64 - 1) as a timestamp since this is reserved as a special "infinity" value.

The goal of the protocol is for the client to learn the set of IDs that it has and the server does not, and the set of items that the server has and it does not.

Varint

Varints (variable-sized unsigned integers) are represented as base-128 digits, most significant digit first, with as few digits as possible. Bit eight (the high bit) is set on each byte except the last.

Varint := <Digit+128>* <Digit>

Id

IDs are represented as byte-strings of length 32:

Id := Byte{32}

Message

A reconciliation message is a protocol version byte followed by an ordered list of ranges:

Message := <protocolVersion (Byte)> <Range>*

The current protocol version is 1, represented by the byte 0x61. Protocol version 2 will be 0x62, and so forth. If a server receives a message with a protocol version that it cannot handle, it should reply with a single byte containing the highest protocol version it supports, allowing the client to downgrade and retry its message.

Each Range corresponds to a contiguous section of the timestamp/ID space. The first Range starts at timestamp 0 and an ID of 0 bytes. Ranges are always adjacent (no gaps). If the last Range doesn't end at the special infinity value, an implicit Skip to infinity Range is appended. This means that the list of Ranges always covers the full timestamp/ID space.

Range

A Range consists of an upper bound, a mode, and a payload:

Range := <upperBound (Bound)> <mode (Varint)> <payload (Skip | Fingerprint | IdList)>

The contents of the payload is determined by mode:

  • If mode = 0, then payload is Skip, meaning the sender does not wish to process this Range further. This payload is empty:

    Skip :=
    
  • If mode = 1, then payload is a Fingerprint, which is a digest of all the IDs the sender has within the Range:

    Fingerprint := Byte{16}
    
  • If mode = 2, the payload is IdList, a variable-length list of all IDs the sender has within the Range:

    IdList := <length (Varint)> <ids (Id)>*
    

Bound

Each Range is specified by an inclusive lower bound and an exclusive upper bound. As defined above, each Range only includes an upper bound: the lower bound of a Range is the upper bound of the previous Range, or 0 timestamp/0 ID for the first Range.

A Bound consists of an encoded timestamp and a variable-length disambiguating prefix of an ID (in case multiple items have the same timestamp):

Bound := <encodedTimestamp (Varint)> <length (Varint)> <idPrefix (Byte)>*
  • The timestamp is encoded specially. The infinity timestamp is encoded as 0. All other values are encoded as 1 + offset, where offset is the difference between this timestamp and the previously encoded timestamp. The initial offset starts at 0 and resets at the beginning of each message.

    Offsets are always non-negative since the upper bound's timestamp is greater than or equal to the lower bound's timestamp, ranges in a message are always encoded in ascending order, and ranges never overlap.

  • The size of idPrefix is encoded in length, and can be between 0 and 32 bytes, inclusive. This allows implementations to use the shortest possible prefix to separate the first record of this Range from the last record of the previous Range. If these records' timestamps differ, then the length should be 0, otherwise it should be the byte-length of their common ID-prefix plus 1.

    If the idPrefix length is less than 32 then the omitted trailing bytes are implicitly 0 bytes.

Fingerprint Algorithm

The fingerprint of a Range is computed with the following algorithm:

  • Compute the addition mod 2256 of the element IDs (interpreted as 32-byte little-endian unsigned integers)
  • Concatenate with the number of elements in the Range, encoded as a Varint
  • Hash with SHA-256
  • Take the first 16 bytes