From 6c12fd79f42cb882a9e6586a38f41f9d7e66cdce Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?K=C3=A9vin=20Dunglas?= Date: Mon, 18 May 2020 10:04:20 +0200 Subject: [PATCH] Major spec changes (#288) * Major spec changes * Clarification --- spec/mercure.md | 270 ++++++++++++++++++++++++++++-------------------- 1 file changed, 159 insertions(+), 111 deletions(-) diff --git a/spec/mercure.md b/spec/mercure.md index 8f1d7402..1857ee59 100644 --- a/spec/mercure.md +++ b/spec/mercure.md @@ -38,21 +38,22 @@ The keywords **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **S NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL**, when they appear in this document, are to be interpreted as described in [@!RFC2119]. - * Topic: The unit to which one can subscribe to changes. The topic **MUST** be identified by an - IRI [@!RFC3987] or by a string. Using an HTTPS [@!RFC7230] or HTTP [@!RFC7230] URI [@!RFC3986] - is **RECOMMENDED**. + * Topic: The unit to which one can subscribe to changes. The topic **SHOULD** be identified + by an IRI [@!RFC3987]. Using an HTTPS [@!RFC7230] or HTTP [@!RFC7230] URI [@!RFC3986] is + **RECOMMENDED**. + + * Update: The message containing the updated version of the topic. An update can be marked as + private, consequently, it must be dispatched only to subscribers allowed to receive it. + + * Topic selector: An expression matching one or several topics. * Publisher: An owner of a topic. Notifies the hub when the topic feed has been updated. As in almost all pubsub systems, the publisher is unaware of the subscribers, if any. Other pubsub systems might call the publisher the "source". Typically a website or a web API, but can also be a web browser. - * Subscriber: A client application that subscribes to real-time updates of topics. Typically a - Progressive Web App or a Mobile App, but can also be a server. - - * Target: A subscriber, or a group of subscribers. A publisher is able to securely dispatch - updates to specific targets. The target **MUST** be identified by an IRI [@!RFC3987] or by a - string. Using an HTTPS [@!RFC7230] or HTTP [@!RFC7230] URI is **RECOMMENDED**. + * Subscriber: A client application that subscribes to real-time updates of topics using topic + selectors. Typically a web or a mobile application, but can also be a server. * Hub: A server that handles subscription requests and distributes the content to subscribers when the corresponding topics have been updated. Any hub **MAY** implement its own policies on who @@ -84,20 +85,21 @@ header). The target URL of these links **MUST** be a hub implementing the Mercur The publisher **MAY** provide the following target attributes in the Link Headers: - * `last-event-id`: the globally unique identifier of the last event dispatched by the publisher - at the time of the generation of this resource. If provided, it **MUST** be passed to the - hub through a query parameter called `Last-Event-ID` and will be used to ensure that possible - updates having been made during between the resource generation time and the connection to the - hub are not lost. See (#reconciliation). If this attribute is provided, the publisher **MUST** - always set the `id` parameter when sending updates to the hub. + * `last-event-id`: the identifier of the last event dispatched by the publisher at the time of + the generation of this resource. If provided, it **MUST** be passed to the hub through a query + parameter called `Last-Event-ID` and will be used to ensure that possible updates having been + made during between the resource generation time and the connection to the hub are not lost. See + (#reconciliation). * `content-type`: the content type of the updates that will pushed by the hub. If omitted, the subscriber **MUST** assume that the content type will be the same as that of the original resource. Setting the `content-type` attribute is especially useful to hint that partial updates will be pushed, using formats such as JSON Patch [@RFC6902] or JSON Merge Patch [@RFC7386]. - * `key-set=`: the key(s) to decrypt updates encoded in the JWKS (JSON Web Key Set) format. - See (#encryption). + * `key-set`: the URL of the key set to use to decrypt updates, encoded in the JWK set format + (JSON Web Key Set) [@!RFC7517]. See (#encryption). As this key set will contain a secret + key, the publisher must ensure that only the subscriber can access to this URL. To do so, the + authorization mechanism (see (#authorization)) can be reused. All these attributes are optional. @@ -182,6 +184,29 @@ Link: ; rel="mercure" {"@id": "/books/foo", "foo": "bar", "@context": {"@language": "fr-FR"}} ~~~ +# Topic Selector + +A topic selector is an expression intended to be matched by one or several topics. A topic selector +can also be used to match other topic selectors for authorization purposes. See (#authorization). + +A topic selector can be a any string including URI Templates [@!RFC6570] and the reserved string `*` +that matches all topics. It is **RECOMMENDED** to use IRIs, URI Templates or the reserved string `*` +as topic selectors. + +Note: a URL is a valid URI template. + +To determine if a string matches a selector, the following steps must be followed: + +1. If the topic selector is `*` then the string matches the selector. + +2. If the topic selector and the string are exactly the same, the string matches the selector. This + characteristic allows to compare a URI Template with another one. + +3. If the topic selector is a valid URI Template, and that the string matches this URI Template, + the string matches the selector. + +4. Otherwise the string does not match the selector. + # Subscription The subscriber subscribes to a URL exposed by a hub to receive updates from one or many topics. @@ -191,10 +216,8 @@ publisher. The `GET` HTTP method must be used. The connection **SHOULD** use HTT mutliplexing and other advanced features of this protocol. The subscriber specifies the list of topics to get updates from by using one or several query -parameters named `topic`. The value of these query parameters **SHOULD** be a URI templates -[@!RFC6570]. - -Note: a URL is also a valid URI template. +parameters named `topic`. The `topic` query parameters **MUST** contain topic selectors. See +(#topic-selectors). The protocol doesn't specify the maximum number of `topic` parameters that can be sent, but the hub **MAY** apply an arbitrary limit. @@ -205,17 +228,19 @@ interface](https://html.spec.whatwg.org/multipage/server-sent-events.html#the-ev including, but not limited to, readable streams [@W3C.NOTE-streams-api-20161129] and [XMLHttpRequest](https://xhr.spec.whatwg.org/) (used by popular polyfills) **MAY** also be used. -The hub sends to the subscriber updates for resources matching the provided URI templates and the -provided targets (see (#authorization)). If the value of a `topic` parameter is not a valid URI -template, then the provided identifier **MUST** match exactly the resource identifier. If no targets -are specified, the update is dispatched to all subscribers. The hub **MUST** send these updates as -`text/event-stream` compliant events [!@W3C.REC-eventsource-20150203]. +The hub sends to the subscriber updates for topics matching the provided topic selectors. + +If an update is marked as `private`, the hub **MUST NOT** dispatch it to subscribers not authorized +to receive it. See (#authorization). + +The hub **MUST** send these updates as `text/event-stream` compliant events +[!@W3C.REC-eventsource-20150203]. The `data` property **MUST** contain the new version of the topic. It can be the full resource, or a partial update by using formats such as JSON Patch [@RFC6902] or JSON Merge Patch [@RFC7386]. -All other properties defined in the Server-Sent Events specification **MAY** be used and **SHOULD** -be supported by hubs. +All other properties defined in the Server-Sent Events specification **MAY** be used and **MUST** be +supported by hubs. The resource **SHOULD** be represented in a format with hypermedia capabilities such as JSON-LD [@W3C.REC-json-ld-20140116], Atom [@RFC4287], XML [@W3C.REC-xml-20081126] or HTML @@ -230,10 +255,11 @@ Example: ~~~ javascript // The subscriber subscribes to updates -// for the https://example.com/foo topic +// for the https://example.com/foo topic, the bar topic, // and to any topic matching https://example.com/books/{name} const url = new URL('https://example.com/.well-known/mercure'); url.searchParams.append('topic', 'https://example.com/foo'); +url.searchParams.append('topic', 'bar'); url.searchParams.append('topic', 'https://example.com/bar/{id}'); const eventSource = new EventSource(url); @@ -244,7 +270,8 @@ eventSource.onmessage = function ({data}) { }; ~~~ -The hub **MAY** require that subscribers are authorized to receive updates. +The hub **MAY** require subscribers and publishers to be authenticated, and **MAY** apply extra +authorization rules not defined in this specification. # Publication @@ -254,22 +281,24 @@ update, the hub dispatches it to subscribers using the established server-sent e An application **CAN** send events directly to subscribers without using an external hub server, if it is able to do so. In this case, it **MAY NOT** implement the endpoint to publish updates. -The request **MUST** be encoded using the `application/x-www-form-urlencoded` format and contains -the following data: +The request **MUST** be encoded using the `application/x-www-form-urlencoded` format +[@W3C.REC-html52-20171214] and contains the following name-value tuples: - * `topic`: IRIs of the updated topic. If this key is present several times, the first occurrence - is considered to be the canonical URL of the topic, and other ones are considered to be - alternate URLs. The hub **MUST** dispatch this update to subscribers that are subscribed to both - canonical or alternate URLs. + * `topic`: The identifiers of the updated topic. It is **RECOMMENDED** to use an IRI as + identifier. If this name is present several times, the first occurrence is considered to be the + canonical IRI of the topic, and other ones are considered to be alternate IRIs. The hub **MUST** + dispatch this update to subscribers that are subscribed to both canonical or alternate IRIs. - * `data`: the content of the new version of this topic. + * `data` (optional): the content of the new version of this topic. - * `target` (optional): target audience of this update. This key can be present several times. See - (#authorization) for further information. + * `private` (optional): if this name is set, the update **MUST NOT** be dispatched to subscribers + not authorized to receive it. See (#authorization). It is recommended to set the value to `on` + but it **CAN** contain any value including an empty string. - * `id` (optional): the topic's revision identifier: it will be used as the SSE's `id` property. If - omitted, the hub **MUST** generate a valid globally unique id. It **MAY** be a UUID [@RFC4122]. - Even if provided, the hub **MAY** ignore the id provided by the client and generate its own id. + * `id` (optional): the topic's revision identifier: it will be used as the SSE's `id` property. + If omitted, the hub **MUST** generate a valid IRI [@!RFC3987]. An UUID [@RFC4122] or a DID + [@W3C.WD-did-core-20200421] **MAY** be used. Even if provided, the hub **MAY** ignore the id + provided by the client and generate its own id. * `type` (optional): the SSE's `event` property (a specific event type). @@ -306,10 +335,11 @@ set](https://html.spec.whatwg.org/multipage/server-sent-events.html#dom-eventsou If the publisher or the subscriber is a web browser, it **SHOULD** send a cookie called `mercureAuthorization` containing the JWS when connecting to the hub. -Whenever possible, the `mercureAuthorization` cookie **SHOULD** be set during the discovery to -improve the overall security. Consequently, if the cookie is set during the discovery, both the -publisher and the hub have to share the same second level domain. The `Domain` attribute **MAY** be -used to allow the publisher and the hub to use different subdomains. See (#discovery). +Whenever possible, the `mercureAuthorization` cookie **SHOULD** be set during discovery (see +(#discovery)) to improve the overall security. Consequently, if the cookie is set during the +discovery, both the publisher and the hub have to share the same second level domain. The `Domain` +attribute **MAY** be used to allow the publisher and the hub to use different subdomains. See +(#discovery). The cookie **SHOULD** have the `Secure`, `HttpOnly` and `SameSite` attributes set. The cookie's `Path` attribute **SHOULD** also be set to the hub's URL. See (#security-considerations). @@ -323,61 +353,71 @@ allowed to, a 403 HTTP status code **SHOULD** be returned. ## Publishers Publishers **MUST** be authorized to dispatch updates to the hub, and **MUST** prove that they are -allowed to send updates. +authorized to send updates for the specified topics. -To be allowed to publish an update, the JWT presented by the publisher **MUST** contain a claim -called `mercure`, and this claim **MUST** contain a `publish` key. `mercure.publish` **MUST** -contain an array of targets the publisher is allowed to dispatch updates to. +To be allowed to publish an update, the JWS presented by the publisher **MUST** contain a claim +called `mercure`, and this claim **MUST** contain a `publish` key. `mercure.publish` contains an +array of topic selectors. See (#topic-selectors). If `mercure.publish`: * is not defined, then the publisher **MUST NOT** be authorized to dispatch any update - * contains an empty array, then the publisher is only allowed to dispatch public updates + * contains an empty array, the publisher **MUST NOT** be authorized to publish private updates, + but can publish public updates for all topics. - * contains the reserved string `*` as an array value, then the publisher is authorized to dispatch - updates to all targets +Otherwise, the hub **MUST** check that every topics of the update to dispatch matches at least one +of the topic selectors contained in `mercure.publish`. -If a topic is not public, the `POST` request sent by the publisher to the hub **MUST** contain a -list of keys named `target`. Their values **MUST** be of type `string`, and it is **RECOMMENDED** to -use valid IRIs. They can be, for instance, a user ID or a list of group IDs. If an update contains -at least one target the publisher is not authorized for, the hub **MUST NOT** dispatch the update -(even if some targets in the list are allowed) and **SHOULD** return a 403 HTTP status code. +If the publisher is not authorized for all the topics of an update, the hub **MUST NOT** dispatch +the update (even if some topics in the list are allowed) and **MUST** return a 403 HTTP status code. ## Subscribers -Subscribers **MAY** need to be authorized to connect to the hub. To receive updates destined to -specific targets, they **MUST** be authorized, and **MUST** prove they belong to at least one of the -specified targets. If the subscriber is not authorized, it **MUST NOT** receive any update having at -least one target. +To receive updates marked as `private`, a subscriber **MUST** prove that it is authorized for at +least one of the topics of this update. If the subscriber is not authorized to receive an update +marked as `private`, it **MUST NOT** receive it. + +To receive updates marked as `private`, the JWS presented by the subscriber **MUST** have a +claim named `mercure` with a key named `subscribe` that contains an array of topic selectors. See +(#topic-selectors). + +The hub **MUST** check that at least one topic of the update to dispatch matches at least one topic +selector provided in `mercure.subscribe`. + +## Payload -To receive updates destined for specific targets, the JWS presented by the subscriber **MUST** have -a claim named `mercure` with a key named `subscribe` that contains an array of strings: a list of -targets the user is authorized to receive updates for. The targets **SHOULD** be IRIs. +The `mercure` claim of the JWS **CAN** also contain user-defined values under the `payload` key. +This JSON document will be attached to the subscription and made available in subscription events. +See (#subscriptions). -If at least one target is specified, the update **MUST NOT** be sent to the subscriber by the hub, -unless the `mercure.subscribe` array of the JWS presented by the subscriber contains at least one of -the specified targets. +For instance, `mercure.payload` can contain the user ID of the subscriber, a list of groups it +belongs to, or its IP address. Storing data in `mercure.payload` is a convenient way to share data +related to one subscriber to other subscribers. -If the `mercure.subscribe` array contains the reserved string value `*`, then the subscriber is -authorized to receive updates destined for all targets. +# Reconnection, State Reconciliation and Event Sourcing {#reconciliation} -# Reconnection and State Reconciliation {#reconciliation} +The protocol allows to reconciliate states after a reconnection. It can also be used to implement an +[Event store](https://en.wikipedia.org/wiki/Event_store). -To allow re-establishment in case of connection lost, events dispatched by the hub **SHOULD** -include an `id` property. The value contained in this `id` property **SHOULD** be a globally unique -identifier. To do so, a UUID [@RFC4122] **MAY** be used. +To allow re-establishment in case of connection lost, events dispatched by the hub **MUST** include +an `id` property. The value contained in this `id` property **SHOULD** be an IRI [@!RFC3987]. An +UUID [@RFC4122] or a DID [@W3C.WD-did-core-20200421] **MAY** be used. According to the server-sent events specification, in case of connection lost the subscriber will try to automatically re-connect. During the re-connection, the subscriber **MUST** send the last received event id in a [Last-Event-ID](https://html.spec.whatwg.org/multipage/iana.html#last-event-id) HTTP header. -The server-sent events specification doesn't allow this HTTP header to be set during the first -connection (before a reconnection). In order to fetch any update dispatched between the initial -resource generation by the publisher and the connection to the hub, the subscriber **MUST** send the -event id provided during the discovery in the `last-event-id` link's attribute in a query parameter -named `Last-Event-ID` when connecting to the hub. See (#discovery). +In order to fetch any update dispatched between the initial resource generation by the publisher and +the connection to the hub, the subscriber **MUST** send the event id provided during the discovery +in the `last-event-id` as the last event id. See (#discovery). + +`EventSource` implementations may not allow to set HTTP headers during the first connection (before +a reconnection) and implementations in web browsers don't allow to set it. + +To work around this problem, the hub **MUST** also allow to pass the last event id in a query +parameter named `Last-Event-ID`. If both the `Last-Event-ID` HTTP header and the query parameter are present, the HTTP header **MUST** take precedence. @@ -385,29 +425,45 @@ If both the `Last-Event-ID` HTTP header and the query parameter are present, the If the `Last-Event-ID` HTTP header or query parameter exists, the hub **SHOULD** send all events published following the one bearing this identifier to the subscriber. -The hub **MAY** discard some messages for operational reasons. The subscriber **MUST NOT** assume -that no update will be lost, and **MUST** re-fetch the original topic to ensure this (for instance, -after a long disconnection time). +The reserved value `-1` can be used to hint the hub to send all updates it has for the subscribed +topics. According to its own policy, the hub **MAY** or **MAY NOT** fulfil this request. + +The hub **MAY** discard some events for operational reasons. If the hub is not able to send all +requested events, it **MUST** set a `Last-Event-ID` header on the HTTP response containing the id of +event preceding the first sent to the subscriber. If such event doesn't exist, the hub **MUST** set +the `Last-Event-ID` header it sends to the reserved value `-1`. This value indicates that all events +stored for the subscribed topics have been sent to the subscriber. -The hub **MAY** also specify the reconnection time using the `retry` key, as specified in the +The subscriber **MUST NOT** assume that no events will be lost (it may happen, for example after +a long disconnection time). In some cases (for instance when sending partial updates in the JSON +Patch [@RFC6902] format, or when using the hub as an event store), updates lost can cause data lost. +To check if a data lost ocurred, the subscriber **CAN** check if the requested last event id and +the value of the received `Last-Event-ID` match. In case of data lost, the subscriber **SHOULD** +re-fetch the original topic. + +Note: Native `EventSource` implementations don't give access to headers associated with the HTTP +response, however polyfills and server-sent events clients in most programming languages allow it. + +The hub **CAN** also specify the reconnection time using the `retry` key, as specified in the server-sent events format. -# Subscription Events +# Subscription Events {#subscriptions} The hub **MAY** publish an update when a subscription to a topic is created or terminated. If this feature is implemented by the hub, an update **MUST** be dispatched every time that a subscription is created or terminated, and for each topic to which the client subscribes. The topic of this update **MUST** follow the pattern -`https://mercure.rocks/subscriptions/{topic}/{subscriptionID}` where `topic` is the URL-encoded -value of the subscribed topic and `subscriptionID` is an unique identifier for this subscription. -`subscriptionID` **MAY** be a UUID [@RFC4122]. +`/.well-known/mercure/subscriptions/{subscriptionID}/{topic}` where `subscriptionID` is an +unique identifier for this subscription and `topic` is the URL-encoded value of the subscribed +topic. `subscriptionID` **SHOULD** be an URL-encoded IRI [@!RFC3987]. An UUID [@RFC4122] or a DID +[@W3C.WD-did-core-20200421] **MAY** be used. The content of the update **MUST** be a JSON-LD [@!W3C.REC-json-ld-20140116] document containing at least the following properties: - * `@id`: the identifier of this update, it **MUST** be the same value as the subscription update's - topic + * `@id`: the identifier of this update, it **MUST** be the same value as the main subscription + update's topic * `@type`: the fixed value `https://mercure.rocks/Subscription` @@ -415,21 +471,13 @@ least the following properties: * `active`: `true` when the subscription is created, and `false` when it is terminated - * `subscribe`: the subscription targets provided by the subscriber, see (#authorization) - - * `publish`: the publication targets provided by the subscriber, see (#authorization) - - * `address` (optional): the IP address ([@!RFC0791], [@!RFC8200]) of the subscriber + * `payload` (optional): the content of `mercure.payload` in the subscriber's JWS (see + (#authorization)) The JSON-LD document **MAY** contain other properties. In order to only allow authorized subscribers to receive subscription events, the subscription -update **MUST** be marked as intended for subscribers providing the following targets: - - * the fixed value `https://mercure.rocks/targets/subscriptions` - - * a URL following the pattern `https://mercure.rocks/targets/subscriptions/{topic}` where topic is - the URL-encoded value of the subscribed topic +update **MUST** be marked as `private`. # Encryption @@ -437,13 +485,13 @@ Using HTTPS does not prevent the hub from accessing the update's content. Depend privacy of information contained in the update, it **MAY** be necessary to prevent eavesdropping by the hub. -To make sure that the message content can not be read by the hub, the publisher **MAY** encode the +To make sure that the message content can not be read by the hub, the publisher **MAY** encrypt the message before sending it to the hub. The publisher **SHOULD** use JSON Web Encryption [@!RFC7516] -to encrypt the update content. The publisher **MAY** provide the relevant encryption key(s) in the -`key-set` attribute of the Link HTTP header during the discovery. See (#discovery). The `key-set` -attribute **SHOULD** contain a key encoded using the JSON Web Key Set [@!RFC7517] format. Any -other out-of-band mechanism **MAY** be used instead to share the key between the publisher and the -subscriber. +to encrypt the update content. The publisher **MAY** provide the URL pointing to the relevant +encryption key(s) in the `key-set` attribute of the Link HTTP header during the discovery. See +(#discovery). The `key-set` attribute **MUST** contain a key encoded using the JSON Web Key Set +[@!RFC7517] format. Any other out-of-band mechanism **MAY** be used instead to share the key between +the publisher and the subscriber. Update encryption is considered a best practice to prevent mass surveillance. This is especially relevant if the hub is managed by an external provider. @@ -476,21 +524,21 @@ Type" registry with the following entry: # Security Considerations -The confidentiality of the secret key(s) used to generate the JWTs is a primary concern. The +The confidentiality of the secret key(s) used to generate the JWSs is a primary concern. The secret key(s) **MUST** be stored securely. They **MUST** be revoked immediately in the event of compromission. -Possessing valid JWTs allows any client to subscribe, or to publish to the hub. Their -confidentiality **MUST** therefore be ensured. To do so, JWTs **MUST** only be transmitted over +Possessing valid JWSs allows any client to subscribe, or to publish to the hub. Their +confidentiality **MUST** therefore be ensured. To do so, JWSs **MUST** only be transmitted over secure connections. -Also, when the client is a web browser, the JWT **SHOULD** not be made accessible +Also, when the client is a web browser, the JWS **SHOULD** not be made accessible to JavaScript scripts for resilience against [Cross-site Scripting (XSS) attacks](https://owasp.org/www-community/attacks/xss/). It's the main reason why, when the client is a web browser, using `HttpOnly` cookies as the authorization mechanism **SHOULD** always be preferred. -In the event of compromission, revoking JWTs before their expiration is often difficult. To that +In the event of compromission, revoking JWSs before their expiration is often difficult. To that end, using short-lived tokens is strongly **RECOMMENDED**. The publish endpoint of the hub may be targeted by [Cross-Site Request Forgery (CSRF)