MSC2197: Search Filtering in Federation /publicRooms #2197

Merged (8 commits) on Aug 20, 2019.

`proposals/2197-search_filter_in_federation_publicrooms.md`: 108 additions, 0 deletions.

# MSC2197 – Search Filtering in Public Room Directory over Federation

This MSC proposes introducing the `POST` method to the `/publicRooms` Federation API endpoint,
including a `filter` argument which allows server-side filtering of rooms.

We are motivated by the opportunity to make searching the public Room Directory more efficient over
Federation.

## Motivation

Although the Client-Server API includes the filtering capability in `/publicRooms`, the Federation API
currently does not.

This is wasteful of effort and network traffic for both homeservers: searching a remote server's directory
involves downloading its entire room list and only then filtering it locally.

## Proposal

Having a filtered `/publicRooms` API endpoint means that irrelevant or uninteresting rooms can be
excluded from a room directory query response. In turn, these responses can be generated more quickly
and, owing to their smaller size, transmitted over the network more quickly.

These benefits have been exploited in the Client-Server API, which implements search filtering
using the `filter` JSON body parameter in the `POST` method on the `/publicRooms` endpoint.

It should be noted that the Client-Server and Federation APIs both currently possess `/publicRooms`
endpoints which, whilst similar, are not equivalent.

Ignoring the `server` parameter in the Client-Server API, the two endpoints differ in the following
ways:

* the Federation API endpoint only accepts the `GET` method whereas the Client-Server API accepts
the `POST` method as well.
* the Federation API accepts `third_party_instance_id` and `include_all_networks` parameters through
the `GET` method, whereas the Client-Server API only features these in the `POST` method.

This MSC proposes to introduce support for the `POST` method in the Federation API's `/publicRooms`
endpoint, accepting all of the Client-Server API's parameters except `server`. The copied parameters
shall have the same semantics as they do in the Client-Server API.

In the interest of clarity, the proposed parameter set is listed below, along with the definitions of
the substructures it uses. The response format has been omitted as it is the same as that of the
current Client-Server and Federation APIs, which do not differ in this respect.

### `POST /_matrix/federation/v1/publicRooms`

#### Query Parameters

There are no query parameters. Notably, we intentionally do not inherit the `server` query parameter
from the Client-Server API.

#### JSON Body Parameters

* `limit` (`integer`): Limit the number of search results returned.
* `since` (`string`): A pagination token from a previous request, allowing clients to get the next (or previous) batch of rooms. The direction of pagination is specified solely by which token is supplied, rather than via an explicit flag.
* `filter` (`Filter`): Filter to apply to the results.
* `include_all_networks` (`boolean`): Whether or not to include all known networks/protocols from application services on the homeserver. Defaults to false.
* `third_party_instance_id` (`string`): The specific third party network/protocol to request from the homeserver. Can only be used if `include_all_networks` is false.
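
As a rough illustration, the Python sketch below assembles a request body from the parameters above. It is not taken from any homeserver implementation: the function name `build_public_rooms_body` is hypothetical, unset values are simply omitted so that the remote server applies its own defaults, and `third_party_instance_id` is treated as a string, as it is in the Client-Server API.

```python
import json
from typing import Any, Optional


def build_public_rooms_body(
    limit: Optional[int] = None,
    since: Optional[str] = None,
    generic_search_term: Optional[str] = None,
    include_all_networks: bool = False,
    third_party_instance_id: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a JSON body for POST /_matrix/federation/v1/publicRooms.

    Hypothetical helper: values left as None are omitted so the remote
    server falls back to its defaults.
    """
    if include_all_networks and third_party_instance_id is not None:
        # The MSC states third_party_instance_id can only be used
        # when include_all_networks is false.
        raise ValueError(
            "third_party_instance_id cannot be combined with include_all_networks"
        )
    body: dict[str, Any] = {}
    if limit is not None:
        body["limit"] = limit
    if since is not None:
        body["since"] = since
    if generic_search_term is not None:
        # Filter is a nested object; generic_search_term is its only field here.
        body["filter"] = {"generic_search_term": generic_search_term}
    if include_all_networks:
        body["include_all_networks"] = True
    if third_party_instance_id is not None:
        body["third_party_instance_id"] = third_party_instance_id
    return body


print(json.dumps(build_public_rooms_body(limit=10, generic_search_term="matrix")))
# prints {"limit": 10, "filter": {"generic_search_term": "matrix"}}
```

Checking the `include_all_networks`/`third_party_instance_id` exclusivity on the sending side is a choice made for this sketch; a receiving server would still need to validate it independently.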

### `Filter` Parameters

* `generic_search_term` (`string`): A string to search for in the room metadata, e.g. name, topic, canonical alias etc. (Optional).
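
The MSC defers the exact matching semantics of `generic_search_term` to the Client-Server API. A minimal sketch of how a receiving homeserver might apply the filter follows, assuming a case-insensitive substring match over the name, topic and canonical alias. That matching rule is an assumption for illustration only; a production server would more likely consult an index, which is precisely the efficiency this MSC aims to exploit. `PublicRoom`, `matches` and `apply_filter` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PublicRoom:
    # Hypothetical, trimmed-down view of a public room directory entry.
    room_id: str
    name: Optional[str] = None
    topic: Optional[str] = None
    canonical_alias: Optional[str] = None


def matches(room: PublicRoom, generic_search_term: Optional[str]) -> bool:
    """Return True if the room passes the filter.

    Assumption: case-insensitive substring match over name, topic and
    canonical alias; real servers may use full-text search instead.
    """
    if not generic_search_term:
        return True  # No search term: every room matches.
    needle = generic_search_term.casefold()
    fields = (room.name, room.topic, room.canonical_alias)
    return any(f is not None and needle in f.casefold() for f in fields)


def apply_filter(rooms: list[PublicRoom], filter_obj: dict) -> list[PublicRoom]:
    # filter_obj is the decoded `filter` object from the request body.
    term = filter_obj.get("generic_search_term")
    return [r for r in rooms if matches(r, term)]


rooms = [
    PublicRoom("!a:example.org", name="Matrix HQ", canonical_alias="#matrix:example.org"),
    PublicRoom("!b:example.org", name="Cooking", topic="Recipes and tips"),
]
print([r.room_id for r in apply_filter(rooms, {"generic_search_term": "hq"})])
# prints ['!a:example.org']
```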

## Tradeoffs

An alternative approach might be for implementations to carry on as they are but also
cache (and potentially index) remote homeservers' room directories. This would not require
a spec change.

However, this would be unsatisfactory because it would lead to outdated room directory results and/or
caches that provide no benefit (as room directory searches are generally infrequent enough that a cache
would be outdated before being reused, on small – if not most – homeservers).

## Potential issues

### Backwards Compatibility

After this proposal is implemented, outdated homeservers will still exist which do not support the room
filtering functionality specified in this MSC. In this case, homeservers will have to fall back to downloading
the entire room directory and performing the filtering themselves, as currently happens. This is not considered
a problem since it will not lead to a situation that is any worse than the current one, and it is expected that
large homeservers – which cause the most work with the current search implementations – would be quick to upgrade
to support this feature once it is available.

In addition, since the `POST` method was not previously accepted on the `/publicRooms` endpoint over federation,
a `405 Method Not Allowed` HTTP response can readily be used as a signal that fallback is required.
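
The fallback described above might look roughly like the following sketch on the requesting side. The `send` callable stands in for a real, signed federation request and is hypothetical, as are the helper names. The local matching rule is an assumed case-insensitive substring match, and pagination of the `GET` fallback is omitted for brevity.

```python
from typing import Callable, Optional

# Hypothetical transport: (method, path, json_body) -> (status_code, parsed_json).
# A real homeserver would sign and send the request over federation here.
Transport = Callable[[str, str, Optional[dict]], tuple[int, dict]]

PATH = "/_matrix/federation/v1/publicRooms"


def _local_match(room: dict, term: str) -> bool:
    # Assumed matching rule for the fallback path only.
    needle = term.casefold()
    return any(
        isinstance(room.get(k), str) and needle in room[k].casefold()
        for k in ("name", "topic", "canonical_alias")
    )


def search_remote_directory(send: Transport, term: str, limit: int = 20) -> list[dict]:
    body = {"limit": limit, "filter": {"generic_search_term": term}}
    status, resp = send("POST", PATH, body)
    if status == 200:
        # Remote server supports POST and filtered on our behalf.
        return resp.get("chunk", [])
    if status != 405:
        raise RuntimeError(f"unexpected status {status}")
    # 405: outdated server. Download the directory (first page only in
    # this sketch) and filter locally, as happens today.
    status, resp = send("GET", PATH, None)
    if status != 200:
        raise RuntimeError(f"unexpected status {status}")
    return [r for r in resp.get("chunk", []) if _local_match(r, term)][:limit]


def old_server(method: str, path: str, body: Optional[dict]) -> tuple[int, dict]:
    # Simulated homeserver that predates this MSC.
    if method == "POST":
        return 405, {}
    return 200, {"chunk": [{"room_id": "!a:x", "name": "Matrix HQ"},
                           {"room_id": "!b:x", "name": "Cooking"}]}


print([r["room_id"] for r in search_remote_directory(old_server, "hq")])
# prints ['!a:x']
```

Treating any status other than 200 or 405 as an error keeps the sketch simple; an implementation might reasonably also fall back on other "unsupported" responses.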

## Security considerations

There are no known security considerations.

## Privacy considerations

Currently, remote homeservers do not learn what a user has searched for.

However, under this proposal, in the context of using the Federation API to forward on queries from the Client-Server
API, a client's homeserver would end up sharing the client's search terms with a remote homeserver, which may not be
operated by the same party or even trusted. For example, users' search terms could be logged.

> **Review comment (contributor):** I can certainly see this being an issue for some users, although I would put forward that you'd expect your search request to be forwarded to a remote homeserver if you are searching within it. There probably needs to be a warning within clients that a remote search will take your data outside the jurisdiction of your own homeserver.
>
> Hopefully something like this wouldn't prompt the need for a full blown consent screen.

The author of this MSC is uncertain what implications this has with regard to legislation such as the GDPR.

## Conclusion

By allowing homeservers to pass on search filters, we enable remote homeservers' room directories to be
searched efficiently because, realistically, only the remote homeserver is in a position to search its own
directory efficiently, taking advantage of indexing and other such optimisations.