Skip to content

Commit

Permalink
Merge tag 'v1.90.0' into develop
Browse files Browse the repository at this point in the history
No significant changes since 1.90.0rc1.

- Scope transaction IDs to devices (implement [MSC3970](matrix-org/matrix-spec-proposals#3970)). ([\matrix-org#15629](matrix-org#15629))
- Remove old rows from the `cache_invalidation_stream_by_instance` table automatically (this table is unused in SQLite). ([\matrix-org#15868](matrix-org#15868))

- Fix a long-standing bug where purging history and paginating simultaneously could lead to database corruption when using workers. ([\matrix-org#15791](matrix-org#15791))
- Fix a long-standing bug where profile endpoint returned a 404 when the user's display name was empty. ([\matrix-org#16012](matrix-org#16012))
- Fix a long-standing bug where the `synapse_port_db` failed to configure sequences for application services and partial stated rooms. ([\matrix-org#16043](matrix-org#16043))
- Fix long-standing bug with deletion in dehydrated devices v2. ([\matrix-org#16046](matrix-org#16046))

- Add `org.opencontainers.image.version` labels to Docker containers [published by Matrix.org](https://hub.docker.com/r/matrixdotorg/synapse). Contributed by Mo Balaa. ([\matrix-org#15972](matrix-org#15972), [\matrix-org#16009](matrix-org#16009))

- Add a internal documentation page describing the ["streams" used within Synapse](https://matrix-org.github.io/synapse/v1.90/development/synapse_architecture/streams.html). ([\matrix-org#16015](matrix-org#16015))
- Clarify comment on the keys/upload over replication enpoint. ([\matrix-org#16016](matrix-org#16016))
- Do not expose Admin API in caddy reverse proxy example. Contributed by @NilsIrl. ([\matrix-org#16027](matrix-org#16027))

- Remove support for legacy application service paths. ([\matrix-org#15964](matrix-org#15964))
- Move support for application service query parameter authorization behind a configuration option. ([\matrix-org#16017](matrix-org#16017))

- Update SQL queries to inline boolean parameters as supported in SQLite 3.27. ([\matrix-org#15525](matrix-org#15525))
- Allow for the configuration of the backoff algorithm for federation destinations. ([\matrix-org#15754](matrix-org#15754))
- Allow modules to check whether the current worker is configured to run background tasks. ([\matrix-org#15991](matrix-org#15991))
- Update support for [MSC3958](matrix-org/matrix-spec-proposals#3958) to match the latest revision of the MSC. ([\matrix-org#15992](matrix-org#15992))
- Allow modules to schedule delayed background calls. ([\matrix-org#15993](matrix-org#15993))
- Properly overwrite the `redacts` content-property for forwards-compatibility with room versions 1 through 10. ([\matrix-org#16013](matrix-org#16013))
- Fix building the nix development environment on MacOS systems. ([\matrix-org#16019](matrix-org#16019))
- Remove leading and trailing spaces when setting a display name. ([\matrix-org#16031](matrix-org#16031))
- Combine duplicated code. ([\matrix-org#16023](matrix-org#16023))
- Collect additional metrics from `ResponseCache` for eviction. ([\matrix-org#16028](matrix-org#16028))
- Fix endpoint improperly declaring support for MSC3814. ([\matrix-org#16068](matrix-org#16068))
- Drop backwards compat hack for event serialization. ([\matrix-org#16069](matrix-org#16069))

* Update PyYAML to 6.0.1. ([\matrix-org#16011](matrix-org#16011))
* Bump cryptography from 41.0.2 to 41.0.3. ([\matrix-org#16048](matrix-org#16048))
* Bump furo from 2023.5.20 to 2023.7.26. ([\matrix-org#16077](matrix-org#16077))
* Bump immutabledict from 2.2.4 to 3.0.0. ([\matrix-org#16034](matrix-org#16034))
* Update certifi to 2023.7.22 and pygments to 2.15.1. ([\matrix-org#16044](matrix-org#16044))
* Bump jsonschema from 4.18.3 to 4.19.0. ([\matrix-org#16081](matrix-org#16081))
* Bump phonenumbers from 8.13.14 to 8.13.18. ([\matrix-org#16076](matrix-org#16076))
* Bump regex from 1.9.1 to 1.9.3. ([\matrix-org#16073](matrix-org#16073))
* Bump serde from 1.0.171 to 1.0.175. ([\matrix-org#15982](matrix-org#15982))
* Bump serde from 1.0.175 to 1.0.179. ([\matrix-org#16033](matrix-org#16033))
* Bump serde from 1.0.179 to 1.0.183. ([\matrix-org#16074](matrix-org#16074))
* Bump serde_json from 1.0.103 to 1.0.104. ([\matrix-org#16032](matrix-org#16032))
* Bump service-identity from 21.1.0 to 23.1.0. ([\matrix-org#16038](matrix-org#16038))
* Bump types-commonmark from 0.9.2.3 to 0.9.2.4. ([\matrix-org#16037](matrix-org#16037))
* Bump types-jsonschema from 4.17.0.8 to 4.17.0.10. ([\matrix-org#16036](matrix-org#16036))
* Bump types-netaddr from 0.8.0.8 to 0.8.0.9. ([\matrix-org#16035](matrix-org#16035))
* Bump types-opentracing from 2.4.10.5 to 2.4.10.6. ([\matrix-org#16078](matrix-org#16078))
* Bump types-setuptools from 68.0.0.0 to 68.0.0.3. ([\matrix-org#16079](matrix-org#16079))
  • Loading branch information
yingziwu committed Aug 21, 2023
2 parents ce16c05 + 2963822 commit 046021b
Show file tree
Hide file tree
Showing 65 changed files with 2,013 additions and 601 deletions.
14 changes: 13 additions & 1 deletion .github/workflows/docker.yml
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,16 @@ jobs:
- name: Inspect builder
run: docker buildx inspect

- name: Checkout repository
uses: actions/checkout@v3

- name: Extract version from pyproject.toml
# Note: explicitly requesting bash will mean bash is invoked with `-eo pipefail`, see
# https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsshell
shell: bash
run: |
echo "SYNAPSE_VERSION=$(grep "^version" pyproject.toml | sed -E 's/version\s*=\s*["]([^"]*)["]/\1/')" >> $GITHUB_ENV
- name: Log in to GHCR
uses: docker/login-action@v2
with:
Expand All @@ -48,7 +58,9 @@ jobs:
uses: docker/build-push-action@v4
with:
push: true
labels: "gitsha1=${{ github.sha }}"
labels: |
gitsha1=${{ github.sha }}
org.opencontainers.image.version=${{ env.SYNAPSE_VERSION }}
tags: "${{ steps.set-tag.outputs.tags }}"
file: "docker/Dockerfile"
platforms: linux/amd64
70 changes: 70 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,73 @@
# Synapse 1.90.0 (2023-08-15)

No significant changes since 1.90.0rc1.


# Synapse 1.90.0rc1 (2023-08-08)

### Features

- Scope transaction IDs to devices (implement [MSC3970](https://github.com/matrix-org/matrix-spec-proposals/pull/3970)). ([\#15629](https://github.com/matrix-org/synapse/issues/15629))
- Remove old rows from the `cache_invalidation_stream_by_instance` table automatically (this table is unused in SQLite). ([\#15868](https://github.com/matrix-org/synapse/issues/15868))

### Bugfixes

- Fix a long-standing bug where purging history and paginating simultaneously could lead to database corruption when using workers. ([\#15791](https://github.com/matrix-org/synapse/issues/15791))
- Fix a long-standing bug where profile endpoint returned a 404 when the user's display name was empty. ([\#16012](https://github.com/matrix-org/synapse/issues/16012))
- Fix a long-standing bug where the `synapse_port_db` failed to configure sequences for application services and partial stated rooms. ([\#16043](https://github.com/matrix-org/synapse/issues/16043))
- Fix long-standing bug with deletion in dehydrated devices v2. ([\#16046](https://github.com/matrix-org/synapse/issues/16046))

### Updates to the Docker image

- Add `org.opencontainers.image.version` labels to Docker containers [published by Matrix.org](https://hub.docker.com/r/matrixdotorg/synapse). Contributed by Mo Balaa. ([\#15972](https://github.com/matrix-org/synapse/issues/15972), [\#16009](https://github.com/matrix-org/synapse/issues/16009))

### Improved Documentation

- Add a internal documentation page describing the ["streams" used within Synapse](https://matrix-org.github.io/synapse/v1.90/development/synapse_architecture/streams.html). ([\#16015](https://github.com/matrix-org/synapse/issues/16015))
- Clarify comment on the keys/upload over replication enpoint. ([\#16016](https://github.com/matrix-org/synapse/issues/16016))
- Do not expose Admin API in caddy reverse proxy example. Contributed by @NilsIrl. ([\#16027](https://github.com/matrix-org/synapse/issues/16027))

### Deprecations and Removals

- Remove support for legacy application service paths. ([\#15964](https://github.com/matrix-org/synapse/issues/15964))
- Move support for application service query parameter authorization behind a configuration option. ([\#16017](https://github.com/matrix-org/synapse/issues/16017))

### Internal Changes

- Update SQL queries to inline boolean parameters as supported in SQLite 3.27. ([\#15525](https://github.com/matrix-org/synapse/issues/15525))
- Allow for the configuration of the backoff algorithm for federation destinations. ([\#15754](https://github.com/matrix-org/synapse/issues/15754))
- Allow modules to check whether the current worker is configured to run background tasks. ([\#15991](https://github.com/matrix-org/synapse/issues/15991))
- Update support for [MSC3958](https://github.com/matrix-org/matrix-spec-proposals/pull/3958) to match the latest revision of the MSC. ([\#15992](https://github.com/matrix-org/synapse/issues/15992))
- Allow modules to schedule delayed background calls. ([\#15993](https://github.com/matrix-org/synapse/issues/15993))
- Properly overwrite the `redacts` content-property for forwards-compatibility with room versions 1 through 10. ([\#16013](https://github.com/matrix-org/synapse/issues/16013))
- Fix building the nix development environment on MacOS systems. ([\#16019](https://github.com/matrix-org/synapse/issues/16019))
- Remove leading and trailing spaces when setting a display name. ([\#16031](https://github.com/matrix-org/synapse/issues/16031))
- Combine duplicated code. ([\#16023](https://github.com/matrix-org/synapse/issues/16023))
- Collect additional metrics from `ResponseCache` for eviction. ([\#16028](https://github.com/matrix-org/synapse/issues/16028))
- Fix endpoint improperly declaring support for MSC3814. ([\#16068](https://github.com/matrix-org/synapse/issues/16068))
- Drop backwards compat hack for event serialization. ([\#16069](https://github.com/matrix-org/synapse/issues/16069))

### Updates to locked dependencies

* Update PyYAML to 6.0.1. ([\#16011](https://github.com/matrix-org/synapse/issues/16011))
* Bump cryptography from 41.0.2 to 41.0.3. ([\#16048](https://github.com/matrix-org/synapse/issues/16048))
* Bump furo from 2023.5.20 to 2023.7.26. ([\#16077](https://github.com/matrix-org/synapse/issues/16077))
* Bump immutabledict from 2.2.4 to 3.0.0. ([\#16034](https://github.com/matrix-org/synapse/issues/16034))
* Update certifi to 2023.7.22 and pygments to 2.15.1. ([\#16044](https://github.com/matrix-org/synapse/issues/16044))
* Bump jsonschema from 4.18.3 to 4.19.0. ([\#16081](https://github.com/matrix-org/synapse/issues/16081))
* Bump phonenumbers from 8.13.14 to 8.13.18. ([\#16076](https://github.com/matrix-org/synapse/issues/16076))
* Bump regex from 1.9.1 to 1.9.3. ([\#16073](https://github.com/matrix-org/synapse/issues/16073))
* Bump serde from 1.0.171 to 1.0.175. ([\#15982](https://github.com/matrix-org/synapse/issues/15982))
* Bump serde from 1.0.175 to 1.0.179. ([\#16033](https://github.com/matrix-org/synapse/issues/16033))
* Bump serde from 1.0.179 to 1.0.183. ([\#16074](https://github.com/matrix-org/synapse/issues/16074))
* Bump serde_json from 1.0.103 to 1.0.104. ([\#16032](https://github.com/matrix-org/synapse/issues/16032))
* Bump service-identity from 21.1.0 to 23.1.0. ([\#16038](https://github.com/matrix-org/synapse/issues/16038))
* Bump types-commonmark from 0.9.2.3 to 0.9.2.4. ([\#16037](https://github.com/matrix-org/synapse/issues/16037))
* Bump types-jsonschema from 4.17.0.8 to 4.17.0.10. ([\#16036](https://github.com/matrix-org/synapse/issues/16036))
* Bump types-netaddr from 0.8.0.8 to 0.8.0.9. ([\#16035](https://github.com/matrix-org/synapse/issues/16035))
* Bump types-opentracing from 2.4.10.5 to 2.4.10.6. ([\#16078](https://github.com/matrix-org/synapse/issues/16078))
* Bump types-setuptools from 68.0.0.0 to 68.0.0.3. ([\#16079](https://github.com/matrix-org/synapse/issues/16079))

# Synapse 1.89.0 (2023-08-01)

No significant changes since 1.89.0rc1.
Expand Down
30 changes: 15 additions & 15 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 12 additions & 0 deletions debian/changelog
Original file line number Diff line number Diff line change
@@ -1,3 +1,15 @@
matrix-synapse-py3 (1.90.0) stable; urgency=medium

* New Synapse release 1.90.0.

-- Synapse Packaging team <[email protected]> Tue, 15 Aug 2023 11:17:34 +0100

matrix-synapse-py3 (1.90.0~rc1) stable; urgency=medium

* New Synapse release 1.90.0rc1.

-- Synapse Packaging team <[email protected]> Tue, 08 Aug 2023 15:29:34 +0100

matrix-synapse-py3 (1.89.0) stable; urgency=medium

* New Synapse release 1.89.0.
Expand Down
1 change: 1 addition & 0 deletions docs/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,7 @@
- [Cancellation](development/synapse_architecture/cancellation.md)
- [Log Contexts](log_contexts.md)
- [Replication](replication.md)
- [Streams](development/synapse_architecture/streams.md)
- [TCP Replication](tcp_replication.md)
- [Faster remote joins](development/synapse_architecture/faster_joins.md)
- [Internal Documentation](development/internal_documentation/README.md)
Expand Down
157 changes: 157 additions & 0 deletions docs/development/synapse_architecture/streams.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
## Streams

Synapse has a concept of "streams", which are roughly described in [`id_generators.py`](
https://github.com/matrix-org/synapse/blob/develop/synapse/storage/util/id_generators.py
).
Generally speaking, streams are a series of notifications that something in Synapse's database has changed that the application might need to respond to.
For example:

- The events stream reports new events (PDUs) that Synapse creates, or that Synapse accepts from another homeserver.
- The account data stream reports changes to users' [account data](https://spec.matrix.org/v1.7/client-server-api/#client-config).
- The to-device stream reports when a device has a new [to-device message](https://spec.matrix.org/v1.7/client-server-api/#send-to-device-messaging).

See [`synapse.replication.tcp.streams`](
https://github.com/matrix-org/synapse/blob/develop/synapse/replication/tcp/streams/__init__.py
) for the full list of streams.

It is very helpful to understand the streams mechanism when working on any part of Synapse that needs to respond to changes—especially if those changes are made by different workers.
To that end, let's describe streams formally, paraphrasing from the docstring of [`AbstractStreamIdGenerator`](
https://github.com/matrix-org/synapse/blob/a719b703d9bd0dade2565ddcad0e2f3a7a9d4c37/synapse/storage/util/id_generators.py#L96
).

### Definition

A stream is an append-only log `T1, T2, ..., Tn, ...` of facts[^1] which grows over time.
Only "writers" can add facts to a stream, and there may be multiple writers.

Each fact has an ID, called its "stream ID".
Readers should only process facts in ascending stream ID order.

Roughly speaking, each stream is backed by a database table.
It should have a `stream_id` (or similar) bigint column holding stream IDs, plus additional columns as necessary to describe the fact.
Typically, a fact is expressed with a single row in its backing table.[^2]
Within a stream, no two facts may have the same stream_id.

> _Aside_. Some additional notes on streams' backing tables.
>
> 1. Rich would like to [ditch the backing tables](https://github.com/matrix-org/synapse/issues/13456).
> 2. The backing tables may have other uses.
> For example, the events table serves backs the events stream, and is read when processing new events.
> But old rows are read from the table all the time, whenever Synapse needs to lookup some facts about an event.
> 3. Rich suspects that sometimes the stream is backed by multiple tables, so the stream proper is the union of those tables.
Stream writers can "reserve" a stream ID, and then later mark it as having being completed.
Stream writers need to track the completion of each stream fact.
In the happy case, completion means a fact has been written to the stream table.
But unhappy cases (e.g. transaction rollback due to an error) also count as completion.
Once completed, the rows written with that stream ID are fixed, and no new rows
will be inserted with that ID.

### Current stream ID

For any given stream reader (including writers themselves), we may define a per-writer current stream ID:

> The current stream ID _for a writer W_ is the largest stream ID such that
> all transactions added by W with equal or smaller ID have completed.
Similarly, there is a "linear" notion of current stream ID:

> The "linear" current stream ID is the largest stream ID such that
> all facts (added by any writer) with equal or smaller ID have completed.
Because different stream readers A and B learn about new facts at different times, A and B may disagree about current stream IDs.
Put differently: we should think of stream readers as being independent of each other, proceeding through a stream of facts at different rates.

**NB.** For both senses of "current", that if a writer opens a transaction that never completes, the current stream ID will never advance beyond that writer's last written stream ID.

For single-writer streams, the per-writer current ID and the linear current ID are the same.
Both senses of current ID are monotonic, but they may "skip" or jump over IDs because facts complete out of order.


_Example_.
Consider a single-writer stream which is initially at ID 1.

| Action | Current stream ID | Notes |
|------------|-------------------|-------------------------------------------------|
| | 1 | |
| Reserve 2 | 1 | |
| Reserve 3 | 1 | |
| Complete 3 | 1 | current ID unchanged, waiting for 2 to complete |
| Complete 2 | 3 | current ID jumps from 1 -> 3 |
| Reserve 4 | 3 | |
| Reserve 5 | 3 | |
| Reserve 6 | 3 | |
| Complete 5 | 3 | |
| Complete 4 | 5 | current ID jumps 3->5, even though 6 is pending |
| Complete 6 | 6 | |


### Multi-writer streams

There are two ways to view a multi-writer stream.

1. Treat it as a collection of distinct single-writer streams, one
for each writer.
2. Treat it as a single stream.

The single stream (option 2) is conceptually simpler, and easier to represent (a single stream id).
However, it requires each reader to know about the entire set of writers, to ensures that readers don't erroneously advance their current stream position too early and miss a fact from an unknown writer.
In contrast, multiple parallel streams (option 1) are more complex, requiring more state to represent (map from writer to stream id).
The payoff for doing so is that readers can "peek" ahead to facts that completed on one writer no matter the state of the others, reducing latency.

Note that a multi-writer stream can be viewed in both ways.
For example, the events stream is treated as multiple single-writer streams (option 1) by the sync handler, so that events are sent to clients as soon as possible.
But the background process that works through events treats them as a single linear stream.

Another useful example is the cache invalidation stream.
The facts this stream holds are instructions to "you should now invalidate these cache entries".
We only ever treat this as a multiple single-writer streams as there is no important ordering between cache invalidations.
(Invalidations are self-contained facts; and the invalidations commute/are idempotent).

### Writing to streams

Writers need to track:
- track their current position (i.e. its own per-writer stream ID).
- their facts currently awaiting completion.

At startup,
- the current position of that writer can be found by querying the database (which suggests that facts need to be written to the database atomically, in a transaction); and
- there are no facts awaiting completion.

To reserve a stream ID, call [`nextval`](https://www.postgresql.org/docs/current/functions-sequence.html) on the appropriate postgres sequence.

To write a fact to the stream: insert the appropriate rows to the appropriate backing table.

To complete a fact, first remove it from your map of facts currently awaiting completion.
Then, if no earlier fact is awaiting completion, the writer can advance its current position in that stream.
Upon doing so it should emit an `RDATA` message[^3], once for every fact between the old and the new stream ID.

### Subscribing to streams

Readers need to track the current position of every writer.

At startup, they can find this by contacting each writer with a `REPLICATE` message,
requesting that all writers reply describing their current position in their streams.
Writers reply with a `POSITION` message.

To learn about new facts, readers should listen for `RDATA` messages and process them to respond to the new fact.
The `RDATA` itself is not a self-contained representation of the fact;
readers will have to query the stream tables for the full details.
Readers must also advance their record of the writer's current position for that stream.

# Summary

In a nutshell: we have an append-only log with a "buffer/scratchpad" at the end where we have to wait for the sequence to be linear and contiguous.


---

[^1]: we use the word _fact_ here for two reasons.
Firstly, the word "event" is already heavily overloaded (PDUs, EDUs, account data, ...) and we don't need to make that worse.
Secondly, "fact" emphasises that the things we append to a stream cannot change after the fact.

[^2]: A fact might be expressed with 0 rows, e.g. if we opened a transaction to persist an event, but failed and rolled the transaction back before marking the fact as completed.
In principle a fact might be expressed with 2 or more rows; if so, each of those rows should share the fact's stream ID.

[^3]: This communication used to happen directly with the writers [over TCP](../../tcp_replication.md);
nowadays it's done via Redis's Pubsub.
2 changes: 1 addition & 1 deletion docs/reverse_proxy.md
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,7 @@ matrix.example.com {
}
example.com:8448 {
reverse_proxy localhost:8008
reverse_proxy /_matrix/* localhost:8008
}
```

Expand Down
Loading

0 comments on commit 046021b

Please sign in to comment.