Some differences:
- We use GET instead of POST for the events endpoint.
- We don't delete the device implicitly when fetching events.
- We allow the fetch device messages endpoint for both the dehydrated device and the requester's current device.
Signed-off-by: Nicolas Werner <[email protected]>
Return errors on invalid token formats and terminate regex in both cases
Co-authored-by: Hubert Chathi <[email protected]>
Looks reasonable overall, I left a few questions (and a few questions on the MSC too).
@@ -347,14 +356,55 @@ async def on_POST(self, request: SynapseRequest) -> Tuple[int, JsonDict]:
        return 200, result


class DehydratedDeviceEventsServlet(RestServlet):
Is it possible for multiple clients to be hitting this endpoint at once? What solves that?
This would be something like a scenario where (let's say) two devices have called GET /dehydrated_device and received the dehydrated device id, and are now both calling this endpoint with the id?
I don't think there is anything currently preventing this - iirc from the MSC proposal, this sort of scenario was the rationale for deleting the dehydrated device the first time this endpoint is called, as that functions as "claiming" the dehydrated device so it can't be used by other clients. This behavior was contentious on the MSC though - I implemented deleting the device after all the messages are delivered, but that won't help in the case where two devices are calling this endpoint at the same time.
Yeah probably a conversation for the MSC, but seems quite likely now.
I think what will help, and will be necessary for perfect resumption, is that we shouldn't delete the events until the device is fully deleted.
This way any device can attempt to rehydrate the device at any point in time, even if another one is already rehydrating it, until one device succeeds and deletes the dehydrated device.
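The resumption idea in this comment can be sketched roughly as follows. This is a minimal in-memory illustration, not Synapse's storage layer: `DehydratedInbox`, `fetch`, and `delete_device` are hypothetical names, and the integer position stands in for whatever token the real endpoint uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DehydratedInbox:
    """Hypothetical stand-in for a dehydrated device's to-device queue."""

    messages: List[Dict] = field(default_factory=list)
    deleted: bool = False

    def fetch(self, since: int = 0, limit: int = 2) -> Tuple[List[Dict], int]:
        """Return up to `limit` messages after `since` without deleting them."""
        if self.deleted:
            raise LookupError("No dehydrated device available")
        batch = self.messages[since : since + limit]
        return batch, since + len(batch)

    def delete_device(self) -> None:
        """Only an explicit delete discards the device and its messages."""
        self.deleted = True
        self.messages.clear()


inbox = DehydratedInbox(messages=[{"n": i} for i in range(3)])

# Client A reads one batch and then aborts its rehydration attempt.
first_batch, token_a = inbox.fetch()

# Client B can still start from the beginning and read every message.
seen_by_b: List[Dict] = []
token_b = 0
while True:
    batch, token_b = inbox.fetch(since=token_b)
    if not batch:
        break
    seen_by_b.extend(batch)

# B finished rehydrating, so B deletes the dehydrated device.
inbox.delete_device()
```

Because fetching never mutates the queue in this sketch, an aborted attempt by one client costs nothing; the explicit delete is the only step that claims the device.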
        Args:
            requester: the user requesting the messages
            device_id: ID of the dehydrated device
            since_token: stream id to start from when fetching messages
Is this a /sync next_batch stream token or a stream id? (I'd expect stream ids to be integers)
    Content-Type: application/json

    {
        "device_id": "dehydrated_device_id",
Confusing: the MSC doesn't have a device_id field on PUT /dehydrated_device.
I explained the reasoning here somewhat: #15929 (comment).
I think using the public key of the device as the device id makes sense here. It's probably something we should be doing across the board from the start for /login as well, considering the various traps we have because a device ID can be assigned to differing device keys.
synapse/rest/client/devices.py
Outdated
    DELETE /org.matrix.msc3814.v1/dehydrated_device
    HTTP/1.1 200 OK

    Content-Type: application/json
    {
        "device_id": "dehydrated_device_id",
    }
Seems sane, but I can't see this in the MSC text.
@@ -283,19 +369,58 @@ async def on_GET(self, request: SynapseRequest) -> Tuple[int, JsonDict]:
        else:
            raise errors.NotFoundError("No dehydrated device available")

    async def on_DELETE(self, request: SynapseRequest) -> Tuple[int, JsonDict]:
I think this endpoint will always be available under /org.matrix.msc2697.v2/dehydrated_device if msc2697 is True, even though it's not defined in that MSC. That is probably fine but I just wanted to call it out!
Hmm, my patch created a separate class to avoid exactly this scenario. @H-Shay did my patch not apply correctly with git am, or why was this changed?
> Hmm, my patch created a separate class to avoid exactly this scenario. @H-Shay did my patch not apply correctly with git am, or why was this changed?
It appears that I incorrectly applied the patch - I believe this is fixed now.
Hmm, git am still wasn't used. That's not directly problematic if you applied the patch correctly, but I rebased the patch onto your branch to make this easier. In the future you should be able to apply such patches by appending .patch to the commit URL and running:
curl https://github.com/matrix-org/synapse/commit/777b3056531ff37c91014abbd029ac079d8d7c25.patch | git am
This will then preserve the diff and commit message as it was provided.
@poljar - I applied the patch via git apply. That being said I think it applied correctly, are you okay with me going ahead and merging this? I know there is a desire to get this in for the RC before tomorrow. - nevermind, just saw the comments below.
synapse/rest/client/devices.py
Outdated
    class PutBody(RequestBodyModel):
        device_data: DehydratedDeviceDataModel
        device_id: Optional[StrictStr]
This looks like we are allowing the clients to specify a dehydrated device ID. Sanity check: is there any risk that Alice and Bob can create dehydrated devices with the same ID, and in doing so clobber each other's devices?
From a quick look at this and the existing source, it seems that dehydrated devices are always keyed off a (user, device) pair, so it should be fine... but it might be worth double checking.
Devices are keyed off a user, yes, so Alice and Bob clashing can't happen. What can happen is that two of Alice's devices try to create a dehydrated device. That is fine: one will overwrite the other, and if they use the public key of the device as the device id, a clash is very unlikely.
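The keying argument can be illustrated with a small sketch. The dictionary and the `store_dehydrated_device` helper are hypothetical stand-ins for the real storage, and the user and device IDs are made up for the example.

```python
from typing import Dict, Tuple

# Hypothetical store keyed by (user_id, device_id), mirroring the discussion above.
dehydrated_devices: Dict[Tuple[str, str], dict] = {}


def store_dehydrated_device(user_id: str, device_id: str, device_data: dict) -> None:
    """Insert or overwrite the entry for this (user, device) pair."""
    dehydrated_devices[(user_id, device_id)] = device_data


# Alice and Bob pick the same device ID: the entries stay separate.
store_dehydrated_device("@alice:example.org", "SHAREDID", {"owner": "alice"})
store_dehydrated_device("@bob:example.org", "SHAREDID", {"owner": "bob"})

# A second upload from Alice with the same ID overwrites only her own entry.
store_dehydrated_device("@alice:example.org", "SHAREDID", {"owner": "alice-2"})
```

Under this assumption the only possible collision is between two uploads from the same user, which matches the "one will overwrite the other" behaviour described in the comment.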
I have taken another look and I don't have any objections. You might want to wait for proper feedback from the other two though---I have minimal context.
@poljar if you are happy with this I think we are good to merge?
I think that this is fine for now. Tested this a bit with a client and it does what we need.
We'll have to iterate on this and the MSC to improve the resumption situation though.
                "Device key(s) not found, these must be provided.",
            )

        # TODO: Those two operations, creating a device and storing the
Do we need to do something about these TODO items? I guess the second one is fine, and the first one can be worked on after this PR?
I was a little confused about what this TODO meant, as storing/creating the device and storing the keys are already two different calls - could you clarify what you meant by this and I can address it in a follow-up PR?
I meant that they are two different calls, but they shouldn't be. We shouldn't have the possibility of creating a dehydrated device that doesn't have keys attached to it.
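One way to make the two steps inseparable is to wrap them in a single database transaction. The sketch below uses the standard-library `sqlite3` module with made-up table names and a hypothetical `create_dehydrated_device` helper; it does not reflect Synapse's actual schema or storage API.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE devices (user_id TEXT, device_id TEXT, "
    "PRIMARY KEY (user_id, device_id))"
)
conn.execute(
    "CREATE TABLE device_keys (user_id TEXT, device_id TEXT, key_json TEXT NOT NULL)"
)


def create_dehydrated_device(
    user_id: str, device_id: str, key_json: Optional[str]
) -> None:
    """Store the device and its keys together, or store neither (hypothetical)."""
    # `with conn` commits on success and rolls back if any statement raises.
    with conn:
        conn.execute("INSERT INTO devices VALUES (?, ?)", (user_id, device_id))
        conn.execute(
            "INSERT INTO device_keys VALUES (?, ?, ?)",
            (user_id, device_id, key_json),
        )


create_dehydrated_device("@alice:example.org", "DEV1", '{"keys": {}}')

try:
    # Missing keys violate NOT NULL, so the device row is rolled back as well.
    create_dehydrated_device("@alice:example.org", "DEV2", None)
except sqlite3.IntegrityError:
    pass

device_ids = [row[0] for row in conn.execute("SELECT device_id FROM devices")]
```

With both inserts in one transaction, a failure while storing keys leaves no key-less device behind, which is the property asked for in the comment above.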
Ah that makes sense, thank you for clarifying!
                errcode=Codes.INVALID_PARAM,
            )

        # if we have a since token, delete any to-device messages before that token
As mentioned already, I think we should not delete any delivered to-device messages for dehydrated devices.
We can do this in a later PR, but I think that this will be crucial to ease the resumption of rehydration and ensure that room keys don't get lost because a device aborted the rehydration step.
I can open a follow-up PR to stop deleting the delivered to-device messages and address the TODOs.
The follow-up PR is here: #16010
Merged #15929 into develop.
No significant changes since 1.89.0rc1.
- Add Unix Socket support for HTTP Replication Listeners. [Document and provide usage instructions](https://matrix-org.github.io/synapse/v1.89/usage/configuration/config_documentation.html#listeners) for utilizing Unix sockets in Synapse. Contributed by Jason Little. (matrix-org#15708, matrix-org#15924)
- Allow `+` in Matrix IDs, per [MSC4009](matrix-org/matrix-spec-proposals#4009). (matrix-org#15911)
- Support room version 11 from [MSC3820](matrix-org/matrix-spec-proposals#3820). (matrix-org#15912)
- Allow configuring the set of workers to proxy outbound federation traffic through via `outbound_federation_restricted_to`. (matrix-org#15913, matrix-org#15969)
- Implement [MSC3814](matrix-org/matrix-spec-proposals#3814), dehydrated devices v2/shrivelled sessions and move [MSC2697](matrix-org/matrix-spec-proposals#2697) behind a config flag. Contributed by Nico from Famedly, H-Shay and poljar. (matrix-org#15929)
- Fix a long-standing bug where remote invites weren't correctly pushed. (matrix-org#15820)
- Fix background schema updates failing over a large upgrade gap. (matrix-org#15887)
- Fix a bug introduced in 1.86.0 where Synapse starting with an empty `experimental_features` configuration setting. (matrix-org#15925)
- Fixed deploy annotations in the provided Grafana dashboard config, so that it shows for any homeserver and not just matrix.org. Contributed by @wrjlewis. (matrix-org#15957)
- Ensure a long state res does not starve CPU by occasionally yielding to the reactor. (matrix-org#15960)
- Properly handle redactions of creation events. (matrix-org#15973)
- Fix a bug where resyncing stale device lists could block responding to federation transactions, and thus delay receiving new data from the remote server. (matrix-org#15975)
- Better clarify how to run a worker instance (pass both configs). (matrix-org#15921)
- Improve [the documentation](https://matrix-org.github.io/synapse/v1.89/admin_api/user_admin_api.html#login-as-a-user) for the login as a user admin API. (matrix-org#15938)
- Fix broken Arch Linux package link. Contributed by @SnipeXandrej. (matrix-org#15981)
- Remove support for calling the `/register` endpoint with an unspecced `user` property for application services. (matrix-org#15928)
- Mark `get_user_in_directory` private since it is only used in tests. Also remove the cache from it. (matrix-org#15884)
- Document which Python version runs on a given Linux distribution so we can more easily clean up later. (matrix-org#15909)
- Add details to warning in log when we fail to fetch an alias. (matrix-org#15922)
- Remove unneeded `__init__`. (matrix-org#15926)
- Fix bug with read/write lock implementation. This is currently unused so has no observable effects. (matrix-org#15933, matrix-org#15958)
- Unbreak the nix development environment by pinning the Rust version to 1.70.0. (matrix-org#15940)
- Update presence metrics to differentiate remote vs local users. (matrix-org#15952)
- Stop reading from column `user_id` of table `profiles`. (matrix-org#15955)
- Build packages for Debian Trixie. (matrix-org#15961)
- Reduce the amount of state we pull out. (matrix-org#15968)
- Speed up updating state in large rooms. (matrix-org#15971)
* Bump anyhow from 1.0.71 to 1.0.72. (matrix-org#15949)
* Bump click from 8.1.3 to 8.1.6. (matrix-org#15984)
* Bump cryptography from 41.0.1 to 41.0.2. (matrix-org#15943)
* Bump jsonschema from 4.17.3 to 4.18.3. (matrix-org#15948)
* Bump pillow from 9.4.0 to 10.0.0. (matrix-org#15986)
* Bump prometheus-client from 0.17.0 to 0.17.1. (matrix-org#15945)
* Bump pydantic from 1.10.10 to 1.10.11. (matrix-org#15946)
* Bump pygithub from 1.58.2 to 1.59.0. (matrix-org#15834)
* Bump pyo3-log from 0.8.2 to 0.8.3. (matrix-org#15951)
* Bump sentry-sdk from 1.26.0 to 1.28.1. (matrix-org#15985)
* Bump serde_json from 1.0.100 to 1.0.103. (matrix-org#15950)
* Bump types-pillow from 9.5.0.4 to 10.0.0.1. (matrix-org#15932)
* Bump types-requests from 2.31.0.1 to 2.31.0.2. (matrix-org#15983)
* Bump typing-extensions from 4.5.0 to 4.7.1. (matrix-org#15947)
Signed-off-by: Nicolas Werner <[email protected]> Co-authored-by: Nicolas Werner <[email protected]> Co-authored-by: Nicolas Werner <[email protected]> Co-authored-by: Hubert Chathi <[email protected]>
This PR builds on the work at #13581 by @nico-famedly.
I made some changes and added some tests. There was some discussion on the spec proposal as to whether fetching events should implicitly delete the dehydrated device - I did implement this, but only after all the to-device messages are fetched. For ref here is the original MSC. Note that the implementation has diverged somewhat from the MSC text.