MSC2783: Homeserver Migration Data Format #2783
Conversation
ooh, interesting - thanks for starting this. Heads up though that the chances are high that the core team ends up sinking time into decentralised accounts (#915, as unlocked by MSC #1228), which then solves the migration problem (at least on a per-user basis, but it could obviously be automated for a server-wide migration), and which in turn is required for P2P, and also solves server scalability, vhosting, HA and geo-HA requirements...
@ara4n, I seriously want to emphasise that I see decentralized user accounts as a non-solution for the problems this MSC is trying to solve: backing up, easy sysadmin migration, and safekeeping of homeserver state, while also avoiding potential lock-in. This is about lateral movement on a software/admin level, not lateral movement on a user level. This sentiment is outlined in the Values of the foundation, which I plan to adhere to, and so I think it's important that this problem is solved before it ever comes up. I understand the priorities the core team has, but I wanted to repeat my views on matrix-org/matrix-spec#246, as I saw this proposal being dismissed in favour of it, which I don't see as helpful. (Note: I currently can't participate in …)
Judging by the strength of your reaction I may be missing something :) In an MSC1228 world, any server which hosts the private part of your user_key can participate in a room on behalf of that user. So if you were a server admin migrating from synapse to dendrite or whatever, you would copy all your user_keys over to the new server, and it would participate in the rooms as that user (and replicate over account_data somehow - probably by representing it as a room). The details are a bit fuzzy given they haven't been fleshed out yet, but it's the same mechanism by which a P2P user would log into two instances as the same user. To complete the migration, the old server would then be turned off.

In other words: the migration is simplified to just copying keys, and then Matrix replicates the rest of the data over... via Matrix, obviating the need to specify and maintain a new interchange format in addition to core Matrix itself.

That said, there have been some concerns voiced about giving servers freestyle responsibility for the user_key - in an E2EE-by-default world you could argue the user should look after their user_key themselves and sign which servers are allowed to host their account. If this came to pass, then sysadmins would not be able to force-migrate their users (which would be a pain both for this use case, as well as for vhosting & HA purposes), and an interchange format would be more important.

Surely you agree that if we can solve both use cases with one solution, we should do so to avoid the spec getting too sprawling?
Sorry for the strong wording, I simply wanted to make something clear up-front.

I do agree about spec sprawl if this were included, but I think this proposal counters some other key problems: domain hot-potato, consistent backup format, and atomic and reversible migration. I don't see these problems being solved if two servers exchange data via P2P upon live migration: both servers would need different domains (though if the server part is a pubkey, it's essentially the same problem, because clients or links could be keyed to that server, which then no longer exists, unless some other spec abstracts away server parts even further to provide an interface for selection and redirection), and the latent need for a backup format (instead of just backing up application data) still isn't fulfilled. There is also the possibility of "dead links" in the form of room aliases currently not being resolvable because the old server is down and the new server lives on an unknown subdomain.

I'm just saying this to counter some of your arguments; I agree that the spec should be kept as small as possible, but I don't think a P2P framework overlaying this will fix most things. Please correct me if I make assumptions that are just plain wrong, though. I have yet to completely drill down to the core of the P2P ideas presented by matrix.org over the years, so I might be missing a crucial piece that indeed makes this much more feasible.
I think part of the problem here is that we don't have a full proposal for decentralised accounts yet. It is not at all p2p specific though: the intention is to find a solution that lets both server admins and end-users pick a set of servers to host the users' accounts, rather than just being stuck with one server per account. The user's account would then be replicated between the servers via normal Matrix (nothing P2P-specific at all). Concretely, one strawman approach (which I sketched out a few months ago but haven't put into a proper MSC yet, given MSC1228 is on the critical path first) would be:
...at which point I think you have decentralised accounts, without any P2P voodoo beyond switching mxids to be pubkeys (MSC1228), using normal Matrix to replicate the data around. This could be used equally for migrating between servers on the same domain, for balancing users within servers on the same domain, for users migrating between domains, or indeed for users balancing themselves between domains.

Now, it'd be easy to write this all off as scifi, but we have a really urgent need for it for P2P to work, as well as to support synapse->dendrite migrations etc, where we are not planning to spend any time on speccing interchange formats, but instead to charge off and try to get the nirvana of decentralised accounts working. That said, that's just the current proposed direction for the core team, and we're very very happy to sanity-check it and consider alternatives :)
Looking at that strawman approach, I think I would need to dive a little deeper before I could have an argument on that, and would also need some more details, but I see how that could indeed transparently migrate stuff (and effectively make servers temporary valets for users' account authority). I don't want to clutter this MSC with responses to that; is there any channel I can join to follow this development? I have some vague concerns, but I think I need some extra information before I could voice them correctly. (Particularly: administrative complexity, "what is happening"-obscurity to the user, and how to communicate it simply, effectively, and correctly.)
I'm going to try to spend time unlocking this issue, transitioning it from draft status to full status, and then I'll probably see to making PoCs for synapse to reliably (per major version) extract data from the database into the format described here.
There are still a few problems with this, and I think I haven't yet enumerated everything, but I consider this solid enough for now. I'll be trying to find time to make a PoC exporter/importer from Synapse 1.29.0 to this format, updating the format with my findings and challenges as I go along. There are some points left where I'm explicitly asking for feedback with …
I recognise this is a gargantuan effort, and that for some parts of it it isn't even clear whether they should exist in the spec at all, so for now I'll take the ideas in here and put them into a personal repository with which I'll experiment and research further. I'll get a good, tested PoC going for some major Matrix homeservers, while also thinking about which parts of this should be applicable to the spec, if any at all. For now, I'll say that I'm suspending this MSC until I've decided what should go into it. Thanks for some of the feedback I've gotten that made me realise this; I'll take it into consideration.

Edit: I made a room for this, you can discuss it on …
That is a data protection feature, as there is no in-band way of knowing that a server would not change operators during a migration.
## General Structure

The proposal mainly defines a directory structure; this directory structure can be captured in ZIP files,
RAR files, `.tar.gz` files, or any other sort of archival or indexable directory "target".
Nit: Just pick one. Choice is bad for interoperability. I would probably start with `.tar.gz`, because gzip is good for text and tar+compression has cross-file compression, which will probably be helpful here.
All of the above formats can be interpreted as a file system directory structure, and I'm not going to mandate which one to pick, but rather allow any sort of "input" to the importer.
The point is that a user will move from host $X to host $Y and won't be able to upload their data because $X exported a zipfile and $Y only accepts tarballs. It is better for the ecosystem to define the best supported export format to avoid this issue. For technical users and servers that accept many common formats this may not be an issue but I really don't see much benefit in providing choice here.
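For illustration, a minimal sketch of the "any indexable directory target" reading, using Python's standard `zipfile`/`tarfile` modules (the function name is hypothetical, not from the MSC) — an importer can treat either packaging as the same logical directory tree:

```python
import tarfile
import zipfile

def list_export_members(path: str) -> list[str]:
    """Return the member paths of an export archive, whatever its packaging."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    # tarfile opens .tar, .tar.gz, .tar.bz2, etc. transparently with "r:*".
    with tarfile.open(path, mode="r:*") as tf:
        return tf.getnames()
```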
claim a specifier with this prefix (such as MSCs and custom implementations).

However, when processing a manifest, *all* items prefixed with `m.` MUST be processed or otherwise handled;
when an importer encounters an `m.`-prefixed item specifier it does not understand, it must abort the import process.
This seems strong. What if an old deprecated feature is not supported? I think ideally the operator would be notified and given the choice.
I think the spec isn't the right place to decide that some imports MUST be aborted. In general for transfers like this "as much as possible with a report of failures" is what I want to see.
"As much as possible" is lossy, I explicitly don't want a lossy import, the m.
-prefixed keys mandate core data structures, ones that cannot be ignored, such as events, keys, login details, and other such things.
Any other key is best-effort, in that namespace you can add impl-specific config or data, and there it's free game, but not with core data.
But lossy is often better than nothing. I agree that it isn't the best option, but who are we to decide? I would rather let the user decide what they need at that moment. I think we could mandate something like "MUST notify the operator that the import was not successful" but still allow the operator to accept the lossy import. Maybe only one room has an unknown event and I am fine just dropping that room.

To be honest it probably doesn't matter too much, because the tools will just ignore this part of the spec, but I would rather not require this in the first place.
Core data is core data; if you lose out on core data, no matter what it is, your homeserver will still have scars, so no. Lossiness will only apply to the non-`m.`-prefixed areas, because non-`m.` is "best effort only", but missing out on an `m.` item is unacceptable (you wouldn't want user accounts to be malformed when imported, or entirely missing, would you?).
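As a concrete reading of the abort rule, a minimal sketch — the manifest shape and the supported-item set are hypothetical stand-ins (only `m.events` is named in this diff; `m.users` is a guessed name for the user-details item):

```python
# Hypothetical set of core item specifiers this particular importer implements.
SUPPORTED_CORE_ITEMS = {"m.events", "m.users"}

def validate_manifest(items: dict) -> None:
    """Apply the MUST-abort rule to a manifest's item specifiers."""
    for specifier in items:
        if specifier.startswith("m."):
            if specifier not in SUPPORTED_CORE_ITEMS:
                # Core data must not be silently dropped: abort the import.
                raise RuntimeError(f"unknown core item specifier: {specifier}")
        else:
            # Non-m. namespaces carry impl-specific data and are best-effort;
            # unknown ones can be skipped (ideally with a warning).
            continue
```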
It owns the directory `m.events/`, and files `events.*.cbor`.

The files contain CBOR-encoded mappings of room ID -> array of events.
The spec is all JSON except for #3079 which isn't accepted yet. Should we just stick with JSON for now to ensure that there are no ambiguities?
JSON is way too sparse a data structure, costly without reason; CBOR is a direct alternative without compromises.

Though I'm probably going to change this to SQLite databases, because CBOR files of this type don't have iterative parsers/consumers for every language out there, and I want memory usage to be low, not have a 1 GB CBOR file cause a 2 GB memory spike on top of it.
The compromise is that the spec is defined in JSON, and any incompatibility can be a source of issues.

I think CBOR is fine, but it makes sense to consider the alternatives. I am curious if you have a size comparison between compressed JSON and CBOR. I suspect there isn't that much of a difference in size. (Parse time is maybe more of a concern, however I suspect that disk IO is a bigger bottleneck in most cases.)

I am hesitant about requiring an implementation as the format. What about the simple alternative of newline-separated JSON, which can be parsed with low memory usage?
Directly mapping JSON to CBOR has minimal size advantages. For a 352-byte standard event:

```json
{"type":"m.room.member","sender":"@username:matrix.org","content":{"avatar_url":"mxc://matrix.org/FLOIFFxSxJVrtqHqKJAvaZGNRJ","membership":"join","displayname":"Username"},"event_id":"$15943751151sWhiO:matrix.org","unsigned":{"age":14,"replaces_state":"$15943751120CQQql:matrix.org"},"state_key":"@username:matrix.org","origin_server_ts":1594375115332}
```
just for curiosity's sake, what about gzipped CBOR?
254 bytes.
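For anyone wanting to reproduce numbers like these, a sketch using Python's standard `json`/`gzip` modules and the third-party `cbor2` library (exact byte counts depend on key order and encoder settings):

```python
import gzip
import json

import cbor2  # third-party: pip install cbor2

# The 352-byte example event from the comment above, kept compact.
raw = '{"type":"m.room.member","sender":"@username:matrix.org","content":{"avatar_url":"mxc://matrix.org/FLOIFFxSxJVrtqHqKJAvaZGNRJ","membership":"join","displayname":"Username"},"event_id":"$15943751151sWhiO:matrix.org","unsigned":{"age":14,"replaces_state":"$15943751120CQQql:matrix.org"},"state_key":"@username:matrix.org","origin_server_ts":1594375115332}'
event = json.loads(raw)

as_json = json.dumps(event, separators=(",", ":")).encode()
as_cbor = cbor2.dumps(event)

print(len(as_json))                 # ~352 bytes, per the comment above
print(len(as_cbor))                 # somewhat smaller than the JSON
print(len(gzip.compress(as_cbor)))  # ~254 bytes, per the comment above
```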
The files contain CBOR-encoded mappings of user ID -> user details.

A user ID key mapping MUST only exist *once* across all files.
Why? A server can have a lot of user info and it may be useful to split it into multiple files to enable parallel dumping/loading. For example, imagine a bot user that is in many thousands of rooms. This also seems inconsistent with the `m.rooms` export.
User details contain information about account data and keys and such; user membership is in the state events given with the regular "event data", and a server can/should reconstruct membership based on that data, like the old server can.

If that data is over one gigabyte, then that's partially on the user and the admin. I don't expect to be handling a gigabyte of account data, but even if that's the case, I don't want to prescribe hell to the migration tools by splitting and merging keys across multiple files.
I would rather prescribe "hell" than have the import fail. For a lot of lower-end devices, slow is acceptable.

You are right that rooms are not in this file, but I see that `room_tags` are, which can still be unbounded in size. I guess a future change could define an alternate format for those that is size-bounded?
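To make the uniqueness rule concrete, a minimal sketch of an importer-side check (the decoded `{user_id: user_details}` shape follows the diff above; the function name is hypothetical):

```python
def check_unique_user_mappings(user_files: list[dict]) -> None:
    """Reject an export where any user ID appears in more than one file."""
    seen: set[str] = set()
    for mapping in user_files:  # each file decodes to {user_id: user_details}
        for user_id in mapping:
            if user_id in seen:
                raise ValueError(f"user mapped more than once: {user_id}")
            seen.add(user_id)
```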
XXX: Need expertise for this, I don't know how much or what specifically I should or could capture here.

## Potential issues
Import object size.

- Large files may be hard for servers to manage. Ideally we would allow non-streaming parsing to be viable for ease of client implementations. Maybe we need to limit files to be <16MiB or something? (Note that the in-memory representation of CBOR/JSON can be multiple times larger than the serialized representation.)
- Large files that must be parsed sequentially may prevent parallelism and cause unacceptably slow imports. For example, large files that are a single JSON or CBOR object cannot be parsed in parallel. Maybe we should consider a format such as JSON objects on separate lines, which can be parsed in parallel. (That being said, tar archives can't fully be parsed in parallel anyways, but they can be streamed and the import can be parallelized.)
- Yes, large object files are bad, but many small files are also bad, as that is relatively heavy on the file system, so I decided to experiment with SQLite for the time being.
- The parallelisation needs to happen at both ends, and I don't know if a server importer could support importing multiple events from multiple timelines at once, maybe "per room", but even then, for the stateres to be efficient, it has to happen sequentially per room.
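A minimal sketch of what that SQLite experiment could look like (the schema and function names are hypothetical, not part of the MSC): one indexable file per export item, writable and readable row-by-row on both ends:

```python
import sqlite3

def open_events_db(path: str) -> sqlite3.Connection:
    """Open (or create) a hypothetical per-item events database."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "  room_id TEXT NOT NULL,"
        "  event_json TEXT NOT NULL)"
    )
    return con

def iter_room_events(con: sqlite3.Connection, room_id: str):
    # Cursor iteration streams rows; the whole table is never held in memory.
    for (event_json,) in con.execute(
        "SELECT event_json FROM events WHERE room_id = ?", (room_id,)
    ):
        yield event_json
```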
I would rather err on the side of many small files than many large files. Small files are slow; large files fail. I agree that the ideal spot is somewhere in the middle, which is why it makes sense to allow batching things into files but splitting them at sensible points.

Right now all of the files appear to be single JSON objects, which can be very memory-demanding to parse (unless you are using streaming parsing, which is rare). A mitigation would be to ensure that the file format can be naturally parsed in a streaming manner, such as JSON-lines or SQLite.

I agree that both ends must be able to be parallelized. I'm not sure if stateres can be parallelized, but I wouldn't want the import format to be blocking that. I can also imagine it would be possible to do a bulk parallel import and then run a stateres which may be able to avoid outdated work, and may be significantly faster than doing a serial import, even at the per-room level. For human chats the size of an individual room shouldn't be too bad, but Matrix does try to be general in what rooms are for.
> The parallelisation needs to happen at both ends, and I don't know if a server importer could support importing multiple events from multiple timelines at once, maybe "per room", but even then, for the stateres to be efficient, it has to happen sequentially per room.
If you treat the import as a bulk data input, then you can do the state resolution after completing the import. Do not couple the import and processing.
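As a sketch of that decoupling (`store_event` and `resolve_room_state` are hypothetical stand-ins for server internals, and the JSON-lines input follows the streaming suggestion above):

```python
import json
from typing import Iterator

def iter_events(path: str) -> Iterator[dict]:
    """Stream one event per line; memory is bounded by the largest event."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)

def bulk_import(path: str) -> set[str]:
    """First pass: raw inserts only, no state resolution yet."""
    touched_rooms: set[str] = set()
    for event in iter_events(path):
        store_event(event)  # hypothetical: opaque bulk insert
        touched_rooms.add(event["room_id"])
    return touched_rooms

def finish_import(touched_rooms: set[str]) -> None:
    """Second pass: one state-resolution run per affected room."""
    for room_id in touched_rooms:
        resolve_room_state(room_id)  # hypothetical stateres entry point
```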
This still seems extremely useful (and necessary) in the current day.
Oh wow, neat. Looking forward to the day when I'll be able to just move between server implementations when it turns out another one fits the use case better.
Rendered
Related to #2760
Signed-off-by: Jonathan de Jong <[email protected]>