Conversation
Force-pushed from 3bc7926 to f8d8b58.
```haskell
federationRemotesAPI :: ServerT BrigIRoutes.FederationRemotesAPI (Handler r)
federationRemotesAPI =
  Named @"get-federation-remotes" (lift $ FederationDomainConfigs <$> wrapClient Data.getFederationRemotes) -- TODO: get this from TVar! also merge in config file!
```
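The TODO could be resolved along these lines. This is only a sketch: `getRemotesCached`, the polymorphic element type, and the merge with the config-file entries are assumptions, not code from this branch.

```haskell
import Control.Concurrent.STM (TVar, readTVarIO)
import Data.List (nub)

-- Hypothetical sketch: serve the remotes from an in-memory cache (kept in
-- sync with cassandra elsewhere) instead of querying cassandra on every
-- request, and merge in the entries from brig's config file.
getRemotesCached ::
  Eq a =>
  TVar [a] -> -- cache of the cassandra table
  [a] ->      -- remotes from brig's config file
  IO [a]
getRemotesCached cache fromConfigFile = do
  fromDb <- readTVarIO cache
  pure (nub (fromConfigFile <> fromDb))
```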
I don't see the code for this, but perhaps this would require a full-table scan, which Cassandra doesn't like. If we have to list all the domains, perhaps we should store them differently.
Is a full table scan bad even if the table has <100 entries? Then I'm not sure what the alternative is...
It can be done, but it needs some clever, non-obvious techniques like this: https://nblair.github.io/2017/02/16/cassandra-full-table-scan/
An alternative is to store the list in 1 row with id = 1. I think we already do this somewhere.
Then have a separate table to remember all the configuration for each endpoint. This also has the benefit of not forgetting preferences for a remote even if you stop federating with them.
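The single-row idea could look roughly like this, in the cql style brig already uses. The schema, table name, and `id = 1` convention are all assumptions, and the helpers (`retry`, `x1`, `query1`, `params`) are the ones wire-server's Cassandra module provides:

```haskell
-- Hypothetical schema (not in this PR):
--   CREATE TABLE brig.federation_remotes (id int PRIMARY KEY, domains set<text>);
-- With the whole list in one partition, reading it is a single-key lookup
-- rather than a full-table scan.
getRemotes :: MonadClient m => m [Text]
getRemotes =
  maybe [] (fromSet . runIdentity)
    <$> retry x1 (query1 sel (params LocalQuorum ()))
  where
    sel :: PrepQuery R () (Identity (Set Text))
    sel = "SELECT domains FROM federation_remotes WHERE id = 1"
```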
If we use RabbitMQ with mqtt, we can use retain flags to hold onto the last value sent for domain updates. When the clients connect, they will receive this message and can immediately update themselves to the latest state.
https://nblair.github.io/2017/02/16/cassandra-full-table-scan/ says it's fine to pull 10k rows in one query, so wouldn't c553c0e be a good solution?
Perhaps we don't need to worry about this, as federation is not seriously used by anyone yet, so causing a bit of breakage while someone makes an API call to load the list into the DB is acceptable? If so, we can get rid of this whole thing.
I am assuming we're looking at storing this in
We could make everybody poll once every second, but I have a feeling that using a queue isn't much harder, just a lot faster. Come to think of it, we probably need to make sure that we don't have an (unlikely) race condition where pod A writes to cassandra, then pod B writes to cassandra, then pod B notifies everybody of the update, then pod A does. Hmm. Would that be resolved by polling? I guess so.
I meant what is the point of the
From my talks with Matthias, using a message queue simplifies propagating the domain update notice to all services that need it. Since the various services can only talk within their own node, coordination is hard. Having a system that sits outside of these VMs, like the databases, and can fan out messages would be greatly helpful. The K8S pods wouldn't have to know how many other pods are running, or where they are running; all updates for federation domains can be passed through an MQ broker that handles fanout for us.
I think @akshaymankar you are suggesting that we do a REST API request to brig plus a table lookup every time the current code does an MVar lookup in federator or brig. Yes, that could be done, but wouldn't that add a few (10s? 100s?) milliseconds to every remote request or notification? I'm not aware that we have specified acceptable lag limits, but I feel it's easier to get this right from the beginning, and doing lots of API requests isn't that much easier anyway. Instead of using a rabbit queue we could also make every pod poll the table via REST once a second or so, caching the result in a TVar. This is certainly slower during changes of the connection graph, but that's a rare operation and we need to allow for race conditions there anyway. I still like the fastest option best. @akshaymankar do you really think it'll be expensive to implement? Maybe discuss on the phone Monday?
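The polling alternative described here is small enough to sketch. `fetchRemotes` stands in for the REST call to brig and the element type is left polymorphic; both are assumptions for illustration:

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM (TVar, atomically, writeTVar)
import Control.Monad (forever)

-- Sketch of the polling fallback: every pod refreshes its TVar cache roughly
-- once a second. 'fetchRemotes' is a placeholder for the REST call to brig.
pollRemotes :: IO a -> TVar a -> IO ()
pollRemotes fetchRemotes cache = forever $ do
  remotes <- fetchRemotes
  atomically (writeTVar cache remotes)
  threadDelay 1000000 -- one second, in microseconds
```

With this approach there is no broker to operate, at the cost of up to a second of staleness after a connection-graph change.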
From the discussion in JIRA I guess y'all already have a decision; let me know if you still want to have a call.
Closing in favor of #3260 (which contains this branch in full).
https://wearezeta.atlassian.net/browse/FS-1115
Store federation remotes in cassandra in brig, in the same form as the remote search policies in the config file.
The idea is that this will become the source of truth, for all services, of which nodes we federate with, and that it can be dynamically updated (via internal endpoints for now, and possibly by events like "node down for too long" in the future).
It's not ideal that we store this in brig, but federator doesn't have a cassandra instance.
[Not in this PR yet] We're also working on a TVar, maintained by all federators and other interested services (including brig), that keeps a cache of the cassandra table, plus a RabbitMQ broadcast queue that propagates updates to the brig pod siblings that haven't received them from the operator, and to the other services. The federator config will be simplified to no longer contain a list of federating nodes; federators will also get them from brig.
Migration: my current idea is to always consider the union of brig's config file and the cassandra table as the global truth. This way the operator can safely remove the list from the config file once the cassandra table has been updated (on the next deployment or so). The downside is that removing edges that are still in the config file will silently fail (we could make it fail loudly, of course).
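The union rule for migration can be sketched as a pure function; the name `effectiveRemotes` and the `Ord` constraint are assumptions for illustration:

```haskell
import qualified Data.Set as Set

-- Sketch of the migration rule: the effective remotes are the union of the
-- config-file entries and the cassandra table. Note that an entry removed
-- from the config file alone stays effective as long as it is still in
-- cassandra; that is the "silent failure" the description mentions.
effectiveRemotes :: Ord dom => [dom] -> [dom] -> [dom]
effectiveRemotes fromConfigFile fromCassandra =
  Set.toList (Set.fromList fromConfigFile `Set.union` Set.fromList fromCassandra)
```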
Checklist
changelog.d