
Dynamic federation remotes #3235

Closed
fisx wants to merge 8 commits into develop from dynamic-federator-remotes

Conversation

@fisx (Contributor) commented Apr 19, 2023

https://wearezeta.atlassian.net/browse/FS-1115

Store federation remotes in cassandra in brig, in the same form as the remote search policies in the config file.

The idea is that this will become the source of truth for all services as to which nodes we federate with, and that it can be dynamically updated (via internal endpoints for now, and possibly by events like "node down for too long" in the future).

It's not ideal that we store this in brig, but federator doesn't have a cassandra instance.

[not on this PR yet] We're also working on a TVar maintained by all federators and other interested services (including brig) that keeps a cache of the cassandra table, and a RabbitMQ broadcast queue that propagates updates to the brig pod siblings that haven't received them from the operator, and to the other services. The federator config will be simplified to no longer contain a list of federating nodes; federators will also get them from brig.

Migration: my current idea is to always consider the union of brig's config file and the cassandra table the global truth. That way the operator can safely remove the list from the config file after the cassandra table has been updated (next deployment or so). The downside is that removing edges that are in the config file will silently fail (we could make it fail loudly, of course).
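A minimal sketch of those union semantics (`Domain` and `effectiveRemotes` are illustrative names here, not the actual wire-server types):

```haskell
import qualified Data.Set as Set
import Data.Set (Set)

-- Hypothetical stand-in for wire-server's Domain type.
newtype Domain = Domain String
  deriving (Eq, Ord, Show)

-- The proposed migration semantics: the effective set of federation
-- remotes is the union of the config-file list and the cassandra table.
-- A consequence (as noted above) is that deleting a remote only takes
-- effect once it is absent from *both* sources.
effectiveRemotes :: [Domain] -> [Domain] -> Set Domain
effectiveRemotes fromConfig fromCassandra =
  Set.fromList fromConfig `Set.union` Set.fromList fromCassandra
```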

Checklist

  • Add a new entry in an appropriate subdirectory of changelog.d
  • Read and follow the PR guidelines

@zebot zebot added the ok-to-test Approved for running tests in CI, overrides not-ok-to-test if both labels exist label Apr 19, 2023
@fisx fisx changed the title Dynamic federator remotes Dynamic federation remotes Apr 20, 2023
@fisx fisx force-pushed the dynamic-federator-remotes branch from 3bc7926 to f8d8b58 on April 20, 2023 10:06

```haskell
federationRemotesAPI :: ServerT BrigIRoutes.FederationRemotesAPI (Handler r)
federationRemotesAPI =
  Named @"get-federation-remotes" (lift $ FederationDomainConfigs <$> wrapClient Data.getFederationRemotes) -- TODO: get this from TVar! also merge in config file!
```
Member:

Don't see the code for this, but perhaps this would require a full-table scan, which Cassandra doesn't like. If we have to list all the domains, perhaps we should store them differently.

Contributor Author:

A full table scan is bad even if the table is <100 entries? Then I'm not sure what the alternative is...

Member:

It can be done, but needs some clever, not obvious things like this: https://nblair.github.io/2017/02/16/cassandra-full-table-scan/

Member:

An alternative is to store the list in 1 row with id = 1. I think we already do this somewhere.
Then have a separate table to remember all the configuration for each endpoint. This also has the benefit of not forgetting preferences for a remote even if you stop federating with them.

Contributor:

If we use RabbitMQ with MQTT, we can use retain flags to hold onto the last value sent for domain updates. When clients connect, they will receive this message and can immediately update themselves to the latest state.

Contributor Author:

https://nblair.github.io/2017/02/16/cassandra-full-table-scan/ says it's fine to pull 10k rows in one query. so wouldn't c553c0e be a good solution?

@akshaymankar (Member)

> Migration: my current idea is to always consider the union of brig's config file and the cassandra table the global truth. That way the operator can safely remove the list from the config file after the cassandra table has been updated (next deployment or so). The downside is that removing edges that are in the config file will silently fail (we could make it fail loudly, of course).

Perhaps we don't need to worry about this, as federation is not very seriously used by anyone, so causing a bit of breakage while someone makes an API call to load the list into the DB is acceptable? If so, we can get rid of this whole thing.

@akshaymankar (Member)

> [not on this PR yet] We're also working on a TVar maintained by all federators and other interested services (including brig) that keeps a cache of the cassandra table, and a RabbitMQ broadcast queue that propagates updates to the brig pod siblings that haven't received them from the operator, and to the other services. The federator config will be simplified to no longer contain a list of federating nodes; federators will also get them from brig.

I am assuming we're looking at storing this in a TVar and using RabbitMQ to propagate changes in order to reduce the latency of checking whether or not a domain is allowed to be federated with. Do you have a ballpark estimate of how much we're saving by doing this? To me this sounds like a lot of upfront optimization, or perhaps I am missing something.

@fisx (Contributor Author) commented Apr 20, 2023

> [not on this PR yet] We're also working on a TVar maintained by all federators and other interested services (including brig) that keeps a cache of the cassandra table, and a RabbitMQ broadcast queue that propagates updates to the brig pod siblings that haven't received them from the operator, and to the other services. The federator config will be simplified to no longer contain a list of federating nodes; federators will also get them from brig.

> I am assuming we're looking at storing this in a TVar and using RabbitMQ to propagate changes in order to reduce the latency of checking whether or not a domain is allowed to be federated with. Do you have a ballpark estimate of how much we're saving by doing this? To me this sounds like a lot of upfront optimization, or perhaps I am missing something.

We could make everybody poll once every second, but I have a feeling that using a queue isn't much harder, just a lot faster.

Come to think of it, we probably need to make sure that we don't have an (unlikely) race condition where pod A writes to cassandra, then pod B writes to cassandra, then pod B notifies everybody of the update, and then pod A does. Hmm. Would that be resolved by polling? I guess so.
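One way to defuse that race (a sketch under assumed names, not the actual wire-server types): stamp each cassandra write with a monotonically increasing version, carry it in the broadcast, and have receivers keep whichever cache is newer, so a delayed broadcast from an earlier write cannot clobber a later one.

```haskell
-- Hypothetical cache entry for the federation remotes.
data Cache = Cache
  { cacheVersion :: Int      -- monotonically increasing, bumped per write
  , cacheRemotes :: [String] -- the federation remotes as of that version
  }

-- Pure resolution step; in brig/federator it would run inside
-- `atomically . modifyTVar'` on the shared TVar when a broadcast arrives.
newerOf :: Cache -> Cache -> Cache
newerOf current update
  | cacheVersion update > cacheVersion current = update
  | otherwise = current
```

Polling sidesteps the problem differently: each poll reads the full current table, so a stale broadcast order can't leave the cache permanently wrong.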

@akshaymankar (Member)

> We could make everybody poll once every second, but I have a feeling that using a queue isn't much harder, just a lot faster.

I meant what is the point of the TVar, everybody who needs to know can just make an internal call to brig? How slow is that and is it too slow? As in do you have an SLO in mind?

@lepsa (Contributor) commented Apr 21, 2023

From my talks with Matthias, using a message queue simplifies propagating the domain update notice to all services that need it. Since the various services can only talk within their node, coordination is hard. Having a system outside of these VMs, like the databases, that can fan out messages would be a great help. The K8s pods wouldn't have to know how many other pods are running, or where they are running; all updates for federation domains can be passed through an MQ broker that handles the fan-out for us.

@fisx (Contributor Author) commented Apr 21, 2023

> From my talks with Matthias, using a message queue simplifies propagating the domain update notice to all services that need it. Since the various services can only talk within their node, coordination is hard. Having a system outside of these VMs, like the databases, that can fan out messages would be a great help. The K8s pods wouldn't have to know how many other pods are running, or where they are running; all updates for federation domains can be passed through an MQ broker that handles the fan-out for us.

I think @akshaymankar you are suggesting that we do a REST API request to brig plus a table lookup every time the current code does an MVar lookup in federator or brig. Yes, that could be done, but wouldn't that add a few (10s? 100s of?) milliseconds to every remote request or notification? I'm not aware that we have specified acceptable lag limits, but I feel it's easier to get this right from the beginning; doing lots of API requests isn't that much easier.

Instead of using a rabbit queue, we could also make every pod poll the table via REST once a second or so, caching the result in a TVar. This is certainly slower during changes of the connection graph, but that's a rare operation and we need to allow for race conditions there anyway.
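The polling variant could be sketched roughly like this (`fetchRemotes` stands in for the actual internal brig call, and the domain list is simplified to `[String]`):

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)

-- Refresh the shared TVar cache from brig once.
refreshOnce :: IO [String] -> TVar [String] -> IO ()
refreshOnce fetchRemotes cache = do
  remotes <- fetchRemotes
  atomically (writeTVar cache remotes)

-- Poll roughly once a second; lookups elsewhere in the pod read the
-- TVar and never pay the REST round-trip.
pollLoop :: IO [String] -> TVar [String] -> IO ()
pollLoop fetchRemotes cache = forever $ do
  refreshOnce fetchRemotes cache
  threadDelay 1000000 -- one second
```

The loop would run in a background thread per pod; the cost of a stale cache is bounded by the poll interval.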

I still like the fastest option best. @akshaymankar do you really think it'll be expensive to implement? Maybe discuss on the phone Monday?

@akshaymankar (Member) commented Apr 24, 2023

> I still like the fastest option best. @akshaymankar do you really think it'll be expensive to implement? Maybe discuss on the phone Monday?

From the discussion in JIRA I guess y'all already have a decision; let me know if you still want to have a call.

@fisx (Contributor Author) commented May 1, 2023

Closing in favor of #3260 (which contains this branch in full).

@fisx fisx closed this May 1, 2023