[FS-1940] Start sending backend notifications through RabbitMQ and consuming them #3276
akshaymankar merged 54 commits into develop
Conversation
Send onUserDeleteConnections notification using rabbitmq
Also introduce App Monad
> The above are the default values (except for secrets, which do not have
> defaults); if they work, they are not required to be configured.
> `background-worker.config.remoteDomains` should contain all the remote domains
this will change again with #3260, but let's worry about that there.
> When federation is enabled, wire-server will require running RabbitMQ. The helm
> chart in `rabbitmq` can be used to install RabbitMQ. Please refer to the
> documentation at https://docs.wire.com to install RabbitMQ in Kubernetes. These
> new configurations are required:
Move the yaml snippet to docs.wire.com? Or, if it's already there, link to it rather than redundantly copying it? On the other hand, that would complicate the situation in which we change docs.wire.com in a later release and somebody upgrades to this release afterwards.
The way docs.wire.com tells people about these options is by pointing them at some example config in wire-server-deploy. I think that is quite useless for people doing upgrades, so I added this yaml here. As you say, it is also useful for someone upgrading one version at a time, since docs.wire.com could have changed in the meantime.
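For upgraders skimming this thread, here is a sketch of the kind of values being discussed. Field names and values are illustrative only; the chart's own values file and docs.wire.com are the source of truth.

```yaml
# Hypothetical sketch of the new helm values discussed above.
background-worker:
  config:
    rabbitmq:
      host: rabbitmq
      port: 5672
    remoteDomains:
      - other.example.com   # every remote domain this backend federates with
secrets:
  rabbitmq:
    username: <rabbitmq-username>   # secrets have no defaults
    password: <rabbitmq-password>
```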
rollingUpdate:
  maxUnavailable: 0
  maxSurge: {{ .Values.replicaCount }}
# Ensures only one version of the background worker is running at any given
- # Ensures only one version of the background worker is running at any given
+ # Ensures only one instance (or k8s pod) of the background worker is running at any given
is that what you mean?
No, many k8s pods can run at once. This way of deploying just deletes the previous version before deploying the new one; basically it does the opposite of a rolling deployment.
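For context, the stock Kubernetes way to express "delete the previous version before deploying the new one" is the `Recreate` strategy; a minimal sketch is below (the chart quoted above instead tunes `RollingUpdate` parameters, so this is an illustration of the goal, not the PR's actual config):

```yaml
spec:
  replicas: 3
  strategy:
    # Recreate terminates all old pods before any new pod is created,
    # so two versions of the worker never run concurrently (at the cost
    # of a brief window with no pods running).
    type: Recreate
```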
limits:
  memory: "512Mi"
# TODO(elland): Create issue for a metrics endpoint
# FUTUREWORK: Implement metrics
we can do this in a separate PR, but we should make sure we won't forget and have a ticket:
https://wearezeta.atlassian.net/browse/FS-2020
btw, which ticket is this PR about?
It is this: https://wearezeta.atlassian.net/browse/FS-1940
It was in the title; I've also added it to the description now.
Thanks for creating the issue. I had asked Marco Conti to create it, as I wasn't sure how to get it into the right JIRA places. The one you created doesn't seem to be in the right epic. Maybe someone will notice and put it there? Is this our process?
I added a parent of relation, maybe that's all that was needed 🤞
build-depends:
  aeson
  , amqp
this package is starting to bundle a lot of unrelated dependencies together, and propagating them into a lot of places. but i don't think we should worry about this yet.
@@ -0,0 +1,98 @@
{-# LANGUAGE NumericUnderscores #-}
can we put this into the cabal files in the future? i don't see any harm in it being the default.
deriving (Arbitrary) via (GenericUniform Component)
deriving (ToJSON, FromJSON) via (Schema Component)

instance ToSchema Component where
FedAwareService? (anything but component. it's neither the term we usually use, nor is it very expressive.)
It would be so much easier once we merge these services 😄
test p "delete with connected remote users" $ testDeleteWithRemotes opts b,
test p "delete with connected remote users and failed remote notifcations" $ testDeleteWithRemotesAndFailedNotifications opts b c,
you've removed these, but not added them again to the new integration test suite? is this not useful any more?
The first case is already covered by the end-to-end tests, and it is very hard to test in the integration tests.
The second case is not valid anymore: we can no longer fail due to a remote backend being down. We can fail due to RabbitMQ being down, but that is just a 500, and I didn't think we should test 500 scenarios.
services/run-services
  'AWS_SECRET_ACCESS_KEY': "dummysecret",
  'RABBITMQ_USERNAME': 'guest',
- 'RABBITMQ_PASSWORD': 'alpaca-grapefruit'
+ 'RABBITMQ_PASSWORD': 'guest'
I suggested changing this to something less obvious earlier; it's an easy way to counter quite a few potential security issues. Why did you change it back?
There was a botched merge; perhaps this happened due to that. I will add it back.
-- this message blocks the whole queue. Perhaps there is a better way to
-- deal with this.
Maybe blocking the queue is the safer choice than data loss until the FUTUREWORK is completed? It would only be blocked until a newer worker is able to ack the new message, right?
In case of reverting a deployment, it would get blocked forever.
-- FUTUREWORK: Reconsider using 1 channel for many consumers. It shouldn't matter
-- for a handful of remote domains.
Wouldn't that result in a blocked (or very slow) channel in case a domain is unreachable?
The channel itself doesn't block when a domain is unreachable. It only blocks the particular consumer that is supposed to be pushing notifications to the unreachable domain; other consumers keep going.
I suspect that this will reach some limit of consumers or just be slow when we have hundreds of consumers running. But perhaps we can solve that problem when we get there.
let returnSuccessSecondTime _ =
      atomicModifyIORef isFirstReqRef $ \isFirstReq ->
        if isFirstReq
          then (False, ("text/html", "<marquee>down for maintenance</marquee>"))
> defaults), if they work they are not required to be configured.
> `background-worker.config.remoteDomains` should contain all the remote domains
We should help customers set all the federation-domain related configuration. We already have:
- `federator.config.optSettings.federationStrategy.allowedDomains`
- `setFederationDomain` in various places
- `brig.config.optSettings.setFederationDomainConfigs`
- `galley.config.settings.featureFlags.classifiedDomains`
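A consolidated sketch of where those domain-related settings live across the charts. The paths come from the comment above; the nested shapes and values are illustrative assumptions, not authoritative configuration.

```yaml
federator:
  config:
    optSettings:
      federationStrategy:
        allowedDomains:
          - other.example.com
brig:
  config:
    optSettings:
      setFederationDomain: example.com
      setFederationDomainConfigs: []   # shape of entries is chart-specific
galley:
  config:
    settings:
      featureFlags:
        classifiedDomains:
          status: enabled
          config:
            domains:
              - example.com
```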
https://wearezeta.atlassian.net/browse/FS-1940
Checklist
changelog.d