[FS-1940] Start sending backend notifications through RabbitMQ and consuming them #3276
akshaymankar merged 54 commits into develop
Conversation
Send onUserDeleteConnections notification using rabbitmq
Also introduce App Monad
> The above are the default values (except for secrets, which do not have
> defaults); if they work, they are not required to be configured.
> `background-worker.config.remoteDomains` should contain all the remote domains
this will change again with #3260, but let's worry about that there.
> When federation is enabled, wire-server will require running RabbitMQ. The helm
> chart in `rabbitmq` can be used to install RabbitMQ. Please refer to the
> documentation at https://docs.wire.com to install RabbitMQ in Kubernetes. These
> new configurations are required:
Move the yaml snippet to docs.wire.com? Or, if it's already there, link to it rather than redundantly copying it? On the other hand, that would complicate the situation in which we change docs.wire.com in a later release and somebody upgrades to this release afterwards.
The way docs.wire.com tells people about these options is by pointing them at some example config in wire-server-deploy. I think that is quite useless for people doing upgrades, so I added this yaml here. As you say, it is also useful for someone upgrading one version at a time, since docs.wire.com could have changed in the meantime.
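For upgraders skimming this thread, here is a sketch of the kind of values being discussed. Field names and values are illustrative only; the chart's own values file and docs.wire.com are the source of truth.

```yaml
# Hypothetical sketch of the new helm values discussed above.
background-worker:
  config:
    rabbitmq:
      host: rabbitmq
      port: 5672
    remoteDomains:
      - other.example.com   # every remote domain this backend federates with
secrets:
  rabbitmq:
    username: <rabbitmq-username>   # secrets have no defaults
    password: <rabbitmq-password>
```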
rollingUpdate:
  maxUnavailable: 0
  maxSurge: {{ .Values.replicaCount }}
# Ensures only one version of the background worker is running at any given
- # Ensures only one version of the background worker is running at any given
+ # Ensures only one instance (or k8s pod) of the background worker is running at any given
is that what you mean?
No, many k8s pods can run at once. This way of deploying just deletes the previous version before deploying the new one; basically it does the opposite of a rolling deployment.
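For context, the stock Kubernetes way to express "delete the previous version before deploying the new one" is the `Recreate` strategy; a minimal sketch is below (the chart quoted above instead tunes `RollingUpdate` parameters, so this is an illustration of the goal, not the PR's actual config):

```yaml
spec:
  replicas: 3
  strategy:
    # Recreate terminates all old pods before any new pod is created,
    # so two versions of the worker never run concurrently (at the cost
    # of a brief window with no pods running).
    type: Recreate
```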
limits:
  memory: "512Mi"
# TODO(elland): Create issue for a metrics endpoint
# FUTUREWORK: Implement metrics
we can do this in a separate PR, but we should make sure we won't forget and have a ticket:
https://wearezeta.atlassian.net/browse/FS-2020
btw, which ticket is this PR about?
It is this: https://wearezeta.atlassian.net/browse/FS-1940
It was in the title; I've also added it to the description now.
Thanks for creating the issue. I had asked Marco Conti to create it, as I wasn't sure how to get it into the right JIRA places. The one you created doesn't seem to be in the right epic. Maybe someone will notice and put it there? Is this our process?
I added a parent of relation, maybe that's all that was needed 🤞
build-depends:
  aeson
  , amqp
this package is starting to bundle a lot of unrelated dependencies together, and propagating them into a lot of places. but i don't think we should worry about this yet.
@@ -0,0 +1,98 @@
{-# LANGUAGE NumericUnderscores #-}
can we put this into the cabal files in the future? i don't see any harm in it being the default.
deriving (Arbitrary) via (GenericUniform Component)
deriving (ToJSON, FromJSON) via (Schema Component)

instance ToSchema Component where
FedAwareService? (anything but component. it's neither the term we usually use, nor is it very expressive.)
It would be so much easier once we merge these services 😄
test p "delete with connected remote users" $ testDeleteWithRemotes opts b,
test p "delete with connected remote users and failed remote notifcations" $ testDeleteWithRemotesAndFailedNotifications opts b c,
you've removed these, but not added them again to the new integration test suite? is this not useful any more?
The first case is already covered by the end-to-end tests, and it is very hard to test in the integration tests.
The second case is not valid anymore: we can no longer fail due to a remote backend being down. We can fail due to RabbitMQ being down, but that is just a 500, and I didn't think we should test 500 scenarios.
services/run-services
  'AWS_SECRET_ACCESS_KEY': "dummysecret",
  'RABBITMQ_USERNAME': 'guest',
- 'RABBITMQ_PASSWORD': 'alpaca-grapefruit'
+ 'RABBITMQ_PASSWORD': 'guest'
I suggested changing this to something less obvious earlier; it's an easy way to counter quite a few potential security issues. Why did you change it back?
There was a botched merge; perhaps this happened due to that. I will add it back.
-- this message blocks the whole queue. Perhaps there is a better way to
-- deal with this.
Maybe blocking the queue is the safer choice than data loss until the FUTUREWORK is completed? It would only be blocked until a newer worker is able to ack the new message, right?
In case of reverting a deployment, it would get blocked forever.
-- FUTUREWORK: Reconsider using 1 channel for many consumers. It shouldn't matter
-- for a handful of remote domains.
Wouldn't that result in a blocked (or very slow) channel in case a domain is unreachable?
The channel itself doesn't block when a domain is unreachable. It only blocks the particular consumer that is supposed to be pushing notifications to the unreachable domain; other consumers keep going.
I suspect that this will reach some limit of consumers or just be slow when we have hundreds of consumers running. But perhaps we can solve that problem when we get there.
let returnSuccessSecondTime _ =
      atomicModifyIORef isFirstReqRef $ \isFirstReq ->
        if isFirstReq
          then (False, ("text/html", "<marquee>down for maintenance</marquee>"))
> defaults), if they work they are not required to be configured.
> `background-worker.config.remoteDomains` should contain all the remote domains
We should help customers set all the federation-domain related configuration. We already have:
- `federator.config.optSettings.federationStrategy.allowedDomains`
- `setFederationDomain` in various places
- `brig.config.optSettings.setFederationDomainConfigs`
- `galley.config.settings.featureFlags.classifiedDomains`
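A consolidated sketch of where those domain-related settings live across the charts. The paths come from the comment above; the nested shapes and values are illustrative assumptions, not authoritative configuration.

```yaml
federator:
  config:
    optSettings:
      federationStrategy:
        allowedDomains:
          - other.example.com
brig:
  config:
    optSettings:
      setFederationDomain: example.com
      setFederationDomainConfigs: []   # shape of entries is chart-specific
galley:
  config:
    settings:
      featureFlags:
        classifiedDomains:
          status: enabled
          config:
            domains:
              - example.com
```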
https://wearezeta.atlassian.net/browse/FS-1940
Checklist
changelog.d