# 5. Keep Document history in sync with RabbitMQ

Date: 2024-11-27

## Status

Accepted

## Context

When Content Blocks created in the Content Block Manager are used in documents, we want to be able to
record when a change to a content block triggers an update to the host document. Currently this works
like so:

* A content block is updated
* We find all documents that use the content block
* Each document is then re-presented to the Content Store with the updated content block details

This all happens in Publishing API, so there is no record in Whitehall (or any other publishing apps)
of when a change to a document has been triggered by an update to a content block.
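The flow above can be sketched roughly as follows. This is a minimal illustration only: the class and collaborator names (`ContentBlockRepublisher`, `document_finder`, `content_store`) are assumptions for the sketch, not the real Publishing API internals.

```ruby
# Rough sketch of the existing republishing flow in Publishing API.
# All names here are illustrative assumptions, not the real internals.
class ContentBlockRepublisher
  def initialize(document_finder:, content_store:)
    @document_finder = document_finder # finds host documents for a content block
    @content_store = content_store     # downstream store documents are re-presented to
  end

  # Re-presents every host document of the given content block and
  # returns the content IDs of the documents that were updated.
  def call(content_block)
    host_documents = @document_finder.call(content_block[:content_id])
    host_documents.each do |document|
      @content_store.put(document, embedded_block: content_block)
    end
    host_documents.map { |doc| doc[:content_id] }
  end
end
```

Note that nothing in this flow writes anything back to Whitehall, which is exactly the gap this ADR addresses.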

To record this, we need to update the Publishing API to record an event when a document has been
republished as a result of a change to a content block. We can then expose an endpoint that allows us
to see the events for a particular document.
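As a sketch, a response from such an events endpoint might look something like the following. The exact shape is an assumption for illustration; the real Publishing API response may differ.

```json
[
  {
    "id": 123,
    "action": "HostContentUpdateJob",
    "created_at": "2024-11-27T10:00:00Z",
    "payload": {
      "source_block": {
        "title": "Example content block",
        "content_id": "00000000-0000-0000-0000-000000000000"
      }
    }
  }
]
```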

However, we still need a way to include these events in the document history. Whitehall is particularly
complex here, as the document history is stored in the database and [paginated][1]. This means we can't
simply fetch the events and weave them into the history: because only one page of the history is loaded
at a time, we don't have the entire history to hand, so we can't ensure the events are inserted in the
right place.

We could send a request to the Publishing API endpoint before we fetch the history and then create
the new events in Whitehall, however:

1. This would result in an API call every time a user views a document; and
2. Carrying out an INSERT query during a GET request isn't a pattern we want to encourage

## Decision

With this in mind, we propose adding a new message queue consumer to Whitehall. RabbitMQ messages
are already sent by Publishing API when documents are republished, so we can consume the existing
`published_documents` queue. We will filter for messages with the `host_content` key, so we only act
on events triggered by a content block update. When we receive a message, we will:

* Make a call to the `events` endpoint in Publishing API for that Content ID to find the latest
`HostContentUpdateJob` event
* Create a new `EditorialRemark` for the latest live edition of the Whitehall Document with that
Content ID, informing the user that the document was republished because of a change to the content block
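The message-handling side of these steps could look roughly like the processor below. The queue wiring itself (for example via the `govuk_message_queue_consumer` gem) is omitted, and the collaborator names (`events_api`, `remark_creator`, `latest_event`) are assumptions for the sketch, not confirmed interfaces.

```ruby
require "json"

# Illustrative processor for the proposed consumer. A message object is
# assumed to expose `payload` (a JSON string) and `ack`; collaborator
# names are hypothetical.
class HostContentUpdateProcessor
  def initialize(events_api:, remark_creator:)
    @events_api = events_api         # client for Publishing API's events endpoint
    @remark_creator = remark_creator # creates an EditorialRemark in Whitehall
  end

  def process(message)
    payload = JSON.parse(message.payload)

    # Only act on messages triggered by a content block update
    return message.ack unless payload.key?("host_content")

    content_id = payload["content_id"]
    event = @events_api.latest_event(content_id, type: "HostContentUpdateJob")
    @remark_creator.call(content_id: content_id, event: event) if event
    message.ack
  end
end
```

Messages without the `host_content` key are acknowledged and dropped, so the consumer stays cheap for the bulk of `published_documents` traffic.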

The events payload returned by Publishing API will include information about the triggering content
block. We did consider including this information in the RabbitMQ message itself, but concluded that
we should keep that message as small as possible, minimising bandwidth and reducing complexity in the
Publishing API code.

## Consequences

We will need to set up a RabbitMQ consumer in Whitehall, which will require some minor ops work. It
will also mean we need to consider two-way communication between the two applications when thinking
about the publishing platform architecture.

However, once this is set up, it could open up the possibility of more two-way communication between
Whitehall and Publishing API in the future, such as feeding back to the user when something has not
published successfully.

[1]: https://github.com/alphagov/whitehall/blob/main/app/models/document/paginated_timeline.rb
