Commit 905691f

Add an ADR to keep document history in sync

1 parent 2bebe2f commit 905691f

1 file changed: +147 -0 lines changed

@@ -0,0 +1,147 @@

# 5. Keep document history in sync with Publishing API via RabbitMQ

Date: 2025-11-27

## Status

Accepted

## Context

When Content Blocks created in the Content Block Manager are used in documents, we want to be able to record when a change to a content block triggers an update to the host document. Currently this works as follows:

* The content block is updated
* We find all documents that use the content block
* Each document is then re-presented to the content store with the updated content block details

This all happens in Publishing API, so there is no record in Whitehall (or any other publishing app) of when a change to a document has been triggered by an update to a content block.

With this in mind, we need to find some way of enabling two-way communication between Publishing API and Whitehall, so publishers can see when the content blocks their documents use have been updated.

There are two potential solutions, each with its own advantages and drawbacks:

### Solution 1: Interweave content block updates with Whitehall's history

To do this, we would need to update the Publishing API to record an event when a document has been republished as a result of a change to a content block, and to add an endpoint that lets us fetch the events for a particular document, filtered by event type and date.

A JSON representation of an event object would look like this:

```json
{
  "id": 115,
  "action": "HostContentUpdateJob",
  "user_uid": null,
  "created_at": "2024-11-28T14:14:11.375Z",
  "updated_at": "2024-11-28T14:14:11.375Z",
  "request_id": "91cfbab2f3ff8889ff55a1c7b308d60c",
  "content_id": "0c643225-b5ae-4bd4-8c5d-9d8911433e28",
  "payload": {
    "locale": "en",
    "message": "Host content updated by content block update",
    "content_id": "0c643225-b5ae-4bd4-8c5d-9d8911433e28",
    "source_block": {
      "title": " Universal Credit Helpline ",
      "content_id": "a55a917b-740f-466b-9b31-9a9df4526de4"
    }
  }
}
```
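
For illustration, fetching these events from Whitehall might look something like the sketch below. The `/v2/events` path, the query parameter names and the host are all assumptions; the real endpoint would be designed as part of this work.

```ruby
require "json"
require "net/http"
require "uri"

# Sketch only: the /v2/events path and the query parameter names are
# assumptions - the real endpoint is still to be designed.
PUBLISHING_API_HOST = "https://publishing-api.example.gov.uk"

def host_content_update_events(content_id:, from:, to:)
  uri = URI("#{PUBLISHING_API_HOST}/v2/events")
  uri.query = URI.encode_www_form(
    content_id: content_id,
    action: "HostContentUpdateJob", # filter by event type
    from: from,                     # ISO 8601 strings bounding the date window
    to: to
  )
  JSON.parse(Net::HTTP.get(uri))
end
```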

When a document is loaded in Whitehall, we could then call the API and weave these events into the timeline. However, this is complicated by the fact that Whitehall's document history is paginated, so we won't necessarily have the full Whitehall history at load time, and won't necessarily know the full date window of Publishing API events to fetch. For example:

A document has the following range of event datetimes for the first page:

```
2024-03-23T09:23:00
...
2023-12-10T11:13:00
```

And the following range of event datetimes for the second page:

```
2023-11-22T12:27:00
...
2023-09-12T15:17:00
```

If a Publishing API event happens between `2023-11-22T12:27:00` (the newest event on the second page) and `2023-12-10T11:13:00` (the oldest event on the first page), it won't get picked up, because it doesn't fall within either page's date range.

We could get around this by making an extra request to fetch the datetime of the first event on the next page, giving us a complete window of dates to interleave (see the sketch below), but this makes an already [complex class][1] harder to understand.
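
A minimal sketch of that workaround, assuming hypothetical helpers `whitehall_events_for_page` and `publishing_api_events_between` (neither exists today; the real logic would have to live in the paginated timeline class):

```ruby
# Sketch only: whitehall_events_for_page and publishing_api_events_between
# are hypothetical helpers used to illustrate the date-window problem.
def timeline_page(document, page)
  events = whitehall_events_for_page(document, page)

  # Extend the window down to the first event on the *next* page, so
  # that Publishing API events falling in the gap between pages are
  # not silently dropped.
  newest = events.first.created_at
  next_page_first = whitehall_events_for_page(document, page + 1).first
  oldest = next_page_first ? next_page_first.created_at : events.last.created_at

  remote = publishing_api_events_between(document.content_id, from: oldest, to: newest)

  # Interleave the two sources and return them newest-first.
  (events + remote).sort_by(&:created_at).reverse
end
```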

Additionally, making an extra database query and calling out to an API endpoint could have performance impacts.

It's also worth considering that we currently display 10 events on each "page" of results. If we interleave new events into each page, a user expecting to see exactly 10 results could find the longer pages confusing.

Another solution could be to send a request to the Publishing API endpoint before we fetch the history and then create new events locally. However:

1. This would result in an API call every time a user views a document; and
2. Carrying out an INSERT query during a GET request isn't a pattern we want to encourage.

### Solution 2: Add a new message consumer in Whitehall

This would involve setting up a new RabbitMQ message topic in Publishing API that sends a message when a content block update triggers a change to a document. This would be a brand new topic carrying a thin message that includes the `content_id` of the document that has been updated, when it was updated, and information about the content block that triggered the update:

```json
{
  "locale": "en",
  "content_id": "0c643225-b5ae-4bd4-8c5d-9d8911433e28",
  "updated_at": "2024-11-28T14:14:11.375Z",
  "content_block": {
    "title": " Universal Credit Helpline ",
    "content_id": "a55a917b-740f-466b-9b31-9a9df4526de4"
  }
}
```

We will then set up a queue in Whitehall to listen for messages with the relevant routing key. When a message is received, we create a new event in Whitehall (something like an `EditorialRemark`) for the document with that `content_id`, as in the sketch below.
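
As a rough illustration only, a consumer built directly on the `bunny` gem might look like this. The exchange and queue names, the routing key and the `EditorialRemark` attributes are all assumptions; the real implementation would follow whatever conventions GOV.UK already has for message consumers.

```ruby
require "bunny"
require "json"

# Sketch only: the exchange/queue names, the routing key and the
# EditorialRemark attributes below are illustrative assumptions.
connection = Bunny.new(ENV["RABBITMQ_URL"]).tap(&:start)
channel = connection.create_channel
exchange = channel.topic("content_block_updates", durable: true)

queue = channel.queue("whitehall_host_content_updates", durable: true)
queue.bind(exchange, routing_key: "*.host_content_update")

queue.subscribe(manual_ack: true, block: true) do |delivery_info, _properties, body|
  payload = JSON.parse(body)
  document = Document.find_by(content_id: payload["content_id"])

  if document
    # Record the update as an event on the host document's timeline.
    EditorialRemark.create!(
      edition: document.latest_edition,
      body: "Content block '#{payload.dig("content_block", "title")}' updated"
    )
  end

  channel.ack(delivery_info.delivery_tag)
end
```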

This will require a bit more work on both the Publishing API and Whitehall sides, and will involve a degree of opacity (as well as extra lines on an architecture diagram), but it avoids complexity when rendering the history of the document.

## Decision

We propose going with Solution 2.

## Consequences

We will need to set up a RabbitMQ consumer in Whitehall, which will require some minor ops work. It will also mean we need to consider two-way communication between the two applications when thinking about the publishing platform architecture.

However, once this is set up, it could open up the possibility of more two-way communication between Whitehall and Publishing API in the future, such as feeding back to the user when something has not published successfully.

## Alternatives considered

We could remove pagination entirely from the events, or carry out in-memory pagination, but both options could result in performance issues, especially with older documents. We would also still have to make an API call to Publishing API each time a document is loaded, which could slow things down.

Another option would be to treat Publishing API as the source of truth for the history of a document, but that would be a considerably more complex piece of work, for which we have limited resource. If we decide in the future that it is worth the investment of time, we can still do it further down the line.

[1]: https://github.com/alphagov/whitehall/blob/main/app/models/document/paginated_timeline.rb
