Add an ADR to keep document history in sync #9666
base: main
Conversation
@@ -0,0 +1,64 @@
# 5. Keep Document history in sync with RabbitMQ
I wonder if it's worth grouping or labelling this and our last ADR somehow, to show that these are changes relating specifically to the content block manager? Or referencing them in our engine? Just aware that if the engine was moved out of Whitehall they might lose their context
I think in this case, it's more related to actual Whitehall code. I do think anything related directly to Content Block Manager should go elsewhere though
Looks good! Couple of suggestions from me
Force-pushed from 3932918 to c868f93
Thanks for writing this up. A few comments below.
More broadly - I wonder if there's a simpler alternative. If we consider two sources of 'history':
- Changenotes/editorial remarks in Whitehall, pushed to Publishing API
- Republishing of content via content-block-updates in Publishing API, pushed back to Whitehall
Have we considered moving all history into Publishing API? On the surface of it, that seems cleaner to me (albeit we'd have the complexity of a one-off data migration from Whitehall to Publishing API):
- Public changenotes already exist in Publishing API - we'd just need to find a way of pushing 'internal' changenotes up too (editorial remarks etc)
- Let Publishing API worry about consolidating the history of public/internal/content-block-driven updates
- On page load of a document, Whitehall can pull a document's entire history from Publishing API
(This idea, or any others you've considered, would make a good "Alternatives considered" section at the bottom of the ADR, as we often do in RFCs 🙏 ).
of when a change to a document has been triggered by an update to a content block.

In order to do this, we need to update the Publishing API to record an event when a document has been
republished as a result of a change to a content block. We can then have an endpoint that allows us to
Worth being explicit that this would be a second change to Publishing API (the first to record the UpdateContentBlock events, and the second to expose a new API route for fetching UpdateContentBlock events for a given document by its content ID)?
(You refer to the `events` endpoint in the "Decision" section - is that what you're proposing to call it? If so, would you also be looking to expose other event types?)
I've got a draft PR here (alphagov/publishing-api#2993), but I didn't want to date the ADR by linking to a PR that will have either been closed or merged by then. Will add a bit more context.
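To give a rough idea of the shape, a call to the proposed endpoint might look something like the sketch below. The route, query parameters, and event type name are assumptions for illustration; the real interface is whatever lands in that PR.

```ruby
require "net/http"
require "json"
require "uri"

# Placeholder content ID for the document whose history we want.
content_id = "a1b2c3d4-0000-0000-0000-000000000000"

# Assumed route and parameters; see the draft PR for the real interface.
uri = URI("https://publishing-api.example/v2/events")
uri.query = URI.encode_www_form(
  content_id: content_id,
  event_type: "UpdateContentBlock", # assumed name for these events
)

response = Net::HTTP.get_response(uri)
events = JSON.parse(response.body)
# e.g. [{ "created_at" => "...", "payload" => { "content_block" => { ... } } }]
```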
However, we still need a way to include these events in the history. Whitehall is particularly complex as
the document history is stored in the database and [paginated][1]. This means we can't fetch the events and
weave them into the history, as we don't have the entire history to hand to ensure we add the events to the
right place within the history.
Did you consider changing Whitehall to make it possible to retrieve its entire history in one call? The PR that introduces the pagination doesn't really explain why they went down the pagination route. One could assume performance optimisation, but do we know if the query would be particularly slow without pagination?
Included in the events payload will be information about the triggering content block. We did consider
sending this information as part of the payload, but concluded that we should make the effort to make
the payload as small as possible, minimising bandwidth and reducing complexity in the Publishing API
I don't quite follow this? Just thinking it through to see if I understand:
- A "content block update" call is made to Publishing API, sending full payload info of the nature of the change
- All documents consuming that content block are republished with the new "UpdatedViaContentBlock" event described earlier...
- ...and put onto the "published_documents" queue
- Whitehall, subscribed to the queue, grabs each item on the queue that has been republished as a result of a content block update
- For each item, make an API call to the `/events` endpoint in Publishing API to grab the full content block update details, and generate an EditorialRemark.
I'm confused about step 3. Line 56 sounds like you're wanting to put a minimal payload on the queue, but line 41 suggests you're going to reuse the existing queue (& its presumably large / unaltered payload)?
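For reference, my mental model of the Whitehall side of those steps is something like this sketch (every class, association, and client name here is illustrative, not actual Whitehall code):

```ruby
require "json"

# Illustrative sketch only: the class names, queue wiring, and events client
# are assumptions, not actual Whitehall code.
class ContentBlockUpdateProcessor
  def process(message)
    payload = JSON.parse(message.payload)

    # Find the Whitehall document the republished item belongs to.
    document = Document.find_by(content_id: payload["content_id"])
    return message.ack if document.nil?

    # Fetch the full content block update details from the proposed events
    # endpoint (interface assumed) and record an editorial remark for each.
    events = PublishingApiEvents.for(payload["content_id"], type: "UpdateContentBlock")
    events.each do |event|
      block = event.dig("payload", "content_block") || {}
      document.editorial_remarks.create!(
        body: "Updated by a change to content block: #{block['title']}",
        created_at: event["created_at"],
      )
    end

    message.ack
  end
end
```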
FWIW this was my first suggestion when I discussed it with Stu as well. I quite like the idea of Publishing API being responsible for all the data about, well, publishing. There would probably be some challenges though, e.g. change notes are part of Whitehall model validation at the moment
Force-pushed from c868f93 to 73a0177
To add another voice to the suggestion of changenotes living in Publishing API: we've just had a request for internal changenotes to Manuals Publisher. Could obviously replicate the internal changenotes feature in Whitehall, but feels like there's a more widespread need here that Publishing API could accommodate.
I'm definitely of the opinion that this should be the direction of travel going forward, but given that Content Modelling is a small team, and the Whitehall team already have their own OKRs to focus on, I think it might be too big a piece of work to bite off at this point. It'd be good for us to find a way forward that addresses the need in this ADR, but also gets us towards the goal of having a single source of truth for document history...
@pezholio we've just chatted this through at our team's Dev Huddle and we think there's a way forward that addresses the need in the ADR, that also goes down an architectural route that's more scalable long term. The nice thing about the below approach is that you could stop at the MVP stage and it wouldn't be too bad, but there's also a nice end state in sight without too much additional effort (he says optimistically!). Shall we see about having a chat this week (perhaps with our TA or Lead Dev in attendance to advise)?

**MVP**

All that would need doing right now to deliver an MVP:

It does mean making an extra call to Publishing API on document load, but that's the direction we feel would be beneficial going all-in on further down the line (and has been done a lot elsewhere, eg Specialist Publisher is all Publishing API calls).

**Future work**

The remaining work could be tackled in three phases (and not necessarily by the Content Modelling team!). First, supporting editorial remarks / internal changenotes in Publishing API:

Second, sending all new Whitehall changenote history to Publishing API:

Then (finally) the last phase would be to consolidate and then remove all unused code:
@ChrisBAshton Sorry, was deep in user research last week, but picking this up now. Happy to give that approach a go, but the only issue I see with the approach of only merging the first page of results is we won't necessarily know what date window(s) to cover. For example, a document has the following range of event datetimes for the first page:

2024-03-23T09:23:00

and a range of event datetimes for the second page:

2023-11-22T12:27:00

If we have an event that happens between the two windows, we can't tell from the first page alone whether it belongs on that page or a later one.
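A quick sketch of why the window matters, assuming both history entries and events expose a `created_at` timestamp:

```ruby
# first_page: the newest page of Whitehall's stored history.
# events: content-block events from Publishing API, not paginated the same way.
oldest_on_first_page = first_page.map(&:created_at).min

in_window, between_pages = events.partition do |event|
  event.created_at >= oldest_on_first_page
end

merged_first_page = (first_page + in_window).sort_by(&:created_at).reverse
# `between_pages` events fall outside the first page's window; without knowing
# the next page's date range we can't tell where they should be interleaved.
```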
Force-pushed from 3e6e422 to 905691f
I've updated this ADR, spelling out the issues with interleaving the events, and tweaked the RabbitMQ proposal to use a brand new topic (rather than an existing topic). I'm still leaning towards the RabbitMQ solution, but I'm keeping an open mind. Thoughts welcome!
Force-pushed from 905691f to 7a1b27f
Thanks for adding to this Stu. Would be interested to get your thoughts on the below comment.
This would involve setting up a new RabbitMQ message topic in Publishing API that sends
messages when a content block update triggers a change to a document. This would be a brand new
topic that contains a thin message that includes the `content_id` of the document that has
been updated, when it was updated, and information about the content block that triggered the update:
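For illustration, a message of that shape might look something like the following. The field names are assumptions rather than the ADR's actual schema.

```ruby
# Hypothetical message shape; every field name here is an assumption.
message = {
  content_id: "11111111-1111-1111-1111-111111111111", # republished document (placeholder)
  updated_at: "2024-03-23T09:23:00Z",                 # when the republish happened
  content_block: {
    content_id: "22222222-2222-2222-2222-222222222222", # triggering block (placeholder)
    document_type: "content_block_email_address",        # assumed type name
    title: "Example contact email",
  },
}
```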
This is interesting. In my imagination we were going to send just a message for the content block and then leave Whitehall to do the lookup of all the documents that link to the block. However, I realise now that would mean searching through govspeak because Whitehall doesn't have a structured reference to content blocks. I can see now why the original plan was to use the existing message, because we'd now have to send two messages for each document. Hmm.
Whitehall does a thing where it parses out the references from Govspeak to contacts and to other editions each time an edition is saved (link). We could do the same with content block references perhaps, but I have not really thought through the consequences of that (performance etc.). It would allow us to just send the content block ID to Whitehall and update the document history though.
We have got a similar thing (contained within the Content Block Tools gem) that we use to extract references from Govspeak (it's actually used in Whitehall at the mo too), so that's entirely a thing we could do.
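As a very rough sketch of what extraction-on-save could look like (the embed-code pattern below is a made-up stand-in; the Content Block Tools gem owns the real format and parsing):

```ruby
# The embed-code pattern is a made-up stand-in for whatever the gem parses.
EMBED_PATTERN = /\{\{embed:(?<type>[a-z_]+):(?<id>[0-9a-f-]{36})\}\}/

def content_block_references(govspeak)
  govspeak.scan(EMBED_PATTERN).map do |type, id|
    { document_type: type, content_id: id }
  end
end

# Persisting these references when an edition is saved would let Whitehall
# answer "which documents use block X?" without re-parsing Govspeak later.
```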
This adds an ADR to document our decision to add a RabbitMQ consumer to Whitehall to allow us to keep a record of when document updates have been triggered by an update to a dependent content block.