Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix tear-down by switching to single query #674

Merged
merged 4 commits into from
Jan 21, 2024
Merged

Conversation

Cruikshanks
Copy link
Member

@Cruikshanks Cruikshanks commented Jan 19, 2024

https://eaflood.atlassian.net/browse/WATER-4336

In Improve tear down speed we made sweeping changes to the tear-down service which is used to support our water-abstraction-acceptance-tests. The new changes improved the performance of the clean-down.

One of the critical changes was disabling and enabling table triggers before deleting records. Where a table has a foreign key constraint these would be checked on every deletion. We know we eventually intend to delete everything related to the test data so these checks are redundant in the context of the tear-down. Disabling the checks before running the delete got a big bump in performance.

We therefore wrapped any delete statement for a table with a foreign key constraint with ALTER TABLE my_table DISABLE TRIGGER ALL; and ALTER TABLE my_table ENABLE TRIGGER ALL;.

All seemed well until we started seeing random failures in the tear-down when running the full suite. A further dive into triggers found that these checks are also being done the other way. For example, water.regions has no foreign key constraint. But water.licences has one that links to water.regions. If you try and delete a region that is linked to a licence before deleting the licence you get an error.

To avoid these issues we need to disable all triggers first, then run the delete statements before finishing with re-enabling them all. 😩🤦

But that's not all! 😱

As we started refactoring the tear-down services we realised there were also occasions where we needed to delete test data from tables in a specific order, for example, billing data.

To determine which billing transactions to delete you have to be able to link from them to the 'test' region via billing invoice licences, billing invoices, and then billing batches. If we delete the test data from one of these linking tables before the delete billing transactions statement gets called nothing will get cleansed. This might be the real reason why the Promise all strategy came out tops; it wasn't always deleting everything.

So, this change switches the tear-down strategy to Single query.

https://eaflood.atlassian.net/browse/WATER-4336

In [Improve tear down speed](#671) we made sweeping changes to the tear-down service which is used to support our [water-abstraction-acceptance-tests](https://github.com/DEFRA/water-abstraction-acceptance-tests). The new changes improved the performance of the clean-down.

One of the critical changes was disabling and enabling table triggers before deleting records. Where a table has a foreign key constraint these would be checked on every deletion. We know we eventually intend to delete _everything_ related to the test data so these checks are redundant in the context of the tear-down. Disabling the checks before running the delete got a big bump in performance.

We therefore wrapped any delete statement for a table with a foreign key constraint with `ALTER TABLE my_table DISABLE TRIGGER ALL;` and `ALTER TABLE my_table ENABLE TRIGGER ALL;`.

All seemed well, until we started seeing random failures in the tear-down when running the full suite. A further dive into triggers found that these checks are also being done the other way. For example, `water.regions` has no foreign key constraint. But `water.licences` has one that links to `water.regions`. If you try and delete a region that is linked to a licence before deleting the licence you get an error.

So, to avoid these issues we need to disable all triggers first, _then_ run the delete statements before finishing with re-enabling them all! 😩🤦
@Cruikshanks Cruikshanks added the bug Something isn't working label Jan 19, 2024
@Cruikshanks Cruikshanks self-assigned this Jan 19, 2024
We started with the original aim of calling the triggers before and after the main delete statements for each schema. But then we also spotted that there are occasions where we also should have been controlling the order of the deletions, the billing tables being a primary example.

To determine which billing transactions to delete you have to be able to link from them to the 'test' region via billing invoice licences, billing invoices, and then billing batches. If we delete the test data from one of these linking tables before the delete billing transactions statement gets called nothing will get cleansed. This might be the real reason why the **Promise all** strategy came out tops; it wasn't always deleting everything.

We've also taken this opportunity to split the crm service into `crm` and `crm_v2` so we have a one-to-one mapping of tear-down service and schema.

Finally, we've updated the README to record the change in strategy and reason why.
@Cruikshanks Cruikshanks changed the title Fix sporadic failures in tear-down process Fix tear-down by switching to single query Jan 19, 2024
Found there is also a JOIN delete happening across the schemas. So, the query needs to be sent as one for both!
@Cruikshanks Cruikshanks marked this pull request as ready for review January 21, 2024 21:34
@Cruikshanks Cruikshanks merged commit 7a80417 into main Jan 21, 2024
6 checks passed
@Cruikshanks Cruikshanks deleted the fix-flaky-tear-down branch January 21, 2024 21:35
Cruikshanks added a commit that referenced this pull request Jan 21, 2024
https://eaflood.atlassian.net/browse/WATER-4336

Ok, we've made a bit of a hash with trying to [improve the tear-down speed](#671).

We realised in the re-write we'd dropped a licence agreement DELETE statement that needed including so [fixed it](https://github.com/DEFRA/water-abstraction-system/pull/673/files).

Then we realised the tear-down was now faster but more flaky so [switched to the single query strategy](#674).

Only in the switch what do you know, we lost the licence agreement delete statement again.

Hopefully for the last time, we fix this!
Cruikshanks added a commit that referenced this pull request Jan 21, 2024
https://eaflood.atlassian.net/browse/WATER-4336

Ok, we've made a bit of a hash with trying to [improve the tear-down speed](#671).

We realised in the rewrite we'd dropped a licence agreement DELETE statement that needed including so [fixed it](https://github.com/DEFRA/water-abstraction-system/pull/673/files).

Then we realised the tear-down was now faster but more flaky so [switched to the single query strategy](#674).

Only in the switch what do you know, we lost the licence agreement DELETE statement again.

Hopefully, for the last time, we fix this!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant