Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve tear down speed #671

Merged
merged 10 commits into from
Jan 19, 2024
Merged

Improve tear down speed #671

merged 10 commits into from
Jan 19, 2024

Conversation

Cruikshanks
Copy link
Member

https://eaflood.atlassian.net/browse/WATER-4336

The tear-down feature supports our water-abstraction-acceptance-tests. At the start of each test, it will make an HTTP call to POST /data/tear-down, resulting in TearDownService.go() being called.

The service and those it calls are responsible for clearing the various schemas we use of test data. Some tables have a is_test field that immediately identifies the record as being created for a test.

But each test will create data that won't be flagged this way. So, the tear-down services also remove records we can identify as being test data.

  • it might be linked to a is_test record
  • it has a known value, for example, name, that is specific to the acceptance tests

Ideally, we wouldn't need to remove anything between the tests. We could then avoid this step and the delay it causes. But until we re-architect fully the acceptance tests we've inherited we have to work with what we've got.

When we took on WRLS water-abstraction-service had a tear-down endpoint. But it would take more than a minute to complete. This caused the tests to take a long time to finish and often caused them to fail.

We built a more performant version. It would typically get the job done in 15 to 20 seconds. But sometimes this would creep to 40 seconds which again would cause test runs to fail.

So, we've gone back and tried a series of alternate strategies for clearing the data. Luckily, we've been able to identify a faster solution. This change updates tear-down to use the new strategy. We've also included a README covering what we tried and the results that led us to this decision (repeated below).

The average times are in milliseconds. Just divide by 1000 for secs, for example, 1770 / 1000 = 1.7 secs

Schema No change Just triggers Single Query Promise all Best of All in one
CRM 1770 2106 1185 964 1444 N/A
IDM 58 73 46 852 1199 N/A
PERMIT 165 266 156 893 1282 N/A
RETURNS 7176 6121 3517 4474 7528 N/A
WATER 42268 4413 5266 5106 5289 N/A
- - - - - -
Total 42269 6556 5266 5106 7529 7457

@Cruikshanks Cruikshanks added the enhancement New feature or request label Jan 18, 2024
@Cruikshanks Cruikshanks self-assigned this Jan 18, 2024
The SQL was originally grabbed from calling `toString()` on the original Knex queries. They have all now been run through a SQL formatter to make them more readable.
@Cruikshanks Cruikshanks marked this pull request as ready for review January 19, 2024 09:09
@Cruikshanks Cruikshanks marked this pull request as draft January 19, 2024 09:09
@Cruikshanks Cruikshanks marked this pull request as ready for review January 19, 2024 09:50
@Cruikshanks Cruikshanks merged commit cbb71c8 into main Jan 19, 2024
6 checks passed
@Cruikshanks Cruikshanks deleted the improve-tear-down-speed branch January 19, 2024 10:30
Cruikshanks added a commit that referenced this pull request Jan 19, 2024
https://eaflood.atlassian.net/browse/WATER-4336

In [Improve tear down speed](#671) we made sweeping changes to the tear-down service which is used to support our [water-abstraction-acceptance-tests](https://github.com/DEFRA/water-abstraction-acceptance-tests). The new changes improved the performance of the clean down.

However, we've spotted that one of the delete statements got lost in the whirlwind of our strategy testing; licence agreements not marked as 'test' but linked to 'test' licences. This a common issue the tear-down service has to deal with due to the way the tests are currently written and the inherited test seed process works.

This changes adds the missing statement.
Cruikshanks added a commit that referenced this pull request Jan 19, 2024
https://eaflood.atlassian.net/browse/WATER-4336

In [Improve tear down speed](#671) we made sweeping changes to the tear-down service which is used to support our [water-abstraction-acceptance-tests](https://github.com/DEFRA/water-abstraction-acceptance-tests). The new changes improved the performance of the clean-down.

However, we've spotted that one of the delete statements got lost in the whirlwind of our strategy testing; licence agreements not marked as 'test' but linked to 'test' licences. This a common issue the tear-down service has to deal with due to the way the tests are currently written and the inherited test seed process works.

This change adds the missing statement.
Cruikshanks added a commit that referenced this pull request Jan 19, 2024
https://eaflood.atlassian.net/browse/WATER-4336

In [Improve tear down speed](#671) we made sweeping changes to the tear-down service which is used to support our [water-abstraction-acceptance-tests](https://github.com/DEFRA/water-abstraction-acceptance-tests). The new changes improved the performance of the clean-down.

One of the critical changes was disabling and enabling table triggers before deleting records. Where a table has a foreign key constraint these would be checked on every deletion. We know we eventually intend to delete _everything_ related to the test data so these checks are redundant in the context of the tear-down. Disabling the checks before running the delete got a big bump in performance.

We therefore wrapped any delete statement for a table with a foreign key constraint with `ALTER TABLE my_table DISABLE TRIGGER ALL;` and `ALTER TABLE my_table ENABLE TRIGGER ALL;`.

All seemed well, until we started seeing random failures in the tear-down when running the full suite. A further dive into triggers found that these checks are also being done the other way. For example, `water.regions` has no foreign key constraint. But `water.licences` has one that links to `water.regions`. If you try and delete a region that is linked to a licence before deleting the licence you get an error.

So, to avoid these issues we need to disable all triggers first, _then_ run the delete statements before finishing with re-enabling them all! 😩🤦
Cruikshanks added a commit that referenced this pull request Jan 21, 2024
In [Improve tear down speed](#671) we made sweeping changes to the tear-down service which is used to support our [water-abstraction-acceptance-tests](https://github.com/DEFRA/water-abstraction-acceptance-tests). The new changes improved the performance of the clean-down.

One of the critical changes was disabling and enabling table triggers before deleting records. Where a table has a foreign key constraint these would be checked on every deletion. We know we eventually intend to delete _everything_ related to the test data so these checks are redundant in the context of the tear-down. Disabling the checks before running the delete got a big bump in performance.

We therefore wrapped any delete statement for a table with a foreign key constraint with `ALTER TABLE my_table DISABLE TRIGGER ALL;` and `ALTER TABLE my_table ENABLE TRIGGER ALL;`.

All seemed well until we started seeing random failures in the tear-down when running the full suite. A further dive into triggers found that these checks are also being done the other way. For example, `water.regions` has no foreign key constraint. But `water.licences` has one that links to `water.regions`. If you try and delete a region that is linked to a licence before deleting the licence you get an error.

To avoid these issues we need to disable all triggers first, _then_ run the delete statements before finishing with re-enabling them all. 😩🤦

But that's not all! 😱

As we started refactoring the tear-down services we realised there were also occasions where we needed to delete test data from tables in a specific order, for example, billing data.

To determine which billing transactions to delete you have to be able to link from them to the 'test' region via billing invoice licences, billing invoices, and then billing batches. If we delete the test data from one of these linking tables before the delete billing transactions statement gets called nothing will get cleansed. This might be the real reason why the **Promise all** strategy came out tops; it wasn't always deleting everything.

So, this change switches the tear-down strategy to **Single query**.
Cruikshanks added a commit that referenced this pull request Jan 21, 2024
https://eaflood.atlassian.net/browse/WATER-4336

Ok, we've made a bit of a hash with trying to [improve the tear-down speed](#671).

We realised in the re-write we'd dropped a licence agreement DELETE statement that needed including so [fixed it](https://github.com/DEFRA/water-abstraction-system/pull/673/files).

Then we realised the tear-down was now faster but more flaky so [switched to the single query strategy](#674).

Only in the switch what do you know, we lost the licence agreement delete statement again.

Hopefully for the last time, we fix this!
Cruikshanks added a commit that referenced this pull request Jan 21, 2024
https://eaflood.atlassian.net/browse/WATER-4336

Ok, we've made a bit of a hash with trying to [improve the tear-down speed](#671).

We realised in the rewrite we'd dropped a licence agreement DELETE statement that needed including so [fixed it](https://github.com/DEFRA/water-abstraction-system/pull/673/files).

Then we realised the tear-down was now faster but more flaky so [switched to the single query strategy](#674).

Only in the switch what do you know, we lost the licence agreement DELETE statement again.

Hopefully, for the last time, we fix this!
Cruikshanks added a commit that referenced this pull request Feb 20, 2024
https://eaflood.atlassian.net/browse/WATER-4336

We made changes to [Improve tear down speed](#671). What we didn't spot when doing this is a table that needs clearing down has been missing from the `TearDownService` from the very start.

This change ensures `water.charge_purposes` is cleared down when the `/tear-down` endpoint is hit.
Cruikshanks added a commit that referenced this pull request Feb 20, 2024
https://eaflood.atlassian.net/browse/WATER-4336

We made changes to [Improve tear down speed](#671). What we didn't spot when doing this is a table that needs clearing down has been missing from the `TearDownService` from the very start.

This change ensures `water.charge_purposes` is cleared down when the `/tear-down` endpoint is hit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants