Restructure 'seeding' to be Knex based #1230

Cruikshanks · 2024-08-02T14:11:45Z

https://eaflood.atlassian.net/browse/WATER-4575

We have been working on replicating how the water-abstraction-import service imports a licence. See Import Licence versions for example.

This is in readiness for moving to ReSP managing licences rather than NALD.

What it exposed was that our test data setup needed expanding. The import relies on 'real' reference data being present. So, it makes testing easier if this is the case in our test database when writing unit tests.

We started with create wrapper for licence-version-purpose-condition table which added the ability to seed licence-purpose condition types. This was then expanded by Add new test reference data seeders which added some more reference data, for example, purposes.

The issue we've had is that the existing changes are straddling two different mechanisms. We've included them as a seed file in our Knex seeding capabilities. This is reference data, so why not seed it?!

And we are seeding it whenever we perform a database clean. Our intent is to have real reference data when unit testing, and until we've removed calls to DatabaseSupport.clean() from all our tests we need to do this each time. That has led to the logic and data itself living in the test/ folder.

There is a conceptual problem with this: something we depend on 'for real' lives in our test/ folder. Plus, the functionality is split into 2 places.

It also means our test/support/seeders have a mix of types. Original 'unit test seeders' that were intended to help move bulky test setup away from within the test itself, with 'knex seeders', things we'll also call when we run npm run seed against the non-test DB.

So, on reflection, we felt the need to carry out a little restructuring. All the 'seeding' functionality is moved to db/seeds but with the ability to still call it from clean() until such time as we have updated all tests and can drop the command.

While we're doing this housekeeping, we also update our existing user seed to match the conventions that have been introduced while also re-implementing some existing ones, for example, preferring Objection.js to raw SQL queries.

The end result is that we can seed both the real and test databases with reference data, even when a database already has existing records, and this functionality will live in one place and be built on Knex.

But there's more!

Having made these changes to ensure both the real and test databases are correctly populated with reference data, it no longer made any sense to be adding new reference records within our tests.

So, we updated the relevant test helpers to provide a select() in place of the add() method. This means you can select the details of a reference record without having to first create or query for it.

We then went through the affected tests and 'test-seeders' and swapped calls to Helper.add() with Helper.select(). We also performed some housekeeping on some of the tests we touched.

https://eaflood.atlassian.net/browse/WATER-4575 We have been working on replicating how the [water-abstraction-import](https://github.com/DEFRA/water-abstraction-import) service imports a licence. See [Import Licence versions](#1195) for example. This is in readiness for moving to ReSP managing licences rather than NALD. What it exposed was that our test data setup needed expanding. We started with [create wrapper for licence-version-purpose-condition table](#1181) which added the ability to seed licence purpose condition types. This was then expanded by [Add new test reference data seeders](#1206) which added some more reference data, for example, purposes. The issue we've got is the existing changes are straddling two different mechanisms. We've included them as a seed file in our [Knex seeding capabilities](https://knexjs.org/guide/migrations.html#seed-files). This is reference data so why not seed it?! And we are seeding it whenever we perform a Database clean. Our intent is to have real reference data when unit testing, and until we've removed calls to `DatabaseSupport.clean()` from all our tests we need to do this each time. That has led to the logic and data itself living in the `test/` folder. There is a conceptual problem with this; something we depend on 'for real' lives in our `test/` folder. Plus the functionality is split into 2 places. It also means our `test/support/seeders` has a mix of types. Original 'unit test seeders' which were intended to help tidy up bulky test setup away from within the test itself, with 'knex seeders', things we'll also call when we run `npm run seed` against the non-test DB. So, on reflection we think we need to carry out a little restructuring. _All_ the 'seeding' functionality needs to move to `db/seeds` but with the ability to still call it from `clean()` until such time as we have updated all tests and can drop the command. Whilst we're doing this housekeeping we'll also update our existing users seed to match some of the conventions that have been introduced, whilst also re-implementing some existing ones. The end result is that we can seed both the real and test databases with reference data, even when a DB already has existing records and this functionality will live in one place and be built on **Knex**.

Rather than bundling _all_ the reference data into a single seeder we'll be splitting them out. This will hopefully make it easy to see what data is being seeded, and maintain in the future. We're going to start with regions.

We move the data into a `db/seeds` and away from `test/support/seeders/data` so that the seeding and the seed data will live together. We also make some changes to the contents. Though possible to export values directly we think it is clearer to first declare them then export them. We also make changes to the seed data itself. We shouldn't de hard coding the created and updated at times as these will be set when the records are created. In our tests we sometime do need to set these. But only ever when dealing with transactional data. There is no scenario where the created or updated at values of a reference record matters.

The original seeder was using raw SQL to perform the update/insert. However, there is no reason why we cannot use Objection.js as we do in the majority of our database interactions. We've also changed what kind of queries get run. With no constraint on the table, we'll always need to run 2 queries. The original ran both the update and insert, each with clauses that would mean only one would actually make any changes. On reflection, we think it is clearer and safer to only run the 'mutating' query needed. So, instead our first query is to determine if a region already exists. If it does, we perform an update. If it doesn't we perform the insert. We've also extended the seeder to cater for the fact `water.regions` has an `is_test` column we don't expose in `public.regions`. Our existing tear down and acceptance tests depend on the 'Test Region' having this flag set. So, we cater for it in our new seeder. The final change is the file name. Knex runs the seed files in the order found. Because of this it is considered good practise to name your seed files in such a way that you control the order they are run in. Hence the pattern of prefixing seed file names with numbers.

Needed else Lab fails. Another change to highlight is moving where we have referenced the regions data into its own section in the test file header. We think this will make it clearer that this is test data we are referencing, as opposed to a standard test helper.

The need to group them was more a helper/simplifier for the call we need to make in Database.clean(). Now we have the files split Knex seeding can call them as normal and we can move that simplification to where it is actually used.

These tables all default to the current date and time when inserting a new record for the `createdAt` and `updatedAt` fields. But unlike our tables they don't have triggers that will ensure `updatedAt` gets set whenever an `UPDATE` is called. So, this change ensures the seeders set `updatedAt`.

Spotted when we update it to reference the new location for purpose data. We were still generating an external ID using the helper generate code methods but then selecting values from the source reference data. An external ID is built from the codes of the purposes the return requirement purpose is linked to. So, it _should_ be in sync with the purposes actually linked. We think there are other places where selecting a random entry from the reference data would also be needed. So, we also refactor some of the logic to a new helper method.

We're about to update all references to `Region.add()` to be `Region.select` so that we are using the pre-existing reference data rather than adding duplicate records solely for a test. This one test currently relies heavily on injecting data into the DB. However, it is calling 2 other services that handle fetching the data which have their own tests. So, unlike the changes we're about to make, we can update this test to be 100% stub dependent.

test/services/idm/fetch-user-roles-and-groups.service.test.js

This is an env var we don't default in our config and now we've included seeding users in our test setup we depend on it existing for CI to work.

This means, should we ever have to, we could run the seeds in `production` and not add a bunch of users we've identified in our source code!

We got intermittent failures in CI with the user testing, especially where we are having to create user-role records. This change reduces the number of times we are having to create them by moving all the setup of the user tests into a `before()` rather than `beforeEach()`. We also remove random record selection and ensure a spread of users, groups and roles across the tests.

test/models/role.model.test.js

test/models/user-role.model.test.js

test/models/user.model.test.js

Having ensured they stuck around for the UserHelper and UserGroupHelper because it made sense, it made us think again about their use in reference-only helpers. They weren't being used and would be unlikely to. We've also found that in most cases we've been defining indexes in the tests. So, `DEFAULT_INDEX` as also not really needed.

https://eaflood.atlassian.net/browse/WATER-4579 We recently started [seeding the test DB with reference data](#1206). We then [Restructured that seeding to be Knex based](#1230). We thought we'd covered all reference data with the seeding. But we've just spotted that we missed charge version change reasons (`ChangeReasonModel`). We're about to add some new unit tests in [Add history attributes to ChargeVersionModel](#1272) that depend on them. So, rather than insert yet more into the DB, this change switches change reasons to be 'reference' based, i.e. tests should select from pre-seeded reasons rather than adding generated ones.

https://eaflood.atlassian.net/browse/WATER-4579 We recently started [seeding the test DB with reference data](#1206). We then [Restructured that seeding to be Knex based](#1230). We thought we'd covered all reference data with the seeding. But we've just spotted that we missed charge version change reasons (`ChangeReasonModel`). We're about to add some new unit tests in [Add history attributes to ChargeVersionModel](#1272) that depend on them. So, rather than inserting more into the DB, this change switches change reasons to be 'reference' based, i.e., tests should select from pre-seeded reasons rather than adding generated ones.

Currently with the way data is being inserted into `charge_categories` using the helper to create records during unit testing. We are getting intermittent failures when running the unit tests when the charge `reference` gets duplicated due to it's lack of randomness. We have therefore decided that since the data in `charge_categories` is reference data we shouldn't really be creating a new charge category records each time a unit test requires one. Instead we should just be seeding the data once and then writing the unit tests to use this seeded data. The bulk of this work has been done for other tables in this PR #1230

Currently, with the way data is being inserted into `charge_categories` using the helper to create records during unit testing. We are getting intermittent failures when running the unit tests when the charge `reference` gets duplicated due to its lack of randomness. We have therefore decided that since the data in `charge_categories` is reference data we shouldn't be creating new charge category records each time a unit test requires one. Instead, we should just be seeding the data once and then writing the unit tests to use this seeded data. The bulk of this work has been done for other tables in this PR #1230 This PR will create the functions required to seed the `charge_categories` and then fix any code affected by this change.

Cruikshanks added the housekeeping Refactoring, tidying up or other work which supports the project label Aug 2, 2024

Cruikshanks self-assigned this Aug 2, 2024

Cruikshanks added 10 commits August 2, 2024 15:13

Split into separate seeders starting with Regions

c3f6fec

Rather than bundling _all_ the reference data into a single seeder we'll be splitting them out. This will hopefully make it easy to see what data is being seeded, and maintain in the future. We're going to start with regions.

Merge branch 'main' into restructure-seeders

2e60f60

Move purposes data

e6d2c56

Replace seeder

b46b246

Update references to old purposes seeder

e9812fa

Merge branch 'main' into restructure-seeders

9cce7f8

Fix missed/incorrect purpose test references

84308df

Cruikshanks force-pushed the restructure-seeders branch from 5285d3c to 84308df Compare August 5, 2024 10:00

Cruikshanks added 11 commits August 5, 2024 11:21

Move primary purposes data

e9094be

Replace primary purposes seeder

bbba714

Update references to old primary purposes seeder

70754e3

Move secondary purposes data

fcb5f15

Replace secondary purposes seeder

7ab8204

Update references to old secondary purposes seeder

3868310

Move licence version purpose condition types

b3c7a36

Replace licence version purpose cond. type seeder

973be7e

Update references to old condition types seeder

88ceb1a

Housekeeping - rename arg

eef2e27

Remove redundant seed builder

2d83927

Cruikshanks mentioned this pull request Aug 5, 2024

Fix region seeder name #1233

Closed

Cruikshanks added 4 commits August 5, 2024 12:25

Remove ReferenceData module

6b4bd27

The need to group them was more a helper/simplifier for the call we need to make in Database.clean(). Now we have the files split Knex seeding can call them as normal and we can move that simplification to where it is actually used.

Rename (re-index) seed files

b866fa1

github-advanced-security bot found potential problems Aug 7, 2024

View reviewed changes

test/services/idm/fetch-user-roles-and-groups.service.test.js Dismissed Show dismissed Hide dismissed

test/services/idm/fetch-user-roles-and-groups.service.test.js Dismissed Show dismissed Hide dismissed

Need DEFAULT_USER_PASSWORD in CI

b722b51

This is an env var we don't default in our config and now we've included seeding users in our test setup we depend on it existing for CI to work.

Cruikshanks mentioned this pull request Aug 7, 2024

Remove more DatabaseSupport.clean() from tests #1241

Merged

Cruikshanks added 4 commits August 7, 2024 17:28

Merge remote-tracking branch 'origin/main' into restructure-seeders

287dce6

Merge branch 'main' into restructure-seeders

8f0bf2a

Don't seed users & their groups in production

f98dfad

This means, should we ever have to, we could run the seeds in `production` and not add a bunch of users we've identified in our source code!

github-advanced-security bot found potential problems Aug 7, 2024

View reviewed changes

test/models/role.model.test.js Dismissed Show dismissed Hide dismissed

test/models/user-role.model.test.js Dismissed Show dismissed Hide dismissed

test/models/user.model.test.js Dismissed Show dismissed Hide dismissed

Cruikshanks added 11 commits August 7, 2024 22:42

Add new Knex seed for financial agreements

5025916

Include FinancialAgreements in DB clean seed trigger

7247b54

Update FinancialAgreementHelper

5004438

Switch tests to use FinancialAgreementHelper select()

ece6226

Remove add from FinancialAgreementHelper

c96b1a8

Remove generateCode from FinancialAgreementHelper

9b3a46c

Access the reference data in a consistent manner

0fef3ac

Fix broken test

0acf3f6

Added a README to the seeds folder

64494a4

Stop Test Region from being seeded in production

295e7fc

Cruikshanks marked this pull request as ready for review August 7, 2024 23:04

Cruikshanks merged commit e8619c5 into main Aug 7, 2024
6 checks passed

Cruikshanks deleted the restructure-seeders branch August 7, 2024 23:04

Cruikshanks mentioned this pull request Aug 20, 2024

Add Charge version change reasons to seeder #1274

Merged

Jozzey mentioned this pull request Sep 5, 2024

Create seeder for charge_categories #1312

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restructure 'seeding' to be Knex based #1230

Restructure 'seeding' to be Knex based #1230

Cruikshanks commented Aug 2, 2024 •

edited

Loading

Restructure 'seeding' to be Knex based #1230

Restructure 'seeding' to be Knex based #1230

Conversation

Cruikshanks commented Aug 2, 2024 • edited Loading

Cruikshanks commented Aug 2, 2024 •

edited

Loading