Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure 'seeding' to be Knex based #1230

Merged
merged 103 commits into from
Aug 7, 2024
Merged

Restructure 'seeding' to be Knex based #1230

merged 103 commits into from
Aug 7, 2024

Conversation

Cruikshanks
Copy link
Member

@Cruikshanks Cruikshanks commented Aug 2, 2024

https://eaflood.atlassian.net/browse/WATER-4575

We have been working on replicating how the water-abstraction-import service imports a licence. See Import Licence versions for example.

This is in readiness for moving to ReSP managing licences rather than NALD.

What it exposed was that our test data setup needed expanding. The import relies on 'real' reference data being present. So, it makes testing easier if this is the case in our test database when writing unit tests.

We started with create wrapper for licence-version-purpose-condition table which added the ability to seed licence-purpose condition types. This was then expanded by Add new test reference data seeders which added some more reference data, for example, purposes.

The issue we've had is that the existing changes are straddling two different mechanisms. We've included them as a seed file in our Knex seeding capabilities. This is reference data, so why not seed it?!

And we are seeding it whenever we perform a database clean. Our intent is to have real reference data when unit testing, and until we've removed calls to DatabaseSupport.clean() from all our tests we need to do this each time. That has led to the logic and data itself living in the test/ folder.

There is a conceptual problem with this: something we depend on 'for real' lives in our test/ folder. Plus, the functionality is split into 2 places.

It also means our test/support/seeders have a mix of types. Original 'unit test seeders' that were intended to help move bulky test setup away from within the test itself, with 'knex seeders', things we'll also call when we run npm run seed against the non-test DB.

So, on reflection, we felt the need to carry out a little restructuring. All the 'seeding' functionality is moved to db/seeds but with the ability to still call it from clean() until such time as we have updated all tests and can drop the command.

While we're doing this housekeeping, we also update our existing user seed to match the conventions that have been introduced while also re-implementing some existing ones, for example, preferring Objection.js to raw SQL queries.

The end result is that we can seed both the real and test databases with reference data, even when a database already has existing records, and this functionality will live in one place and be built on Knex.

But there's more!

Having made these changes to ensure both the real and test databases are correctly populated with reference data, it no longer made any sense to be adding new reference records within our tests.

So, we updated the relevant test helpers to provide a select() in place of the add() method. This means you can select the details of a reference record without having to first create or query for it.

We then went through the affected tests and 'test-seeders' and swapped calls to Helper.add() with Helper.select(). We also performed some housekeeping on some of the tests we touched.

https://eaflood.atlassian.net/browse/WATER-4575

We have been working on replicating how the [water-abstraction-import](https://github.com/DEFRA/water-abstraction-import) service imports a licence. See [Import Licence versions](#1195) for example.

This is in readiness for moving to ReSP managing licences rather than NALD.

What it exposed was that our test data setup needed expanding. We started with [create wrapper for licence-version-purpose-condition table](#1181) which added the ability to seed licence purpose condition types. This was then expanded by [Add new test reference data seeders](#1206) which added some more reference data, for example, purposes.

The issue we've got is the existing changes are straddling two different mechanisms. We've included them as a seed file in our [Knex seeding capabilities](https://knexjs.org/guide/migrations.html#seed-files). This is reference data so why not seed it?!

And we are seeding it whenever we perform a Database clean. Our intent is to have real reference data when unit testing, and until we've removed calls to `DatabaseSupport.clean()` from all our tests we need to do this each time. That has led to the logic and data itself living in the `test/` folder.

There is a conceptual problem with this; something we depend on 'for real' lives in our `test/` folder. Plus the functionality is split into 2 places.

It also means our `test/support/seeders` has a mix of types. Original 'unit test seeders' which were intended to help tidy up bulky test setup away from within the test itself, with 'knex seeders', things we'll also call when we run `npm run seed` against the non-test DB.

So, on reflection we think we need to carry out a little restructuring. _All_ the 'seeding' functionality needs to move to `db/seeds` but with the ability to still call it from `clean()` until such time as we have updated all tests and can drop the command.

Whilst we're doing this housekeeping we'll also update our existing users seed to match some of the conventions that have been introduced, whilst also re-implementing some existing ones.

The end result is that we can seed both the real and test databases with reference data, even when a DB already has existing records and this functionality will live in one place and be built on **Knex**.
@Cruikshanks Cruikshanks added the housekeeping Refactoring, tidying up or other work which supports the project label Aug 2, 2024
@Cruikshanks Cruikshanks self-assigned this Aug 2, 2024
Rather than bundling _all_ the reference data into a single seeder we'll be splitting them out. This will hopefully make it easy to see what data is being seeded, and maintain in the future.

We're going to start with regions.
We move the data into a `db/seeds` and away from `test/support/seeders/data` so that the seeding and the seed data will live together.

We also make some changes to the contents. Though possible to export values directly we think it is clearer to first declare them then export them.

We also make changes to the seed data itself. We shouldn't de hard coding the created and updated at times as these will be set when the records are created. In our tests we sometime do need to set these. But only ever when dealing with transactional data. There is no scenario where the created or updated at values of a reference record matters.
The original seeder was using raw SQL to perform the update/insert. However, there is no reason why we cannot use Objection.js as we do in the majority of our database interactions.

We've also changed what kind of queries get run. With no constraint on the table, we'll always need to run 2 queries. The original ran both the update and insert, each with clauses that would mean only one would actually make any changes.

On reflection, we think it is clearer and safer to only run the 'mutating' query needed. So, instead our first query is to determine if a region already exists. If it does, we perform an update. If it doesn't we perform the insert.

We've also extended the seeder to cater for the fact `water.regions` has an `is_test` column we don't expose in `public.regions`. Our existing tear down and acceptance tests depend on the 'Test Region' having this flag set. So, we cater for it in our new seeder.

The final change is the file name. Knex runs the seed files in the order found. Because of this it is considered good practise to name your seed files in such a way that you control the order they are run in. Hence the pattern of prefixing seed file names with numbers.
Needed else Lab fails.

Another change to highlight is moving where we have referenced the regions data into its own section in the test file header. We think this will make it clearer that this is test data we are referencing, as opposed to a standard test helper.
@Cruikshanks Cruikshanks force-pushed the restructure-seeders branch from 5285d3c to 84308df Compare August 5, 2024 10:00
The need to group them was more a helper/simplifier for the call we need to make in Database.clean().

Now we have the files split Knex seeding can call them as normal and we can move that simplification to where it is actually used.
These tables all default to the current date and time when inserting a new record for the `createdAt` and `updatedAt` fields.

But unlike our tables they don't have triggers that will ensure `updatedAt` gets set whenever an `UPDATE` is called. So, this change ensures the seeders set `updatedAt`.
Spotted when we update it to reference the new location for purpose data. We were still generating an external ID using the helper generate code methods but then selecting values from the source reference data.

An external ID is built from the codes of the purposes the return requirement purpose is linked to. So, it _should_ be in sync with the purposes actually linked.

We think there are other places where selecting a random entry from the reference data would also be needed. So, we also refactor some of the logic to a new helper method.
We're about to update all references to `Region.add()` to be `Region.select` so that we are using the pre-existing reference data rather than adding duplicate records solely for a test.

This one test currently relies heavily on injecting data into the DB. However, it is calling 2 other services that handle fetching the data which have their own tests.

So, unlike the changes we're about to make, we can update this test to be 100% stub dependent.
This is an env var we don't default in our config and now we've included seeding users in our test setup we depend on it existing for CI to work.
This means, should we ever have to, we could run the seeds in `production` and not add a bunch of users we've identified in our source code!
We got intermittent failures in CI with the user testing, especially where we are having to create user-role records. This change reduces the number of times we are having to create them by moving all the setup of the user tests into a `before()` rather than `beforeEach()`. We also remove random record selection and ensure a spread of users, groups and roles across the tests.
test/models/role.model.test.js Dismissed Show dismissed Hide dismissed
test/models/user-role.model.test.js Dismissed Show dismissed Hide dismissed
test/models/user.model.test.js Dismissed Show dismissed Hide dismissed
@Cruikshanks Cruikshanks marked this pull request as ready for review August 7, 2024 23:04
@Cruikshanks Cruikshanks merged commit e8619c5 into main Aug 7, 2024
6 checks passed
@Cruikshanks Cruikshanks deleted the restructure-seeders branch August 7, 2024 23:04
Cruikshanks added a commit that referenced this pull request Aug 20, 2024
https://eaflood.atlassian.net/browse/WATER-4579

We recently started [seeding the test DB with reference data](#1206). We then [Restructured that seeding to be Knex based](#1230).

We thought we'd covered all reference data with the seeding. But we've just spotted that we missed charge version change reasons (`ChangeReasonModel`).

We're about to add some new unit tests in [Add history attributes to ChargeVersionModel](#1272) that depend on them. So, rather than insert yet more into the DB, this change switches change reasons to be 'reference' based, i.e. tests should select from pre-seeded reasons rather than adding generated ones.
Cruikshanks added a commit that referenced this pull request Aug 20, 2024
https://eaflood.atlassian.net/browse/WATER-4579

We recently started [seeding the test DB with reference data](#1206). We then [Restructured that seeding to be Knex based](#1230).

We thought we'd covered all reference data with the seeding. But we've just spotted that we missed charge version change reasons (`ChangeReasonModel`).

We're about to add some new unit tests in [Add history attributes to ChargeVersionModel](#1272) that depend on them. So, rather than inserting more into the DB, this change switches change reasons to be 'reference' based, i.e., tests should select from pre-seeded reasons rather than adding generated ones.
Cruikshanks added a commit that referenced this pull request Aug 20, 2024
https://eaflood.atlassian.net/browse/WATER-4579

We recently started [seeding the test DB with reference data](#1206). We then [Restructured that seeding to be Knex based](#1230).

We thought we'd covered all reference data with the seeding. But we've just spotted that we missed charge version change reasons (`ChangeReasonModel`).

We're about to add some new unit tests in [Add history attributes to ChargeVersionModel](#1272) that depend on them. So, rather than inserting more into the DB, this change switches change reasons to be 'reference' based, i.e., tests should select from pre-seeded reasons rather than adding generated ones.
Jozzey added a commit that referenced this pull request Sep 5, 2024
Currently with the way data is being inserted into `charge_categories` using the helper to create records during unit testing. We are getting intermittent failures when running the unit tests when the charge `reference` gets duplicated due to it's lack of randomness.

We have therefore decided that since the data in `charge_categories` is reference data we shouldn't really be creating a new charge category records each time a unit test requires one. Instead we should just be seeding the data once and then writing the unit tests to use this seeded data. The bulk of this work has been done for other tables in this PR #1230
Jozzey added a commit that referenced this pull request Sep 11, 2024
Currently, with the way data is being inserted into `charge_categories` using the helper to create records during unit testing. We are getting intermittent failures when running the unit tests when the charge `reference` gets duplicated due to its lack of randomness.

We have therefore decided that since the data in `charge_categories` is reference data we shouldn't be creating new charge category records each time a unit test requires one. Instead, we should just be seeding the data once and then writing the unit tests to use this seeded data. The bulk of this work has been done for other tables in this PR #1230

This PR will create the functions required to seed the `charge_categories` and then fix any code affected by this change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
housekeeping Refactoring, tidying up or other work which supports the project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant