Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make timestamps consistent at model layer #85

Merged
merged 16 commits into from
Jan 13, 2023

Conversation

Cruikshanks
Copy link
Member

@Cruikshanks Cruikshanks commented Jan 10, 2023

DEFRA/water-abstraction-team#69

We identified that there are inconsistencies in the legacy database, timestamps in the tables being one of them. We don't want these issues to make a mess of our code. So, until we're ready for some full-scale data migration we're going to deal with them in our model layer.

Timestamps being different across tables is the main inconsistency we noted and this change deals with it.

We use Objection.js $parseDatabaseJson() and $formatDatabaseJson() to present a consistent API in our models whilst still working with the DB. This is built into the LegacyBaseModel with every child model expected to include a translations() getter that will be used by both. There is an argument to move translations() into WaterBaseModel and then just override it for exceptions like EventModel. But we have an eye to also use this logic to make the ID property for each model actually .id. And that will be different in each which means every model is going to need to set translations (). So, it'd be better to keep the translations together in one place for each model, rather than splitting them.

WARNING! Queries cannot use the 'code' version of a column name. They must use the database version. We've included a unit test to demonstrate this and documentation to explain the reasons why.

@Cruikshanks Cruikshanks added the housekeeping Refactoring, tidying up or other work which supports the project label Jan 10, 2023
@Cruikshanks Cruikshanks self-assigned this Jan 10, 2023
@Cruikshanks Cruikshanks marked this pull request as ready for review January 11, 2023 23:45
@Cruikshanks
Copy link
Member Author

If it helps, most of the changes are documenting what is going on! 😳 Also, for @StuAA78 benefit the key section you need to read is this

But that leads us to the second problem; at this level translations are global. So, any reference to date_created would get translated to created_at, irrespective of the table referenced. If we only had to worry about database to model it would be okay. The problem is to make queries work we need to be able to translate back from our names to the database version. That is impossible where created_at might need to become date_created, created, or modified. With no access to the table name for context we cannot make that decision.

This is why we have accepted we can only support translations when dealing with instances of a model for now.

Hopefully that explains why I had to abandon what we discussed and accept the 'middle option' 😬

Jozzey
Jozzey previously approved these changes Jan 13, 2023
Copy link
Contributor

@Jozzey Jozzey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@StuAA78 StuAA78 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Just a couple of comment tweaks as discussed

app/models/legacy-base.model.js Outdated Show resolved Hide resolved
app/models/legacy-base.model.js Outdated Show resolved Hide resolved
Cruikshanks and others added 16 commits January 13, 2023 16:06
DEFRA/water-abstraction-team#69

We identified that there are inconsistencies in the legacy database, timestamps in the tables being one of them. We don't want these issues making a mess of our code. So, until we're ready for some full scale data migration we're going to deal with them in our model layer.

Timestamps being different across tables is the main inconsistency we noted and this change deals with it.
Though we break some other things this initial pass confirms the idea works. Using `$parseDatabaseJson()` and `$formatDatabaseJson()` we can present a consistent API in our models whilst still working with the DB.

Next up some refactoring so see if we can make this reusable at a lower level.
This change shows we can do the grunt work in a reusable way and just have the model tell us what fields need translating.

Next will think of a test for this.
We didn't think it necessary to worry about how the data actually looks in the DB. So, we call the overridden methods directly that Objection.js will use.

They'll be picked up just by the act of using the helpers in our other unit tests that do write to the DB. Plus going lower felt like we're starting to test Objection.
We only made them to confirm this idea would work.
Looking at this change there is an argument to move the translations into `WaterBaseModel` and then just override the exceptions like `EventModel`. But we have an eye to also using this logic to make the ID property for each model _actually_ `.id`. And that will be different in each which means every model is going to need to set `translations ()`. So, it'd better to keep the translations together in one place for each model, rather than splitting them.
A number of tests started failing, all with the issue that `date_created` could not be null. The source was the helpers but we're generally not setting those fields in the helpers. We expect the DB to default them??

Obviously it was our new code, and we tracked it down to the fact it was _always_ applying the translation to the model JSON, even if the property didn't exist.

For example, `RegionHelper.add()` adds a new region but doesn't set `date_created` as a property on the model. The DB should be default it on insert. But we have a translation for it. The `$formatDatabaseJson()` was seeing the translation and calling

```javascript
json[translation.database] = json[translation.model]
```

So, `dateCreated` doesn't exist in the JSON, but we've added the property. Then we set it to `undefined`, because we're again referring to something that wasn't added to the model instance. Result, Objection.js tries to INSERT `date_created` as `null`.

We've updated the logic in $formatDatabaseJson() and $parseDatabaseJson(), just to be on the safe side, to only consider translations that exist on the json object being modified. The javascript _in_ operator returns true or false whether the named property exists in the object. An undefined or null property still exists so we get `true` back in those cases, which is what we want as there may be times we are intending to set a field to null.
This keeps it consistent with schema and forces to remember the concept of translations every time we add a new model to the repo.
In the course of implementing this change we've realised that we cannot use standard property names in queries. We still have to use the legacy column names.

We've chased down every possible option we could find to support both working with instances _and_ queries but it's no use.

We'll fully document what we found and tried in the code when we start adding documentation.
This includes a detailed explanation as to why we are limited to only seeing the translations when working with an instance of a model.
Co-authored-by: Stuart Adair <[email protected]>
@Cruikshanks Cruikshanks force-pushed the make-timestamps-consistent-in-models branch from 8dce417 to 3b0d991 Compare January 13, 2023 16:06
@Cruikshanks Cruikshanks merged commit 967b634 into main Jan 13, 2023
@Cruikshanks Cruikshanks deleted the make-timestamps-consistent-in-models branch January 13, 2023 17:55
Cruikshanks added a commit that referenced this pull request Feb 24, 2023
https://eaflood.atlassian.net/browse/WATER-3896

Whilst working on [Add SROC Supplementary Billing Invoice Service](#119) we hit an issue. It was the first time we needed to query the `crm_v2` schema. But that threw errors due to the way [Objection.js knexSnakeCaseMappers()](https://vincit.github.io/objection.js/api/objection/#knexsnakecasemappers) works. It uses Knex's [wrapIdentifier()](https://knexjs.org/guide/#wrapidentifier) and [postProcessResponse()](https://knexjs.org/guide/#postprocessresponse) hooks to see each 'identifier' name. It can then test whether it needs converting, either from or to snake case.

For example, `knex('table').withSchema('foo').select('table.field as otherName').where('id', 1)` will call `wrapIdentifier()` for the values `'table'`, `'foo'`, `'table'`, `'field'`, `'otherName'` and `'id'`.

**knexSnakeCaseMappers()** takes some options, one of them being `underscoreBeforeDigits`. If we didn't use this fields like `address_line_1` or `section_127_agreement` in the DB would be incorrectly converted to `address_line1` and `section127_agreement`. But this is what leads to our problem.

The previous teams' decision to name one of the schemas `crm_v2` leads to an incorrect conversion. **knexSnakeCaseMappers()** is seeing this and returning `'crm_v_2'`. It doesn't know it's a schema instead of a column name because Knex doesn't provide that context. This is the exact same issue we faced in [Make timestamps consistent at model layer](#85).

Thankfully, a solution that didn't work there will work here. We can provide our own custom knex snake case mappers implementation which knows to ignore `'crm_v_2'`, whilst calling Objection.js own methods for everything else. It also gives us a solution if we face any more funnies like this when dealing with the legacy data.
Cruikshanks added a commit that referenced this pull request Feb 24, 2023
https://eaflood.atlassian.net/browse/WATER-3896

Whilst working on [Add SROC Supplementary Billing Invoice Service](#119) we hit an issue. It was the first time we needed to query the `crm_v2` schema. But that threw errors due to the way [Objection.js knexSnakeCaseMappers()](https://vincit.github.io/objection.js/api/objection/#knexsnakecasemappers) works. It uses Knex's [wrapIdentifier()](https://knexjs.org/guide/#wrapidentifier) and [postProcessResponse()](https://knexjs.org/guide/#postprocessresponse) hooks to see each 'identifier' name. It can then test whether it needs converting, either from or to snake case.

For example, `knex('table').withSchema('foo').select('table.field as otherName').where('id', 1)` will call `wrapIdentifier()` for the values `'table'`, `'foo'`, `'table'`, `'field'`, `'otherName'` and `'id'`.

**knexSnakeCaseMappers()** takes some options, one of them being `underscoreBeforeDigits`. If we didn't use this, fields like `address_line_1` or `section_127_agreement` in the DB would be incorrectly converted to `address_line1` and `section127_agreement`. But this is what leads to our problem.

The previous teams' decision to name one of the schemas `crm_v2` leads to an incorrect conversion. **knexSnakeCaseMappers()** is seeing this and returning `'crm_v_2'`. It doesn't know it's a schema instead of a column name because Knex doesn't provide that context. This is the exact same issue we faced in [Make timestamps consistent at model layer](#85).

Thankfully, a solution that didn't work there will work here. We can provide our own custom Knex snake case mappers implementation which knows to ignore `'crm_v_2'`, whilst calling **Objection.js** own methods for everything else. It also gives us a solution if we face any more funnies like this when dealing with the legacy database.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
housekeeping Refactoring, tidying up or other work which supports the project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants