Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Model Col. SchemaVersions and migrations on Cols #2286

Merged
merged 17 commits into from
Feb 12, 2024

Conversation

AndrewSisley
Copy link
Contributor

@AndrewSisley AndrewSisley commented Feb 5, 2024

Relevant issue(s)

Resolves #2198

Description

Models Collection SchemaVersions and migrations on Collections, instead of in the Lens Registry.

There is quite a lot going on this PR, and most of it is shoved into the last commit, there was quite a lot that had to happen at the same time, and probably a few changes that could have been broken out to other commits, weren't.

Probably the easiest way to view the user visible changes is in tests/integration/schema/updates/with_schema_branch_test.go. There is a new test action, and the remodelling should be quite visible there.

Primary motivators for the change:

  1. Boost the DevEx RE Lens and schema versions. It seemed pretty painful before to find transforms, yet alone figure out where that lay in the version history, or add them to a collection.
  2. Unify the Collection history/schema history with the View system, so that they both roughly work in the same way. Reducing the cognitive and maintenance overhead of having two different systems for the same thing for both us and the users.

Noteworthy stuff:

  1. The branching of schema is not new, it is just more visible with this change. There is nothing we can really do to block it in a distributed system. I do now see it as supported, excluding Lens, which still needs a relatively minor tweak in order to handle the migration-pathing.
  2. The model change in client/descriptions.go
  3. The removal of stuff from the LensRegistry. This is currently kept to a minimum, however I would very much like to completely remove LensRegistry in the near future.
  4. The CLI and the Http clients used to rely on the fact that db.SetMigration and registry.SetMigration did the same thing. This is no longer the case.
  5. The CLI and the Http clients followed a different, flatter structure than the Go client, flattening the lens registry into the lens/schema pathways. This was inconsistent, and only worked when (4) was as it was. This has been changed out of necessity and the 3 clients now have matching structures.
  6. The CLI still paths everything under schema, and the http client everything under lens, which is strange, but out of scope. I will open a ticket to make them the same.
  7. Some of the schema versions in the existing schema tests have changed, I think there was a bug in the framework where it was not caring about their accuracy. It now cares.
  8. Users can now specify their transforms when patching a schema.
  9. Out of the 4 collection getting funcs on client.Store, only GetAllCollections has the ability to get inactive collections. I currently very much dislike that we have 4 functions for the same thing in the Go client, especially given that this is not reflected in the Http and CLI clients. In another ticket I would very much like us to collapse these four functions into one, and have that one function support the fetching of inactive collections.

Todo:

@AndrewSisley AndrewSisley self-assigned this Feb 5, 2024
@AndrewSisley AndrewSisley changed the title feat: Model Collection SchemaVersions and migrations on Collections feat: Model Col SchemaVersions and migrations on Collections Feb 5, 2024
@AndrewSisley AndrewSisley added feature New feature or request area/schema Related to the schema system area/collections Related to the collections system labels Feb 5, 2024
@AndrewSisley AndrewSisley added this to the DefraDB v0.10 milestone Feb 5, 2024
@AndrewSisley AndrewSisley changed the title feat: Model Col SchemaVersions and migrations on Collections feat: Model Col. SchemaVersions and migrations on Cols Feb 5, 2024
Copy link

codecov bot commented Feb 5, 2024

Codecov Report

Attention: 185 lines in your changes are missing coverage. Please review.

Comparison is base (86610d7) 74.15% compared to head (e2c8d81) 73.83%.

❗ Current head e2c8d81 differs from pull request most recent head af16fc7. Consider uploading reports for the commit af16fc7 to get more accurate results

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2286      +/-   ##
===========================================
- Coverage    74.15%   73.83%   -0.32%     
===========================================
  Files          258      257       -1     
  Lines        25656    25739      +83     
===========================================
- Hits         19024    19003      -21     
- Misses        5302     5408     +106     
+ Partials      1330     1328       -2     
Flag Coverage Δ
all-tests 73.83% <73.50%> (-0.32%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
cli/cli.go 100.00% <100.00%> (ø)
cli/collection_describe.go 83.78% <100.00%> (+0.93%) ⬆️
cli/schema_migration_set.go 71.19% <100.00%> (+0.50%) ⬆️
cli/schema_set_active.go 100.00% <100.00%> (ø)
db/backup.go 59.64% <100.00%> (ø)
db/base/collection_keys.go 80.00% <100.00%> (ø)
db/description/schema.go 62.96% <100.00%> (+0.46%) ⬆️
db/schema.go 86.50% <100.00%> (+0.07%) ⬆️
http/openapi.go 97.25% <ø> (ø)
lens/fetcher.go 68.63% <100.00%> (+2.12%) ⬆️
... and 22 more

... and 19 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 86610d7...af16fc7. Read the comment docs.

@AndrewSisley AndrewSisley force-pushed the 2198-col-sources-pt2 branch 9 times, most recently from e62de3d to a6dd001 Compare February 5, 2024 21:00
@AndrewSisley AndrewSisley marked this pull request as ready for review February 5, 2024 21:33
@AndrewSisley AndrewSisley requested a review from a team February 5, 2024 21:33
Copy link
Contributor

@islamaliev islamaliev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a brief review. Looks good so far. Have just some minor comments and question.
I might have another look at it some time later.

client/db.go Outdated
// present in the database.
//
// If true is provided, the new schema versions will be made active and previous versions deactivated, otherwise
// [SetDefaultSchemaVersion] should be called to do so.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: adjust doc to [SetActiveSchemaVersion]

Copy link
Contributor Author

@AndrewSisley AndrewSisley Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheers :) Will do a search to make sure there aren't others I've missed too :)

  • Fix remaining SetDefaultSchemaVersion

@@ -92,6 +111,10 @@ func (col CollectionDescription) QuerySources() []*QuerySource {
return sourcesOfType[*QuerySource](col)
}

func (col CollectionDescription) CollectionSources() []*CollectionSource {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo: please add documentation to the public method

Copy link
Contributor Author

@AndrewSisley AndrewSisley Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, thanks :)

  • Doc CollectionSources()

@@ -111,6 +134,28 @@ type QuerySource struct {
Query request.Select
}

// QuerySource represents a collection data source from another collection instance.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: CollectionSource

Copy link
Contributor Author

@AndrewSisley AndrewSisley Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😁 Copy-pasta

  • Fix doc

db/collection.go Outdated
continue
// The collection version may exist before the schema version was created locally. This is
// because migrations for the globally known schema version may have been registered locally
// (typically to handle documentes synced over P2P at higher versions) before the local schema
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: documents

Copy link
Contributor Author

@AndrewSisley AndrewSisley Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks :)

  • Fix typo

db/collection.go Outdated
@@ -196,31 +199,92 @@ func (db *db) updateSchema(
return err
}

if setAsDefaultVersion {
cols, err := description.GetCollectionsBySchemaVersionID(ctx, txn, previousVersionID)
// After creating the new schema version, we need to new create collection versions for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "to create new collection"

Copy link
Contributor Author

@AndrewSisley AndrewSisley Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wording was deliberate, to try and avoid ambiguity between many collections with one root, and a brand new collection-set with a different root. It is done in a few places.

Will have another think on this, but please do share any further thoughts of your own in the meantime :)

  • "to create new collection" vs "create collection versions" vs other

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just that "we need to new create" has new in the wrong position.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh 😁 I didn't spot that on re-read even, will fix lol

}

cols := make([]client.CollectionDescription, 0)
for res := range q.Next() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: you could you datastore.DeserializePrefix[uint32] here.

It could be also used for GetCollections method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think DeserializePrefix would be beneficial here, it hides the interaction with the datastore, and creates more work for the machine here (second iteration required, an unused set for Key), and is another func for the reader to have to navigate through when they are reading.

GetCollections is a better fit than this one, but I remain reluctant due to the extra layer.

sourceCols = append(sourceCols, col)
}

for _, sourceCol := range sourceCols {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: I think we should try to use index-based approach as much as possible:

for i := range sourceCols {
// ...
if source.SourceCollectionID == sourceCols[i].ID {

It might seem to be a little improvement, but it add up significantly.
This is one of the things I don't like about Go: it makes things look deceptively simple.
When one writes C++ code he is (or should be) constantly aware of costs of every instruction.
With Go it feels different (talking from my perfonal experienece). But it's the same CPU cycles that we waste.

If you really want to refer to this var as sourceCol you can extract it as a pointer:

for i := range courseCols {
    courseCol := &courseCols[i]

I think we all should try to follow this approach as much as possible.

Note: it applies to other parts in the code as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:) I do disagree with this, and do see it as a personal styling preference of yours. I find a more 'modern' for i, foo loop to be much more readable and less error prone. The performance difference can go either way too, and is typically very very small.

I do see courseCol := &courseCols[i] as the worst of both worlds lol. Especially when it is creating a pointer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a personal styling preference of yours

It's not personal. I think it is everybody's preference to have faster code than slower.

Yes, you are right, "modern" style is a bit more readable, but I don't think it outweighs the benefits of faster code.
If you look at one instance of a loop it seem like very small impact, but this approach is used in lots of places, which means we do a lot of copying. This adds up.

I do see courseCol := &courseCols[i] as the worst of both worlds lol. Especially when it is creating a pointer.

How is it worst?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not personal. I think it is everybody's preference to have faster code than slower.

Performance is not the ultimate goal of most code. Typically we trade performance for readability (otherwise we'd all code in assembly).

Your suggestion is not always going to be more performant either. foo[i] is not free.

How is it worst?

It does what for i, foo does, but in a more unexpected way, and it uses a pointer. Depending on the size of foo that can be more expensive than copying, and it can result in more GC pressure.


In my opinion:

Using for i { everywhere is seen by me as a premature optimisation with an unproven and likely negligible impact on our users and a very real negative impact on readability.

And readability issues breed bad and unperformant code. Prioritising blanket micro-optimisations like this could ultimately result in much worse performance for our users.

And given that we differ, and a very significant part of our profession appears to also differ (else for i, foo would not exist in Go and other langs), I see this as a personal preference and is not something that should be imposed on any developer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using for i { everywhere

I didn't say everywhere. Some types (like ints) are much cheaper to copy by values. I should be decided on case-by-case basis.
CollectionDescription is cheaper to pass around by pointer.

Anyway... didn't want to start a long conversation. It's a suggestion after all.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that for i, foo := range bar improves readability and the extra allocation here has very little impact on performance. Not worth switching to for i := range bar.

@@ -269,7 +269,13 @@ func (s *server) PushLog(ctx context.Context, req *pb.PushLogRequest) (*pb.PushL
if len(cols) == 0 {
return nil, client.NewErrCollectionNotFoundForSchema(schemaRoot)
}
col := cols[0]
var col client.Collection
for _, c := range cols {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: just another reminder to use index-based iteration approach

Copy link
Collaborator

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like the changes. Just a handful of comments to take care of before approval and I have something I want to bring up for discussion on Discord. Thanks Andy!

cli/schema_migration_set.go Outdated Show resolved Hide resolved
cli/schema_patch.go Outdated Show resolved Hide resolved
client/descriptions.go Show resolved Hide resolved
core/key.go Show resolved Hide resolved
core/key.go Outdated Show resolved Hide resolved
core/key.go Outdated Show resolved Hide resolved
sourceCols = append(sourceCols, col)
}

for _, sourceCol := range sourceCols {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that for i, foo := range bar improves readability and the extra allocation here has very little impact on performance. Not worth switching to for i := range bar.

Copy link
Collaborator

@fredcarle fredcarle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for making the changes :)

@AndrewSisley AndrewSisley merged commit 398ffbc into sourcenetwork:develop Feb 12, 2024
27 of 28 checks passed
@AndrewSisley AndrewSisley deleted the 2198-col-sources-pt2 branch February 12, 2024 18:10
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
…#2286)

## Relevant issue(s)

Resolves sourcenetwork#2198

## Description

Models Collection SchemaVersions and migrations on Collections, instead
of in the Lens Registry.

Primary motivators for the change:
1. Boost the DevEx RE Lens and schema versions. It seemed pretty painful
before to find transforms, yet alone figure out where that lay in the
version history, or add them to a collection.
2. Unify the Collection history/schema history with the View system, so
that they both roughly work in the same way. Reducing the cognitive and
maintenance overhead of having two different systems for the same thing
for both us and the users.

Noteworthy stuff:
1. The branching of schema is not new, it is just more visible with this
change. There is nothing we can really do to block it in a distributed
system. I do now see it as supported, excluding Lens, which still needs
a relatively minor tweak in order to handle the migration-pathing.
2. The model change in client/descriptions.go
3. The removal of stuff from the LensRegistry. This is currently kept to
a minimum, however I would very much like to completely remove
LensRegistry in the near future.
4. The CLI and the Http clients used to rely on the fact that
db.SetMigration and registry.SetMigration did the same thing. This is no
longer the case.
5. The CLI and the Http clients followed a different, flatter structure
than the Go client, flattening the lens registry into the
`lens`/`schema` pathways. This was inconsistent, and only worked when
(4) was as it was. This has been changed out of necessity and the 3
clients now have matching structures.
6. The CLI still paths everything under `schema`, and the http client
everything under `lens`, which is strange, but out of scope. I will open
a ticket to make them the same.
7. Some of the schema versions in the existing schema tests have
changed, I think there was a bug in the framework where it was not
caring about their accuracy. It now cares.
8. Users can now specify their transforms when patching a schema.
9. Out of the 4 collection getting funcs on `client.Store`, only
`GetAllCollections` has the ability to get inactive collections. I
currently very much dislike that we have 4 functions for the same thing
in the Go client, especially given that this is not reflected in the
Http and CLI clients. In another ticket I would very much like us to
collapse these four functions into one, and have that one function
support the fetching of inactive collections.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/collections Related to the collections system area/schema Related to the schema system feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CollectionDescription Source refactor
3 participants