feat: Remove CollectionDescription.Schema #1965
Conversation
Force-pushed from b16ff04 to 159aa0b
Codecov Report:

```
@@            Coverage Diff             @@
##           develop    #1965      +/-   ##
===========================================
- Coverage    74.30%   74.03%    -0.28%
===========================================
  Files          242      244        +2
  Lines        23436    23508       +72
===========================================
- Hits         17414    17402       -12
- Misses        4837     4899       +62
- Partials      1185     1207       +22
```

Flags with carried forward coverage won't be shown.
... and 8 files with indirect coverage changes.
db/collection.go
Outdated
```diff
-	existingDescriptionsByName map[string]client.CollectionDescription,
-	proposedDescriptionsByName map[string]client.CollectionDescription,
+	existingSchemaByName map[string]client.SchemaDescription,
+	proposedDescriptionsByName map[string]client.SchemaDescription,
 	desc client.CollectionDescription,
```
todo: Passing this in no longer makes any sense, nor does fetching it in patchSchema. Drop the patchSchema call to getCollectionsByName, and fetch it on demand in here.
See also #1939 (comment)
- Remove unwanted complexity RE collectionDescription param
```diff
+	collectionSchemaVersionKey := core.NewCollectionSchemaVersionKey(schema.VersionID)
+	// Whilst the schemaVersionKey is global, the data persisted at the key's location
+	// is local to the node (the global only elements are not useful beyond key generation).
+	err = txn.Systemstore().Put(ctx, collectionSchemaVersionKey.ToDS(), buf)
+	if err != nil {
+		return nil, err
+	}

-	collectionSchemaKey := core.NewCollectionSchemaKey(schemaID)
-	err = txn.Systemstore().Put(ctx, collectionSchemaKey.ToDS(), []byte(schemaVersionID))
+	collectionSchemaKey := core.NewCollectionSchemaKey(schema.SchemaID)
+	err = txn.Systemstore().Put(ctx, collectionSchemaKey.ToDS(), []byte(schema.VersionID))
 	if err != nil {
 		return nil, err
 	}

-	err = txn.Systemstore().Put(ctx, collectionKey.ToDS(), []byte(schemaVersionID))
+	err = txn.Systemstore().Put(ctx, collectionKey.ToDS(), []byte(schema.VersionID))
```
thought: These make no sense anymore. I think this is what you were referring to here
Yes :)
db/descriptions/schema.go
Outdated
suggestion: Although I like the functions that you create here, I'm not a fan of the descriptions package. I would much rather see this under db; schema.go is still small enough to host the content of this file without causing trouble.
The reason for this is documented in the description: whilst it is very small now, it will hopefully grow very soon. This package provides a single cohesive location for all similar stuff (schema, collection, index, repo-stuff) to live alongside each other without getting blended into the rest of the db package.
I read the description. My comment still stands. Even after moving collection in there, I don't think it would make sense to me. I think having the content under a file with a representative name under db would be much better.
It would be quite a lot of code. The schema stuff is 170 lines; collection and index and anything else that follows would be the same, plus the repository layer on top of them. Why would you want all of that spread about a fairly large and general-purpose package (db)? It would be much, much easier to read and maintain if it all lived in isolation in the same location.
I don't believe it would make any difference for maintainability. The schema stuff was already in there. You took it out of existing functions. If the files are well named and represent their contents, I don't really care how large the package is in terms of lines of code. Readability would be unchanged. I also care more about the size of functions than the size of files.
If you move collection to a sub package, we will end up with the sub-package having its parent as a dependency and eventually risk hitting circular dependency issues.
collection depends on db...
I do not understand what you mean here. You might be misunderstanding what will be living in this package.
Why not just add collection_description.go and schema_description.go instead of creating a sub-package?
Because they are going to be separated by 10 or so other files in the db package that are alphabetically larger than the first and smaller than the second, with index_description.go somewhere in between, plus whatever files the repo stuff will use, all causing cohesion to be lost.
If schema_description.go is next to schema.go, and functions inside schema.go call functions inside schema_description.go, I think that's pretty good cohesion.
Why would having schema_description.go next to collection_description.go be more cohesive than the above?
Why would having schema_description.go next to collection_description.go be more cohesive than the above?
The below is the definition of 'cohesive' returned from DuckDuckGo. By definition, if two files are not next to each other, they are less cohesive than two files that are next to each other.
cohesive:
Holding the particles of a homogeneous body together; producing cohesion.
Cohering, or sticking together, as in a mass; capable of cohering; tending to cohere.
For example, if the files were called description_schema.go and description_collection.go, they would be co-located, and it would be much easier to spot the shared concepts. A directory is really just that but using a / instead of an _, with some special treatment from the file browser - this also emphasises the fact that they are conceptually separate from the rest of the db package contents.
When I look at the codebase, I'd like to see the stuff defining how to save and fetch schema next to the code used to save and fetch collections, and I'd also want the two neatly organised so that the implementation to fetch schema isn't mixed up with the code to save collections.
This lets me make sure that they all look and behave in predictable ways, and if I want to change all of them it is much more obvious what 'all of them' is, so that I don't accidentally miss some similar stuff (e.g. index descriptions).
Alright, I'll get behind your approach. I would like you to name it in the singular, as pointed out by Islam, if it's not too big an ask.
I am worried that we will eventually hit a circular dependency situation given the proximity to the types in the root db package. But we can cross that bridge when we get there.
Okay cheers, I'll add a task/note against the package name convo to rename it
Force-pushed from 74704e0 to c8170a0
```diff
@@ -132,6 +133,12 @@ type CollectionIndexKey struct {

 var _ Key = (*CollectionIndexKey)(nil)

+type SchemaVersionKey struct {
```
todo: I think it needs a comment here
Cheers - I missed that, will add
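For reference, here is a sketch of the kind of doc comment being asked for; the wording and the field name are assumptions for illustration, not the comment that was actually added:

```go
// SchemaVersionKey points to the stored schema at the given version ID.
// (Hypothetical wording; the real comment lives alongside the type in core.)
type SchemaVersionKey struct {
	SchemaVersionID string
}
```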
- Doc key
```diff
@@ -257,6 +264,11 @@ func NewCollectionSchemaVersionKey(schemaVersionId string) CollectionSchemaVersionKey {
 	return CollectionSchemaVersionKey{SchemaVersionId: schemaVersionId}
 }

+func NewCollectionSchemaVersionKeyFromString(key string) CollectionSchemaVersionKey {
```
question: Don't you think it makes sense to add some unit tests? There is a dedicated file that tests keys.
I do not. This is very simple, and tested by pretty much every integration test we have. Unit tests would only slow down development (now and in the future).
Why would we rely on an integration test to flag an issue with something that can easily be tested locally? That means that if we make a change to any of the keys, we would have to run the integration test suite to see if there are any problems with the change, instead of running the much faster local unit tests.
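To illustrate the kind of fast local test being argued for, here is a minimal sketch; the key prefix and the parsing behaviour are assumptions for illustration, not the real constants:

```go
package core

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

// A sketch of a key round-trip unit test. The "/collection/version/v"
// prefix is hypothetical; the real prefix lives in core/key.go.
func TestNewCollectionSchemaVersionKeyFromString(t *testing.T) {
	key := NewCollectionSchemaVersionKeyFromString("/collection/version/v/bafkreiabc")
	assert.Equal(t, "bafkreiabc", key.SchemaVersionId)
}
```

A test like this runs in milliseconds, so a key-format regression would surface without a full integration run.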
I had a closer look and I think that not unit testing our keys could cause bad changes to go unnoticed until the breaking change tests are executed which could be quite annoying.
I'm not really in the mood to get back into the integration vs unit test conversation at the moment. I have written a large amount on this topic before and you know my opinion on this matter.
We have two overall objectives with testing:
1- Testing behaviour that is meaningful to the end user (like you mentioned above)
2- Reaching maximum code coverage so that all possible code paths have been tested (sometimes that is currently impossible to do with integration tests alone). This is a goal that the team set for itself (significantly increase code coverage).
I'll end my input on this conversation by reminding you of the above objectives. If you believe that simply relying on integration tests in this case contributes towards achieving the above objectives, then I think we're good.
If you believe that simply relying on integration tests in this case contributes towards achieving the above objectives
I very strongly believe so. Sometimes the integration tests will need to be more imaginative than our current core suite, for example chaos-monkey-like magic, or providing a deliberately faulty/test datastore implementation (the root datastore is something that can be publicly provided), but ultimately everything should be testable via public-only access points. If it isn't, it is dead code.
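As a rough illustration of the 'deliberately faulty/test datastore' idea, one could wrap the root datastore and inject failures; the types and signatures below assume a recent go-datastore and are a sketch, not DefraDB code:

```go
package test

import (
	"context"
	"errors"

	ds "github.com/ipfs/go-datastore"
)

// faultyStore wraps a real root datastore and injects a failure on Put,
// letting integration tests exercise error paths without mocks.
type faultyStore struct {
	ds.Datastore
	failPuts bool
}

func (s *faultyStore) Put(ctx context.Context, key ds.Key, value []byte) error {
	if s.failPuts {
		return errors.New("injected datastore failure")
	}
	return s.Datastore.Put(ctx, key, value)
}
```

Since the root datastore can be publicly provided, a wrapper like this needs no access to internals.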
Just another note: it is practically impossible to reach a desired test coverage with integration tests alone.
Just imagine setting up "a deliberately faulty/test datastore" that you want to trigger for a specific request at a specific code location. You saw how hard it was to achieve with unit tests; doing it with integration tests will be a nightmare.
And those tests helped me write better code and fix (unreported) bugs in the existing code. Like this:

```go
err := someAction()
if err != nil {
	return err
}
...
res := resultsIter.NextSync()
if res.Error != nil {
	return err // here is the problem
}
```

Usually we don't write tests for such paths. But we can easily overlook the details and ship incorrect code.
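For clarity, the fix for the flagged line is to return the iterator's error rather than the earlier (nil) err:

```go
res := resultsIter.NextSync()
if res.Error != nil {
	return res.Error // surface the iterator's error, not the stale err
}
```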
Such resilience tests don't need to be coded like traditional do-this, do-that tests. You can use the https://en.wikipedia.org/wiki/Chaos_engineering approach instead. This can allow the tests to be somewhat proactive (searching for bugs) instead of reactive (testing known pathways where bugs are expected).
Also, a lot of our errors can be reached by manipulating the datastore values directly, no need to mess around with mocks.
I also want to introduce a really magic testing layer, where randomized queries and setup data are fed in. Traditionally you can do this using recorded real-world data/actions, but there has been significant movement in using 'AI' to generate stuff like queries for such things.
db/descriptions/descriptions.go
Outdated
suggestion: I think the conventional way of naming packages in Go is in singular form.
I think I always saw that as applying when talking about a single thing. For example, the db package contains a single db definition that can be used to instantiate multiple db instances. In descriptions we have multiple description definitions (schema, collection, indexes - eventually), each of which can be used to instantiate multiple instances.
As I said, there is no official rule for it; it's just community convention.
The main argument for it is how you are going to use it. Yes, the package might deal with several different descriptions, but for most packages, most of their public methods deal with a single entity (a.k.a. a description).
Like context.Context allows you to create (or wrap) different types of contexts, but you handle a single one at a time.
Like context.Context allows you to create (or wrap) different types of contexts, but you handle a single one at a time.
Yes, but a context.Context is still just a (singular) context.Context. The descriptions in this package do (will) not share an interface or base inherited type. The description types are all public and independent; 'descriptions' is just a made-up umbrella term because I needed to give the package a name - there is no Description interface or type.
Another example is the enumerable package; this is singular because it describes a single thing, although there are many implementations, and some extensions of the base interface.
Note: I'm mostly just chatting for fun here, I don't really care about the name of the package, and think description(s) is a bit awkward anyway.
Fred also prefers description over descriptions, renaming :)
- Rename package
db/descriptions/schema.go
Outdated
"github.com/sourcenetwork/defradb/datastore" | ||
) | ||
|
||
// CreateSchemaVersion creates and saves to the store a new schema version. |
suggestion: "a new schema description version"
I do disagree, and don't think adding description there adds any clarity. It also clashes with Fred's long-term naming suggestion where SchemaDescription will be renamed to Schema (or maybe SchemaVersion; either way Description will be dropped).
It was just an observation as a reader: I was looking for the word "description" because the method is in the descriptions package, but it wasn't clear to me at first sight why.
db/descriptions/schema.go
Outdated
```diff
 	return desc, nil
 }

+// GetSchemas returns the schema of all the default schemas in the system.
```
question: what is "default schema"? Is it a common term in defra?
It is a fairly new term that came to be when we allowed the toggling of schema versions. There is a word missing here; it should read "default schema versions". A default schema version is the version used by a collection/query/etc. when a version is not explicitly requested.
- Tweak doc wording
all right, the term "default schema versions" I know :)
db/descriptions/schema.go
Outdated
```diff
 	}
 	defer func() {
 		if err := collectionSchemaVersionQuery.Close(); err != nil {
 			log.Error(ctx, NewErrFailedToCloseSchemaQuery(err).Error())
```
todo: #1450 (comment) :)
Hahaha, nice 😁 Thanks Islam 😆
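The linked comment is about not swallowing the Close error inside the deferred function. A common Go pattern for this, shown here as a generic sketch rather than the actual DefraDB change, folds the Close error into a named return value:

```go
package example

import (
	"errors"
	"io"
)

// readAll folds a Close error into the named return value instead of
// only logging it, so callers actually see the failure.
func readAll(r io.ReadCloser) (data []byte, err error) {
	defer func() {
		// errors.Join (Go 1.20+) keeps both the read and the close error.
		err = errors.Join(err, r.Close())
	}()
	return io.ReadAll(r)
}
```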
- Return error
```diff
@@ -661,86 +661,6 @@ func TestNonUniqueDrop_ShouldDeleteStoredIndexedFields(t *testing.T) {
 	assert.Len(t, f.getPrefixFromDataStore(prodCatKey.ToString()), 1)
 }

-func TestNonUniqueDrop_IfDataStorageFails_ReturnError(t *testing.T) {
```
question: I know we already touched on this one, but it is really hard to see these tests being cut.
I know these unit tests might be a bit hard to maintain with all these mocks, but they reach places that integration tests can't and help us achieve the greater code coverage we are striving for.
I wonder how you decide if some tests need to be removed.
I know we already touched on this one, but it is really hard to see these tests being cut.
I really do appreciate that, and it is a big personal reason why I do not want the removal of these tests to be blended in with the key.go unit test discussion.
help us achieve the greater code coverage we are striving for
- Code coverage is not an end goal, it is just an indicator. I do not strive for code coverage, I strive for correct and maintainable code. Good tests contribute to code correctness and lower maintenance costs, bad tests can have a negative effect on both.
- Removing these tests does not appear to have affected code coverage.
I wonder how you decide if some tests need to be removed.
:) I'm a lot more conservative than I want to be RE preserving unit tests. I have tried several times before to remove many of the unit tests that I have spent notable time changing in these last few PRs.
The index unit tests, however, have cost me a lot of time over the last 10 days or so. It took me roughly half a day of development to get them working in a previous PR; I didn't bother trying in this one and removed the ones that started failing.
I do not consider these to be worth their maintenance cost.
They are the only tests to change in this PR.
This PR does not change any public behaviour.
This PR does not touch any index related code.
Yet these tests started to fail. These tests inhibit useful work, they prevent us from making useful changes to the codebase. Changes that are irrelevant to what they are testing for.
Almost everything that the tests test is covered by integration tests. Some tests in these files do test errors which do not yet have integration test coverage; however, I consider testing those errors to be very low value.
As well as the above, fixing these once they start failing is particularly painful. There is quite a lot of mock setup, and it is not easy code to read. Fixing them requires guessing how many internal calls to various mocks used by the tests are the 'correct' number, with the 'correct' input params and responses.
Somewhat related to the conversation we had RE the index-explain integration tests, these unit tests make unimportant stuff really important, and this wastes a lot of time.
In my opinion, the cost of these tests far outweighs any benefit of having them. Good tests save developer time (including external users' time in the case of bugs), that is why we write them - these tests cost more time than they will ever save.
I am not against the unit testing of stuff. Unit tests are useful in many cases.
I am not against mocks, they have their place, however here, with the amount of irrelevant stuff that they need to mock, they make these tests bad tests that cost more than they save.
If we wish to preserve tests similar in concept to the tests in this file they should be reworked somehow. They really should not fail when making the changes I have made in this PR.
Yet these tests started to fail.
To clarify on this, I changed the commit order - I try quite hard to make all of my commits pass the CI; it makes a lot of things easier. So they started to fail part way through this PR, I removed them, then moved the commit to the start.
hey Andy, thanks for the detailed reasoning. I appreciate it.
I agree, these tests turned out to be a bit too invasive and have high maintenance cost with relatively little benefit.
These have cost me more than a day's worth of time in the last week already, and they have started failing again due to various internal mocks requiring the exact number of calls with the exact params etc., and I have decided that is enough. These are bad tests that cost far more time to maintain than they will ever save. No public behaviour has changed here, yet they insist on failing, and require constant deep diving into really dense mock setup code that has very little to do with anything useful to our users. They are being removed now.
Also stores the schema against a new key. This key is not read in this commit, but will be shortly.
Force-pushed from 606947d to ac557cc
Force-pushed from ac557cc to 12ccb43
We're making good progress on detangling collection and schema. Keep it up! 🙂
Relevant issue(s)
Resolves #1958
Description
Removes CollectionDescription.Schema.
Also splits the storage of schema out from within collection.
The storage of schema has been broken out to a new sub-package of db; at the moment it is a very simple file, but collection will be moved there in #1964. I was planning on doing that in this PR (in part, to provide context for reviewers, as atm it is basically a single-file package), but it proved to be non-trivial due to some existing messiness in that space and was broken out to two more tasks.
I also wish for stuff in that directory to eventually follow a repository-like pattern, where stuff is cached (within a context/txn's context) instead of fetching from store on each call.
Moving this stuff out to a new directory instead of preserving it in the (already very large) db directory should make both db and the new sub-package a fair bit more cohesive and easier to read.