Replace core.Key and ds.Key with strongly typed data structures #19

Closed


AndrewSisley
Contributor

Replace the string based core.Key and ds.Key types with strongly typed data structures.

Namespacing is now entirely handled by the datastores and should not be present on the keys (i.e. no leading '/db/data').
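As a rough illustration of the idea (not the PR's actual code): a key held as a struct with named fields and rendered to a string only at the datastore boundary. The field names CollectionId, IndexId and DocKey and the ToString method are mentioned elsewhere in this thread; the exact layout and rendering rules below are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// DataStoreKey is a hypothetical, simplified stand-in for the typed key
// discussed in this PR. The field names come from the thread; the layout
// is an assumption made for illustration.
type DataStoreKey struct {
	CollectionId string
	IndexId      string
	DocKey       string
}

// ToString renders the fields in order as a '/'-separated path, stopping
// at the first empty field so that a partial key acts as a prefix.
// No namespace (such as "/db/data") is added: that is left to the datastore.
func (k DataStoreKey) ToString() string {
	var sb strings.Builder
	for _, part := range []string{k.CollectionId, k.IndexId, k.DocKey} {
		if part == "" {
			break
		}
		sb.WriteString("/")
		sb.WriteString(part)
	}
	return sb.String()
}

func main() {
	key := DataStoreKey{CollectionId: "1", IndexId: "0", DocKey: "bae-123"}
	fmt.Println(key.ToString()) // /1/0/bae-123

	// A key with only the collection set renders as a prefix.
	fmt.Println(DataStoreKey{CollectionId: "1"}.ToString()) // /1
}
```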

Fixes the factory_test.go tests; the only tests still failing also fail on master (Test_Generator_buildTypesFromAST_SingleScalarField, TestMutationParse_Update_Simple_Array and TestMutationParse_Create_Error_Missing_Data).

Database dumps seem to pair up with master (manually checked a few of the db/tests, and my local file-based instance).

Performance doesn't seem to have taken a hit (checked by manually firing in queries via the UI client; it might even be improved due to the removal of a few string manipulations, with query times of 250-680 microseconds), but maybe keep an eye out for anything silly I may have done, in the planner in particular.

System will likely need further extension in the time-traveling branch, and when implementing the non-primary index stuff.

@AndrewSisley AndrewSisley self-assigned this Oct 29, 2021
@todo

todo bot commented Oct 29, 2021

unit test this like crazy

defradb/core/key.go

Lines 85 to 90 in ecc6049

func NewDataStoreKey(key string) DataStoreKey { //@todo: unit test this like crazy
	dataStoreKey := DataStoreKey{}
	if key == "" {
		return dataStoreKey
	}


This comment was generated by todo based on a todo comment in ecc6049 in #19. cc @sourcenetwork.

@todo

todo bot commented Oct 29, 2021

Support multiple spans

Prefix: spans[0].Start().ToString(), // @todo: Support multiple spans
}
if df.reverse {
q.Orders = []dsq.Order{dsq.OrderByKeyDescending{}}


This comment was generated by todo based on a todo comment in ecc6049 in #19. cc @sourcenetwork.

@todo

todo bot commented Oct 29, 2021

, this is very lazy

df.doc.Key = []byte(kv.Key.DocKey) // core.DataStoreKey{DocKey: kv.Key.DocKey}.Bytes() //todo, this is very lazy
// keyFD := df.schemaFields[0] // _key
// df.doc.Properties[keyFD] = &document.EncProperty{
// Raw: df.doc.Key[:],


This comment was generated by todo based on a todo comment in ecc6049 in #19. cc @sourcenetwork.

@todo

todo bot commented Oct 29, 2021

Config logger param package wide

headset: newHeadset(headstore, namespace), //TODO: Config logger param package wide
crdt: crdt,
}
}


This comment was generated by todo based on a TODO comment in ecc6049 in #19. cc @sourcenetwork.

@AndrewSisley
Contributor Author

I think I have actually done something horrible: the benchmarks in the PR description were invalid, as I forgot you need to re-run go install. Newer benchmarks are considerably slower and I need to investigate before merging (current guess is I botched the span/PrefixWith stuff in the planner).

@jsimnz
Member

jsimnz commented Nov 1, 2021

I'll check the perf on my side and investigate. TBD

@AndrewSisley
Contributor Author

I think I just got lucky running the queries from master the first few times. Performance seems comparable to master (~500us to 2ms). Added a couple of commits (including one fixup) whilst going through again today. I will rebase onto master and squash the fixup before merging, after/if the PR is approved.

Is wasteful taking the whole struct (and initing one in most cases)
@AndrewSisley AndrewSisley force-pushed the merkle-crdt-factory-key-fix-without-store-comp branch from ef8c7aa to 8cbf009 Compare November 1, 2021 18:47
@jsimnz
Member

jsimnz commented Nov 5, 2021

I think at the moment, I'd rather just implement the simpler version of the conversion, instead of the structured version.

It kinda seems like the same problems plague the implementation here, specifically in the New constructors: they assume certain structures and do a lot to try to keep the format consistent, while technically not giving us more assurance than the generic version. Unless, of course, we manually construct the key structs one field at a time.

Thoughts?

// [CollectionId]/[PrimaryIndexId]/
//
// Any properties before the above (assuming a '/' deliminator) are ignored
func NewDataStoreKey(key string) DataStoreKey {
Member

Specifically, these constructors are what I'm referring to.

Contributor Author

Would be very much in favour of merging lol :) But I can see where you are coming from - I spent quite a while trying to get rid of as many prod-code references to those constructors as possible.

Those constructors are only called in a small number of places (ignoring tests and the deprecated get.go): three times for NewDataStoreKey and once for the HeadStoreKey. And those are all on direct output from the datastore, where the keys should be 100% complete. Everything else is manually built.

I think even if these were more widely used (which they shouldn't be), the review still solves the following:

  1. Code is no longer order dependent - e.g. if something in place x changes the structure by (e.g.) adding the dockey, then place y won't break in really weird ways because of the string parsing hacks it does to get the collection id based on the number of '/' present
  2. It is now much more obvious what is being used and where: instead of having to track back through the object's lifetime and navigate through the ds.Namespace()/Base/Split/whatever, you can see that x is the CollectionId with IndexId
  3. The format of string output is much more visible and you can now see what is being written/read
  4. Removes the db/namespace from the vars being passed through the codebase, forcing the datastores to be responsible for managing that and reducing the 'stuff' the devs see when debugging. I also found that their inclusion within the string-key was partial and very dependent on where you were in the codebase: not all places had them, and some broke when they were given them.

It also keeps all the scary stuff in the one place, where it is very visible, and tested. There is a risk that we'll accidentally call these functions when we shouldn't (i.e. when there is no reason to, and we can manually set stuff), but it is better than having poorly documented structured-strings floating round all over the place :P
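A small sketch of the contrast drawn in points 1 and 4, under stated assumptions: collectionIdByPosition and parseDataStoreKey are hypothetical helpers written for this illustration, not functions from the codebase, and the "[CollectionId]/[IndexId]/[DocKey]" layout is assumed. The positional helper silently returns the wrong segment once the namespace prefix is missing, whereas parsing once at the boundary leaves callers reading named fields.

```go
package main

import (
	"fmt"
	"strings"
)

// collectionIdByPosition mimics order-dependent string parsing: it assumes
// the key always starts with a two-segment namespace such as "/db/data".
// Hypothetical helper, for illustration only.
func collectionIdByPosition(key string) string {
	parts := strings.Split(strings.Trim(key, "/"), "/")
	if len(parts) < 3 {
		return ""
	}
	return parts[2]
}

// DataStoreKey is a simplified, assumed version of the typed key.
type DataStoreKey struct {
	CollectionId string
	IndexId      string
	DocKey       string
}

// parseDataStoreKey parses a namespace-free key of the assumed form
// "/[CollectionId]/[IndexId]/[DocKey]" exactly once.
func parseDataStoreKey(key string) DataStoreKey {
	k := DataStoreKey{}
	if key == "" {
		return k
	}
	parts := strings.Split(strings.Trim(key, "/"), "/")
	if len(parts) > 0 {
		k.CollectionId = parts[0]
	}
	if len(parts) > 1 {
		k.IndexId = parts[1]
	}
	if len(parts) > 2 {
		k.DocKey = parts[2]
	}
	return k
}

func main() {
	// Works only while the namespace happens to be present.
	fmt.Println(collectionIdByPosition("/db/data/1/0/bae-123")) // 1
	// Same helper, namespace stripped: the doc key comes back instead.
	fmt.Println(collectionIdByPosition("/1/0/bae-123")) // bae-123

	// Parsed once, callers no longer count separators.
	key := parseDataStoreKey("/1/0/bae-123")
	fmt.Println(key.CollectionId, key.IndexId, key.DocKey) // 1 0 bae-123
}
```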

@jsimnz
Member

jsimnz commented Nov 8, 2021 via email

@AndrewSisley
Contributor Author

Code relies on the datastore wrapping - I paid quite a lot of attention to the db dumps during testing to make sure I wasn't losing that.

These are all pretty valid reasons :). And the benefits are fairly clear. I haven't checked yet, but does this play well with the actual data stores? When you call txn.headstore.get(/my-doc-key) the 'headstore' references a "Wrapped data store" from the go-datastore package. Which automatically handles prefix wrapping and stripping for the "/db/heads" prefix. Same applies to /db/data, /db/blocks, and /db/system.
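To make the wrapping behaviour concrete, here is a stdlib-only sketch of a prefix-wrapping store. It is not the go-datastore API: the store interface, mapStore and prefixStore are hypothetical types that only imitate the "add the prefix on the way in, strip it on the way out" behaviour described above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// store is a minimal key-value interface used only for this sketch.
type store interface {
	Put(key string, value []byte)
	Get(key string) ([]byte, bool)
	Keys() []string
}

// mapStore is an in-memory root store backed by a map.
type mapStore map[string][]byte

func (m mapStore) Put(key string, value []byte) { m[key] = value }

func (m mapStore) Get(key string) ([]byte, bool) {
	v, ok := m[key]
	return v, ok
}

// Keys returns all keys in sorted order.
func (m mapStore) Keys() []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// prefixStore prepends a fixed namespace to every key before delegating
// to the child store, and strips it again when listing keys.
type prefixStore struct {
	child  store
	prefix string
}

func (p prefixStore) Put(key string, value []byte) { p.child.Put(p.prefix+key, value) }

func (p prefixStore) Get(key string) ([]byte, bool) { return p.child.Get(p.prefix + key) }

func (p prefixStore) Keys() []string {
	var keys []string
	for _, k := range p.child.Keys() {
		if strings.HasPrefix(k, p.prefix+"/") {
			keys = append(keys, strings.TrimPrefix(k, p.prefix))
		}
	}
	return keys
}

func main() {
	root := mapStore{}
	headstore := prefixStore{child: root, prefix: "/db/heads"}
	datastore := prefixStore{child: root, prefix: "/db/data"}

	// Callers pass namespace-free keys; the wrapper adds the prefix.
	headstore.Put("/my-doc-key", []byte("head"))
	datastore.Put("/my-doc-key", []byte("data"))

	fmt.Println(root.Keys())      // [/db/data/my-doc-key /db/heads/my-doc-key]
	fmt.Println(headstore.Keys()) // [/my-doc-key]
}
```

With the namespace owned by the wrapper, the same namespace-free key can be used against either store without colliding in the root.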

@jsimnz
Member

jsimnz commented Nov 8, 2021

OK, I'm inclined to go with this approach. However, with this much string manipulation/allocation in the hot path, I want to get at least a rudimentary benchmark suite in place to ensure everything is good, without notable (or ideally any) perf degradation.
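A rudimentary benchmark of the kind described might look like the sketch below. It is an assumption-laden illustration: the DataStoreKey type and its ToString are simplified stand-ins, and the two benchmark functions are hypothetical. In a real repository these would normally live in a _test.go file and be run with go test -bench; testing.Benchmark is used here only so the sketch runs as a standalone program.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// DataStoreKey is a simplified, assumed version of the typed key.
type DataStoreKey struct {
	CollectionId string
	IndexId      string
	DocKey       string
}

// ToString renders the key by plain concatenation.
func (k DataStoreKey) ToString() string {
	return "/" + k.CollectionId + "/" + k.IndexId + "/" + k.DocKey
}

// sink keeps results alive so the compiler cannot discard the work.
var sink string

// benchmarkToString measures building the string form from a typed key.
func benchmarkToString(b *testing.B) {
	key := DataStoreKey{CollectionId: "1", IndexId: "0", DocKey: "bae-123"}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = key.ToString()
	}
}

// benchmarkSplit measures extracting a segment from a raw string key.
func benchmarkSplit(b *testing.B) {
	raw := "/1/0/bae-123"
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		parts := strings.Split(raw, "/")
		sink = parts[1]
	}
}

func main() {
	// Each call runs the function until the timing is stable (about a
	// second by default) and reports ns/op; MemString adds allocations.
	for name, fn := range map[string]func(*testing.B){
		"ToString": benchmarkToString,
		"Split":    benchmarkSplit,
	} {
		r := testing.Benchmark(fn)
		fmt.Println(name, r.String(), r.MemString())
	}
}
```

Numbers from a sketch like this only indicate relative cost on one machine; comparing against master would need the same benchmarks run on both branches.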

@AndrewSisley
Contributor Author

Okay fair enough :) I'll probs get cracking on that (after I finish the group stuff) if you don't beat me to it :D

@jsimnz jsimnz linked an issue Nov 17, 2021 that may be closed by this pull request
1 task
@AndrewSisley
Contributor Author

Closing this PR as GitHub doesn't seem to allow me to change the target branch once a PR is open. Will open a new one against develop.

@AndrewSisley AndrewSisley deleted the merkle-crdt-factory-key-fix-without-store-comp branch February 7, 2022 21:08
Development

Successfully merging this pull request may close these issues.

Convert all instances of ds.Key to core.Key in both internal and external APIs
2 participants