feat: Fetcher field selection and optimistic filter evaluation #491
Conversation
note: Depending on your preferred review style, you can ignore the commit history, as I kept the previous attempts' progress/WIP commits from trying different approaches (for historical reasons). You can scope your review to the Files changed tab, as that will show only the most recent approach.
@@ -120,7 +120,7 @@ func NewDataStoreKey(key string) DataStoreKey {
 	} else {
 		indexOfDocKey = numberOfElements - 1
 	}
-	dataStoreKey.DocKey = elements[indexOfDocKey]
+	dataStoreKey.DocKey = strings.Split(elements[indexOfDocKey], ":")[0]
😱
Why so scared @AndrewSisley? lol
PTSD coming back from the previous key-code 😆 We used to have some very horrible and unsafe string magic, both in the original and in my first refactor of it (#84).
I'm guessing this can be removed, as I didn't spot any other refs to it last night - but I only gave it a quick scan as my mental energy was quite low and it is a complicated PR.
The strings.Split function is pretty safe here. It will always return a slice with a length of at least one, so using [0] will never panic with an index out of range.
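A minimal runnable sketch of that guarantee (the key strings are made up for the example; only the strings.Split behaviour is the point):

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// strings.Split always returns a slice with at least one element,
	// even when the separator is absent or the input is empty.
	fmt.Println(strings.Split("bae-123:v", ":")[0]) // "bae-123"
	fmt.Println(strings.Split("bae-123", ":")[0])   // "bae-123" (no separator present)
	fmt.Println(strings.Split("", ":")[0])          // "" (still one element, never panics)
}
```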
Actually, I have no memory of doing this or why it's here. Probably related to a bug I noticed a while back w.r.t. the instance type and doc key not being properly parsed. Will definitely clean this up / make it safer.
looked like that was the case - will comment more if it stays 😆
@@ -156,6 +160,7 @@ type Spans []Span

// KeyValue is a KV store response containing the resulting core.Key and byte array value
type KeyValue struct {
	Res dsq.Result
suggestion: I know this is super-WIP code, so I'm guessing this isn't the long-term plan - but just in case you miss it in the cleanup: I really don't think this should live here, and the fetcher might need to define its own internal KeyValue struct (or similar) instead of leaking this through here.
Agreed on keeping this internal to the fetcher.
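For illustration, a hypothetical shape for a fetcher-internal type (the package layout and names are assumptions, not the PR's actual code):

```go
// Package fetcher keeps its raw KV representation unexported rather than
// leaking it through core. A sketch only, not the real implementation.
package fetcher

import (
	dsq "github.com/ipfs/go-datastore/query"
)

// keyValue is the fetcher's private view of a raw store result; keeping it
// unexported keeps the dsq.Result dependency out of core's public surface.
type keyValue struct {
	res dsq.Result
}
```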
// 2) we have a filter and it's a filter field
// 3) we have passed the filter
// then get the value
// otherwise it'll be lazy loaded down the line
suggestion: I would be really, really cautious about making this lazy outside of the fetcher. This is an IO/file/system operation, and making it too lazy could result in some misleading benchmarks and nasty reliability issues, as well as the more obvious leaking of concerns/concepts through to other areas of the codebase.

I think I might be more concerned about this than about the required modifications to badger/etc, and would be curious how much you think we gain from this.
This is lazy within the fetcher. It'll be resolved before fetcher.FetchNext returns.

It's lazy for the case where the entire document is ignored because the filter doesn't pass
In which case there's no point spending the time copying the value bytes from badger.
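A hedged sketch of that saving, using Badger's iterator API directly (the v3 import path, prefix wiring, and filter callback are assumptions for the example; PrefetchValues and ValueCopy are the real knobs being discussed):

```go
package fetcher

import (
	badger "github.com/dgraph-io/badger/v3"
)

// scanMatching iterates keys only and defers copying value bytes until a
// document has passed the filter, so filtered-out documents never pay for
// a value read.
func scanMatching(txn *badger.Txn, prefix []byte, passes func(key []byte) bool) ([][]byte, error) {
	opts := badger.DefaultIteratorOptions
	opts.PrefetchValues = false // key-only iteration: value bytes stay unread
	it := txn.NewIterator(opts)
	defer it.Close()

	var out [][]byte
	for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
		item := it.Item()
		if !passes(item.Key()) {
			continue // filter failed: the value is never copied
		}
		val, err := item.ValueCopy(nil) // resolved lazily, only for passing docs
		if err != nil {
			return nil, err
		}
		out = append(out, val)
	}
	return out, nil
}
```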
Ah okay - got it. When cleaning up, it might be worth tweaking this comment then (depending on what the eventual code looks like), as to me it read like the laziness would be leaked outside the fetcher, which is much scarier/more important
Closing, as this is too old and there is a new PR with a different approach: #1500
## Relevant issue(s)

Resolves #490
Resolves #1582 (indirectly)

## Description

This is a reduced version of #491. It takes a very different approach and tries to keep as much of the existing Fetcher structure as possible. Basically, it will try to eagerly ignore documents that don't pass the given filter at the fetcher level. This means we can apply various optimizations that we couldn't when the filter was applied at the scanNode level like before.
RELEVANT ISSUE(S)
Resolves #490
DESCRIPTION
Refactors the db document fetcher to handle field selection and optimistic filter evaluation.
A side effect is that the fetcher is optimized to use BadgerDB's efficient key-only iteration, which required additional changes to the go-datastore interfaces, BadgerDB, and the BadgerDB datastore interface.

This has significant performance increases depending on the type of query (2-4x).
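As a rough illustration of key-only iteration through the stock go-datastore query interface (assuming a go-datastore version whose methods take a context; the PR's additional interface changes are not reproduced here):

```go
package main

import (
	"context"
	"fmt"

	ds "github.com/ipfs/go-datastore"
	dsq "github.com/ipfs/go-datastore/query"
)

func main() {
	ctx := context.Background()
	store := ds.NewMapDatastore()
	_ = store.Put(ctx, ds.NewKey("/db/data/bae-1"), []byte("doc1"))

	// KeysOnly asks the datastore not to materialise values at all, which
	// BadgerDB can serve cheaply by disabling value prefetching.
	res, _ := store.Query(ctx, dsq.Query{Prefix: "/db/data", KeysOnly: true})
	defer res.Close()
	for r := range res.Next() {
		fmt.Println(r.Key, r.Value == nil) // prints the key; the value was never loaded
	}
}
```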
This is a pretty notable change, not in its design, but in its side effects on the other packages. We can track the required changes for the dependent repos:
BadgerDB - sourcenetwork/badger#1
go-datastore - sourcenetwork/go-datastore#1
This PR is an extreme WIP; it still has lingering artifacts from the several previous attempts at this goal, as well as debug print statements.

Additionally, many benchmarks were added throughout this effort and need to be cleaned up.
HOW HAS THIS BEEN TESTED?
At the moment it relies on the existing integration testing suite. I focused on the query tests; there may be others this breaks.
CHECKLIST:
ENVIRONMENT / OS THIS WAS TESTED ON?
Please specify which of the following this was tested on (remove or add your own):