feat: Fetcher field selection and optimistic filter evaluation #491

jsimnz · 2022-06-01T08:21:34Z

RELEVANT ISSUE(S)

Resolves #490

DESCRIPTION

Refactors the DB document fetcher to handle:

- Field selection
- Filter selection

A side effect is that the fetcher now uses BadgerDB's efficient key-only iteration, which required additional changes in the go-datastore interfaces, BadgerDB, and the BadgerDB datastore interface.

This yields significant performance improvements (2-4x), depending on the type of query.
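To illustrate why key-only iteration combined with field selection can save work, here is a minimal stdlib-only sketch. It is not the PR's implementation and does not use the BadgerDB or go-datastore APIs: the `store` type, the `selectFields` helper, and the `/<docKey>/<field>` key layout are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// store is a hypothetical in-memory key-value store that counts value reads.
type store struct {
	data  map[string]string
	reads int
}

// keys returns all keys in sorted order without touching any value,
// standing in for a key-only iterator.
func (s *store) keys() []string {
	ks := make([]string, 0, len(s.data))
	for k := range s.data {
		ks = append(ks, k)
	}
	sort.Strings(ks)
	return ks
}

// get reads a single value and records the read.
func (s *store) get(key string) string {
	s.reads++
	return s.data[key]
}

// selectFields walks the keys and reads values only for the requested fields.
// Keys are assumed to look like "/<docKey>/<field>" (an illustrative layout).
func selectFields(s *store, wanted map[string]bool) map[string]map[string]string {
	docs := make(map[string]map[string]string)
	for _, k := range s.keys() {
		parts := strings.Split(strings.TrimPrefix(k, "/"), "/")
		if len(parts) != 2 || !wanted[parts[1]] {
			continue // skipped keys never cost a value read
		}
		if docs[parts[0]] == nil {
			docs[parts[0]] = make(map[string]string)
		}
		docs[parts[0]][parts[1]] = s.get(k)
	}
	return docs
}

func main() {
	s := &store{data: map[string]string{
		"/doc1/Name": "Alice", "/doc1/Age": "31", "/doc1/Bio": "long text",
		"/doc2/Name": "Bob", "/doc2/Age": "27", "/doc2/Bio": "long text",
	}}
	docs := selectFields(s, map[string]bool{"Name": true})
	fmt.Println(docs, "value reads:", s.reads)
}
```

In this toy setup, selecting only `Name` costs two value reads instead of six. The real gain depends on value sizes and how the underlying store copies them.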

This is a pretty notable change, not in its design but in its side effects on the other packages. The required changes in the dependent repos can be tracked here:
BadgerDB - sourcenetwork/badger#1
go-datastore - sourcenetwork/go-datastore#1

This PR is an extreme WIP: it still has lingering artifacts from several previous attempts at this goal, as well as debug print statements.

Additionally, many benchmarks have been added throughout this effort and need to be cleaned up.

HOW HAS THIS BEEN TESTED?

At the moment it uses the existing integration testing suite. I focused on the query tests; there may be others that this breaks.

CHECKLIST:

I have commented the code, particularly in hard-to-understand areas.
I have made corresponding changes to the repo-held documentation.
I have made sure that the PR title adheres to the conventional commit style (a subset of the ones we use can be found under: tools/configs/chglog/config.yml).

ENVIRONMENT / OS THIS WAS TESTED ON?

Please specify which of the following this was tested on (remove or add your own):

Arch Linux
Debian Linux
MacOS
Windows


jsimnz · 2022-06-01T22:28:29Z

note: Depending on your preferred review style, you can ignore the commit history, as I kept the previous attempts' progress/WIP commits from trying different approaches (for historical reasons). You can scope your review to the files changed tab, as that will show only the most recent approach.

jsimnz · 2022-06-01T22:47:41Z

Benchmark results for the bench/query/simple benches only. Base (original) commit used: 9263b5d.

Uses -count 3 for benchmarks, everything else default (e.g. benchtime).

name                                               old time/op  new time/op  delta
_Query_UserSimple_Query1_WithFilter_Sync_1-12       249µs ± 5%   243µs ± 1%   -2.30%  (p=1.000 n=3+3)
_Query_UserSimple_Query1_WithFilter_Sync_10-12      413µs ± 3%   341µs ± 1%  -17.49%  (p=0.100 n=3+3)
_Query_UserSimple_Query1_WithFilter_Sync_100-12    2.25ms ± 0%  1.32ms ± 1%  -41.39%  (p=0.100 n=3+3)
_Query_UserSimple_Query1_WithFilter_Sync_1000-12   21.0ms ± 0%  10.6ms ± 0%  -49.38%  (p=0.100 n=3+3)
_Query_UserSimple_Query1_WithFilter_Sync_10000-12   208ms ± 1%   105ms ± 1%  -49.36%  (p=0.100 n=3+3)
_Query_UserSimple_Query2_WithFilter_Sync_1000-12   20.4ms ± 2%   9.7ms ± 1%  -52.28%  (p=0.100 n=3+3)
_Query_UserSimple_Query2_WithFilter_Sync_10000-12   202ms ± 0%    97ms ± 4%  -51.86%  (p=0.100 n=3+3)
_Query_UserSimple_Query3_WithFilter_Sync_1000-12   20.4ms ± 1%   8.7ms ± 1%  -57.42%  (p=0.100 n=3+3)
_Query_UserSimple_Query3_WithFilter_Sync_10000-12   203ms ± 4%    84ms ± 1%  -58.62%  (p=0.100 n=3+3)
_Query_UserSimple_Query4_WithFilter_Sync_1000-12   20.5ms ± 1%  10.1ms ± 7%  -50.80%  (p=0.100 n=3+3)
_Query_UserSimple_Query4_WithFilter_Sync_10000-12   202ms ± 1%    95ms ± 0%  -52.91%  (p=0.100 n=3+3)
_Query_UserSimple_Query5_WithFilter_Sync_1000-12   20.8ms ± 1%   7.8ms ± 1%  -62.36%  (p=0.100 n=3+3)
_Query_UserSimple_Query5_WithFilter_Sync_10000-12   208ms ± 3%    73ms ± 1%  -64.73%  (p=0.100 n=3+3)
_Query_UserSimple_Query6_WithFilter_Sync_1000-12   20.9ms ± 1%  10.7ms ± 2%  -48.69%  (p=0.100 n=3+3)
_Query_UserSimple_Query6_WithFilter_Sync_10000-12   207ms ± 0%   109ms ± 7%  -47.25%  (p=0.100 n=3+3)
_Query_UserSimple_Query7_WithFilter_Sync_1000-12   20.4ms ± 2%   8.8ms ± 2%  -57.10%  (p=0.100 n=3+3)
_Query_UserSimple_Query7_WithFilter_Sync_10000-12   205ms ± 1%    83ms ± 3%  -59.38%  (p=0.100 n=3+3)
_Query_UserSimple_Query8_WithFilter_Sync_1000-12   20.2ms ± 1%   8.3ms ± 1%  -58.79%  (p=0.100 n=3+3)
_Query_UserSimple_Query8_WithFilter_Sync_10000-12   199ms ± 0%    81ms ± 1%  -59.33%  (p=0.100 n=3+3)
_Query_UserSimple_Query9_WithFilter_Sync_1000-12   20.5ms ± 2%   7.5ms ± 1%  -63.55%  (p=0.100 n=3+3)
_Query_UserSimple_Query9_WithFilter_Sync_10000-12   206ms ± 5%    77ms ±11%  -62.52%  (p=0.100 n=3+3)

AndrewSisley · 2022-06-03T00:48:17Z

core/key.go

@@ -120,7 +120,7 @@ func NewDataStoreKey(key string) DataStoreKey {
 	} else {
 		indexOfDocKey = numberOfElements - 1
 	}
-	dataStoreKey.DocKey = elements[indexOfDocKey]
+	dataStoreKey.DocKey = strings.Split(elements[indexOfDocKey], ":")[0]


Why so scared @AndrewSisley? lol

PTSD coming back from the previous key-code 😆 We used to have some very horrible and unsafe string magic, both in the original, and my first refactor of it #84.

I'm guessing this can be removed, as I didn't spot any other refs to it last night - but I only gave it a quick scan as my mental energy was quite low and it is a complicated PR.

The strings.Split function is pretty safe here. It will always return a slice with a length of at least one, so using [0] will never panic with an index out of range.
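A small sketch of that guarantee, assuming a non-empty separator such as ":". The `docKeyPart` helper is a hypothetical name introduced here; it only mirrors the expression in the diff and is not code from the PR.

```go
package main

import (
	"fmt"
	"strings"
)

// docKeyPart is a hypothetical helper mirroring the expression in the diff:
// it keeps everything before the first ":" in a key element.
func docKeyPart(element string) string {
	// With a non-empty separator, strings.Split always returns at least
	// one element (the whole input when the separator is absent), so
	// indexing [0] cannot panic.
	return strings.Split(element, ":")[0]
}

func main() {
	// Covers: separator present, separator absent, and empty input.
	for _, in := range []string{"bae-123:v", "bae-123", ""} {
		fmt.Printf("%q -> %q\n", in, docKeyPart(in))
	}
}
```

On Go 1.18 or later, `before, _, _ := strings.Cut(element, ":")` expresses the same intent without allocating a slice, if that reads more clearly.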

Actually have no memory doing this or why it's here. Prob related to a bug I noticed a while back w.r.t the instance type and doc key not being properly parsed. Will def cleanup/make safer

looked like that was the case - will comment more if it stays 😆

AndrewSisley · 2022-06-03T14:50:05Z

core/data.go

@@ -156,6 +160,7 @@ type Spans []Span

 // KeyValue is a KV store response containing the resulting core.Key and byte array value
 type KeyValue struct {
+	Res   dsq.Result


suggestion: I know this is super WIP code, so I'm guessing this isn't the long term plan - but just in case you miss it in the cleanup or whatever, I really don't think this should live here, and the fetcher might need to define its own internal KeyValue struct or similar instead of leaking this through here.

Agreed on keeping this internal to the fetcher.

AndrewSisley · 2022-06-03T14:56:58Z

db/fetcher/fetcher.go

+		// 2) we have a filter and its a filter field
+		// 3) we have passed the filter
+		// then get the value
+		// otherwise itll be lazy loaded down the line


suggestion: I would be really, really cautious in making this lazy outside of the fetcher. This is an IO/file/system/whatever operation, and making it too lazy could really result in some misleading benchmarks and nasty reliability issues, as well as the more obvious leaking of concerns/concepts through to other areas of the codebase.

I think I might be more concerned about this than the required modifications to badger/etc, and would be curious as to how much you think we gain from this.

This is lazy within the fetcher. It'll be resolved before fetcher.FetchNext returns.

It's lazy for the case where the entire document is ignored because the filter doesn't pass.

In that case there's no point spending the time to copy the value bytes from badger.
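A rough sketch of the idea described in this reply, under stated assumptions: `lazyField`, `fetchDoc`, and `newDoc` are hypothetical names, and the in-memory closures stand in for copying value bytes out of the store. It is not the fetcher code from this PR.

```go
package main

import "fmt"

// lazyField is a hypothetical stand-in for an iterator item: the field name
// (taken from the key) is known up front, while the value is copied only
// when load is called.
type lazyField struct {
	name string
	load func() string
}

// fetchDoc resolves one document. It loads the filter field first and
// returns early if the filter rejects it, so the remaining values are never
// copied. When the filter passes, every field is loaded before returning,
// so no laziness escapes to the caller.
func fetchDoc(fields []lazyField, filterField string, pass func(string) bool) (map[string]string, bool) {
	doc := make(map[string]string, len(fields))
	for _, f := range fields {
		if f.name == filterField {
			doc[f.name] = f.load()
			if !pass(doc[f.name]) {
				return nil, false
			}
		}
	}
	for _, f := range fields {
		if _, done := doc[f.name]; !done {
			doc[f.name] = f.load()
		}
	}
	return doc, true
}

// newDoc builds the fields of a sample document and counts loads in *loads.
func newDoc(name, age string, loads *int) []lazyField {
	mk := func(field, value string) lazyField {
		return lazyField{name: field, load: func() string {
			*loads++
			return value
		}}
	}
	return []lazyField{mk("Name", name), mk("Age", age), mk("Email", name+"@example.com")}
}

func main() {
	loads := 0
	isAlice := func(v string) bool { return v == "Alice" }
	for _, d := range [][]lazyField{newDoc("Alice", "31", &loads), newDoc("Bob", "27", &loads)} {
		if doc, ok := fetchDoc(d, "Name", isAlice); ok {
			fmt.Println("matched:", doc)
		}
	}
	fmt.Println("value loads:", loads)
}
```

Here the rejected document costs one value load instead of three, and the accepted document is fully resolved before `fetchDoc` returns, which is the property discussed above.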

Ah okay - got it. When cleaning up it might be worth tweaking this comment then (depending on what the eventual code looks like), as to me it read like the laziness would be leaked outside the fetcher, which is much scarier/more important.

jsimnz · 2023-05-12T12:52:30Z

Closing as this is too old and there is a new PR with a different approach: #1500

## Relevant issue(s)

Resolves #490
Resolves #1582 (indirectly)

## Description

This is a reduced version of #491. It takes a very different approach and tries to keep as much of the existing Fetcher structure as possible. Basically, this will try to eagerly ignore documents that don't pass the given filter at the fetcher level. This means we can apply various optimizations that would not be available if the filter were applied at the scanNode level, as before.


jsimnz added 21 commits March 24, 2022 19:03

Updated bench Makefile to include count arg

932bcff

By default set to 1.

Updated gitignore

9b3a13f

Recursively ignores all .txt .svg and .png files in the bench/ folder

WIP - key only iterator and explicit value lookup

5753fd7

Including keysonly option on iterator

e3e4dca

WIP - benchmark test cases

d54364f

Added badger specific tests

fc564cc

WIP

38e942a

initial outline of filter/seeking iterator

dd4152d

WIP - filled with data races, plz fix plz

9cb9335

Fixing tests

3f794ea

majority of simple/filter query tests passing

212db05

Seek point optimization

3e7294e

added more benchmarks

c753278

added more tests

3a630f0

WIP

43088ff

Proper benchmark naming for iterator benchs

795ef2a

Changed merkle clock logging to debug

f57c6c7

Removing old seek artifacts from previous attempt

8391119

Implemented new fetcher architecture based on item copy, optimistic f…

c190272

…iltering, and lazy value loading

Removed print debug

60fabcb

Updated gomod with new dev dependencies for badger/go-datastore

8894409

jsimnz added the area/db-system (Related to the core system related components of the DB) and feature (New feature or request) labels Jun 1, 2022

jsimnz added this to the DefraDB v0.3 milestone Jun 1, 2022

jsimnz requested a review from AndrewSisley June 1, 2022 22:44

AndrewSisley reviewed Jun 3, 2022


jsimnz self-assigned this Jun 3, 2022

jsimnz added the action/no-benchmark Skips the action that runs the benchmark. label Jun 7, 2022

jsimnz modified the milestones: DefraDB v0.3, DefraDB v0.4 Aug 4, 2022

jsimnz force-pushed the develop branch from a615303 to d87bae8 Compare January 8, 2023 00:16

jsimnz modified the milestones: DefraDB v0.4, DefraDB v0.5 Jan 20, 2023

shahzadlone modified the milestones: DefraDB v0.5, DefraDB v0.5.1 Apr 13, 2023

jsimnz mentioned this pull request May 12, 2023

refactor: Fetcher filter and field optimization #1500

Merged

5 tasks

jsimnz closed this May 12, 2023


