Mlayer/skiplist #1214

Closed
wants to merge 10 commits into from

Conversation

ashwin95r
Contributor

@ashwin95r ashwin95r commented Jul 19, 2017

Initial PR to see skiplist performance in mlayer

Current:
$ go test --bench AddMut
BenchmarkAddMutations-4   	  200000	      7683 ns/op

After change (with skiplist):
$ go test --bench AddMut
BenchmarkAddMutations-4   	  500000	      5082 ns/op

With the skiplist, each op takes ~33% less time (7683 ns/op vs. 5082 ns/op). I'll benchmark more scenarios and post the results here.



@manishrjain
Contributor

Review status: 0 of 3 files reviewed at latest revision, 7 unresolved discussions, some commit checks broke.


posting/list.go, line 35 at r1 (raw file):

ryszard/goskiplist

Badger has a skiplist. Can't you use that? We've already optimized it.


posting/list.go, line 73 at r1 (raw file):

	ghash         uint64
	plist         *protos.PostingList
	mlayer        *skiplist.SkipList //[]*protos.Posting // mutations

If you're modifying things, you don't need to keep the old code


posting/list.go, line 74 at r1 (raw file):

	plist         *protos.PostingList
	mlayer        *skiplist.SkipList //[]*protos.Posting // mutations
	len           int

len is a predeclared identifier for a built-in function.


posting/list.go, line 368 at r1 (raw file):

			if mpost.Op == Del {
				// Undo old post.
				l.len--

Is it expensive to get length from the skiplist? If so, we can modify the skiplist (ours) to keep track of length.


posting/list.go, line 376 at r1 (raw file):

		} else if oldPost.Op == Del {
			if mpost.Op == Set {
				l.len++

What? Why?


posting/list.go, line 409 at r1 (raw file):

	} else {
		if psame { // mpost.Op==Del
			l.len--

I'm not liking this length tracking at all. Let's switch the skiplist impl, and get rid of this len stuff.


posting/list.go, line 628 at r1 (raw file):

}

func (l *List) lengthEst() int {

Why is this an estimate?


Comments from Reviewable

@manishrjain
Contributor

This would also impact the memory usage significantly. Can you benchmark memory allocations as well?


Review status: 0 of 3 files reviewed at latest revision, 7 unresolved discussions, some commit checks broke.



@ashwin95r
Contributor Author

With skiplist:
$ go test --benchmem --bench AddMut 
BenchmarkAddMutations-4   	  200000	      5125 ns/op	     765 B/op	       6 allocs/op

With slice:
$ go test --benchmem --bench AddMut 
BenchmarkAddMutations-4   	  300000	     20577 ns/op	     378 B/op	       3 allocs/op

Memory usage is ~2x with the skiplist (765 B/op vs. 378 B/op, and 6 vs. 3 allocs/op).


Review status: 0 of 3 files reviewed at latest revision, 8 unresolved discussions.


posting/list.go, line 35 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

ryszard/goskiplist

Badger has a skiplist. Can't you use that? We've already optimized it.

I had a look but that library was too specific to Badger, so I used this library for a quick check as the API was simple.


posting/list.go, line 73 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

If you're modifying things, you don't need to keep the old code

Done.


posting/list.go, line 368 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Is it expensive to get length from the skiplist? If so, we can modify the skiplist (ours) to keep track of length.

It's not the length of the skiplist but the length of the posting list (the mlayer has add and del ops, which affect the length). Currently we iterate through the mlayer to count the add/del postings, which is unnecessary. We can instead track the length whenever an update happens, which is what this len does.


posting/list.go, line 376 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

What? Why?

As mentioned, this is to track the total length based on the ops.


posting/list.go, line 409 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

I'm not liking this length tracking at all. Let's switch the skiplist impl, and get rid of this len stuff.

The skiplist already has a Len() but we need to know the contents (hence iterate) to calculate our length. So, this can't be passed on to the skiplist lib.
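
The two approaches discussed in this thread, iterating the mutation layer versus keeping a running counter, can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the `list` type, its fields, and the use of plain maps in place of the skiplist and `protos.PostingList` are all hypothetical and are not the code in `posting/list.go`.

```go
package main

import "fmt"

// op is a hypothetical stand-in for the Set/Del operation on a posting.
type op int

const (
	set op = iota
	del
)

// list is a simplified, hypothetical posting list: an immutable set of UIDs
// plus a mutation layer holding at most one pending op per UID.
type list struct {
	immutable map[uint64]bool // UIDs already in the immutable layer
	mlayer    map[uint64]op   // pending mutations (a map here, not a skiplist)
	length    int             // running length, adjusted on every mutation
}

func newList(uids ...uint64) *list {
	l := &list{immutable: map[uint64]bool{}, mlayer: map[uint64]op{}}
	for _, u := range uids {
		l.immutable[u] = true
	}
	l.length = len(l.immutable)
	return l
}

// present reports whether uid is visible once both layers are merged.
func (l *list) present(uid uint64) bool {
	if o, ok := l.mlayer[uid]; ok {
		return o == set
	}
	return l.immutable[uid]
}

// addMutation records the op and adjusts the running length only when the
// visibility of uid actually changes.
func (l *list) addMutation(uid uint64, o op) {
	before := l.present(uid)
	l.mlayer[uid] = o
	after := l.present(uid)
	switch {
	case !before && after:
		l.length++
	case before && !after:
		l.length--
	}
}

// lengthByIteration recomputes the length by walking the whole mutation
// layer, which is the work the running counter avoids.
func (l *list) lengthByIteration() int {
	n := len(l.immutable)
	for uid, o := range l.mlayer {
		in := l.immutable[uid]
		if o == set && !in {
			n++
		}
		if o == del && in {
			n--
		}
	}
	return n
}

func main() {
	l := newList(1, 2, 3)
	l.addMutation(4, set) // new UID: length 4
	l.addMutation(2, del) // deletes an immutable UID: length 3
	l.addMutation(4, del) // undoes the earlier set: length 2
	l.addMutation(2, set) // undoes the earlier del: length 3
	l.addMutation(9, del) // delete of an absent UID: no change
	fmt.Println(l.length, l.lengthByIteration()) // prints 3 3
}
```

In this sketch the counter stays exact because every mutation checks visibility before and after the update; a counter that skipped that check would only be an estimate.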


posting/list.go, line 628 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

Why is this an estimate?

Since we don't look at the actual ops in the mlayer, this is approximate. In some places, an approximate length is good enough.



@ashwin95r
Contributor Author

Review status: 0 of 3 files reviewed at latest revision, 7 unresolved discussions, some commit checks broke.


posting/list.go, line 35 at r1 (raw file):

Previously, ashwin95r (Ashwin Ramesh) wrote…

I had a look but that library was too specific to Badger, so I used this library for a quick check as the API was simple.

Also, Badger's skl doesn't have a Delete() API.



@ashwin95r
Contributor Author

Review status: 0 of 3 files reviewed at latest revision, 7 unresolved discussions, some commit checks broke.


posting/list.go, line 74 at r1 (raw file):

Previously, manishrjain (Manish R Jain) wrote…

len is a predeclared identifier for a built-in function.

Done.



@ashwin95r
Contributor Author

Since we're going to shard the PLs and flush them once the mutation layer reaches a size threshold, skiplists won't be of much benefit (they only outperform slices for insertions on larger lists). So closing this PR.
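
For context on why a slice can remain competitive while the mutation layer is small, here is a minimal sketch of sorted-slice insertion. It assumes the layer is keyed by a `uint64` UID; `insertSorted` is a hypothetical helper written for illustration, not a function from this PR. The binary search is logarithmic, but the shift that follows is linear in the slice length, so the cost only becomes significant once the list grows.

```go
package main

import (
	"fmt"
	"sort"
)

// insertSorted inserts v into the sorted slice s, keeping it sorted and
// free of duplicates. Finding the position costs O(log n); shifting the
// tail to make room costs O(n).
func insertSorted(s []uint64, v uint64) []uint64 {
	i := sort.Search(len(s), func(j int) bool { return s[j] >= v })
	if i < len(s) && s[i] == v {
		return s // already present
	}
	s = append(s, 0)     // grow by one element
	copy(s[i+1:], s[i:]) // shift the tail right by one
	s[i] = v
	return s
}

func main() {
	var s []uint64
	for _, v := range []uint64{5, 1, 3, 3, 4} {
		s = insertSorted(s, v)
	}
	fmt.Println(s) // prints [1 3 4 5]
}
```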

@ashwin95r ashwin95r closed this Jul 24, 2017
@pawanrawal pawanrawal deleted the mlayer/skiplist branch December 19, 2017 08:41
jarifibrahim pushed a commit that referenced this pull request Mar 16, 2020
Important changes
```
 - Changes to overlap check in compaction.
 - Remove 'this entry should've been caught' log.
 - Changes to write stalling on levels 0 and 1.
 - Compression is disabled by default in Badger.
 - Bloom filter caching in a separate ristretto cache.
 - Compression/Encryption in background.
 - Disable cache by default in badger.
```

The following new changes are being added from badger
`git log ab4352b00a17...91c31ebe8c22`

```
91c31eb Disable cache by default (#1257)
eaf64c0 Add separate cache for bloom filters (#1260)
1bcbefc Add BypassDirLock option (#1243)
c6c1e5e Add support for watching nil prefix in subscribe API (#1246)
b13b927 Compress/Encrypt Blocks in the background (#1227)
bdb2b13 fix changelog for v2.0.2 (#1244)
8dbc982 Add Dkron to README (#1241)
3d95b94 Remove coveralls from Travis Build(#1219)
5b4c0a6 Fix ValueThreshold for in-memory mode (#1235)
617ed7c Initialize vlog before starting compactions in db.Open (#1226)
e908818 Update CHANGELOG for Badger 2.0.2 release. (#1230)
bce069c Fix int overflow for 32bit (#1216)
e029e93 Remove ExampleDB_Subscribe Test (#1214)
8734e3a Add missing package to README for badger.NewEntry (#1223)
78d405a Replace t.Fatal with require.NoError in tests (#1213)
c51748e Fix flaky TestPageBufferReader2 test (#1210)
eee1602 Change else-if statements to idiomatic switch statements. (#1207)
3e25d77 Rework concurrency semantics of valueLog.maxFid (#1184) (#1187)
4676ca9 Add support for caching bloomfilters (#1204)
c3333a5 Disable compression and set ZSTD Compression Level to 1 (#1191)
0acb3f6 Fix L0/L1 stall test (#1201)
7e5a956 Support disabling the cache completely. (#1183) (#1185)
82381ac Update ristretto to version  8f368f2 (#1195)
3747be5 Improve write stalling on level 0 and 1
5870b7b Run all tests on CI (#1189)
01a00cb Add Jaegar to list of projects (#1192)
9d6512b Use fastRand instead of locked-rand in skiplist (#1173)
2698bfc Avoid sync in inmemory mode (#1190)
2a90c66 Remove the 'this entry should've caught' log from value.go (#1170)
0a06173 Fix checkOverlap in compaction (#1166)
0f2e629 Fix windows build (#1177)
03af216 Fix commit sha for WithInMemory in CHANGELOG. (#1172)
23a73cd Update CHANGELOG for v2.0.1 release. (#1181)
465f28a Cast sz to uint32 to fix compilation on 32 bit (#1175)
ea01d38 Rename option builder from WithInmemory to WithInMemory. (#1169)
df99253 Remove ErrGCInMemoryMode in CHANGELOG. (#1171)
8dfdd6d Adding changes for 2.0.1 so far (#1168)
```