Add random read benchmarks for sst tables (#862)
* Add random read benchmarks for sst tables
* Change table size in table benchmarks to 5 * 1e6
* Change table loading mode to `LoadToRAM` in table benchmarks
* Update table/README to have latest benchmark results.
ashish-goswami authored Jun 18, 2019
1 parent fa0679c commit 88799d3
Showing 2 changed files with 105 additions and 67 deletions.
table/README.md (52 additions & 34 deletions)

@@ -1,51 +1,69 @@
Size of table is 127,618,890 bytes for all benchmarks.

# BenchmarkRead
```
$ go test -bench ^BenchmarkRead$ -run ^$ -count 3
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/badger/table
BenchmarkRead-16 10 153281932 ns/op
BenchmarkRead-16 10 153454443 ns/op
BenchmarkRead-16 10 155349696 ns/op
PASS
ok github.com/dgraph-io/badger/table 23.549s
```

Size of table is 127,618,890 bytes, which is ~122MB.

The rate is ~783MB/s using LoadToRAM (when the table is in RAM).

To read a 64MB table, this would take ~0.0817s, which is negligible.
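
As a quick sanity check, both numbers follow from the benchmark output above (a minimal sketch; it assumes MB here means 2^20 bytes and uses the slowest of the three runs):

```go
package main

import "fmt"

func main() {
	const tableBytes = 127618890.0 // table size read per iteration
	const nsPerOp = 155349696.0    // slowest BenchmarkRead run above

	rate := tableBytes / (nsPerOp / 1e9) / (1 << 20) // bytes/s -> MB/s
	fmt.Printf("rate: ~%.0f MB/s\n", rate)           // ~783 MB/s
	fmt.Printf("64MB read: ~%.4fs\n", 64/rate)       // ~0.0817s
}
```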

# BenchmarkReadAndBuild
```
$ go test -bench BenchmarkReadAndBuild -run ^$ -count 3
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/badger/table
BenchmarkReadAndBuild-16 2 945041628 ns/op
BenchmarkReadAndBuild-16 2 947120893 ns/op
BenchmarkReadAndBuild-16 2 954909506 ns/op
PASS
ok github.com/dgraph-io/badger/table 26.856s
```

The rate is ~127MB/s. To build a 64MB table, this would take ~0.5s. Note that this
does NOT include the flushing of the table to disk. All we are doing above is
reading one table (which is in RAM) and writing one table in memory.

The table building takes 0.5 - 0.0817 ~ 0.42s.
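
The same arithmetic recovers the build cost (a sketch under the same 2^20-bytes-per-MB assumption):

```go
package main

import "fmt"

func main() {
	const tableBytes = 127618890.0
	const nsPerOp = 954909506.0 // slowest BenchmarkReadAndBuild run above

	rate := tableBytes / (nsPerOp / 1e9) / (1 << 20) // ~127 MB/s
	readAndBuild := 64 / rate                        // ~0.50s for a 64MB table
	fmt.Printf("rate: ~%.0f MB/s\n", rate)
	fmt.Printf("build only: ~%.2fs\n", readAndBuild-0.0817) // minus the pure read time
}
```
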
# BenchmarkReadMerged
Below, we merge 5 tables. The total size remains unchanged at ~122MB.

```
$ go test -bench ReadMerged -run ^$ -count 3
BenchmarkReadMerged-16 2 954475788 ns/op
BenchmarkReadMerged-16 2 955252462 ns/op
BenchmarkReadMerged-16 2 956857353 ns/op
PASS
ok github.com/dgraph-io/badger/table 33.327s
```

The rate is ~127MB/s. To read a 64MB table using the merge iterator, this would take ~0.5s.

# BenchmarkRandomRead

```
$ go test -bench BenchmarkRandomRead$ -run ^$ -count 3
goos: linux
goarch: amd64
pkg: github.com/dgraph-io/badger/table
BenchmarkRandomRead-16 300000 3596 ns/op
BenchmarkRandomRead-16 300000 3621 ns/op
BenchmarkRandomRead-16 300000 3596 ns/op
PASS
ok github.com/dgraph-io/badger/table 44.727s
```

For random read benchmarking, we randomly read a key and verify its value.
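
For scale (a sketch; it uses the ~3,596 ns/op figure above, and note that each op also pays for iterator creation and key formatting, so the pure seek cost is somewhat lower):

```go
package main

import "fmt"

func main() {
	const nsPerRead = 3596.0 // ns/op from BenchmarkRandomRead above
	fmt.Printf("~%.0fk random reads/sec\n", 1e9/nsPerRead/1000) // ~278k reads/sec
}
```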

table/table_test.go (53 additions & 33 deletions)

@@ -17,11 +17,13 @@
package table

import (
	"bytes"
	"fmt"
	"math/rand"
	"os"
	"sort"
	"testing"
	"time"

	"github.com/dgraph-io/badger/options"
	"github.com/dgraph-io/badger/y"
@@ -619,28 +621,14 @@ func TestMergingIteratorTakeTwo(t *testing.T) {
require.EqualValues(t, "a2", string(vs.Value))
require.EqualValues(t, 'A', vs.Meta)
it.Next()

require.False(t, it.Valid())
}

func BenchmarkRead(b *testing.B) {
	n := int(5 * 1e6)
	tbl := getTableForBenchmarks(b, n)
	defer tbl.DecrRef()

	// y.Printf("Size of table: %d\n", tbl.Size())
	b.ResetTimer()
	// Iterate b.N times over the entire table.
	for i := 0; i < b.N; i++ {
@@ -654,23 +642,10 @@ func BenchmarkRead(b *testing.B) {
}

func BenchmarkReadAndBuild(b *testing.B) {
	n := int(5 * 1e6)
	tbl := getTableForBenchmarks(b, n)
	defer tbl.DecrRef()

	// y.Printf("Size of table: %d\n", tbl.Size())
	b.ResetTimer()
	// Iterate b.N times over the entire table.
	for i := 0; i < b.N; i++ {
@@ -688,7 +663,7 @@ func BenchmarkReadAndBuild(b *testing.B) {
}

func BenchmarkReadMerged(b *testing.B) {
	n := int(5 * 1e6)
	m := 5 // Number of tables.
	y.AssertTrue((n % m) == 0)
	tableSize := n / m
@@ -706,7 +681,7 @@ func BenchmarkReadMerged(b *testing.B) {
			y.Check(builder.Add([]byte(k), y.ValueStruct{Value: []byte(v), Meta: 123, UserMeta: 0}))
		}
		f.Write(builder.Finish())
		tbl, err := OpenTable(f, options.LoadToRAM, nil)
		y.Check(err)
		tables = append(tables, tbl)
		defer tbl.DecrRef()
@@ -727,3 +702,48 @@ func BenchmarkReadMerged(b *testing.B) {
		}()
	}
}

func BenchmarkRandomRead(b *testing.B) {
	n := int(5 * 1e6)
	tbl := getTableForBenchmarks(b, n)
	defer tbl.DecrRef()

	r := rand.New(rand.NewSource(time.Now().Unix()))

	b.ResetTimer()
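	// Each iteration seeks to a uniformly random key that is known to exist
	// and verifies that the value read back matches the one that was written.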
	for i := 0; i < b.N; i++ {
		itr := tbl.NewIterator(false)
		no := r.Intn(n)
		k := []byte(fmt.Sprintf("%016x", no))
		v := []byte(fmt.Sprintf("%d", no))
		itr.Seek(k)
		if !itr.Valid() {
			b.Fatal("itr should be valid")
		}
		v1 := itr.Value().Value

		if !bytes.Equal(v, v1) {
			fmt.Println("value does not match")
			b.Fatal()
		}
		itr.Close()
	}
}

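// getTableForBenchmarks builds an SST with `count` sequentially keyed entries
// and opens it with options.LoadToRAM, so the benchmarks above measure pure
// in-memory table performance rather than disk or mmap behaviour.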
func getTableForBenchmarks(b *testing.B, count int) *Table {
	rand.Seed(time.Now().Unix())
	builder := NewTableBuilder()
	filename := fmt.Sprintf("%s%s%d.sst", os.TempDir(), string(os.PathSeparator), rand.Int63())
	f, err := y.OpenSyncedFile(filename, true)
	require.NoError(b, err)
	for i := 0; i < count; i++ {
		k := fmt.Sprintf("%016x", i)
		v := fmt.Sprintf("%d", i)
		y.Check(builder.Add([]byte(k), y.ValueStruct{Value: []byte(v)}))
	}

	f.Write(builder.Finish())
	tbl, err := OpenTable(f, options.LoadToRAM, nil)
	require.NoError(b, err, "unable to open table")
	return tbl
}
