Roll up exceeded value size and alpha crash #4752

Closed
JimWen opened this issue Feb 10, 2020 · 7 comments

Comments

@JimWen

JimWen commented Feb 10, 2020

What version of Dgraph are you using?

  • Dgraph version : v1.2.1
  • Dgraph SHA-256 : 3f18ff84570b2944f4d75f6f508d55d902715c7ca2310799cc2991064eb046f8
  • Commit SHA-1 : ddcda92
  • Commit timestamp : 2020-02-06 15:31:05 -0800
  • Branch : HEAD
  • Go version : go1.13.5

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

128 GB RAM & 1.8 TB SSD

Linux version 3.10.0-1062.9.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019

Steps to reproduce the issue (command/config used to run Dgraph).

Same as issue #4733 with Dgraph.

Expected behaviour and actual result.

The exceeded-value error is still there. The cluster no longer hangs, but one Alpha crashes after about 30 minutes. The log is as follows:

2020/02/10 16:46:34 Key not found
github.com/dgraph-io/badger/v2.init
        /tmp/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/errors.go:36
runtime.doInit
        /usr/local/go/src/runtime/proc.go:5222
runtime.doInit
        /usr/local/go/src/runtime/proc.go:5217
runtime.doInit
        /usr/local/go/src/runtime/proc.go:5217
runtime.doInit
        /usr/local/go/src/runtime/proc.go:5217
runtime.doInit
        /usr/local/go/src/runtime/proc.go:5217
runtime.main
        /usr/local/go/src/runtime/proc.go:190
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1357

github.com/dgraph-io/dgraph/x.Check
        /tmp/go/src/github.com/dgraph-io/dgraph/x/error.go:42
github.com/dgraph-io/dgraph/posting.(*List).rollup
        /tmp/go/src/github.com/dgraph-io/dgraph/posting/list.go:834
github.com/dgraph-io/dgraph/posting.(*List).Rollup
        /tmp/go/src/github.com/dgraph-io/dgraph/posting/list.go:710
github.com/dgraph-io/dgraph/worker.(*node).rollupLists.func3
        /tmp/go/src/github.com/dgraph-io/dgraph/worker/draft.go:1081
github.com/dgraph-io/badger/v2.(*Stream).produceKVs.func1
        /tmp/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/stream.go:191
github.com/dgraph-io/badger/v2.(*Stream).produceKVs
        /tmp/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/stream.go:234
github.com/dgraph-io/badger/v2.(*Stream).Orchestrate.func1
        /tmp/go/pkg/mod/github.com/dgraph-io/badger/[email protected]/stream.go:337
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1357

Something important

I'm wondering what the value that exceeded the limit actually is, and what the released split fix actually does.

The edges here grow very quickly, about 1 billion per day. When I checked the source code, I found that rollups run at a fixed interval of 5 minutes. Is that reasonable here?
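(For context, a minimal sketch of what a fixed-interval rollup trigger looks like in general; the 5-minute interval and the doRollup callback are assumptions for illustration, not Dgraph's actual code.)

```go
package main

import (
	"log"
	"time"
)

// rollupLoop is a hypothetical sketch of a rollup driven by a fixed ticker.
// The 5-minute interval and the doRollup callback are illustrative only.
func rollupLoop(stop <-chan struct{}, doRollup func() error) {
	ticker := time.NewTicker(5 * time.Minute)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			// A fixed interval means the amount of data rolled up per tick
			// grows with the write rate; nothing here adapts to list size.
			if err := doRollup(); err != nil {
				log.Printf("rollup failed: %v", err)
			}
		case <-stop:
			return
		}
	}
}

func main() {
	stop := make(chan struct{})
	go rollupLoop(stop, func() error {
		log.Println("rolling up posting lists")
		return nil
	})
	time.Sleep(2 * time.Second) // keep the example alive briefly
	close(stop)
}
```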

@JimWen
Author

JimWen commented Feb 11, 2020

Today I tried skipping the bulk load and just importing data through the API, and it works fine. So, is this problem caused by the bulk-loaded data?

@martinmr
Contributor

Thanks for the update. It looks like the list is too long and becomes too big to be rolled up. I am not sure where the error is coming from, but I assume it happens when writing data. The fact that the issue goes away when you don't use the bulk loader supports this theory, because splits are not done when loading data with the bulk loader.

Just for reference: splitting a posting list means writing it as multiple key-value pairs in Badger. It's meant to prevent the kind of issues that arise when a posting list becomes too big.
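(To illustrate the idea only, here is a minimal sketch, not Dgraph's actual split code, of writing one logical value as several Badger key-value pairs, each part stored under the base key plus a part-number suffix. The key layout, part size, and function names are assumptions for this sketch.)

```go
package main

import (
	"encoding/binary"
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

// writeSplit stores one logical value as several parts, each under
// baseKey plus an 8-byte big-endian part number. Illustrative only.
func writeSplit(db *badger.DB, baseKey, value []byte, partSize int) error {
	return db.Update(func(txn *badger.Txn) error {
		for i, part := 0, 0; i < len(value); part++ {
			end := i + partSize
			if end > len(value) {
				end = len(value)
			}
			key := make([]byte, len(baseKey)+8)
			copy(key, baseKey)
			binary.BigEndian.PutUint64(key[len(baseKey):], uint64(part))
			if err := txn.Set(key, value[i:end]); err != nil {
				return err
			}
			i = end
		}
		return nil
	})
}

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/badger-split-demo"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Write a 3 MB value as 1 MB parts instead of one oversized entry.
	big := make([]byte, 3<<20)
	if err := writeSplit(db, []byte("posting/0x01"), big, 1<<20); err != nil {
		log.Fatal(err)
	}
}
```

Reading the list back would then mean iterating the parts in key order and concatenating them.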

@danielmai
Contributor

@JimWen Can we get a copy of your dataset to reproduce this issue? Feel free to email me privately (see my GitHub profile email).

@JimWen
Author

JimWen commented Feb 11, 2020

Thank you @martinmr.
Yes, that's true. After bulk loading there are about 20 billion edges, and no exception happened before I started importing data through the API. I also noticed something strange:

1. When importing data through the API after bulk loading, the log is as follows:

I0205 17:05:38.048405 258429 log.go:34] Rolling up Time elapsed: 01m51s, bytes sent: 3.1 GB, speed: 28 MB/sec
I0205 17:05:39.052383 258429 log.go:34] Rolling up Time elapsed: 01m52s, bytes sent: 3.1 GB, speed: 27 MB/sec
I0205 17:05:40.042005 258429 log.go:34] Rolling up Time elapsed: 01m53s, bytes sent: 3.1 GB, speed: 27 MB/sec
E0205 17:06:55.871480 258429 draft.go:442] Error while rolling up lists at 18061: Value with size 1439997117 exceeded 1073741823 limit. Value:
00000000 0a b7 b9 d2 ae 05 12 13 08 83 80 80 80 80 80 80 |................|
00000010 80 10 12 05 00 00 00 00 00 18 01 12 13 08 84 80 |................|
00000020 80 80 80 80 80 80 10 12 05 00 00 00 00 00 18 01 |................|
00000030 12 13 08 87 80 80 80 80 80 80 80 10 12 05 00 00 |................|
00000040 00 00 00 18 01 12 13 08 88 80 80 80 80 80 80 80 |................|
00000050 10 12 05 00 00 00 00 00 18 01 12 13 08 8b 80 80 |................|
00000060 80 80 80 80 80 10 12 05 00 00 00 00 00 18 01 12 |................|
00000070 13 08 92 80 80 80 80 80 80 80 10 12 05 00 00 00 |................|
00000080 00 00 18 01 12 13 08 95 80 80 80 80 80 80 80 10 |................|

2. When importing data through the API only, the log is as follows:

I0211 08:55:50.262319 26603 log.go:34] Rolling up Created batch of size: 4.3 MB in 267.724618ms.
I0211 08:55:50.542791 26603 log.go:34] Rolling up Created batch of size: 4.5 MB in 273.896934ms.
I0211 08:55:50.786033 26603 log.go:34] Rolling up Created batch of size: 4.3 MB in 239.172574ms.
I0211 08:55:50.805872 26603 log.go:34] Rolling up Time elapsed: 08s, bytes sent: 104 MB, speed: 13 MB/sec
I0211 08:55:51.208664 26603 log.go:34] Rolling up Created batch of size: 12 MB in 66.96367ms.
I0211 08:55:51.208682 26603 log.go:34] Rolling up Sent 1125937 keys
I0211 08:55:51.226869 26603 draft.go:1102] Rolled up 1125895 keys. Done
I0211 08:55:51.226887 26603 draft.go:445] List rollup at Ts 852355: OK.

It seems that the rollup in case 1 sends far more bytes than in case 2, so the limit is exceeded. Why does this happen?
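(Putting the two numbers from the error above side by side as a quick back-of-the-envelope check; the only inputs are the sizes printed in the log, and reading the limit as 2^30 − 1 bytes is my own interpretation.)

```go
package main

import "fmt"

func main() {
	const limit = 1<<30 - 1   // 1073741823, the limit printed in the error log
	const failed = 1439997117 // size of the rolled-up value that failed

	fmt.Printf("limit: %d bytes (~%.2f GiB)\n", limit, float64(limit)/(1<<30))
	fmt.Printf("value: %d bytes (~%.2f GiB)\n", failed, float64(failed)/(1<<30))
	fmt.Printf("over the limit by %.1f%%\n", 100*float64(failed-limit)/float64(limit))
}
```

So the rolled-up value after the bulk load is roughly a third bigger than the largest value a single rollup write is allowed to produce.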

@JimWen
Author

JimWen commented Feb 11, 2020

@JimWen Can we get a copy of your dataset to reproduce this issue? Feel free to email me privately (see my GitHub profile email).

Sorry, I'm afraid the data can't be provided; it is confidential company data.

@JimWen
Author

JimWen commented Feb 17, 2020

I have switched to using the live loader to load the initial data, and everything has worked fine until now. Why not support splitting in the bulk loader? It seems that the bulk loader causes a lot of problems with huge datasets, while the only problem with the live loader is that it is too slow. @martinmr @danielmai

@martinmr
Contributor

@JimWen Yes, the bulk loader should be able to handle big lists. I'll open a feature request.

Regarding the rollup size difference between the bulk loader and the live loader, it makes sense. You start your cluster after running the bulk loader, so all of your data is already loaded. With the live loader, the data is still being loaded as it's rolled up, so the lists are smaller.
