Roll up exceeded value size and alpha crash #4752
Today I tried skipping the bulk load and importing data only through the API, and it works fine. So, is this problem caused by the bulk-loaded data?
Thanks for the update. It looks like the list is too long and becomes too big to be rolled up. I am not sure where the error is coming from, but I am assuming it's when writing data. The fact that the issue goes away when you don't use the bulk loader supports this theory, because splits are not done when loading data with the bulk loader. Just for reference: splitting a posting list means writing a posting list as multiple key-value pairs in badger. It's meant to prevent the kind of issues that arise when a posting list becomes too big.
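To illustrate the split idea described above, here is a minimal sketch in Go: a value that would exceed a size limit is written as several smaller key-value pairs instead of one oversized pair. The function name, the derived-key scheme, and the size limit are all illustrative assumptions, not Dgraph's actual split implementation (which lives in its posting package and uses badger-specific keys).

```go
package main

import "fmt"

// splitPosting sketches a posting-list split: if value fits under maxPart,
// it is stored under a single key; otherwise it is broken into chunks,
// each stored under a derived key like "<key>/part/<n>".
// This is a simplified illustration, not Dgraph's real logic.
func splitPosting(key string, value []byte, maxPart int) map[string][]byte {
	parts := make(map[string][]byte)
	if len(value) <= maxPart {
		parts[key] = value // small enough: one key-value pair
		return parts
	}
	for i, off := 0, 0; off < len(value); i, off = i+1, off+maxPart {
		end := off + maxPart
		if end > len(value) {
			end = len(value)
		}
		parts[fmt.Sprintf("%s/part/%d", key, i)] = value[off:end]
	}
	return parts
}

func main() {
	// A 10-byte value with a 4-byte part limit splits into 3 parts.
	parts := splitPosting("friend", make([]byte, 10), 4)
	fmt.Println(len(parts)) // 3
}
```

The point is that each stored pair stays under the backing store's value-size limit, so a single huge list no longer produces an oversized write.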
@JimWen Can we get a copy of your dataset to reproduce this issue? Feel free to email me privately (see my GitHub profile email).
Thank you @martinmr.
1. When importing data through the API after bulk loading, the log is as follows:
I0205 17:05:38.048405 258429 log.go:34] Rolling up Time elapsed: 01m51s, bytes sent: 3.1 GB, speed: 28 MB/sec
2. When importing data only through the API, the log is as follows:
I0211 08:55:50.262319 26603 log.go:34] Rolling up Created batch of size: 4.3 MB in 267.724618ms
It seems that the rollup bytes sent in case 1 are much bigger than in case 2, so the value size is exceeded. Why does this happen?
Sorry, I'm afraid the data can't be provided; it is confidential company data.
I have switched to the live loader for the initial data load and everything has worked fine until now. Why not support splitting in the bulk loader? It seems the bulk loader causes many problems with huge datasets, while the only problem with the live loader is that it is too slow. @martinmr @danielmai
@JimWen Yes, the bulk loader should be able to handle big lists. I'll open a feature request. Regarding the rollup size difference between the bulk loader and the live loader, it makes sense: you start your cluster after running the bulk loader, so all your data is already loaded. With the live loader, data is still being loaded while it is rolled up, so the lists are smaller.
What version of Dgraph are you using?
Have you tried reproducing the issue with the latest release?
Yes.
What is the hardware spec (RAM, OS)?
128 GB RAM & 1.8 TB SSD
Linux version 3.10.0-1062.9.1.el7.x86_64 ([email protected]) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-39) (GCC) ) #1 SMP Fri Dec 6 15:49:49 UTC 2019
Steps to reproduce the issue (command/config used to run Dgraph).
Same as issue #4733 with Dgraph.
Expected behaviour and actual result.
The value-size exceeded error is still there. The cluster didn't hang any more, but one Alpha crashed after about 30 minutes. The log is as follows:
Something important
I'm wondering what the value that exceeded the limit actually is, and what the split fix in the release does.
The edges here grow very quickly, about 1 billion every day. When I checked the source code, I found that rollups run at a fixed time interval of 5 minutes. Is that reasonable here?