Slow Queries After Doing A Bulk Load On Nightly Build #1168
Added a profile for when these long queries happen. This one actually times out through the UI. This is hours after my data load, at the time of filing this ticket.
It finally returned in 37 seconds. This is a millisecond response on 0.7.7, so something is definitely going on here. This is still after my bulk load, two hours later. I have also restarted the server once to try to fix the speed issues. I've attached my /debug/vars from the time this is happening.
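(For reference, a minimal sketch of capturing such a /debug/vars snapshot programmatically; the address is an assumption and should be replaced with the server's actual HTTP port.)

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Assumed address: the Dgraph server's HTTP port on the local machine.
	resp, err := http.Get("http://localhost:8080/debug/vars")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Save the expvar snapshot so it can be attached to the issue.
	out, err := os.Create("debug_vars.json")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
}
```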
We can make our GC smarter. Right now, it just starts testing whether a value log file needs to be rewritten every 10 minutes. We need to expose some vars from Badger to be able to really figure out what to optimize there. Maybe we can allow a user to turn it off entirely, or reduce how many iterations it does before deciding to rewrite the file.
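A minimal sketch of such a user-driven GC loop, assuming the `RunValueLogGC` API exposed by later Badger releases; the interval and discard ratio are illustrative knobs, and `db` is an open `*badger.DB`:

```go
package main

import (
	"time"

	badger "github.com/dgraph-io/badger"
)

// runValueLogGC drives value log GC on its own ticker, so the interval can
// be tuned (or the loop skipped entirely to turn GC off).
func runValueLogGC(db *badger.DB, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
	again:
		// Rewrite a value log file only when at least half of it is stale;
		// a higher ratio means less background work per pass.
		if err := db.RunValueLogGC(0.5); err == nil {
			goto again // a file was rewritten; look for another candidate
		}
		// Typically badger.ErrNoRewrite here: nothing worth rewriting.
	}
}
```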
So, @willcj33: this random slowness sounds to me like a GC problem, and the CPU profile points the same way. We could just reduce the amount of GC work going on in the background, and that should be sufficient here. Is this blocking for you? Do you see this often? Also, @janardhan1993: didn't you change the table loading mode to memory map? I think that would affect this heavily. We should set it back to load-to-RAM (and provide a flag that lets users change it to memory map or nothing, with a warning that things would be slow).
So, to summarize, I think the real change required here is to have the LSM tree in RAM. That would speed up both the background GC effort and the startup slowness mentioned in #1180. Having said that, we can also reduce the megabytes of data we comb through in the value log for GC (reduce from 100 to 50 MB, @pawanrawal).
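A minimal sketch of the proposed default plus escape-hatch flag, assuming Badger v1's `TableLoadingMode` option and the `options.LoadToRAM` / `options.MemoryMap` constants:

```go
package main

import (
	badger "github.com/dgraph-io/badger"
	"github.com/dgraph-io/badger/options"
)

// openDB keeps LSM tree tables in RAM by default, with an opt-out for users
// who prefer memory-mapping (with the caveat that reads can be slower).
func openDB(dir string, mmap bool) (*badger.DB, error) {
	opts := badger.DefaultOptions(dir)
	opts.TableLoadingMode = options.LoadToRAM
	if mmap {
		opts.TableLoadingMode = options.MemoryMap
	}
	return badger.Open(opts)
}
```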
@willcj33 confirmed that after switching to an SSD this is not an issue anymore. @manishrjain should we still reduce the window size?
No need. I think the main difference is load-to-RAM.
Important changes

```
- Changes to overlap check in compaction.
- Remove 'this entry should've been caught' log.
- Changes to write stalling on levels 0 and 1.
- Compression is disabled by default in Badger.
- Bloom filter caching in a separate ristretto cache.
- Compression/Encryption in background.
- Disable cache by default in badger.
```

The following new changes are being added from badger `git log ab4352b00a17...91c31ebe8c22`:

```
91c31eb Disable cache by default (#1257)
eaf64c0 Add separate cache for bloom filters (#1260)
1bcbefc Add BypassDirLock option (#1243)
c6c1e5e Add support for watching nil prefix in subscribe API (#1246)
b13b927 Compress/Encrypt Blocks in the background (#1227)
bdb2b13 fix changelog for v2.0.2 (#1244)
8dbc982 Add Dkron to README (#1241)
3d95b94 Remove coveralls from Travis Build (#1219)
5b4c0a6 Fix ValueThreshold for in-memory mode (#1235)
617ed7c Initialize vlog before starting compactions in db.Open (#1226)
e908818 Update CHANGELOG for Badger 2.0.2 release. (#1230)
bce069c Fix int overflow for 32bit (#1216)
e029e93 Remove ExampleDB_Subscribe Test (#1214)
8734e3a Add missing package to README for badger.NewEntry (#1223)
78d405a Replace t.Fatal with require.NoError in tests (#1213)
c51748e Fix flaky TestPageBufferReader2 test (#1210)
eee1602 Change else-if statements to idiomatic switch statements. (#1207)
3e25d77 Rework concurrency semantics of valueLog.maxFid (#1184) (#1187)
4676ca9 Add support for caching bloomfilters (#1204)
c3333a5 Disable compression and set ZSTD Compression Level to 1 (#1191)
0acb3f6 Fix L0/L1 stall test (#1201)
7e5a956 Support disabling the cache completely. (#1183) (#1185)
82381ac Update ristretto to version 8f368f2 (#1195)
3747be5 Improve write stalling on level 0 and 1
5870b7b Run all tests on CI (#1189)
01a00cb Add Jaegar to list of projects (#1192)
9d6512b Use fastRand instead of locked-rand in skiplist (#1173)
2698bfc Avoid sync in inmemory mode (#1190)
2a90c66 Remove the 'this entry should've caught' log from value.go (#1170)
0a06173 Fix checkOverlap in compaction (#1166)
0f2e629 Fix windows build (#1177)
03af216 Fix commit sha for WithInMemory in CHANGELOG. (#1172)
23a73cd Update CHANGELOG for v2.0.1 release. (#1181)
465f28a Cast sz to uint32 to fix compilation on 32 bit (#1175)
ea01d38 Rename option builder from WithInmemory to WithInMemory. (#1169)
df99253 Remove ErrGCInMemoryMode in CHANGELOG. (#1171)
8dfdd6d Adding changes for 2.0.1 so far (#1168)
```
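Since compression and the caches are now off by default, here is a minimal sketch of opting back into them, assuming the Badger v2 option builders named in the commits above (exact builder names and the cache sizes are assumptions, shown only to illustrate the knobs):

```go
package main

import (
	badger "github.com/dgraph-io/badger/v2"
	"github.com/dgraph-io/badger/v2/options"
)

// openTuned opts back into compression and the caches that this release
// turns off by default. Sizes are illustrative, not recommendations.
func openTuned(dir string) (*badger.DB, error) {
	opts := badger.DefaultOptions(dir).
		WithCompression(options.ZSTD).       // re-enable block compression
		WithZSTDCompressionLevel(1).         // level 1, matching the changelog
		WithBlockCacheSize(256 << 20).       // block cache, disabled by default
		WithIndexCacheSize(64 << 20)         // separate bloom-filter/index cache
	return badger.Open(opts)
}
```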
Version:
I am experiencing LONG delays (3.5-10s) on queries (usually heavily filtered ones that access a lot of related data) within hours of restoring data to the newest release of Dgraph. My CPU sits mostly idle (< 10%) pretty much at all times. Restarting the server does nothing to help. When I run the same query a second time, what took 5s takes 35ms (which is how fast it ran on 0.7.7). Seemingly, after a day or so of this instance running untouched, the speeds are faster than 0.7.7: after a restart (to ensure no caching or anything), what took 35ms on 0.7.7 takes around 8ms on master, but it seems I have to wait some random amount of time to get there. UID queries are always blazing fast. Tagging @janardhan1993 and @pawanrawal as they are most familiar with my issue.