splinter_bundle_build_filters: Assertion `!req->should_build[i]' failed #201

Closed
rosenhouse opened this issue Jan 12, 2022 · 3 comments · Fixed by #278
Labels: blocks-open-source (Needs to be fixed before we go open source), bug (Something isn't working), critical

Comments

@rosenhouse
Member

rosenhouse commented Jan 12, 2022

I can reliably reproduce this assertion without any async behavior:

./bin/driver_test splinter_test --seq-perf --max-async-inflight 0 --num-insert-threads 20 --num-lookup-threads 10 --db-capacity-gib 60 --stats
./bin/driver_test: splinterdb_build_version a8566beb
Dispatch test splinter_test
fingerprint_size: 27
Running splinter_test with 1 caches
splinter_test: splinter performance test started with 1                tables
inserting   3% complete for table 0Assertion failed at src/trunk.c:3665:trunk_bundle_build_filters(): "!req->should_build[i]".

Update (agurajada), 7 Mar 2022: Attempted to repro this with the latest version of /main, as of this commit:

fc49943 Alex Conway 4 hours ago Mon, 07-Mar-2022, 12:32:00 PM (Authored: Mon, 07-Mar-2022, 07:55:15 PM) Format Checks

In a release build, we get this assertion:

Fusion-LocalVM:[632] $ ./bin/driver_test splinter_test --seq-perf --max-async-inflight 0 --num-insert-threads 20 --num-lookup-threads 10 --db-capacity-gib 60 --stats
./bin/driver_test: splinterdb_build_version fc49943b
Dispatch test splinter_test
fingerprint_size: 27
Running splinter_test with 1 caches
splinter_test: SplinterDB performance test started with 1 tables
inserting   9% complete for table 0 ... Assertion failed at src/trunk.c:2631:trunk_replace_bundle_branches(): 
"(pos != TRUNK_MAX_PIVOTS)". Pivot live for bundle not found in req, pos=20 != TRUNK_MAX_PIVOTS=20
Aborted (core dumped)
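
The message reports pos=20 with TRUNK_MAX_PIVOTS=20, which suggests that a search for the pivot ran off the end and returned the "not found" sentinel. A minimal sketch of that pattern follows. It is an assumption about the shape of the lookup, not code from trunk.c: `SKETCH_MAX_PIVOTS`, `sketch_find_pivot`, and the `pivot_addrs` array are all hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in. The log only tells us TRUNK_MAX_PIVOTS is 20. */
#define SKETCH_MAX_PIVOTS 20

/*
 * Return the index of `addr` in `pivot_addrs`, or SKETCH_MAX_PIVOTS if it
 * is absent. The "not found" sentinel is the array capacity itself, so a
 * caller that asserts (pos != SKETCH_MAX_PIVOTS) aborts exactly when the
 * pivot it expected is missing.
 */
static uint16_t
sketch_find_pivot(const uint64_t *pivot_addrs,
                  uint16_t        num_pivots,
                  uint64_t        addr)
{
   assert(num_pivots <= SKETCH_MAX_PIVOTS);
   for (uint16_t pos = 0; pos < num_pivots; pos++) {
      if (pivot_addrs[pos] == addr) {
         return pos;
      }
   }
   return SKETCH_MAX_PIVOTS;
}
```

Under this reading, the failure means the request no longer contains a pivot that the bundle still considers live. The sketch does not say why the two disagree.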

In a debug build, at the same commit SHA, we get this assertion:

(gdb) run splinter_test --seq-perf --max-async-inflight 0 --num-insert-threads 20 --num-lookup-threads 10 --db-capacity-gib 60
Starting program: /home/agurajada/Code/splinterdb/bin/driver_test splinter_test --seq-perf --max-async-inflight 0 --num-insert-threads 20 --num-lookup-threads 10 --db-capacity-gib 60
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
/home/agurajada/Code/splinterdb/bin/driver_test: splinterdb_build_version fc49943b
Dispatch test splinter_test
fingerprint_size: 27
Running splinter_test with 1 caches
splinter_test: SplinterDB performance test started with 1 tables
[...]
inserting   0% complete for table 0 ... [New Thread 0x7fff78ff9700 (LWP 174832)]
inserting   8% complete for table 0 ... Assertion failed at src/trunk.c:1832:trunk_get_new_bundle(): 
"(hdr->end_bundle != hdr->start_bundle)". No available bundles in trunk node. 
page disk_addr=655360, end_bundle=10, start_bundle=10

Thread 17 "driver_test" received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff7affd700 (LWP 174828)]
Stack trace in debug build:

(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7d12859 in __GI_abort () at abort.c:79
#2 0x00007ffff7fa6828 in platform_assert_false (stream=0x7ffff7ed95c0 <_IO_2_1_stderr_>, filename=0x7ffff7fb0284 "src/trunk.c", linenumber=1832,
functionname=0x7ffff7fb3990 <__FUNCTION__.8910> "trunk_get_new_bundle", expr=0x7ffff7fb0618 "(hdr->end_bundle != hdr->start_bundle)",
message=0x7ffff7fb05c0 "No available bundles in trunk node. page disk_addr=%lu, end_bundle=%d, start_bundle=%d")
at src/platform_linux/platform.c:328
#3 0x00007ffff7f81094 in trunk_get_new_bundle (spl=0x7fffb365c040, node=0x7ffff37efc40) at src/trunk.c:1832
#4 0x00007ffff7f84c9d in trunk_memtable_incorporate (spl=0x7fffb365c040, generation=146, tid=16) at src/trunk.c:3138
#5 0x00007ffff7f8548e in trunk_memtable_flush_internal (spl=0x7fffb365c040, generation=146) at src/trunk.c:3255
#6 0x00007ffff7f854e6 in trunk_memtable_flush_internal_virtual (arg=0x7fffb36ad378, scratch=0x7fff6c000900) at src/trunk.c:3266
#7 0x00007ffff7f7aed1 in task_group_perform_one (group=0x7ffff7a69280) at src/task.c:551
#8 0x00007ffff7f7b049 in task_perform_one (ts=0x7ffff7a69040) at src/task.c:576
#9 0x00007ffff7f8df8e in trunk_insert (spl=0x7fffb365c040, key=0x7fff7affcdf0 "+y\242\020\360o", <incomplete sequence \330>, data=...)
at src/trunk.c:5809
#10 0x000055555555e207 in test_trunk_insert_thread (arg=0x5555555a96a8) at tests/functional/splinter_test.c:165
#11 0x00007ffff7f79dd5 in task_invoke_with_hooks (func_and_args=0x5555555ab400) at src/task.c:183
#12 0x00007ffff7ef2609 in start_thread (arg=) at pthread_create.c:477
#13 0x00007ffff7e0f293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)
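
The debug-build assertion fires with end_bundle and start_bundle both equal to 10, and the message says no bundles are available. One way to read that, offered as an assumption rather than a description of trunk.c, is that bundle slots form a fixed-size ring and that end_bundle catching up with start_bundle means the ring is exhausted. The sketch below illustrates only that ring discipline. `sketch_hdr`, `SKETCH_MAX_BUNDLES`, and the function names are hypothetical, and the capacity of 12 is made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity. The real limit is not stated in this issue. */
#define SKETCH_MAX_BUNDLES 12

typedef struct {
   uint16_t start_bundle; /* oldest live bundle slot */
   uint16_t end_bundle;   /* next free bundle slot */
} sketch_hdr;

/* Advance a slot number by one, wrapping around the ring. */
static uint16_t
sketch_next(uint16_t bundle_no)
{
   return (uint16_t)((bundle_no + 1) % SKETCH_MAX_BUNDLES);
}

/*
 * Claim the next free slot. Returns false, leaving the header untouched,
 * if claiming it would make end_bundle equal start_bundle. One slot is
 * kept unused so that "full" and "empty" stay distinguishable.
 */
static bool
sketch_get_new_bundle(sketch_hdr *hdr, uint16_t *bundle_no)
{
   uint16_t next_end = sketch_next(hdr->end_bundle);
   if (next_end == hdr->start_bundle) {
      return false;
   }
   *bundle_no      = hdr->end_bundle;
   hdr->end_bundle = next_end;
   return true;
}

/* Retire the oldest bundle. Returns false if the ring is already empty. */
static bool
sketch_release_oldest(sketch_hdr *hdr)
{
   if (hdr->start_bundle == hdr->end_bundle) {
      return false;
   }
   hdr->start_bundle = sketch_next(hdr->start_bundle);
   return true;
}
```

In this model, a producer that keeps claiming slots without anything calling `sketch_release_oldest` hits the full condition after SKETCH_MAX_BUNDLES - 1 claims. Whether something similar happens between memtable incorporation and bundle retirement in the real code is exactly what this issue leaves open.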

@gapisback
Collaborator

Hi @ajhconway, just following up on this critical issue. I can still repro this on /main, and have updated the issue description with the assertions seen now.

The hope is that PR #278 will fix this one. That PR seems to have passed review, but has not been merged yet.
Are there any updates or plans for when those fixes will be integrated, if they are ready to go?

I'd like to retry these repros with that fix rebased onto /main, to see if these tests can now pass. I can then integrate them into the nightly test runs. Thanks!

@rosenhouse
Member Author

rosenhouse commented Mar 10, 2022

Simple C reproducer for this problem, in case we can't get seq-perf to reliably pass: b203552

@gapisback
Collaborator

The stand-alone C unit test that is supposed to repro this problem has been attached, under this commit, to PR #278.

That PR is going through final rounds of code review and stabilization before it lands on /main.

After that, I will re-run this splinter_test --seq-perf test. Note, however, that at least one new issue, #360, has been opened against this test case, reporting some test-logic errors.

That issue will probably have to be resolved before we can check whether these problems still reproduce.
