Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect frame allocation for deeply nested builder operations (was Assert hit with clang 3.4.1 on 32bits plateforms) #41

Closed
epouponstormshield opened this issue May 12, 2017 · 26 comments
Labels

Comments

@epouponstormshield
Copy link

Hello,

We hit an annoying issue using clang 3.4.1 on FreeBSD 10.3, but only on 32bits systems (it works fine with the same version of clang on 64bits systems)

Please find attached a schema and a C code sample that triggers the problem.

Steps to reproduce:

$ flatcc my_ns.fbs -cwv --outfile=./my_ns.fbs.h

$ cc -Wall -std=c99 test.c -o test -Iflatcc-0.4.1//include/ -DFLATCC_REFLECTION=0 -DFLATCC_PORTABLE flatcc-0.4.1/lib/libflatccrt.a

$ ./test
Assertion failed: ((B->frame[0].type) == flatcc_builder_table), function flatcc_builder_check_required, file <...>/flatcc-0.4.1/src/runtime/builder.c, line 1242.
Abort trap (core dumped)

If I add only "TEST NAME" or "TEST VALUE", it works as expected.
Maybe I do something wrong, but it is strange since it works fine on 64bit arch?

Regards

example.zip

@mikkelfj
Copy link
Contributor

mikkelfj commented May 12, 2017

I will have a look - I need a VM to set up 32-bit - meanwhile it might be as simple as incorrect use of API, but if so, the same error should appear on 64-bits.

The following functions can be used during buffer construction to print a trace of how the frame is being built. It tracks the nesting level and the frame type. lldb can also be used.

https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/flatcc_builder.h#L537
https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/flatcc_builder.h#L801
https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/flatcc_builder.h#L782

@mikkelfj
Copy link
Contributor

mikkelfj commented May 12, 2017

As a preliminary explanation - but we still need to figure out the difference for 32-bits:

info_push_start means that you start the member table of the vector. Since info is a vector of status tables, the status table has already been opened, so calling status_start is not needed, nor correct.

	my_ns_infovec_info_push_start(&builder);

	my_ns_info_status_start(&builder);

This is later followed by this code which reverts the order of push_end and status end:

		my_ns_info_status_push_end(&builder);
	my_ns_info_status_end(&builder);

If you switched the above statements, you would end the inner status and it would just be garbage, and the push_end would add an empty status object.

NOTE: this is from a quick read of the source, and I do not recall all details of the API, so I might be mistaken - but this is what pops out right now. If you can help identify why 32-bit behaves differently it would be a great help - certainly something I need to figure out.

Also, the project could need better documentation - mkdocs with several examples - if anyone feels like taking this up, it might help prevent such issues in the future.

@mikkelfj
Copy link
Contributor

mikkelfj commented May 12, 2017

The above is not entirely correct, I missed a level - so the push_end order is not incorrect, but it misses the end to match my_ns_info_status_start which should probably just be removed.

@mikkelfj
Copy link
Contributor

The API is flexible but this causes som confusion:

vector_start, table_start, table_add,, ..., ref = table_end, vector_push(ref), vector_end

or
vector_start, vector_push_start, table_add,..., vector_push_end, ..., vector_end

@epouponstormshield
Copy link
Author

epouponstormshield commented May 12, 2017

Thanks for your support!

it seems we correctly use the second mentioned version?

I added some verbose using the functions you suggested.
Here is what I get (the values are printed just after each call)

flatcc_builder_init(&builder) - Lvl = 0 - Type = 0
my_ns_infovec_start_as_root(&builder) - Lvl = 2 - Type = 3
my_ns_infovec_info_start(&builder) - Lvl = 3 - Type = 5
my_ns_infovec_info_push_start(&builder) - Lvl = 4 - Type = 3
my_ns_info_status_start(&builder) - Lvl = 5 - Type = 5
my_ns_info_status_push_start(&builder) - Lvl = 6 - Type = 3
my_ns_status_value_add(&builder, 50) - Lvl = 6 - Type = 3
my_ns_status_attr_infos_start(&builder) - Lvl = 7 - Type = 5
my_ns_status_attr_infos_push_start(&builder) - Lvl = 8 - Type = 3
my_ns_attr_infos_name_add(&builder, flatbuffers_string_create_str(&builder, "TEST NAME")) - Lvl = 8 - Type = 3
my_ns_attr_infos_value_add(&builder, flatbuffers_string_create_str(&builder, "TEST VALUE")) - Lvl = 8 - Type = 65535
Assertion failed: ((B->frame[0].type) == flatcc_builder_table), function flatcc_builder_check_required, file <...>/flatcc-0.4.1/src/runtime/builder.c, line 1242.
Abort trap (core dumped)

my_ns_status_attr_infos_push_end(&builder) makes the assertion

Removing "TEST NAME" gives this result:

flatcc_builder_init(&builder) - Lvl = 0 - Type = 0
my_ns_infovec_start_as_root(&builder) - Lvl = 2 - Type = 3
my_ns_infovec_info_start(&builder) - Lvl = 3 - Type = 5
my_ns_infovec_info_push_start(&builder) - Lvl = 4 - Type = 3
my_ns_info_status_start(&builder) - Lvl = 5 - Type = 5
my_ns_info_status_push_start(&builder) - Lvl = 6 - Type = 3
my_ns_status_value_add(&builder, 50) - Lvl = 6 - Type = 3
my_ns_status_attr_infos_start(&builder) - Lvl = 7 - Type = 5
my_ns_status_attr_infos_push_start(&builder) - Lvl = 8 - Type = 3
my_ns_attr_infos_value_add(&builder, flatbuffers_string_create_str(&builder, "TEST VALUE")) - Lvl = 8 - Type = 3
my_ns_status_attr_infos_push_end(&builder) - Lvl = 7 - Type = 5
my_ns_status_attr_infos_end(&builder) - Lvl = 6 - Type = 3
my_ns_info_status_push_end(&builder) - Lvl = 5 - Type = 5
my_ns_info_status_end(&builder) - Lvl = 4 - Type = 3
my_ns_infovec_info_push_end(&builder) - Lvl = 3 - Type = 5
my_ns_infovec_info_end(&builder) - Lvl = 2 - Type = 3
my_ns_info_end_as_root(&builder) - Lvl = 0 - Type = 0

Does that help? Maybe there is something wrong with the use of flatbuffers_string_create_str?

@mikkelfj
Copy link
Contributor

mikkelfj commented May 12, 2017

Added numbers to frame types to help make sense

enum flatcc_builder_type {
    flatcc_builder_empty = 0,
    flatcc_builder_buffer = 1,
    flatcc_builder_struct = 2,
    flatcc_builder_table = 3,
    flatcc_builder_vector = 4,
    flatcc_builder_offset_vector = 5,
    flatcc_builder_string = 6
};

@mikkelfj
Copy link
Contributor

mikkelfj commented May 12, 2017

Edit - removed this - incorrect analysis. The last trace looks ok.

my_ns_attr_infos_value_add(&builder, flatbuffers_string_create_str(&builder, "TEST VALUE")) - Lvl = 8 - Type = 3

I think this could be simplified with

my_ns_attr_infos_value_add_str(&builder, "TEST VALUE")) - Lvl = 8 - Type = 3

@epouponstormshield
Copy link
Author

Thanks for your explanation!
To sum up the problem: is it related to flatcc or is it related to an incorrect use of the API?

@mikkelfj
Copy link
Contributor

It's too early - I was confusing my self - not sure if you saw that I just deleted a large dissection. The create_str looks weird, but haven't looked.

@mikkelfj
Copy link
Contributor

This looks weird and could be a bug - which could explain different behavior between 32-bit and 64-bit. The type should not be -1 - could be an error code of sorts.

my_ns_attr_infos_value_add(&builder, flatbuffers_string_create_str(&builder, "TEST VALUE")) - Lvl = 8 - Type = 65535

@mikkelfj
Copy link
Contributor

I tried to build on MacOS 64-bit system with clang -m32 - and I did not get any assertions. I suspect it might have to do with reallocation of the stack triggering some bug.

@mikkelfj
Copy link
Contributor

I'll get back - need to debug this carefully.

@mikkelfj
Copy link
Contributor

OK, I have yet not found the root cause, but I think I know why 32-bit is different:

$ cc -Wall -std=c99 test.c -o test -Iflatcc-0.4.1//include/ -DFLATCC_REFLECTION=0 -DFLATCC_PORTABLE flatcc-0.4.1/lib/libflatccrt.a

We have already seen the type=65535 and this should trigger an assertion. So the assumption is that the 32-bit build is using the debug library libflatccrt_d.a and the 64-bit build is using the non-debug library libflatccrt.a. If you do not see type=65535 in 64-bit build, please let me know.

@mikkelfj
Copy link
Contributor

Still no final solution, but I strongly suspect there is a problem in detecting when and how much memory to allocate for additional frames. The default allocation makes space for 8 frames which is exactly where we are, and depending on how you debug, clear_buffer can crash when it attempts to deallocate the frame allocator (buffer index 4) suggestion earlier corruption in the frame. This would also explain why the frame type suddenly goes bad. The string is likely just a random trigger for a corrupt memory condition.

https://github.com/dvidelabs/flatcc/blob/master/src/runtime/builder.c#L530-L545

static int enter_frame(flatcc_builder_t *B, uint16_t align)
{
    if (++B->level > B->limit_level) {
        if (B->max_level > 0 && B->level > B->max_level) {
            return -1;
        }
        if (!(B->frame = reserve_buffer(B, flatcc_builder_alloc_fs, B->level * frame_size, frame_size, 0))) {
            return -1;
        }
        B->limit_level = (int)(B->buffers[flatcc_builder_alloc_fs].iov_len / frame_size);
        if (B->max_level > 0 && B->max_level < B->limit_level) {
            B->limit_level = B->max_level;
        }
    } else {
        ++B->frame;
    }
    frame(ds_offset) = B->ds_offset;
    frame(align) = B->align;
    B->align = align;
    /* Note: do not assume padding before first has been allocated! */
    frame(ds_first) = B->ds_first;
    frame(type_limit) = data_limit;
    B->ds_first = alignup(B->ds_first + B->ds_offset, 8);
    B->ds_offset = 0;
    return 0;
}

@mikkelfj
Copy link
Contributor

mikkelfj commented May 12, 2017

@epouponstormshield fixed, I hope, please confirm! - change has been committed to master.

if (!(B->frame = reserve_buffer(B, flatcc_builder_alloc_fs, B->level * frame_size, frame_size, 0)))

The argument B->level * frame_size to reserve_buffer is the amount of memory in use followed by the frame_size argument that is extra memory needed. The reported used memory is one frame more than the actual memory use. Because reserve_buffer returns a pointer to the next free entry, the first frame returned during initialization is frame at index 1 instead of frame at index 0. The amount of memory is correct, but one is wasted. The reallocation logic assumes there is space, but we are one frame short due to the wasted frame. This results in memory corruption.

The following should fix the issue:

if (!(B->frame = reserve_buffer(B, flatcc_builder_alloc_fs, (B->level - 1) * frame_size, frame_size, 0)))

and you can also patch 0.4.1 instead if you prefer not to run on master, or more hacky, you could bump the default allocator to a larger value than 8 times frame_size to postpone the problem.

@mikkelfj mikkelfj changed the title Assert hit with clang 3.4.1 on 32bits plateforms Incorrect frame allocation for deeply nested builder operations (was Assert hit with clang 3.4.1 on 32bits plateforms) May 12, 2017
@mikkelfj mikkelfj added the bug label May 12, 2017
@epouponstormshield
Copy link
Author

epouponstormshield commented May 12, 2017

Are you sure it is not something like (B->level - 1) * frame_size?

KO:
if (!(B->frame = reserve_buffer(B, flatcc_builder_alloc_fs, (B->level * frame_size - 1), frame_size, 0)))

OK:
if (!(B->frame = reserve_buffer(B, flatcc_builder_alloc_fs, ((B->level-1) * frame_size), frame_size, 0))) {

@mikkelfj
Copy link
Contributor

mikkelfj commented May 12, 2017

Yes, that was a typo in comment, I already edited it. Should be correct in source.

@mikkelfj
Copy link
Contributor

I will release 0.4.2 if and when the fix is confirmed to resolve the issue.

@epouponstormshield
Copy link
Author

Sure, I am trying your fix in a more complete environment in order to confirm.
I will tell you as soon as it is good.

@epouponstormshield
Copy link
Author

I confirm the problem is now fixed :)
Thanks for your support on this issue!

@mikkelfj
Copy link
Contributor

And thanks for a great report. This is a very important bug to get fixed.

@mikkelfj
Copy link
Contributor

0.4.2 has been released.

@epouponstormshield
Copy link
Author

I think it would make sense to extend the existing tests to cover such a use case.
Regards,

@mikkelfj
Copy link
Contributor

mikkelfj commented May 15, 2017

I was thinking about it - and yes it does make sense up to tiggering the first reallocation. If you feel like it, feel free to contribute. Issue #4 could also use such a test case - here the problem was not triggering allocation at all.

@mikkelfj
Copy link
Contributor

mikkelfj commented May 15, 2017

Because such a test would likely need a separate schema, the load_test project could be used as a template rather than extending the monster_test test cases.

https://github.com/dvidelabs/flatcc/tree/master/test/load_test

Note the custom build targets in https://github.com/dvidelabs/flatcc/blob/master/test/load_test/CMakeLists.txt
apperently only works when using Make and not Ninja as build target due to some CMake bug.

@mikkelfj
Copy link
Contributor

A test is tricky, though, because the default allocator uses an arbitrary limit of 8 frames that might easily be changed, voiding the test case, so at least a source comment is needed, or the use of a custom allocator with know limits.

int flatcc_builder_default_alloc(void *alloc_context, iovec_t *b, size_t request, int zero_fill, int hint)
https://github.com/dvidelabs/flatcc/blob/master/src/runtime/builder.c#L107

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants