Incorrect frame allocation for deeply nested builder operations (was Assert hit with clang 3.4.1 on 32-bit platforms) #41
I will have a look - I need a VM to set up 32-bit - meanwhile it might be as simple as incorrect use of API, but if so, the same error should appear on 64-bits. The following functions can be used during buffer construction to print a trace of how the frame is being built. It tracks the nesting level and the frame type. lldb can also be used. https://github.com/dvidelabs/flatcc/blob/master/include/flatcc/flatcc_builder.h#L537 |
As a preliminary explanation - but we still need to figure out the difference for 32-bits: info_push_start means that you start the member table of the vector. Since info is a vector of status tables, the status table has already been opened, so calling status_start is not needed, nor correct.
This is later followed by this code, which reverses the order of push_end and status end:
If you switched the above statements, you would end the inner status and it would just be garbage, and the push_end would add an empty status object. NOTE: this is from a quick read of the source, and I do not recall all details of the API, so I might be mistaken, but this is what stands out right now. If you can help identify why 32-bit behaves differently, it would be a great help; it is certainly something I need to figure out. Also, the project could use better documentation (mkdocs with several examples). If anyone feels like taking this up, it might help prevent such issues in the future. |
The above is not entirely correct, I missed a level - so the push_end order is not incorrect, but it misses the end to match my_ns_info_status_start which should probably just be removed. |
The API is flexible but this causes some confusion:
or |
Thanks for your support! It seems we are correctly using the second version mentioned? I added some verbose output using the functions you suggested.
my_ns_status_attr_infos_push_end(&builder) triggers the assertion. Removing "TEST NAME" gives this result:
Does that help? Maybe there is something wrong with the use of flatbuffers_string_create_str? |
Added numbers to frame types to help make sense
|
Edit - removed this - incorrect analysis. The last trace looks ok. my_ns_attr_infos_value_add(&builder, flatbuffers_string_create_str(&builder, "TEST VALUE")) - Lvl = 8 - Type = 3 I think this could be simplified with
|
Thanks for your explanation! |
It's too early - I was confusing myself - not sure if you saw that I just deleted a large dissection. The create_str looks weird, but I haven't looked into it. |
This looks weird and could be a bug - which could explain different behavior between 32-bit and 64-bit. The type should not be -1 - could be an error code of sorts. my_ns_attr_infos_value_add(&builder, flatbuffers_string_create_str(&builder, "TEST VALUE")) - Lvl = 8 - Type = 65535 |
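One way a type of -1 could show up as 65535 in the trace is sketched below. This is a toy model, not flatcc code: the struct `toy_frame`, its 16-bit unsigned `type` field, and both helper functions are assumptions made for illustration. The thread does not confirm how the builder stores the frame type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical frame record. The 16-bit unsigned type field is an
 * assumption for illustration, not taken from the flatcc sources. */
struct toy_frame {
    uint16_t type;
};

/* Store an int code (possibly a negative error value) into the 16-bit
 * field and return the value as it reads back. The conversion to
 * uint16_t is modulo 65536, so -1 reads back as 65535. */
unsigned toy_store_type(struct toy_frame *f, int code)
{
    f->type = (uint16_t)code;
    return f->type;
}

/* Print a frame in the same shape as the traces quoted in this thread. */
void toy_print_type(int level, const struct toy_frame *f)
{
    printf("Lvl = %d - Type = %u\n", level, (unsigned)f->type);
}
```

Calling `toy_store_type(&f, -1)` followed by `toy_print_type(8, &f)` would print a line of the form seen above, which fits the guess that 65535 is an error code or an uninitialized value rather than a real frame type.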
I tried to build on MacOS 64-bit system with clang -m32 - and I did not get any assertions. I suspect it might have to do with reallocation of the stack triggering some bug. |
I'll get back - need to debug this carefully. |
OK, I have not yet found the root cause, but I think I know why 32-bit is different:
We have already seen the type=65535 and this should trigger an assertion. So the assumption is that the 32-bit build is using the debug library |
Still no final solution, but I strongly suspect there is a problem in detecting when and how much memory to allocate for additional frames. The default allocation makes space for 8 frames, which is exactly where we are, and depending on how you debug, clear_buffer can crash when it attempts to deallocate the frame allocator (buffer index 4), suggesting earlier corruption in the frame. This would also explain why the frame type suddenly goes bad. The string is likely just a random trigger for a corrupt memory condition.
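The suspected failure mode can be illustrated with a minimal frame stack that grows on demand. This is a sketch under stated assumptions, not the flatcc implementation: every `toy_` name is hypothetical, and only the initial capacity of 8 frames comes from the discussion above. The point is that the capacity check must happen before the frame for the new level is written; if the check is off by one, the ninth nested frame lands outside the allocation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { TOY_INITIAL_FRAMES = 8 }; /* mirrors the default of 8 frames */

typedef struct {
    uint16_t type;
} toy_frame_t;

typedef struct {
    toy_frame_t *frames;
    size_t capacity; /* frames allocated */
    size_t level;    /* frames in use, 0 means no open frame */
} toy_stack_t;

void toy_stack_init(toy_stack_t *s)
{
    s->frames = NULL;
    s->capacity = 0;
    s->level = 0;
}

/* Open a nested frame. The new frame is stored at index `level`
 * (before the increment), so the stack must grow as soon as
 * level == capacity, not one level later. */
int toy_stack_enter(toy_stack_t *s, uint16_t type)
{
    if (s->level == s->capacity) {
        size_t n = s->capacity ? s->capacity * 2 : TOY_INITIAL_FRAMES;
        toy_frame_t *p = realloc(s->frames, n * sizeof *p);
        if (!p) return -1;
        s->frames = p;
        s->capacity = n;
    }
    s->frames[s->level].type = type;
    s->level++;
    return 0;
}

/* Close the innermost frame. */
int toy_stack_exit(toy_stack_t *s)
{
    if (s->level == 0) return -1;
    s->level--;
    return 0;
}

/* Type of the innermost open frame, or -1 if none is open. */
int toy_stack_type(const toy_stack_t *s)
{
    return s->level ? (int)s->frames[s->level - 1].type : -1;
}

void toy_stack_clear(toy_stack_t *s)
{
    free(s->frames);
    toy_stack_init(s);
}
```

Nesting more than 8 levels with this sketch forces a reallocation, after which the types of the outer frames should still read back unchanged when the frames are closed in reverse order.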
|
@epouponstormshield fixed, I hope, please confirm! - change has been committed to master.
The argument The following should fix the issue:
and you can also patch 0.4.1 instead if you prefer not to run on master, or more hacky, you could bump the default allocator to a larger value than 8 times frame_size to postpone the problem. |
Are you sure it is not something like (B->level - 1) * frame_size? KO: OK: |
Yes, that was a typo in comment, I already edited it. Should be correct in source. |
I will release 0.4.2 if and when the fix is confirmed to resolve the issue. |
Sure, I am trying your fix in a more complete environment in order to confirm. |
I confirm the problem is now fixed :) |
And thanks for a great report. This is a very important bug to get fixed. |
0.4.2 has been released. |
I think it would make sense to extend the existing tests to cover such a use case. |
I was thinking about it - and yes, it does make sense, up to triggering the first reallocation. If you feel like it, feel free to contribute. Issue #4 could also use such a test case - here the problem was not triggering allocation at all. |
Because such a test would likely need a separate schema, the load_test project could be used as a template rather than extending the monster_test test cases. https://github.com/dvidelabs/flatcc/tree/master/test/load_test Note the custom build targets in https://github.com/dvidelabs/flatcc/blob/master/test/load_test/CMakeLists.txt |
A test is tricky, though, because the default allocator uses an arbitrary limit of 8 frames that might easily be changed, voiding the test case, so at least a source comment is needed, or the use of a custom allocator with known limits.
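The idea of testing against known limits can be sketched without depending on the default of 8 frames. The code below is a self-contained toy, not a flatcc test and not the flatcc allocator interface: `probe_stack_t`, `probe_init`, `probe_push` and `probe_run` are hypothetical names. The limit is passed in explicitly and growth events are counted, so a test can assert that a reallocation really happened and that earlier frames survived it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int *types;          /* one entry per open frame */
    size_t limit;        /* frames currently allocated */
    size_t depth;        /* frames in use */
    unsigned grow_count; /* number of reallocations so far */
} probe_stack_t;

/* Allocate room for exactly `known_limit` frames. */
int probe_init(probe_stack_t *p, size_t known_limit)
{
    if (known_limit == 0) return -1;
    p->types = malloc(known_limit * sizeof *p->types);
    if (!p->types) return -1;
    p->limit = known_limit;
    p->depth = 0;
    p->grow_count = 0;
    return 0;
}

/* Push one frame, doubling the storage when the limit is reached. */
int probe_push(probe_stack_t *p, int type)
{
    if (p->depth == p->limit) {
        size_t n = p->limit * 2;
        int *q = realloc(p->types, n * sizeof *q);
        if (!q) return -1;
        p->types = q;
        p->limit = n;
        p->grow_count++;
    }
    p->types[p->depth++] = type;
    return 0;
}

/* Nest `depth` frames on a stack with a known starting limit.
 * Returns the number of reallocations, or -1 on allocation failure
 * or if any frame no longer holds the value written to it. */
int probe_run(size_t known_limit, size_t depth)
{
    probe_stack_t p;
    size_t i;
    int result;

    if (probe_init(&p, known_limit) != 0) return -1;
    for (i = 0; i < depth; i++) {
        if (probe_push(&p, (int)i + 1) != 0) {
            free(p.types);
            return -1;
        }
    }
    result = (int)p.grow_count;
    for (i = 0; i < depth; i++) {
        if (p.types[i] != (int)i + 1) result = -1;
    }
    free(p.types);
    return result;
}
```

With a limit of 8, `probe_run(8, 8)` should report no growth and `probe_run(8, 9)` exactly one, so the boundary case is pinned down by the test itself rather than by whatever the default happens to be.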
|
Hello,
We hit an annoying issue using clang 3.4.1 on FreeBSD 10.3, but only on 32-bit systems (it works fine with the same version of clang on 64-bit systems).
Please find attached a schema and a C code sample that triggers the problem.
Steps to reproduce:
If I add only "TEST NAME" or "TEST VALUE", it works as expected.
Maybe I am doing something wrong, but it is strange since it works fine on the 64-bit arch?
Regards
example.zip