libcamera and cross compiled project #4

Open
ivan-ushakov opened this issue Oct 19, 2024 · 8 comments

Comments

@ivan-ushakov

Hello,
First of all, thank you for providing these cross-compilation toolchains for the Raspberry Pi. I'm using a Raspberry Pi Zero (without W) and have run into a problem I cannot solve.
I followed your guide and can successfully build and run a project with libcamera. The problem is with std::shared_ptr: its use_count is wrong. For example, in this code use_count is 1, but it should be 2:

struct CameraService::Context final
{
	Context() : camera_manager(std::make_unique<libcamera::CameraManager>()) {}

	std::unique_ptr<libcamera::CameraManager> camera_manager;

	static void camera_added(std::shared_ptr<libcamera::Camera> camera)
	{
		logger::debug(fmt::format("CameraService: camera added with use_count={:d} p={:x}", camera.use_count(),
			reinterpret_cast<size_t>(camera.get())));
	}
};

CameraService::CameraService() : _context(std::make_unique<Context>())
{
	_context->camera_manager->cameraAdded.connect(&Context::camera_added);
}

void CameraService::open()
{
	if (!_context->camera_manager->start())
	{
		throw std::runtime_error("camera manager");
	}
}

Inside camera_added, the std::shared_ptr<libcamera::Camera> has the wrong use count, and when execution leaves this function the libcamera::Camera object is destroyed because the counter drops to zero. It looks like shared_ptr on the Linux machine used for cross-compilation is not compatible with shared_ptr on the device, but I don't understand how this is possible, since GCC should keep a stable ABI.

I tried different versions of GCC from your toolchains (14 and 12), and the problem is the same. The libcamera version is the same on the device and on the Linux machine (I use mk_sbuild).

@tttapa
Owner

tttapa commented Oct 19, 2024

I don't have a lot of experience with libcamera, so I'm not sure if I can be of any help. However, the first thing to try is to compile everything with the address and undefined behavior sanitizers enabled.

You could also try placing a hardware watchpoint on the refcount to see where it is decremented from 2 to 1.

Also, can you be sure that the camera isn't removed from another thread? Do you actually see two destructor calls for the same camera object?

@ivan-ushakov
Author

It looks like a memory layout problem with the shared_ptr type. I use toolchain GCC 12.4 on my Linux machine but Raspberry Pi Zero has GCC 12.2 and maybe this could be a reason.

For example in disassemble of camera_added I don't see shared_ptr counter manipulation methods at all:

Thread 4 "camera-bot" hit Breakpoint 3, camera::CameraService::Context::camera_added (camera=std::shared_ptr<libcamera::Camera> (use count 1, weak count -1) = {...}) at /workspaces/RaspberryCamera/src/camera_service.cpp:102
102         logger::debug(fmt::format("CameraService: camera added with use_count={:d} p={:x}", camera.use_count(),
(gdb) disassembly
Undefined command: "disassembly".  Try "help".
(gdb) disassemble
Dump of assembler code for function _ZN6camera13CameraService7Context12camera_addedESt10shared_ptrIN9libcamera6CameraEE:
=> 0x00058ec8 <+0>:  vldr  d7, [pc, #152] @ 0x58f68 <_ZN6camera13CameraService7Context12camera_addedESt10shared_ptrIN9libcamera6CameraEE+160>
   0x00058ecc <+4>:  push  {r4, lr}
   0x00058ed0 <+8>:  ldr   r3, [r0, #4]
   0x00058ed4 <+12>: sub   sp, sp, #80 @ 0x50
   0x00058ed8 <+16>: cmp   r3, #0
   0x00058edc <+20>: ldrne r3, [r3, #4]
   0x00058ee0 <+24>: mov   r2, #54  @ 0x36
   0x00058ee4 <+28>: vstr  d7, [sp, #40]  @ 0x28
   0x00058ee8 <+32>: ldr   r1, [pc, #128] @ 0x58f70 <_ZN6camera13CameraService7Context12camera_addedESt10shared_ptrIN9libcamera6CameraEE+168>
   0x00058eec <+36>: str   r3, [sp, #24]
   0x00058ef0 <+40>: str   r1, [sp, #16]
   0x00058ef4 <+44>: ldr   r3, [r0]
   0x00058ef8 <+48>: add   r1, sp, #24
   0x00058efc <+52>: str   r1, [sp, #48]  @ 0x30
   0x00058f00 <+56>: add   r4, sp, #40 @ 0x28
   0x00058f04 <+60>: str   r2, [sp, #20]
   0x00058f08 <+64>: str   r3, [sp, #32]
   0x00058f0c <+68>: ldm   r4, {r0, r1, r2, r3}
   0x00058f10 <+72>: stm   sp, {r0, r1, r2, r3}
   0x00058f14 <+76>: add   r12, sp, #16
   0x00058f18 <+80>: ldm   r12, {r1, r2}
   0x00058f1c <+84>: add   r0, sp, #56 @ 0x38
   0x00058f20 <+88>: bl 0x67d60 <_ZN3fmt3v117vformatB5cxx11ENS0_17basic_string_viewIcEENS0_17basic_format_argsINS0_7contextEEE>
   0x00058f24 <+92>: ldrd  r2, [sp, #56]  @ 0x38
   0x00058f28 <+96>: str   r2, [sp, #44]  @ 0x2c
   0x00058f2c <+100>:   str   r3, [sp, #40]  @ 0x28
   0x00058f30 <+104>:   ldm   r4, {r0, r1}
   0x00058f34 <+108>:   bl 0x59ce4 <_ZN6camera6logger5debugESt17basic_string_viewIcSt11char_traitsIcEE>
   0x00058f38 <+112>:   ldr   r0, [sp, #56]  @ 0x38
   0x00058f3c <+116>:   add   r3, sp, #64 @ 0x40
   0x00058f40 <+120>:   cmp   r0, r3
   0x00058f44 <+124>:   beq   0x58f54 <_ZN6camera13CameraService7Context12camera_addedESt10shared_ptrIN9libcamera6CameraEE+140>
   0x00058f48 <+128>:   ldr   r1, [sp, #64]  @ 0x40
   0x00058f4c <+132>:   add   r1, r1, #1
   0x00058f50 <+136>:   bl 0x1d910 <_ZdlPvj@plt>
   0x00058f54 <+140>:   add   sp, sp, #80 @ 0x50
   0x00058f58 <+144>:   pop   {r4, pc}
   0x00058f5c <+148>:   add   r0, sp, #56 @ 0x38
   0x00058f60 <+152>:   bl 0x1dc34 <_ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE10_M_disposeEv@plt>
   0x00058f64 <+156>:   bl 0x1de38 <__cxa_end_cleanup@plt>
   0x00058f68 <+160>:   andeq r0, r0, r1, lsr #32
   0x00058f6c <+164>:   andeq r0, r0, r0
   0x00058f70 <+168>:   andeq r6, r8, r4, lsl r7
End of assembler dump.

It also shows wrong values for the use count and weak count (use count 1, weak count -1 in the breakpoint line above).

@tttapa
Owner

tttapa commented Oct 20, 2024

> I use toolchain GCC 12.4 on my Linux machine but Raspberry Pi Zero has GCC 12.2 and maybe this could be a reason.

If there's an ABI break in shared_ptr between GCC 12.2 and 12.4, then that would be a GCC bug, so this is unlikely to be the reason.

> For example in disassemble of camera_added I don't see shared_ptr counter manipulation methods at all:

That's correct: the caller is responsible for constructing and destructing the function arguments. https://itanium-cxx-abi.github.io/cxx-abi/abi.html#non-trivial-parameters

@ivan-ushakov
Author

I think I found the problem. Could you tell me how this flag is configured (opt/x-tools/armv6-rpi-linux-gnueabihf/armv6-rpi-linux-gnueabihf/include/c++/12.4.0/armv6-rpi-linux-gnueabihf/bits/c++config.h)?

/* Defined if shared_ptr reference counting should use atomic operations. */
#define _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY 1

As I understand it, if some library on my device was compiled without _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY defined, it will use __default_lock_policy = _S_mutex, and this could be the reason for the strange behaviour of my application.

@tttapa
Owner

tttapa commented Oct 20, 2024

Good catch!

It looks like Debian uses _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY=1 on armhf (which appears to be GCC's default), but Raspberry Pi OS does not ...

I'll have to investigate further.

@tttapa
Owner

tttapa commented Oct 20, 2024

The difference appears to be that Raspberry Pi OS is compiled using the armv6 architecture, whereas the ARM1176JZF-S processor used by the BCM2835-based RPis uses the armv6kz architecture (ARMv6 with ARMv6k multiprocessor and TrustZone Security Extensions). So even though the hardware supports atomic instructions, they are disabled in Raspberry Pi OS, and this explains the differences in the implementation of shared_ptr.

Since the toolchains in tttapa/docker-arm-cross-toolchain compile for the ARM1176JZF-S processor specifically, they implicitly have atomic operations enabled. The fix is simple: build for a generic ARMv6 CPU without any extensions, perhaps with -mtune=arm1176jzf-s. I hope to push a fix soon.
This will result in a performance penalty if your code uses atomic operations (such as shared_ptr reference counting), but it is the only way to ensure binary compatibility with packages compiled for Raspberry Pi OS.

@tttapa
Owner

tttapa commented Oct 27, 2024

I've created a new release for https://github.com/tttapa/docker-arm-cross-toolchain. The new toolchains are currently being built.

@ivan-ushakov
Author

Great! I hope to test the new toolchain next week, and after that I'll close this issue.
