Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken Interconnect on Clang-CL 16.0.5 #178

Open
deathkiller opened this issue Sep 30, 2023 · 9 comments
Open

Broken Interconnect on Clang-CL 16.0.5 #178

deathkiller opened this issue Sep 30, 2023 · 9 comments
Milestone

Comments

@deathkiller
Copy link

Hi!
I'm using Interconnect classes and it works good in MSVC, but it seems there is some problem in Clang-CL (16.0.5, in VS 2022 17.7.4). If I build project in Release with optimizations turned on (any of /O2, /O1, /Ox), some signals (using member functions) don't connect correctly (so no events are triggered on them). I was able to fix it by adding __attribute__((optnone)) to this line: https://github.com/mosra/corrade/blob/master/src/Corrade/Interconnect/Connection.h#L59, so it seems the compiler probably somehow breaks the SignalData constructor during optimizations. After the fix, it seems all signals work good, but I don't know why this happens and I couldn't find anything about it in the documentation. Do you have any idea? Also, I don't know which compilers are affected, I tested it only on MSVC and Clang-CL.

@mosra
Copy link
Owner

mosra commented Sep 30, 2023

Hi, thanks for the report!

For MSVC I'm adding /OPT:NOICF, which disables "identical COMDAT folding", which as far as I understand is an optimization MSVC does to fold functions with identical contents together, so they all end up having the same function pointer.

That's useful in general, especially in template-heavy code, where a lot of different template instantiations may end up being the exact same machine code, just with different types in C++. But for signals, which are distinguished from each other by the value of their function pointer, this is a problem -- let's say, Signal pressed() and Signal released() are expected to be treated differently but (apart from emit() taking their function pointer) they internally do the exact same thing so the compiler merges them into one function, leading to both being treated as the same signal.

I suppose clang-cl is doing something similar possibly, although I'm not aware of this problem on Linux or other platforms. Thus maybe clang-cl is using the MSVC linker and not its own, effectively inheriting the ICF optimization from it? So maybe using /OPT:NOICF for clang-cl as well would work too? I have a clang-cl build on the CI but it's only a Debug build so I unfortunately didn't catch this myself.

Depending on whether you use Corrade as a CMake subproject or externally installed, it'd mean expanding the logic to add /OPT:NOICF for clang-cl as well either directly here or in (your copy of) FindCorrade.cmake. Or just add it to linker flags of your project directly, for a quick test.


Just for the full picture -- something similar is happening on GCC with -Bsymbolic, where (IIRC?) a function pointer has a different value depending on whether it's called from within a shared library that defines it, or from outside (thus connecting to the signal from outside simply didn't work), and in some other case with GCC I remember I had to make the signal definitions non-inline to prevent them from having a different function pointer value in every file that included the header.

It's quite a mess frankly, each new compiler version and every improvement to any optimizer being potentially breaking to this functionality. There are other signal/slot libraries based on the same principle (taking a pointer of the signal function to map them to slots) and all of them are running into the exact same issues. Libraries that don't, such as Qt, have some meta-compiler that circumvents this, but that was exactly the extra complexity I didn't want to deal with. On the other hand, the meta-compiler has the advantage that it can assign unique contiguous indices to the signals, making their lookup significantly faster than with a hashmap.

Maybe something like embedding a compile-time-hashed value of the __PRETTY_FUNCTION__ value could robustly prevent compilers from folding the signals together, but I didn't investigate anything like that so far yet.

@mosra mosra added this to the 2023.0a milestone Sep 30, 2023
@deathkiller
Copy link
Author

Yes, I'm already using /OPT:NOICF switch for MSVC, but it seems it doesn't have any effect for clang-cl, only __attribute__((optnone)) worked for me. Turning off optimizations only for this SignalData constructor seems fine to me, but I'm not sure if it doesn't have any other consequences.

I also tried to insert OutputDebugString call inside the SignalData constructor to debug it, but with this call it started to work magically. So I'm not sure what's going on there.

I like Interconnect because I can define new events very easily. I would like to avoid taking extra steps to define an event.

@mosra
Copy link
Owner

mosra commented Sep 30, 2023

with this call it started to work magically

Yeah, I suspect that made the optimizer exclude the function from optimization / merging / folding. Same was happening for me with the broken inline behavior, adding a printf statement made the problem go away.

I'm not on Windows but I tried to reproduce the issue on the CI, and indeed the tests broke when I switched to Release. But, contrary to what you said, adding /OPT:NOICF actually fixed it, making the tests pass again. It's commit f76ac49 in the next branch, can you check it on your side?

I like Interconnect because I can define new events very easily. I would like to avoid taking extra steps to define an event.

Yes, that's exactly why I made the library, because the design seemed very easy to use in theory. But then compilers happened :D

@deathkiller
Copy link
Author

I tried it again, rebuilt everything and it still doesn't work with /OPT:NOICF. But I see linker knows about the switch, because if I change it to some non-existing /OPT:, e.g. /OPT:NOICFTEST, the linker shows error lld-link : error : /opt: unknown option: noicftest. And that also probably means that clang-cl uses its own LLVM linker.

@mosra
Copy link
Owner

mosra commented Sep 30, 2023

Ah well. Are you able to reproduce with Corrade's own tests? Enable them with CORRADE_BUILD_TESTS and run the InterconnectTest. For me the output with clang-cl and Release looked like this: https://ci.appveyor.com/project/mosra/corrade/build/next-3187/job/t5wk6f3gxvhsivbx?fullLog=true#L2538

I tried on the CI with -fuse-ld=lld now but that passed the tests too. There's version 17.7.3 but I think such minor version difference alone isn't responsible for any behavior difference.

@deathkiller
Copy link
Author

Yeah, all tests in InterconnectTest passes in next branch (emitterIdenticalSignals fails in master branch). But in my project, even non-identical signals don't work.

@mosra
Copy link
Owner

mosra commented Sep 30, 2023

Would you be able to create a minimal repro case I could look at, or change the test in a way that makes it fail too? Do you use any extra compiler or linker flags besides the CMake defaults for Release? There's also one other test that checks sending signals across DLLs, in case a variant of that scenario in particular is what could trigger this behavior.

If everything else fails, I'll go with your original optnone suggestion. But it feels like a rather imprecise hammer that could have unintended side effects, and also I'd really like to have a regression test case for this :)

@deathkiller
Copy link
Author

I discovered that if I disable exceptions completely (because I have it disabled in my project), following test starts to fail. And optnone fixes it, so it starts passing again. Interesting is that turning exceptions back on in my project doesn't help. I think clang-cl is a bit cursed. 😀 But I'm using multiple inheritance in my project too, so it may be related.

  FAIL [17] emitterMultipleInheritance() at ...\corrade-master\src\Corrade\Interconnect\Test\Test.cpp:602
        Containers mailbox.messages and (std::vector<std::string>{"hello", "<>ahoy<>"}) have different size, actual 1 but 2 expected. Actual contents:
        {hello}
        but expected
        {<>ahoy<>, hello}
        Actual hello but <>ahoy<> expected on position 0.

@LB--
Copy link
Contributor

LB-- commented Sep 30, 2023

But for signals, which are distinguished from each other by the value of their function pointer, this is a problem

Even though the C++ standard says that two pointers to the same function must compare equal, in practice this isn't possible to implement on all platforms - for example, on Windows with imported DLL functions, the first time the function is called might patch the function address, and if you take the address before and after they won't compare equal. I'm not sure how relevant this is to you, but I recommend being careful when it comes to comparing function pointers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

No branches or pull requests

3 participants