-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crash on arm/arm64 with mono due to missing memory barriers #24932
Comments
@jaredpar who owns this code? |
@marek-safar this is owned by the compiler. Will let @gafter, @tmat, @VSadov comment on this. |
Making - private T _firstItem;
+ private volatile T _firstItem; |
The reusing thread will have to do Interlocked.CompareExchange to check out the instance. That is a full fence. Volatile may effect timing. It could get pretty expensive on ARM. My guess, it is a race somewhere else and making things slower “fixes” them. |
Reference write should be a “release” anyways. Interlocked is used here because of object uniqueness. If reference writes to heap are not releases, a lot of things would be broken. Roslyn assumes CLR2.0 memory model. - at least tries to. Since we test on x86/64 only, we cant be sure it there are no accidental strong ordering dependencies. It could be one of those. Or just a race which we do not see on other platforms due to timing. |
@VSadov It doesn't matter if the reusing thread does a CAS and adds a memory barrier, since the memory barrier orders only on the calling thread. The stores on the thread that released the object can still be committed in any order. |
One of the best explanation of CLR2.0 memory ordering that I know is http://joeduffyblog.com/2007/11/10/clr-20-memory-model/ It does not mention threads. In that model a fence is a fence and effects are global. It does, however, mention that the description is a practical rationalization of the actual implementation and it is unclear whether Mono provides the same guarantees. |
It still seems more likely that we just have an ordinary race somewhere... |
FYI, just to bump this, related to a compilation problem I have when running mono msbuild and Roslyn on an Raspberry ARM64, mono+Roslyn is giving the following exception:
Which seems to be related to this issue https://bugzilla.xamarin.com/show_bug.cgi?id=56546#c2 I don't see how to pass the For what is worth, dotnet 3.0 is running fine with csc, but the fact is that it is running super slow on ARM64, so the concurrency bug might not be visible with it |
FWIW - the only reasonably testable memory model at the time was CLR2.0, so Roslyn targets that. It was an explicit decision. (discussed in meetings and such). Roslyn has a lot of shared state - basically embraces it. // naked store!
// `stlr` is assumed (or `dmb ish; str`, on v7)
maybe_read_in_other_threads = new Something(); If the store-release assumption above is weakened, a good deal of codebase may need to be reviewed and carefully volatile-ized. |
Pinging this thread again as this is still a pain point for Mono CI. Most of the random CI failures I get are from CodeAnalysis on ARM, and so I end up re-running lanes a whole bunch as a result. I believe we're working on a hack on our end to minimize the issue, but wanted to see if there's any chance of this being addressed upstream. |
The Roslyn compiler, and the CoreFX libraries we depend on, are designed for the CLR memory model. Moving it to support the Mono / ECMA memory model is a non-trivial task. It's on the order of one to two months to get it done. This is not to say we should or should not do it, just that the work is much more significant than this thread makes it out to be. It's the equivalent of a medium sized language feature. Stepping back though I'm not sure changing Roslyn is the right approach here given our .NET 5 plans. The idea is to make it so customers can use Mono as a runtime for their .NET applications. Are we suggesting that all customers need to make the same memory model adjustments as Roslyn is being asked to do? Or should we be looking to get Mono in line with the CLR memory model here? |
I am not sure it is related to this issue but we try to use CSharpCompilation.Emit() on ARM64 and sometimes our process crash. Once at least it outputted this error:
The same code works consistently on Windows. |
Doesn't look like this issue should be open. We ended up adding barriers on the runtime side on mono. |
Version Used:
2.6.0. Doesn't matter
Steps to Reproduce:
I've seen random roslyn crashes / compilation errors with mono ever since switching to it for building the BCL (on arm/arm64 linuxes)
For reproduction you just need to build mono, but the bug is extremely evasive. I have reliably reproduced it on a scaleway.com arm64 vm and also on a compulab utilite2. I can provide access to a machine where it should be easy to reproduce.
Expected Behavior:
Don't crash
Issue:
As far as I've investigated the crashes and my small understanding of object pools, it looks like missing memory barriers here https://github.com/dotnet/roslyn/blob/master/src/Dependencies/PooledObjects/ObjectPool%601.cs#L194
The thread that owns the object and puts it back for reuse doesn't use any synchronisation primitives, so the thread that will reuse this freed object is not guaranteed to see the correct state of the freed object.
The text was updated successfully, but these errors were encountered: