Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NativeAOT] Object synchronization method was called from an unsynchronized block of code. #104340

Closed
jkotas opened this issue Jul 3, 2024 · 28 comments · Fixed by #106703
Closed
Assignees
Labels
area-System.Threading in-pr There is an active PR which will close this issue when it is merged Known Build Error Use this to report build issues in the .NET Helix tab
Milestone

Comments

@jkotas
Copy link
Member

jkotas commented Jul 3, 2024

Intermittent Object synchronization method was called from an unsynchronized block of code. exceptions.

Known Issue Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": ["Object synchronization method was called from an unsynchronized block of code.", "ObjectHeader.Release"],
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=752522
Result validation: ⚠️ Validation could not be done without an Azure DevOps build URL on the issue. Please add it to the "Build: 🔎" line.
Validation performed at: 7/3/2024 2:40:39 AM UTC

Report

Build Definition Step Name Console log Pull Request
779167 dotnet/runtime Build Tests Log #80154
778002 dotnet/runtime Build Tests Log #106403
777711 dotnet/runtime Build Tests Log #106515
776075 dotnet/runtime Build Tests Log #106418
774475 dotnet/runtime Build Tests Log #106349
774360 dotnet/runtime Build Tests Log #105030
773753 dotnet/runtime Build Tests Log #106238
771962 dotnet/runtime Build Tests Log #104393
770489 dotnet/runtime Build Tests Log #106172
770443 dotnet/runtime Build Tests Log #106167
769888 dotnet/runtime Build Tests Log #106000
767716 dotnet/runtime Build Tests Log #106040
767255 dotnet/runtime Build Tests Log #106010
767233 dotnet/runtime Build Tests Log #106014
766310 dotnet/runtime Build Tests Log #105737
765458 dotnet/runtime Build Tests Log #105920
765307 dotnet/runtime Build Tests Log #105866
764609 dotnet/runtime Build Tests Log #105595
763406 dotnet/runtime Build Tests Log #105842
763381 dotnet/runtime Build Tests Log #105840
760921 dotnet/runtime Build Tests Log #105695
760268 dotnet/runtime Build Tests Log #105695
758871 dotnet/runtime Build Tests Log #105394
756714 dotnet/runtime Build Tests Log #105308
752522 dotnet/runtime Build Tests Log #105346
752390 dotnet/runtime Build Tests Log #103752
Build Definition Test Pull Request
755415 dotnet/runtime nativeaot.SmokeTests.WorkItemExecution #105433
752703 dotnet/runtime System.Text.Tests.EncodingTests.GetEncoding_FromProvider_ByEncodingName_WithDisallowedEncoding_Throws(encodingName: "utf-7", codePage: 65000) #105170
751577 dotnet/runtime WasmTestOnChrome-ST-System.Reflection.Metadata.Tests.WorkItemExecution
750924 dotnet/runtime WasmTestOnV8-ST-System.IO.FileSystem.Tests.WorkItemExecution

Summary

24-Hour Hit Count 7-Day Hit Count 1-Month Count
0 4 30
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jul 3, 2024
@jkotas
Copy link
Member Author

jkotas commented Jul 3, 2024

Hit in #104202. Log https://helixre8s23ayyeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-pull-104202-merge-fe38091c4b294deca8/System.Threading.Channels.Tests/1/console.5ad789fe.log?helixlogtype=result :

[FAIL] System.Threading.Channels.Tests.UnboundedPrioritizedChannelTests.ManyProducerConsumer_ConcurrentReadWrite_Success(numReaders: 1, numWriters: 10)
System.AggregateException : One or more errors occurred. (Object synchronization method was called from an unsynchronized block of code.)
---- System.Threading.SynchronizationLockException : Object synchronization method was called from an unsynchronized block of code.
   at System.Threading.Tasks.Task.WaitAllCore(ReadOnlySpan`1, Int32, CancellationToken) + 0x304
   at System.Threading.Tasks.Task.WaitAll(Task[]) + 0x54
   at System.Threading.Channels.Tests.ChannelTestBase.ManyProducerConsumer_ConcurrentReadWrite_Success(Int32 numReaders, Int32 numWriters) + 0x16c
   at System.Threading.Channels!<BaseAddress>+0xaa3254

@jkotas jkotas added the Known Build Error Use this to report build issues in the .NET Helix tab label Jul 3, 2024
@jkotas
Copy link
Member Author

jkotas commented Jul 3, 2024

Hit in #104332. Log https://dev.azure.com/dnceng-public/public/_build/results?buildId=728548&view=logs&j=eb3d96e8-a17f-53b1-ecea-be64723f4b14&t=d8e271dd-8628-5dad-a374-176afa32935b&l=673 :

    Generating native code
    Generating native code
    Unhandled exception: System.AggregateException: One or more errors occurred. (Object synchronization method was called from an unsynchronized block of code.)
     ---> System.Threading.SynchronizationLockException: Object synchronization method was called from an unsynchronized block of code.
       at System.Threading.ObjectHeader.Release(Object) + 0x14a
       at System.Threading.Monitor.Exit(Object) + 0x17
       at System.Resources.ResourceReader.GetValueForNameIndex(Int32) + 0x253
       at System.Resources.ResourceReader.ResourceEnumerator.get_Value() + 0xdd
       at ILCompiler.SubstitutionProvider.AssemblyFeatureInfo..ctor(EcmaModule, Logger, IReadOnlyDictionary`2, BodyAndFieldSubstitutions) + 0x5eb
       at ILCompiler.SubstitutionProvider.FeatureSwitchHashtable.CreateValueFromKey(EcmaModule) + 0x5a
       at Internal.TypeSystem.LockFreeReaderHashtable`2.CreateValueAndEnsureValueIsInTable(TKey) + 0x3b
       at Internal.TypeSystem.LockFreeReaderHashtable`2.GetOrCreateValue(TKey) + 0x62
       at ILCompiler.SubstitutionProvider.GetSubstitution(MethodDesc) + 0xaa
       at ILCompiler.SubstitutedILProvider.GetMethodIL(MethodDesc) + 0x67
       at ILCompiler.Compilation.CombinedILProvider.GetMethodIL(MethodDesc) + 0x4f
       at ILCompiler.Compilation.ILCache.CreateValueFromKey(MethodDesc) + 0x6a

@agocke agocke added this to the 9.0.0 milestone Jul 8, 2024
@agocke agocke removed the untriaged New issue has not been triaged by the area owner label Jul 8, 2024
@MichalStrehovsky
Copy link
Member

I think the failure in ILC.exe started happening after #103574 (this is ILC-compiled ILC, using the LKG ILC), so that narrows it down to like two weeks worth of commits -_-

ILC.exe failures don't have crashdumps because of infra limitations and neither does the libraries test failure (again because of infra design).

@MichalStrehovsky
Copy link
Member

I spent some time digging through pipeline logs and trying to repro locally. We've not seen this in libraries testing in the past 10+ days, only when running ILC. We picked up a new ILC on Wednesday and it hasn't reproed since. I was not able to get a local repro either (I was already running with the updated ILC though).

So fingers crossed that whatever this was, it was already fixed and we just need to wait a couple more days to get enough confidence to close this issue.

@carlossanlop
Copy link
Member

carlossanlop commented Jul 16, 2024

Unfortunately, it seems this has not yet been fixed. Here's a fresh hit from today:

   Generating native code
    Unhandled exception: System.AggregateException: One or more errors occurred. (Object synchronization method was called from an unsynchronized block of code.)
     ---> System.Threading.SynchronizationLockException: Object synchronization method was called from an unsynchronized block of code.
       at System.Threading.ObjectHeader.Release(Object) + 0x147
       at System.Threading.Monitor.Exit(Object) + 0x18
       at System.Resources.ResourceReader.AllocateStringForNameIndex(Int32, Int32&) + 0x76d
       at System.Resources.ResourceReader.ResourceEnumerator.get_Key() + 0xea
       at ILCompiler.SubstitutionProvider.AssemblyFeatureInfo..ctor(EcmaModule, Logger, IReadOnlyDictionary`2, BodyAndFieldSubstitutions) + 0x5d5
       at ILCompiler.SubstitutionProvider.FeatureSwitchHashtable.CreateValueFromKey(EcmaModule) + 0x74
       at Internal.TypeSystem.LockFreeReaderHashtable`2.CreateValueAndEnsureValueIsInTable(TKey) + 0x4d
       at Internal.TypeSystem.LockFreeReaderHashtable`2.GetOrCreateValue(TKey) + 0x5f
       at ILCompiler.SubstitutionProvider.GetSubstitution(MethodDesc) + 0xc5
       at ILCompiler.SubstitutedILProvider.GetMethodIL(MethodDesc) + 0x4d
       at ILCompiler.Compilation.CombinedILProvider.GetMethodIL(MethodDesc) + 0x4f
       at ILCompiler.Compilation.ILCache.CreateValueFromKey(MethodDesc) + 0x7c
       at Internal.TypeSystem.LockFreeReaderHashtable`2.CreateValueAndEnsureValueIsInTable(TKey) + 0x4d
       at Internal.TypeSystem.LockFreeReaderHashtable`2.GetOrCreateValue(TKey) + 0x5f
       at ILCompiler.Compilation.GetMethodIL(MethodDesc) + 0xac
       at Internal.JitInterface.CorInfoImpl.CompileMethod(MethodCodeNode, MethodIL) + 0xc7
       at ILCompiler.RyuJitCompilation.CompileSingleMethod(CorInfoImpl, MethodCodeNode) + 0xc0
       at ILCompiler.RyuJitCompilation.CompileSingleMethod(MethodCodeNode) + 0x86
       at System.Threading.Tasks.Parallel.<>c__DisplayClass32_0`2.<ForEachWorker>b__0(Int32) + 0x94
       at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`2.<ForWorker>b__1(RangeWorker&, Int64, Boolean&) + 0x2fe
    --- End of stack trace from previous location ---
       at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`2.<ForWorker>b__1(RangeWorker&, Int64, Boolean&) + 0x5ea
       at System.Threading.Tasks.TaskReplicator.Replica`1.ExecuteAction(Boolean&) + 0x69
       at System.Threading.Tasks.TaskReplicator.Replica.Execute() + 0xb2
       --- End of inner exception stack trace ---
       at System.Threading.Tasks.TaskReplicator.Run[TState](TaskReplicator.ReplicatableUserAction`1, ParallelOptions, Boolean) + 0x205
       at System.Threading.Tasks.Parallel.ForWorker[TLocal,TInt](TInt, TInt, ParallelOptions, Action`1, Action`2, Func`4, Func`1, Action`1) + 0x4ea
    --- End of stack trace from previous location ---
       at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection, CancellationToken, Exception) + 0x52
       at System.Threading.Tasks.Parallel.ForWorker[TLocal,TInt](TInt, TInt, ParallelOptions, Action`1, Action`2, Func`4, Func`1, Action`1) + 0x5eb
       at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IList`1, ParallelOptions, Action`1, Action`2, Action`3, Func`4, Func`5, Func`1, Action`1) + 0x1a8
       at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1, ParallelOptions, Action`1, Action`2, Action`3, Func`4, Func`5, Func`1, Action`1) + 0x1a3
       at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1, ParallelOptions, Action`1) + 0xaa
       at ILCompiler.RyuJitCompilation.CompileMultiThreaded(List`1) + 0x16e
       at ILCompiler.RyuJitCompilation.ComputeDependencyNodeDependencies(List`1) + 0x25b
       at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.ComputeDependencies(List`1) + 0x61
       at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.ComputeMarkedNodes() + 0xe1
       at ILCompiler.RyuJitCompilation.CompileInternal(String, ObjectDumper) + 0x55
       at ILCompiler.Compilation.ILCompiler.ICompilation.Compile(String, ObjectDumper) + 0x75
       at ILCompiler.Program.Run() + 0x41ba
       at ILCompiler.ILCompilerRootCommand.<>c__DisplayClass240_0.<.ctor>b__0(ParseResult) + 0x41c
       at System.CommandLine.Invocation.AnonymousSynchronousCliAction.Invoke(ParseResult) + 0x3f
       at System.CommandLine.Invocation.InvocationPipeline.Invoke(ParseResult) + 0x1b5
D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Debug\build\Microsoft.NETCore.Native.targets(309,5): error MSB3073: The command ""D:\a\_work\1\s\artifacts\bin\coreclr\windows.x64.Debug\ilc-published\\ilc" @"D:\a\_work\1\s\artifacts\tests\coreclr/obj/windows.x64.Debug/Managed/nativeaot\SmokeTests\HardwareIntrinsics\X64Avx512\native\X64Avx512.ilc.rsp"" exited with code 1. [D:\a\_work\1\s\src\tests\nativeaot\SmokeTests\HardwareIntrinsics\X64Avx512.csproj] [D:\a\_work\1\s\src\tests\build.proj]

@stephentoub
Copy link
Member

stephentoub commented Jul 17, 2024

I also just hit it (linux-x64 Debug NativeAOT)

    Generating native code
    Unhandled exception: System.AggregateException: One or more errors occurred. (Object synchronization method was called from an unsynchronized block of code.)
     ---> System.Threading.SynchronizationLockException: Object synchronization method was called from an unsynchronized block of code.
       at System.Threading.ObjectHeader.Release(Object) + 0x14a
       at System.Threading.Monitor.Exit(Object) + 0x17
       at System.Resources.ResourceReader.GetValueForNameIndex(Int32) + 0x253
       at System.Resources.ResourceReader.ResourceEnumerator.get_Value() + 0xdd
       at ILCompiler.SubstitutionProvider.AssemblyFeatureInfo..ctor(EcmaModule, Logger, IReadOnlyDictionary`2, BodyAndFieldSubstitutions) + 0x5eb
       at ILCompiler.SubstitutionProvider.FeatureSwitchHashtable.CreateValueFromKey(EcmaModule) + 0x5a
       at Internal.TypeSystem.LockFreeReaderHashtable`2.CreateValueAndEnsureValueIsInTable(TKey) + 0x3b
       at Internal.TypeSystem.LockFreeReaderHashtable`2.GetOrCreateValue(TKey) + 0x62
       at ILCompiler.SubstitutionProvider.GetSubstitution(MethodDesc) + 0xaa
       at ILCompiler.SubstitutedILProvider.GetMethodIL(MethodDesc) + 0x67
       at ILCompiler.Compilation.CombinedILProvider.GetMethodIL(MethodDesc) + 0x4f
       at ILCompiler.Compilation.ILCache.CreateValueFromKey(MethodDesc) + 0x6a
       at Internal.TypeSystem.LockFreeReaderHashtable`2.CreateValueAndEnsureValueIsInTable(TKey) + 0x3b
       at Internal.TypeSystem.LockFreeReaderHashtable`2.GetOrCreateValue(TKey) + 0x62
       at ILCompiler.Compilation.GetMethodIL(MethodDesc) + 0xb0
       at Internal.JitInterface.CorInfoImpl.CompileMethod(MethodCodeNode, MethodIL) + 0xaa
       at ILCompiler.RyuJitCompilation.CompileSingleMethod(CorInfoImpl, MethodCodeNode) + 0xcb
       at ILCompiler.RyuJitCompilation.CompileSingleMethod(MethodCodeNode) + 0x89
       at System.Threading.Tasks.Parallel.<>c__DisplayClass32_0`2.<ForEachWorker>b__0(Int32) + 0x67
       at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`2.<ForWorker>b__1(RangeWorker&, Int64, Boolean&) + 0x2ec
    --- End of stack trace from previous location ---
       at System.Threading.Tasks.Parallel.<>c__DisplayClass19_0`2.<ForWorker>b__1(RangeWorker&, Int64, Boolean&) + 0x5d6
       at System.Threading.Tasks.TaskReplicator.Replica`1.ExecuteAction(Boolean&) + 0x4d
       at System.Threading.Tasks.TaskReplicator.Replica.Execute() + 0x90
       --- End of inner exception stack trace ---
       at System.Threading.Tasks.TaskReplicator.Run[TState](TaskReplicator.ReplicatableUserAction`1, ParallelOptions, Boolean) + 0x1ef
       at System.Threading.Tasks.Parallel.ForWorker[TLocal,TInt](TInt, TInt, ParallelOptions, Action`1, Action`2, Func`4, Func`1, Action`1) + 0x4fc
    --- End of stack trace from previous location ---
       at System.Threading.Tasks.Parallel.ThrowSingleCancellationExceptionOrOtherException(ICollection, CancellationToken, Exception) + 0x51
       at System.Threading.Tasks.Parallel.ForWorker[TLocal,TInt](TInt, TInt, ParallelOptions, Action`1, Action`2, Func`4, Func`1, Action`1) + 0x602
       at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IList`1, ParallelOptions, Action`1, Action`2, Action`3, Func`4, Func`5, Func`1, Action`1) + 0x197
       at System.Threading.Tasks.Parallel.ForEachWorker[TSource,TLocal](IEnumerable`1, ParallelOptions, Action`1, Action`2, Action`3, Func`4, Func`5, Func`1, Action`1) + 0x1a2
       at System.Threading.Tasks.Parallel.ForEach[TSource](IEnumerable`1, ParallelOptions, Action`1) + 0xa3
       at ILCompiler.RyuJitCompilation.CompileMultiThreaded(List`1) + 0x178
       at ILCompiler.RyuJitCompilation.ComputeDependencyNodeDependencies(List`1) + 0x246
       at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.ComputeDependencies(List`1) + 0x4e
       at ILCompiler.DependencyAnalysisFramework.DependencyAnalyzer`2.ComputeMarkedNodes() + 0xdb
       at ILCompiler.RyuJitCompilation.CompileInternal(String, ObjectDumper) + 0x53
       at ILCompiler.Compilation.ILCompiler.ICompilation.Compile(String, ObjectDumper) + 0x50
       at ILCompiler.Program.Run() + 0x3fa1
       at ILCompiler.ILCompilerRootCommand.<>c__DisplayClass240_0.<.ctor>b__0(ParseResult) + 0x424
       at System.CommandLine.Invocation.AnonymousSynchronousCliAction.Invoke(ParseResult) + 0x37
       at System.CommandLine.Invocation.InvocationPipeline.Invoke(ParseResult) + 0x158

@agocke
Copy link
Member

agocke commented Aug 1, 2024

A lot of these failures are coming from ResourceReader, which hasn't changed in a couple years. That might be just a common use of locking. It seems more likely that the problem is caused by something related to threading.

I'm changing the area to System.Threading for now.

@mangod9
Copy link
Member

mangod9 commented Aug 2, 2024

@agocke, @MichalStrehovsky: from the stack trace it appears that this is coming from the NativeAOT specific implementation of ObjectHeader correct? Looking at the source here:

if (!GetSyncEntryIndex(oldBits, out int syncIndex))
its not able to get the SyncEntry?

@MichalStrehovsky
Copy link
Member

There is one hit while running native AOT smoke tests:

        Running Test: ThreadLocalStatics.TLSTesting.ThreadLocalStatics_Test
        Caught Unexpected exception:System.AggregateException: One or more errors occurred. (Object synchronization method was called from an unsynchronized block of code.) (Object synchronization method was called from an unsynchronized block of code.)
         ---> System.Threading.SynchronizationLockException: Object synchronization method was called from an unsynchronized block of code.
           at System.Threading.ObjectHeader.Release(Object) + 0x1b1
           at System.Threading.Monitor.Exit(Object) + 0x18
           at Internal.Runtime.CompilerServices.FunctionPointerOps.GetGenericMethodFunctionPointer(IntPtr, IntPtr) + 0x464
           at Internal.Runtime.TypeLoader.TypeLoaderEnvironment.PreloadedTypeComparator.ProduceFatFunctionPointerMethodEntryPoint(IntPtr, IntPtr) + 0x2a
           at Internal.Runtime.TypeLoader.TypeLoaderEnvironment.InvokeMapEntryDataEnumerator`2.GetMethodEntryPointComponent(TDictionaryComponentType, IntPtr&) + 0xaa
           at Internal.Runtime.TypeLoader.TypeLoaderEnvironment.InvokeMapEntryDataEnumerator`2.GetMethodEntryPoint(IntPtr&, TDictionaryComponentType&, IntPtr&) + 0x87
           at Internal.Runtime.TypeLoader.TypeLoaderEnvironment.TryGetMethodInvokeMetadataFromInvokeMap(MetadataReader, RuntimeTypeHandle, MethodHandle, RuntimeTypeHandle[], MethodSignatureComparer&, CanonicalFormKind, MethodInvokeMetadata&) + 0x343
           at Internal.Runtime.TypeLoader.TypeLoaderEnvironment.TryGetMethodInvokeMetadata(RuntimeTypeHandle, QMethodDefinition, RuntimeTypeHandle[], MethodSignatureComparer&, CanonicalFormKind, MethodInvokeMetadata&) + 0x96
           at Internal.Reflection.Execution.ExecutionEnvironmentImplementation.TryGetMethodInvokeInfo(RuntimeTypeHandle, QMethodDefinition, RuntimeTypeHandle[], MethodBase, MethodSignatureComparer&, CanonicalFormKind) + 0x8d
           at Internal.Reflection.Execution.ExecutionEnvironmentImplementation.TryGetMethodInvoker(RuntimeTypeHandle, QMethodDefinition, RuntimeTypeHandle[]) + 0x116
           at Internal.Reflection.Core.Execution.ExecutionEnvironment.GetMethodInvoker(RuntimeTypeInfo, QMethodDefinition, RuntimeTypeInfo[], MemberInfo, Exception&) + 0x253
           at System.Reflection.Runtime.MethodInfos.NativeFormat.NativeFormatMethodCommon.GetUncachedMethodInvoker(RuntimeTypeInfo[], MemberInfo, Exception&) + 0xa0
           at System.Reflection.Runtime.MethodInfos.RuntimeNamedMethodInfo`1.GetUncachedMethodInvoker(RuntimeTypeInfo[], MemberInfo) + 0x4a
           at System.Reflection.Runtime.MethodInfos.RuntimeNamedMethodInfo`1.get_UncachedMethodInvoker() + 0x8a
           at System.Reflection.Runtime.MethodInfos.RuntimeMethodInfo.get_MethodInvoker() + 0x5b
           at System.Reflection.Runtime.MethodInfos.RuntimeMethodInfo.Invoke(Object, BindingFlags, Binder, Object[], CultureInfo) + 0x3f
           at System.Reflection.MethodBase.Invoke(Object, Object[]) + 0x3c
           at ThreadLocalStatics.TLSTesting.InvokerHelper(MethodInfo[], MethodInfo[], Object) + 0xa1
           at ThreadLocalStatics.TLSTesting.MakeType2(Type, Type, Boolean) + 0xb64
           at ThreadLocalStatics.TLSTesting.<>c__DisplayClass3_0.<MultiThreaded_Test>b__0() + 0x63

But the rest are in the compiler with ResourceReader. I was not able to get a repro locally either by running the compiler in a loop or by writing a targeted test that works similarly to the use of lock in ResourceReader. We create a ResourceReader that is local to the thread and never escapes so I'm not sure how someone else could get a hold of the instance and lock on it. Unfortunately I don't know much about how locking is implemented to see if we could instrument something to make it more likely to hit. Unfortunately we don't have a dump for any of these.

@mangod9
Copy link
Member

mangod9 commented Aug 2, 2024

We create a ResourceReader that is local to the thread and never escapes so I'm not sure how someone else could get a hold of the instance and lock on it.

Doesn't look like it's the case where another thread is holding it, but more likely that it's not able to find the relevant SyncEntryIndex from ObjHeader. But without a dump can't say, perhaps you can add some logging to check what the ObjHeader state is when it throws?

@VSadov
Copy link
Member

VSadov commented Aug 2, 2024

This kind of failure could happen if we try to release a thin lock that we did not acquire. Either the lock is not acquired by anyone or acquired by some other thread. The point is that the lock is not inflated into a full Lock. If it was, then we would get the fat Lock and try releasing that. If that would fail it would throw from a different location.
So we can rule out a bug in the Lock, at least.

The lock usage is typically correctly paired. Most uses would just use lock C# statement. Actually releasing a lock that was not acquired would be an unusual bug. I suspect that we see the sync block corruption. We stamped the syncblock with our thread ID after acquiring, but upon release the ID did not match.

Also possible that the lock was in fact inflated, but we could not retrieve the fat lock because the index of the corresponding fat lock got corrupted. This is less likely though. Corrupted index would result in some IndexOutOfRange failure, most likely.

So the working theory would be that someone corrupted the syncblock and messed up the owner thread ID part.

@VSadov
Copy link
Member

VSadov commented Aug 2, 2024

Some of the failures come from Wasm test runs. Is NativeAOT involved in that? Some tools are built with NativeAOT perhaps?

@VSadov VSadov self-assigned this Aug 2, 2024
@MichalStrehovsky
Copy link
Member

Some of the failures come from Wasm test runs. Is NativeAOT involved in that? Some tools are built with NativeAOT perhaps?

I'm pretty sure they are unrelated, the infra insists on pairing that with this issue even though there is no ObjectHeader.Release in the stack trace.

@VSadov
Copy link
Member

VSadov commented Aug 5, 2024

I was so far unable to reproduce the failure. Even after running the repro for hundreds of iterations.
but it is real (happens once a day or so). Maybe it is easier to figure the reason by examining involved code instead.

From what we see this is not a platform specific problem. We see it both on Linux and on Windows.
It also seems to be specific to Debug-NativeAOT, although that could be just because we run the tests this way.

It is also not a general problem with thin locks or with GC. The failures would be more widespread. These failures appear only in two very similar methods: GetValueForNameIndex or AllocateStringForNameIndex.

In both cases the methods take lock(this) and call another worker method(s). The calees have asserts that the lock is taken, so we can assume all is correct when we call the workers/helpers and we do hold the lock and right bits are set etc...
Even though there are locks, there is nothing concurrent in this scenario. We essentially allocate deserializer, and do some kind of "foreach", which will take locks on the deserializer instance. We do not pass deserializer around though, so this looks like a completely single-threaded. It should not be a case that the lock is contended/inflated or the instance gets a hash code installed. Even though that should not break anything, just useful to rule that out.

So the object header must be bad coming back from deserializing methods, but only very rarely. The methods do engage in some unsafe (as in fixed + pointers) things on some code paths.
Perhaps we have some rare buffer overrun there that may end up corrupting the syncblock of the deserializer object. If we would not try to unlock "this" we might not even know that, since the instance dies shortly after. Corrupting bits used for locking may not be noticeable if the bits are not actually used for locking.

(although - why we see this on Debug and why not on CoreCLR? Maybe also happens, or maybe there are some mitigating factors)

Looking at potential overruns and/or adding some asserts around questionable paths could be the next step in figuring this out.

This is what I know so far about this bug.

@mangod9
Copy link
Member

mangod9 commented Aug 6, 2024

adding @kouvel as well, in case there are any thoughts.

@VSadov
Copy link
Member

VSadov commented Aug 17, 2024

I think I figured the root cause for the failure. It is an unintended sideeffect of #92974

When we update object header bits we do Interlocked.CompareExchange(ref *pHeader, newBits, oldBits).
That by itself is ok, since we are doing interlocked operation via a pointer to a syncblock of a pinned object.

One interesting thing to notice though is that we pass the location by reference. - ref *pHeader.
Historically object's header is not a part of the object and its location belongs to the previous object in the heap, if such exists. Seeing a managed reference to the header (as in ref *pHeader) would be confusing to GC. Transient uses are ok though, as long as there is no chance to have an actual byref value that is at any point reportable as a GC root.

When CompareExchange is expanded as intrinsic, the cmpxchg uses the address of the location directly, thus the byref is not materialized.
Also, prior to #92974, if the intrinsic is not expanded (as in Debug), then we do make a call that formally takes a byref, but the target would just make another call to a native helper with same arguments. As a result - we did have byref pointing to the object header, but never at a location where they could be reported to GC. (or perhaps the opportunity window was a lot smaller - TBD).

#92974 made the CompareExchange self-referential relying on second level intrinsic expansion.

https://github.com/dotnet/runtime/pull/92974/files#r1720791671

Notice that in Debug the expansion is forced in the implementation of the CompareExchange, but the application code (in our case ILC) would still make calls to the actual method.
In Debug this method will be fully-interruptible, compiled with minimum optimizations, with loads and reloads of the arguments, prior the execution of the expanded intrinsic, thus there will be a byref value that is actually reportable to GC.

        [Intrinsic]
        public static int CompareExchange(ref int location1, int value, int comparand)
        {
#if TARGET_X86 || TARGET_AMD64 || TARGET_ARM64 || TARGET_RISCV64
            return CompareExchange(ref location1, value, comparand); // Must expand intrinsic
00007FF7E4DA8760 55                   push        rbp  
00007FF7E4DA8761 57                   push        rdi  
00007FF7E4DA8762 50                   push        rax  
00007FF7E4DA8763 48 8D 6C 24 10       lea         rbp,[rsp+10h]  
00007FF7E4DA8768 48 89 4D 10          mov         qword ptr [rbp+10h],rcx  
00007FF7E4DA876C 89 55 18             mov         dword ptr [rbp+18h],edx  
00007FF7E4DA876F 44 89 45 20          mov         dword ptr [rbp+20h],r8d  
00007FF7E4DA8773 48 8B 4D 10          mov         rcx,qword ptr [location1]  
00007FF7E4DA8777 8B 55 18             mov         edx,dword ptr [value]  
// when called form ObjectHeader.cs, a  GC here can cause corruption 
// of syncblock bits (or crash) if GC is compacting since RCX contains a byref to an object header
00007FF7E4DA877A 8B 45 20             mov         eax,dword ptr [comparand]   
00007FF7E4DA877D F0 0F B1 11          lock cmpxchg dword ptr [rcx],edx  
00007FF7E4DA8781 89 45 F4             mov         dword ptr [rbp-0Ch],eax  
00007FF7E4DA8784 8B 45 F4             mov         eax,dword ptr [rbp-0Ch]  
00007FF7E4DA8787 48 83 C4 08          add         rsp,8
00007FF7E4DA878B 5F                   pop         rdi  
00007FF7E4DA878C 5D                   pop         rbp  
00007FF7E4DA878D C3                   ret  

@VSadov
Copy link
Member

VSadov commented Aug 17, 2024

I think the most obvious fix for this would be introducing Interlocked.CompareExchange(int* location, int value, int comparand) and make that a JIT intrinsic similar to CompareExchange that takes a ref.

We are probably too close to shipping to do that.

Also considering that such helper would only be required for tricks like in ObjectHeader.cs. In most other cases the pattern like Interlocked.CompareExchange(ref *somePointer, newBits, oldBits) is safe to use. GC can sort out heap and nonheap byrefs. It is the byrefs that point to heap, but not to an actual object, that can cause troubles.

That is a fairly rare scenario, probably reserved only to the runtime itself - like in the case of ObjectHeader.cs

I plan to make a bit more scoped fix for 9.0 that would basically be:

  • platforms that have JIT intrinsic for CompareExchange, should always expand the intrinsic for NativeAOT (even in Debug).
  • platforms that do not have JIT intrinsic, should call an int * native helper and not have ref *pHeader anywhere on the code path.

@jkotas
Copy link
Member Author

jkotas commented Aug 17, 2024

Great catch!

I think the most obvious fix for this would be introducing Interlocked.CompareExchange(int* location, int value, int comparand) and make that a JIT intrinsic similar to CompareExchange that takes a ref. We are probably too close to shipping to do that.

I think it is still fine to introduce this intrinsic. It should be straightforward change in the JIT.

Depending on the subtle details when the code may or may not be interruptible is very fragile. It is likely going to have bug tail.

@VSadov
Copy link
Member

VSadov commented Aug 17, 2024

Note that this was found by running ILC with additional stress - basically an extra thread that periodically calls GC.Collect(0) and does some allocations (I wonder if that idea could be turned into something more generally useful).

With extra stress the issue causes crashes in ILC that can be caught and examined in debugger.
I am not 100% sure if it is responsible for the original SynchronizationLockException repro that happens once or twice a day in the lab, but since it is related to the syncblock, I think it is likely a result of the same issue.

@VSadov
Copy link
Member

VSadov commented Aug 17, 2024

Plausible scenario how this issue may cause the SynchronizationLockException

  • we are about to set lock bits in the current obj header
  • GC happens while we are in CompareExchange
  • there is a preceeding object, thus GC just assumes that our byref points to prev object.
  • GC moves the preceding object. (current obj cannot move, since it is pinned, but objects around it could)
  • our byref is adjusted according to how far the prev object was moved and now points to just past the former prev object in its new location - most likely to an obj header of an object that now trails our former prev.
  • we set the lock bits in that object`s header, which is harmless if noone else will use that object for locking
  • when time comes to unlock, we notice that our obj is actually not locked -> SynchronizationLockException

Note - if there is no preceding object, updating our byref will likely crash instead - while figuring how far the containing object has moved. And if the prev object does not move in this GC - everything will work ok - thus the repro is infrequent.

The reason why various asserts like Debug.Assert(!Monitor.IsEntered(_reader)); sprinkled around ResourceReader did not help is that we are AOT/JIT compiling our app (ILC) for Debug, but this is not a bootstrap build - we use another, toolset ILC, to compile the new one, thus we use IL of the toolset's CorLib which was C#-compiled for Release and thus no longer has asserts.

Basically - when compiled as a part of Debug build the actual ILC code would have asserts, but the CorLib portion of it would not.
This matches the experience when the end user compiles their app for Debug.

When, as the next step, we build tests using newly built Debug ILC, only at the time of releasing the ResourceReader lock we will notice if lock was not locked. (although, deceivingly, there are asserts in the code that check this earlier)

@VSadov
Copy link
Member

VSadov commented Aug 19, 2024

I think the most obvious fix for this would be introducing Interlocked.CompareExchange(int* location, int value, int comparand) and make that a JIT intrinsic similar to CompareExchange that takes a ref. We are probably too close to shipping to do that.

I think it is still fine to introduce this intrinsic. It should be straightforward change in the JIT.

I'll try introducing such intrinsic.

As a backup, I have a more scoped solution as described in #104340 (comment)
I tested that locally and it seems to work, at least for the "extra stress" repro, which sees no crashes with the fix.

But I agree, a CompareExchange(int* location, . . . approach would be more robust.

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Aug 20, 2024
@VSadov
Copy link
Member

VSadov commented Aug 20, 2024

I think it is still fine to introduce this intrinsic. It should be straightforward change in the JIT.

It looks like nothing needs to be changed in JIT. At least I could not find what could be changed.
For CompareExchange intrinsic everything seems to be driven by the return type, which is the same int. And GC-liveness of the location argument depends on the type of the argument.

As a result, once the pointer-taking CompareExchange overload is introduced JIT emits same expansion for it as for byref-based, modulo GC tracking of the ref, - exactly what we want. That is without any changes to the JIT.
And that is basically what we want.

@AndyAyersMS
Copy link
Member

Hit this again in #107811

@VSadov
Copy link
Member

VSadov commented Sep 14, 2024

The ILC failure cannot go away until toolset is updated and picks up an ILC that is not affected by a bug.

I guess it has not been updated yet? @MichalStrehovsky

@AndyAyersMS
Copy link
Member

Ah, could be another one of those "we're waiting for RC1 to become the build toolset" things...

@mangod9
Copy link
Member

mangod9 commented Sep 14, 2024

yeah believe it should be updated next week now that RC1 is released.

@jkotas
Copy link
Member Author

jkotas commented Sep 14, 2024

This is fixed in RC2. RC1 does not have the fix for this one.

@mangod9
Copy link
Member

mangod9 commented Sep 14, 2024

oh yes, then not for another few weeks :( .

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Threading in-pr There is an active PR which will close this issue when it is merged Known Build Error Use this to report build issues in the .NET Helix tab
Projects
Archived in project
8 participants