
Infinite spin lock in Encoding.GetEncoding #33383

Closed
filipnavara opened this issue Mar 9, 2020 · 13 comments
Labels
area-System.Threading · tenet-reliability (Reliability/stability related issue: stress, load problems, etc.) · untriaged (New issue has not been triaged by the area owner)

Comments

@filipnavara
Member

We've encountered a weird deadlock/livelock on .NET Core 3.1 and captured a memory dump of it. The dump may contain sensitive information, so I am not comfortable sharing it publicly, but I will be happy to share it privately or extract specific information from it.

Three threads are racing on the same spin lock, which is marked as locked, but there doesn't seem to be any thread actually holding it.

Stack traces for the threads look like this:

Thread 18060:

 	System.Private.CoreLib.dll!System.Threading.ReaderWriterLockSlim.SpinLock.EnterSpin(System.Threading.ReaderWriterLockSlim.EnterSpinLockReason reason) Line 1606	C#
 	System.Private.CoreLib.dll!System.Threading.ReaderWriterLockSlim.TryEnterUpgradeableReadLockCore(System.Threading.ReaderWriterLockSlim.TimeoutTracker timeout) Line 681	C#
 	System.Text.Encoding.CodePages.dll!System.Text.EncodingTable.GetCodePageFromName(string name) Line 26	C#
 	System.Text.Encoding.CodePages.dll!System.Text.CodePagesEncodingProvider.GetEncoding(string name) Line 87	C#
 	System.Private.CoreLib.dll!System.Text.EncodingProvider.GetEncodingFromProvider(string encodingName) Line 94	C#
 	System.Private.CoreLib.dll!System.Text.Encoding.GetEncoding(string name) Line 315	C#
 	MailClient.Mail.dll!MailClient.Mail.MailHeaderDictionary.MailHeaderDictionary(System.IO.Stream stream)	Unknown

looping with spinIndex = 556840848.

Thread 22080:

>	System.Private.CoreLib.dll!System.Threading.ReaderWriterLockSlim.SpinLock.EnterSpin(System.Threading.ReaderWriterLockSlim.EnterSpinLockReason reason) Line 1608	C#
 	System.Private.CoreLib.dll!System.Threading.ReaderWriterLockSlim.TryEnterUpgradeableReadLockCore(System.Threading.ReaderWriterLockSlim.TimeoutTracker timeout) Line 681	C#
 	System.Text.Encoding.CodePages.dll!System.Text.EncodingTable.GetCodePageFromName(string name) Line 26	C#
 	System.Text.Encoding.CodePages.dll!System.Text.CodePagesEncodingProvider.GetEncoding(string name) Line 87	C#
 	System.Private.CoreLib.dll!System.Text.EncodingProvider.GetEncodingFromProvider(string encodingName) Line 94	C#
 	System.Net.Mail.dll!System.Net.Mail.MailAddress.MailAddress(string address, string displayName, System.Text.Encoding displayNameEncoding) Line 76	C#
 	System.Net.Mail.dll!System.Net.Mail.MailAddress.MailAddress(string address) Line 46	C#

looping with spinIndex = 0.

Thread 23232:

 	System.Private.CoreLib.dll!System.Threading.ReaderWriterLockSlim.SpinLock.EnterSpin(System.Threading.ReaderWriterLockSlim.EnterSpinLockReason reason) Line 1606	C#
 	System.Private.CoreLib.dll!System.Threading.ReaderWriterLockSlim.TryEnterUpgradeableReadLockCore(System.Threading.ReaderWriterLockSlim.TimeoutTracker timeout) Line 681	C#
 	System.Text.Encoding.CodePages.dll!System.Text.EncodingTable.GetCodePageFromName(string name) Line 26	C#
 	System.Text.Encoding.CodePages.dll!System.Text.CodePagesEncodingProvider.GetEncoding(string name) Line 87	C#
 	System.Private.CoreLib.dll!System.Text.EncodingProvider.GetEncodingFromProvider(string encodingName) Line 94	C#
>	System.Private.CoreLib.dll!System.Text.Encoding.GetEncoding(string name) Line 315	C#
 	System.Net.Http.dll!System.Net.Http.HttpContent.ReadBufferAsString(System.ArraySegment<byte> buffer, System.Net.Http.Headers.HttpContentHeaders headers)	Unknown
 	System.Net.Http.dll!System.Net.Http.HttpContent.ReadBufferedContentAsString()	Unknown
 	System.Net.Http.dll!System.Net.Http.HttpContent.ReadAsStringAsync.AnonymousMethod__36_0(System.Net.Http.HttpContent s)	Unknown
 	System.Net.Http.dll!System.Net.Http.HttpContent.WaitAndReturnAsync<System.Net.Http.HttpContent, string>(System.Threading.Tasks.Task waitTask, System.Net.Http.HttpContent state, System.Func<System.Net.Http.HttpContent, string> returnFunc)	Unknown
 	System.Net.Http.dll!System.Net.Http.HttpContent.ReadAsStringAsync()	Unknown

looping with spinIndex = 552403664.

EncodingTable.s_cacheLock object looks like this:

Name                         Value        Type
CurrentReadCount             0            int
HasNoWaiters                 true         bool
IsReadLockHeld               false        bool
IsUpgradeableReadLockHeld    false        bool
IsWriteLockHeld              false        bool
RecursionPolicy              NoRecursion  System.Threading.LockRecursionPolicy
RecursiveReadCount           0            int
RecursiveUpgradeCount        0            int
RecursiveWriteCount          0            int
WaitingReadCount             0            int
WaitingUpgradeCount          0            int
WaitingWriteCount            0            int
_fDisposed                   false        bool
_fIsReentrant                false        bool
_fUpgradeThreadHoldingRead   false        bool
_lockID                      2            long
_numReadWaiters              0            uint
_numUpgradeWaiters           0            uint
_numWriteUpgradeWaiters      0            uint
_numWriteWaiters             0            uint
_owners                      0            uint
_readEvent                   null         System.Threading.EventWaitHandle
_spinLock                    {System.Threading.ReaderWriterLockSlim.SpinLock}  System.Threading.ReaderWriterLockSlim.SpinLock
_upgradeEvent                null         System.Threading.EventWaitHandle
_upgradeLockOwnerId          -1           int
_waitUpgradeEvent            null         System.Threading.EventWaitHandle
_waiterStates                NoWaiters    System.Threading.ReaderWriterLockSlim.WaiterStates
_writeEvent                  null         System.Threading.EventWaitHandle
_writeLockOwnerId            -1           int

It seems that the lock object is not held by anything, yet three threads are spinning on it anyway.

For completeness, this is how the SpinLock object looks:

Name                                      Value  Type
EnterForEnterAnyReadDeprioritizedCount    0      ushort
EnterForEnterAnyWriteDeprioritizedCount   0      ushort
_enterDeprioritizationState               0      int
_isLocked                                 1      int

@Dotnet-GitSync-Bot
Collaborator

I couldn't add an area label to this Issue.

Check out this page to find out which area owner to ping, or please add exactly one area label to help train me in the future.

@Dotnet-GitSync-Bot added the untriaged label Mar 9, 2020
@filipnavara
Member Author

/cc @kouvel (not sure if the spinning in SpinLock is intended to go this high)

@jkotas added the area-System.Threading and tenet-reliability labels Mar 9, 2020
@filipnavara
Member Author

filipnavara commented Mar 9, 2020

I cannot really confirm it yet, but I am starting to suspect that thread 22080 actually got past the TryEnter and is the one holding the SpinLock. At that point it probably lost its time quantum, and the two other spinning threads are causing some abnormal scheduling behavior that never lets it run again. There are around 100 threads in total when the application reaches this point, so the other threads could be contributing too.

We have a machine where this issue seems to be triggered a couple of times a day, but not reliably. It wasn't triggered by simply running Encoding.GetEncoding in parallel on multiple threads.
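
For reference, a repro attempt of that kind might look roughly like the following (a minimal sketch; the encoding names and iteration count are placeholders, not what the application actually uses):

```csharp
using System.Text;
using System.Threading.Tasks;

class GetEncodingStress
{
    static void Main()
    {
        // Route name lookups through System.Text.Encoding.CodePages, as in
        // the stack traces above, so EncodingTable.GetCodePageFromName and
        // its ReaderWriterLockSlim are exercised.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Hammer Encoding.GetEncoding from many threads; this alone did not
        // reproduce the hang.
        Parallel.For(0, 1_000_000, _ =>
        {
            Encoding.GetEncoding("windows-1250");
            Encoding.GetEncoding("iso-8859-2");
        });
    }
}
```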

Update: The theory about thread 22080 is probably bogus. The disassembly shows the following:

3119DE4C  cmp         dword ptr [esi],0   << EIP points here
3119DE4F  jne         3119DE78  
3119DE51  mov         edx,1  
3119DE56  xor         eax,eax  
3119DE58  lock cmpxchg dword ptr [esi],edx  

which comes from this C# code line:

if (_isLocked == 0 && TryEnter())

Hence, it didn't get past the TryEnter.
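
For context, the test-and-test-and-set pattern this maps to looks roughly like the following (a simplified sketch of the spin-lock shape, not the actual CoreLib source):

```csharp
using System.Threading;

// Simplified sketch of the spin-lock acquire path; names mirror the
// ReaderWriterLockSlim.SpinLock internals above, but details such as
// back-off and deprioritization are omitted.
internal struct SimpleSpinLock
{
    private int _isLocked; // 0 = free, 1 = held

    private bool TryEnter()
    {
        // The "lock cmpxchg dword ptr [esi],edx" above: atomically set
        // _isLocked to 1 only if it is currently 0.
        return Interlocked.CompareExchange(ref _isLocked, 1, 0) == 0;
    }

    public void Enter()
    {
        // The "cmp dword ptr [esi],0 / jne" is the cheap _isLocked == 0
        // pre-check before attempting the interlocked operation.
        while (!(_isLocked == 0 && TryEnter()))
        {
            Thread.SpinWait(1); // the real code spins/yields/sleeps with back-off
        }
    }

    public void Exit()
    {
        Volatile.Write(ref _isLocked, 0);
    }
}
```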

@kouvel
Member

kouvel commented Mar 9, 2020

ReaderWriterLockSlim is not hardened against thread aborts, or this might be related to #8710. It would be difficult to tell from a crash dump what happened; based on the state above, it looks like any info about the owner thread would already have been cleared. Are there any thread aborts, OOMs, or reentrancy happening in the scenario?

@filipnavara
Member Author

No thread aborts in the whole app; .NET Core doesn't offer any way to trigger them explicitly anyway. No reentrancy as far as I can tell. The only OutOfMemoryException in the dump is the one pre-allocated at startup, and it has no fields filled in. The one run where we captured it under a debugger didn't show one either.

@filipnavara
Member Author

As far as reentrancy goes, I cannot completely rule it out on the UI thread. It's not happening at the particular moment the dump was captured, and there are no references to the ReaderWriterLockSlim on threads other than the spinning ones. However, some reentrancy could have happened earlier in the run on the UI thread. I'd have to look into the implications and whether it could possibly result in a scenario where the spin lock looks locked.

@kouvel
Member

kouvel commented Mar 9, 2020

At the moment I don't see how it would get into that state due to reentrancy, as the spin-lock is released before any wait operation that could result in reentrancy.

We have a machine where this issue seems to be triggered a couple of times a day, but not reliably.

Since it appears to be happening fairly frequently, an option might be to add some logging to the enter/exit paths of the spin-waits, recording the managed thread ID, and on a failure look for an enter without a matching exit.

It looks like the implementation hasn't changed significantly since 2.1 based on the history. Is the scenario new, or has there been enough coverage on 2.x to indicate that this is a new problem in 3.x?

@filipnavara
Member Author

We migrated our application from .NET Framework, where the Encoding code paths were significantly different from .NET Core. We went straight to 3.0/3.1 since it's a desktop app, which wasn't supported earlier. Unfortunately I don't have any data to show whether this problem was already present earlier or not.

I can try to add some custom logging, but it will require building a custom runtime, so it will definitely take me a couple of days (especially building 3.1 from source).

@kouvel
Member

kouvel commented Mar 9, 2020

I think logging would at least provide more info (provided a repro is found with logs), as I don't see a way to make progress with the current info. What I have done before in a somewhat similar case is to sync to the exact commit in the dotnet/coreclr repo that matches the commit hash of the coreclr.dll from the runtime being used, and then (using an xcopy-able runtime) update only System.Private.CoreLib.dll with additional logging. If using Console, it may be necessary to reflect into it for logging, or do in-memory logging and dump only the interesting logs at some point.
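
One possible shape for that in-memory logging (a hypothetical sketch; the type, buffer size, and what gets recorded are all made up for illustration):

```csharp
using System;
using System.Threading;

// Hypothetical ring buffer for enter/exit events, so logging doesn't perturb
// timing the way Console output would. The buffer can be inspected later from
// a dump, or flushed when a missing exit is detected.
internal static class SpinLockLog
{
    private struct Entry
    {
        public int ThreadId;
        public long Timestamp;
        public string Op; // e.g. "Enter", "Exit"
    }

    private const int Capacity = 1 << 16; // power of two for cheap wrap-around
    private static readonly Entry[] s_entries = new Entry[Capacity];
    private static int s_next = -1;

    public static void Record(string op)
    {
        int index = Interlocked.Increment(ref s_next) & (Capacity - 1);
        s_entries[index] = new Entry
        {
            ThreadId = Environment.CurrentManagedThreadId,
            Timestamp = Environment.TickCount64,
            Op = op
        };
    }
}
```

Calls like SpinLockLog.Record("Enter") / SpinLockLog.Record("Exit") would then sit next to the spin-lock enter/exit paths in the patched System.Private.CoreLib.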

@filipnavara
Member Author

I'm thinking about changing ReaderWriterLockSlim.SpinLock._isLocked to actually track the thread id that locked it. That way I should be able to preserve most of the logic and have the most up-to-date information. I'll give it a try tomorrow.
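
Roughly what I have in mind (a sketch only; the real SpinLock also carries deprioritization state and back-off that are left out here):

```csharp
using System;
using System.Threading;

// Sketch of a spin lock whose lock word stores the managed thread ID of the
// owner instead of just 0/1, so a dump shows which thread last acquired it.
internal struct OwnerTrackingSpinLock
{
    private int _ownerThreadId; // 0 = free, otherwise the owner's managed thread ID

    public bool TryEnter()
    {
        int me = Environment.CurrentManagedThreadId; // always >= 1, so 0 is safe as "free"
        return Interlocked.CompareExchange(ref _ownerThreadId, me, 0) == 0;
    }

    public void Exit()
    {
        // Could also assert _ownerThreadId == Environment.CurrentManagedThreadId here.
        Volatile.Write(ref _ownerThreadId, 0);
    }
}
```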

Thanks for the idea of just replacing CoreLib. I've messed that up before because I need to match the correct build configuration to the runtime (release/debug/checked), since they are not interchangeable. Hopefully I won't mess it up this time.

@kouvel
Member

kouvel commented Mar 9, 2020

I'm thinking about changing ReaderWriterLockSlim.SpinLock._isLocked to actually track the thread id that locked it.

That would hopefully point to the thread, but by the time of failure it would be too late and the thread's stack would be long past the issue. It would definitely be more info; it would be interesting to see stack traces for an enter without a matching exit, although that may make it more difficult to repro if it's a timing issue. I'm sure there would be ways to progress from there, like adding try/finally's to track when there is a missing exit and logging the stack trace when that happens.

@kouvel
Member

kouvel commented Mar 9, 2020

Or maybe try/catch-all+rethrow/finallys to ensure that the spin lock is not held by the current thread by the end of any entry point, with logs in the catch-all and finally. That way there should hopefully be no way for the logs to miss a more actionable cause for the issue.
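
As a sketch of that shape (hypothetical helper names; the real entry points are the ReaderWriterLockSlim public methods, and the held-check would have to reach into its internals):

```csharp
using System;
using System.Diagnostics;

// Hypothetical guard around a lock entry point: log (with a stack trace) if
// the internal spin lock is still held by the current thread when the entry
// point throws or returns.
internal static class LockDiagnostics
{
    public static void GuardEntryPoint(Action entryPoint, Func<bool> spinLockHeldByCurrentThread)
    {
        try
        {
            entryPoint();
        }
        catch (Exception ex)
        {
            if (spinLockHeldByCurrentThread())
                Debug.WriteLine($"Spin lock leaked on exception: {ex}\n{Environment.StackTrace}");
            throw;
        }
        finally
        {
            if (spinLockHeldByCurrentThread())
                Debug.WriteLine($"Spin lock still held at exit:\n{Environment.StackTrace}");
        }
    }
}
```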

@filipnavara
Member Author

We were unable to reproduce the issue again, even on the same machine. Going to close this for now. Thanks for the help and ideas!

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020