Description
Since upgrading our application to .NET 9, we are seeing processes lock up and hang indefinitely (stuck for several hours before being killed manually). Analysis of the memory dumps suggests a garbage collection issue.
We have collected many memory dumps and can probably share one from our dev environment privately if needed.
WinDbg output:
0:000> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
*** WARNING: Unable to verify checksum for Datascape.exe
KEY_VALUES_STRING: 1
Key : Analysis.CPU.mSec
Value: 3625
Key : Analysis.Elapsed.mSec
Value: 76019
Key : Analysis.IO.Other.Mb
Value: 3
Key : Analysis.IO.Read.Mb
Value: 1
Key : Analysis.IO.Write.Mb
Value: 13
Key : Analysis.Init.CPU.mSec
Value: 437
Key : Analysis.Init.Elapsed.mSec
Value: 6136
Key : Analysis.Memory.CommitPeak.Mb
Value: 233
Key : Analysis.Version.DbgEng
Value: 10.0.27725.1000
Key : Analysis.Version.Description
Value: 10.2408.27.01 amd64fre
Key : Analysis.Version.Ext
Value: 1.2408.27.1
Key : CLR.Engine
Value: CORECLR
Key : CLR.Version
Value: 9.0.24.52809
Key : Failure.Bucket
Value: BREAKPOINT_80000003_coreclr.dll!SVR::gc_heap::wait_for_gc_done
Key : Failure.Hash
Value: {e9e7019b-8511-725f-3866-9e920f973aa5}
Key : Failure.Source.FileLine
Value: 14738
Key : Failure.Source.FilePath
Value: D:\a\_work\1\s\src\coreclr\gc\gc.cpp
Key : Failure.Source.SourceServerCommand
Value: raw.githubusercontent.com/dotnet/runtime/9d5a6a9aa463d6d10b0b0ba6d5982cc82f363dc3/src/coreclr/gc/gc.cpp
Key : Timeline.OS.Boot.DeltaSec
Value: 886293
Key : Timeline.Process.Start.DeltaSec
Value: 461511
Key : WER.OS.Branch
Value: fe_release
Key : WER.OS.Version
Value: 10.0.20348.1
Key : WER.Process.Version
Value: 2.7.122.1440
FILE_IN_CAB: Datascape (2).DMP
NTGLOBALFLAG: 0
APPLICATION_VERIFIER_FLAGS: 0
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 0000000000000000
ExceptionCode: 80000003 (Break instruction exception)
ExceptionFlags: 00000000
NumberParameters: 0
FAULTING_THREAD: 000023b4
PROCESS_NAME: Datascape.dll
ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION} Breakpoint A breakpoint has been reached.
EXCEPTION_CODE_STR: 80000003
STACK_TEXT:
00000066`f4d7b438 00007ffd`d01ada4e : 00000000`00000000 00000000`00001528 00000000`00000030 00000000`00000018 : ntdll!NtWaitForSingleObject+0x14
00000066`f4d7b440 00007ffd`a39d0991 : 00000000`00000000 00000000`00000018 00000260`00000000 00000000`000000a8 : KERNELBASE!WaitForSingleObjectEx+0x8e
00000066`f4d7b4e0 00007ffd`a3912632 : 00000000`00000001 00000000`00000000 00000000`00000000 00007ffd`00000000 : coreclr!SVR::gc_heap::wait_for_gc_done+0x5d
00000066`f4d7b510 00007ffd`a39f8819 : 00000260`a4ebf118 00000260`a4e21728 00000260`a4ebf118 00000000`00000040 : coreclr!SVR::GCHeap::GarbageCollectGeneration+0xee
00000066`f4d7b560 00007ffd`a3a34d96 : 00000000`00000000 00000260`a4ebf118 00000000`00000040 00000260`a4e21728 : coreclr!SVR::gc_heap::trigger_gc_for_alloc+0x15
00000066`f4d7b590 00007ffd`a3a34c7e : 00000000`00000000 00000000`00000000 00000000`00000000 00000260`e851e288 : coreclr!SVR::gc_heap::try_allocate_more_space+0x656de
00000066`f4d7b600 00007ffd`a3920417 : 00000000`00000002 00000000`00000040 00000260`a4e21728 00000260`f1e34b28 : coreclr!SVR::gc_heap::allocate_more_space+0x65772
00000066`f4d7b660 00007ffd`a3948a71 : 00000000`00000002 00007ffd`44f31a4e 00000260`a4e21728 00000260`c5be9e68 : coreclr!SVR::GCHeap::Alloc+0xb7
00000066`f4d7b6c0 00007ffd`a39488cd : 00000260`e854bb10 00000000`00000000 00000066`f4d7b970 00000260`e854c500 : coreclr!AllocateObject+0x101
00000066`f4d7b750 00007ffd`4ffdbe60 : 00007ffd`50236ce8 00000260`e854c758 00000260`e854c568 00000000`00000000 : coreclr!JIT_New+0xdd
00000066`f4d7b8b0 00007ffd`50236ce8 : 00000260`e854c758 00000260`e854c568 00000000`00000000 00000066`f4d7b8b0 : EntityFramework!System.Data.Entity.Core.Mapping.ViewGeneration.Validation.ForeignConstraint.CheckIfConstraintMappedToForeignKeyAssociation+0x580
00000066`f4d7b8b8 00000260`e854c758 : 00000260`e854c568 00000000`00000000 00000066`f4d7b8b0 00007ffd`4f85db7e : 0x00007ffd`50236ce8
00000066`f4d7b8c0 00000260`e854c568 : 00000000`00000000 00000066`f4d7b8b0 00007ffd`4f85db7e 00000000`00000000 : 0x00000260`e854c758
00000066`f4d7b8c8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00000260`e854c568
STACK_COMMAND: ~0s; .ecxr ; kb
FAULTING_SOURCE_LINE: D:\a\_work\1\s\src\coreclr\gc\gc.cpp
FAULTING_SOURCE_FILE: D:\a\_work\1\s\src\coreclr\gc\gc.cpp
FAULTING_SOURCE_LINE_NUMBER: 14738
FAULTING_SOURCE_SRV_COMMAND: https://raw.githubusercontent.com/dotnet/runtime/9d5a6a9aa463d6d10b0b0ba6d5982cc82f363dc3/src/coreclr/gc/gc.cpp
FAULTING_SOURCE_CODE:
No source found for 'D:\a\_work\1\s\src\coreclr\gc\windows\gcenv.windows.cpp'
SYMBOL_NAME: coreclr!SVR::gc_heap::wait_for_gc_done+5d
MODULE_NAME: coreclr
IMAGE_NAME: coreclr.dll
FAILURE_BUCKET_ID: BREAKPOINT_80000003_coreclr.dll!SVR::gc_heap::wait_for_gc_done
OS_VERSION: 10.0.20348.1
BUILDLAB_STR: fe_release
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
IMAGE_VERSION: 9.0.24.52809
FAILURE_ID_HASH: {e9e7019b-8511-725f-3866-9e920f973aa5}
Followup: MachineOwner
---------
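With many dumps collected, it can help to tally where each one is stuck. The following is a minimal sketch, assuming the `!analyze -v` text of each dump has been saved as a string; the function names and the set of modules treated as "runtime" are illustrative choices, not part of any tool. It pulls out the `FAILURE_BUCKET_ID` and the first stack frame outside the OS and runtime modules.

```python
import re
from collections import Counter

# Matches the "FAILURE_BUCKET_ID: <value>" line of `!analyze -v` output.
BUCKET_RE = re.compile(r"^FAILURE_BUCKET_ID:\s*(\S+)", re.MULTILINE)
# Matches the trailing "module!symbol+0xNN" part of a STACK_TEXT line.
FRAME_RE = re.compile(r":\s(\w+)!(\S+?)\+0x[0-9a-fA-F]+\s*$", re.MULTILINE)

# Assumption: frames from these modules are treated as OS/runtime plumbing.
RUNTIME_MODULES = {"ntdll", "kernelbase", "coreclr"}


def summarize(analyze_text):
    """Return (failure bucket, first non-runtime frame) for one dump's output."""
    bucket = BUCKET_RE.search(analyze_text)
    caller = next(
        (f"{module}!{symbol}"
         for module, symbol in FRAME_RE.findall(analyze_text)
         if module.lower() not in RUNTIME_MODULES),
        None,
    )
    return (bucket.group(1) if bucket else None, caller)


def tally(analyze_texts):
    """Count how often each (bucket, caller) pair occurs across many dumps."""
    return Counter(summarize(text) for text in analyze_texts)
```

For the output above, `summarize` would report the `wait_for_gc_done` bucket together with the `EntityFramework!...CheckIfConstraintMappedToForeignKeyAssociation` frame as the allocating caller.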
Reproduction Steps
We have not been able to reproduce this on demand, but we are seeing it on a nightly basis.
Expected behavior
The application does not hang.
Actual behavior
The application hangs indefinitely.
Regression?
Seems to have appeared since upgrading to .NET 9.
Known Workarounds
None
Configuration
Some background on our setup:
Application Servers (Windows x64)
- Many instances (~40) of the app run as Windows services
- The following GC environment variables are set:
  - DOTNET_GCConserveMemory=6
  - DOTNET_gcServer=1
  - DOTNET_gcTrimCommitOnLowMemory=1
  - DOTNET_GCDynamicAdaptationMode=1
- Note: DATAS made a fantastic improvement for us in .NET 8, and we have been pretty happy with how GC has been working with this config
- We have not had a single instance of the issue on these servers
Processing Servers (Windows x64)
- Run the same app as a command-line process to do some short processing (20-40 mins) overnight
- Up to 5 instances running at a time
- No GC environment variables set (we have never needed to worry about GC here, as these are short-lived processes)
- Most nights we see 1-5 processes hang indefinitely
Across the many memory dumps we have collected, the hang is often in the same area of code (Entity Framework), which appears to be doing quite a lot of allocations.
We have tried applying the same GC settings to the processing servers, with mixed results: no hangs in our dev environment after 2 days, but 2 hangs in production on the first day after applying the settings.
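For anyone wanting to try the same experiment, here is a minimal sketch of applying the application-server GC settings to a single command-line run by setting them in the child process environment. The launcher functions and the timeout are hypothetical and not part of our actual setup. The timeout only stops a hung run from sitting for hours; it kills the process, so a dump would need to be captured some other way first.

```python
import os
import subprocess

# GC settings copied from the application-server configuration listed above.
GC_SETTINGS = {
    "DOTNET_GCConserveMemory": "6",
    "DOTNET_gcServer": "1",
    "DOTNET_gcTrimCommitOnLowMemory": "1",
    "DOTNET_GCDynamicAdaptationMode": "1",
}


def build_gc_env(base=None):
    """Return a copy of the environment with the GC settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(GC_SETTINGS)
    return env


def run_processing_job(cmd, timeout_sec):
    """Run cmd with the GC settings applied.

    Returns the exit code, or None if the process exceeded timeout_sec
    (subprocess.run kills the child before raising TimeoutExpired).
    """
    try:
        completed = subprocess.run(cmd, env=build_gc_env(), timeout=timeout_sec)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

A run that normally takes 20-40 minutes could be launched as `run_processing_job([...], timeout_sec=2 * 60 * 60)`, where the command list and the two-hour limit are placeholders.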
Other information
Possibly related:
#110350
#105780
Apologies if this is a duplicate of either of those; it is a bit hard for me to tell, so I figured a separate issue report might be best.