Garbage Collection hang since upgrading to .Net 9 #112203

@scotttho-datacom

Description

Since upgrading our application to .NET 9, we are seeing processes lock up and hang indefinitely (stuck for several hours until they are killed manually). Analysis of the memory dumps suggests a garbage collection issue.

We have collected many memory dumps and can probably share one from our dev environment privately if needed.

WinDbg output:

0:000> !analyze -v
*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

*** WARNING: Unable to verify checksum for Datascape.exe

KEY_VALUES_STRING: 1

    Key  : Analysis.CPU.mSec
    Value: 3625

    Key  : Analysis.Elapsed.mSec
    Value: 76019

    Key  : Analysis.IO.Other.Mb
    Value: 3

    Key  : Analysis.IO.Read.Mb
    Value: 1

    Key  : Analysis.IO.Write.Mb
    Value: 13

    Key  : Analysis.Init.CPU.mSec
    Value: 437

    Key  : Analysis.Init.Elapsed.mSec
    Value: 6136

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 233

    Key  : Analysis.Version.DbgEng
    Value: 10.0.27725.1000

    Key  : Analysis.Version.Description
    Value: 10.2408.27.01 amd64fre

    Key  : Analysis.Version.Ext
    Value: 1.2408.27.1

    Key  : CLR.Engine
    Value: CORECLR

    Key  : CLR.Version
    Value: 9.0.24.52809

    Key  : Failure.Bucket
    Value: BREAKPOINT_80000003_coreclr.dll!SVR::gc_heap::wait_for_gc_done

    Key  : Failure.Hash
    Value: {e9e7019b-8511-725f-3866-9e920f973aa5}

    Key  : Failure.Source.FileLine
    Value: 14738

    Key  : Failure.Source.FilePath
    Value: D:\a\_work\1\s\src\coreclr\gc\gc.cpp

    Key  : Failure.Source.SourceServerCommand
    Value: raw.githubusercontent.com/dotnet/runtime/9d5a6a9aa463d6d10b0b0ba6d5982cc82f363dc3/src/coreclr/gc/gc.cpp

    Key  : Timeline.OS.Boot.DeltaSec
    Value: 886293

    Key  : Timeline.Process.Start.DeltaSec
    Value: 461511

    Key  : WER.OS.Branch
    Value: fe_release

    Key  : WER.OS.Version
    Value: 10.0.20348.1

    Key  : WER.Process.Version
    Value: 2.7.122.1440


FILE_IN_CAB:  Datascape (2).DMP

NTGLOBALFLAG:  0

APPLICATION_VERIFIER_FLAGS:  0

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 0000000000000000
   ExceptionCode: 80000003 (Break instruction exception)
  ExceptionFlags: 00000000
NumberParameters: 0

FAULTING_THREAD:  000023b4

PROCESS_NAME:  Datascape.dll

ERROR_CODE: (NTSTATUS) 0x80000003 - {EXCEPTION}  Breakpoint  A breakpoint has been reached.

EXCEPTION_CODE_STR:  80000003

STACK_TEXT:  
00000066`f4d7b438 00007ffd`d01ada4e     : 00000000`00000000 00000000`00001528 00000000`00000030 00000000`00000018 : ntdll!NtWaitForSingleObject+0x14
00000066`f4d7b440 00007ffd`a39d0991     : 00000000`00000000 00000000`00000018 00000260`00000000 00000000`000000a8 : KERNELBASE!WaitForSingleObjectEx+0x8e
00000066`f4d7b4e0 00007ffd`a3912632     : 00000000`00000001 00000000`00000000 00000000`00000000 00007ffd`00000000 : coreclr!SVR::gc_heap::wait_for_gc_done+0x5d
00000066`f4d7b510 00007ffd`a39f8819     : 00000260`a4ebf118 00000260`a4e21728 00000260`a4ebf118 00000000`00000040 : coreclr!SVR::GCHeap::GarbageCollectGeneration+0xee
00000066`f4d7b560 00007ffd`a3a34d96     : 00000000`00000000 00000260`a4ebf118 00000000`00000040 00000260`a4e21728 : coreclr!SVR::gc_heap::trigger_gc_for_alloc+0x15
00000066`f4d7b590 00007ffd`a3a34c7e     : 00000000`00000000 00000000`00000000 00000000`00000000 00000260`e851e288 : coreclr!SVR::gc_heap::try_allocate_more_space+0x656de
00000066`f4d7b600 00007ffd`a3920417     : 00000000`00000002 00000000`00000040 00000260`a4e21728 00000260`f1e34b28 : coreclr!SVR::gc_heap::allocate_more_space+0x65772
00000066`f4d7b660 00007ffd`a3948a71     : 00000000`00000002 00007ffd`44f31a4e 00000260`a4e21728 00000260`c5be9e68 : coreclr!SVR::GCHeap::Alloc+0xb7
00000066`f4d7b6c0 00007ffd`a39488cd     : 00000260`e854bb10 00000000`00000000 00000066`f4d7b970 00000260`e854c500 : coreclr!AllocateObject+0x101
00000066`f4d7b750 00007ffd`4ffdbe60     : 00007ffd`50236ce8 00000260`e854c758 00000260`e854c568 00000000`00000000 : coreclr!JIT_New+0xdd
00000066`f4d7b8b0 00007ffd`50236ce8     : 00000260`e854c758 00000260`e854c568 00000000`00000000 00000066`f4d7b8b0 : EntityFramework!System.Data.Entity.Core.Mapping.ViewGeneration.Validation.ForeignConstraint.CheckIfConstraintMappedToForeignKeyAssociation+0x580
00000066`f4d7b8b8 00000260`e854c758     : 00000260`e854c568 00000000`00000000 00000066`f4d7b8b0 00007ffd`4f85db7e : 0x00007ffd`50236ce8
00000066`f4d7b8c0 00000260`e854c568     : 00000000`00000000 00000066`f4d7b8b0 00007ffd`4f85db7e 00000000`00000000 : 0x00000260`e854c758
00000066`f4d7b8c8 00000000`00000000     : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00000260`e854c568


STACK_COMMAND:  ~0s; .ecxr ; kb

FAULTING_SOURCE_LINE:  D:\a\_work\1\s\src\coreclr\gc\gc.cpp

FAULTING_SOURCE_FILE:  D:\a\_work\1\s\src\coreclr\gc\gc.cpp

FAULTING_SOURCE_LINE_NUMBER:  14738

FAULTING_SOURCE_SRV_COMMAND:  https://raw.githubusercontent.com/dotnet/runtime/9d5a6a9aa463d6d10b0b0ba6d5982cc82f363dc3/src/coreclr/gc/gc.cpp

FAULTING_SOURCE_CODE:  
No source found for 'D:\a\_work\1\s\src\coreclr\gc\windows\gcenv.windows.cpp'


SYMBOL_NAME:  coreclr!SVR::gc_heap::wait_for_gc_done+5d

MODULE_NAME: coreclr

IMAGE_NAME:  coreclr.dll

FAILURE_BUCKET_ID:  BREAKPOINT_80000003_coreclr.dll!SVR::gc_heap::wait_for_gc_done

OS_VERSION:  10.0.20348.1

BUILDLAB_STR:  fe_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

IMAGE_VERSION:  9.0.24.52809

FAILURE_ID_HASH:  {e9e7019b-8511-725f-3866-9e920f973aa5}

Followup:     MachineOwner
---------

stacks.txt

Reproduction Steps

We have not been able to reproduce this on demand, but we are seeing it on a nightly basis.

Expected behavior

Application not to hang

Actual behavior

Application hangs indefinitely

Regression?

Seems to have appeared since upgrading to .NET 9.

Known Workarounds

None

Configuration

Some background on our setup:
Application Servers (Windows x64)

  • Many instances (~40) of the app run as Windows services
  • The following GC environment variables are set:
    • DOTNET_GCConserveMemory=6
    • DOTNET_gcServer=1
    • DOTNET_gcTrimCommitOnLowMemory=1
    • DOTNET_GCDynamicAdaptationMode=1
  • Note that DATAS was a fantastic improvement for us in .NET 8, and we have been pretty happy with how the GC has been working with this config
  • We have not had a single instance of the issue on these servers

Processing Servers (Windows x64)

  • Run the same app as a command-line process to do some short processing (20-40 mins) overnight
  • Up to 5 instances running at a time
  • No GC environment variables set (we have never needed to worry about GC here, as these are short-lived processes)
  • Most nights we see 1-5 processes hang indefinitely

We have collected many memory dumps, and the hang is often in the same area of code (Entity Framework), which looks like it is doing quite a lot of allocations.
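To back up the "same area of code" observation across many dumps, the stacks could be bucketed by the first frame outside the runtime. The following is only a rough Python sketch: it assumes each dump's `kb`-style stack has been saved as text in the same layout as the STACK_TEXT above, and the function names and the `RUNTIME_MODULES` set are made up for illustration.

```python
import re
from collections import Counter

# Matches the trailing "module!symbol+0xoffset" part of a WinDbg stack line.
# Unsymbolized frames (e.g. ": 0x00007ffd`50236ce8") do not match.
FRAME_RE = re.compile(
    r":\s(?P<module>[\w.]+)!(?P<symbol>.+?)(?:\+0x[0-9a-fA-F]+)?\s*$"
)

# Modules treated as "runtime/OS" when looking for the calling code.
# This set is an assumption for illustration, not an exhaustive list.
RUNTIME_MODULES = {"ntdll", "KERNELBASE", "kernel32", "coreclr"}

# Three symbolized frames and one unsymbolized frame copied from this report.
SAMPLE = """\
00000066`f4d7b4e0 00007ffd`a3912632     : 00000000`00000001 00000000`00000000 00000000`00000000 00007ffd`00000000 : coreclr!SVR::gc_heap::wait_for_gc_done+0x5d
00000066`f4d7b750 00007ffd`4ffdbe60     : 00007ffd`50236ce8 00000260`e854c758 00000260`e854c568 00000000`00000000 : coreclr!JIT_New+0xdd
00000066`f4d7b8b0 00007ffd`50236ce8     : 00000260`e854c758 00000260`e854c568 00000000`00000000 00000066`f4d7b8b0 : EntityFramework!System.Data.Entity.Core.Mapping.ViewGeneration.Validation.ForeignConstraint.CheckIfConstraintMappedToForeignKeyAssociation+0x580
00000066`f4d7b8b8 00000260`e854c758     : 00000260`e854c568 00000000`00000000 00000066`f4d7b8b0 00007ffd`4f85db7e : 0x00007ffd`50236ce8
"""


def parse_frames(stack_text):
    """Return (module, symbol) for every symbolized frame, top of stack first."""
    frames = []
    for line in stack_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append((match.group("module"), match.group("symbol")))
    return frames


def first_app_frame(frames):
    """Return 'module!symbol' of the first frame outside RUNTIME_MODULES, or None."""
    for module, symbol in frames:
        if module not in RUNTIME_MODULES:
            return f"{module}!{symbol}"
    return None


def bucket_stacks(stacks):
    """Count how many stacks share the same first non-runtime frame."""
    return Counter(first_app_frame(parse_frames(text)) for text in stacks)


if __name__ == "__main__":
    for frame, count in bucket_stacks([SAMPLE]).most_common():
        print(count, frame)
```

Feeding one text file per dump into `bucket_stacks` would give a quick count of how often the hang sits under the Entity Framework frame versus elsewhere.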

We have tried applying the same GC settings to the processing servers, with mixed results: no hangs in our dev environment after 2 days, but 2 hangs in production on the first day after applying the settings.
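For anyone wanting to repeat the experiment, applying the application-server GC settings to a single processing run could look roughly like the Python sketch below. The variable names and values are the ones listed above; the launcher itself, the function names, and the timeout are hypothetical and are not how our jobs are actually started.

```python
import os
import subprocess

# GC settings copied from the application servers described in this report.
GC_SETTINGS = {
    "DOTNET_GCConserveMemory": "6",
    "DOTNET_gcServer": "1",
    "DOTNET_gcTrimCommitOnLowMemory": "1",
    "DOTNET_GCDynamicAdaptationMode": "1",
}


def build_env(base=None, overrides=None):
    """Return a copy of the environment with the GC settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(GC_SETTINGS)
    if overrides:
        env.update(overrides)  # lets one run vary a single setting
    return env


def run_job(command, timeout_seconds):
    """Run one job with the GC settings applied.

    Returns the exit code, or None if the job is still running after
    timeout_seconds (a possible hang). The process is deliberately left
    alive in that case so a memory dump can still be collected.
    """
    process = subprocess.Popen(command, env=build_env())
    try:
        return process.wait(timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
```

Passing `overrides` to `build_env` makes it easy to compare runs that differ in a single setting, which may help narrow down whether any one of the variables affects the hang.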

Other information

Possibly related:
#110350
#105780

Apologies if this is a duplicate of either of those; it is a bit hard for me to tell, so I figured a separate issue report might be best.

Metadata

Assignees

No one assigned

    Labels

    area-GC-coreclr, needs-author-action (an issue or pull request that requires more info or actions from the author)
