From fda62585e989ab573c7527de8793e696b47fd31f Mon Sep 17 00:00:00 2001
From: Jakob Botsch Nielsen
Date: Mon, 23 May 2022 19:17:41 +0200
Subject: [PATCH 1/4] Add arm64 support for EnC

This adds support for EnC on arm64. A couple of notes on the implementation
compared to x64:

- On x64 we get the fixed stack size from unwind info. However, for the
  frames we set up on arm64 for EnC it is not possible to extract the frame
  size from there because their prologs generally look like

    stp fp, lr, [sp,#-16]!
    mov fp, sp
    sub sp, sp, #144

  with unwind codes like the following:

    set_fp; mov fp, sp
    save_fplr_x #1 (0x01); stp fp, lr, [sp, #-16]!

  As can be seen, it is not possible to get the fixed stack size from
  unwind info in this case. Instead we pass it through the GC info that
  already has a section for EnC data.

- On arm64 the JIT is required to place the PSPSym at the same offset from
  caller-SP for both the main function and for funclets. Due to this we try
  to allocate the PSPSym as early as possible in the main function and we
  must take some care in funclets. However, this conflicts with the EnC
  frame header that the JIT uses to place values that must be preserved on
  EnC transitions. This is currently callee-saved registers and the
  MonitorAcquired boolean. Before this change we were allocating PSPSym
  above (before) the monitor acquired boolean, but we now have to allocate
  MonitorAcquired first, particularly because the size of the preserved
  header cannot change on EnC transitions, while the PSPSym can disappear
  or appear. This changes frame allocation slightly for synchronized
  functions.
---
 docs/design/coreclr/botr/clr-abi.md | 21 ++--
 src/coreclr/clrdefinitions.cmake | 4 +-
 src/coreclr/debug/ee/arm64/arm64walker.cpp | 11 +-
 src/coreclr/gcinfo/gcinfoencoder.cpp | 19 +++-
 src/coreclr/inc/gcinfodecoder.h | 8 +-
 src/coreclr/inc/gcinfoencoder.h | 8 +-
 src/coreclr/inc/gcinfotypes.h | 1 +
 src/coreclr/jit/codegenarm.cpp | 7 +-
 src/coreclr/jit/codegenarm64.cpp | 7 ++
 src/coreclr/jit/codegenarmarch.cpp | 56 +++++++---
 src/coreclr/jit/codegencommon.cpp | 18 ++--
 src/coreclr/jit/codegenxarch.cpp | 21 ++--
 src/coreclr/jit/flowgraph.cpp | 20 ++--
 src/coreclr/jit/lclvars.cpp | 60 +++++------
 src/coreclr/jit/lsra.cpp | 11 +-
 src/coreclr/jit/targetamd64.h | 3 +
 src/coreclr/jit/targetarm64.h | 6 ++
 src/coreclr/vm/eetwain.cpp | 118 +++++++++++++++------
 src/coreclr/vm/encee.cpp | 25 ++++-
 src/coreclr/vm/gcinfodecoder.cpp | 18 +++-
 src/coreclr/vm/jitinterface.cpp | 4 +-
 21 files changed, 312 insertions(+), 134 deletions(-)

diff --git a/docs/design/coreclr/botr/clr-abi.md b/docs/design/coreclr/botr/clr-abi.md
index 4e4193b984515..2c8ab96c6a32e 100644
--- a/docs/design/coreclr/botr/clr-abi.md
+++ b/docs/design/coreclr/botr/clr-abi.md
@@ -653,13 +653,15 @@ Edit and Continue (EnC) is a special flavor of un-optimized code. The debugger h
 In the current design, the JIT does not have access to the previous versions of the method and so it has to assume the worst case. EnC is designed for simplicity, not for performance of the generated code.
 
-EnC is currently enabled on x86 and x64 only, but the same principles would apply if it is ever enabled on other platforms.
+EnC is currently enabled on x86, x64 and ARM64 only, but the same principles would apply if it is ever enabled on other platforms.
 
 The following sections describe the various Edit and Continue code conventions that must be followed.
 
 ## EnC flag in GCInfo
 
-The JIT records the fact that it has followed conventions for EnC code in GC Info. 
On x64, this flag is implied by recording the size of the stack frame region preserved between EnC edits (`GcInfoEncoder::SetSizeOfEditAndContinuePreservedArea`). For normal methods on JIT64, the size of this region is 2 slots (saved `RBP` and return address). On RyuJIT/AMD64, the size of this region is increased to include `RSI` and `RDI`, so that `rep stos` can be used for block initialization and block moves. +The JIT records the fact that it has followed conventions for EnC code in GC Info. On x64/ARM64, this flag is implied by recording the size of the stack frame region preserved between EnC edits (`GcInfoEncoder::SetSizeOfEditAndContinuePreservedArea`). For x64 the size of this region is increased to include `RSI` and `RDI`, so that `rep stos` can be used for block initialization and block moves. ARM64 saves only the FP and LR registers when EnC is enabled and does not use other callee saved registers. + +To successfully perform EnC transitions the runtime needs to know the size of the stack frames it is transitioning between. For x64 code the size can be extracted from unwind codes. This is not possible for arm64 code as the frames are set up in such a way that the unwind codes do not allow to retrieve this value. Therefore, on ARM64 the GC info contains also the size of the fixed stack frame to be used for EnC purposes. ## Allocating local variables backward @@ -667,27 +669,30 @@ This is required to preserve addresses of the existing locals when an EnC edit a ## Fixed set of callee-saved registers -This eliminates need to deal with the different sets in the VM, and makes preservation of local addresses easier. On x64, we choose to always save `RBP` only. There are plenty of volatile registers and so lack of non-volatile registers does not impact quality of non-optimized code. +This eliminates need to deal with the different sets in the VM, and makes preservation of local addresses easier. There are plenty of volatile registers and so lack of non-volatile registers does not heavily impact quality of non-optimized code. +x64 currently saves RBP, RSI and RDI while ARM64 saves just FP and LR. ## EnC is supported for methods with EH However, EnC remap is not supported inside funclets. The stack layout of funclets does not matter for EnC. -## Initial RSP == RBP == PSPSym +## Considerations with regards to PSPSym -This invariant allows VM to compute new value of `RBP` and PSPSym after the edit without any additional information. Location of PSPSym is found via GC info. +As explained previously in this document, on x64 we have Initial RSP == PSPSym. For EnC methods, as we disallow remappings after localloc (see below), we furthermore have RBP == PSPSym. +For ARM64 we have Caller SP == PSPSym and the FP points to the previously saved FP/LR pair. For EnC the JIT always sets up the stack frame so that the FP/LR pair is at Caller SP - 16 and does not save any additional callee saves. +These invariants allow the VM to compute new value of the frame pointer and PSPSym after the edit without any additional information. Note that the frame pointer and PSPSym do not change values or location on ARM64. However, EH may be added to a function in which case a new PSPSym needs to be materialized, even on ARM64. Location of PSPSym is found via GC info. ## Localloc -Localloc is allowed in EnC code, but remap is disallowed after the method has executed a localloc instruction. VM uses the invariant above (`RSP == RBP`) to detect whether localloc was executed by the method. 
+Localloc is allowed in EnC code, but remap is disallowed after the method has executed a localloc instruction. VM uses the invariants above (`RSP == RBP` on x64, `FP + 16 == SP + stack size` on ARM64) to detect whether localloc was executed by the method. ## Security object -This does not require any special handling by the JIT on x64. (Different from x86). The security object is copied over by the VM during remap if necessary. Location of security object is found via GC info. +This does not require any special handling by the JIT on x64/arm64. (Different from x86). The security object is copied over by the VM during remap if necessary. Location of security object is found via GC info. ## Synchronized methods -The extra state created by the JIT for synchronized methods (original `this` and lock taken flag) must be preserved during remap. The JIT stores this state in the preserved region, and increases the size of the preserved region reported in GC info accordingly. +The extra state created by the JIT for synchronized methods (lock taken flag) must be preserved during remap. The JIT stores this state in the preserved region, and increases the size of the preserved region reported in GC info accordingly. ## Generics diff --git a/src/coreclr/clrdefinitions.cmake b/src/coreclr/clrdefinitions.cmake index 432059aa9e39f..19adf12292568 100644 --- a/src/coreclr/clrdefinitions.cmake +++ b/src/coreclr/clrdefinitions.cmake @@ -56,11 +56,11 @@ endif(CLR_CMAKE_HOST_WIN32) if (NOT (CLR_CMAKE_TARGET_ARCH_I386 AND CLR_CMAKE_TARGET_UNIX)) add_compile_definitions(EnC_SUPPORTED) endif() -if(CLR_CMAKE_TARGET_ARCH_AMD64 OR (CLR_CMAKE_TARGET_ARCH_I386 AND CLR_CMAKE_TARGET_WIN32)) +if(CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_I386 OR CLR_CMAKE_TARGET_ARCH_ARM64) if(CLR_CMAKE_TARGET_WIN32) add_compile_definitions(FEATURE_ENC_SUPPORTED) endif(CLR_CMAKE_TARGET_WIN32) -endif(CLR_CMAKE_TARGET_ARCH_AMD64 OR (CLR_CMAKE_TARGET_ARCH_I386 AND CLR_CMAKE_TARGET_WIN32)) +endif(CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_I386 OR CLR_CMAKE_TARGET_ARCH_ARM64) # Features - please keep them alphabetically sorted if(CLR_CMAKE_TARGET_WIN32) diff --git a/src/coreclr/debug/ee/arm64/arm64walker.cpp b/src/coreclr/debug/ee/arm64/arm64walker.cpp index 6c4dee9349700..4a605b874b7b5 100644 --- a/src/coreclr/debug/ee/arm64/arm64walker.cpp +++ b/src/coreclr/debug/ee/arm64/arm64walker.cpp @@ -127,15 +127,22 @@ BYTE* NativeWalker::SetupOrSimulateInstructionForPatchSkip(T_CONTEXT * context, offset = offset | 0xFFFFFFFFFFE00000; } + PCODE finalAddr; if ((opcode & 0x80000000) != 0) //ADRP { offset = offset << 12; - LOG((LF_CORDB, LL_INFO100000, "Arm64Walker::Simulate opcode: %x to ADRP X%d %p\n", opcode, RegNum, offset)); + finalAddr = (PCODE)(address + offset) & ~0xFFF; + LOG((LF_CORDB, LL_INFO100000, "Arm64Walker::Simulate opcode: %x to ADRP X%d %p finalAddr = %p\n", opcode, RegNum, offset, finalAddr)); } else { - LOG((LF_CORDB, LL_INFO100000, "Arm64Walker::Simulate opcode: %x to ADR X%d %p\n", opcode, RegNum, offset)); + finalAddr = (PCODE)(address + offset); + LOG((LF_CORDB, LL_INFO100000, "Arm64Walker::Simulate opcode: %x to ADR X%d %p finalAddr = %p\n", opcode, RegNum, offset, finalAddr)); } + + CORDbgSetInstruction((CORDB_ADDRESS_TYPE *)patchBypass, 0xd503201f); //Add Nop in buffer + SetReg(context, RegNum, finalAddr); + return patchBypass; } else if ((opcode & 0x3B000000) == 0x18000000) //LDR Literal (General or SIMD) diff --git a/src/coreclr/gcinfo/gcinfoencoder.cpp b/src/coreclr/gcinfo/gcinfoencoder.cpp index 
19e914676b2f8..8d011591350f4 100644 --- a/src/coreclr/gcinfo/gcinfoencoder.cpp +++ b/src/coreclr/gcinfo/gcinfoencoder.cpp @@ -344,7 +344,7 @@ GcInfoSize& GcInfoSize::operator+=(const GcInfoSize& other) NumUntracked += other.NumUntracked; NumTransitions += other.NumTransitions; SizeOfCode += other.SizeOfCode; - EncPreservedSlots += other.EncPreservedSlots; + EncInfoSize += other.EncInfoSize; UntrackedSlotSize += other.UntrackedSlotSize; NumUntrackedSize += other.NumUntrackedSize; @@ -392,7 +392,7 @@ void GcInfoSize::Log(DWORD level, const char * header) LogSpew(LF_GCINFO, level, "NumUntracked: %Iu\n", NumUntracked); LogSpew(LF_GCINFO, level, "NumTransitions: %Iu\n", NumTransitions); LogSpew(LF_GCINFO, level, "SizeOfCode: %Iu\n", SizeOfCode); - LogSpew(LF_GCINFO, level, "EncPreservedSlots: %Iu\n", EncPreservedSlots); + LogSpew(LF_GCINFO, level, "EncInfoSize: %Iu\n", EncInfoSize); LogSpew(LF_GCINFO, level, "---SIZES(bits)---\n"); LogSpew(LF_GCINFO, level, "Total: %Iu\n", TotalSize); @@ -484,6 +484,9 @@ GcInfoEncoder::GcInfoEncoder( m_StackBaseRegister = NO_STACK_BASE_REGISTER; m_SizeOfEditAndContinuePreservedArea = NO_SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA; +#ifdef TARGET_ARM64 + m_SizeOfEditAndContinueFixedStackFrame = 0; +#endif m_ReversePInvokeFrameSlot = NO_REVERSE_PINVOKE_FRAME; #ifdef TARGET_AMD64 m_WantsReportOnlyLeaf = false; @@ -760,6 +763,13 @@ void GcInfoEncoder::SetSizeOfEditAndContinuePreservedArea( UINT32 slots ) m_SizeOfEditAndContinuePreservedArea = slots; } +#ifdef TARGET_ARM64 +void GcInfoEncoder::SetSizeOfEditAndContinueFixedStackFrame( UINT32 size ) +{ + m_SizeOfEditAndContinueFixedStackFrame = size; +} +#endif + #ifdef TARGET_AMD64 void GcInfoEncoder::SetWantsReportOnlyLeaf() { @@ -1157,7 +1167,10 @@ void GcInfoEncoder::Build() if (m_SizeOfEditAndContinuePreservedArea != NO_SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA) { - GCINFO_WRITE_VARL_U(m_Info1, m_SizeOfEditAndContinuePreservedArea, SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA_ENCBASE, EncPreservedSlots); + GCINFO_WRITE_VARL_U(m_Info1, m_SizeOfEditAndContinuePreservedArea, SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA_ENCBASE, EncInfoSize); +#ifdef TARGET_ARM64 + GCINFO_WRITE_VARL_U(m_Info1, m_SizeOfEditAndContinueFixedStackFrame, SIZE_OF_EDIT_AND_CONTINUE_FIXED_STACK_FRAME_ENCBASE, EncInfoSize); +#endif } if (hasReversePInvokeFrame) diff --git a/src/coreclr/inc/gcinfodecoder.h b/src/coreclr/inc/gcinfodecoder.h index 66814e6457322..af76b7da65faf 100644 --- a/src/coreclr/inc/gcinfodecoder.h +++ b/src/coreclr/inc/gcinfodecoder.h @@ -238,7 +238,7 @@ enum GcInfoHeaderFlags #elif defined(TARGET_ARM) || defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) GC_INFO_HAS_TAILCALLS = 0x80, #endif // TARGET_AMD64 - GC_INFO_HAS_EDIT_AND_CONTINUE_PRESERVED_SLOTS = 0x100, + GC_INFO_HAS_EDIT_AND_CONTINUE_INFO = 0x100, GC_INFO_REVERSE_PINVOKE_FRAME = 0x200, GC_INFO_FLAGS_BIT_SIZE_VERSION_1 = 9, @@ -547,6 +547,9 @@ class GcInfoDecoder UINT32 GetCodeLength(); UINT32 GetStackBaseRegister(); UINT32 GetSizeOfEditAndContinuePreservedArea(); +#ifdef TARGET_ARM64 + UINT32 GetSizeOfEditAndContinueFixedStackFrame(); +#endif size_t GetNumBytesRead(); #ifdef FIXED_STACK_PARAMETER_SCRATCH_AREA @@ -578,6 +581,9 @@ class GcInfoDecoder UINT32 m_CodeLength; UINT32 m_StackBaseRegister; UINT32 m_SizeOfEditAndContinuePreservedArea; +#ifdef TARGET_ARM64 + UINT32 m_SizeOfEditAndContinueFixedStackFrame; +#endif ReturnKind m_ReturnKind; #ifdef PARTIALLY_INTERRUPTIBLE_GC_SUPPORTED UINT32 m_NumSafePoints; diff --git a/src/coreclr/inc/gcinfoencoder.h 
b/src/coreclr/inc/gcinfoencoder.h index 2848612303c1f..dd438a7eb2d2a 100644 --- a/src/coreclr/inc/gcinfoencoder.h +++ b/src/coreclr/inc/gcinfoencoder.h @@ -118,7 +118,7 @@ struct GcInfoSize size_t NumUntracked; size_t NumTransitions; size_t SizeOfCode; - size_t EncPreservedSlots; + size_t EncInfoSize; size_t UntrackedSlotSize; size_t NumUntrackedSize; @@ -434,6 +434,9 @@ class GcInfoEncoder // Number of slots preserved during EnC remap void SetSizeOfEditAndContinuePreservedArea( UINT32 size ); +#ifdef TARGET_ARM64 + void SetSizeOfEditAndContinueFixedStackFrame( UINT32 size ); +#endif #ifdef TARGET_AMD64 // Used to only report a frame once for the leaf function/funclet @@ -510,6 +513,9 @@ class GcInfoEncoder UINT32 m_CodeLength; UINT32 m_StackBaseRegister; UINT32 m_SizeOfEditAndContinuePreservedArea; +#ifdef TARGET_ARM64 + UINT32 m_SizeOfEditAndContinueFixedStackFrame; +#endif INT32 m_ReversePInvokeFrameSlot; InterruptibleRange* m_pLastInterruptibleRange; diff --git a/src/coreclr/inc/gcinfotypes.h b/src/coreclr/inc/gcinfotypes.h index 901f147b99975..77e1401e95596 100644 --- a/src/coreclr/inc/gcinfotypes.h +++ b/src/coreclr/inc/gcinfotypes.h @@ -753,6 +753,7 @@ void FASTCALL decodeCallPattern(int pattern, #define STACK_BASE_REGISTER_ENCBASE 2 // FP encoded as 0, SP as 2. #define SIZE_OF_STACK_AREA_ENCBASE 3 #define SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA_ENCBASE 4 +#define SIZE_OF_EDIT_AND_CONTINUE_FIXED_STACK_FRAME_ENCBASE 4 #define REVERSE_PINVOKE_FRAME_ENCBASE 6 #define NUM_REGISTERS_ENCBASE 3 #define NUM_STACK_SLOTS_ENCBASE 2 diff --git a/src/coreclr/jit/codegenarm.cpp b/src/coreclr/jit/codegenarm.cpp index 6d1857f0b8158..cacfa8af3f4c9 100644 --- a/src/coreclr/jit/codegenarm.cpp +++ b/src/coreclr/jit/codegenarm.cpp @@ -2457,9 +2457,14 @@ void CodeGen::genCaptureFuncletPrologEpilogInfo() regMaskTP rsMaskSaveRegs = regSet.rsMaskCalleeSaved; unsigned saveRegsCount = genCountBits(rsMaskSaveRegs); unsigned saveRegsSize = saveRegsCount * REGSIZE_BYTES; // bytes of regs we're saving + unsigned saveSizeWithPSP = saveRegsSize + REGSIZE_BYTES /* PSP sym */; + if (compiler->lvaMonAcquired != BAD_VAR_NUM) + { + saveSizeWithPSP += TARGET_POINTER_SIZE; + } assert(compiler->lvaOutgoingArgSpaceSize % REGSIZE_BYTES == 0); unsigned funcletFrameSize = - preSpillRegArgSize + saveRegsSize + REGSIZE_BYTES /* PSP slot */ + compiler->lvaOutgoingArgSpaceSize; + preSpillRegArgSize + saveSizeWithPSP + compiler->lvaOutgoingArgSpaceSize; unsigned funcletFrameSizeAligned = roundUp(funcletFrameSize, STACK_ALIGN); unsigned funcletFrameAlignmentPad = funcletFrameSizeAligned - funcletFrameSize; diff --git a/src/coreclr/jit/codegenarm64.cpp b/src/coreclr/jit/codegenarm64.cpp index 1b0f9780cb2e6..f23c733e48cdd 100644 --- a/src/coreclr/jit/codegenarm64.cpp +++ b/src/coreclr/jit/codegenarm64.cpp @@ -1692,6 +1692,13 @@ void CodeGen::genCaptureFuncletPrologEpilogInfo() saveRegsPlusPSPSize += MAX_REG_ARG * REGSIZE_BYTES; } + if (compiler->lvaMonAcquired != BAD_VAR_NUM) + { + // We furthermore allocate the "monitor acquired" bool between PSP and + // the saved registers because this is part of the EnC header. 
+ saveRegsPlusPSPSize += compiler->lvaLclSize(compiler->lvaMonAcquired); + } + unsigned const saveRegsPlusPSPSizeAligned = roundUp(saveRegsPlusPSPSize, STACK_ALIGN); assert(compiler->lvaOutgoingArgSpaceSize % REGSIZE_BYTES == 0); diff --git a/src/coreclr/jit/codegenarmarch.cpp b/src/coreclr/jit/codegenarmarch.cpp index d91c0feb383c4..481e3f2d79b19 100644 --- a/src/coreclr/jit/codegenarmarch.cpp +++ b/src/coreclr/jit/codegenarmarch.cpp @@ -4159,24 +4159,36 @@ void CodeGen::genCreateAndStoreGCInfo(unsigned codeSize, { // what we have to preserve is called the "frame header" (see comments in VM\eetwain.cpp) // which is: - // -return address - // -saved off RBP - // -saved 'this' pointer and bool for synchronized methods + // -return address (saved lr) + // -saved off FP + // -all callee-preserved registers in case of varargs + // -saved bool for synchronized methods - // 4 slots for RBP + return address + RSI + RDI - int preservedAreaSize = 4 * REGSIZE_BYTES; + int preservedAreaSize = (2 + genCountBits(RBM_ENC_CALLEE_SAVED)) * REGSIZE_BYTES; + + if (compiler->info.compIsVarArgs) + { + // Varargs save all int registers between saved registers and PSP. + preservedAreaSize += MAX_REG_ARG * REGSIZE_BYTES; + } if (compiler->info.compFlags & CORINFO_FLG_SYNCH) { - if (!(compiler->info.compFlags & CORINFO_FLG_STATIC)) - preservedAreaSize += REGSIZE_BYTES; + // bool in synchronized methods that tracks whether the lock has been taken (takes a full pointer sized slot) + preservedAreaSize += TARGET_POINTER_SIZE; - preservedAreaSize += 1; // bool for synchronized methods + // Verify that MonAcquired bool is at the bottom of the frame header + assert(compiler->lvaGetCallerSPRelativeOffset(compiler->lvaMonAcquired) == -preservedAreaSize); } // Used to signal both that the method is compiled for EnC, and also the size of the block at the top of the // frame gcInfoEncoder->SetSizeOfEditAndContinuePreservedArea(preservedAreaSize); + gcInfoEncoder->SetSizeOfEditAndContinueFixedStackFrame(genTotalFrameSize()); + + JITDUMP("EnC info:\n"); + JITDUMP(" EnC preserved area size = %d\n", preservedAreaSize); + JITDUMP(" Fixed stack frame size = %d\n", genTotalFrameSize()); } #endif // TARGET_ARM64 @@ -4694,6 +4706,8 @@ void CodeGen::genPushCalleeSavedRegisters() // |-----------------------| // |Callee saved registers | // not including FP/LR; multiple of 8 bytes // |-----------------------| + // | MonitorAcquired | + // |-----------------------| // | PSP slot | // 8 bytes (omitted in NativeAOT ABI) // |-----------------------| // | locals, temps, etc. | @@ -4725,6 +4739,8 @@ void CodeGen::genPushCalleeSavedRegisters() // |-----------------------| // |Callee saved registers | // not including FP/LR; multiple of 8 bytes // |-----------------------| + // | MonitorAcquired | + // |-----------------------| // | PSP slot | // 8 bytes (omitted in NativeAOT ABI) // |-----------------------| // | locals, temps, etc. | @@ -4822,7 +4838,7 @@ void CodeGen::genPushCalleeSavedRegisters() maskSaveRegsInt &= ~(RBM_FP | RBM_LR); // We've already saved FP/LR offset = (int)compiler->compLclFrameSize + 2 * REGSIZE_BYTES; // 2 for FP/LR } - else if (totalFrameSize <= 512) + else if ((totalFrameSize <= 512) && !compiler->opts.compDbgEnC) { // Case #2. // @@ -4956,6 +4972,7 @@ void CodeGen::genPushCalleeSavedRegisters() } else { + assert(!compiler->opts.compDbgEnC); JITDUMP("Frame type 3 (save FP/LR at bottom). 
#outsz=%d; #framesz=%d; LclFrameSize=%d\n", unsigned(compiler->lvaOutgoingArgSpaceSize), totalFrameSize, compiler->compLclFrameSize); @@ -5110,15 +5127,22 @@ void CodeGen::genPushCalleeSavedRegisters() establishFramePointer = false; int remainingFrameSz = totalFrameSize - calleeSaveSpDelta; - assert(remainingFrameSz > 0); - assert((remainingFrameSz % 16) == 0); // this is guaranteed to be 16-byte aligned because each component -- - // totalFrameSize and calleeSaveSpDelta -- is 16-byte aligned. + if (remainingFrameSz > 0) + { + assert((remainingFrameSz % 16) == 0); // this is guaranteed to be 16-byte aligned because each component -- + // totalFrameSize and calleeSaveSpDelta -- is 16-byte aligned. - JITDUMP(" remainingFrameSz=%d\n", remainingFrameSz); + JITDUMP(" remainingFrameSz=%d\n", remainingFrameSz); - // We've already established the frame pointer, so no need to report the stack pointer change to unwind info. - genStackPointerAdjustment(-remainingFrameSz, initReg, pInitRegZeroed, /* reportUnwindData */ false); - offset += remainingFrameSz; + // We've already established the frame pointer, so no need to report the stack pointer change to unwind info. + genStackPointerAdjustment(-remainingFrameSz, initReg, pInitRegZeroed, /* reportUnwindData */ false); + offset += remainingFrameSz; + } + else + { + // Should only have picked this frame type for EnC. + assert(compiler->opts.compDbgEnC); + } } else { diff --git a/src/coreclr/jit/codegencommon.cpp b/src/coreclr/jit/codegencommon.cpp index 6fc7d4e1d1a0d..7f6ed78d0a5f7 100644 --- a/src/coreclr/jit/codegencommon.cpp +++ b/src/coreclr/jit/codegencommon.cpp @@ -1841,6 +1841,10 @@ void CodeGen::genGenerateMachineCode() { printf("; optimized code\n"); } + else if (compiler->opts.compDbgEnC) + { + printf("; EnC code\n"); + } else if (compiler->opts.compDbgCode) { printf("; debuggable code\n"); @@ -5308,15 +5312,17 @@ void CodeGen::genFinalizeFrame() { // We always save FP. noway_assert(isFramePointerUsed()); -#ifdef TARGET_AMD64 - // On x64 we always save exactly RBP, RSI and RDI for EnC. 
- regMaskTP okRegs = (RBM_CALLEE_TRASH | RBM_FPBASE | RBM_RSI | RBM_RDI); - regSet.rsSetRegsModified(RBM_RSI | RBM_RDI); +#if defined(TARGET_AMD64) || defined(TARGET_ARM64) + regMaskTP okRegs = (RBM_CALLEE_TRASH | RBM_FPBASE | RBM_ENC_CALLEE_SAVED); + if (RBM_ENC_CALLEE_SAVED != 0) + { + regSet.rsSetRegsModified(RBM_ENC_CALLEE_SAVED); + } noway_assert((regSet.rsGetModifiedRegsMask() & ~okRegs) == 0); -#else // !TARGET_AMD64 +#else // !TARGET_AMD64 && !TARGET_ARM64 // On x86 we save all callee saved regs so the saved reg area size is consistent regSet.rsSetRegsModified(RBM_INT_CALLEE_SAVED & ~RBM_FPBASE); -#endif // !TARGET_AMD64 +#endif // !TARGET_AMD64 && !TARGET_ARM64 } /* If we have any pinvoke calls, we might potentially trash everything */ diff --git a/src/coreclr/jit/codegenxarch.cpp b/src/coreclr/jit/codegenxarch.cpp index be96c95750cc2..ea58abb15433b 100644 --- a/src/coreclr/jit/codegenxarch.cpp +++ b/src/coreclr/jit/codegenxarch.cpp @@ -8436,26 +8436,27 @@ void CodeGen::genCreateAndStoreGCInfoX64(unsigned codeSize, unsigned prologSize // what we have to preserve is called the "frame header" (see comments in VM\eetwain.cpp) // which is: // -return address - // -saved off RBP - // -saved 'this' pointer and bool for synchronized methods + // -saved RBP + // -saved bool for synchronized methods - // 4 slots for RBP + return address + RSI + RDI - int preservedAreaSize = 4 * REGSIZE_BYTES; + // slots for ret address + FP + EnC callee-saves + int preservedAreaSize = (2 + genCountBits(RBM_ENC_CALLEE_SAVED)) * REGSIZE_BYTES; if (compiler->info.compFlags & CORINFO_FLG_SYNCH) { - if (!(compiler->info.compFlags & CORINFO_FLG_STATIC)) - { - preservedAreaSize += REGSIZE_BYTES; - } + // bool in synchronized methods that tracks whether the lock has been taken (takes a full pointer sized slot) + preservedAreaSize += TARGET_POINTER_SIZE; - // bool in synchronized methods that tracks whether the lock has been taken (takes 4 bytes on stack) - preservedAreaSize += 4; + // Verify that MonAcquired bool is at the bottom of the frame header + assert(compiler->lvaGetCallerSPRelativeOffset(compiler->lvaMonAcquired) == -preservedAreaSize); } // Used to signal both that the method is compiled for EnC, and also the size of the block at the top of the // frame gcInfoEncoder->SetSizeOfEditAndContinuePreservedArea(preservedAreaSize); + + JITDUMP("EnC info:\n"); + JITDUMP(" EnC preserved area size = %d\n", preservedAreaSize); } if (compiler->opts.IsReversePInvoke()) diff --git a/src/coreclr/jit/flowgraph.cpp b/src/coreclr/jit/flowgraph.cpp index 7782c35597ed7..09293af0109ca 100644 --- a/src/coreclr/jit/flowgraph.cpp +++ b/src/coreclr/jit/flowgraph.cpp @@ -1783,8 +1783,9 @@ void Compiler::fgAddSyncMethodEnterExit() } // Create a 'monitor acquired' boolean (actually, an unsigned byte: 1 = acquired, 0 = not acquired). - - var_types typeMonAcquired = TYP_UBYTE; + // For EnC this is part of the frame header. Furthermore, this is allocated above PSP on ARM64. + // To avoid complicated reasoning about alignment we always allocate a full pointer sized slot for this. + var_types typeMonAcquired = TYP_I_IMPL; this->lvaMonAcquired = lvaGrabTemp(true DEBUGARG("Synchronized method monitor acquired boolean")); lvaTable[lvaMonAcquired].lvType = typeMonAcquired; @@ -1810,10 +1811,12 @@ void Compiler::fgAddSyncMethodEnterExit() #endif } - // Make a copy of the 'this' pointer to be used in the handler so it does not inhibit enregistration - // of all uses of the variable. 
- unsigned lvaCopyThis = 0; - if (!info.compIsStatic) + // Make a copy of the 'this' pointer to be used in the handler so it does + // not inhibit enregistration of all uses of the variable. We cannot do + // this optimization in EnC code as we would need to take care to save the + // copy on EnC transitions, so guard this on optimizations being enabled. + unsigned lvaCopyThis = BAD_VAR_NUM; + if (opts.OptimizationEnabled() && !info.compIsStatic) { lvaCopyThis = lvaGrabTemp(true DEBUGARG("Synchronized method copy of this for handler")); lvaTable[lvaCopyThis].lvType = TYP_REF; @@ -1833,7 +1836,7 @@ void Compiler::fgAddSyncMethodEnterExit() } // exceptional case - fgCreateMonitorTree(lvaMonAcquired, lvaCopyThis, faultBB, false /*exit*/); + fgCreateMonitorTree(lvaMonAcquired, lvaCopyThis != BAD_VAR_NUM ? lvaCopyThis : info.compThisArg, faultBB, false /*exit*/); // non-exceptional cases for (BasicBlock* const block : Blocks()) @@ -1857,8 +1860,7 @@ GenTree* Compiler::fgCreateMonitorTree(unsigned lvaMonAcquired, unsigned lvaThis { // Insert the expression "enter/exitCrit(this, &acquired)" or "enter/exitCrit(handle, &acquired)" - var_types typeMonAcquired = TYP_UBYTE; - GenTree* varNode = gtNewLclvNode(lvaMonAcquired, typeMonAcquired); + GenTree* varNode = gtNewLclvNode(lvaMonAcquired, lvaGetDesc(lvaMonAcquired)->TypeGet()); GenTree* varAddrNode = gtNewOperNode(GT_ADDR, TYP_BYREF, varNode); GenTree* tree; diff --git a/src/coreclr/jit/lclvars.cpp b/src/coreclr/jit/lclvars.cpp index a52342f86c7eb..0b3d3750b6ce8 100644 --- a/src/coreclr/jit/lclvars.cpp +++ b/src/coreclr/jit/lclvars.cpp @@ -6315,7 +6315,7 @@ void Compiler::lvaAssignVirtualFrameOffsetsToLocals() if (opts.compJitSaveFpLrWithCalleeSavedRegisters == 0) { // Default configuration - codeGen->SetSaveFpLrWithAllCalleeSavedRegisters((getNeedsGSSecurityCookie() && compLocallocUsed) || + codeGen->SetSaveFpLrWithAllCalleeSavedRegisters((getNeedsGSSecurityCookie() && compLocallocUsed) || opts.compDbgEnC || compStressCompile(STRESS_GENERIC_VARN, 20)); } else if (opts.compJitSaveFpLrWithCalleeSavedRegisters == 1) @@ -6500,12 +6500,38 @@ void Compiler::lvaAssignVirtualFrameOffsetsToLocals() } #endif // TARGET_AMD64 + if (lvaMonAcquired != BAD_VAR_NUM) + { + // For OSR we use the flag set up by the original method. + // + if (opts.IsOSR()) + { + assert(info.compPatchpointInfo->HasMonitorAcquired()); + int originalOffset = info.compPatchpointInfo->MonitorAcquiredOffset(); + int offset = originalFrameStkOffs + originalOffset; + + JITDUMP( + "---OSR--- V%02u (on tier0 frame, monitor aquired) tier0 FP-rel offset %d tier0 frame offset %d new " + "virt offset %d\n", + lvaMonAcquired, originalOffset, originalFrameStkOffs, offset); + + lvaTable[lvaMonAcquired].SetStackOffset(offset); + } + else + { + // This var must go first, in what is called the 'frame header' for EnC so that it is + // preserved when remapping occurs. See vm\eetwain.cpp for detailed comment specifying frame + // layout requirements for EnC to work. + stkOffs = lvaAllocLocalAndSetVirtualOffset(lvaMonAcquired, lvaLclSize(lvaMonAcquired), stkOffs); + } + } + #if defined(FEATURE_EH_FUNCLETS) && (defined(TARGET_ARMARCH) || defined(TARGET_LOONGARCH64)) if (lvaPSPSym != BAD_VAR_NUM) { - // On ARM/ARM64, if we need a PSPSym, allocate it first, before anything else, including - // padding (so we can avoid computing the same padding in the funclet - // frame). Note that there is no special padding requirement for the PSPSym. 
+ // On ARM/ARM64, if we need a PSPSym we allocate it early since funclets + // will need to have it at the same caller-SP relative offset so anything + // allocated before this will also leak into the funclet's frame. noway_assert(codeGen->isFramePointerUsed()); // We need an explicit frame pointer stkOffs = lvaAllocLocalAndSetVirtualOffset(lvaPSPSym, TARGET_POINTER_SIZE, stkOffs); } @@ -6543,32 +6569,6 @@ void Compiler::lvaAssignVirtualFrameOffsetsToLocals() } } - if (lvaMonAcquired != BAD_VAR_NUM) - { - // For OSR we use the flag set up by the original method. - // - if (opts.IsOSR()) - { - assert(info.compPatchpointInfo->HasMonitorAcquired()); - int originalOffset = info.compPatchpointInfo->MonitorAcquiredOffset(); - int offset = originalFrameStkOffs + originalOffset; - - JITDUMP( - "---OSR--- V%02u (on tier0 frame, monitor aquired) tier0 FP-rel offset %d tier0 frame offset %d new " - "virt offset %d\n", - lvaMonAcquired, originalOffset, originalFrameStkOffs, offset); - - lvaTable[lvaMonAcquired].SetStackOffset(offset); - } - else - { - // This var must go first, in what is called the 'frame header' for EnC so that it is - // preserved when remapping occurs. See vm\eetwain.cpp for detailed comment specifying frame - // layout requirements for EnC to work. - stkOffs = lvaAllocLocalAndSetVirtualOffset(lvaMonAcquired, lvaLclSize(lvaMonAcquired), stkOffs); - } - } - #ifdef JIT32_GCENCODER if (lvaLocAllocSPvar != BAD_VAR_NUM) { diff --git a/src/coreclr/jit/lsra.cpp b/src/coreclr/jit/lsra.cpp index 6ed24c6dbd93e..deb97db21bc43 100644 --- a/src/coreclr/jit/lsra.cpp +++ b/src/coreclr/jit/lsra.cpp @@ -677,17 +677,16 @@ LinearScan::LinearScan(Compiler* theCompiler) availableFloatRegs = RBM_ALLFLOAT; availableDoubleRegs = RBM_ALLDOUBLE; -#ifdef TARGET_AMD64 +#if defined(TARGET_AMD64) || defined(TARGET_ARM64) if (compiler->opts.compDbgEnC) { - // On x64 when the EnC option is set, we always save exactly RBP, RSI and RDI. - // RBP is not available to the register allocator, so RSI and RDI are the only - // callee-save registers available. - availableIntRegs &= ~RBM_CALLEE_SAVED | RBM_RSI | RBM_RDI; + // When the EnC option is set we have an exact set of registers that we always save + // that are also available in future versions. 
+ availableIntRegs &= ~RBM_CALLEE_SAVED | RBM_ENC_CALLEE_SAVED; availableFloatRegs &= ~RBM_CALLEE_SAVED; availableDoubleRegs &= ~RBM_CALLEE_SAVED; } -#endif // TARGET_AMD64 +#endif // TARGET_AMD64 || TARGET_ARM64 compiler->rpFrameType = FT_NOT_SET; compiler->rpMustCreateEBPCalled = false; diff --git a/src/coreclr/jit/targetamd64.h b/src/coreclr/jit/targetamd64.h index 1d4a3cbc9aac7..ee2868fef4238 100644 --- a/src/coreclr/jit/targetamd64.h +++ b/src/coreclr/jit/targetamd64.h @@ -228,6 +228,9 @@ #define CALLEE_SAVED_REG_MAXSZ (CNT_CALLEE_SAVED*REGSIZE_BYTES) #define CALLEE_SAVED_FLOAT_MAXSZ (CNT_CALLEE_SAVED_FLOAT*16) + // callee-preserved registers we always save and allow use of for EnC code + #define RBM_ENC_CALLEE_SAVED (RBM_RSI | RBM_RDI) + // register to hold shift amount #define REG_SHIFT REG_ECX #define RBM_SHIFT RBM_ECX diff --git a/src/coreclr/jit/targetarm64.h b/src/coreclr/jit/targetarm64.h index c1bd6f838490c..dbfd813ac090c 100644 --- a/src/coreclr/jit/targetarm64.h +++ b/src/coreclr/jit/targetarm64.h @@ -110,6 +110,12 @@ #define CALLEE_SAVED_REG_MAXSZ (CNT_CALLEE_SAVED * REGSIZE_BYTES) #define CALLEE_SAVED_FLOAT_MAXSZ (CNT_CALLEE_SAVED_FLOAT * FPSAVE_REGSIZE_BYTES) + // On ARM64 we do not use any additional callee-saves for ENC + // since there are so many volatile registers available, and + // callee saves have to be aggressively saved by ENC codegen + // because a future version could use them. + #define RBM_ENC_CALLEE_SAVED 0 + // Temporary registers used for the GS cookie check. #define REG_GSCOOKIE_TMP_0 REG_IP0 #define REG_GSCOOKIE_TMP_1 REG_IP1 diff --git a/src/coreclr/vm/eetwain.cpp b/src/coreclr/vm/eetwain.cpp index 71802141bb9dd..02cc16d46114b 100644 --- a/src/coreclr/vm/eetwain.cpp +++ b/src/coreclr/vm/eetwain.cpp @@ -915,7 +915,7 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, LOG((LF_ENC, LL_INFO100, "EECM::FixContextForEnC: Checks out\n")); -#elif defined(TARGET_AMD64) +#elif defined(TARGET_AMD64) || defined(TARGET_ARM64) // Strategy for zeroing out the frame on x64: // @@ -930,7 +930,7 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, // Local variables (if any) // --------------------------------------- // Frame header (stuff we must preserve, such as bool for synchronized - // methods, saved RBP, etc.) + // methods, saved FP, saved callee-preserved registers, etc.) // Return address (also included in frame header) // --------------------------------------- // Arguments for this frame (that's getting remapped). Will naturally be preserved @@ -952,6 +952,27 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, // // We'll need to restore PSPSym; location gotten from GCInfo. // We'll need to copy security object; location gotten from GCInfo. + // + // On ARM64 the JIT generates a slightly different frame and we do not have + // the invariant FP == SP, since the FP needs to point at the saved fp/lr + // pair for ETW stack walks. The frame there looks something like: + // ======================================= + // Arguments for next call (if there is one) <- SP + // JIT temporaries + // Locals + // PSPSym + // --------------------------------------- ^ zeroed area + // MonitorAcquired (for synchronized methods) + // Saved FP <- FP + // Saved LR + // --------------------------------------- ^ preserved area + // Arguments + // + // The JIT reports the size of the "preserved" area, which includes + // MonitorAcquired when it is present. 
It could also include other local + // values that need to be preserved across EnC transitions, but no explicit + // treatment of these is necessary here beyond preserving the values in + // this region. // GCInfo for old method GcInfoDecoder oldGcDecoder( @@ -978,16 +999,33 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, return CORDBG_E_ENC_INFOLESS_METHOD; } + TADDR oldStackBase = GetSP(&oldCtx); + +#if defined(TARGET_AMD64) + // Note: we cannot assert anything about the relationship between oldFixedStackSize + // and newFixedStackSize. It's possible the edited frame grows (new locals) or + // shrinks (less temporaries). + DWORD oldFixedStackSize = pOldCodeInfo->GetFixedStackSize(); + DWORD newFixedStackSize = pNewCodeInfo->GetFixedStackSize(); + + // This verifies no localallocs were used in the old method. // JIT is required to emit frame register for EnC-compliant code _ASSERTE(pOldCodeInfo->HasFrameRegister()); _ASSERTE(pNewCodeInfo->HasFrameRegister()); - TADDR oldStackBase = GetSP(&oldCtx); + // x64: SP == FP before localloc + if (oldStackBase != GetFP(&oldCtx)) + return E_FAIL; +#elif defined(TARGET_ARM64) + DWORD oldFixedStackSize = oldGcDecoder.GetSizeOfEditAndContinueFixedStackFrame(); + DWORD newFixedStackSize = newGcDecoder.GetSizeOfEditAndContinueFixedStackFrame(); - // This verifies no localallocs were used in the old method. (RBP == RSP for - // EnC-compliant x64 code.) - if (oldStackBase != oldCtx.Rbp) + // ARM64: FP + 16 == SP + oldFixedStackSize before localloc + if (GetFP(&oldCtx) + 16 != oldStackBase + oldFixedStackSize) return E_FAIL; +#else + PORTABILITY_ASSERT("Edit-and-continue not enabled on this platform."); +#endif // EnC remap inside handlers is not supported if (pOldCodeInfo->IsFunclet() || pNewCodeInfo->IsFunclet()) @@ -999,13 +1037,6 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, return E_FAIL; } - // Note: we cannot assert anything about the relationship between oldFixedStackSize - // and newFixedStackSize. It's possible the edited frame grows (new locals) or - // shrinks (less temporaries). - - DWORD oldFixedStackSize = pOldCodeInfo->GetFixedStackSize(); - DWORD newFixedStackSize = pNewCodeInfo->GetFixedStackSize(); - TADDR callerSP = oldStackBase + oldFixedStackSize; // If the old code saved a security object, store the object's reference now. @@ -1017,15 +1048,17 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, } #ifdef _DEBUG - // If the old method has a PSPSym, then its value should == FP + // If the old method has a PSPSym, then its value should == FP for x64 and callerSP for arm64 INT32 nOldPspSymStackSlot = oldGcDecoder.GetPSPSymStackSlot(); if (nOldPspSymStackSlot != NO_PSP_SYM) { - // Read the PSP. +#if defined(TARGET_AMD64) TADDR oldPSP = *PTR_TADDR(oldStackBase + nOldPspSymStackSlot); - - // Now we're set up to assert that PSPSym's value == FP _ASSERTE(oldPSP == GetFP(&oldCtx)); +#elif defined(TARGET_ARM64) + TADDR oldPSP = *PTR_TADDR(callerSP + nOldPspSymStackSlot); + _ASSERTE(oldPSP == callerSP); +#endif } #endif // _DEBUG @@ -1238,13 +1271,6 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, pCtx->Xmm4.High = pCtx->Xmm4.Low = 0; pCtx->Xmm5.High = pCtx->Xmm5.Low = 0; - // Any saved nonvolatile registers should also be zeroed out, but there are none - // in EnC-compliant x64 code. Yes, you read that right. Registers like RDI, RSI, - // RBX, etc., which are often saved in the prolog of non-EnC code are NOT saved in - // EnC code. 
EnC code instead just agrees never to use those registers so they - // remain pristine for the caller (except RBP, which is considered part of the frame - // header, and is thus not zeroed out by us). - // 3) zero out the stack frame - this'll initialize _all_ variables /*------------------------------------------------------------------------- @@ -1264,9 +1290,31 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, _ASSERTE(frameHeaderSize <= oldFixedStackSize); _ASSERTE(frameHeaderSize <= newFixedStackSize); - // For EnC-compliant x64 code, Rbp == Rsp. Since Rsp changed above, update Rbp now + // For EnC-compliant x64 code, FP == SP. Since SP changed above, update FP now pCtx->Rbp = newStackBase; -#else // !X86, !AMD64 + +#elif defined(TARGET_ARM64) + // Zero out volatile part of stack frame + // x0-x17 + memset(&pCtx->X[0], 0, sizeof(pCtx->X[0]) * 18); + // v0-v7 + memset(&pCtx->V[0], 0, sizeof(pCtx->V[0]) * 8); + // v16-v31 + memset(&pCtx->V[16], 0, sizeof(pCtx->V[0]) * 16); + + TADDR newStackBase = callerSP - newFixedStackSize; + + SetSP(pCtx, newStackBase); + + size_t frameHeaderSize = newSizeOfPreservedArea; + _ASSERTE(frameHeaderSize <= oldFixedStackSize); + _ASSERTE(frameHeaderSize <= newFixedStackSize); + + // EnC prolog saves only fp,lr and does it at sp-16. It should already + // be set up from previous version. + _ASSERTE(GetFP(pCtx) == callerSP - 16); + +#else // !X86 PORTABILITY_ASSERT("Edit-and-continue not enabled on this platform."); #endif @@ -1319,7 +1367,7 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, // and so (since the stack grows towards 0) can't easily determine where the end of // the local lies. } -#elif defined (TARGET_AMD64) +#elif defined(TARGET_AMD64) || defined(TARGET_ARM64) switch(newMethodVarsSortedBase[i].loc.vlType) { default: @@ -1365,7 +1413,7 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, ((TADDR) pVarStackLocation >= newStackBase + newFixedStackSize)); break; } -#else // !X86, !X64 +#else // !X86, !X64, !ARM64 PORTABILITY_ASSERT("Edit-and-continue not enabled on this platform."); #endif } @@ -1374,12 +1422,12 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, // Clear the local and temporary stack space -#if defined (TARGET_X86) +#if defined(TARGET_X86) memset((void*)(size_t)(pCtx->Esp), 0, newInfo.stackSize - frameHeaderSize ); -#elif defined (TARGET_AMD64) +#elif defined(TARGET_AMD64) || defined(TARGET_ARM64) memset((void*)newStackBase, 0, newFixedStackSize - frameHeaderSize); - // On AMD64, after zeroing out the stack, restore the security object and PSPSym... + // On AMD64/ARM64, after zeroing out the stack, restore the security object and PSPSym... // There is no relationship we can guarantee between the old code having a security // object and the new code having a security object. 
If the new code does have a @@ -1397,9 +1445,15 @@ HRESULT EECodeManager::FixContextForEnC(PCONTEXT pCtx, INT32 nNewPspSymStackSlot = newGcDecoder.GetPSPSymStackSlot(); if (nNewPspSymStackSlot != NO_PSP_SYM) { +#if defined(TARGET_AMD64) *PTR_TADDR(newStackBase + nNewPspSymStackSlot) = GetFP(pCtx); +#elif defined(TARGET_ARM64) + *PTR_TADDR(callerSP + nNewPspSymStackSlot) = callerSP; +#else + PORTABILITY_ASSERT("Edit-and-continue not enabled on this platform."); +#endif } -#else // !X86, !X64 +#else // !X86, !X64, !ARM64 PORTABILITY_ASSERT("Edit-and-continue not enabled on this platform."); #endif diff --git a/src/coreclr/vm/encee.cpp b/src/coreclr/vm/encee.cpp index 8911f976bf47b..f704fce36e891 100644 --- a/src/coreclr/vm/encee.cpp +++ b/src/coreclr/vm/encee.cpp @@ -609,7 +609,7 @@ HRESULT EditAndContinueModule::ResumeInUpdatedFunction( SIZE_T newILOffset, CONTEXT *pOrigContext) { -#if defined(TARGET_ARM64) || defined(TARGET_ARM) || defined(TARGET_LOONGARCH64) +#if defined(TARGET_ARM) || defined(TARGET_LOONGARCH64) return E_NOTIMPL; #else LOG((LF_ENC, LL_INFO100, "EnCModule::ResumeInUpdatedFunction for %s at IL offset 0x%x, ", @@ -653,8 +653,27 @@ HRESULT EditAndContinueModule::ResumeInUpdatedFunction( _ASSERTE(oldCodeInfo.GetCodeManager() == newCodeInfo.GetCodeManager()); +#ifdef TARGET_ARM64 + // GCInfo for old method + GcInfoDecoder oldGcDecoder( + oldCodeInfo.GetGCInfoToken(), + GcInfoDecoderFlags(DECODE_EDIT_AND_CONTINUE), + 0 // Instruction offset (not needed) + ); + + // GCInfo for new method + GcInfoDecoder newGcDecoder( + newCodeInfo.GetGCInfoToken(), + GcInfoDecoderFlags(DECODE_EDIT_AND_CONTINUE), + 0 // Instruction offset (not needed) + ); + + DWORD oldFrameSize = oldGcDecoder.GetSizeOfEditAndContinueFixedStackFrame(); + DWORD newFrameSize = newGcDecoder.GetSizeOfEditAndContinueFixedStackFrame(); +#else DWORD oldFrameSize = oldCodeInfo.GetFixedStackSize(); DWORD newFrameSize = newCodeInfo.GetFixedStackSize(); +#endif // FixContextAndResume() will replace the old stack frame of the function with the new // one and will initialize that new frame to null. Anything on the stack where that new @@ -694,7 +713,7 @@ HRESULT EditAndContinueModule::ResumeInUpdatedFunction( // Win32 handlers on the stack so cannot ever return from this function. 
EEPOLICY_HANDLE_FATAL_ERROR(CORDBG_E_ENC_INTERNAL_ERROR); return hr; -#endif // #if define(TARGET_ARM64) || defined(TARGET_ARM) +#endif // #if defined(TARGET_ARM) } //--------------------------------------------------------------------------------------- @@ -738,7 +757,7 @@ NOINLINE void EditAndContinueModule::FixContextAndResume( memcpy(&context, pContext, sizeof(CONTEXT)); pContext = &context; -#if defined(TARGET_AMD64) +#if defined(TARGET_AMD64) || defined(TARGET_ARM64) // Since we made a copy of the incoming CONTEXT in context, clear any new flags we // don't understand (like XSAVE), since we'll eventually be passing a CONTEXT based // on this copy to ClrRestoreNonvolatileContext, and this copy doesn't have the extra info diff --git a/src/coreclr/vm/gcinfodecoder.cpp b/src/coreclr/vm/gcinfodecoder.cpp index 9018c66b11322..7370d055f9e84 100644 --- a/src/coreclr/vm/gcinfodecoder.cpp +++ b/src/coreclr/vm/gcinfodecoder.cpp @@ -135,7 +135,7 @@ GcInfoDecoder::GcInfoDecoder( #elif defined(TARGET_ARM) || defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64) m_HasTailCalls = ((headerFlags & GC_INFO_HAS_TAILCALLS) != 0); #endif // TARGET_AMD64 - int hasSizeOfEditAndContinuePreservedArea = headerFlags & GC_INFO_HAS_EDIT_AND_CONTINUE_PRESERVED_SLOTS; + int hasEncInfo = headerFlags & GC_INFO_HAS_EDIT_AND_CONTINUE_INFO; int hasReversePInvokeFrame = headerFlags & GC_INFO_REVERSE_PINVOKE_FRAME; int returnKindBits = (slimHeader) ? SIZE_OF_RETURN_KIND_IN_SLIM_HEADER : SIZE_OF_RETURN_KIND_IN_FAT_HEADER; @@ -263,13 +263,19 @@ GcInfoDecoder::GcInfoDecoder( m_StackBaseRegister = NO_STACK_BASE_REGISTER; } - if (hasSizeOfEditAndContinuePreservedArea) + if (hasEncInfo) { m_SizeOfEditAndContinuePreservedArea = (UINT32) m_Reader.DecodeVarLengthUnsigned(SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA_ENCBASE); +#ifdef TARGET_ARM64 + m_SizeOfEditAndContinueFixedStackFrame = (UINT32) m_Reader.DecodeVarLengthUnsigned(SIZE_OF_EDIT_AND_CONTINUE_FIXED_STACK_FRAME_ENCBASE); +#endif } else { m_SizeOfEditAndContinuePreservedArea = NO_SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA; +#ifdef TARGET_ARM64 + m_SizeOfEditAndContinueFixedStackFrame = 0; +#endif } if (hasReversePInvokeFrame) @@ -565,6 +571,14 @@ UINT32 GcInfoDecoder::GetSizeOfEditAndContinuePreservedArea() return m_SizeOfEditAndContinuePreservedArea; } +#ifdef TARGET_ARM64 +UINT32 GcInfoDecoder::GetSizeOfEditAndContinueFixedStackFrame() +{ + _ASSERTE( m_Flags & DECODE_EDIT_AND_CONTINUE ); + return m_SizeOfEditAndContinueFixedStackFrame; +} +#endif + size_t GcInfoDecoder::GetNumBytesRead() { return (m_Reader.GetCurrentPos() + 7) / 8; diff --git a/src/coreclr/vm/jitinterface.cpp b/src/coreclr/vm/jitinterface.cpp index cb4f697d8ba99..804ee2e274fe3 100644 --- a/src/coreclr/vm/jitinterface.cpp +++ b/src/coreclr/vm/jitinterface.cpp @@ -14547,8 +14547,8 @@ ULONG EECodeInfo::GetFixedStackSize() #define kRBP 5 // The information returned by this method is only valid if we are not in a prolog or an epilog. -// Since this method is only used for the security stackwalk cache, this assumption is valid, since -// we cannot make a call in a prolog or an epilog. +// Since this method is only used for the security stackwalk cache and EnC transition, this assumption is +// valid, since we cannot see these in a prolog or an epilog. // // The next assumption is that only rbp is used as a frame register in jitted code. There is an // assert below to guard this assumption. 
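
The frame invariants this patch documents (RSP == RBP on x64, FP + 16 == SP + fixed frame size on ARM64) can be illustrated with a small standalone sketch. Everything below is invented for illustration: the struct, the helper names and the example numbers are not part of the runtime, and the real checks in EECodeManager::FixContextForEnC operate on the platform CONTEXT, EECodeInfo and the GcInfoDecoder fields added above.

#include <stdint.h>
#include <stdio.h>

// Hypothetical, simplified view of the quantities FixContextForEnC works with.
struct EncFrameSnapshot
{
    uintptr_t sp;             // SP at the remap point
    uintptr_t fp;             // RBP on x64, FP (x29) on ARM64
    uintptr_t fixedFrameSize; // x64: from unwind info; ARM64: from the new GC info field
};

// Caller-SP is the anchor that survives the edit; the new frame is laid out
// as callerSP - newFixedFrameSize.
static uintptr_t CallerSP(const EncFrameSnapshot& f)
{
    return f.sp + f.fixedFrameSize;
}

// Remap is refused once localloc has run, detected purely from the invariants:
//  - x64 EnC code keeps RSP == RBP until a localloc moves RSP;
//  - ARM64 EnC code pins the FP/LR pair at callerSP - 16, so
//    FP + 16 == SP + fixedFrameSize holds until a localloc moves SP.
static bool LocallocHasRun(const EncFrameSnapshot& f, bool isArm64)
{
    if (isArm64)
        return f.fp + 16 != f.sp + f.fixedFrameSize;

    return f.sp != f.fp;
}

int main()
{
    // ARM64 example: a prolog of "stp fp, lr, [sp,#-16]!; mov fp, sp; sub sp, sp, #144"
    // yields a 160-byte fixed frame with FP sitting 16 bytes below caller-SP.
    EncFrameSnapshot arm64;
    arm64.fixedFrameSize = 160;
    arm64.sp             = 0x7000;
    arm64.fp             = arm64.sp + arm64.fixedFrameSize - 16;

    printf("callerSP=0x%zx locallocRan=%d\n",
           (size_t)CallerSP(arm64), (int)LocallocHasRun(arm64, /*isArm64*/ true));
    return 0;
}

The common idea is that caller-SP, not SP or FP, is the stable anchor across an edit: x64 can recover the fixed frame size from unwind codes, while ARM64 cannot, which is exactly why this patch records it in GC info via SetSizeOfEditAndContinueFixedStackFrame / GetSizeOfEditAndContinueFixedStackFrame.
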
From 501531f5f560306c81d3c0641f187a05c0a224cb Mon Sep 17 00:00:00 2001 From: Jakob Botsch Nielsen Date: Mon, 23 May 2022 19:25:45 +0200 Subject: [PATCH 2/4] Run jit-format --- src/coreclr/jit/codegenarm.cpp | 9 ++++----- src/coreclr/jit/codegenarmarch.cpp | 6 ++++-- src/coreclr/jit/codegenxarch.cpp | 3 ++- src/coreclr/jit/flowgraph.cpp | 9 +++++---- src/coreclr/jit/lclvars.cpp | 4 ++-- 5 files changed, 17 insertions(+), 14 deletions(-) diff --git a/src/coreclr/jit/codegenarm.cpp b/src/coreclr/jit/codegenarm.cpp index cacfa8af3f4c9..f3674d33660a3 100644 --- a/src/coreclr/jit/codegenarm.cpp +++ b/src/coreclr/jit/codegenarm.cpp @@ -2454,17 +2454,16 @@ void CodeGen::genCaptureFuncletPrologEpilogInfo() unsigned preSpillRegArgSize = genCountBits(regSet.rsMaskPreSpillRegs(true)) * REGSIZE_BYTES; genFuncletInfo.fiFunctionCallerSPtoFPdelta = preSpillRegArgSize + 2 * REGSIZE_BYTES; - regMaskTP rsMaskSaveRegs = regSet.rsMaskCalleeSaved; - unsigned saveRegsCount = genCountBits(rsMaskSaveRegs); - unsigned saveRegsSize = saveRegsCount * REGSIZE_BYTES; // bytes of regs we're saving + regMaskTP rsMaskSaveRegs = regSet.rsMaskCalleeSaved; + unsigned saveRegsCount = genCountBits(rsMaskSaveRegs); + unsigned saveRegsSize = saveRegsCount * REGSIZE_BYTES; // bytes of regs we're saving unsigned saveSizeWithPSP = saveRegsSize + REGSIZE_BYTES /* PSP sym */; if (compiler->lvaMonAcquired != BAD_VAR_NUM) { saveSizeWithPSP += TARGET_POINTER_SIZE; } assert(compiler->lvaOutgoingArgSpaceSize % REGSIZE_BYTES == 0); - unsigned funcletFrameSize = - preSpillRegArgSize + saveSizeWithPSP + compiler->lvaOutgoingArgSpaceSize; + unsigned funcletFrameSize = preSpillRegArgSize + saveSizeWithPSP + compiler->lvaOutgoingArgSpaceSize; unsigned funcletFrameSizeAligned = roundUp(funcletFrameSize, STACK_ALIGN); unsigned funcletFrameAlignmentPad = funcletFrameSizeAligned - funcletFrameSize; diff --git a/src/coreclr/jit/codegenarmarch.cpp b/src/coreclr/jit/codegenarmarch.cpp index 481e3f2d79b19..6a2ea012d1966 100644 --- a/src/coreclr/jit/codegenarmarch.cpp +++ b/src/coreclr/jit/codegenarmarch.cpp @@ -4174,7 +4174,8 @@ void CodeGen::genCreateAndStoreGCInfo(unsigned codeSize, if (compiler->info.compFlags & CORINFO_FLG_SYNCH) { - // bool in synchronized methods that tracks whether the lock has been taken (takes a full pointer sized slot) + // bool in synchronized methods that tracks whether the lock has been taken (takes a full pointer sized + // slot) preservedAreaSize += TARGET_POINTER_SIZE; // Verify that MonAcquired bool is at the bottom of the frame header @@ -5134,7 +5135,8 @@ void CodeGen::genPushCalleeSavedRegisters() JITDUMP(" remainingFrameSz=%d\n", remainingFrameSz); - // We've already established the frame pointer, so no need to report the stack pointer change to unwind info. + // We've already established the frame pointer, so no need to report the stack pointer change to unwind + // info. 
genStackPointerAdjustment(-remainingFrameSz, initReg, pInitRegZeroed, /* reportUnwindData */ false); offset += remainingFrameSz; } diff --git a/src/coreclr/jit/codegenxarch.cpp b/src/coreclr/jit/codegenxarch.cpp index ea58abb15433b..bab2c05407b74 100644 --- a/src/coreclr/jit/codegenxarch.cpp +++ b/src/coreclr/jit/codegenxarch.cpp @@ -8444,7 +8444,8 @@ void CodeGen::genCreateAndStoreGCInfoX64(unsigned codeSize, unsigned prologSize if (compiler->info.compFlags & CORINFO_FLG_SYNCH) { - // bool in synchronized methods that tracks whether the lock has been taken (takes a full pointer sized slot) + // bool in synchronized methods that tracks whether the lock has been taken (takes a full pointer sized + // slot) preservedAreaSize += TARGET_POINTER_SIZE; // Verify that MonAcquired bool is at the bottom of the frame header diff --git a/src/coreclr/jit/flowgraph.cpp b/src/coreclr/jit/flowgraph.cpp index 09293af0109ca..be024611bf3bf 100644 --- a/src/coreclr/jit/flowgraph.cpp +++ b/src/coreclr/jit/flowgraph.cpp @@ -1836,7 +1836,8 @@ void Compiler::fgAddSyncMethodEnterExit() } // exceptional case - fgCreateMonitorTree(lvaMonAcquired, lvaCopyThis != BAD_VAR_NUM ? lvaCopyThis : info.compThisArg, faultBB, false /*exit*/); + fgCreateMonitorTree(lvaMonAcquired, lvaCopyThis != BAD_VAR_NUM ? lvaCopyThis : info.compThisArg, faultBB, + false /*exit*/); // non-exceptional cases for (BasicBlock* const block : Blocks()) @@ -1860,9 +1861,9 @@ GenTree* Compiler::fgCreateMonitorTree(unsigned lvaMonAcquired, unsigned lvaThis { // Insert the expression "enter/exitCrit(this, &acquired)" or "enter/exitCrit(handle, &acquired)" - GenTree* varNode = gtNewLclvNode(lvaMonAcquired, lvaGetDesc(lvaMonAcquired)->TypeGet()); - GenTree* varAddrNode = gtNewOperNode(GT_ADDR, TYP_BYREF, varNode); - GenTree* tree; + GenTree* varNode = gtNewLclvNode(lvaMonAcquired, lvaGetDesc(lvaMonAcquired)->TypeGet()); + GenTree* varAddrNode = gtNewOperNode(GT_ADDR, TYP_BYREF, varNode); + GenTree* tree; if (info.compIsStatic) { diff --git a/src/coreclr/jit/lclvars.cpp b/src/coreclr/jit/lclvars.cpp index 0b3d3750b6ce8..85b24c6373e73 100644 --- a/src/coreclr/jit/lclvars.cpp +++ b/src/coreclr/jit/lclvars.cpp @@ -6315,8 +6315,8 @@ void Compiler::lvaAssignVirtualFrameOffsetsToLocals() if (opts.compJitSaveFpLrWithCalleeSavedRegisters == 0) { // Default configuration - codeGen->SetSaveFpLrWithAllCalleeSavedRegisters((getNeedsGSSecurityCookie() && compLocallocUsed) || opts.compDbgEnC || - compStressCompile(STRESS_GENERIC_VARN, 20)); + codeGen->SetSaveFpLrWithAllCalleeSavedRegisters((getNeedsGSSecurityCookie() && compLocallocUsed) || + opts.compDbgEnC || compStressCompile(STRESS_GENERIC_VARN, 20)); } else if (opts.compJitSaveFpLrWithCalleeSavedRegisters == 1) { From eaf7c6948b474f5c166dce2a6725401ff666893b Mon Sep 17 00:00:00 2001 From: Jakob Botsch Nielsen Date: Mon, 23 May 2022 22:02:14 +0200 Subject: [PATCH 3/4] Feedback --- src/coreclr/vm/encee.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/coreclr/vm/encee.cpp b/src/coreclr/vm/encee.cpp index f704fce36e891..301101d7a1f51 100644 --- a/src/coreclr/vm/encee.cpp +++ b/src/coreclr/vm/encee.cpp @@ -713,7 +713,7 @@ HRESULT EditAndContinueModule::ResumeInUpdatedFunction( // Win32 handlers on the stack so cannot ever return from this function. 
EEPOLICY_HANDLE_FATAL_ERROR(CORDBG_E_ENC_INTERNAL_ERROR); return hr; -#endif // #if defined(TARGET_ARM) +#endif // #if defined(TARGET_ARM) || defined(TARGET_LOONGARCH64) } //--------------------------------------------------------------------------------------- From fc628b58675f62918eea5c78e6fbf61ae9b885f5 Mon Sep 17 00:00:00 2001 From: Jakob Botsch Nielsen Date: Mon, 23 May 2022 22:02:20 +0200 Subject: [PATCH 4/4] Fix funclets for OSR methods with monitor-acquired bool OSR methods reuse the bool created by the tier 0 frame so we should not allocate space for it. --- src/coreclr/jit/codegenarm64.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/coreclr/jit/codegenarm64.cpp b/src/coreclr/jit/codegenarm64.cpp index f23c733e48cdd..d03ac91781f5f 100644 --- a/src/coreclr/jit/codegenarm64.cpp +++ b/src/coreclr/jit/codegenarm64.cpp @@ -1692,10 +1692,11 @@ void CodeGen::genCaptureFuncletPrologEpilogInfo() saveRegsPlusPSPSize += MAX_REG_ARG * REGSIZE_BYTES; } - if (compiler->lvaMonAcquired != BAD_VAR_NUM) + if (compiler->lvaMonAcquired != BAD_VAR_NUM && !compiler->opts.IsOSR()) { // We furthermore allocate the "monitor acquired" bool between PSP and // the saved registers because this is part of the EnC header. + // Note that OSR methods reuse the monitor bool created by tier 0. saveRegsPlusPSPSize += compiler->lvaLclSize(compiler->lvaMonAcquired); }
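
To summarize what the ARM64 JIT now reports for EnC, here is a minimal sketch of the preserved-area ("frame header") size computation that genCreateAndStoreGCInfo performs in this series. The constants and the helper below are hypothetical stand-ins that mirror the diff (RBM_ENC_CALLEE_SAVED is empty on ARM64, varargs methods additionally spill x0-x7, and synchronized methods add a pointer-sized MonitorAcquired slot); they are not the actual JIT definitions.

#include <stdio.h>

// Illustrative constants; the real values come from targetarm64.h and friends.
const int REGSIZE_BYTES       = 8;
const int TARGET_POINTER_SIZE = 8;
const int MAX_REG_ARG         = 8; // x0-x7 spilled for varargs
const int ENC_CALLEE_SAVED    = 0; // ARM64 EnC saves nothing beyond FP/LR

// Size of the region that must keep its size and layout across an EnC remap,
// as reported via SetSizeOfEditAndContinuePreservedArea.
static int EncPreservedAreaSize(bool isVarArgs, bool isSynchronized)
{
    // The saved FP/LR pair is always part of the frame header.
    int size = (2 + ENC_CALLEE_SAVED) * REGSIZE_BYTES;

    // Varargs methods spill the integer argument registers between the
    // callee-saves and the PSPSym.
    if (isVarArgs)
        size += MAX_REG_ARG * REGSIZE_BYTES;

    // Synchronized methods keep the MonitorAcquired bool, padded to a full
    // pointer-sized slot, at the bottom of the header so it survives remaps.
    if (isSynchronized)
        size += TARGET_POINTER_SIZE;

    return size;
}

int main()
{
    printf("plain:        %d bytes\n", EncPreservedAreaSize(false, false)); // 16
    printf("synchronized: %d bytes\n", EncPreservedAreaSize(false, true));  // 24
    printf("varargs:      %d bytes\n", EncPreservedAreaSize(true, false));  // 80
    return 0;
}

The total fixed frame size reported right next to it via SetSizeOfEditAndContinueFixedStackFrame is a separate number: it covers the whole frame, while the preserved area covers only the header that has to stay identical in size across a remap so that the VM can transition between the old and new frames.
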