Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add arm64 support for EnC #69679

Merged
merged 4 commits into from
May 23, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 13 additions & 8 deletions docs/design/coreclr/botr/clr-abi.md
Original file line number Diff line number Diff line change
Expand Up @@ -653,41 +653,46 @@ Edit and Continue (EnC) is a special flavor of un-optimized code. The debugger h

In the current design, the JIT does not have access to the previous versions of the method and so it has to assume the worst case. EnC is designed for simplicity, not for performance of the generated code.

EnC is currently enabled on x86 and x64 only, but the same principles would apply if it is ever enabled on other platforms.
EnC is currently enabled on x86, x64 and ARM64 only, but the same principles would apply if it is ever enabled on other platforms.

The following sections describe the various Edit and Continue code conventions that must be followed.

## EnC flag in GCInfo

The JIT records the fact that it has followed conventions for EnC code in GC Info. On x64, this flag is implied by recording the size of the stack frame region preserved between EnC edits (`GcInfoEncoder::SetSizeOfEditAndContinuePreservedArea`). For normal methods on JIT64, the size of this region is 2 slots (saved `RBP` and return address). On RyuJIT/AMD64, the size of this region is increased to include `RSI` and `RDI`, so that `rep stos` can be used for block initialization and block moves.
The JIT records the fact that it has followed conventions for EnC code in GC Info. On x64/ARM64, this flag is implied by recording the size of the stack frame region preserved between EnC edits (`GcInfoEncoder::SetSizeOfEditAndContinuePreservedArea`). For x64 the size of this region is increased to include `RSI` and `RDI`, so that `rep stos` can be used for block initialization and block moves. ARM64 saves only the FP and LR registers when EnC is enabled and does not use other callee saved registers.

To successfully perform EnC transitions the runtime needs to know the size of the stack frames it is transitioning between. For x64 code the size can be extracted from unwind codes. This is not possible for arm64 code as the frames are set up in such a way that the unwind codes do not allow to retrieve this value. Therefore, on ARM64 the GC info contains also the size of the fixed stack frame to be used for EnC purposes.

## Allocating local variables backward

This is required to preserve addresses of the existing locals when an EnC edit appends new ones. In other words, the first local must be allocated at the highest stack address. Special care has to be taken to deal with alignment. The total size of the method frame can either grow (more locals added) or shrink (fewer temps needed) after the edit. The VM zeros out newly added locals.

## Fixed set of callee-saved registers

This eliminates need to deal with the different sets in the VM, and makes preservation of local addresses easier. On x64, we choose to always save `RBP` only. There are plenty of volatile registers and so lack of non-volatile registers does not impact quality of non-optimized code.
This eliminates need to deal with the different sets in the VM, and makes preservation of local addresses easier. There are plenty of volatile registers and so lack of non-volatile registers does not heavily impact quality of non-optimized code.
x64 currently saves RBP, RSI and RDI while ARM64 saves just FP and LR.

## EnC is supported for methods with EH

However, EnC remap is not supported inside funclets. The stack layout of funclets does not matter for EnC.

## Initial RSP == RBP == PSPSym
## Considerations with regards to PSPSym

This invariant allows VM to compute new value of `RBP` and PSPSym after the edit without any additional information. Location of PSPSym is found via GC info.
As explained previously in this document, on x64 we have Initial RSP == PSPSym. For EnC methods, as we disallow remappings after localloc (see below), we furthermore have RBP == PSPSym.
For ARM64 we have Caller SP == PSPSym and the FP points to the previously saved FP/LR pair. For EnC the JIT always sets up the stack frame so that the FP/LR pair is at Caller SP - 16 and does not save any additional callee saves.
These invariants allow the VM to compute new value of the frame pointer and PSPSym after the edit without any additional information. Note that the frame pointer and PSPSym do not change values or location on ARM64. However, EH may be added to a function in which case a new PSPSym needs to be materialized, even on ARM64. Location of PSPSym is found via GC info.

## Localloc

Localloc is allowed in EnC code, but remap is disallowed after the method has executed a localloc instruction. VM uses the invariant above (`RSP == RBP`) to detect whether localloc was executed by the method.
Localloc is allowed in EnC code, but remap is disallowed after the method has executed a localloc instruction. VM uses the invariants above (`RSP == RBP` on x64, `FP + 16 == SP + stack size` on ARM64) to detect whether localloc was executed by the method.

## Security object

This does not require any special handling by the JIT on x64. (Different from x86). The security object is copied over by the VM during remap if necessary. Location of security object is found via GC info.
This does not require any special handling by the JIT on x64/arm64. (Different from x86). The security object is copied over by the VM during remap if necessary. Location of security object is found via GC info.

## Synchronized methods

The extra state created by the JIT for synchronized methods (original `this` and lock taken flag) must be preserved during remap. The JIT stores this state in the preserved region, and increases the size of the preserved region reported in GC info accordingly.
The extra state created by the JIT for synchronized methods (lock taken flag) must be preserved during remap. The JIT stores this state in the preserved region, and increases the size of the preserved region reported in GC info accordingly.

## Generics

Expand Down
4 changes: 2 additions & 2 deletions src/coreclr/clrdefinitions.cmake
Original file line number Diff line number Diff line change
Expand Up @@ -56,11 +56,11 @@ endif(CLR_CMAKE_HOST_WIN32)
if (NOT (CLR_CMAKE_TARGET_ARCH_I386 AND CLR_CMAKE_TARGET_UNIX))
add_compile_definitions(EnC_SUPPORTED)
endif()
if(CLR_CMAKE_TARGET_ARCH_AMD64 OR (CLR_CMAKE_TARGET_ARCH_I386 AND CLR_CMAKE_TARGET_WIN32))
if(CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_I386 OR CLR_CMAKE_TARGET_ARCH_ARM64)
if(CLR_CMAKE_TARGET_WIN32)
add_compile_definitions(FEATURE_ENC_SUPPORTED)
endif(CLR_CMAKE_TARGET_WIN32)
endif(CLR_CMAKE_TARGET_ARCH_AMD64 OR (CLR_CMAKE_TARGET_ARCH_I386 AND CLR_CMAKE_TARGET_WIN32))
endif(CLR_CMAKE_TARGET_ARCH_AMD64 OR CLR_CMAKE_TARGET_ARCH_I386 OR CLR_CMAKE_TARGET_ARCH_ARM64)

# Features - please keep them alphabetically sorted
if(CLR_CMAKE_TARGET_WIN32)
Expand Down
11 changes: 9 additions & 2 deletions src/coreclr/debug/ee/arm64/arm64walker.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -127,15 +127,22 @@ BYTE* NativeWalker::SetupOrSimulateInstructionForPatchSkip(T_CONTEXT * context,
offset = offset | 0xFFFFFFFFFFE00000;
}

PCODE finalAddr;
if ((opcode & 0x80000000) != 0) //ADRP
{
offset = offset << 12;
LOG((LF_CORDB, LL_INFO100000, "Arm64Walker::Simulate opcode: %x to ADRP X%d %p\n", opcode, RegNum, offset));
finalAddr = (PCODE)(address + offset) & ~0xFFF;
LOG((LF_CORDB, LL_INFO100000, "Arm64Walker::Simulate opcode: %x to ADRP X%d %p finalAddr = %p\n", opcode, RegNum, offset, finalAddr));
}
else
{
LOG((LF_CORDB, LL_INFO100000, "Arm64Walker::Simulate opcode: %x to ADR X%d %p\n", opcode, RegNum, offset));
finalAddr = (PCODE)(address + offset);
LOG((LF_CORDB, LL_INFO100000, "Arm64Walker::Simulate opcode: %x to ADR X%d %p finalAddr = %p\n", opcode, RegNum, offset, finalAddr));
}

CORDbgSetInstruction((CORDB_ADDRESS_TYPE *)patchBypass, 0xd503201f); //Add Nop in buffer
SetReg(context, RegNum, finalAddr);
return patchBypass;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am including a bugfix to the stepper. On arm64 we would not handle adr/adrp correctly which causes issues when running diagnostics tests.


else if ((opcode & 0x3B000000) == 0x18000000) //LDR Literal (General or SIMD)
Expand Down
19 changes: 16 additions & 3 deletions src/coreclr/gcinfo/gcinfoencoder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -344,7 +344,7 @@ GcInfoSize& GcInfoSize::operator+=(const GcInfoSize& other)
NumUntracked += other.NumUntracked;
NumTransitions += other.NumTransitions;
SizeOfCode += other.SizeOfCode;
EncPreservedSlots += other.EncPreservedSlots;
EncInfoSize += other.EncInfoSize;

UntrackedSlotSize += other.UntrackedSlotSize;
NumUntrackedSize += other.NumUntrackedSize;
Expand Down Expand Up @@ -392,7 +392,7 @@ void GcInfoSize::Log(DWORD level, const char * header)
LogSpew(LF_GCINFO, level, "NumUntracked: %Iu\n", NumUntracked);
LogSpew(LF_GCINFO, level, "NumTransitions: %Iu\n", NumTransitions);
LogSpew(LF_GCINFO, level, "SizeOfCode: %Iu\n", SizeOfCode);
LogSpew(LF_GCINFO, level, "EncPreservedSlots: %Iu\n", EncPreservedSlots);
LogSpew(LF_GCINFO, level, "EncInfoSize: %Iu\n", EncInfoSize);

LogSpew(LF_GCINFO, level, "---SIZES(bits)---\n");
LogSpew(LF_GCINFO, level, "Total: %Iu\n", TotalSize);
Expand Down Expand Up @@ -484,6 +484,9 @@ GcInfoEncoder::GcInfoEncoder(

m_StackBaseRegister = NO_STACK_BASE_REGISTER;
m_SizeOfEditAndContinuePreservedArea = NO_SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA;
#ifdef TARGET_ARM64
m_SizeOfEditAndContinueFixedStackFrame = 0;
#endif
m_ReversePInvokeFrameSlot = NO_REVERSE_PINVOKE_FRAME;
#ifdef TARGET_AMD64
m_WantsReportOnlyLeaf = false;
Expand Down Expand Up @@ -760,6 +763,13 @@ void GcInfoEncoder::SetSizeOfEditAndContinuePreservedArea( UINT32 slots )
m_SizeOfEditAndContinuePreservedArea = slots;
}

#ifdef TARGET_ARM64
void GcInfoEncoder::SetSizeOfEditAndContinueFixedStackFrame( UINT32 size )
{
m_SizeOfEditAndContinueFixedStackFrame = size;
}
#endif

#ifdef TARGET_AMD64
void GcInfoEncoder::SetWantsReportOnlyLeaf()
{
Expand Down Expand Up @@ -1157,7 +1167,10 @@ void GcInfoEncoder::Build()

if (m_SizeOfEditAndContinuePreservedArea != NO_SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA)
{
GCINFO_WRITE_VARL_U(m_Info1, m_SizeOfEditAndContinuePreservedArea, SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA_ENCBASE, EncPreservedSlots);
GCINFO_WRITE_VARL_U(m_Info1, m_SizeOfEditAndContinuePreservedArea, SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA_ENCBASE, EncInfoSize);
#ifdef TARGET_ARM64
GCINFO_WRITE_VARL_U(m_Info1, m_SizeOfEditAndContinueFixedStackFrame, SIZE_OF_EDIT_AND_CONTINUE_FIXED_STACK_FRAME_ENCBASE, EncInfoSize);
#endif
}

if (hasReversePInvokeFrame)
Expand Down
8 changes: 7 additions & 1 deletion src/coreclr/inc/gcinfodecoder.h
Original file line number Diff line number Diff line change
Expand Up @@ -238,7 +238,7 @@ enum GcInfoHeaderFlags
#elif defined(TARGET_ARM) || defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64)
GC_INFO_HAS_TAILCALLS = 0x80,
#endif // TARGET_AMD64
GC_INFO_HAS_EDIT_AND_CONTINUE_PRESERVED_SLOTS = 0x100,
GC_INFO_HAS_EDIT_AND_CONTINUE_INFO = 0x100,
GC_INFO_REVERSE_PINVOKE_FRAME = 0x200,

GC_INFO_FLAGS_BIT_SIZE_VERSION_1 = 9,
Expand Down Expand Up @@ -547,6 +547,9 @@ class GcInfoDecoder
UINT32 GetCodeLength();
UINT32 GetStackBaseRegister();
UINT32 GetSizeOfEditAndContinuePreservedArea();
#ifdef TARGET_ARM64
UINT32 GetSizeOfEditAndContinueFixedStackFrame();
#endif
size_t GetNumBytesRead();

#ifdef FIXED_STACK_PARAMETER_SCRATCH_AREA
Expand Down Expand Up @@ -578,6 +581,9 @@ class GcInfoDecoder
UINT32 m_CodeLength;
UINT32 m_StackBaseRegister;
UINT32 m_SizeOfEditAndContinuePreservedArea;
#ifdef TARGET_ARM64
UINT32 m_SizeOfEditAndContinueFixedStackFrame;
#endif
ReturnKind m_ReturnKind;
#ifdef PARTIALLY_INTERRUPTIBLE_GC_SUPPORTED
UINT32 m_NumSafePoints;
Expand Down
8 changes: 7 additions & 1 deletion src/coreclr/inc/gcinfoencoder.h
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,7 @@ struct GcInfoSize
size_t NumUntracked;
size_t NumTransitions;
size_t SizeOfCode;
size_t EncPreservedSlots;
size_t EncInfoSize;

size_t UntrackedSlotSize;
size_t NumUntrackedSize;
Expand Down Expand Up @@ -434,6 +434,9 @@ class GcInfoEncoder

// Number of slots preserved during EnC remap
void SetSizeOfEditAndContinuePreservedArea( UINT32 size );
#ifdef TARGET_ARM64
void SetSizeOfEditAndContinueFixedStackFrame( UINT32 size );
#endif

#ifdef TARGET_AMD64
// Used to only report a frame once for the leaf function/funclet
Expand Down Expand Up @@ -510,6 +513,9 @@ class GcInfoEncoder
UINT32 m_CodeLength;
UINT32 m_StackBaseRegister;
UINT32 m_SizeOfEditAndContinuePreservedArea;
#ifdef TARGET_ARM64
UINT32 m_SizeOfEditAndContinueFixedStackFrame;
#endif
INT32 m_ReversePInvokeFrameSlot;
InterruptibleRange* m_pLastInterruptibleRange;

Expand Down
1 change: 1 addition & 0 deletions src/coreclr/inc/gcinfotypes.h
Original file line number Diff line number Diff line change
Expand Up @@ -753,6 +753,7 @@ void FASTCALL decodeCallPattern(int pattern,
#define STACK_BASE_REGISTER_ENCBASE 2 // FP encoded as 0, SP as 2.
#define SIZE_OF_STACK_AREA_ENCBASE 3
#define SIZE_OF_EDIT_AND_CONTINUE_PRESERVED_AREA_ENCBASE 4
#define SIZE_OF_EDIT_AND_CONTINUE_FIXED_STACK_FRAME_ENCBASE 4
#define REVERSE_PINVOKE_FRAME_ENCBASE 6
#define NUM_REGISTERS_ENCBASE 3
#define NUM_STACK_SLOTS_ENCBASE 2
Expand Down
14 changes: 9 additions & 5 deletions src/coreclr/jit/codegenarm.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2454,12 +2454,16 @@ void CodeGen::genCaptureFuncletPrologEpilogInfo()
unsigned preSpillRegArgSize = genCountBits(regSet.rsMaskPreSpillRegs(true)) * REGSIZE_BYTES;
genFuncletInfo.fiFunctionCallerSPtoFPdelta = preSpillRegArgSize + 2 * REGSIZE_BYTES;

regMaskTP rsMaskSaveRegs = regSet.rsMaskCalleeSaved;
unsigned saveRegsCount = genCountBits(rsMaskSaveRegs);
unsigned saveRegsSize = saveRegsCount * REGSIZE_BYTES; // bytes of regs we're saving
regMaskTP rsMaskSaveRegs = regSet.rsMaskCalleeSaved;
unsigned saveRegsCount = genCountBits(rsMaskSaveRegs);
unsigned saveRegsSize = saveRegsCount * REGSIZE_BYTES; // bytes of regs we're saving
unsigned saveSizeWithPSP = saveRegsSize + REGSIZE_BYTES /* PSP sym */;
if (compiler->lvaMonAcquired != BAD_VAR_NUM)
{
saveSizeWithPSP += TARGET_POINTER_SIZE;
}
assert(compiler->lvaOutgoingArgSpaceSize % REGSIZE_BYTES == 0);
unsigned funcletFrameSize =
preSpillRegArgSize + saveRegsSize + REGSIZE_BYTES /* PSP slot */ + compiler->lvaOutgoingArgSpaceSize;
unsigned funcletFrameSize = preSpillRegArgSize + saveSizeWithPSP + compiler->lvaOutgoingArgSpaceSize;

unsigned funcletFrameSizeAligned = roundUp(funcletFrameSize, STACK_ALIGN);
unsigned funcletFrameAlignmentPad = funcletFrameSizeAligned - funcletFrameSize;
Expand Down
8 changes: 8 additions & 0 deletions src/coreclr/jit/codegenarm64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1692,6 +1692,14 @@ void CodeGen::genCaptureFuncletPrologEpilogInfo()
saveRegsPlusPSPSize += MAX_REG_ARG * REGSIZE_BYTES;
}

if (compiler->lvaMonAcquired != BAD_VAR_NUM && !compiler->opts.IsOSR())
{
// We furthermore allocate the "monitor acquired" bool between PSP and
// the saved registers because this is part of the EnC header.
// Note that OSR methods reuse the monitor bool created by tier 0.
saveRegsPlusPSPSize += compiler->lvaLclSize(compiler->lvaMonAcquired);
}

unsigned const saveRegsPlusPSPSizeAligned = roundUp(saveRegsPlusPSPSize, STACK_ALIGN);

assert(compiler->lvaOutgoingArgSpaceSize % REGSIZE_BYTES == 0);
Expand Down
58 changes: 42 additions & 16 deletions src/coreclr/jit/codegenarmarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4159,24 +4159,37 @@ void CodeGen::genCreateAndStoreGCInfo(unsigned codeSize,
{
// what we have to preserve is called the "frame header" (see comments in VM\eetwain.cpp)
// which is:
// -return address
// -saved off RBP
// -saved 'this' pointer and bool for synchronized methods
// -return address (saved lr)
// -saved off FP
// -all callee-preserved registers in case of varargs
// -saved bool for synchronized methods

// 4 slots for RBP + return address + RSI + RDI
int preservedAreaSize = 4 * REGSIZE_BYTES;
int preservedAreaSize = (2 + genCountBits(RBM_ENC_CALLEE_SAVED)) * REGSIZE_BYTES;

if (compiler->info.compIsVarArgs)
{
// Varargs save all int registers between saved registers and PSP.
preservedAreaSize += MAX_REG_ARG * REGSIZE_BYTES;
}

if (compiler->info.compFlags & CORINFO_FLG_SYNCH)
{
if (!(compiler->info.compFlags & CORINFO_FLG_STATIC))
preservedAreaSize += REGSIZE_BYTES;
// bool in synchronized methods that tracks whether the lock has been taken (takes a full pointer sized
// slot)
preservedAreaSize += TARGET_POINTER_SIZE;

preservedAreaSize += 1; // bool for synchronized methods
// Verify that MonAcquired bool is at the bottom of the frame header
assert(compiler->lvaGetCallerSPRelativeOffset(compiler->lvaMonAcquired) == -preservedAreaSize);
}

// Used to signal both that the method is compiled for EnC, and also the size of the block at the top of the
// frame
gcInfoEncoder->SetSizeOfEditAndContinuePreservedArea(preservedAreaSize);
gcInfoEncoder->SetSizeOfEditAndContinueFixedStackFrame(genTotalFrameSize());

JITDUMP("EnC info:\n");
JITDUMP(" EnC preserved area size = %d\n", preservedAreaSize);
JITDUMP(" Fixed stack frame size = %d\n", genTotalFrameSize());
}

#endif // TARGET_ARM64
Expand Down Expand Up @@ -4694,6 +4707,8 @@ void CodeGen::genPushCalleeSavedRegisters()
// |-----------------------|
// |Callee saved registers | // not including FP/LR; multiple of 8 bytes
// |-----------------------|
// | MonitorAcquired |
// |-----------------------|
// | PSP slot | // 8 bytes (omitted in NativeAOT ABI)
// |-----------------------|
// | locals, temps, etc. |
Expand Down Expand Up @@ -4725,6 +4740,8 @@ void CodeGen::genPushCalleeSavedRegisters()
// |-----------------------|
// |Callee saved registers | // not including FP/LR; multiple of 8 bytes
// |-----------------------|
// | MonitorAcquired |
// |-----------------------|
// | PSP slot | // 8 bytes (omitted in NativeAOT ABI)
// |-----------------------|
// | locals, temps, etc. |
Expand Down Expand Up @@ -4822,7 +4839,7 @@ void CodeGen::genPushCalleeSavedRegisters()
maskSaveRegsInt &= ~(RBM_FP | RBM_LR); // We've already saved FP/LR
offset = (int)compiler->compLclFrameSize + 2 * REGSIZE_BYTES; // 2 for FP/LR
}
else if (totalFrameSize <= 512)
else if ((totalFrameSize <= 512) && !compiler->opts.compDbgEnC)
{
// Case #2.
//
Expand Down Expand Up @@ -4956,6 +4973,7 @@ void CodeGen::genPushCalleeSavedRegisters()
}
else
{
assert(!compiler->opts.compDbgEnC);
JITDUMP("Frame type 3 (save FP/LR at bottom). #outsz=%d; #framesz=%d; LclFrameSize=%d\n",
unsigned(compiler->lvaOutgoingArgSpaceSize), totalFrameSize, compiler->compLclFrameSize);

Expand Down Expand Up @@ -5110,15 +5128,23 @@ void CodeGen::genPushCalleeSavedRegisters()
establishFramePointer = false;

int remainingFrameSz = totalFrameSize - calleeSaveSpDelta;
assert(remainingFrameSz > 0);
assert((remainingFrameSz % 16) == 0); // this is guaranteed to be 16-byte aligned because each component --
// totalFrameSize and calleeSaveSpDelta -- is 16-byte aligned.
if (remainingFrameSz > 0)
{
assert((remainingFrameSz % 16) == 0); // this is guaranteed to be 16-byte aligned because each component --
// totalFrameSize and calleeSaveSpDelta -- is 16-byte aligned.

JITDUMP(" remainingFrameSz=%d\n", remainingFrameSz);
JITDUMP(" remainingFrameSz=%d\n", remainingFrameSz);

// We've already established the frame pointer, so no need to report the stack pointer change to unwind info.
genStackPointerAdjustment(-remainingFrameSz, initReg, pInitRegZeroed, /* reportUnwindData */ false);
offset += remainingFrameSz;
// We've already established the frame pointer, so no need to report the stack pointer change to unwind
// info.
genStackPointerAdjustment(-remainingFrameSz, initReg, pInitRegZeroed, /* reportUnwindData */ false);
offset += remainingFrameSz;
}
else
{
// Should only have picked this frame type for EnC.
assert(compiler->opts.compDbgEnC);
}
}
else
{
Expand Down
Loading