-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix LSRA handling of arm32 double registers and spilling #94947
Fix LSRA handling of arm32 double registers and spilling #94947
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsIf LSRA uses
|
/azp run runtime-coreclr outerloop, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 3 pipeline(s). |
/azp run runtime-coreclr superpmi-diffs |
Azure Pipelines successfully started running 1 pipeline(s). |
All the test failures are known or infra |
fyi, I found this testing |
@kunalspathak PTAL |
@@ -12457,6 +12457,14 @@ regMaskTP LinearScan::RegisterSelection::select(Interval* currentInterval, | |||
regMaskTP busyRegs = linearScan->regsBusyUntilKill | linearScan->regsInUseThisLocation; | |||
candidates &= ~busyRegs; | |||
|
|||
#ifdef TARGET_ARM | |||
// For TYP_DOUBLE on ARM, we can only use register for which the odd half is also available. | |||
if (currentInterval->registerType == TYP_DOUBLE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If busyRegs
does not contain both registers of the pairs, a more appropriate fix would be to find place where we miss setting 1 of the register and fix it. When we set regMask
in regsBusyUntilKill
or regsInUseThisLocation
, we mostly use getRegMask
which takes into account the "double register" as well.
runtime/src/coreclr/jit/lsra.h
Lines 1740 to 1746 in 740ecd0
#ifdef TARGET_ARM | |
if (regType == TYP_DOUBLE) | |
{ | |
assert(genIsValidDoubleReg(reg)); | |
regMask |= (regMask << 1); | |
} | |
#endif // TARGET_ARM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The model used for arm32 doubles is possibly inconsistent: it seems like only the low register of the pair is set in some (most) cases, whereas in getRegMask
both are set (IMO, that function has poorly named given genRegMask
exists). Also, it's not clear when LSRA should use genRegMask
versus getRegMask
.
However, here, busyRegs
is correct: only one of two of a double pair are busy. But because we're asking for a double pair, we need to avoid it if the pair is not fully available.
The example is that when we're here allocating a double, the candidates are all the even registers (the odd registers are not candidates, although when we allocate an even register, we implicitly also allocate the odd register). So, say f2
is a candidate, and f3
is a busy register. we need to remove f2
from the candidate set due to f3
being busy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
genRegMask
versusgetRegMask
I think we should use getRegMask(reg, var_type)
version and delete genRegMask(reg, var_type)
. They are doing same thing.
So, say f2 is a candidate, and f3 is a busy register. we need to remove f2 from the candidate set due to f3 being busy.
Correct, thats what I expect, but it should happen at place that added f3
as busy register. Do you know which of regsBusyUntilKill
or regsInUseThisLocation
does it gets marked as busy? The change looks correct, but I want to make sure we do not accidently mark a register (probably in jitstressregs) not part of current even/odd pair as busy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case was regsBusyUntilKill
.
but it should happen at place that added f3 as busy register
If f3
is marked as busy because it is a float (single-precision) type, then you wouldn't want to mark f2
busy at that point. It's only when we're looking for candidates for a double that we need to mask off both f2
and f3
if f3
is already marked as busy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind sharing the repro or the JitDump file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sent a link to JitDump offline.
You can also repro locally using:
py src\coreclr\scripts\superpmi.py replay -f coreclr_tests -arch x86 -target_arch arm -target_os linux -jitoption JitOptRepeat#* -jitoption JitOptRepeatCount#2 -jitoption JitStressRegs#3
And looking for:
Assertion failed '!isRegBusy(physRegRecord->regNum, current->registerType)'
(for me, MC#459399)
(I’ve fixed the other asserts in #94250)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to repro and see how f3
ends up in regsBusyUntilKill
. I am ok with this fix, but can we add assert((busyRegs & 0x555555550000) == RBM_NONE)
which will confirm at least that we never have f2
in this mask and hence we don't accidently remove f1
from the mask?
@kunalspathak I updated the comment |
can you also explain the implication of |
freeRegs = freeRegs & ((freeRegs << 1) & RBM_ALLDOUBLE); | ||
// Exclude any doubles for which the odd half isn't in freeRegs, | ||
// and restrict down to just the even part of the even/odd pair. | ||
freeRegs &= (freeRegs & RBM_ALLDOUBLE_HIGH) >> 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better understanding this concept, should we create a function or macro something like:
#define RBM_ALLDOUBLE_EVEN(mask) (mask & RBM_ALLDOUBLE_HIGH) >> 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm... that seems very confusing to me. RBM_ALLDOUBLE
already is the even or "low" registers of the pairs.
This change fixes a bug I noticed just by inspection, and leads to some diffs. I'm still investigating, but I also found some pre-existing corruption in double register output in double register disasm, so I'm continuing down the rabbit hole...
If LSRA uses `regsBusyUntilKill` to eliminate registers from consideration for spilling, for spilling arm32 double type registers, it needs to remove the even registers from the candidates. For example, if `regsBusyUntilKill` contains `f3`, candidates also needs to remove `f2`, since double register candidates only contains the even registers to represent to pair of registers for an arm32 double.
register part of ARM double registers even/odd pairs.
76f0dff
to
ef25ee6
Compare
I fixed a bug with disasm output of arm double registers d0-d15. There are register allocation diffs due to the @kunalspathak PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If LSRA uses
regsBusyUntilKill
to eliminate registers from consideration for spilling, for spilling arm32 double type registers, it needs to remove the even registers from the candidates. For example, ifregsBusyUntilKill
containsf3
, candidates also needs to removef2
, since double register candidates only contains the even registers to represent to pair of registers for an arm32 double.