-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle more than 64 registers - Part 4 #102921
Conversation
This reverts commit d53b620. By not passing `regMaskTP` using constant reference, there is a small cost we have to pay: Without constant reference: Overall: 1.80% MinOpts: 2.05% FullOpts: 1.62% With constant reference: Overall: 1.74% MinOpts: 1.94% FullOpts: 1.6%
Moved the *= operators as instance of `regMaskTP` so the `.low` private field can directly be manipulated instead of converting the `64-bit` value in `regMaskTP` before doing any operation. Overall: 0.74% MinOpts: 0.82% FullOpts: 0.68%
@dotnet/jit-contrib @jakobbotsch PTAL |
#endif // TARGET_XARCH | ||
} | ||
#ifdef TARGET_ARM | ||
if (call->IsVirtualStub()) | ||
{ | ||
killMask |= compiler->virtualStubParamInfo->GetRegMask(); | ||
killMask.AddGprRegs(compiler->virtualStubParamInfo->GetRegMask()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this using AddGprRegs
? Isn't virtualStubParamInfo->GetRegMask()
returning a regMaskTP
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The GetRegMask()
returns _regMask_enum
which is uint64_t
. So this was the simplest thing to do ATM
@@ -911,15 +919,15 @@ regMaskTP LinearScan::getKillSetForBlockStore(GenTreeBlk* blkNode) | |||
if (isCopyBlk) | |||
{ | |||
// rep movs kills RCX, RDI and RSI | |||
killMask = RBM_RCX | RBM_RDI | RBM_RSI; | |||
killMask.AddGprRegs(RBM_RCX | RBM_RDI | RBM_RSI); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't RBM_*
regMaskTP
already? I don't really understand changes here (in multiple places it seems like we shouldn't need to switch to SingleRegTypeMask
overloads)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, all the RBM_*
are still uint64_t
except for allAvailableRegs
. Here is what I am planning to do after this PR:
- Send a PR that convert the
killMask
's registerAssignment toregMaskTP
(by unioning existing field or equivalent) - I am still deciding if we should convert all or just subset of
RBM_*
, but most likely I will convert just someRBM_*
toregMaskTP
that contains mask. All the remaining can continue to beuint64_t
. In future, if we feel a need for TP improvement reason, we can convert them all toregMaskTP
(once predicate register work is complete).
SingleTypeRegSet value = genSingleTypeRegMask(reg); | ||
#ifdef HAS_MORE_THAN_64_REGISTERS | ||
if (reg < 64) | ||
{ | ||
low |= value; | ||
} | ||
else | ||
{ | ||
high |= value; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more efficient to inline the computation of value
in the "else" case as reg - 64
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The mask
value for registers > 63, will anyway start from 0x0
, so it should be fine, the way we are doing currently.
@@ -305,56 +364,69 @@ struct regMaskTP | |||
|
|||
SingleTypeRegSet GetPredicateRegSet() const | |||
{ | |||
#ifdef HAS_MORE_THAN_64_REGISTERS | |||
return getHigh(); | |||
#else | |||
return getLow(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't it be getLow() >> REG_P0
or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For x64, when we create the RBM_ALLMASK
, we already have bits set to 0 except the ones that represent mask registers, hence should be safe to just return getLow()
.
}; | ||
|
||
static regMaskTP operator^(regMaskTP first, regMaskTP second) | ||
static regMaskTP operator^(const regMaskTP& first, const regMaskTP& second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this make a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think so. I just wanted to make it consistent like other methods.
Do you know what's up with the linux-x64 TP diffs? We don't define Can you post the detailed TP diff after your latest changes? |
yes, that's a good question. Let me see the numbers with latest round and debug from there.
Will do. |
It's similar to what I posted in #102921 (comment). This is for overall collection, but for MinOpts, the
|
as we saw offline, this is the fact that our linux builds are not link time optimization enabled and hence give the difference because of new methods like |
allocateRegistersMinimalbulk of difference is coming from the fact that we are updating |
If I remove
|
#103058 should fix that. Let me know if you have more comments or is this good to go. |
#if defined(TARGET_XARCH) | ||
killMask &= ~(RBM_FLT_CALLEE_TRASH | RBM_MSK_CALLEE_TRASH); | ||
|
||
#ifdef TARGET_AMD64 | ||
killMask.RemoveRegsetForType(RBM_FLT_CALLEE_TRASH.getLow(), FloatRegisterType); | ||
killMask.RemoveRegsetForType(RBM_MSK_CALLEE_TRASH.getLow(), MaskRegisterType); | ||
#else | ||
killMask &= ~RBM_FLT_CALLEE_TRASH; | ||
killMask.RemoveRegsetForType(RBM_FLT_CALLEE_TRASH, FloatRegisterType); | ||
killMask &= ~RBM_MSK_CALLEE_TRASH; | ||
#endif // TARGET_AMD64 | ||
|
||
#else | ||
killMask.RemoveRegsetForType(RBM_FLT_CALLEE_TRASH, FloatRegisterType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this will change back to what it was before once we change RBM_*
to be regMaskTP
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to block on it, but I guess it would be nice to make the RBM_*
change first, since it seems a lot of the changes of this PR end up having to deal with the fact that RBM_*
aren't regMaskTP
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, there are few places that we have to do, but at this point I think I will go ahead and merge this PR as-is and probably start working on RBM_
change next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We should open an issue about doing a more detailed investigation of where the throughput cost inside allocateRegistersMinimal
comes from since at this point we do not have a good understanding of that.
Opened #103079 |
This adds
high
field inregMaskTP
underHAS_MORE_THAN_64_REGISTERS
and update various code paths to should take into account querying or updating this field.I have made commits that are isolated, so putting here for reference for ease review.
[ 6f07fc3 ] (HEAD -> part4, kp/part4) jit format
[ 95ca449 ] genRegNumFromMask() to take SingleTypeRegSet
[ 7de39e7 ] Add assert in genFindLowestBit() and genMaxOneBit() to be only applicable for gpr/float
[ fb0c4ca ] bug fix
[ c72147f ] Initialize regMaskTP in genRegMask()
[ edccae2 ] Change some more methods to return SingleTypeRegMask
[ cedb079 ] Add various variants of genRegNumFromMask() and use them
[ 8b7084a ] Move registerType in Referencable, add getRegisterType()
[ 3741ac6 ] Move mapTypeToRegTypeIndex() and mapRegNumtoRegTypeIndex() in compiler.hpp
[ 687ed43 ] Add operator[]
[ b2d9b1c ] jit format
[ e56dc9b ] fix some other build failures
[ 5f63e60 ] misc cleanup
[ 7542417 ] operators() that operate on regNum
[ 2d1d2bb ] GetRegSetForType() defintion
[ d14f56a ] RemoveRegsetForType()
[ 147a88b ] AddRegsetForType()
[ 535c2e1 ] AddGprRegs()
[ 1d0f492 ] IsRegNumInMask() and IsRegNumPresent()
[ f370cf1 ] Use IsEmpty() and IsNonEmpty()
[ fd18c7f ] RemoveRegNumFromMask() and RemoveRegNum() and consume them
[ ed203f3 ] Use AddRegNumInMask() and AddRegNum()
[ c192d95 ] AddRegNumInMask/AddRegNum
[ a38460a ] change the structure of regMaskTP for performance
[ 4fe1caa ] Add regTypeTag in register*.h files
[ 27b8170 ] Use HAS_MORE_THAN_64_REGISTERS macro
[ 6ffccb3 ] Make high only to arm64
[ fed5c2a ] use genSingleTypeRegMask()
[ b4b08fa ] Add high to all platforms
[ 53c29d1 ] jit fornat
[ ccddf1c ] Furture reduce the TP cost
[ b3218e1 ] Revert "Do not use
const regMaskTP&
as parameter"[ 78b34fa ] Do not use
const regMaskTP&
as parameter[ e895528 ] Add high field