JIT: Generalize assignment decomposition in physical promotion #85323
Conversation
Generalize assignment decomposition to handle arbitrary combinations of physically promoted structs. Do this by introducing a DecompositionPlan class that keeps track of the copies that involve replacement fields. The first step is to fill out this plan. In the general case, where both the source and destination are physically promoted, this involves iterating the replacements in lockstep. For promotions that map exactly, a direct copy between their locals is queued into the plan; in other cases (e.g. partial overlap) it may involve writing the source back to the struct local. The plan is then used to generate the IR and to figure out whether the writes of all the replacement fields cover the destination, such that the original struct copy can be omitted. Also unnest StatementList and rename it to DecompositionStatementList, as the previous name conflicted with another class.
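To make the lockstep walk concrete, here is a rough sketch of how such a plan might be filled out. The Replacement fields, the vector-based representation, and the merge details are illustrative assumptions, not the PR's actual DecompositionPlan:

// Rough sketch (illustrative only): walk the source and destination
// replacement lists, both sorted by offset, in lockstep and decide for each
// pair how the copy should be handled.
#include <cstddef>
#include <vector>

struct Replacement
{
    unsigned Offset; // offset of the replaced field within the struct
    unsigned Size;   // size of the replaced field
    unsigned LclNum; // number of the local standing in for the field
};

void PlanCopyBetweenPromotedLocals(const std::vector<Replacement>& srcReps,
                                   const std::vector<Replacement>& dstReps)
{
    size_t si = 0;
    size_t di = 0;
    while ((si < srcReps.size()) && (di < dstReps.size()))
    {
        const Replacement& src = srcReps[si];
        const Replacement& dst = dstReps[di];

        if ((src.Offset == dst.Offset) && (src.Size == dst.Size))
        {
            // Exact match: queue a direct local-to-local copy into the plan.
            si++;
            di++;
        }
        else if (src.Offset + src.Size <= dst.Offset)
        {
            // Source replacement lies entirely before the next destination
            // replacement: write it back to the source struct local so the
            // remainder copy picks it up.
            si++;
        }
        else if (dst.Offset + dst.Size <= src.Offset)
        {
            // Destination replacement has no matching source replacement:
            // it will be read back from the struct after the remainder copy.
            di++;
        }
        else
        {
            // Partial overlap: go through the struct local for whichever
            // replacement ends first, then keep merging.
            if (src.Offset + src.Size <= dst.Offset + dst.Size)
                si++;
            else
                di++;
        }
    }
    // Any leftover entries on either side are handled like the unmatched
    // cases above.
}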
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
/azp run runtime-jit-experimental, runtime-coreclr jitstress, runtime-coreclr libraries-jitstress
Azure Pipelines successfully started running 3 pipeline(s).
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress
Azure Pipelines successfully started running 2 pipeline(s).
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress
Azure Pipelines successfully started running 2 pipeline(s).
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress
Azure Pipelines successfully started running 2 pipeline(s).
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-jit-experimental
Azure Pipelines successfully started running 3 pipeline(s).
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-jit-experimental
Azure Pipelines successfully started running 3 pipeline(s).
if (m_compiler->lvaGetDesc(entry.ToLclNum)->lvIsStructField)
    UpdateEarlyRefCount(m_compiler, dst);
This looks a bit odd right now because I'm trying to avoid adding any ref counts to the replacement locals created by this pass. So this logic tracks ref counts correctly only for existing locals and relies on forward sub giving up on the physical promotion locals that have a ref count of 0.
I will submit a separate PR after this one that tracks the ref counts more uniformly for all the IR created by this pass.
@@ -24,13 +25,1181 @@ class DecompositionStatementList

    for (GenTree* cur = m_head->gtNext; cur != nullptr; cur = cur->gtNext)
    {
-       tree = comp->gtNewOperNode(GT_COMMA, tree->TypeGet(), cur, tree);
+       tree = comp->gtNewOperNode(GT_COMMA, TYP_VOID, cur, tree);
Physical promotion counterpart to #85320.
The failure is #83658. cc @dotnet/jit-contrib PTAL @AndyAyersMS... if some of the structure looks a bit unfamiliar, it's because I refactored it in #85728. Diffs with physical promotion enabled. Diffs with physical promotion enabled and old promotion disabled. I analyzed the regressions in #85323 (comment); there are still quite a lot of potential improvements to be had.
LGTM overall
            primitiveType = TYP_INT;
            break;
#endif
        case TARGET_POINTER_SIZE:
I wonder if you should peel this case out and check for it up front, then just do power of 2 sized case analysis beyond that.
Does alignment matter if there are no GC refs?
I wonder if you should peel this case out and check for it up front, then just do power of 2 sized case analysis beyond that.
Sure, I can do that.
Does alignment matter if there are no GC refs?
AFAIK no; as of recently we do the same trick in block lowering for odd sizes by copying [start..start+8) and [end-8..end).
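A minimal sketch of that overlapping-copy trick in plain C++, just to illustrate the idea described above (the JIT's block lowering does this on IR, not via memcpy):

// Copy an odd-sized block (8 <= size <= 16 here) with two overlapping
// pointer-sized moves instead of a byte-by-byte tail. The overlapping bytes
// in the middle are written twice with identical data, which is harmless.
#include <cstdint>
#include <cstring>

void CopyOddSizedBlock(uint8_t* dst, const uint8_t* src, size_t size)
{
    std::memcpy(dst, src, 8);                       // covers [start, start+8)
    std::memcpy(dst + size - 8, src + size - 8, 8); // covers [end-8, end)
}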
// If the remainder is a full block and is going to incur write barrier
// then avoid incurring multiple write barriers for each source
// replacement that is a GC pointer -- write them back to the struct
// first instead.
I think some of these strategy choices might benefit from having pseudo-IR "pictures" showing what we're trying to avoid/create.
Either here in comments or perhaps in some companion write-up.
Will add a comment on this specific optimization. For completeness:
// If the remainder is a full block and is going to incur write barrier
// then avoid incurring multiple write barriers for each source
// replacement that is a GC pointer -- write them back to the struct
// first instead. That is, instead of:
//
// ▌ COMMA void
// ├──▌ ASG struct (copy) <- write barrier
// │ ├──▌ BLK struct<Program+S, 32>
// │ │ └──▌ LCL_VAR byref V01 arg1
// │ └──▌ LCL_VAR struct<Program+S, 32> V00 arg0
// └──▌ COMMA void
// ├──▌ ASG ref <- write barrier
// │ ├──▌ IND ref
// │ │ └──▌ ADD byref
// │ │ ├──▌ LCL_VAR byref V01 arg1
// │ │ └──▌ CNS_INT long 8
// │ └──▌ LCL_VAR ref V05 tmp3
// └──▌ ASG ref <- write barrier
// ├──▌ IND ref
// │ └──▌ ADD byref
// │ ├──▌ LCL_VAR byref V01 arg1
// │ └──▌ CNS_INT long 24
// └──▌ LCL_VAR ref V06 tmp4
//
// Produce:
//
// ▌ COMMA void
// ├──▌ ASG ref <- no write barrier
// │ ├──▌ LCL_FLD ref V00 arg0 [+8]
// │ └──▌ LCL_VAR ref V05 tmp3
// └──▌ COMMA void
// ├──▌ ASG ref <- no write barrier
// │ ├──▌ LCL_FLD ref V00 arg0 [+24]
// │ └──▌ LCL_VAR ref V06 tmp4
// └──▌ ASG struct (copy) <- write barrier
// ├──▌ BLK struct<Program+S, 32>
// │ └──▌ LCL_VAR byref V01 arg1 (last use)
// └──▌ LCL_VAR struct<Program+S, 32> V00 arg0
//
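For what it's worth, the decision described above boils down to something like the following check; this is a hedged sketch with made-up names, not the PR's actual code:

// Sketch (assumption): if the remainder will be copied as a full block that
// already pays for write barriers, prefer writing GC-ref source replacements
// back to the struct local (plain local stores, no barrier) over storing each
// of them through the destination address (one barrier per GC ref).
bool ShouldWriteBackGcReplacementsFirst(bool remainderIsFullBlock, unsigned numGcRefSourceReplacements)
{
    return remainderIsFullBlock && (numGcRefSourceReplacements > 0);
}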
    }
}

if (remainderStrategy.Type != RemainderStrategy::NoRemainder)
Wonder if you should represent the number of additional address uses as part of the remainder strategy?
Say someday you decided to support two primitives... I suppose there are enough other changes needed that perhaps this one would not be overlooked.
Hmm, it would probably be reasonable to try to make this more polymorphic, but let me leave it as is for now and see how (if) it ends up abstracting/generalizing in the future to more sophisticated strategies.
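The suggestion could be pictured roughly like this; a sketch of what folding the address-use count into the strategy might look like, with made-up field names rather than the PR's actual RemainderStrategy definition:

// Sketch (assumption): make the number of extra address uses a property of
// the chosen strategy so that adding a new strategy (say, two primitives)
// cannot silently get the address-use accounting wrong.
struct RemainderStrategy
{
    enum StrategyType
    {
        NoRemainder, // replacements fully cover the destination
        Primitive,   // remainder fits in a single primitive-typed copy
        FullBlock,   // remainder is copied as a whole block
    };

    StrategyType Type;
    unsigned     AdditionalAddressUses; // address-tree uses this strategy adds
};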
}

//------------------------------------------------------------------------
// CreateInitValue:
Seems like this belongs over in gentree.cpp
Added a Compiler::gtNewConWithPattern and made regular block morphing use it too.
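In case it helps the review, a minimal sketch of what such a pattern-constant helper does; the real Compiler::gtNewConWithPattern may have a different signature and handle more cases (e.g. floating point and GC types), so treat the name and shape here as assumptions:

// Sketch (assumption): replicate an init byte across the width of an
// integral type and wrap the result in the matching constant node.
GenTree* NewIntegralConWithPattern(Compiler* comp, var_types type, uint8_t pattern)
{
    assert(varTypeIsIntegral(type));

    uint64_t value = 0;
    for (unsigned i = 0; i < genTypeSize(type); i++)
    {
        value |= static_cast<uint64_t>(pattern) << (i * 8);
    }

    if (genTypeSize(type) == 8)
    {
        return comp->gtNewLconNode(static_cast<int64_t>(value));
    }

    return comp->gtNewIconNode(static_cast<ssize_t>(static_cast<uint32_t>(value)), genActualType(type));
}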
if (addr->OperIsLocal() && (!m_dst->OperIs(GT_LCL_VAR, GT_LCL_FLD) ||
                            (addr->AsLclVarCommon()->GetLclNum() != m_dst->AsLclVarCommon()->GetLclNum())))
I think this is missing an address-exposed check, but it seems to be preexisting, so let me fix it in a follow-up.
Generalize assignment decomposition to handle arbitrary combinations of physically and regularly promoted structs. Do this by introducing a DecompositionPlan class that keeps track of the copies that involve replacement fields. The first step is to fill out this plan. In the general case, where both the source and destination are physically promoted, this involves iterating the replacements in lockstep. For promotions that map exactly, a direct copy between their locals is queued into the plan; in other cases (e.g. partial overlap) it may involve writing the source back to the struct local.
The plan is used to generate the IR and to compute the remainder that still needs to be copied after all replacement fields have been handled.
(Also unnest StatementList and rename it to DecompositionStatementList as the other name conflicted with another class.)
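As a purely illustrative sketch (made-up C++ standing in for the promoted locals; this is not the example from the PR description), a copy between two physically promoted struct locals decomposes along these lines:

// Illustration only: 'a_reg'/'b_reg' stand in for the replacement locals the
// JIT creates, and 'stk' for the struct local itself.
#include <cstring>

struct S
{
    int  a;
    int  b;
    char rest[24];
};

struct PromotedSrc { S stk; int a_reg; int b_reg; }; // a and b replaced
struct PromotedDst { S stk; int a_reg; };            // only a replaced

void CopyPromoted(PromotedDst& dst, PromotedSrc& src)
{
    // Field a is replaced on both sides with matching offset/type:
    // queue a direct scalar-to-scalar copy.
    dst.a_reg = src.a_reg;

    // Field b is replaced only in the source: write it back to the source
    // struct local so the remainder copy below moves the up-to-date value.
    src.stk.b = src.b_reg;

    // Remainder: a single block copy of whatever the direct copies did not
    // cover. If the direct copies covered the whole struct, this copy would
    // be omitted entirely.
    std::memcpy(&dst.stk, &src.stk, sizeof(S));

    // After this, dst.a_reg (not dst.stk.a) is the live value for field a,
    // mirroring how reads of a promoted field go to its replacement local.
}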