Merged
2 changes: 1 addition & 1 deletion src/coreclr/jit/compiler.h
@@ -6368,7 +6368,7 @@ class Compiler
void AddNonFallthroughPreds(unsigned blockPos);
bool RunGreedyThreeOptPass(unsigned startPos, unsigned endPos);

- bool RunThreeOptPass();
+ bool RunThreeOpt();

public:
ThreeOptLayout(Compiler* comp);
38 changes: 15 additions & 23 deletions src/coreclr/jit/fgopt.cpp
@@ -5244,13 +5244,6 @@ void Compiler::ThreeOptLayout::Run()
assert(finalBlock != nullptr);
assert(!finalBlock->isBBCallFinallyPair());

- // For methods with fewer than three candidate blocks, we cannot partition anything
- if (finalBlock->IsFirst() || finalBlock->Prev()->IsFirst())
- {
- JITDUMP("Not enough blocks to partition anything. Skipping 3-opt.\n");
- return;
- }
-
// Get an upper bound on the number of hot blocks without walking the whole block list.
// We will only consider blocks reachable via normal flow.
const unsigned numBlocksUpperBound = compiler->m_dfsTree->GetPostOrderCount();
@@ -5267,13 +5260,20 @@ void Compiler::ThreeOptLayout::Run()
}

assert(numCandidateBlocks < numBlocksUpperBound);
- blockOrder[numCandidateBlocks] = tempOrder[numCandidateBlocks] = block;
+ blockOrder[numCandidateBlocks] = block;

// Repurpose 'bbPostorderNum' for the block's ordinal
block->bbPostorderNum = numCandidateBlocks++;
}

- const bool modified = RunThreeOptPass();
+ // For methods with fewer than three candidate blocks, we cannot partition anything
+ if (numCandidateBlocks < 3)
+ {
+ JITDUMP("Not enough blocks to partition anything. Skipping reordering.\n");
+ return;
+ }
+
+ const bool modified = RunThreeOpt();

if (modified)
{
@@ -5502,31 +5502,23 @@ bool Compiler::ThreeOptLayout::RunGreedyThreeOptPass(unsigned startPos, unsigned
}

//-----------------------------------------------------------------------------
- // Compiler::ThreeOptLayout::RunThreeOptPass: Runs 3-opt on the candidate span of blocks.
+ // Compiler::ThreeOptLayout::RunThreeOpt: Runs 3-opt on the candidate span of blocks.
//
// Returns:
// True if we reordered anything, false otherwise
//
- bool Compiler::ThreeOptLayout::RunThreeOptPass()
+ bool Compiler::ThreeOptLayout::RunThreeOpt()
@amanasifkhalid (Contributor Author), Feb 5, 2025


Here's what I want the hierarchy of 3-opt methods to look like:

  • ThreeOptLayout::Run: Set up some data structures, run 3-opt, and reorder the block list
  • ThreeOptLayout::RunThreeOpt: Run a 3-opt pass, and evaluate the new layout cost. TODO: If we want to search for a better local optimum, run another 3-opt pass with a different initial layout. I'd be surprised if this second pass makes a difference when we don't have loops, so a decent starting heuristic is to run another pass only when we have them. My current plan is to first run 3-opt on the initial layout without applying fgMoveHotJumps. Then, if we have loops, apply fgMoveHotJumps to the initial layout and try 3-opt again. We'll keep the better of the two.
  • ThreeOptLayout::RunGreedyThreeOptPass: Actually run 3-opt until convergence on whatever initial layout we're given.
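
A minimal sketch of the two-pass plan in the TODO above, assuming a layout can be modeled as a plain vector of block IDs. This is not the JIT's code: `Layout`, `LayoutChoice`, `ChooseLayout`, and the three callbacks are hypothetical stand-ins for the block order, fgMoveHotJumps, RunGreedyThreeOptPass, and GetLayoutCost.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in: a layout is just a sequence of block IDs.
using Layout = std::vector<int>;

struct LayoutChoice
{
    Layout order;        // converged block order
    double cost;         // cost of 'order' under the supplied cost model
    bool   usedHotJumps; // true if the fgMoveHotJumps-style seed won
};

// Runs the greedy pass on the untouched initial layout and, only when the
// method has loops, again on a layout pre-processed by 'moveHotJumps'.
// Keeps whichever converged layout is strictly cheaper.
LayoutChoice ChooseLayout(const Layout&                               initial,
                          bool                                        hasLoops,
                          const std::function<Layout(const Layout&)>& moveHotJumps,
                          const std::function<Layout(const Layout&)>& greedyPass,
                          const std::function<double(const Layout&)>& layoutCost)
{
    // Fewer than three blocks cannot be partitioned (mirrors the assert in the diff).
    assert(initial.size() >= 3);

    Layout       first = greedyPass(initial);
    LayoutChoice best{first, layoutCost(first), false};

    if (!hasLoops)
    {
        // Assumption from the comment: a second pass rarely helps without loops.
        return best;
    }

    Layout       second     = greedyPass(moveHotJumps(initial));
    const double secondCost = layoutCost(second);
    if (secondCost < best.cost)
    {
        best = LayoutChoice{second, secondCost, true};
    }
    return best;
}
```

Passing the seed transformation, the greedy pass, and the cost model as callbacks keeps the sketch independent of how any of them is actually implemented.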

Member


> We'll keep the better of the two.

As we saw with the fgMoveHotJumps data, a better layout score doesn't reflect everything we care about. Any thoughts on how we might also assess the compactness of a layout?

Contributor Author


Since this computation would only happen twice per compilation, I think we can implement a more sophisticated cost model using one of the techniques you mentioned in #112016 just for comparing candidate layouts, without it being too expensive. If we can't do that, then assuming 3-opt converges to the same cost with and without fgMoveHotJumps, we could break ties by choosing the layout with fgMoveHotJumps under the assumption that it's more compact.
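
The tie-breaking fallback could look roughly like the sketch below. It is only an illustration: `PreferHotJumpsLayout` and the `tolerance` parameter are hypothetical, and the comparison assumes the two converged costs are plain doubles.

```cpp
#include <cassert>
#include <cmath>

// Returns true if the layout seeded with fgMoveHotJumps should be kept.
// 'tolerance' is an assumed threshold for treating two costs as equal.
bool PreferHotJumpsLayout(double plainCost, double hotJumpsCost, double tolerance = 1e-9)
{
    assert(tolerance >= 0.0);

    if (std::fabs(plainCost - hotJumpsCost) <= tolerance)
    {
        // Tie: pick the fgMoveHotJumps layout, assuming it is more compact.
        return true;
    }

    // Otherwise keep whichever layout has the lower cost.
    return hotJumpsCost < plainCost;
}
```

With this shape, a richer cost model would only need to change how the two cost values are computed, not how they are compared.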

{
- const unsigned startPos = 0;
- const unsigned endPos = numCandidateBlocks - 1;
- const unsigned numBlocks = (endPos - startPos + 1);
- assert(startPos <= endPos);
-
- if (numBlocks < 3)
- {
- JITDUMP("Not enough blocks to partition anything. Skipping reordering.\n");
- return false;
- }
+ // We better have enough blocks to create partitions
+ assert(numCandidateBlocks > 2);
+ const unsigned startPos = 0;
+ const unsigned endPos = numCandidateBlocks - 1;

JITDUMP("Initial layout cost: %f\n", GetLayoutCost(startPos, endPos));
const bool modified = RunGreedyThreeOptPass(startPos, endPos);

- // Write back to 'tempOrder' so changes to this region aren't lost next time we swap 'tempOrder' and 'blockOrder'
if (modified)
{
- memcpy(tempOrder + startPos, blockOrder + startPos, sizeof(BasicBlock*) * numBlocks);
JITDUMP("Final layout cost: %f\n", GetLayoutCost(startPos, endPos));
}
else