Recycle block local registers in fast pass (#1448)
Summary:
Original Author: [email protected]
Original Git: 6b69a06
Original Reviewed By: avp
Original Revision: D59072005

The register allocator has the ability to honour a memory limit that is
proportional to the product of the number of instructions and basic
blocks in the function being allocated. Unfortunately, functions that
hit this limit by definition have a lot of instructions. Even in the
most degenerate case where every block has one instruction, you need
4000 instructions to hit the 10M limit.

This diff tries to improve the quality of generated code in cases where
most values are used within the basic block they are defined in. In such
cases, we currently make the register available after the end of the
block. With this diff, the registers become available after their last
use in the block.

This is useful for functions with extremely large basic blocks, where
the current approach would end up allocating a huge number of registers
since the registers cannot be reused within the same block.
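
As a minimal sketch of the idea (not the Hermes implementation), the effect
of releasing a register at the last use can be simulated on a toy
single-block IR. The names `ToyBlock`, `ToyAllocation`,
`allocateWithRecycling` and `countWithoutRecycling` are hypothetical and
exist only for this illustration. The sketch assumes instruction i defines
value i and reads only values defined earlier in the same block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR for a single basic block: instruction i defines value i
// and reads the earlier values listed in block[i].
using ToyBlock = std::vector<std::vector<std::size_t>>;

struct ToyAllocation {
  std::vector<int> regOf; // register per value, -1 if the value has no users
  int numRegs = 0;        // number of distinct registers handed out
};

// Walk the block backwards. A value is first encountered at its last use, so
// that is where it receives a register; once the walk reaches the value's
// definition, the register goes back on the free list for earlier values.
ToyAllocation allocateWithRecycling(const ToyBlock &block) {
  ToyAllocation result;
  result.regOf.assign(block.size(), -1);
  std::vector<int> freeRegs;
  for (std::size_t i = block.size(); i-- > 0;) {
    if (result.regOf[i] != -1)
      freeRegs.push_back(result.regOf[i]);
    for (std::size_t op : block[i]) {
      assert(op < i && "operands must be defined earlier in the block");
      if (result.regOf[op] != -1)
        continue; // a later use already assigned a register
      if (!freeRegs.empty()) {
        result.regOf[op] = freeRegs.back();
        freeRegs.pop_back();
      } else {
        result.regOf[op] = result.numRegs++;
      }
    }
  }
  return result;
}

// Baseline for comparison: registers are only released at the end of the
// block, so every value with at least one user needs its own register.
int countWithoutRecycling(const ToyBlock &block) {
  std::vector<bool> used(block.size(), false);
  for (const auto &operands : block)
    for (std::size_t op : operands)
      used[op] = true;
  int count = 0;
  for (bool u : used)
    count += u ? 1 : 0;
  return count;
}
```

For a chain of four instructions where each reads the previous result
(`{{}, {0}, {1}, {2}}`), the sketch hands out one register, versus three
when registers are held until the end of the block. Unlike the real fast
pass, the toy does not distinguish block-local from global registers and
does not assign a shared dummy register to dead values.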

Closes #1448

Reviewed By: avp

Differential Revision: D60241766

fbshipit-source-id: 5196333862517cd546d675cf8fe005eb1ed5a790
neildhar authored and facebook-github-bot committed Jul 27, 2024
1 parent cdec20e commit 20b13b9
Showing 12 changed files with 600 additions and 579 deletions.
6 changes: 3 additions & 3 deletions include/hermes/BCGen/RegAlloc.h
@@ -384,9 +384,9 @@ class RegisterAllocator {
   /// predecessor blocks.
   void lowerPhis(ArrayRef<BasicBlock *> order);
 
-  /// Allocate the registers for the instructions in the function. The order of
-  /// the block needs to match the order which we'll use for instruction
-  /// selection.
+  /// Allocate the registers for the instructions in the function. The blocks
+  /// must be in reverse-post-order, and must match the order which we'll use
+  /// for instruction selection.
   void allocate(ArrayRef<BasicBlock *> order);
 
   /// Reserves consecutive registers that will be manually managed by the user.
89 changes: 60 additions & 29 deletions lib/BCGen/RegAlloc.cpp
@@ -535,19 +535,6 @@ void RegisterAllocator::coalesce(
   }
 }
 
-namespace {
-/// Determines whether the Instruction is ever used outside its BasicBlock.
-bool isBlockLocal(Instruction *inst) {
-  BasicBlock *parent = inst->getParent();
-  for (auto user : inst->getUsers()) {
-    if (parent != user->getParent()) {
-      return false;
-    }
-  }
-  return true;
-}
-} // namespace
-
 void RegisterAllocator::allocateFastPass(ArrayRef<BasicBlock *> order) {
   // Make sure Phis and related Movs get the same register
   for (auto *bb : order) {
@@ -563,26 +550,70 @@ void RegisterAllocator::allocateFastPass(ArrayRef<BasicBlock *> order) {
     }
   }
 
-  llvh::SmallVector<Register, 16> blockLocals;
+  // Bit vector indicating whether a register with a given index is being used
+  // as a block local register.
+  llvh::BitVector blockLocals;
+
+  // List of free block local registers. We have to maintain this outside the
+  // file because we cannot determine interference between local and global
+  // registers. So we have to ensure that the local registers are only reused
+  // for other block-local instructions.
+  llvh::SmallVector<Register, 8> freeBlockLocals;
+
+  // A dummy register used for all instructions that have no users.
+  Register deadReg = file.allocateRegister();
+
+  // Iterate in reverse, so we can cheaply determine whether an instruction
+  // is local, and assign it a register accordingly.
+  for (auto *bb : llvh::reverse(order)) {
+    for (auto &inst : llvh::reverse(*bb)) {
+      if (isAllocated(&inst)) {
+        // If this is using a local register, we know the register is free after
+        // we visit the definition.
+        auto reg = getRegister(&inst);
+        auto idx = reg.getIndex();
+        if (idx < blockLocals.size() && blockLocals.test(idx))
+          freeBlockLocals.push_back(reg);
+      } else {
+        // Unallocated instruction means the result is dead, because all users
+        // are visited first. Allocate a temporary register.
+        // Note that we cannot assert that the instruction has no users, because
+        // there may be users in dead blocks.
+        updateRegister(&inst, deadReg);
+      }
 
-  // Then just allocate the rest sequentially, while optimizing the case
-  // where an inst is only ever used in its own block.
-  for (auto *bb : order) {
-    for (auto &inst : *bb) {
-      if (!isAllocated(&inst)) {
-        Register R = file.allocateRegister();
-        updateRegister(&inst, R);
-        if (inst.getNumUsers() == 0) {
-          file.killRegister(R);
-        } else if (isBlockLocal(&inst)) {
-          blockLocals.push_back(R);
+      // Allocate a register to unallocated operands.
+      for (size_t i = 0, e = inst.getNumOperands(); i < e; ++i) {
+        auto *op = llvh::dyn_cast<Instruction>(inst.getOperand(i));
+
+        // Skip if op is not an instruction or already has a register.
+        if (!op || isAllocated(op))
+          continue;
+
+        if (op->getParent() != bb) {
+          // Live across blocks, allocate a global register.
+          updateRegister(op, file.allocateRegister());
+          continue;
         }
+
+        // We know this operand is local because:
+        // 1. The operand is in the same block as this one.
+        // 2. All blocks dominated by this block have been visited.
+        // 3. All users must be dominated by their def, since Phis are
+        // allocated beforehand.
+        if (!freeBlockLocals.empty()) {
+          updateRegister(op, freeBlockLocals.pop_back_val());
+          continue;
+        }
+
+        // No free local register, allocate another one.
+        Register reg = file.allocateRegister();
+        if (blockLocals.size() <= reg.getIndex())
+          blockLocals.resize(reg.getIndex() + 1);
+        blockLocals.set(reg.getIndex());
+        updateRegister(op, reg);
       }
     }
-    for (auto &reg : blockLocals) {
-      file.killRegister(reg);
-    }
-    blockLocals.clear();
   }
 }
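
The updated comment on `allocate` requires the blocks to be in
reverse-post-order. As a minimal sketch of what that ordering means, assuming
a hypothetical CFG given as lists of successor indices (`ToyCFG`,
`toyPostOrder` and `reversePostOrder` are illustrative names, not Hermes
APIs), a depth-first post-order traversal can be computed and then reversed:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CFG: succs[b] lists the successor block indices of block b.
using ToyCFG = std::vector<std::vector<std::size_t>>;

// Depth-first traversal that appends a block only after all of its
// not-yet-seen successors have been appended (post-order).
void toyPostOrder(
    const ToyCFG &succs,
    std::size_t block,
    std::vector<bool> &seen,
    std::vector<std::size_t> &out) {
  seen[block] = true;
  for (std::size_t s : succs[block])
    if (!seen[s])
      toyPostOrder(succs, s, seen, out);
  out.push_back(block);
}

// Reverse-post-order of the blocks reachable from `entry`.
std::vector<std::size_t> reversePostOrder(
    const ToyCFG &succs,
    std::size_t entry) {
  assert(entry < succs.size() && "entry must be a valid block index");
  std::vector<bool> seen(succs.size(), false);
  std::vector<std::size_t> order;
  toyPostOrder(succs, entry, seen, order);
  std::reverse(order.begin(), order.end());
  return order;
}
```

In a reverse-post-order, a block appears before every reachable block it
dominates. Walking the order backwards, as the new fast pass does, therefore
visits a block only after all blocks it dominates, which is the property the
comments in `allocateFastPass` rely on.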
