Relax BALC[32] to 16-bit variant using trampolines #2

Open
wants to merge 13 commits into
base: nmips/gold_v7
7 changes: 7 additions & 0 deletions .gitignore
@@ -59,3 +59,10 @@ stamp-*
/mpc*
/gmp*
/isl*

# new-ignores

**/build/
**/install/

.vscode
1 change: 1 addition & 0 deletions elfcpp/nanomips.h
@@ -99,6 +99,7 @@ enum
R_NANOMIPS_JALR16 = 74,
R_NANOMIPS_JUMPTABLE_LOAD = 75,
R_NANOMIPS_FRAME_REG = 76,
R_NANOMIPS_NOTRAMP = 77,
R_NANOMIPS_TLS_DTPMOD = 80,
R_NANOMIPS_TLS_DTPREL = 81,
R_NANOMIPS_TLS_TPREL = 82,
53 changes: 53 additions & 0 deletions gold/README-BALC-trampolines-nanomips.md
@@ -0,0 +1,53 @@
# BALC Trampolines – design

If this feature is enabled on the command line (see gold/options.h:1216), relocations of type R_NANOMIPS_PC25_S1 attached to BALC instructions are used to identify locations where a 32-bit BALC can be replaced by a trampoline. The feature is turned on by default.

A special relocation, R_NANOMIPS_NOTRAMP (see elfcpp/nanomips.h:102), is introduced so that this functionality can be turned off for an individual instruction.

After the RELAX and EXPAND phases, a new linker phase, TRAMPOLINES, generates the trampolines (see nanomips.cc, do_relax() method). This phase consists of two passes.

In the first pass over all relocations of the mentioned type, occurrences of 32-bit BALC instructions are identified and the code is segmented into areas that can each be covered by a single trampoline. This pass also builds a container that records the locations chosen for trampolines and trampoline calls. The addresses in this container must be kept up to date throughout gold's iterative process. An identical algorithm, but at the section level, is already applied to conditional branches.

In the second pass, a small number of the 32-bit BALC instructions are replaced by trampolines (three instructions each), and the others by 16-bit BALC jumps to those trampolines. To achieve this, two new transformations have been introduced (see nanomips-insn-property.h:80):

- BALC_TRAMP which generates a trampoline,
- BALC_CALL which converts BALC32 to BALC16.

BALC_TRAMP does not change the relocation; it remains R_NANOMIPS_PC25_S1 and targets the original symbol.

BALC_TRAMP:

Instead of BALC32 <target_function> with R_NANOMIPS_PC25_S1 relocation, we'll have:
A: BALC16 C
B: BC16 D
C: BC32 <target_function> with R_NANOMIPS_PC25_S1 relocation
D:

BALC16 C is needed to save the return address in RA, and BC16 D is needed to jump over the trampoline on return.
The same effect can be achieved with a shorter instruction sequence using ADDIUPC, but branch prediction is then significantly compromised. In that case, the trampoline looks like:

A: LAPC $rt, 4
B: BC32 <target_function>
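
The label arithmetic for the three-instruction trampoline can be checked with a small compile-time sketch. It assumes the instruction sizes implied by the names (2 bytes for the 16-bit forms, 4 bytes for BC32) and that a branch offset is counted from the end of the branch instruction. The constant and function names are illustrative and do not come from nanomips.cc.

```cpp
#include <cassert>

// Assumed instruction sizes in bytes (16-bit and 32-bit encodings).
constexpr int kBalc16Size = 2;
constexpr int kBc16Size = 2;
constexpr int kBc32Size = 4;

// Byte offsets of labels A..D from the start of the trampoline.
constexpr int kLabelA = 0;
constexpr int kLabelB = kLabelA + kBalc16Size;  // BC16 D
constexpr int kLabelC = kLabelB + kBc16Size;    // BC32 <target_function>
constexpr int kLabelD = kLabelC + kBc32Size;    // first instruction after the trampoline

// Assumption: the encoded offset is relative to the address of the
// instruction that follows the branch.
constexpr int branch_offset(int insn_addr, int insn_size, int target)
{
  return target - (insn_addr + insn_size);
}

static_assert(branch_offset(kLabelA, kBalc16Size, kLabelC) == 2,
              "A: BALC16 C skips over the BC16");
static_assert(branch_offset(kLabelB, kBc16Size, kLabelD) == 4,
              "B: BC16 D skips over the BC32");
static_assert(kLabelD == 8, "the whole trampoline occupies 8 bytes");
```

Under these assumptions the offsets come out as 2 and 4, which agrees with the `balc16 2` and `bc16 4` comments on BALC16_FIXED and BC16_FIXED in gold/nanomips-insn.def.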

BALC_CALL replaces R_NANOMIPS_PC25_S1 with R_NANOMIPS_PC11_S1, which still targets the original symbol (although that symbol is too far away). In the relocation application phase, whenever an R_NANOMIPS_PC11_S1 relocation is applied to a BALC instruction, the container is searched first to check whether a trampoline call is expected at that location. If so, the address of the target trampoline is used instead of the address of the target symbol.
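
The lookup described above can be sketched as follows. This is a minimal illustration, not the code in nanomips.cc: the `Trampoline_calls` map, the `resolve_balc16_target` function and the 32-bit `Address` alias are all hypothetical, and the real container may be organised differently.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Address = std::uint32_t;  // assumption: 32-bit target addresses

// Hypothetical container: address of a BALC converted by BALC_CALL
// -> address of the trampoline it should branch to.
using Trampoline_calls = std::map<Address, Address>;

// Pick the address that an R_NANOMIPS_PC11_S1 relocation on a BALC at
// `insn_address` should resolve to.
Address resolve_balc16_target(const Trampoline_calls& calls,
                              Address insn_address,
                              Address symbol_address)
{
  auto it = calls.find(insn_address);
  if (it != calls.end())
    return it->second;      // a trampoline call is expected here
  return symbol_address;    // ordinary BALC16: use the symbol itself
}
```

For example, with `calls = {{0x1000, 0x1104}}`, a relocation at `0x1000` resolves to `0x1104`, while a relocation at any other address keeps the symbol address. Because the map is keyed by instruction address, every entry has to be updated whenever relaxation moves code, which is the maintenance burden the design text mentions.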

struct Balc_trampoline // Represents balc32 instruction (candidate) in the code
{
Address address; // Current address of balc32 instruction
Address target; // balc32 target
bool ignore{true}; // Should this balc32 be ignored in the trampolines algorithm?
bool is_trampoline{false}; // This balc32 is going to become a trampoline

Balc_trampoline(Address address_, Address target_)
: address(address_), target(target_) { }
};

struct Balc_trampoline_target // Represents a target of balc32 instruction
{
int count{0}; // How many calls to this target
size_t first; // Index of first balc32 which calls this target
size_t trampoline; // Index of trampoline which will be used instead of real target
size_t last; // Index of last balc32 which calls this target
Address target; // Real target address
};

See nanomips.cc:6120. We start with an array of balc32 candidates (Balc_trampoline). Then we create an intermediate array of targets (Balc_trampoline_target) and populate all of its fields in a few passes.
There should be at least 4 calls to the same target within a range of 2048 bytes. One of them is then converted to a trampoline, and the others become trampoline calls (nanomips.cc:6131). The trampoline is the last candidate within reach of the first BALC candidate.
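
The threshold of 4 calls is consistent with a simple size calculation, sketched below. This is an inference from the instruction sizes (4 bytes for BALC32, 2 bytes for BALC16, 8 bytes for the BALC16 + BC16 + BC32 trampoline), not a statement from the source about why 4 was chosen; `bytes_saved` is a hypothetical helper.

```cpp
#include <cassert>

constexpr int kBalc32Bytes = 4;      // original call
constexpr int kBalc16Bytes = 2;      // trampoline call
constexpr int kTrampolineBytes = 8;  // BALC16 + BC16 + BC32 (assumed sizes)

// Bytes saved when `calls` BALC32 instructions to one target are replaced
// by one trampoline plus (calls - 1) BALC16 trampoline calls.
constexpr int bytes_saved(int calls)
{
  return calls * kBalc32Bytes
         - (kTrampolineBytes + (calls - 1) * kBalc16Bytes);
}

static_assert(bytes_saved(3) == 0, "three calls only break even");
static_assert(bytes_saved(4) == 2, "four calls are the first to save space");
```

The saving simplifies to `2 * calls - 6` bytes, so each additional call within reach of the same trampoline saves a further 2 bytes.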

Ideally, the trampoline will be the central BALC within the 2048-byte frame, while the other BALCs become calls to it (forward or backward). However, this algorithm does not guarantee that every marginal BALC stays within +-1024 bytes of the trampoline (the middle BALC). A BALC can fall out of range because of an "expand" operation or because entire sections are shifted by several bytes for alignment. In that case, such a BALC remains a BALC32. This is of course not completely optimal, but we currently have no idea how to overcome it. We rely on the assumption that this happens very rarely and has no significant impact on code size.
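
The fallback rule can be illustrated with a small sketch. It takes the +-1024 byte reach from the text at face value; the exact encodable range of BALC16 may differ by a few bytes depending on how the offset is measured, and the names `Call_form` and `choose_call_form` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using Address = std::uint32_t;  // assumption: 32-bit target addresses

// Reach of a 16-bit BALC as stated in the design text (an approximation).
constexpr std::int64_t kBalc16Reach = 1024;

enum class Call_form { BALC16_VIA_TRAMPOLINE, BALC32_DIRECT };

// Decide how a candidate call is emitted once final addresses are known.
Call_form choose_call_form(Address call_address, Address trampoline_address)
{
  const std::int64_t distance =
      static_cast<std::int64_t>(trampoline_address)
      - static_cast<std::int64_t>(call_address);
  const bool in_range =
      distance >= -kBalc16Reach && distance <= kBalc16Reach;
  // A candidate that drifted out of range keeps its original 32-bit form.
  return in_range ? Call_form::BALC16_VIA_TRAMPOLINE
                  : Call_form::BALC32_DIRECT;
}
```

A candidate 1024 bytes away from its trampoline is still converted in this sketch, while one that has drifted to 1026 bytes away stays a BALC32 and costs 2 bytes more than planned.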

*The current problem* occurs when maintaining the addresses in the container in cases where entire sections are moved (e.g. in order to be 4-byte aligned). This problem does not exist for conditional branches because they are processed at the section level, not across the entire code. An investigation is in progress.
6 changes: 6 additions & 0 deletions gold/nanomips-insn-property.h
@@ -78,6 +78,12 @@ enum Transform_type
TT_PCREL16_ZERO,
// Transform to avoid hw110880 issue
TT_IMM48_FIX,
// Transform balc 32-bit to balc 16-bit via trampoline.
TT_BALC_CALL,
TT_BALC_TRAMP,
// Not a transformation just indicates that there is a NOTRAMP reloc
// on balc instruction
TT_BALC_NOTRAMP,
};

// The Nanomips_insn_template class is to store information about a
11 changes: 11 additions & 0 deletions gold/nanomips-insn.def
@@ -208,6 +208,12 @@
// restore.jrc16 u, [dst1, dst2, ...]
#define RESTOREJRC16 NIT16("restore.jrc[16]", 0x1d00, NONE, INS_REG(0, 4, TREG), ins_sres16_fields)

// These two are used for generating trampolines.
// bc16 4
#define BC16_FIXED NIT16("bc[16]", 0x1804, FIXED, NULL, NULL)
// balc16 2
#define BALC16_FIXED NIT16("balc[16]", 0x3802, FIXED, NULL, NULL)

//
//
// Nanomips instruction property
@@ -250,6 +256,10 @@ NTT(PCREL16, RELS(R(PC25_S1)), INSNS(BALC
NTT(PCREL_NMF, RELS(R(PC25_S1)), INSNS(LAPC48(TREG), JALRC16(SREG)))
NTT(PCREL16_LONG, RELS(R(PC25_S1)), INSNS(ALUIPC32(PC_HI20, SREG), ORI32(LO12, SREG, SREG), JALRC16(SREG)))
NTT(PCREL32_LONG, RELS(R(PC25_S1)), INSNS(ALUIPC32(PC_HI20, SREG), ORI32(LO12, SREG, SREG), JALRC32))
NTT(BALC_TRAMP, RELS(R(PC25_S1)), INSNS(BALC16_FIXED, BC16_FIXED, BC32))
NTT(BALC_CALL, RELS(R(PC25_S1)), INSNS(BALC16))
// NOTRAMP, just to find out if NOTRAMP really points to balc instruction
NTT(BALC_NOTRAMP, RELS(R(NOTRAMP)), INSNS())

// bc sym
NIP32("bc", 0x28000000, NULL, NULL, NULL, NULL)
@@ -377,6 +387,7 @@ NTT(GPREL_NMF, RELS(R(TLS_LD)), INSNS(ADDI
NIP32("addiu[gp.b]", 0x440c0000, EXT_REG(21, 5), NULL, NULL, NULL)
NTT(ABS32_LONG, RELS(R(GPREL18)), INSNS(LUI32(HI20, SREG), ORI32(LO12, TREG, SREG)))
NTT(PCREL32_LONG, RELS(R(GPREL18)), INSNS(ALUIPC32(PC_HI20, SREG), ORI32(LO12, TREG, SREG)))
NTT(GPREL32_WORD, RELS(R(GPREL18)), INSNS(ADDIUGPW32))
NTT(GPREL_NMF, RELS(R(GPREL18)), INSNS(ADDIUGP48(GPREL_I32, TREG)))
NTT(GPREL_LONG, RELS(R(GPREL18)), INSNS(LUI32(GPREL_HI20, SREG), ORI32(GPREL_LO12, SREG, SREG), ADDUGP32(SREG, TREG)))

2 changes: 2 additions & 0 deletions gold/nanomips-reloc.def
@@ -95,6 +95,8 @@ NRD(JALR32, PLACEHOLDER, 32, 0, 0xffe0
NRD(JALR16, PLACEHOLDER, 16, 0, 0xfc1f, 0)
NRD(JUMPTABLE_LOAD, PLACEHOLDER, 0, 0, 0, 0)
NRD(FRAME_REG, STATIC, 0, 0, 0, 0)
// Goes only with R_NANOMIPS_PC25_S1 so that is why the mask is the same
NRD(NOTRAMP, STATIC, 32, 26, 0xfe000000, 0)
NRD(COPY, DYNAMIC, 0, 0, 0, 0)
NRD(GLOBAL, DYNAMIC, 0, 0, 0, 0)
NRD(JUMP_SLOT, DYNAMIC, 0, 0, 0, 0)