perf(parser): make Modifiers::accessibility branchless #20827
Pull request overview
This PR optimizes TypeScript accessibility-modifier detection in the parser by making `Modifiers::accessibility` branchless and delegating the work to a new `ModifierKinds::accessibility` implementation that uses a packed lookup table.
Changes:
- Replaced the branchy `Modifiers::accessibility` checks with a call to `ModifierKinds::accessibility`.
- Added a new branchless `ModifierKinds::accessibility` that uses a packed LUT (`u64`) for fast decoding.
- Added a small `const fn min` helper for the const-eval computations used to set up the LUT.
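To illustrate what a "packed LUT (`u64`)" built by const evaluation can look like, here is a minimal standalone sketch. It is not the PR's code: the bit masks (`0b0010`, `0b0100`, `0b1000`) and the result codes 0 to 3 are inferred from the assembly in the description, and `decode_branchy`, `PACKED_LUT` and `decode_packed` are hypothetical names.

```rust
/// Hypothetical branchy reference decoder, mirroring the "Before" assembly:
/// the highest of the three accessibility bits wins, and 3 means "none".
const fn decode_branchy(bits: u8) -> u8 {
    if bits & 0b1000 != 0 {
        2
    } else if bits & 0b0100 != 0 {
        1
    } else if bits & 0b0010 != 0 {
        0
    } else {
        3
    }
}

/// All 8 possible results packed into one u64, computed at compile time.
/// Byte `i` holds the result for table index `i = (bits >> 1) & 7`.
const PACKED_LUT: u64 = {
    let mut lut = 0u64;
    let mut i: u32 = 0;
    while i < 8 {
        // Index `i` corresponds to the flag bits `i << 1`.
        let bits = (i as u8) << 1;
        lut |= (decode_branchy(bits) as u64) << (i * 8);
        i += 1;
    }
    lut
};

/// Branchless decoder: shift and mask to form the index, then extract one byte.
fn decode_packed(bits: u8) -> u8 {
    let index = ((bits >> 1) & 7) as u32;
    (PACKED_LUT >> (index * 8)) as u8
}
```

Because `PACKED_LUT` is a `const`, the loop runs entirely at compile time and only the resulting 8-byte value remains in the binary.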
Merging this PR will degrade performance by 4.46%
`Modifiers::accessibility` is called in many places. Previously it compiled to 10 instructions, including 2 unpredictable branches.
Reduce it to 4 instructions with no branches, by using an 8-byte lookup table.
The code is verbose, but almost all of it is compile-time constants, which boil down to nothing but the 8-byte table. The code could be made shorter, but only by resorting to unsafe code. The version in this PR contains no unsafe code, and produces the same optimal assembly as an unsafe version.
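One way to build such a table in safe Rust is to store `Option<...>` values directly, so no integer-to-enum conversion (the usual source of `unsafe`) is needed. This is a sketch under assumptions, not the PR's implementation: the `Accessibility` enum, its discriminants, and the flag constants below are hypothetical, chosen to match the byte values in the assembly.

```rust
/// Hypothetical accessibility enum. With discriminants 0..=2, rustc in practice
/// (not a language guarantee) uses 3 as the niche for `None`, so
/// `Option<Accessibility>` is a single byte.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Accessibility {
    Private = 0,
    Protected = 1,
    Public = 2,
}

// Hypothetical flag bits, matching the masks tested in the "Before" assembly.
const PRIVATE: u8 = 1 << 1;
const PROTECTED: u8 = 1 << 2;
const PUBLIC: u8 = 1 << 3;

/// Branchy reference version: checks one flag at a time.
const fn accessibility_branchy(bits: u8) -> Option<Accessibility> {
    if bits & PUBLIC != 0 {
        Some(Accessibility::Public)
    } else if bits & PROTECTED != 0 {
        Some(Accessibility::Protected)
    } else if bits & PRIVATE != 0 {
        Some(Accessibility::Private)
    } else {
        None
    }
}

/// The lookup table, filled entirely at compile time from the reference version.
const LUT: [Option<Accessibility>; 8] = {
    let mut lut = [None; 8];
    let mut i = 0;
    while i < 8 {
        lut[i] = accessibility_branchy((i as u8) << 1);
        i += 1;
    }
    lut
};

/// Branchless version with no `unsafe`: `& 7` keeps the index below 8,
/// so the compiler can prove the access is in bounds and omit the bounds check.
fn accessibility_lut(bits: u8) -> Option<Accessibility> {
    LUT[((bits >> 1) & 7) as usize]
}
```

Deriving the table from the branchy version keeps the two in sync by construction, and the compile-time loop leaves nothing behind except the table itself.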
Before:
```asm
accessibility:
mov al, 2
test dil, 8
jne .LBB1_3
mov al, 1
test dil, 4
jne .LBB1_3
xor eax, eax
test dil, 2
sete al
lea eax, [rax + 2*rax]
.LBB1_3:
ret
```
After:
```asm
accessibility:
shr edi
and edi, 7
lea rax, [rip + LUT::h8cdda7c57c71092a]
movzx eax, byte ptr [rdi + rax]
ret
LUT::h8cdda7c57c71092a:
.ascii "\003\000\001\001\002\002\002\002"
```
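To see why the two listings agree, here is a hypothetical line-by-line transcription of each into Rust. Only the bit masks and table bytes come from the assembly; the function names `before` and `after` are made up. Note that `.ascii` escapes are octal, so `\003` is the byte 3.

```rust
/// Transcription of the "Before" listing (`dil` is the low byte of the argument).
fn before(dil: u8) -> u8 {
    if dil & 8 != 0 {
        return 2; // mov al, 2; test dil, 8; jne .LBB1_3
    }
    if dil & 4 != 0 {
        return 1; // mov al, 1; test dil, 4; jne .LBB1_3
    }
    // xor eax, eax; test dil, 2; sete al => 1 if bit 1 is clear, else 0
    let eax = (dil & 2 == 0) as u8;
    // lea eax, [rax + 2*rax] => eax * 3, i.e. 3 ("none") or 0
    eax * 3
}

/// The 8 bytes of `.ascii "\003\000\001\001\002\002\002\002"`.
const LUT: [u8; 8] = [3, 0, 1, 1, 2, 2, 2, 2];

/// Transcription of the "After" listing.
fn after(edi: u32) -> u8 {
    let index = (edi >> 1) & 7; // shr edi; and edi, 7
    LUT[index as usize] // lea rax, [rip + LUT]; movzx eax, byte ptr [rdi + rax]
}
```

Both functions only look at bits 1 to 3 of the input, so checking all 256 byte values is an exhaustive equivalence test.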
+0.1% to +0.2% improvement on parser benchmarks. However, CodSpeed doesn't account for the cost of branch mispredictions, which we now avoid, so this is likely an underestimate.