Commit 4330c11

FranckQC and Slama, Franck authored
[Hexagon] Fix LWP assembly handler (predicate register) (#17204)
* Fix LWP assembly handler (predicate register) (#2216)

  This solves the issue with LWP that appears with maxpool.

  The problem was that the LWP handler forgot to save p0, which the handler itself uses. This predicate register needs to be saved too, just like r0-r5, because it had been decided that the handler is responsible for saving everything it touches, even registers that are theoretically caller-saved.

  Said differently: calling the LWP handler does not follow the normal ABI, and the handler saves everything it touches (even normally caller-saved registers like r0-r15 and p0-p3). It therefore must save the predicate registers too, in particular p0, which was causing the issue.

  The issue appeared only with maxpool because it is the only operator that had state held in p0 before calling the LWP handler. The call destroyed that state, so the code subsequently branched to the wrong portions of the code.

  Fix: allocate 32 bytes (instead of 24 previously) in order to save p3:0, store it at the bottom of the stack, and restore it at the end of the LWP handler.

* Remove trailing spaces

---------

Co-authored-by: Slama, Franck <[email protected]>
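The frame arithmetic behind the fix can be checked with a small model. This is only a sketch, not part of the commit: it assumes the usual Hexagon `allocframe` behaviour (LR:FP is pushed as 8 bytes, then the requested frame is reserved) and that instructions in the same packet as `allocframe` still read the old value of r29. The names `new_sp`, `r5_r4_offset`, and `frame_layout` are hypothetical.

```python
# Sketch of the lwp_handler stack frame; all helper names are hypothetical.
FRAME_LINK_BYTES = 8  # assumption: allocframe pushes LR:FP (8 bytes) first


def new_sp(old_sp: int, frame_size: int) -> int:
    """SP after allocframe(#frame_size) under the assumption above."""
    return old_sp - FRAME_LINK_BYTES - frame_size


def r5_r4_offset(frame_size: int, old_sp: int = 0x1000) -> int:
    """Offset of the R5:R4 slot relative to the new SP.

    The store memd(r29+#-16) sits in the allocframe packet, so it is
    modelled as addressing relative to the OLD stack pointer.
    """
    return (old_sp - 16) - new_sp(old_sp, frame_size)


def frame_layout(frame_size: int = 32) -> dict:
    """Map each saved register group to (offset from new SP, size in bytes)."""
    return {
        "p3:0": (0, 4),   # memw(r29+#0) = r5, followed by 4 unused bytes
        "r1:0": (8, 8),   # memd(r29+#8)
        "r3:2": (16, 8),  # memd(r29+#16)
        "r5:4": (r5_r4_offset(frame_size), 8),
    }


def slots_overlap(layout: dict) -> bool:
    """Return True if any two saved slots share a byte."""
    spans = sorted(layout.values())
    return any(a_off + a_len > b_off
               for (a_off, a_len), (b_off, _) in zip(spans, spans[1:]))
```

Under these assumptions, `r5_r4_offset(32)` gives 24 and `r5_r4_offset(24)` gives 16, which matches the `r29+#24` and `r29+#16` restore offsets in the new and old code respectively, and `slots_overlap(frame_layout(32))` is false.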
1 parent df33d73 commit 4330c11

1 file changed: +17 -8 lines changed

src/runtime/hexagon/profiler/lwp_handler.S

Lines changed: 17 additions & 8 deletions
@@ -50,12 +50,17 @@ handler itself.
     .falign
     .type lwp_handler,@function
 lwp_handler:
-    { allocframe(#24) // Allocate 24 bytes on the stack to save R0-R5 registers
+    {
+        allocframe(#32) // Allocate 32 bytes on the stack to save R0-R5 registers (6*4bytes) and P0-P3 (4*1byte) + 4 unused bytes as the stack has to be 8-bytes aligned
         memd(r29+#-16) = r5:4 // Save R5,R4
+        r5 = p3:0 // We will save P3:0 but we need an intermediate usual register (R5) that has already been saved
+    }
+    {
+        memd(r29+#16) = r3:2 // Save R3,R2
+        memd(r29+#8) = r1:0 // Save R1, R0
     }
     {
-        memd(r29+#8) = r3:2 // Save R3,R2
-        memd(r29+#0) = r1:0 // Save R1, R0
+        memw(r29+#0) = r5 // Save P3:0 (via R5)
         r2 = add(pc,##_GLOBAL_OFFSET_TABLE_@PCREL) // Get GOT address
     }
     {
@@ -102,14 +107,18 @@ lwp_handler:
         memw(r5+#8) = r0 // Save lower 32 bits
     }
     .falign
-.LBB0_3:
+.LBB0_3: // Restore the registers from the stack
+    {
+        r1 = memw(r29+#0) // We will restore P3:0 but need an intermediate usual register (R1) that hasn't already been restored
+        r5:4 = memd(r29+#24) // Restore R5:4
+    }
     {
-        r5:4 = memd(r29+#16) // Restore the registers from the stack
-        r3:2 = memd(r29+#8)
+        r3:2 = memd(r29+#16) // Restore R3:2
+        p3:0 = r1 // Restore P3:0 (via R1, not yet restored)
     }
     {
-        r1:0 = memd(r29+#0)
-        dealloc_return // Deallocate the stack and return
+        r1:0 = memd(r29+#8) // Restore R1:0
+        dealloc_return // Deallocate the stack and return
     }
 .Lfunc_end0:
     .size lwp_handler, .Lfunc_end0-lwp_handler
