Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CodeGen][X86] Fix lowering of tailcalls when -ms-hotpatch is used #77245

Merged
merged 4 commits into from
Jan 22, 2024

Conversation

aganea
Copy link
Member

@aganea aganea commented Jan 7, 2024

Previously, tail jump pseudo-opcodes were skipped by the encodeInstruction() call inside X86AsmPrinter::LowerPATCHABLE_OP. This caused emission of a 2-byte NOP and dropping of the tail jump.

With this PR, we change PATCHABLE_OP to not wrap the first MachineInstr anymore, but inserting itself before the instruction, leaving the instruction unaltered. At lowering time in X86AsmPrinter, we now "look ahead" for the next non-pseudo MachineInstr and lower+encode it, to inspect its size. If the size is below what PATCHABLE_OP expects, it inserts NOPs; otherwise it does nothing. That way, now the first MachineInstr is always lowered as usual even if "patchable-function"="prologue-short-redirect" is used.

Fixes #76879, #76958 and #59039

Previously, tail jump pseudo-opcodes were skipped by the `encodeInstruction()` call inside `X86AsmPrinter::LowerPATCHABLE_OP`. This caused emission of a 2-byte NOP and dropping of the tail jump

Also, the `PatchableFunction` pass didn't properly update the call site info for the existing MachineInstr, after wrapping it into a `TargetOpcode::PATCHABLE_OP`.
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 7, 2024

@llvm/pr-subscribers-llvm-support

@llvm/pr-subscribers-backend-x86

Author: Alexandre Ganea (aganea)

Changes

Previously, tail jump pseudo-opcodes were skipped by the encodeInstruction() call inside X86AsmPrinter::LowerPATCHABLE_OP. This caused emission of a 2-byte NOP and dropping of the tail jump.

Also, the PatchableFunction pass didn't properly update/erase the call site info for the existing MachineInstr, after wrapping it into a TargetOpcode::PATCHABLE_OP.


Full diff: https://github.com/llvm/llvm-project/pull/77245.diff

3 Files Affected:

  • (modified) llvm/lib/CodeGen/PatchableFunction.cpp (+3)
  • (modified) llvm/lib/Target/X86/X86MCInstLower.cpp (+6-2)
  • (added) llvm/test/CodeGen/X86/patchable-prologue-tailcall.ll (+69)
diff --git a/llvm/lib/CodeGen/PatchableFunction.cpp b/llvm/lib/CodeGen/PatchableFunction.cpp
index 9449f143366f0f..768e8dff2efc24 100644
--- a/llvm/lib/CodeGen/PatchableFunction.cpp
+++ b/llvm/lib/CodeGen/PatchableFunction.cpp
@@ -87,6 +87,9 @@ bool PatchableFunction::runOnMachineFunction(MachineFunction &MF) {
   for (auto &MO : FirstActualI->operands())
     MIB.add(MO);
 
+  if (FirstActualI->shouldUpdateCallSiteInfo())
+    MF.eraseCallSiteInfo(&*FirstActualI);
+
   FirstActualI->eraseFromParent();
   MF.ensureAlignment(Align(16));
   return true;
diff --git a/llvm/lib/Target/X86/X86MCInstLower.cpp b/llvm/lib/Target/X86/X86MCInstLower.cpp
index e1a67f61e76640..93aa0fc23eb9f2 100644
--- a/llvm/lib/Target/X86/X86MCInstLower.cpp
+++ b/llvm/lib/Target/X86/X86MCInstLower.cpp
@@ -959,7 +959,8 @@ void X86AsmPrinter::LowerPATCHABLE_OP(const MachineInstr &MI,
   bool EmptyInst = (Opcode == TargetOpcode::PATCHABLE_OP);
 
   MCInst MCI;
-  MCI.setOpcode(Opcode);
+  // Make sure below we don't encode pseudo tailcalls.
+  MCI.setOpcode(convertTailJumpOpcode(Opcode));
   for (auto &MO : drop_begin(MI.operands(), 2))
     if (auto MaybeOperand = MCIL.LowerMachineOperand(&MI, MO))
       MCI.addOperand(*MaybeOperand);
@@ -994,8 +995,11 @@ void X86AsmPrinter::LowerPATCHABLE_OP(const MachineInstr &MI,
       (void)NopSize;
     }
   }
-  if (!EmptyInst)
+  if (!EmptyInst) {
+    if (Opcode != convertTailJumpOpcode(Opcode))
+      OutStreamer->AddComment("TAILCALL");
     OutStreamer->emitInstruction(MCI, getSubtargetInfo());
+  }
 }
 
 // Lower a stackmap of the form:
diff --git a/llvm/test/CodeGen/X86/patchable-prologue-tailcall.ll b/llvm/test/CodeGen/X86/patchable-prologue-tailcall.ll
new file mode 100644
index 00000000000000..ab3ff5852a74b0
--- /dev/null
+++ b/llvm/test/CodeGen/X86/patchable-prologue-tailcall.ll
@@ -0,0 +1,69 @@
+; RUN: llc -verify-machineinstrs < %s | FileCheck %s --check-prefix=CHECK
+
+; CHECK: "?mi_new_test@@YAPEAX_K@Z":
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: jmp     mi_new
+
+; CHECK: "?builtin_malloc_test@@YAPEAX_K@Z":
+; CHECK-NEXT: # %bb.0:
+; CHECK-NEXT: jmp     malloc
+
+; // Built with: clang-cl.exe /c patchable-prologue-tailcall.cpp /O2 /hotpatch -Xclang -emit-llvm
+;
+; typedef unsigned long long size_t;
+; 
+; extern "C" {
+;     void* mi_new(size_t size);
+;  }
+;
+; void *mi_new_test(size_t count)
+; {
+;     return mi_new(count);
+; }
+; 
+; void *builtin_malloc_test(size_t count)
+; {
+;     return __builtin_malloc(count);
+; }
+
+; ModuleID = 'patchable-prologue-tailcall.cpp'
+source_filename = "patchable-prologue-tailcall.cpp"
+target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-pc-windows-msvc19.38.33133"
+
+; Function Attrs: mustprogress nounwind sspstrong uwtable
+define dso_local noundef ptr @"?mi_new_test@@YAPEAX_K@Z"(i64 noundef %count) local_unnamed_addr #0 {
+entry:
+  %call = tail call ptr @mi_new(i64 noundef %count) #4
+  ret ptr %call
+}
+
+declare dso_local ptr @mi_new(i64 noundef) local_unnamed_addr #1
+
+; Function Attrs: mustprogress nofree nounwind sspstrong willreturn memory(inaccessiblemem: readwrite) uwtable
+define dso_local noalias noundef ptr @"?builtin_malloc_test@@YAPEAX_K@Z"(i64 noundef %count) local_unnamed_addr #2 {
+entry:
+  %call = tail call ptr @malloc(i64 noundef %count) #4
+  ret ptr %call
+}
+
+; Function Attrs: mustprogress nofree nounwind willreturn allockind("alloc,uninitialized") allocsize(0) memory(inaccessiblemem: readwrite)
+declare dso_local noalias noundef ptr @malloc(i64 noundef) local_unnamed_addr #3
+
+attributes #0 = { mustprogress nounwind sspstrong uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "patchable-function"="prologue-short-redirect" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #1 = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #2 = { mustprogress nofree nounwind sspstrong willreturn memory(inaccessiblemem: readwrite) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "patchable-function"="prologue-short-redirect" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #3 = { mustprogress nofree nounwind willreturn allockind("alloc,uninitialized") allocsize(0) memory(inaccessiblemem: readwrite) "alloc-family"="malloc" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
+attributes #4 = { nounwind }
+
+!llvm.linker.options = !{!0, !1}
+!llvm.module.flags = !{!2, !3, !4, !5}
+!llvm.ident = !{!6}
+
+!0 = !{!"/DEFAULTLIB:libcmt.lib"}
+!1 = !{!"/DEFAULTLIB:oldnames.lib"}
+!2 = !{i32 1, !"wchar_size", i32 2}
+!3 = !{i32 8, !"PIC Level", i32 2}
+!4 = !{i32 7, !"uwtable", i32 2}
+!5 = !{i32 1, !"MaxTLSAlign", i32 65536}
+!6 = !{!"clang version 18.0.0git (https://github.com/llvm/llvm-project.git 0ccf9e29e9626a3a6813b35608c8959178c50a6f)"}

Comment on lines 29 to 30
; ModuleID = 'patchable-prologue-tailcall.cpp'
source_filename = "patchable-prologue-tailcall.cpp"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete?

Comment on lines 53 to 56
attributes #0 = { mustprogress nounwind sspstrong uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "patchable-function"="prologue-short-redirect" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #2 = { mustprogress nofree nounwind sspstrong willreturn memory(inaccessiblemem: readwrite) uwtable "min-legal-vector-width"="0" "no-trapping-math"="true" "patchable-function"="prologue-short-redirect" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #3 = { mustprogress nofree nounwind willreturn allockind("alloc,uninitialized") allocsize(0) memory(inaccessiblemem: readwrite) "alloc-family"="malloc" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cmov,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the attribute "patchable-function"="prologue-short-redirect" only?

!3 = !{i32 8, !"PIC Level", i32 2}
!4 = !{i32 7, !"uwtable", i32 2}
!5 = !{i32 1, !"MaxTLSAlign", i32 65536}
!6 = !{!"clang version 18.0.0git (https://github.com/llvm/llvm-project.git 0ccf9e29e9626a3a6813b35608c8959178c50a6f)"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove !0, !1, !6?

if (!EmptyInst)
if (!EmptyInst) {
if (Opcode != convertTailJumpOpcode(Opcode))
OutStreamer->AddComment("TAILCALL");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is not tested at all?

@@ -87,6 +87,9 @@ bool PatchableFunction::runOnMachineFunction(MachineFunction &MF) {
for (auto &MO : FirstActualI->operands())
MIB.add(MO);

if (FirstActualI->shouldUpdateCallSiteInfo())
MF.eraseCallSiteInfo(&*FirstActualI);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need this. According to the description above, PatchableFunction needs the first instruction is two-byte long.

http://xxeo.com/single-byte-or-small-x86-opcodes

One-byte instruction is rare in X86. TailCall can be transformed to either a call or a jmp, which is always longer than 2 bytes. I think we can just skip emitting patchable-op if the first instruction is a TailCall.

@sylvain-audi
Copy link
Contributor

I remember trying a similar approach: Making the pass skip the functions for which the first instruction's isCall() returns true, assuming that all calls/jumps are >= 2 bytes long.

One one hand, I think it fixed the issue for the repro provided in the 2 issues, so it could be considered a workaround (and it is not worse than before this code change).

However I don't think it should close #59039, because of the remaining design issues:

  1. Wrapping an instruction into a PATCHABLE_OP loses information that would be needed at lowering time.
    In X86AsmPrinter::emitInstruction, the wrapped opcode doesn't go through X86AsmPrinter::emitInstruction itself, which I'm pretty sure can become problematic. The main concern here is that -fms-hotpatch could alter the expected first instruction (instead of only guaranteeing its minimum size).

  2. PATCHABLE_OP takes the size as parameter (in reality it is always 2), but it is hard for PatchableFunction pass to know precisely the exact size of the instruction at that point. Especially, with your change, we start to assume that the size is 2. Maybe it should be changed to only handle a size of 2, and simplify the lowering code.

What do you think?

@KanRobert
Copy link
Contributor

@sylvain-audi
For 1, I think we can avoid introducing the wrapper PATCHABLE_OP and instead add MIFlag (/llvm/include/llvm/CodeGen/MachineInstr.h) for your purpose.

For 2, instruction length is only determined before emitting. I think it's impossible to know that in PatchableFunction. if the size is always 2, we should not overdesign.

Instead of `PATCHABLE_OP` wrapping the first `MachineInstr`, it now inserts itself before it, leaving the instruction unaltered. At lowering time in `X86AsmPrinter`, we now "look ahead" for the next non-pseudo `MachineInstr` and lower+encode it, to inspect its size. If the size is below what `PATCHABLE_OP` expects, it inserts NOPs; otherwise it does nothing
@aganea
Copy link
Member Author

aganea commented Jan 10, 2024

So I did something different to address the points raised. Instead of PATCHABLE_OP wrapping the first MachineInstr, it now inserts itself before it, leaving the instruction unaltered. At lowering time in X86AsmPrinter, we now "look ahead" for the next non-pseudo MachineInstr and lower+encode it, to inspect its size. If the size is below what PATCHABLE_OP expects, it inserts NOPs; otherwise it does nothing. I think this should address @sylvain-audi 's concerns. PTAL.

If we wanted in a future PR we could change the attribute "patchable-function"="prologue-short-redirect" to "patchable-function"="2". That would make PATCHABLE_OP more flexible, since the X86 backend already supports large sized NOPs (up to 15 if the target CPU supports it).

Also, I've removed the X86::PUSH64r optimization which cannot be applied anymore in this situation. Thus the change in the patchable-prologue.ll test. +@sanjoy

@sylvain-audi
Copy link
Contributor

Also, I've removed the X86::PUSH64r optimization which cannot be applied anymore in this situation. Thus the change in the patchable-prologue.ll test. +@sanjoy

To keep the optimization, is there a (clean) way we could "tell" the X86AsmPrinter to ignore the emission of the next instruction?

@aganea
Copy link
Member Author

aganea commented Jan 10, 2024

Also, I've removed the X86::PUSH64r optimization which cannot be applied anymore in this situation. Thus the change in the patchable-prologue.ll test. +@sanjoy

To keep the optimization, is there a (clean) way we could "tell" the X86AsmPrinter to ignore the emission of the next instruction?

I was under the impression that was initially a low hanging fruit optimization, but I'm not sure it's required ( @sanjoy ). I'll look into what you suggest if that's required.

@sanjoy
Copy link
Contributor

sanjoy commented Jan 10, 2024

Hi @aganea - I touched this code 7+ years back and I don't remember any details, so unfortunately I don't have any useful feedback or suggestions on this PR.

@aganea
Copy link
Member Author

aganea commented Jan 12, 2024

@sylvain-audi For 1, I think we can avoid introducing the wrapper PATCHABLE_OP and instead add MIFlag (/llvm/include/llvm/CodeGen/MachineInstr.h) for your purpose.

One interesting approach that goes along your lines would be to do like x64 MSVC does: always generate a 2-byte instruction at the start of the function, with no consideration to PATCHABLE_OP or any other flag. We could do that for MSVC triples. But that seems more involved and I wanted a quick fix ready for 18.0 branchout. +@rnk in case you have an opinion.

@aganea
Copy link
Member Author

aganea commented Jan 12, 2024

Another thing that @sylvain-audi raised in a offline discussion, is that PatchableFunction pass adds a 16-byte alignement for every single function in the PE, which increases its size and is not required stricto sensu. It is only the usage of /functionpadmin linker option that does the extra padding at the begining of the function and could/should take this alignement decision.

@aganea
Copy link
Member Author

aganea commented Jan 12, 2024

@MolecularMatters would you be able please to run this patch through the Live++ tests, see if that doesn't break other cases?

@tycho
Copy link
Member

tycho commented Jan 12, 2024

I'm the original reporter to @MolecularMatters about the issue, and I've been running with the Live++ build flags in my project for the past 12 hours with the changes in this PR applied on top of 8fb8ad6 and it seems to be fine now without needing -fno-optimize-sibling-calls

@sylvain-audi
Copy link
Contributor

I'm the original reporter to @MolecularMatters about the issue, and I've been running with the Live++ build flags in my project for the past 12 hours with the changes in this PR applied on top of 8fb8ad6 and it seems to be fine now without needing -fno-optimize-sibling-calls

@aganea I think you addressed all my concerns in your latest update.
I tested your change on our game, replacing the workaround we have been using so far, and it worked.
I would think this PR can be merged and that further improvements could probably be kept for separate PRs, but this should probably be validated by a code owner or expert.

@MolecularMatters
Copy link

@aganea @tycho @sylvain-audi If somebody could make a clang/lld build with the patch available for me, I'd be happy to run this against the Live++ test suite.

@sylvain-audi
Copy link
Contributor

@aganea @tycho @sylvain-audi If somebody could make a clang/lld build with the patch available for me, I'd be happy to run this against the Live++ test suite.

@MolecularMatters I sent you a build with this PR modification, based on 17.0.6.
Just mentioning it so @tycho and @aganea don't need to worry about preparing/sending it.

@aganea
Copy link
Member Author

aganea commented Jan 18, 2024

@MolecularMatters Did you have a chance to test this PR? I’d like to merge it before the branch cutoff. Thanks!

@MolecularMatters
Copy link

@aganea Thanks for reminding me!
Yes, I just tested this, seems to work fine.

@aganea
Copy link
Member Author

aganea commented Jan 18, 2024

@aganea Thanks for reminding me! Yes, I just tested this, seems to work fine.

Thanks!

To all other reviewers: is this good to go? Can anyone approve please?

@aganea
Copy link
Member Author

aganea commented Jan 22, 2024

Ping!

Copy link
Contributor

@KanRobert KanRobert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some suggestion

@aganea
Copy link
Member Author

aganea commented Jan 22, 2024

As suggested

@aganea
Copy link
Member Author

aganea commented Jan 22, 2024

@sylvain-audi Would you prefer that we leave #59039 open?

@sylvain-audi
Copy link
Contributor

@sylvain-audi Would you prefer that we leave #59039 open?

No, you can close it in my opinion, thanks!

@aganea aganea merged commit bb28442 into llvm:main Jan 22, 2024
3 of 4 checks passed
@aganea aganea deleted the fix_hotpatch_tailcall branch January 22, 2024 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clang -fms-hotpatch elides memory allocation calls
7 participants