Skip to content

Conversation

@HaohaiWen
Copy link
Contributor

cg_profile in object is from CGProfilePass and it is often inaccurate.
While call-graph-ordering-file is provided by user. It is weird to
aggregate them together especially when call-graph-ordering-file is
accurate enough.

Add tests to track section reordering when both cg_profile section
and call-graph-ordering-file were given.
cg_profile in object is from CGProfilePass and it is often inaccurate.
While call-graph-ordering-file is provided by user. It is weird to
aggregate them together especially when call-graph-ordering-file is
accurate enough.
@HaohaiWen
Copy link
Contributor Author

Add test in #121324.

@llvmbot
Copy link
Member

llvmbot commented Dec 30, 2024

@llvm/pr-subscribers-lld-elf
@llvm/pr-subscribers-platform-windows

@llvm/pr-subscribers-lld-coff

Author: Haohai Wen (HaohaiWen)

Changes

cg_profile in object is from CGProfilePass and it is often inaccurate.
While call-graph-ordering-file is provided by user. It is weird to
aggregate them together especially when call-graph-ordering-file is
accurate enough.


Full diff: https://github.com/llvm/llvm-project/pull/121325.diff

4 Files Affected:

  • (modified) lld/COFF/Driver.cpp (+3-3)
  • (modified) lld/ELF/Driver.cpp (+3-2)
  • (modified) lld/test/COFF/cgprofile-obj.s (+13-6)
  • (modified) lld/test/ELF/cgprofile-obj.s (+13-5)
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index be01ee41c9a2f2..4b9c189e21510d 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -2812,10 +2812,10 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
   // Handle /call-graph-ordering-file and /call-graph-profile-sort (default on).
   if (config->callGraphProfileSort) {
     llvm::TimeTraceScope timeScope("Call graph");
-    if (auto *arg = args.getLastArg(OPT_call_graph_ordering_file)) {
+    if (auto *arg = args.getLastArg(OPT_call_graph_ordering_file))
       parseCallGraphFile(arg->getValue());
-    }
-    readCallGraphsFromObjectFiles(ctx);
+    else
+      readCallGraphsFromObjectFiles(ctx);
   }
 
   // Handle /print-symbol-order.
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index f573a8d3e19f3b..e8e99fa874b5d5 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -3215,11 +3215,12 @@ template <class ELFT> void LinkerDriver::link(opt::InputArgList &args) {
 
   // Read the callgraph now that we know what was gced or icfed
   if (ctx.arg.callGraphProfileSort != CGProfileSortKind::None) {
-    if (auto *arg = args.getLastArg(OPT_call_graph_ordering_file))
+    if (auto *arg = args.getLastArg(OPT_call_graph_ordering_file)) {
       if (std::optional<MemoryBufferRef> buffer =
               readFile(ctx, arg->getValue()))
         readCallGraph(ctx, *buffer);
-    readCallGraphsFromObjectFiles<ELFT>(ctx);
+    } else
+      readCallGraphsFromObjectFiles<ELFT>(ctx);
   }
 
   // Write the result to the file.
diff --git a/lld/test/COFF/cgprofile-obj.s b/lld/test/COFF/cgprofile-obj.s
index b267850c46382c..da8011724d9859 100644
--- a/lld/test/COFF/cgprofile-obj.s
+++ b/lld/test/COFF/cgprofile-obj.s
@@ -2,9 +2,12 @@
 
 # RUN: llvm-mc -filetype=obj -triple=x86_64-pc-win32 %s -o %t
 # RUN: lld-link /subsystem:console /entry:A %t /out:%t2 /debug:symtab
-# RUN: llvm-nm --numeric-sort %t2 | FileCheck %s
+# RUN: llvm-nm --numeric-sort %t2 | FileCheck %s --check-prefix=CG-OBJ
 # RUN: lld-link /call-graph-profile-sort:no /subsystem:console /entry:A %t /out:%t3 /debug:symtab
 # RUN: llvm-nm --numeric-sort %t3 | FileCheck %s --check-prefix=NO-CG
+# RUN: echo "D A 200" > %t.call_graph
+# RUN: lld-link /subsystem:console /entry:A %t /out:%t4 /debug:symtab /call-graph-ordering-file:%t.call_graph
+# RUN: llvm-nm --numeric-sort %t4 | FileCheck %s --check-prefix=CG-OBJ-OF
 
     .section    .text,"ax", one_only, D
 D:
@@ -33,13 +36,17 @@ Aa:
     .cg_profile B, C, 30
     .cg_profile C, D, 90
 
-# CHECK: 140001000 T A
-# CHECK: 140001001 T B
-# CHECK: 140001002 T C
-# CHECK: 140001003 t D
-
+# CG-OBJ: 140001000 T A
+# CG-OBJ: 140001001 T B
+# CG-OBJ: 140001002 T C
+# CG-OBJ: 140001003 t D
 
 # NO-CG: 140001000 t D
 # NO-CG: 140001001 T C
 # NO-CG: 140001002 T B
 # NO-CG: 140001003 T A
+
+# CG-OBJ-OF: 140001000 t D
+# CG-OBJ-OF: 140001001 T A
+# CG-OBJ-OF: 140001004 T C
+# CG-OBJ-OF: 140001005 T B
diff --git a/lld/test/ELF/cgprofile-obj.s b/lld/test/ELF/cgprofile-obj.s
index 0848adc5e4279a..14016658707af4 100644
--- a/lld/test/ELF/cgprofile-obj.s
+++ b/lld/test/ELF/cgprofile-obj.s
@@ -2,12 +2,15 @@
 
 # RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o
 # RUN: ld.lld -e A %t.o -o %t
-# RUN: llvm-nm --no-sort %t | FileCheck %s
+# RUN: llvm-nm --no-sort %t | FileCheck %s --check-prefix=CG-OBJ
 # RUN: ld.lld --call-graph-profile-sort=none -e A %t.o -o %t
 # RUN: llvm-nm --no-sort %t | FileCheck %s --check-prefix=NO-CG
 ## --no-call-graph-profile-sort is an alias for --call-graph-profile-sort=none.
 # RUN: ld.lld --no-call-graph-profile-sort -e A %t.o -o %t1
 # RUN: cmp %t %t1
+# RUN: echo "D A 200" > %t.call_graph
+# RUN: ld.lld -e A %t.o -call-graph-ordering-file=%t.call_graph -o %t2
+# RUN: llvm-nm --no-sort %t2 | FileCheck %s --check-prefix=CG-OBJ-OF
 
     .section    .text.D,"ax",@progbits
 D:
@@ -36,12 +39,17 @@ Aa:
     .cg_profile B, C, 30
     .cg_profile C, D, 90
 
-# CHECK: 0000000000201123 t D
-# CHECK: 0000000000201122 T C
-# CHECK: 0000000000201121 T B
-# CHECK: 0000000000201120 T A
+# CG-OBJ: 0000000000201123 t D
+# CG-OBJ: 0000000000201122 T C
+# CG-OBJ: 0000000000201121 T B
+# CG-OBJ: 0000000000201120 T A
 
 # NO-CG: 0000000000201120 t D
 # NO-CG: 0000000000201121 T C
 # NO-CG: 0000000000201122 T B
 # NO-CG: 0000000000201123 T A
+
+# CG-OBJ-OF: 0000000000201120 t D
+# CG-OBJ-OF: 0000000000201124 T C
+# CG-OBJ-OF: 0000000000201125 T B
+# CG-OBJ-OF: 0000000000201121 T A

@llvmbot
Copy link
Member

llvmbot commented Dec 30, 2024

@llvm/pr-subscribers-lld

Author: Haohai Wen (HaohaiWen)

Changes

cg_profile in object is from CGProfilePass and it is often inaccurate.
While call-graph-ordering-file is provided by user. It is weird to
aggregate them together especially when call-graph-ordering-file is
accurate enough.


Full diff: https://github.com/llvm/llvm-project/pull/121325.diff

4 Files Affected:

  • (modified) lld/COFF/Driver.cpp (+3-3)
  • (modified) lld/ELF/Driver.cpp (+3-2)
  • (modified) lld/test/COFF/cgprofile-obj.s (+13-6)
  • (modified) lld/test/ELF/cgprofile-obj.s (+13-5)
diff --git a/lld/COFF/Driver.cpp b/lld/COFF/Driver.cpp
index be01ee41c9a2f2..4b9c189e21510d 100644
--- a/lld/COFF/Driver.cpp
+++ b/lld/COFF/Driver.cpp
@@ -2812,10 +2812,10 @@ void LinkerDriver::linkerMain(ArrayRef<const char *> argsArr) {
   // Handle /call-graph-ordering-file and /call-graph-profile-sort (default on).
   if (config->callGraphProfileSort) {
     llvm::TimeTraceScope timeScope("Call graph");
-    if (auto *arg = args.getLastArg(OPT_call_graph_ordering_file)) {
+    if (auto *arg = args.getLastArg(OPT_call_graph_ordering_file))
       parseCallGraphFile(arg->getValue());
-    }
-    readCallGraphsFromObjectFiles(ctx);
+    else
+      readCallGraphsFromObjectFiles(ctx);
   }
 
   // Handle /print-symbol-order.
diff --git a/lld/ELF/Driver.cpp b/lld/ELF/Driver.cpp
index f573a8d3e19f3b..e8e99fa874b5d5 100644
--- a/lld/ELF/Driver.cpp
+++ b/lld/ELF/Driver.cpp
@@ -3215,11 +3215,12 @@ template <class ELFT> void LinkerDriver::link(opt::InputArgList &args) {
 
   // Read the callgraph now that we know what was gced or icfed
   if (ctx.arg.callGraphProfileSort != CGProfileSortKind::None) {
-    if (auto *arg = args.getLastArg(OPT_call_graph_ordering_file))
+    if (auto *arg = args.getLastArg(OPT_call_graph_ordering_file)) {
       if (std::optional<MemoryBufferRef> buffer =
               readFile(ctx, arg->getValue()))
         readCallGraph(ctx, *buffer);
-    readCallGraphsFromObjectFiles<ELFT>(ctx);
+    } else
+      readCallGraphsFromObjectFiles<ELFT>(ctx);
   }
 
   // Write the result to the file.
diff --git a/lld/test/COFF/cgprofile-obj.s b/lld/test/COFF/cgprofile-obj.s
index b267850c46382c..da8011724d9859 100644
--- a/lld/test/COFF/cgprofile-obj.s
+++ b/lld/test/COFF/cgprofile-obj.s
@@ -2,9 +2,12 @@
 
 # RUN: llvm-mc -filetype=obj -triple=x86_64-pc-win32 %s -o %t
 # RUN: lld-link /subsystem:console /entry:A %t /out:%t2 /debug:symtab
-# RUN: llvm-nm --numeric-sort %t2 | FileCheck %s
+# RUN: llvm-nm --numeric-sort %t2 | FileCheck %s --check-prefix=CG-OBJ
 # RUN: lld-link /call-graph-profile-sort:no /subsystem:console /entry:A %t /out:%t3 /debug:symtab
 # RUN: llvm-nm --numeric-sort %t3 | FileCheck %s --check-prefix=NO-CG
+# RUN: echo "D A 200" > %t.call_graph
+# RUN: lld-link /subsystem:console /entry:A %t /out:%t4 /debug:symtab /call-graph-ordering-file:%t.call_graph
+# RUN: llvm-nm --numeric-sort %t4 | FileCheck %s --check-prefix=CG-OBJ-OF
 
     .section    .text,"ax", one_only, D
 D:
@@ -33,13 +36,17 @@ Aa:
     .cg_profile B, C, 30
     .cg_profile C, D, 90
 
-# CHECK: 140001000 T A
-# CHECK: 140001001 T B
-# CHECK: 140001002 T C
-# CHECK: 140001003 t D
-
+# CG-OBJ: 140001000 T A
+# CG-OBJ: 140001001 T B
+# CG-OBJ: 140001002 T C
+# CG-OBJ: 140001003 t D
 
 # NO-CG: 140001000 t D
 # NO-CG: 140001001 T C
 # NO-CG: 140001002 T B
 # NO-CG: 140001003 T A
+
+# CG-OBJ-OF: 140001000 t D
+# CG-OBJ-OF: 140001001 T A
+# CG-OBJ-OF: 140001004 T C
+# CG-OBJ-OF: 140001005 T B
diff --git a/lld/test/ELF/cgprofile-obj.s b/lld/test/ELF/cgprofile-obj.s
index 0848adc5e4279a..14016658707af4 100644
--- a/lld/test/ELF/cgprofile-obj.s
+++ b/lld/test/ELF/cgprofile-obj.s
@@ -2,12 +2,15 @@
 
 # RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o
 # RUN: ld.lld -e A %t.o -o %t
-# RUN: llvm-nm --no-sort %t | FileCheck %s
+# RUN: llvm-nm --no-sort %t | FileCheck %s --check-prefix=CG-OBJ
 # RUN: ld.lld --call-graph-profile-sort=none -e A %t.o -o %t
 # RUN: llvm-nm --no-sort %t | FileCheck %s --check-prefix=NO-CG
 ## --no-call-graph-profile-sort is an alias for --call-graph-profile-sort=none.
 # RUN: ld.lld --no-call-graph-profile-sort -e A %t.o -o %t1
 # RUN: cmp %t %t1
+# RUN: echo "D A 200" > %t.call_graph
+# RUN: ld.lld -e A %t.o -call-graph-ordering-file=%t.call_graph -o %t2
+# RUN: llvm-nm --no-sort %t2 | FileCheck %s --check-prefix=CG-OBJ-OF
 
     .section    .text.D,"ax",@progbits
 D:
@@ -36,12 +39,17 @@ Aa:
     .cg_profile B, C, 30
     .cg_profile C, D, 90
 
-# CHECK: 0000000000201123 t D
-# CHECK: 0000000000201122 T C
-# CHECK: 0000000000201121 T B
-# CHECK: 0000000000201120 T A
+# CG-OBJ: 0000000000201123 t D
+# CG-OBJ: 0000000000201122 T C
+# CG-OBJ: 0000000000201121 T B
+# CG-OBJ: 0000000000201120 T A
 
 # NO-CG: 0000000000201120 t D
 # NO-CG: 0000000000201121 T C
 # NO-CG: 0000000000201122 T B
 # NO-CG: 0000000000201123 T A
+
+# CG-OBJ-OF: 0000000000201120 t D
+# CG-OBJ-OF: 0000000000201124 T C
+# CG-OBJ-OF: 0000000000201125 T B
+# CG-OBJ-OF: 0000000000201121 T A

Copy link
Member

@MaskRay MaskRay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I believe that --call-graph-ordering-file is under-utilized and the interaction wasn't well studied. Have you tried some examples where not aggregation provides larger performance win?

@HaohaiWen
Copy link
Contributor Author

HaohaiWen commented Jan 5, 2025

Makes sense. I believe that --call-graph-ordering-file is under-utilized and the interaction wasn't well studied. Have you tried some examples where not aggregation provides larger performance win?

I was trying to feed precise call graph profile to lld and found the reorder behavior was not expected due to this issue. Regarding the performance, I didn't see significant difference, but the averaged branch distance (aggregation of LBR address difference / number of LBR) decreased with this patch.

@HaohaiWen HaohaiWen merged commit 2d9d291 into llvm:main Jan 5, 2025
8 checks passed
@HaohaiWen HaohaiWen deleted the lld branch January 5, 2025 02:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants