Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CMake][PGO] Add option for using an external project to generate profile data #78879

Merged
merged 5 commits into from
Feb 2, 2024

Conversation

tstellar
Copy link
Collaborator

@tstellar tstellar commented Jan 21, 2024

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a CMake project to use for generating the profile data. For example, to use the llvm-test-suite to generate profile data you would do:

$ cmake -G Ninja -B build -S llvm -C /clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=
-DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes

Note that the CLANG_PERF_TRAINING_DEPS has been renamed to CLANG_PGO_TRAINING_DEPS.

This fixes the build since 8f90e69
which made libcxxabi use llvm's libunwind by default.
…file data.

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a CMake
project to use for generating the profile data.

For example, to use the llvm-test-suite to generate profile data you
would do:

$ cmake -G Ninja -B build -S llvm -C <path to source>/clang/cmake/caches/PGO.cmake \
        -DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite> \
        -DBOOTSTRAP_CLANG_PERF_TRAINING_DEPS=runtimes
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jan 21, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 21, 2024

@llvm/pr-subscribers-clang

Author: Tom Stellard (tstellar)

Changes

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a CMake project to use for generating the profile data. For example, to use the llvm-test-suite to generate profile data you would do:

$ cmake -G Ninja -B build -S llvm -C <path to source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
-DBOOTSTRAP_CLANG_PERF_TRAINING_DEPS=runtimes


Full diff: https://github.com/llvm/llvm-project/pull/78879.diff

4 Files Affected:

  • (modified) clang/cmake/caches/PGO.cmake (+1-1)
  • (modified) clang/utils/perf-training/CMakeLists.txt (+11-2)
  • (modified) clang/utils/perf-training/perf-helper.py (+9-7)
  • (modified) llvm/docs/AdvancedBuilds.rst (+23)
diff --git a/clang/cmake/caches/PGO.cmake b/clang/cmake/caches/PGO.cmake
index e1d0585e453f825..15bc755d110d19a 100644
--- a/clang/cmake/caches/PGO.cmake
+++ b/clang/cmake/caches/PGO.cmake
@@ -2,7 +2,7 @@ set(CMAKE_BUILD_TYPE RELEASE CACHE STRING "")
 set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")
 
 set(LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")
-set(LLVM_ENABLE_RUNTIMES "compiler-rt;libcxx;libcxxabi" CACHE STRING "")
+set(LLVM_ENABLE_RUNTIMES "compiler-rt;libcxx;libcxxabi;libunwind" CACHE STRING "")
 
 set(LLVM_TARGETS_TO_BUILD Native CACHE STRING "")
 set(BOOTSTRAP_LLVM_BUILD_INSTRUMENTED ON CACHE BOOL "")
diff --git a/clang/utils/perf-training/CMakeLists.txt b/clang/utils/perf-training/CMakeLists.txt
index c6d51863fb1b5c2..f9d673b2e92e775 100644
--- a/clang/utils/perf-training/CMakeLists.txt
+++ b/clang/utils/perf-training/CMakeLists.txt
@@ -1,6 +1,10 @@
+include(LLVMExternalProjectUtils)
+
 set(CLANG_PGO_TRAINING_DATA "${CMAKE_CURRENT_SOURCE_DIR}" CACHE PATH
   "The path to a lit testsuite containing samples for PGO and order file generation"
   )
+set(CLANG_PGO_TRAINING_DATA_SOURCE_DIR OFF CACHE STRING "Path to source directory containing cmake project with source files to use for generating pgo data")
+set(CLANG_PERF_TRAINING_DEPS "" CACHE STRING "Extra dependencies needed to build the PGO training data.")
 
 if(LLVM_BUILD_INSTRUMENTED)
   configure_lit_site_cfg(
@@ -15,7 +19,7 @@ if(LLVM_BUILD_INSTRUMENTED)
     )
 
   add_custom_target(clear-profraw
-    COMMAND "${Python3_EXECUTABLE}" ${CMAKE_CURRENT_SOURCE_DIR}/perf-helper.py clean ${CMAKE_CURRENT_BINARY_DIR} profraw
+    COMMAND "${Python3_EXECUTABLE}" ${CMAKE_CURRENT_SOURCE_DIR}/perf-helper.py clean ${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_BINARY_DIR}/profiles/ profraw
     COMMENT "Clearing old profraw data")
 
   if(NOT LLVM_PROFDATA)
@@ -26,9 +30,14 @@ if(LLVM_BUILD_INSTRUMENTED)
     message(STATUS "To enable merging PGO data LLVM_PROFDATA has to point to llvm-profdata")
   else()
     add_custom_target(generate-profdata
-      COMMAND "${Python3_EXECUTABLE}" ${CMAKE_CURRENT_SOURCE_DIR}/perf-helper.py merge ${LLVM_PROFDATA} ${CMAKE_CURRENT_BINARY_DIR}/clang.profdata ${CMAKE_CURRENT_BINARY_DIR}
+      COMMAND "${Python3_EXECUTABLE}" ${CMAKE_CURRENT_SOURCE_DIR}/perf-helper.py merge ${LLVM_PROFDATA} ${CMAKE_CURRENT_BINARY_DIR}/clang.profdata ${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_BINARY_DIR}/profiles/
       COMMENT "Merging profdata"
       DEPENDS generate-profraw)
+    if (CLANG_PGO_TRAINING_DATA_SOURCE_DIR)
+      llvm_ExternalProject_Add(generate-profraw-external ${CLANG_PGO_TRAINING_DATA_SOURCE_DIR}
+              USE_TOOLCHAIN EXLUDE_FROM_ALL NO_INSTALL DEPENDS generate-profraw)
+      add_dependencies(generate-profdata generate-profraw-external)
+    endif()
   endif()
 endif()
 
diff --git a/clang/utils/perf-training/perf-helper.py b/clang/utils/perf-training/perf-helper.py
index 99d6a3333b6ef08..844aa274f049aaa 100644
--- a/clang/utils/perf-training/perf-helper.py
+++ b/clang/utils/perf-training/perf-helper.py
@@ -30,26 +30,28 @@ def findFilesWithExtension(path, extension):
 
 
 def clean(args):
-    if len(args) != 2:
+    if len(args) < 2:
         print(
-            "Usage: %s clean <path> <extension>\n" % __file__
+            "Usage: %s clean <paths> <extension>\n" % __file__
             + "\tRemoves all files with extension from <path>."
         )
         return 1
-    for filename in findFilesWithExtension(args[0], args[1]):
-        os.remove(filename)
+    for path in args[1:-1]:
+        for filename in findFilesWithExtension(path, args[-1]):
+            os.remove(filename)
     return 0
 
 
 def merge(args):
-    if len(args) != 3:
+    if len(args) < 3:
         print(
-            "Usage: %s merge <llvm-profdata> <output> <path>\n" % __file__
+            "Usage: %s merge <llvm-profdata> <output> <paths>\n" % __file__
             + "\tMerges all profraw files from path into output."
         )
         return 1
     cmd = [args[0], "merge", "-o", args[1]]
-    cmd.extend(findFilesWithExtension(args[2], "profraw"))
+    for i in range(2, len(args)):
+        cmd.extend(findFilesWithExtension(args[i], "profraw"))
     subprocess.check_call(cmd)
     return 0
 
diff --git a/llvm/docs/AdvancedBuilds.rst b/llvm/docs/AdvancedBuilds.rst
index 960b19fa5317f3b..3ff630a1425691d 100644
--- a/llvm/docs/AdvancedBuilds.rst
+++ b/llvm/docs/AdvancedBuilds.rst
@@ -145,6 +145,29 @@ that also enables ThinTLO, use the following command:
       -DPGO_INSTRUMENT_LTO=Thin \
       <path to source>/llvm
 
+By default, clang will generate profile data by compiling a simple
+hello world program.  You can also tell clang use an external
+project for generating profile data that may be a better fit for your
+use case.  The project you specify must either be a lit test suite
+(use the CLANG_PGO_TRAINING_DATA option) or a CMake project (use the
+CLANG_PERF_TRAINING_DATA_SOURCE_DIR option).
+
+For example, If you wanted to use the
+`LLVM Test Suite <https://github.com/llvm/llvm-test-suite/>`_ to generate
+profile data you would use the following command:
+
+.. code-block:: console
+
+  $ cmake -G Ninja -C <path to source>/clang/cmake/caches/PGO.cmake \
+       -DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite> \
+       -DBOOTSTRAP_CLANG_PERF_TRAINING_DEPS=runtimes
+
+The BOOTSTRAP\_ prefixes tells CMake to pass the variables on to the instrumented
+stage two build.  And the CLANG_PERF_TRAINING_DEPS option let's you specify
+additional build targets to build before building the external project.  The
+LLVM Test Suite requires compiler-rt to build, so we need to add the
+`runtimes` target as a dependency.
+
 After configuration, building the stage2-instrumented-generate-profdata target
 will automatically build the stage1 compiler, build the instrumented compiler
 with the stage1 compiler, and then run the instrumented compiler against the

Copy link
Contributor

@kwk kwk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@llvm-beanz llvm-beanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

set(CLANG_PGO_TRAINING_DATA "${CMAKE_CURRENT_SOURCE_DIR}" CACHE PATH
"The path to a lit testsuite containing samples for PGO and order file generation"
)
set(CLANG_PGO_TRAINING_DATA_SOURCE_DIR OFF CACHE STRING "Path to source directory containing cmake project with source files to use for generating pgo data")
set(CLANG_PERF_TRAINING_DEPS "" CACHE STRING "Extra dependencies needed to build the PGO training data.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this variable being used anywhere, should it be added as a dependency of generate-profraw-external? I'd also consider naming it CLANG_PGO_TRAINING_DEPS for consistency with other variables in this file.

@@ -15,7 +19,7 @@ if(LLVM_BUILD_INSTRUMENTED)
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@petrhosek CLANG_PERF_TRAINING_DEPS was an existing variable that's used here. I think it was meant to be added to cache files, but I added it to CMakeLists.txt file here so that it could be passed on the command line without modifying the cache files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like I added that in https://reviews.llvm.org/D138974 but I'm not aware of it being used anywhere at the moment. I'd still prefer renaming it to CLANG_PGO_TRAINING_DEPS for consistency, especially that we're now adding it to cache.

@aaupov
Copy link
Contributor

aaupov commented Jan 30, 2024

I'm in favor of this change especially in context of CSSPGO profiling (#79942) and BOLT LBR profiling (#69133).

@@ -2,7 +2,7 @@ set(CMAKE_BUILD_TYPE RELEASE CACHE STRING "")
set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")

set(LLVM_ENABLE_PROJECTS "clang;lld" CACHE STRING "")
set(LLVM_ENABLE_RUNTIMES "compiler-rt;libcxx;libcxxabi" CACHE STRING "")
set(LLVM_ENABLE_RUNTIMES "compiler-rt;libcxx;libcxxabi;libunwind" CACHE STRING "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to avoid touching to prevent conflicts with #78869.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaupov How does this look now?

)

add_custom_target(clear-profraw
COMMAND "${Python3_EXECUTABLE}" ${CMAKE_CURRENT_SOURCE_DIR}/perf-helper.py clean ${CMAKE_CURRENT_BINARY_DIR} profraw
COMMAND "${Python3_EXECUTABLE}" ${CMAKE_CURRENT_SOURCE_DIR}/perf-helper.py clean ${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_BINARY_DIR}/profiles/ profraw
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this new path (${CMAKE_BINARY_DIR}/profiles/) come from?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aaupov That's the directory where clang will write it's profiles to by default.

Copy link
Contributor

@aaupov aaupov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tstellar tstellar merged commit dd0356d into llvm:main Feb 2, 2024
4 of 5 checks passed
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Feb 3, 2024
…file data (llvm#78879)

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:

$ cmake -G Ninja -B build -S llvm -C <path to
source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
\
        -DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes

Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.

---------

Co-authored-by: Petr Hosek <[email protected]>
(cherry picked from commit dd0356d)
agozillon pushed a commit to agozillon/llvm-project that referenced this pull request Feb 5, 2024
…file data (llvm#78879)

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:

$ cmake -G Ninja -B build -S llvm -C <path to
source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
\
        -DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes

Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.

---------

Co-authored-by: Petr Hosek <[email protected]>
llvmbot pushed a commit to llvmbot/llvm-project that referenced this pull request Feb 6, 2024
…file data (llvm#78879)

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:

$ cmake -G Ninja -B build -S llvm -C <path to
source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
\
        -DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes

Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.

---------

Co-authored-by: Petr Hosek <[email protected]>
(cherry picked from commit dd0356d)
tstellar added a commit to tstellar/llvm-project that referenced this pull request Feb 14, 2024
…file data (llvm#78879)

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:

$ cmake -G Ninja -B build -S llvm -C <path to
source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
\
        -DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes

Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.

---------

Co-authored-by: Petr Hosek <[email protected]>
(cherry picked from commit dd0356d)
tstellar added a commit to tstellar/llvm-project that referenced this pull request Feb 14, 2024
…file data (llvm#78879)

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:

$ cmake -G Ninja -B build -S llvm -C <path to
source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
\
        -DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes

Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.

---------

Co-authored-by: Petr Hosek <[email protected]>
(cherry picked from commit dd0356d)
tstellar added a commit to tstellar/llvm-project that referenced this pull request Feb 14, 2024
…file data (llvm#78879)

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:

$ cmake -G Ninja -B build -S llvm -C <path to
source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
\
        -DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes

Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.

---------

Co-authored-by: Petr Hosek <[email protected]>
(cherry picked from commit dd0356d)
tstellar added a commit to tstellar/llvm-project that referenced this pull request Feb 14, 2024
…file data (llvm#78879)

The new CLANG_PGO_TRAINING_DATA_SOURCE_DIR allows users to specify a
CMake project to use for generating the profile data. For example, to
use the llvm-test-suite to generate profile data you would do:

$ cmake -G Ninja -B build -S llvm -C <path to
source>/clang/cmake/caches/PGO.cmake \
-DBOOTSTRAP_CLANG_PGO_TRAINING_DATA_SOURCE_DIR=<path to llvm-test-suite>
\
        -DBOOTSTRAP_CLANG_PGO_TRAINING_DEPS=runtimes

Note that the CLANG_PERF_TRAINING_DEPS has been renamed to
CLANG_PGO_TRAINING_DEPS.

---------

Co-authored-by: Petr Hosek <[email protected]>
(cherry picked from commit dd0356d)
@pointhex pointhex mentioned this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants