[llvm-profgen][NFC] Detect pre-aggregated format#191593

Closed
aaupov wants to merge 1 commit into main from users/aaupov/spr/llvm-profgen-detect-pre-aggregated-format-1

@aaupov
Contributor

@aaupov aaupov commented Apr 11, 2026

Distinguish the following profgen input formats:

  1. perf script with brstack only,
  2. perf script with call stack and brstack (hybrid),
  3. pre-aggregated input with count, call stack and brstack.

Pre-aggregated input lacks mmap information, so to enable using a single input file for optimizing both the main binary and shared libraries, the addresses will be augmented with buildid information in a follow-up (#190863).
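
The three-way distinction above can be sketched as a standalone classifier. This is a simplified illustration, not the llvm-profgen implementation: `classifyPerfContentSketch`, `parseWholeUInt`, `ltrim` and `looksLikeLBRSample` are hypothetical names, the input is a vector of in-memory lines rather than a `TraceStream`, and the LBR check (second whitespace-separated token starts with `0x` and contains `/`) is an assumed stand-in for `isLBRSample`.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Mirrors the PerfContent values from the patch.
enum class PerfContent { UnknownContent = 0, LBR = 1, LBRStack = 2, AggLBRStack = 3 };

// Parse the entire string as an unsigned integer in base 10 or 16.
// Fails on an empty string or any invalid digit; overflow is ignored here.
static bool parseWholeUInt(const std::string &S, unsigned Base, uint64_t &Out) {
  if (S.empty())
    return false;
  uint64_t Value = 0;
  for (char C : S) {
    unsigned Digit;
    if (C >= '0' && C <= '9')
      Digit = C - '0';
    else if (C >= 'a' && C <= 'f')
      Digit = C - 'a' + 10;
    else if (C >= 'A' && C <= 'F')
      Digit = C - 'A' + 10;
    else
      return false;
    if (Digit >= Base)
      return false;
    Value = Value * Base + Digit;
  }
  Out = Value;
  return true;
}

// Drop leading spaces and tabs.
static std::string ltrim(const std::string &S) {
  std::size_t P = S.find_first_not_of(" \t");
  return P == std::string::npos ? std::string() : S.substr(P);
}

// Assumed heuristic: the second token looks like "0x<from>/0x<to>/...".
static bool looksLikeLBRSample(const std::string &Line) {
  std::istringstream SS(Line);
  std::string First, Second;
  if (!(SS >> First >> Second))
    return false;
  return Second.rfind("0x", 0) == 0 && Second.find('/') != std::string::npos;
}

PerfContent classifyPerfContentSketch(const std::vector<std::string> &Lines) {
  std::size_t I = 0;
  uint64_t Scratch = 0;
  while (I < Lines.size()) {
    // A line that is a bare decimal number is taken as the aggregated count.
    bool HasAggCount = parseWholeUInt(Lines[I], 10, Scratch);
    if (HasAggCount)
      ++I;

    // Bare hex lines are call-stack frames.
    int Frames = 0;
    while (I < Lines.size() && parseWholeUInt(ltrim(Lines[I]), 16, Scratch)) {
      ++Frames;
      ++I;
    }

    if (I < Lines.size()) {
      if (looksLikeLBRSample(Lines[I])) {
        if (Frames > 0)
          return HasAggCount ? PerfContent::AggLBRStack : PerfContent::LBRStack;
        return PerfContent::LBR;
      }
      ++I; // Not a sample line (for example an mmap event); keep scanning.
    }
  }
  return PerfContent::UnknownContent;
}
```

As in the patch, a count line followed directly by a branch stack with no call-stack frames is still classified as `LBR` in this sketch; only the hybrid case gains a separate pre-aggregated value.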

Created using spr 1.3.4
@aaupov aaupov changed the title [llvm-profgen] Detect pre-aggregated format [llvm-profgen][NFC] Detect pre-aggregated format Apr 11, 2026
@aaupov aaupov marked this pull request as ready for review April 11, 2026 04:29
@llvmbot llvmbot added the PGO Profile Guided Optimizations label Apr 11, 2026
@llvmbot
Member

llvmbot commented Apr 11, 2026

@llvm/pr-subscribers-pgo

Author: Amir Ayupov (aaupov)

Changes


Full diff: https://github.com/llvm/llvm-project/pull/191593.diff

2 Files Affected:

  • (modified) llvm/tools/llvm-profgen/PerfReader.cpp (+9-6)
  • (modified) llvm/tools/llvm-profgen/PerfReader.h (+7-2)
diff --git a/llvm/tools/llvm-profgen/PerfReader.cpp b/llvm/tools/llvm-profgen/PerfReader.cpp
index 1dc59321fd91f..bbfde1256f2cc 100644
--- a/llvm/tools/llvm-profgen/PerfReader.cpp
+++ b/llvm/tools/llvm-profgen/PerfReader.cpp
@@ -365,9 +365,11 @@ PerfReaderBase::create(ProfiledBinary *Binary, PerfInputFile &PerfInput,
 
   PerfInput.Content =
       PerfScriptReader::checkPerfScriptType(PerfInput.InputFile);
-  if (PerfInput.Content == PerfContent::LBRStack) {
-    PerfReader.reset(
-        new HybridPerfReader(Binary, PerfInput.InputFile, PIDFilter));
+  if (PerfInput.Content == PerfContent::LBRStack ||
+      PerfInput.Content == PerfContent::AggLBRStack) {
+    auto *Reader = new HybridPerfReader(Binary, PerfInput.InputFile, PIDFilter);
+    Reader->setIsPreAggregated(PerfInput.Content == PerfContent::AggLBRStack);
+    PerfReader.reset(Reader);
   } else if (PerfInput.Content == PerfContent::LBR) {
     PerfReader.reset(new LBRPerfReader(Binary, PerfInput.InputFile, PIDFilter));
   } else {
@@ -1191,8 +1193,9 @@ PerfContent PerfScriptReader::checkPerfScriptType(StringRef FileName) {
   TraceStream TraceIt(FileName);
   uint64_t FrameAddr = 0;
   while (!TraceIt.isAtEoF()) {
-    // Skip the aggregated count
-    if (!TraceIt.getCurrentLine().getAsInteger(10, FrameAddr))
+    // Skip the aggregated count and detect pre-aggregated input.
+    bool HasAggCount = !TraceIt.getCurrentLine().getAsInteger(10, FrameAddr);
+    if (HasAggCount)
       TraceIt.advance();
 
     // Detect sample with call stack
@@ -1205,7 +1208,7 @@ PerfContent PerfScriptReader::checkPerfScriptType(StringRef FileName) {
     if (!TraceIt.isAtEoF()) {
       if (isLBRSample(TraceIt.getCurrentLine())) {
         if (Count > 0)
-          return PerfContent::LBRStack;
+          return HasAggCount ? PerfContent::AggLBRStack : PerfContent::LBRStack;
         else
           return PerfContent::LBR;
       }
diff --git a/llvm/tools/llvm-profgen/PerfReader.h b/llvm/tools/llvm-profgen/PerfReader.h
index 2a4c7594d3a93..83c4fb0447c5c 100644
--- a/llvm/tools/llvm-profgen/PerfReader.h
+++ b/llvm/tools/llvm-profgen/PerfReader.h
@@ -69,8 +69,9 @@ enum PerfFormat {
 // The type of perfscript content.
 enum PerfContent {
   UnknownContent = 0,
-  LBR = 1,      // Only LBR sample.
-  LBRStack = 2, // Hybrid sample including call stack and LBR stack.
+  LBR = 1,         // Only LBR sample.
+  LBRStack = 2,    // Hybrid sample including call stack and LBR stack.
+  AggLBRStack = 3, // Pre-aggregated hybrid sample.
 };
 
 struct PerfInputFile {
@@ -631,6 +632,8 @@ class PerfScriptReader : public PerfReaderBase {
   // receiving signals.
   static SmallVector<CleanupInstaller, 2> TempFileCleanups;
 
+  void setIsPreAggregated(bool V) { IsPreAggregated = V; }
+
 protected:
   // Check whether a given line is LBR sample
   static bool isLBRSample(StringRef Line);
@@ -676,6 +679,8 @@ class PerfScriptReader : public PerfReaderBase {
   std::set<uint64_t> InvalidReturnAddresses;
   // PID for the process of interest
   std::optional<int32_t> PIDFilter;
+  // Whether the input is pre-aggregated
+  bool IsPreAggregated = false;
 };
 
 /*

@aaupov aaupov requested review from HighW4y2H3ll and apolloww April 11, 2026 04:32
Member

@HighW4y2H3ll HighW4y2H3ll left a comment

What is the pre-aggregated input? Is it the same as putting 1 as the aggregated count for each sample? Could you explain why we need the new format? And is this new format only applicable to LBRStack samples?

@aaupov
Contributor Author

aaupov commented Apr 13, 2026

What is the pre-aggregated input? Is it the same as putting 1 as the aggregated count for each sample? Could you explain why we need the new format? And is this new format only applicable to LBRStack samples?

Pre-aggregated means there's a count line preceding the call/branch stacks. It's worth distinguishing it as a separate format so that the buildid prefix is allowed only for it: non-aggregated input comes directly from perf script and includes mmap events, which serve as the main mechanism to distinguish addresses from the binary/DSOs.

@HighW4y2H3ll
Member

Why do we need to restrict buildID annotation only for LBRStack samples? It looks like the current implementation is already compatible with pre-aggregated input, and if there's no aggregated count, the default will be 1:

uint64_t PerfScriptReader::parseAggregatedCount(TraceStream &TraceIt) {
  // The aggregated count is optional, so do not skip the line and return 1 if
  // it's unmatched
  uint64_t Count = 1;
  if (!TraceIt.getCurrentLine().getAsInteger(10, Count))
    TraceIt.advance();
  return Count;
}
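
The optional-count behaviour in the quoted function can be tried outside llvm-profgen with a small standalone sketch. `LineCursor`, `parseDecimal` and `parseAggregatedCountSketch` are hypothetical stand-ins for `TraceStream`, `StringRef::getAsInteger(10, ...)` and the real method; they only illustrate that a missing count line is left in place and treated as a count of 1.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for TraceStream: a cursor over in-memory lines.
struct LineCursor {
  std::vector<std::string> Lines;
  std::size_t Pos = 0;

  bool atEnd() const { return Pos >= Lines.size(); }
  const std::string &current() const { return Lines[Pos]; }
  void advance() { ++Pos; }
};

// Succeeds only if the whole line is a non-empty run of decimal digits.
// Overflow is not checked in this sketch.
static bool parseDecimal(const std::string &S, uint64_t &Out) {
  if (S.empty())
    return false;
  uint64_t Value = 0;
  for (char C : S) {
    if (C < '0' || C > '9')
      return false;
    Value = Value * 10 + static_cast<uint64_t>(C - '0');
  }
  Out = Value;
  return true;
}

// Consume the count line if present; otherwise leave the cursor untouched
// and fall back to a count of 1.
uint64_t parseAggregatedCountSketch(LineCursor &It) {
  uint64_t Count = 1;
  uint64_t Parsed = 0;
  if (!It.atEnd() && parseDecimal(It.current(), Parsed)) {
    Count = Parsed;
    It.advance();
  }
  return Count;
}
```

With this fallback, the parser alone cannot tell a sample with an explicit count of 1 from one with no count line, which is why detection of the count line has to happen separately if the two inputs are to be treated differently.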

@aaupov
Contributor Author

aaupov commented Apr 16, 2026

Why do we need to restrict buildID annotation only for LBRStack samples?

We can add buildid prefix support in two ways:

  1. narrow: pre-aggregated input only, where it's needed,
  2. uniform: both pre-aggregated and perf script input, even though the latter doesn't need it.

I chose option 1 to keep perf script parsing strict.

@aaupov aaupov closed this May 1, 2026
@github-actions github-actions Bot deleted the users/aaupov/spr/llvm-profgen-detect-pre-aggregated-format-1 branch May 4, 2026 21:14