Merged
387 changes: 387 additions & 0 deletions llvm/docs/PGOProfileFormat.rst
@@ -0,0 +1,387 @@
=====================
IRPGO Profile Format
Contributor

IRPGO --> Instrumentation PGO. Note that Frontend PGO uses the same format.

Contributor Author

done.

=====================

.. contents::
:local:


Overview
==========

IR-based instrumentation (IRPGO) and its context-sensitive variant (CS-IRPGO)
Contributor

Instrumentation PGO (both IR based and Frontend based).

Contributor Author

done.

Contributor Author

and removed IRPGO only terms like (LLVM IR, basic block counters) from the doc.

insert `llvm.instrprof.*` `code generator intrinsics <https://llvm.org/docs/LangRef.html#code-generator-intrinsics>`_
into LLVM IR to generate profiles. This document describes the two binary profile
formats (raw and indexed) used by IR-based instrumentation.

.. note::

Both the compiler-rt profiling infrastructure and profile format are general
Contributor

Coverage test uses (frontend) PGO instrumentation and coverage mapping. The format for coverageMap is not included in this document. Similarly the temporal profiling is not covered here.

Contributor Author

reworded based on my understanding that "frontend PGO instrumentation profiles have two use cases, PGO and source coverage" and the input that coverage mapping has its own format. PTAL.

and can support other use cases (e.g., coverage and temporal profiling).
This document focuses on IRPGO and briefly introduces the other use cases,
with pointers to further reading.

Raw PGO Profile Format
========================

The raw PGO profile is generated by running the instrumented binary. It is a
memory dump of the profile data.

Two kinds of frequently used profile information are a function's basic block
Contributor

The instrumented binary currently collects two kinds of profile data: ...

Contributor Author

done.

counters and its (various flavors of) value profiles. A function's profiled
Contributor

The profile data for a function can span ..

Contributor Author

changed to "The profile data for a function span across several sections in the profile", given the control structure and counters are in two sections.

information spans several sections in the profile.

General Storage Layout
-----------------------

A raw profile for an executable [1]_ consists of a profile header and several
Contributor

also shared libary.

Contributor Author

done.

sections. The storage layout is illustrated below. Generally, when a raw profile
Contributor

when the raw profile ..

Contributor Author

done.

is read into a memory buffer, the actual byte offset of a section is inferred
from the section's order in the layout and the size information of all sections
ahead of it.

::

  +----+-----------------------+
  |    |         Magic         |
  |    +-----------------------+
  |    |        Version        |
  |    +-----------------------+
  H    |     Size Info for     |
  E    |       Section 1       |
  A    +-----------------------+
  D    |     Size Info for     |
  E    |       Section 2       |
  R    +-----------------------+
  |    |          ...          |
  |    +-----------------------+
  |    |     Size Info for     |
  |    |       Section N       |
  +----+-----------------------+
  P    |       Section 1       |
  A    +-----------------------+
  Y    |       Section 2       |
  L    +-----------------------+
  O    |          ...          |
  A    +-----------------------+
  D    |       Section N       |
  +----+-----------------------+
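
As a rough illustration of how a reader might infer section offsets from the size information in the header, here is a minimal Python sketch. The function name `section_offsets`, the section names, and all sizes are hypothetical, and padding is ignored; it only shows the "running sum of preceding sizes" idea, not the actual reader code.

```python
def section_offsets(header_size, section_sizes):
    """Map each section name to its byte offset from the start of the buffer.

    Assumes sections are stored back to back after the header, in list order,
    and ignores padding.
    """
    offsets = {}
    cursor = header_size  # the first section starts right after the header
    for name, size in section_sizes:
        offsets[name] = cursor
        cursor += size  # the next section begins where this one ends
    return offsets


# Hypothetical header size and section sizes, in bytes.
layout = section_offsets(
    header_size=64,
    section_sizes=[("binary_ids", 24), ("data", 128), ("counters", 80), ("names", 40)],
)
print(layout)
```

A real reader would also have to account for the padding mentioned in the note below.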


.. note::
Sections might be padded to meet platform-specific alignment requirements.
For simplicity, header fields and data sections that exist solely for padding
are omitted from the layout diagram above and from the rest of this document.

Header
-------

``Magic``
The magic number lets a data consumer detect the profile format and the
endianness of the data, and tell whether and how to continue reading.
Contributor

remove 'quickly'

Contributor Author

done.
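
To make the endianness check on ``Magic`` concrete, the following Python sketch interprets the first eight bytes in both byte orders and compares the result with an expected constant. `EXAMPLE_MAGIC` is a placeholder chosen for illustration, not the real magic number, and `detect_byte_order` is a hypothetical helper rather than part of any LLVM API.

```python
import struct

# Placeholder value for illustration only; the real constants live in LLVM's sources.
EXAMPLE_MAGIC = 0x0102030405060708


def detect_byte_order(first8, magic=EXAMPLE_MAGIC):
    """Return '<' (little-endian) or '>' (big-endian) if the bytes encode
    `magic` in that byte order, or None if neither order matches."""
    for order in ("<", ">"):
        (value,) = struct.unpack(order + "Q", first8)
        if value == magic:
            return order
    return None


little = struct.pack("<Q", EXAMPLE_MAGIC)
big = struct.pack(">Q", EXAMPLE_MAGIC)
print(detect_byte_order(little), detect_byte_order(big))
```

A reader that gets `None` would stop, since the buffer is not in the expected format.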


``Version``
The lower 32 bits specify the actual version and the most significant 32
bits specify the variant types of the profile. IRPGO and CS-IRPGO are two
variant types.
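
A small Python sketch of splitting the 64-bit ``Version`` field as described above. The helper name `split_version` and the sample values are hypothetical; which individual bit denotes which variant is not specified here and is not assumed by the code.

```python
def split_version(version_field):
    """Split a 64-bit Version field into (version, variant_bits)."""
    version = version_field & 0xFFFFFFFF            # lower 32 bits
    variant_bits = (version_field >> 32) & 0xFFFFFFFF  # most significant 32 bits
    return version, variant_bits


# Hypothetical field: version 9 with two variant bits set.
field = (0x3 << 32) | 9
print(split_version(field))
```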

``BinaryIdsSize``
The byte size of the binary id section.

``NumData``
The number of per-function profile data control structures. The byte size of
the profile data section can be computed from this field.

``NumCounter``
The number of entries in the profile counter section. The byte size of the
counter section can be computed from this field.

``NumBitmapBytes``
The number of bytes in the profile bitmap section.

``NamesSize``
The number of bytes in the name section.

``CountersDelta``
Records the in-memory address difference between the data and counter sections,
i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`. It's used jointly
with the in-memory address difference of a profile data record and its counter
to find the counter of a profile data record. Check out calculation-of-counter-offset_
for details.

``BitmapDelta``
Records the in-memory address difference between the data and bitmap sections,
i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`. It's used jointly
with the in-memory address difference of a profile data record and its bitmap
to find the bitmap of a profile data record, in a way similar to how counters
are referenced, as explained in calculation-of-counter-offset_.

``NamesDelta``
Records the in-memory address of the name section. Not used except for
Contributor

May be uncompressed too.

Contributor Author

Removed "compressed" as whether compressed or not is not very important for the documentation of this field.

raw profile reader error checking.

``ValueKindLast``
Records the number of value kinds. As of this writing, two kinds of value
profiles are supported. `IndirectCallTarget` profiles the frequent callees of
indirect call instructions and `MemOPSize` is for memory intrinsic function
size profiling.

The number of value kinds affects the byte size of the per-function profile
data control structure.

Payload Sections
------------------

Binary Ids
^^^^^^^^^^^
Stores the binary ids of the instrumented binaries to associate binaries with
profiles for source code coverage. See `Binary Id RFC`_ for an introduction.

.. _`Binary Id RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html

Profile Data
^^^^^^^^^^^^^

This section stores the per-function profile data control structures. The in-memory
representation of the control structure is `__llvm_profile_data` and the fields
are defined by the `INSTR_PROF_DATA` macro. Some fields are used to reference data
from other sections in the profile. The fields are documented as follows:

``NameRef``
The MD5 of the function's IRPGO name. The IRPGO name has the format
`[<filepath>;]<linkage-name>` where `<filepath>;` is provided for local-linkage
functions to tell apart functions that may otherwise have identical names.
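
The naming and hashing scheme can be sketched in Python as follows. Both helpers are hypothetical. The format string follows the `[<filepath>;]<linkage-name>` description above, while truncating the MD5 digest to its first eight bytes read as a little-endian integer is an assumption made here to obtain a 64-bit value; consult the LLVM sources for the exact rule.

```python
import hashlib


def irpgo_name(linkage_name, filepath=None):
    """Build `[<filepath>;]<linkage-name>`; pass `filepath` only for
    local-linkage functions."""
    return f"{filepath};{linkage_name}" if filepath else linkage_name


def name_ref(pgo_name):
    """Hash a PGO name to a 64-bit integer.

    Assumption: take the first 8 bytes of the MD5 digest, little-endian.
    """
    digest = hashlib.md5(pgo_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


# Two hypothetical static functions named `helper` in different files.
a = irpgo_name("helper", "lib/a.c")
b = irpgo_name("helper", "lib/b.c")
print(a, b, name_ref(a) != name_ref(b))
```

Prefixing the file path gives the two `helper` functions distinct names, and therefore distinct hashes.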

``FuncHash``
A fingerprint of the function's control flow graph.
Contributor

This includes CFG plus some more stuff (memory ops I think). Can you put in a link to the code?

Contributor Author

slightly reworded and added a link to computeCFGHash


``CounterPtr``
Contributor

Is it the relative distance (offset) in bytes between the function counter and the start of the counter section>

Contributor

The description is correct. (My comment was based on old implementation before recent changes).

Contributor Author

With the recent binary profile correlation effort from @ZequanWu , CounterPtr records the address of counters if I'm reading correctly.

I updated the documentation to point out fields that might have different ways of interpretation. PTAL.

Contributor

CounterPtr is still a relative address (__profc_foo - __profd_foo) in default mode. Under binary profile correlation mode, it will just be the absolute address of the counter __profc_foo.

The in-memory address difference between profile data and its corresponding counters.

``BitmapPtr``
The in-memory address difference between profile data and its bitmap.

``FunctionPointer``
Records the function address when the instrumented binary runs. This is used to
map the profiled callee address of indirect calls to the `NameRef` during
conversion from raw to indexed profiles.

``Values``
Represents value profiles in a two-dimensional array. The number of elements
in the first dimension is the number of instrumented value sites across all
kinds. Each element in the first dimension is the head of a linked list, and
each element in the second dimension is a linked list element, carrying
`<profiled-value, count>` as payload. This is used by the compiler runtime when
writing out value profiles.
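
The shape of this structure can be modeled in Python as a list of linked-list heads, one per instrumented value site. The class `ValueNode` and the helpers below are hypothetical and only mirror the description above; they are not the runtime's actual update logic.

```python
class ValueNode:
    """One linked-list element carrying a <profiled-value, count> pair."""

    def __init__(self, value, count, next_node=None):
        self.value = value
        self.count = count
        self.next_node = next_node


def record_value(sites, site_index, value):
    """Bump the count for `value` at a site, or prepend a new node."""
    node = sites[site_index]
    while node is not None:
        if node.value == value:
            node.count += 1
            return
        node = node.next_node
    sites[site_index] = ValueNode(value, 1, sites[site_index])


def site_pairs(sites, site_index):
    """Walk one site's list and collect its (value, count) pairs."""
    pairs = []
    node = sites[site_index]
    while node is not None:
        pairs.append((node.value, node.count))
        node = node.next_node
    return pairs


# First dimension: two value sites. Second dimension: a linked list per site.
sites = [None, None]
for callee in (0x1000, 0x2000, 0x1000):  # hypothetical callee addresses
    record_value(sites, 0, callee)
print(sorted(site_pairs(sites, 0)))
```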

``NumCounters``
The number of counters for the instrumented function.

``NumValueSites``
This is an array of counters, and each counter represents the number of
instrumented sites for a kind of value in the function.

``NumBitmapBytes``
The number of bitmap bytes for the function.

Profile Counters
^^^^^^^^^^^^^^^^^

For IRPGO [2]_, the counters within an instrumented function are stored contiguously
and in an order that is consistent with basic block selection in the instrumentation
pass.

.. _calculation-of-counter-offset:

So how are function counters associated with a function?

The profile reader iterates over the per-function control structures (from the
profile data section) and makes use of the recorded relative distances, as
illustrated below.
Contributor

It is clearer to use an equation: CounterOffset(Func) = Data(Func).CounterPtr + Counter_Delta.

Contributor

I saw some equation below.


::

           + --> start(__llvm_prf_data) --> +---------------------+ ------------+
           |                                |       Data 1        |             |
           |                                +---------------------+  =====||    |
           |                                |       Data 2        |       ||    |
           |                                +---------------------+       ||    |
           |                                |        ...          |       ||    |
   Counter |                                +---------------------+       ||    |
   Delta   |                                |       Data N        |       ||    |
           |                                +---------------------+       ||    |   CounterPtr1
           |                                                              ||    |
           |                                          CounterPtr2         ||    |
           |                                                              ||    |
           |                                                              ||    |
           + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |
                                            |        ...          |       ||    |
                                            +---------------------+  -----||----+
                                            |      Counter 1      |       ||
                                            +---------------------+       ||
                                            |        ...          |       ||
                                            +---------------------+  =====||
                                            |      Counter 2      |
                                            +---------------------+
                                            |        ...          |
                                            +---------------------+
                                            |      Counter N      |
                                            +---------------------+


In the graph,

* The profile header records `CountersDelta` with the value `start(__llvm_prf_cnts) - start(__llvm_prf_data)`.
  We will call it `CountersDeltaInitVal` below for convenience.
* For each profile data record, `CounterPtrN` is recorded as `start(CounterN) - start(ProfileDataN)`.

Each time the reader advances to the next data record, it decreases `CountersDelta` by the size of one `ProfileData`.

For the counter corresponding to the first data record, the byte offset
relative to the start of the counter section is calculated as `CounterPtr1 - CountersDeltaInitVal`.
When the profile reader advances to the second data record, `CountersDelta` is `CountersDeltaInitVal - sizeof(ProfileData)`.
Thus the byte offset relative to the start of the counter section is calculated as `CounterPtr2 - (CountersDeltaInitVal - sizeof(ProfileData))`.
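
The calculation above can be checked with a short Python sketch. The addresses, the 64-byte record size, and the function name `counter_offsets` are hypothetical. The code uses plain integers and ignores the fixed-width wraparound a real reader would have to handle.

```python
def counter_offsets(counters_delta, counter_ptrs, sizeof_data):
    """For each data record, return the byte offset of its counters relative
    to the start of the counter section."""
    offsets = []
    delta = counters_delta  # start(__llvm_prf_cnts) - start(__llvm_prf_data)
    for ptr in counter_ptrs:
        offsets.append(ptr - delta)
        delta -= sizeof_data  # the reader advanced by one data record
    return offsets


# Hypothetical in-memory layout.
DATA_START, CNTS_START, SIZEOF_DATA = 0x1000, 0x2000, 64
counter_starts = [0x2000, 0x2010, 0x2028]  # where each function's counters begin

# CounterPtrN = start(CounterN) - start(ProfileDataN)
counter_ptrs = [
    start - (DATA_START + i * SIZEOF_DATA) for i, start in enumerate(counter_starts)
]

offsets = counter_offsets(CNTS_START - DATA_START, counter_ptrs, SIZEOF_DATA)
print([hex(o) for o in offsets])
```

Each computed offset equals `start(CounterN) - start(__llvm_prf_cnts)`, which is what the reader needs to index into the counter section of the file.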

Bitmap
^^^^^^^
This section is used for source-based Modified Condition/Decision Coverage (MC/DC) code coverage. Check out `Bitmap RFC`_
Contributor

Expand MC/DC?

Contributor Author

done.

if interested.

.. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244

Names
^^^^^^

This section contains the concatenated string of function IRPGO names. If
compressed, the zlib compression algorithm is used.

Function names serve as keys in the PGO data hash table when raw profiles are
converted into indexed profiles. They are also crucial for `llvm-profdata` to
show the profiles in a human-readable way.

Value Profile Data
^^^^^^^^^^^^^^^^^^^^

This section contains the profile data for value profiling.

The value profiles corresponding to a profile data record are serialized
contiguously as one record, and value profile records are stored in the same
order as the respective profile data, such that a raw profile reader advances
the pointer to profile data and the pointer to value profile records
simultaneously [3]_ to find the value profiles for each per-function,
per-CFG-fingerprint profile data record.

Indexed PGO Profile Format
===========================

General Storage Layout
-----------------------

::

                  +-----------------------+---+
                  |         Magic         |   |
                  +-----------------------+   |
                  |        Version        |   |
                  +-----------------------+   |
                  |       HashType        |   H
                  +-----------------------+   E
          +-------|      HashOffset       |   A
          |       +-----------------------+   D
      +-----------|     MemProfOffset     |   E
      |   |       +-----------------------+   R
      |   |       |    BinaryIdOffset     |   |
      |   |       +-----------------------+   |
  +---------------|     TemporalProf-     |   |
  |   |   |       |     TracesOffset      |   |
  |   |   |       +-----------------------+---+
  |   |   |       |    Profile Summary    |   |
  |   |   |       +-----------------------+   P
  |   |   +------>|   Function PGO data   |   A
  |   |           +-----------------------+   Y
  |   +---------->| MemProf profile data  |   L
  |               +-----------------------+   O
  |               |      Binary Ids       |   A
  |               +-----------------------+   D
  +-------------->|   Temporal profiles   |   |
                  +-----------------------+---+

Contributor

Should we also add a comment in the code to update this documentation when the format changes?

Contributor Author

Added a comment close to the header definition for both raw and indexed profiles.

Uses link https://llvm.org/docs/InstrProfileFormat.html assuming file name will be InstrProfileFormat.rst.

Header
--------

``Magic``
The magic number lets a reader quickly tell whether the profile is an indexed
profile.

``Version``
Similar to the raw profile version, the lower 32 bits specify the version of the
indexed profile and the most significant 32 bits are reserved to specify the
variant types of the profile.

``HashType``
The hashing scheme for on-disk hash table keys. Only MD5 hashing is used as of
writing.

``HashOffset``
An on-disk hash table stores the per-function profile records.
Precisely speaking, `HashOffset` records the offset of this hash table's
metadata (i.e., the number of buckets and entries), which follows right after
the payload of the entire hash table.

``MemProfOffset``
Records the byte offset of MemProf profiling data.

``BinaryIdOffset``
Records the byte offset of the binary id section.

``TemporalProfTracesOffset``
Records the byte offset of temporal profiles.

Payload Sections
------------------

(CS) Profile Summary
^^^^^^^^^^^^^^^^^^^^^
This section immediately follows the profile header. It stores the serialized
profile summary. For context-sensitive IRPGO, this section stores an additional
profile summary corresponding to the context-sensitive profiles.

Function PGO data
^^^^^^^^^^^^^^^^^^
This section stores functions and their PGO profiling data as an on-disk hash
Contributor

Profile data for functions with the same name are grouped together and share one hash table entry (the functions may come from different shared libraries for instance). The profile data for them are organized as a sequence of key-value pair where the key is the funcHash (CFG based for IR PGO), and the value is profile counters for the function.

Contributor Author

Added this.

table. The key of a hash table entry is the function's PGO name, and the in-memory
representation of the value is a map. The key of this map is the CFG hash, and the
value is the C++ struct `llvm::InstrProfRecord`. The C++ struct collects profiling
information such as counters and value profiles.
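
The two-level lookup can be modeled in Python with nested dictionaries. The `Record` class is a hypothetical stand-in for `llvm::InstrProfRecord` that holds only a counter list, and the names and hash values are made up. The sketch shows the in-memory shape only, not the on-disk hash table encoding.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Simplified stand-in for llvm::InstrProfRecord (counters only)."""
    counts: list = field(default_factory=list)


def add_record(table, pgo_name, cfg_hash, counts):
    """Insert a record under table[pgo_name][cfg_hash]."""
    table.setdefault(pgo_name, {})[cfg_hash] = Record(list(counts))


def lookup(table, pgo_name, cfg_hash):
    """Return the record for this name and CFG hash, or None if either misses."""
    return table.get(pgo_name, {}).get(cfg_hash)


table = {}
# Two functions sharing one name (hypothetical), distinguished by CFG hash.
add_record(table, "foo", 0xAAAA, [10, 3])
add_record(table, "foo", 0xBBBB, [7])
print(lookup(table, "foo", 0xAAAA), lookup(table, "foo", 0xCCCC))
```

In this model, a lookup whose CFG hash matches no stored entry returns `None`, so a record is only returned when both the name and the hash agree.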

MemProf Profile data
^^^^^^^^^^^^^^^^^^^^^^
This section stores functions' memory profiling data. See
`MemProf binary serialization format RFC`_ for the design.

.. _`MemProf binary serialization format RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html

Binary Ids
^^^^^^^^^^^^^^^^^^^^^^
This section carries over binary-id information from raw profiles.

Temporal Profile Traces
^^^^^^^^^^^^^^^^^^^^^^^^
This section carries over temporal profile information from raw profiles.
See `Temporal profiling RFC`_ for an overview.

.. _`Temporal profiling RFC`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068

Profile Data Usage
=======================================

`llvm-profdata` is the command line tool to display and process profile data.
For supported usages, check out its `documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.


.. [1] A raw profile file can contain multiple raw profiles. The raw profile
reader can parse all raw profiles from the file correctly.
.. [2] The counter section is used by a few variant types (like coverage and
temporal profiling) and might have different semantics there.
.. [3] The step size of the data pointer is `sizeof(ProfileData)`, and the step
size of the value profile pointer is calculated based on the number of collected
values.