Merged
Changes from 4 commits
395 changes: 395 additions & 0 deletions llvm/docs/PGOProfileFormat.rst
@@ -0,0 +1,395 @@
===================================
Member

Perhaps name this ProfileFormat.rst, since so much is shared with pure code coverage applications. It's fine that the doc currently focuses on PGO.

Contributor Author

> Perhaps name this ProfileFormat.rst, since so much is shared with pure code coverage applications.

Ack. I wonder if we want to use InstrumentationProfileFormat.rst since SamplePGO uses different format.

Contributor

InstrProfileFormat.rst sounds good to me.

Contributor Author

@mingmingl-llvm, Jan 2, 2024

I updated filename to InstrProfileFormat.rst in a standalone local commit

The actual changes should be visible in the commit right before it

Instrumentation PGO Profile Format
===================================

.. contents::
   :local:


Overview
=========

Instrumentation PGO inserts `llvm.instrprof.*` `code generator intrinsics`_

Contributor Author

done.

in the code to generate profiles. This document describes two binary profile
formats (raw and indexed) used by instrumentation.

.. _`code generator intrinsics`: https://llvm.org/docs/LangRef.html#code-generator-intrinsics

.. note::
  The instrumentation profile format supports non-PGO use cases (e.g., temporal
  profiling). This document will focus on PGO. Source coverage uses both
  frontend instrumentation profiles and coverage mapping. The format for
  coverage mapping has its own `documentation`_.

.. _`documentation`: https://llvm.org/docs/CoverageMappingFormat.html

Raw Profile Format
Contributor

Highlight compatibility guarantees of Raw Profile Format.
Also mention endianness of raw profile data?

Contributor Author

Mentioned version compatibility guarantees for raw and indexed format.

And mention the endianness where Magic field for raw profile header is documented, since the Magic field is used by raw profile reader to decide whether to swap bytes.

Relatedly, created #76312 to fix an issue related to endianness.

===================

The raw profile is generated by running the instrumented binary. It is a memory
dump of the profile data.
Contributor

nit: s/profile counters/profile data/

Contributor Author

I wonder if the comment means to say 's/profile data/profile counters'?

Nevertheless, I revised this to "The raw profile data from an executable or a shared library consists of a header and multiple sections, with each section as a memory dump. The profile raw data needs to be reasonably compact and fast to generate." PTAL.


The instrumented binary currently collects two kinds of profile data: counters
to profile branch probability and (various flavors of) value profiles. The
profile data for a function spans several sections in the profile.

General Storage Layout
-----------------------

A raw profile from an executable or a shared library [1]_ consists of a profile
header and several sections. The storage layout is illustrated below. Generally,
when the raw profile is read into a memory buffer, the actual byte offset of a
section is inferred from the section's order in the layout and the size
information of all the sections ahead of it.

::

  +----+-----------------------+
  |    |        Magic          |
  |    +-----------------------+
  |    |        Version        |
  |    +-----------------------+
  H    |   Size Info for       |
  E    |      Section 1        |
  A    +-----------------------+
  D    |   Size Info for       |
  E    |      Section 2        |
  R    +-----------------------+
  |    |          ...          |
  |    +-----------------------+
  |    |   Size Info for       |
  |    |      Section N        |
  +----+-----------------------+
  P    |       Section 1       |
  A    +-----------------------+
  Y    |       Section 2       |
  L    +-----------------------+
  O    |          ...          |
  A    +-----------------------+
  D    |       Section N       |
  +----+-----------------------+


.. note::
  Sections might be padded to meet platform-specific alignment requirements.
  For simplicity, header fields and data sections that exist solely for padding
  are omitted from the layout diagram above and from the rest of this document.
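
As a rough illustration of how a reader could infer section offsets from the
layout, the sketch below accumulates section sizes in layout order. The section
names, the sizes and the header size are made-up values, and padding is
ignored, as in the diagram above.

```python
from itertools import accumulate

def section_offsets(header_size, section_sizes):
    """Return {name: byte offset} for sections laid out back to back.

    section_sizes is an ordered list of (name, size_in_bytes) pairs; padding
    between sections is ignored, as in the layout diagram.
    """
    names = [name for name, _ in section_sizes]
    sizes = [size for _, size in section_sizes]
    # Each section starts where the header plus all earlier sections end.
    starts = accumulate([header_size] + sizes[:-1])
    return dict(zip(names, starts))

# Hypothetical sizes, chosen only for illustration.
layout = [("section_1", 16), ("section_2", 128), ("section_3", 64)]
offsets = section_offsets(header_size=80, section_sizes=layout)
print(offsets)
```

A real reader would derive each size from the corresponding header field rather
than from a hard-coded list.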

Header
-------

``Magic``
  The magic number lets a data consumer detect the profile format and the
  endianness of the data, and decide whether and how to continue reading.

``Version``
  The lower 32 bits specify the actual version and the most significant 32
  bits specify the variant types of the profile. IR-based instrumentation PGO
  and context-sensitive IR-based instrumentation PGO are two variant types.

``BinaryIdsSize``
  The byte size of the binary id section.

``NumData``
  The number of per-function profile data control structures. The byte size of
  the profile data section can be computed from this field.

``NumCounter``
  The number of entries in the profile counter section. The byte size of the
  counter section can be computed from this field.

``NumBitmapBytes``
  The number of bytes in the profile bitmap section.

``NamesSize``
  The number of bytes in the name section.

``CountersDelta``
  Records the in-memory address difference between the data and counter
  sections, i.e., `start(__llvm_prf_cnts) - start(__llvm_prf_data)`. It is used
  jointly with the in-memory address difference between a profile data record
  and its counter to find the counter of a profile data record. See
  calculation-of-counter-offset_ for details.

``BitmapDelta``
  Records the in-memory address difference between the data and bitmap
  sections, i.e., `start(__llvm_prf_bits) - start(__llvm_prf_data)`. It is used
  jointly with the in-memory address difference between a profile data record
  and its bitmap to find the bitmap of a profile data record, in a similar way
  to how counters are referenced, as explained by
  calculation-of-counter-offset_.

``NamesDelta``
  Records the in-memory address of the name section. Not used except for raw
  profile reader error checking.

``ValueKindLast``
  Records the number of value kinds. As of writing, two kinds of value profiles
  are supported: `IndirectCallTarget` profiles the frequent callees of indirect
  call instructions, and `MemOPSize` profiles the sizes passed to memory
  intrinsic functions.

  The number of value kinds affects the byte size of the per-function profile
  data control structure.
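
To make the roles of ``Magic`` and ``Version`` more concrete, here is a minimal
sketch of how a reader might use them. The magic constant is an assumption (the
byte sequence ``0xff 'l' 'p' 'r' 'o' 'f' 'r' 0x81`` for 64-bit raw profiles) and
should be checked against ``InstrProfData.inc``; the function names and the
fake header are hypothetical.

```python
import struct

# Assumed 64-bit raw profile magic (0xff 'l' 'p' 'r' 'o' 'f' 'r' 0x81);
# verify against InstrProfData.inc before relying on it.
RAW_MAGIC_64 = int.from_bytes(b"\xfflprofr\x81", "big")

def detect_byte_order(first_8_bytes):
    """Return the struct byte-order prefix matching the magic, or None."""
    for prefix in ("<", ">"):
        (magic,) = struct.unpack(prefix + "Q", first_8_bytes)
        if magic == RAW_MAGIC_64:
            return prefix
    return None

def split_version(version):
    """Split the 64-bit Version field into (version number, variant bits)."""
    return version & 0xFFFFFFFF, version >> 32

# Fake big-endian header prefix: Magic, then version number 8 with one
# variant bit (chosen for illustration) set in the upper half.
fake = struct.pack(">QQ", RAW_MAGIC_64, (1 << 56) | 8)
order = detect_byte_order(fake[:8])
(version_field,) = struct.unpack(order + "Q", fake[8:16])
number, variant = split_version(version_field)
print(order, number, hex(variant))
```

Once the byte order is known, the remaining header fields can be decoded with
the same prefix.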

Payload Sections
------------------

Binary Ids
^^^^^^^^^^^
Stores the binary ids of the instrumented binaries to associate binaries with
profiles for source code coverage. See `Binary Id RFC`_ for an introduction.

.. _`Binary Id RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-June/151154.html

Profile Data
^^^^^^^^^^^^^

This section stores the per-function profile data control structures. The
in-memory representation of the control structure is `__llvm_profile_data` and
the fields are defined by the `INSTRPROFDATA` macro. Some fields are used to
reference data from other sections in the profile. The fields are documented as
follows:

``NameRef``
  The MD5 of the function's PGO name. The PGO name has the format
  `[<filepath><delimiter>]<linkage-or-mangled-name>`, where `<filepath>` and
  `<delimiter>` are provided for local-linkage functions to tell apart
  functions that would otherwise have identical names.

``FuncHash``
  A fingerprint of the function's control flow graph.
Contributor

This includes CFG plus some more stuff (memory ops I think). Can you put in a link to the code?

Contributor Author

slightly reworded and added a link to computeCFGHash


``CounterPtr``
Contributor

Is it the relative distance (offset) in bytes between the function counter and the start of the counter section?

Contributor

The description is correct. (My comment was based on old implementation before recent changes).

Contributor Author

With the recent binary profile correlation effort from @ZequanWu , CounterPtr records the address of counters if I'm reading correctly.

I updated the documentation to point out fields that might have different ways of interpretation. PTAL.

Contributor

CounterPtr is still a relative address (__profc_foo - __profd_foo) in default mode. Under binary profile correlation mode, it will just be the absolute address of the counter __profc_foo.

  The in-memory address difference between profile data and its corresponding
  counters.

``BitmapPtr``
  The in-memory address difference between profile data and its bitmap.

``FunctionPointer``
  Records the function address when the instrumented binary runs. This is used
  to map the profiled callee address of indirect calls to the `NameRef` during
  conversion from raw to indexed profiles.

``Values``
  Represents value profiles in a two-dimensional array. The number of elements
  in the first dimension is the number of instrumented value sites across all
  kinds. Each element in the first dimension is the head of a linked list, and
  each element in the second dimension is a linked list node carrying
  `<profiled-value, count>` as payload. This is used by the compiler runtime
  when writing out value profiles.

``NumCounters``
  The number of counters for the instrumented function.

``NumValueSites``
  An array of counters, where each counter represents the number of
  instrumented sites for one kind of value in the function.

``NumBitmapBytes``
  The number of bitmap bytes for the function.
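
The ``NameRef`` description can be illustrated with a short sketch. It assumes
that the 64-bit value is taken from the first eight bytes of the MD5 digest,
read as a little-endian integer, and it uses ``;`` as a placeholder delimiter;
both choices are assumptions and are not stated in this document.

```python
import hashlib

def name_ref(pgo_name):
    """Assumed NameRef computation: a 64-bit value derived from MD5(pgo_name)."""
    digest = hashlib.md5(pgo_name.encode("utf-8")).digest()
    # Assumption: the first 8 digest bytes are read as a little-endian integer.
    return int.from_bytes(digest[:8], "little")

def pgo_name(symbol, filepath=None, delimiter=";"):
    """Build `[<filepath><delimiter>]<linkage-or-mangled-name>`."""
    return symbol if filepath is None else f"{filepath}{delimiter}{symbol}"

external = pgo_name("main")
local_a = pgo_name("helper", filepath="a.c")
local_b = pgo_name("helper", filepath="b.c")
print(hex(name_ref(external)))
print(name_ref(local_a) != name_ref(local_b))
```

The file path prefix is what keeps two local-linkage functions with the same
symbol name from sharing one ``NameRef``.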

Profile Counters
^^^^^^^^^^^^^^^^^

For PGO [2]_, the counters within an instrumented function are stored contiguously
and in an order that is consistent with instrumentation points selection in the
instrumentation pass.

.. _calculation-of-counter-offset:

So how are function counters associated with a function?

The profile reader iterates over the per-function control structures (from the
profile data section) and makes use of the recorded relative distances, as
illustrated below.
Contributor

It is clearer to use an equation: CounterOffset(Func) = Data(Func).CounterPtr + Counter_Delta.

Contributor

I saw some equation below.


::

          + --> start(__llvm_prf_data) --> +---------------------+ ------------+
          |                                |       Data 1        |             |
          |                                +---------------------+  =====||    |
          |                                |       Data 2        |       ||    |
          |                                +---------------------+       ||    |
          |                                |        ...          |       ||    |
  Counter |                                +---------------------+       ||    |
   Delta  |                                |       Data N        |       ||    |
          |                                +---------------------+       ||    |   CounterPtr1
          |                                                              ||    |
          |                                              CounterPtr2     ||    |
          |                                                              ||    |
          |                                                              ||    |
          + --> start(__llvm_prf_cnts) --> +---------------------+       ||    |
                                           |        ...          |       ||    |
                                           +---------------------+  -----||----+
                                           |      Counter 1      |       ||
                                           +---------------------+       ||
                                           |        ...          |       ||
                                           +---------------------+  =====||
                                           |      Counter 2      |
                                           +---------------------+
                                           |        ...          |
                                           +---------------------+
                                           |      Counter N      |
                                           +---------------------+


In the graph,

* The profile header records `CountersDelta` with the value
  `start(__llvm_prf_cnts) - start(__llvm_prf_data)`. We will call it
  `CounterDeltaInitVal` below for convenience.
* For each profile data record, `CounterPtrN` is recorded as
  `start(Counter) - start(ProfileData)`.

Each time the reader advances to the next data record, it updates `CountersDelta`
Contributor

Add a link to the code (at a certain commit)?

Contributor Author

done.

by subtracting the size of one `ProfileData`.

For the counter corresponding to the first data record, the byte offset
relative to the start of the counter section is calculated as
`CounterPtr1 - CounterDeltaInitVal`. When the profile reader advances to the
second data record, `CountersDelta` has been updated to
`CounterDeltaInitVal - sizeof(ProfileData)`, so the byte offset relative to the
start of the counter section is calculated as
`CounterPtr2 - (CounterDeltaInitVal - sizeof(ProfileData))`.
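
The calculation above can be checked with a small worked example. The
addresses, the record size and the function name ``counter_offsets`` are
hypothetical; only the arithmetic follows the description in this section.

```python
def counter_offsets(counters_delta, counter_ptrs, sizeof_profile_data):
    """Return each record's counter offset from the start of the counter section.

    counters_delta is the header's initial value; counter_ptrs holds the
    CounterPtr of each data record in section order.
    """
    offsets = []
    delta = counters_delta
    for counter_ptr in counter_ptrs:
        offsets.append(counter_ptr - delta)
        # Advancing to the next record shrinks the delta by one record size.
        delta -= sizeof_profile_data
    return offsets

# Hypothetical in-memory layout: data section at 0x1000, counters at 0x5000,
# 64-byte data records, and counters that start 0, 24 and 40 bytes into the
# counter section.
DATA_START, COUNTERS_START, RECORD_SIZE = 0x1000, 0x5000, 64
true_offsets = [0, 24, 40]
ptrs = [
    (COUNTERS_START + off) - (DATA_START + i * RECORD_SIZE)
    for i, off in enumerate(true_offsets)
]
recovered = counter_offsets(COUNTERS_START - DATA_START, ptrs, RECORD_SIZE)
print(recovered)
```

The recovered offsets match the ones used to build the example, which is the
property the reader relies on.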

Bitmap
^^^^^^^
This section is used for source-based MC/DC (Modified Condition/Decision Coverage) code coverage. Check out `Bitmap RFC`_
Contributor

Expand MC/DC?

Contributor Author

done.

if interested.

.. _`Bitmap RFC`: https://discourse.llvm.org/t/rfc-source-based-mc-dc-code-coverage/59244

Names
^^^^^^

This section contains the concatenated string of the functions' PGO names,
possibly compressed. If compressed, the zlib compression algorithm is used.

Function names serve as keys in the PGO data hash table when raw profiles are
converted into indexed profiles. They are also crucial for `llvm-profdata` to
show the profiles in a human-readable way.
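
The idea of a concatenated and optionally zlib-compressed name string can be
sketched as follows. The separator byte and the helper names are assumptions
made for this example, and the sketch leaves out how a reader learns whether the
payload is compressed.

```python
import zlib

SEPARATOR = "\x01"  # assumed separator between names, not taken from this document

def pack_names(names, compress=True):
    """Concatenate PGO names and optionally compress the result with zlib."""
    blob = SEPARATOR.join(names).encode("utf-8")
    return zlib.compress(blob) if compress else blob

def unpack_names(payload, compressed=True):
    """Invert pack_names and return the list of PGO names."""
    blob = zlib.decompress(payload) if compressed else payload
    return blob.decode("utf-8").split(SEPARATOR)

names = ["main", "a.c;helper", "_Z3foov"]
payload = pack_names(names)
print(unpack_names(payload))
```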

Value Profile Data
^^^^^^^^^^^^^^^^^^^^

This section contains the profile data for value profiling.

The value profiles corresponding to a profile data record are serialized
contiguously as one record, and value profile records are stored in the same
order as the respective profile data, such that a raw profile reader advances
the pointer to profile data and the pointer to value profile records
simultaneously [3]_ to find the value profiles for each per-function,
per-CFG-fingerprint profile data record.
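
A minimal sketch of this lockstep traversal is shown below. The record sizes
are invented, and the function name ``pair_value_records`` is hypothetical; a
real reader would compute each value record's size from its contents.

```python
def pair_value_records(num_data, sizeof_profile_data, value_record_sizes):
    """Yield (data_offset, value_offset) pairs, advancing both in lockstep.

    value_record_sizes[i] is the byte size of the value profile record that
    belongs to data record i.
    """
    data_offset = 0
    value_offset = 0
    for i in range(num_data):
        yield data_offset, value_offset
        data_offset += sizeof_profile_data      # fixed step
        value_offset += value_record_sizes[i]   # variable step

# Hypothetical sizes, for illustration only.
pairs = list(pair_value_records(3, 64, [40, 16, 72]))
print(pairs)
```

Because the two sections share the same order, no explicit pointer from a data
record to its value record is needed in this scheme.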

Indexed PGO Profile Format
===========================

General Storage Layout
-----------------------

::

                  +-----------------------+---+
Contributor

Should we also add a comment in the code to update this documentation when the format changes?

Contributor Author

Added a comment close to the header definition for both raw and indexed profiles.

Uses link https://llvm.org/docs/InstrProfileFormat.html assuming file name will be InstrProfileFormat.rst.

                  |        Magic          |   |
                  +-----------------------+   |
                  |        Version        |   |
                  +-----------------------+   |
                  |        HashType       |   H
                  +-----------------------+   E
          +-------|       HashOffset      |   A
          |       +-----------------------+   D
      +-----------|     MemProfOffset     |   E
      |   |       +-----------------------+   R
      |   |    +--|     BinaryIdOffset    |   |
      |   |    |  +-----------------------+   |
  +---------------|      TemporalProf-    |   |
  |   |   |    |  |      TracesOffset     |   |
  |   |   |    |  +-----------------------+---+
  |   |   |    |  |   Profile Summary     |   |
  |   |   |    |  +-----------------------+   P
  |   |   +------>|   Function PGO data   |   A
  |   |        |  +-----------------------+   Y
  |   +---------->|  MemProf profile data |   L
  |            |  +-----------------------+   O
  |            +->|      Binary Ids       |   A
  |               +-----------------------+   D
  +-------------->|   Temporal profiles   |   |
                  +-----------------------+---+

Header
--------

``Magic``
  The purpose of the magic number is to be able to quickly tell if the profile
  is an indexed profile.

``Version``
  Similar to the raw profile version, the lower 32 bits specify the version of
  the indexed profile and the most significant 32 bits are reserved to specify
  the variant types of the profile.

``HashType``
  The hashing scheme for on-disk hash table keys. Only MD5 hashing is used as of
  writing.

``HashOffset``
  An on-disk hash table stores the per-function profile records. This field
  records the offset of the hash table's metadata (i.e., the number of buckets
  and entries), which is needed for deserialization and is stored right after
  the payload of the entire hash table.

``MemProfOffset``
  Records the byte offset of the MemProf profiling data.

``BinaryIdOffset``
  Records the byte offset of the binary id section.

``TemporalProfTracesOffset``
  Records the byte offset of the temporal profiles.
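
The way the header's offset fields point into the payload can be sketched with
a toy buffer. The field order follows the layout diagram in this document, and
the sketch assumes seven little-endian 64-bit fields; real profiles may contain
additional or version-dependent fields, and all values below are made up.

```python
import struct

# Field order follows the layout diagram in this document; real profiles may
# contain additional or version-dependent header fields.
FIELDS = ("Magic", "Version", "HashType", "HashOffset", "MemProfOffset",
          "BinaryIdOffset", "TemporalProfTracesOffset")
HEADER = struct.Struct("<" + "Q" * len(FIELDS))  # assumes little-endian 64-bit fields

def parse_header(buf):
    """Decode the fixed-size header into a dict of named integer fields."""
    return dict(zip(FIELDS, HEADER.unpack_from(buf, 0)))

def section_at(buf, header, offset_field, length):
    """Return `length` bytes starting at the offset stored in a header field."""
    start = header[offset_field]
    return buf[start:start + length]

# Toy buffer: a header followed by a fake binary id payload.
payload = b"fake-binary-id"
values = (0x1234, 1, 0, 0, 0, HEADER.size, 0)  # made-up field values
buf = HEADER.pack(*values) + payload
hdr = parse_header(buf)
print(hdr["BinaryIdOffset"], section_at(buf, hdr, "BinaryIdOffset", len(payload)))
```

Storing offsets in the header lets a reader jump straight to the section it
needs without scanning the sections in front of it.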

Payload Sections
------------------

(CS) Profile Summary
^^^^^^^^^^^^^^^^^^^^^
This section immediately follows the profile header. It stores the serialized
profile summary. For context-sensitive IR-based instrumentation PGO, this
section stores an additional profile summary corresponding to the
context-sensitive profiles.

Function PGO data
^^^^^^^^^^^^^^^^^^
This section stores functions and their PGO profiling data as an on-disk hash
Contributor

Profile data for functions with the same name are grouped together and share one hash table entry (the functions may come from different shared libraries for instance). The profile data for them are organized as a sequence of key-value pair where the key is the funcHash (CFG based for IR PGO), and the value is profile counters for the function.

Contributor Author

Added this.

table. The key of a hash table entry is the function's PGO name, and the
in-memory representation of the value is a map. The key of this map is the CFG
fingerprint, and the value is a C++ struct named `llvm::InstrProfRecord`, which
collects profiling information such as counters and value profiles.
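
The two-level lookup described here can be modeled with nested dictionaries.
This is only a conceptual sketch: ``Record`` is a simplified stand-in for
`llvm::InstrProfRecord`, and the names, hashes and counts are invented.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """Simplified stand-in for llvm::InstrProfRecord."""
    counts: list = field(default_factory=list)

# PGO name -> {CFG fingerprint -> Record}
table = {}

def add_record(name, func_hash, counts):
    """Store a record, grouping records that share a PGO name in one entry."""
    table.setdefault(name, {})[func_hash] = Record(list(counts))

def lookup(name, func_hash):
    """Return the record for (name, fingerprint), or None if either is unknown."""
    return table.get(name, {}).get(func_hash)

# Two functions named "foo" with different CFGs (e.g. from different shared libraries).
add_record("foo", 0xAAAA, [10, 3])
add_record("foo", 0xBBBB, [7])
add_record("main", 0x1111, [1])
print(sorted(table["foo"]), lookup("foo", 0xBBBB).counts, lookup("foo", 0xCCCC))
```

Keying the inner map by fingerprint means a lookup with a fingerprint that no
longer matches simply finds no record.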

MemProf Profile data
^^^^^^^^^^^^^^^^^^^^^^
This section stores functions' memory profiling data. See
`MemProf binary serialization format RFC`_ for the design.

.. _`MemProf binary serialization format RFC`: https://lists.llvm.org/pipermail/llvm-dev/2021-September/153007.html

Binary Ids
^^^^^^^^^^^^^^^^^^^^^^
This section carries over binary id information from raw profiles.

Temporal Profile Traces
^^^^^^^^^^^^^^^^^^^^^^^^
This section carries over temporal profile information from raw profiles.
See `Temporal profiling RFC`_ for the design.

.. _`Temporal profiling RFC`: https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068

Profile Data Usage
=======================================

`llvm-profdata` is the command line tool to display and process
instrumentation-based profile data. For supported usages, check out the
`llvm-profdata documentation <https://llvm.org/docs/CommandGuide/llvm-profdata.html>`_.


.. [1] A raw profile file could contain the concatenation of multiple raw
   profiles. The raw profile reader can parse all raw profiles from the file
   correctly.
.. [2] The counter section is used by a few variant types (like temporal
   profiling) and might have different semantics there.
.. [3] The step size of the data pointer is `sizeof(ProfileData)`, and the step
   size of the value profile pointer is calculated based on the number of
   collected values.