https://www.youtube.com/watch?v=qmEsx4MbKoc
The slides (with links) are available at https://github.com/CppCon/CppCon2022/blob/main/Presentations/Optimization-Remarks.pdf .
This is still the best way to start, as the talk includes example script outputs and recommendations on handling them. The text below surveys background and technical usage.
In the beginning there was the compiler switch -Rpass, and it was good. Sorta. Clang users who wanted visibility into compiler optimization decisions could dump a wall of text and sift through it, trying to make out what's important and what's actionable.
Then Adam Nemet et al. added a compiler switch (clang -fsave-optimization-record) and the opt-viewer Python script, as part of LLVM. He presented it at the 2016 LLVM Developers’ Meeting, and lo, it was good. Now users could generate and inspect HTML pages of their C/C++ sources, annotated with "optimization remarks" in place.
Alas, these tools were explicitly designed for use by compiler writers wishing to investigate and improve optimization code, with only a mention of future adaptation for usage by developers wishing to understand and improve their application's optimization.
Hence the birth of OptView2. We aim to make this wonderful optimization data accessible and actionable by developers.
- Ignore system headers,
- Collect only optimization failures,
- Display in index.html only a single entry per type/source loc,
- Replace ‘pass’ with ‘optimization name’,
- Make the index table sortable & resizable (Thanks Ilan Ben-Hagai)
- Use abridged func names.
- Create option to split processing into subfolders ('--split-top-folders') to enable processing of large projects
- Trim repeated remarks in source - keep only 5 per line.
- Enable filtering by remark name/text, preferably via config file (but possible via command line too). Check config.yaml for some examples.
I can't see any future compatibility considerations, and these are essentially just 5 Python scripts and some HTML + JavaScript, so at this point there won't be any versioning or release structure. Just download/clone and use, and please report any problems you come across.
It is not uncommon for an analysis of a ~1000 file project to take an hour or more. Two things can help mitigate the burden:
- The -j[N] command line switch to opt-viewer.py controls the number of jobs to spawn for YAML processing. A rule of thumb that worked best for my PC was to set N to 1.5 times the number of physical cores (for an 8 core machine, set it to 12), but there's no real alternative to experimentation. By default, the number of jobs invoked equals the number of logical cores.
- The script uses the Python package PyYAML, which uses the C library libyaml if available, and if not, falls back to a much, much slower pure-Python implementation. In such a case you'd see this line in the script output:
For faster parsing, you may want to install libyaml for PyYAML
One way to install and use libyaml is:
$ sudo apt install libyaml-dev
$ pip --no-cache-dir install --verbose --force-reinstall -I pyyaml
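To confirm that the reinstall actually picked up libyaml, a quick check along these lines may help. This is a minimal sketch and not part of OptView2: the helper name libyaml_status is hypothetical, and it relies on the __with_libyaml__ flag that PyYAML sets when its C extension loads.

```python
def libyaml_status():
    """Return True/False for libyaml bindings, or None if PyYAML is missing."""
    try:
        import yaml
    except ImportError:
        return None
    # PyYAML sets this module-level flag to True when the C extension
    # (backed by libyaml) was imported successfully.
    return bool(getattr(yaml, "__with_libyaml__", False))


if __name__ == "__main__":
    status = libyaml_status()
    if status is None:
        print("PyYAML is not installed")
    elif status:
        print("PyYAML is using libyaml (fast path)")
    else:
        print("PyYAML is using the pure-Python parser (slow path)")
```

If this reports the slow path after reinstalling, the libyaml development headers were probably not visible when pip built PyYAML.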
First, build your C/C++ project with Clang + -fsave-optimization-record. Note that by default this generates YAMLs alongside the obj files. Then run:
./optview2/opt-viewer.py --output-dir <HTMLs destination> --source-dir <source location> <YAMLs location>
Note that --source-dir needs to be the original root of the build which included -fsave-optimization-record, even if you're interested only in part of the tree. To restrict the analysis to part of the tree, narrow the <YAMLs location> argument instead.
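Before running the viewer, it can be useful to confirm that the build actually produced optimization records under the folder you intend to pass as <YAMLs location>. The sketch below is not part of OptView2; the helper name find_opt_records is hypothetical, and it assumes the records use Clang's default .opt.yaml suffix.

```python
from pathlib import Path


def find_opt_records(build_dir):
    """Recursively collect optimization-record YAMLs under build_dir."""
    # Assumption: records carry Clang's default ".opt.yaml" suffix.
    return sorted(Path(build_dir).rglob("*.opt.yaml"))


if __name__ == "__main__":
    records = find_opt_records(".")
    print(f"found {len(records)} optimization record(s)")
    for path in records[:10]:  # show only the first few
        print(" ", path)
```

An empty result usually means the build did not include -fsave-optimization-record, or that the records were written somewhere else.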
./optview2/opt-viewer.py -j10 --output-dir <...> --source-dir <...> <YAML dir>
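The rule of thumb for choosing the -j value (1.5 times the number of physical cores) can be sketched as below. This is only an illustration with a hypothetical helper name, suggested_jobs. The physical core count is passed in by hand, since Python's os.cpu_count() reports logical cores.

```python
import math


def suggested_jobs(physical_cores, factor=1.5):
    """Starting point for -j: factor x physical cores, and at least 1."""
    return max(1, math.floor(physical_cores * factor))


if __name__ == "__main__":
    cores = 8  # physical cores, supplied manually
    print(f"-j{suggested_jobs(cores)}")
```

Treat the result as a first guess and adjust after timing a few runs.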
When working on large projects, optview2's memory consumption easily gets out of hand. As a quick workaround, you can split the work by build subfolder (only first-level subfolders are supported). For example:
./optview2/opt-viewer.py --split-top-folders --output-dir <...> --source-dir <...> <YAMLs dir>
If, for example, the build dir includes subfolders "core", "utils" and "plugins", the script would process them separately and create 3 identically named subfolders under output-dir (with separate index files). If this doesn't work for you, you can also filter out remark types via remarks-filter.
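The resulting layout can be previewed with a small sketch like the one below. It only illustrates the mapping described above and is not the script's actual implementation; the helper name plan_split is hypothetical.

```python
from pathlib import Path


def plan_split(build_dir, output_dir):
    """Map each first-level build subfolder to a same-named output subfolder."""
    build, out = Path(build_dir), Path(output_dir)
    return {
        sub: out / sub.name
        for sub in sorted(build.iterdir())
        if sub.is_dir()  # plain files at the top level are skipped here
    }


if __name__ == "__main__":
    for src, dst in plan_split(".", "html_out").items():
        print(f"{src} -> {dst}")
```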
A dummy project with a few optimization issues is placed under cpp_optimization_example. To compile, generate HTML files and open them in a browser, use the wrapper script:
./optview2/cpp_optimization_example/run_optview2.sh
Note to WSL users: you'd probably need to manually open the resulting HTML.
Two real-life projects were analyzed and the results pushed online - check CPython and OpenCV pages.