Improve superpmi.py scripting (#42238)
* Improve superpmi.py scripting

1. Update Azure storage collections location to a new location
where we have appropriate permissions to manage the data.
2. Update the Azure storage upload/download implementation
to the current version of the Azure Python API.
3. Add JIT-EE interface GUID to path of Azure stored
collections. When downloading collections, use the appropriate
collection for your JIT-EE interface GUID. This is all done
by adding a "-printJITEEVersion" option to the SuperPMI
MCS tool. Thus, to determine the JIT-EE version, we assume
that MCS is built with the same version as the JIT (which
will be true in a normal build), and that MCS is available
and able to be run -- this typically requires a Core_Root
location be available. The user can specify the JIT-EE
version explicitly with the new `-jit_ee_version` argument.
4. Simplify the Azure storage format: there is no longer
a JSON mapping of name to MCH file. Instead, there is just
a directory full of files. By default, all files are downloaded
and used for replay/asmdiffs, but that can be filtered with
the new `-filter` argument.
5. The `-mch_files` (previously `-mch_file`) argument used
by `replay`, `asmdiffs`, and `upload`, now accepts a list
of directories and files, and for each directory all MCH
files included in that directory, recursively, are used.
6. Also upload MCT (TOC) files with the MCH files.
7. A `--force_download` argument is added to allow forcing
re-download of the Azure collections to the local cache.
8. PMI.dll is also looked for on the PATH before downloading
a cached version from Azure storage.
9. Some of the lesser-used arguments were renamed to simplify them.
10. Various bugs were fixed and code simplification/reorganization was done.
E.g., some of the commonality between replay and asmdiffs was
factored out.
11. More code documentation was added.
12. The superpmi.md documentation was re-written and simplified.

* Add support for download and caching of UNC paths on Windows

* Support downloading and caching explicitly specified HTTP addressed files
BruceForstall authored Sep 18, 2020
1 parent be4a89d commit 22d206c
Showing 9 changed files with 1,403 additions and 956 deletions.
166 changes: 66 additions & 100 deletions src/coreclr/scripts/superpmi.md
@@ -1,34 +1,40 @@
# An overview of using superpmi.py
# Documentation for the superpmi.py tool

SuperPMI is a tool for developing and testing the JIT compiler.
General information on SuperPMI can be found [here](../src/ToolBox/superpmi/readme.md)

## Overview

Although SuperPMI has many uses, setup and use of SuperPMI is not always trivial.
superpmi.py is a tool to help automate the use of SuperPMI, augmenting its usefulness.
superpmi.py is a tool to simplify the use of SuperPMI.
The tool has three primary modes: collect, replay, and asmdiffs.
Below you will find more specific information on each of the different modes.

superpmi.py lives in the dotnet/runtime GitHub repo, src\coreclr\scripts directory.

## General usage

From the usage message:

```
usage: superpmi.py [-h] {collect,replay,asmdiffs,upload,list-collections} ...
usage: superpmi.py [-h]
{collect,replay,asmdiffs,upload,download,list-collections}
...
Script to run SuperPMI replay, ASM diffs, and collections. The script also
manages the Azure store of precreated SuperPMI collection files. Help for each
individual command can be shown by asking for help on the individual command,
for example `superpmi.py collect --help`.
manages the Azure store of pre-created SuperPMI collection files. Help for
each individual command can be shown by asking for help on the individual
command, for example `superpmi.py collect --help`.
positional arguments:
{collect,replay,asmdiffs,upload,list-collections}
{collect,replay,asmdiffs,upload,download,list-collections}
Command to invoke
optional arguments:
-h, --help show this help message and exit
```
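The usage message above has the shape that Python's `argparse` produces for a parser with subcommands. As a rough sketch only (the real script defines many more arguments per command, and `build_parser` and `mode` are names chosen here for illustration, not taken from superpmi.py), the top-level command dispatch could be declared like this:

```python
import argparse

# The six commands listed in the updated usage message.
COMMANDS = ("collect", "replay", "asmdiffs", "upload", "download", "list-collections")


def build_parser():
    """Build a minimal top-level parser with one subparser per command.

    Hypothetical sketch: per-command arguments such as -mch_files are omitted.
    """
    parser = argparse.ArgumentParser(
        prog="superpmi.py",
        description="Script to run SuperPMI replay, ASM diffs, and collections.",
    )
    subparsers = parser.add_subparsers(dest="mode", help="Command to invoke")
    for name in COMMANDS:
        subparsers.add_parser(name)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["replay"])
    print(args.mode)
```

Because each command gets its own subparser, `superpmi.py collect --help` style help comes for free from `argparse`.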

## Replay

The simplest usage is to replay using:

```
@@ -41,92 +47,74 @@ In this case, everything needed is found using defaults:
- The build type is assumed to be Checked.
- Core_Root is found by assuming superpmi.py is in the normal location in the
clone of the repo, and using the processor architecture, build type, and current
OS, to find it.
OS, to find it in the default `artifacts` directory location. Note that you must
have performed a product build for this platform / build type combination, and
created the appropriate Core_Root directory as well.
- The SuperPMI tool and JIT to use for replay is found in Core_Root.
- The collection to use for replay is the default that is found in the
precomputed collections that are stored in Azure.
- The SuperPMI collections to use for replay are found in the Azure store of
precomputed collections for this JIT-EE interface GUID, OS, and processor architecture.

If you want to use a specific MCH file collection, use:
If you want to use a specific MCH file collection, use the `-mch_files` argument to specify
one or more MCH files on your machine:

```
python f:\gh\runtime\src\coreclr\scripts\superpmi.py replay -mch_file f:\spmi\collections\tests.pmi.Windows_NT.x64.Release.mch
python f:\gh\runtime\src\coreclr\scripts\superpmi.py replay -mch_files f:\spmi\collections\tests.pmi.Windows_NT.x64.Release.mch
```

To generate ASM diffs, use the `asmdiffs` command. In this case, you must specify
the path to a baseline JIT compiler, e.g.:
The `-mch_files` argument takes a list of one or more directories or files to use. For
each directory, all the MCH files in that directory are used.

If you want to use just a subset of the collections, either default collections or collections
specified by `-mch_files`, use the `-filter` argument to restrict the MCH files used, e.g.:

```
python f:\gh\runtime\src\coreclr\scripts\superpmi.py asmdiffs f:\jits\baseline_clrjit.dll
python f:\gh\runtime\src\coreclr\scripts\superpmi.py replay -filter tests
```
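To make the interaction between `-mch_files` and `-filter` concrete, here is a minimal sketch of how a list of files and directories might be expanded into MCH files and then narrowed. It is not the script's actual code: `find_mch_files` is a hypothetical helper, and treating each filter as a case-insensitive substring of the file name is an assumption.

```python
import os
import tempfile


def find_mch_files(paths, filters=None):
    """Expand files and directories into a list of .mch file paths.

    Directories are walked recursively. If filters are given, keep only files
    whose name contains at least one filter string (assumed matching rule).
    """
    found = []
    for path in paths:
        if os.path.isdir(path):
            for root, _dirs, files in os.walk(path):
                for name in sorted(files):
                    if name.lower().endswith(".mch"):
                        found.append(os.path.join(root, name))
        elif os.path.isfile(path) and path.lower().endswith(".mch"):
            found.append(path)
    if filters:
        wanted = [f.lower() for f in filters]
        found = [p for p in found
                 if any(f in os.path.basename(p).lower() for f in wanted)]
    return found


if __name__ == "__main__":
    # Build a throwaway directory tree to exercise the helper.
    with tempfile.TemporaryDirectory() as top:
        os.makedirs(os.path.join(top, "nested"))
        for rel in ("tests.pmi.mch", os.path.join("nested", "libraries.pmi.mch")):
            open(os.path.join(top, rel), "w").close()
        print(find_mch_files([top]))
        print(find_mch_files([top], filters=["tests"]))
```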

ASM diffs requires the coredistools library. The script attempts to either find
or download an appropriate version that can be used.
## ASM diffs

## Collections
To generate ASM diffs, use the `asmdiffs` command. In this case, you must specify
the path to a baseline JIT compiler using the `-base_jit_path` argument, e.g.:

SuperPMI requires a collection to enable replay. You can do a collection
yourself, but it is more convenient to use existing precomputed collections.
Superpmi.py can automatically download existing collections
```
python f:\gh\runtime\src\coreclr\scripts\superpmi.py asmdiffs -base_jit_path f:\jits\baseline_clrjit.dll
```

Note that SuperPMI collections are sensitive to JIT/EE interface changes. If
there has been an interface change, the new JIT will not load and SuperPMI
will fail.
ASM diffs requires the coredistools library. The script attempts to find
or download an appropriate version that can be used.

**At the time of writing, collections are done manually. See below for a
full list of supported platforms and where the .mch collection exists.**
As in the `replay` case, the set of collections used defaults to the set available
in Azure, or can be specified using the `-mch_files` argument. In either case, the
`-filter` argument can restrict the set used.

## Supported Platforms
## Collections

| OS | Arch | Replay | AsmDiffs | MCH location |
| --- | --- | --- | --- | --- |
| OSX | x64 | <ul><li>- [x] </li></ul> | <ul><li>- [x] </li></ul> | |
| Windows | x64 | <ul><li>- [x] </li></ul> | <ul><li>- [x] </li></ul> | |
| Windows | x86 | <ul><li>- [x] </li></ul> | <ul><li>- [x] </li></ul> | |
| Windows | arm | <ul><li>- [ ] </li></ul> | <ul><li>- [ ] </li></ul> | N/A |
| Windows | arm64 | <ul><li>- [ ] </li></ul> | <ul><li>- [ ] </li></ul> | N/A |
| Ubuntu | x64 | <ul><li>- [x] </li></ul> | <ul><li>- [x] </li></ul> | |
| Ubuntu | arm32 | <ul><li>- [ ] </li></ul> | <ul><li>- [ ] </li></ul> | N/A |
| Ubuntu | arm64 | <ul><li>- [ ] </li></ul> | <ul><li>- [ ] </li></ul> | N/A |
SuperPMI requires a collection to enable replay. You can do a collection
yourself using the superpmi.py `collect` command, but it is more convenient
to use existing precomputed collections stored in Azure.

## Default Collections
You can see which collections are available for your current settings using
the `list-collections` command. You can also see all the available collections
using the `list-collections --all` command. Finally, you can see which Azure stored
collections have been locally cached on your machine in the default cache location
by using `list-collections --local`.

See the table above for locations of default collections that exist. If there
is an MCH file that exists, then SuperPMI will automatically download and
use the MCH from that location. Please note that it is possible that the
collection is out of date, or there is a jitinterface change which makes the
collection invalid. If this is the case, then in order to use the tool a
collection will have to be done manually. In order to reproduce the default
collections, please see below for what command the default collections are
done with.
(Note that when collections are downloaded, they are cached locally. If there are
any cached collections, then no download attempt is made. To force re-download,
use the `--force_download` argument to the `replay`, `asmdiffs`, or `download` command.)

## Collect
### Creating a collection

Example commands to create a collection:
Example commands to create a collection (on Linux, by running the tests):

```
/Users/jashoo/runtime/src/coreclr/build.sh x64 checked
/Users/jashoo/runtime/src/coreclr/build-test.sh x64 checked -priority1
# First, build the product, possibly the tests, and create a Core_Root directory.
/Users/jashoo/runtime/src/coreclr/scripts/superpmi.py collect bash "/Users/jashoo/runtime/src/coreclr/tests/runtest.sh x64 checked"
```

Given a specific command, collect over all of the managed code called by the
The above command collects over all of the managed code called by the
child process. Note that this allows many different invocations of any
managed code. Although it does specifically require that any managed code run
by the child process to handle the COMPlus variables set by SuperPMI and
defer them to the latter. These are below:

```
SuperPMIShimLogPath=<full path to an empty temporary directory>
SuperPMIShimPath=<full path to clrjit.dll, the "standalone" JIT>
COMPlus_AltJit=*
COMPlus_AltJitName=superpmi-shim-collector.dll
```

If these variables are set and a managed exe is run, using for example the
dotnet CLI, the altjit settings will crash the process.

To avoid this, the easiest way is to unset the variables in the beginning to
the root process, and then set them right before calling `$CORE_ROOT/corerun`.
managed code.

You can also collect using PMI instead of running code. Do this with the `--pmi` and `-pmi_assemblies`
arguments. E.g.:
@@ -136,37 +124,15 @@ python f:\gh\runtime\src\coreclr\scripts\superpmi.py collect --pmi -pmi_assembli
```

Note that collection generates gigabytes of data. Most of this data will
be removed when the collection is finished. That being said, it is worth
mentioning that this process will use 3x the size of the unclean MCH file,
which to give an example of the size, a collection of the coreclr
`priority=1` tests uses roughly `200gb` of disk space. Most of this space
will be used in a temp directory, which on Windows will default to
`C:\Users\blah\AppData\Temp\...`. It is recommended to set the temp variable
to a different location before running collect to avoid running out of disk
space. This can be done by simply running `set TEMP=D:\TEMP`.
be removed when the collection is finished. It is recommended to set the TEMP variable
to a location with adequate space, and preferably on a fast SSD to improve performance,
before running `collect` to avoid running out of disk space.
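Before starting a long collection, it can help to confirm how much space is free where temporary files will be written. The following is a small sketch and not part of superpmi.py; the helper names and the caller-chosen threshold are assumptions. Python's `tempfile.gettempdir()` consults the `TMPDIR`, `TEMP`, and `TMP` environment variables, so it reflects a `TEMP` override.

```python
import shutil
import tempfile


def temp_free_gib(path=None):
    """Return the free space, in GiB, on the volume holding the temp directory."""
    path = path or tempfile.gettempdir()
    return shutil.disk_usage(path).free / (1024 ** 3)


def has_room_for_collection(required_gib, path=None):
    """Check free temp space against a threshold chosen by the caller."""
    return temp_free_gib(path) >= required_gib


if __name__ == "__main__":
    print(f"Temp directory: {tempfile.gettempdir()}")
    print(f"Free space: {temp_free_gib():.1f} GiB")
```

A suitable threshold depends on what is being collected, so it is left as a parameter rather than hard-coded.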

## Replay
### Azure Storage collections

As stated above, you can use the `list-collections` command to see which collections
are available in Azure.

SuperPMI replay supports faster assertion checking over a collection than
running the tests individually. This is useful if the collection includes a
larger corpus of data that can reasonably be run against by executing the
actual code, or if it is difficult to invoke the JIT across all the code in
the collection. Note that this is similar to the PMI tool, with the same
limitation, that runtime issues will not be caught by SuperPMI replay only
assertions.

## Asm Diffs

SuperPMI will take two different JITs, a baseline and diff JIT and run the
compiler across all the methods in the MCH file. It uses coredistools to do
a binary difference of the two different outputs. Note that sometimes the
binary will differ, and SuperPMI will be run once again dumping the asm that
was output in text format. Then the text will be diffed, if there are
differences, you should look for text differences. If there are some then it
is worth investigating the asm differences.

superpmi.py can also be asked to generate JitDump differences in addition
to the ASM diff differences generated by default.

It is worth noting as well that SuperPMI gives more stable instructions
retired counters for the JIT.
There is also a `download` command to download one or more Azure stored collections
to the local cache, as well as an `upload` command to populate the Azure collection
set.
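The caching rule for downloaded collections (skip the download when anything is already cached, unless `--force_download` is given) can be sketched as follows. This is an illustration under stated assumptions rather than the script's implementation: `needs_download` is a hypothetical helper, and "cached collections" is interpreted here as "at least one `.mch` file under the cache directory".

```python
import os
import tempfile


def needs_download(cache_dir, force_download=False):
    """Decide whether collections should be fetched from remote storage.

    Returns True when a download is forced, when the cache directory is
    missing, or when it holds no .mch files (assumed definition of "cached").
    """
    if force_download:
        return True
    if not os.path.isdir(cache_dir):
        return True
    for _root, _dirs, files in os.walk(cache_dir):
        if any(name.lower().endswith(".mch") for name in files):
            return False  # something is cached, so no download attempt is made
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache:
        print(needs_download(cache))                       # empty cache
        open(os.path.join(cache, "example.mch"), "w").close()
        print(needs_download(cache))                       # populated cache
        print(needs_download(cache, force_download=True))  # forced re-download
```

One consequence of this rule is that a stale cache is never refreshed on its own, which is why an explicit force option is needed.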