Improve superpmi.py scripting (#42238)

* Improve superpmi.py scripting

  1. Update Azure storage collections location to a new location where we have appropriate permissions to manage the data.
  2. Update the Azure storage upload/download implementation to the current version of the Azure Python API.
  3. Add JIT-EE interface GUID to path of Azure stored collections. When downloading collections, use the appropriate collection for your JIT-EE interface GUID. This is all done by adding a "-printJITEEVersion" option to the SuperPMI MCS tool. Thus, to determine the JIT-EE version, we assume that MCS is built with the same version as the JIT (which will be true in a normal build), and that MCS is available and able to be run -- this typically requires a Core_Root location be available. The user can specify the JIT-EE version explicitly with the new `-jit_ee_version` argument.
  4. Simplify the Azure storage format: there is no longer a JSON mapping of name to MCH file. Instead, there is just a directory full of files. By default, all files are downloaded and used for replay/asmdiffs, but that can be filtered with the new `-filter` argument.
  5. The `-mch_files` (previously `-mch_file`) argument used by `replay`, `asmdiffs`, and `upload` now accepts a list of directories and files, and for each directory all MCH files included in that directory, recursively, are used.
  6. Also upload MCT (TOC) files with the MCH files.
  7. A `--force_download` argument is added to allow forcing re-download of the Azure collections to the local cache.
  8. PMI.dll is also looked for on the PATH before downloading a cached version from Azure storage.
  9. Some of the lesser-used arguments were renamed to simplify them.
  10. Various bugs were fixed and code simplification/reorganization was done. E.g., some of the commonality between replay and asmdiffs was factored out.
  11. More code documentation was added.
  12. The superpmi.md documentation was re-written and simplified.
* Add support for download and caching of UNC paths on Windows
* Support downloading and caching explicitly specified HTTP addressed files
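
To illustrate the `-mch_files` and `-filter` behavior described in items 4 and 5, here is a minimal sketch. It is not the code in superpmi.py: the helper name `expand_mch_files` is hypothetical, and the filter is assumed to be a case-insensitive substring match on the file name, which the commit message does not confirm.

```python
import os


def expand_mch_files(paths, name_filter=None):
    """Hypothetical helper (not from superpmi.py): expand files and
    directories into a flat list of MCH file paths."""
    found = []
    for path in paths:
        if os.path.isdir(path):
            # Directories are searched recursively for *.mch files.
            for root, _dirs, files in os.walk(path):
                for name in sorted(files):
                    if name.lower().endswith(".mch"):
                        found.append(os.path.join(root, name))
        elif os.path.isfile(path):
            # Explicitly named files are taken as given.
            found.append(path)

    if name_filter:
        # Assumption: a file is kept if any filter string occurs in its name.
        needles = [f.lower() for f in name_filter]
        found = [p for p in found
                 if any(n in os.path.basename(p).lower() for n in needles)]
    return found
```

Under these assumptions, `expand_mch_files([r"f:\spmi\collections"], ["tests"])` would return only the MCH files under that directory whose names contain `tests`.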
dotnet · Sep 18, 2020 · 22d206c
1 parent be4a89d
commit 22d206c
Showing 9 changed files with 1,403 additions and 956 deletions.
diff --git a/src/coreclr/scripts/superpmi.md b/src/coreclr/scripts/superpmi.md
@@ -1,34 +1,40 @@
-# An overview of using superpmi.py
+# Documentation for the superpmi.py tool
 
+SuperPMI is a tool for developing and testing the JIT compiler.
 General information on SuperPMI can be found [here](../src/ToolBox/superpmi/readme.md)
 
 ## Overview
 
-Although SuperPMI has many uses, setup and use of SuperPMI is not always trivial.
-superpmi.py is a tool to help automate the use of SuperPMI, augmenting its usefulness.
+superpmi.py is a tool to simplify the use of SuperPMI.
 The tool has three primary modes: collect, replay, and asmdiffs.
 Below you will find more specific information on each of the different modes.
 
+superpmi.py lives in the dotnet/runtime GitHub repo, src\coreclr\scripts directory.
+
 ## General usage
 
 From the usage message:
 
 ```
-usage: superpmi.py [-h] {collect,replay,asmdiffs,upload,list-collections} ...
+usage: superpmi.py [-h]
+                   {collect,replay,asmdiffs,upload,download,list-collections}
+                   ...
 
 Script to run SuperPMI replay, ASM diffs, and collections. The script also
-manages the Azure store of precreated SuperPMI collection files. Help for each
-individual command can be shown by asking for help on the individual command,
-for example `superpmi.py collect --help`.
+manages the Azure store of pre-created SuperPMI collection files. Help for
+each individual command can be shown by asking for help on the individual
+command, for example `superpmi.py collect --help`.
 
 positional arguments:
-  {collect,replay,asmdiffs,upload,list-collections}
+  {collect,replay,asmdiffs,upload,download,list-collections}
                         Command to invoke
 
 optional arguments:
   -h, --help            show this help message and exit
 ```
 
+## Replay
+
 The simplest usage is to replay using:
 
 ```
@@ -41,92 +47,74 @@ In this case, everything needed is found using defaults:
 - The build type is assumed to be Checked.
 - Core_Root is found by assuming superpmi.py is in the normal location in the
 clone of the repo, and using the processor architecture, build type, and current
-OS, to find it.
+OS, to find it in the default `artifacts` directory location. Note that you must
+have performed a product build for this platform / build type combination, and
+created the appropriate Core_Root directory as well.
 - The SuperPMI tool and JIT to use for replay is found in Core_Root.
-- The collection to use for replay is the default that is found in the
-precomputed collections that are stored in Azure.
+- The SuperPMI collections to use for replay are found in the Azure store of
+precomputed collections for this JIT-EE interface GUID, OS, and processor architecture.
 
-If you want to use a specific MCH file collection, use:
+If you want to use a specific MCH file collection, use the `-mch_files` argument to specify
+one or more MCH files on your machine:
 
 ```
-python f:\gh\runtime\src\coreclr\scripts\superpmi.py replay -mch_file f:\spmi\collections\tests.pmi.Windows_NT.x64.Release.mch
+python f:\gh\runtime\src\coreclr\scripts\superpmi.py replay -mch_files f:\spmi\collections\tests.pmi.Windows_NT.x64.Release.mch
 ```
 
-To generate ASM diffs, use the `asmdiffs` command. In this case, you must specify
-the path to a baseline JIT compiler, e.g.:
+The `-mch_files` argument takes a list of one or more directories or files to use. For
+each directory, all the MCH files in that directory are used.
+
+If you want to use just a subset of the collections, either default collections or collections
+specified by `-mch_files`, use the `-filter` argument to restrict the MCH files used, e.g.:
 
 ```
-python f:\gh\runtime\src\coreclr\scripts\superpmi.py asmdiffs f:\jits\baseline_clrjit.dll
+python f:\gh\runtime\src\coreclr\scripts\superpmi.py replay -filter tests
 ```
 
-ASM diffs requires the coredistools library. The script attempts to either find
-or download an appropriate version that can be used.
+## ASM diffs
 
-## Collections
+To generate ASM diffs, use the `asmdiffs` command. In this case, you must specify
+the path to a baseline JIT compiler using the `-base_jit_path` argument, e.g.:
 
-SuperPMI requires a collection to enable replay. You can do a collection
-yourself, but it is more convenient to use existing precomputed collections.
-Superpmi.py can automatically download existing collections
+```
+python f:\gh\runtime\src\coreclr\scripts\superpmi.py asmdiffs -base_jit_path f:\jits\baseline_clrjit.dll
+```
 
-Note that SuperPMI collections are sensitive to JIT/EE interface changes. If
-there has been an interface change, the new JIT will not load and SuperPMI
-will fail.
+ASM diffs requires the coredistools library. The script attempts to find
+or download an appropriate version that can be used.
 
-**At the time of writing, collections are done manually. See below for a
-full list of supported platforms and where the .mch collection exists.**
+As for the "replay" case, the set of collections used defaults to the set available
+in Azure, or can be specified using the `mch_files` argument. In either case, the
+`-filter` argument can restrict the set used.
 
-## Supported Platforms
+## Collections
 
-| OS      | Arch  | Replay                    | AsmDiffs                  | MCH location |
-| ---     | ---   | ---                       | ---                       | --- |
-| OSX     | x64   |  <ul><li>- [x] </li></ul> |  <ul><li>- [x] </li></ul> |  |
-| Windows | x64   |  <ul><li>- [x] </li></ul> |  <ul><li>- [x] </li></ul> |  |
-| Windows | x86   |  <ul><li>- [x] </li></ul> |  <ul><li>- [x] </li></ul> |  |
-| Windows | arm   |  <ul><li>- [ ] </li></ul> |  <ul><li>- [ ] </li></ul> | N/A |
-| Windows | arm64 |  <ul><li>- [ ] </li></ul> |  <ul><li>- [ ] </li></ul> | N/A |
-| Ubuntu  | x64   |  <ul><li>- [x] </li></ul> |  <ul><li>- [x] </li></ul> |  |
-| Ubuntu  | arm32 |  <ul><li>- [ ] </li></ul> |  <ul><li>- [ ] </li></ul> | N/A |
-| Ubuntu  | arm64 |  <ul><li>- [ ] </li></ul> |  <ul><li>- [ ] </li></ul> | N/A |
+SuperPMI requires a collection to enable replay. You can do a collection
+yourself using the superpmi.py `collect` command, but it is more convenient
+to use existing precomputed collections stored in Azure.
 
-## Default Collections
+You can see which collections are available for your current settings using
+the `list-collections` command. You can also see all the available collections
+using the `list-collections --all` command. Finally, you can see which Azure stored
+collections have been locally cached on your machine in the default cache location
+by using `list-collections --local`.
 
-See the table above for locations of default collections that exist. If there
-is an MCH file that exists, then SuperPMI will automatically download and
-use the MCH from that location. Please note that it is possible that the
-collection is out of date, or there is a jitinterface change which makes the
-collection invalid. If this is the case, then in order to use the tool a
-collection will have to be done manually. In order to reproduce the default
-collections, please see below for what command the default collections are
-done with.
+(Note that when collections are downloaded, they are cached locally. If there are
+any cached collections, then no download attempt is made. To force re-download,
+use the `--force_download` argument to the `replay`, `asmdiffs`, or `download` command.)
 
-## Collect
+### Creating a collection
 
-Example commands to create a collection:
+Example commands to create a collection (on Linux, by running the tests):
 
 ```
-/Users/jashoo/runtime/src/coreclr/build.sh x64 checked
-/Users/jashoo/runtime/src/coreclr/build-test.sh x64 checked -priority1
+# First, build the product, possibly the tests, and create a Core_Root directory.
 /Users/jashoo/runtime/src/coreclr/scripts/superpmi.py collect bash "/Users/jashoo/runtime/src/coreclr/tests/runtest.sh x64 checked"
 ```
 
-Given a specific command, collect over all of the managed code called by the
+The above command collects over all of the managed code called by the
 child process. Note that this allows many different invocations of any
-managed code. Although it does specifically require that any managed code run
-by the child process to handle the COMPlus variables set by SuperPMI and
-defer them to the latter. These are below:
-
-```
-SuperPMIShimLogPath=<full path to an empty temporary directory>
-SuperPMIShimPath=<full path to clrjit.dll, the "standalone" JIT>
-COMPlus_AltJit=*
-COMPlus_AltJitName=superpmi-shim-collector.dll
-```
-
-If these variables are set and a managed exe is run, using for example the
-dotnet CLI, the altjit settings will crash the process.
-
-To avoid this, the easiest way is to unset the variables in the beginning to
-the root process, and then set them right before calling `$CORE_ROOT/corerun`.
+managed code.
 
 You can also collect using PMI instead of running code. Do with with the `--pmi` and `-pmi_assemblies`
 arguments. E.g.:
@@ -136,37 +124,15 @@ python f:\gh\runtime\src\coreclr\scripts\superpmi.py collect --pmi -pmi_assembli
 ```
 
 Note that collection generates gigabytes of data. Most of this data will
-be removed when the collection is finished. That being said, it is worth
-mentioning that this process will use 3x the size of the unclean MCH file,
-which to give an example of the size, a collection of the coreclr
-`priority=1` tests uses roughly `200gb` of disk space. Most of this space
-will be used in a temp directory, which on Windows will default to
-`C:\Users\blah\AppData\Temp\...`. It is recommended to set the temp variable
-to a different location before running collect to avoid running out of disk
-space. This can be done by simply running `set TEMP=D:\TEMP`.
+be removed when the collection is finished. It is recommended to set the TEMP variable
+to a location with adequate space, and preferably on a fast SSD to improve performance,
+before running `collect` to avoid running out of disk space.
 
-## Replay
+### Azure Storage collections
+
+As stated above, you can use the `list-collections` command to see which collections
+are available in Azure.
 
-SuperPMI replay supports faster assertion checking over a collection than
-running the tests individually. This is useful if the collection includes a
-larger corpus of data that can reasonably be run against by executing the
-actual code, or if it is difficult to invoke the JIT across all the code in
-the collection. Note that this is similar to the PMI tool, with the same
-limitation, that runtime issues will not be caught by SuperPMI replay only
-assertions.
-
-## Asm Diffs
-
-SuperPMI will take two different JITs, a baseline and diff JIT and run the
-compiler accross all the methods in the MCH file. It uses coredistools to do
-a binary difference of the two different outputs. Note that sometimes the
-binary will differ, and SuperPMI will be run once again dumping the asm that
-was output in text format. Then the text will be diffed, if there are
-differences, you should look for text differences. If there are some then it
-is worth investigating the asm differences.
-
-superpmi.py can also be asked to generate JitDump differences in addition
-to the ASM diff differences generated by default.
-
-It is worth noting as well that SuperPMI gives more stable instructions
-retired counters for the JIT.
+There is also a `download` command to download one or more Azure stored collection
+to the local cache, as well as an `upload` command to populate the Azure collection
+set.