Benchmarking overhaul and pin Flake8 <6 #220
Merged: stephenworsley merged 19 commits into SciTools:main from trexfeathers:benchmarks_update on Nov 25, 2022.
Changes from 15 commits:
* a8e6a13 Better benchmarking infrastructure - see SciTools/iris#4571, SciTools…
* c3fa502 Minor improvements to benchmark data generation messages.
* e543a3c Better benchmark imports.
* af1fd52 Better strategy for data realisation and ASV.
* 878b7a3 Introduced on_demand_benchmark decorator - see SciTools/iris#4621.
* 16e34d0 Simplify benchmark structure following 878b7a3.
* eaff991 Added a benchmarks README mirroring SciTools/iris.
* 55f71b2 Merge remote-tracking branch 'upstream/main' into benchmarks_update
* f9afcf2 CHANGELOG entry.
* fa2076d Flake8 fixes.
* efa66ac Bump Nox cache.
* 7e1cd23 Cirrus benchmarks pass in CIRRUS_BASE_SHA.
* a90488c Benchmark README Conda package cache tips.
* 0b147d0 Reset Nox cache.
* 31bab9b New Nox cache.
* e9fb19f Remove licence header from asv_delegated_conda.py.
* bd64aef Always re-create Nox benchmark environment (to avoid CI problems).
* 22a6f63 Pin Flake8 <6.
* 6957388 Always re-create Nox benchmark environment (to avoid CI problems).
New file, `@@ -0,0 +1,116 @@` (the benchmarks README):

# iris-esmf-regrid Performance Benchmarking

iris-esmf-regrid uses an
[Airspeed Velocity](https://github.com/airspeed-velocity/asv)
(ASV) setup to benchmark performance. This is primarily designed to check for
performance shifts between commits using statistical analysis, but can also
be easily repurposed for manual comparative and scalability analyses.

The benchmarks are run as part of the CI (the `benchmark_task` in
[`.cirrus.yml`](../.cirrus.yml)), with any notable shifts in performance
raising a ❌ failure.
## Running benchmarks

`asv ...` commands must be run from this directory. You will need to have ASV
installed, as well as Nox (see
[Benchmark environments](#benchmark-environments)).

[iris-esmf-regrid's noxfile](../noxfile.py) includes a `benchmarks` session
that provides conveniences for setting up before benchmarking, and can also
replicate the CI run locally. See the session docstring for detail.
### Environment variables

* `DATA_GEN_PYTHON` - required - path to a Python executable that can be
  used to generate benchmark test objects/files; see
  [Data generation](#data-generation). The Nox session sets this
  automatically, but will defer to any value already set in the shell.
* `BENCHMARK_DATA` - optional - path to a directory for benchmark synthetic
  test data, which the benchmark scripts will create if it doesn't already
  exist. Defaults to `<root>/benchmarks/.data/` if not set. Note that some of
  the generated files, especially in the 'SPerf' suite, are many GB in size,
  so plan accordingly.
* `ON_DEMAND_BENCHMARKS` - optional - when set (to any value): benchmarks
  decorated with `@on_demand_benchmark` are included in the ASV run. Usually
  coupled with the ASV `--bench` argument to only run the benchmark(s) of
  interest. Is set during the Nox `sperf` session.
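For example, a local run might set the variables like this (the paths below are illustrative assumptions, not values from this repository):

```shell
# Hypothetical paths - adjust to your own setup.
export DATA_GEN_PYTHON=/opt/conda/envs/datagen/bin/python
export BENCHMARK_DATA=/storage/benchmark_data
export ON_DEMAND_BENCHMARKS=1

# With the variables set, a typical ASV invocation (run from the
# benchmarks directory) would then look like:
#   asv run --bench <benchmark-name> HEAD^!
echo "Data-gen interpreter: $DATA_GEN_PYTHON"
```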
### Reducing run time

Before benchmarks are run on a commit, the benchmark environment is
automatically aligned with the lock-file for that commit. You can
significantly speed up any environment updates by co-locating the benchmark
environment and your
[Conda package cache](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#specify-package-directories-pkgs-dirs)
on the same [file system](https://en.wikipedia.org/wiki/File_system). This can
be done in several ways:

* Move your iris-esmf-regrid checkout, this being the default location for
  the benchmark environment.
* Move your package cache by editing
  [`pkgs_dirs` in Conda config](https://conda.io/projects/conda/en/latest/user-guide/configuration/use-condarc.html#specify-package-directories-pkgs-dirs).
* Move the benchmark environment by **locally** editing the environment path
  of `delegated_env_commands` and `delegated_env_parent` in
  [asv.conf.json](asv.conf.json).
## Writing benchmarks

[See the ASV docs](https://asv.readthedocs.io/) for full detail.
### Data generation

**Important:** be sure not to use the benchmarking environment to generate
any test objects/files, as this environment changes with each commit being
benchmarked, creating inconsistent benchmark 'conditions'. The
[generate_data](./benchmarks/generate_data.py) module offers a solution; read
more detail there.
### ASV re-run behaviour

Note that ASV re-runs a benchmark multiple times between each call of its
`setup()` routine. This is a problem for benchmarking certain Iris operations
such as data realisation, since the data will no longer be lazy after the
first run. Consider writing extra steps to restore objects' original state
_within_ the benchmark itself.

If adding steps to the benchmark will skew the result too much, then
re-running can be disabled by setting an attribute on the benchmark:
`number = 1`. To maintain result accuracy this should be accompanied by
increasing the number of repeats _between_ `setup()` calls using the `repeat`
attribute. `warmup_time = 0` is also advisable, since ASV performs independent
re-runs to estimate run-time, and these will still be subject to the original
problem. A decorator is available for this -
`@disable_repeat_between_setup` in
[benchmarks init](./benchmarks/__init__.py).
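As a sketch of those ASV attributes in combination (the class and data here are illustrative stand-ins, not from this repository - a one-shot generator plays the role of a lazy Iris cube):

```python
import numpy as np


class TimeRealisation:
    """Sketch of a benchmark whose state is consumed by a single run."""

    number = 1  # run the benchmark body exactly once per setup() call
    repeat = 10  # compensate with more setup()+run cycles for sound statistics
    warmup_time = 0  # ASV's warmup runs would also consume the lazy state

    def setup(self):
        # Stand-in for lazy data: a generator that can only be consumed once.
        self.lazy_data = (np.ones((100, 100)) for _ in range(1))

    def time_realise(self):
        # The first (and only) run per setup() realises the data.
        next(self.lazy_data)
```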
### Scaling / non-scaling performance differences

When comparing performance between commits/file-type/whatever, it can be
helpful to know if the differences exist in scaling or non-scaling parts of
the Iris functionality in question. This can be done using a size parameter,
setting one value to be as small as possible (e.g. a scalar `Cube`), and the
other to be significantly larger (e.g. a 1000x1000 `Cube`). Performance
differences might only be seen for the larger value, or the smaller, or both,
getting you closer to the root cause.
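In ASV this is expressed with the `params` / `param_names` class attributes; a minimal sketch, using a NumPy array as a stand-in for a `Cube` (names are illustrative):

```python
import numpy as np


class TimeSum:
    """Sketch of a size-parametrised benchmark."""

    # One near-trivial size and one significantly larger size.
    params = [1, 1000]
    param_names = ["side_length"]

    def setup(self, side_length):
        self.data = np.ones((side_length, side_length))

    def time_sum(self, side_length):
        self.data.sum()
```

ASV runs and reports the benchmark once per parameter value, so a shift that appears only at `side_length=1000` points at the scaling part of the operation.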
### On-demand benchmarks

Some benchmarks provide useful insight but are inappropriate to be included
in a benchmark run by default, e.g. those with long run-times or requiring a
local file. These benchmarks should be decorated with `@on_demand_benchmark`
(see [benchmarks init](./benchmarks/__init__.py)), which sets the benchmark
to only be included in a run when the `ON_DEMAND_BENCHMARKS` environment
variable is set. Examples include the SPerf benchmark suite for the UK Met
Office NG-VAT project.
## Benchmark environments

We have disabled ASV's standard environment management, instead using an
environment built using the same Nox scripts as Iris' test environments.
This is done using ASV's plugin architecture - see
[asv_delegated_conda.py](asv_delegated_conda.py) and the extra config items
in [asv.conf.json](asv.conf.json).

(ASV is written to control the environment(s) that benchmarks are run in -
minimising external factors and also allowing it to compare between a matrix
of dependencies, each in a separate environment. We have chosen to sacrifice
these features in favour of testing each commit with its intended
dependencies, controlled by Nox + lock-files.)
`asv.conf.json`, `@@ -1,15 +1,27 @@` (old and new lines appear together in this diff view):

```
{
    "version": 1,
    "project": "esmf_regrid",
    "repo": "..",
    "environment_type": "nox-conda",
    "pythons": [],
    "branches": ["main"],
    "benchmark_dir": "benchmarks",
    "env_dir": ".asv-env",
    "results_dir": ".asv-results",
    "html_dir": ".asv-html",
    "project_url": "https://github.com/SciTools-incubator/iris-esmf-regrid",
    "repo": "..",
    "environment_type": "conda-delegated",
    "show_commit_url": "https://github.com/SciTools-incubator/iris-esmf-regrid/commit/",
    "plugins": [".nox_asv_plugin"],
    "branches": ["upstream/main"],

    "benchmark_dir": "./benchmarks",
    "env_dir": ".asv/env",
    "results_dir": ".asv/results",
    "html_dir": ".asv/html",
    "plugins": [".asv_delegated_conda"],

    // The command(s) that create/update an environment correctly for the
    // checked-out commit.
    // Interpreted the same as build_command, with following exceptions:
    // * No build-time environment variables.
    // * Is run in the same environment as the ASV install itself.
    "delegated_env_commands": [
        "PY_VER=3.10 nox --envdir={conf_dir}/.asv/env/nox01 --session=tests --install-only --no-error-on-external-run --verbose"
    ],
    // The parent directory of the above environment.
    // The most recently modified environment in the directory will be used.
    "delegated_env_parent": "{conf_dir}/.asv/env/nox01"
}
```
`asv_delegated_conda.py`, new file, `@@ -0,0 +1,198 @@`:

```python
# Copyright Iris contributors
#
# This file is part of Iris and is released under the LGPL license.
# See COPYING and COPYING.LESSER in the root of the repository for full
# licensing details.
"""
ASV plug-in providing an alternative :class:`asv.plugins.conda.Conda`
subclass that manages the Conda environment via custom user scripts.

"""

from os import environ
from os.path import getmtime
from pathlib import Path
from shutil import copy2, copytree, rmtree
from tempfile import TemporaryDirectory

from asv import util as asv_util
from asv.config import Config
from asv.console import log
from asv.plugins.conda import Conda
from asv.repo import Repo


class CondaDelegated(Conda):
    """
    Manage a Conda environment using custom user scripts, run at each commit.

    Ignores user input variations - ``matrix`` / ``pythons`` /
    ``conda_environment_file``, since the environment is being managed
    outside ASV.

    Original environment creation behaviour is inherited, but upon checking
    out a commit the custom script(s) are run and the original environment is
    replaced with a symlink to the custom environment. This arrangement is
    then re-used in subsequent runs.

    """

    tool_name = "conda-delegated"

    def __init__(
        self,
        conf: Config,
        python: str,
        requirements: dict,
        tagged_env_vars: dict,
    ) -> None:
        """
        Parameters
        ----------
        conf : Config instance

        python : str
            Version of Python. Must be of the form "MAJOR.MINOR".

        requirements : dict
            Dictionary mapping a PyPI package name to a version
            identifier string.

        tagged_env_vars : dict
            Environment variables, tagged for build vs. non-build.

        """
        ignored = ["`python`"]
        if requirements:
            ignored.append("`requirements`")
        if tagged_env_vars:
            ignored.append("`tagged_env_vars`")
        if conf.conda_environment_file:
            ignored.append("`conda_environment_file`")
        message = (
            f"Ignoring ASV setting(s): {', '.join(ignored)}. Benchmark "
            "environment management is delegated to third party script(s)."
        )
        log.warning(message)
        requirements = {}
        tagged_env_vars = {}
        conf.conda_environment_file = None

        super().__init__(conf, python, requirements, tagged_env_vars)
        self._update_info()

        self._env_commands = self._interpolate_commands(conf.delegated_env_commands)
        # Again using _interpolate_commands to get the env parent path -
        # allows use of the same ASV env variables.
        env_parent_interpolated = self._interpolate_commands(conf.delegated_env_parent)
        # Returns a list of tuples; we just want the first.
        env_parent_first = env_parent_interpolated[0]
        # The 'command' is the first item in the returned tuple.
        env_parent_string = " ".join(env_parent_first[0])
        self._delegated_env_parent = Path(env_parent_string).resolve()

    @property
    def name(self):
        """Get a name to uniquely identify this environment."""
        return asv_util.sanitize_filename(self.tool_name)

    def _update_info(self) -> None:
        """Make sure class properties reflect the actual environment being used."""
        # Follow the symlink if it has been created.
        actual_path = Path(self._path).resolve()
        self._path = str(actual_path)

        # Get the custom environment's Python version if it exists yet.
        try:
            get_version = (
                "from sys import version_info; "
                "print(f'{version_info.major}.{version_info.minor}')"
            )
            actual_python = self.run(["-c", get_version])
            self._python = actual_python
        except OSError:
            pass

    def _prep_env(self) -> None:
        """Run the custom environment script(s) and switch to using that environment."""
        message = f"Running delegated environment management for: {self.name}"
        log.info(message)
        env_path = Path(self._path)

        def copy_asv_files(src_parent: Path, dst_parent: Path) -> None:
            """For copying between self._path and a temporary cache."""
            asv_files = list(src_parent.glob("asv*"))
            # build_root_path.name usually == "project".
            asv_files += [src_parent / Path(self._build_root).name]
            for src_path in asv_files:
                dst_path = dst_parent / src_path.name
                if not dst_path.exists():
                    # Only caching in case the environment has been rebuilt.
                    #  If the dst_path already exists: rebuilding hasn't
                    #  happened. Also a non-issue when copying in the reverse
                    #  direction because the cache dir is temporary.
                    if src_path.is_dir():
                        func = copytree
                    else:
                        func = copy2
                    func(src_path, dst_path)

        with TemporaryDirectory(prefix="delegated_asv_cache_") as asv_cache:
            asv_cache_path = Path(asv_cache)
            # Cache all of ASV's files, as the delegated command may remove
            # and re-build the environment.
            copy_asv_files(env_path.resolve(), asv_cache_path)

            # Adapt the build_dir to the cache location.
            build_root_path = Path(self._build_root)
            build_dir_original = build_root_path / self._repo_subdir
            build_dir_subpath = build_dir_original.relative_to(build_root_path.parent)
            build_dir = asv_cache_path / build_dir_subpath

            # Run the script(s) for delegated environment creation/updating.
            # (An adaptation of self._interpolate_and_run_commands.)
            for command, env, return_codes, cwd in self._env_commands:
                local_envs = dict(environ)
                local_envs.update(env)
                if cwd is None:
                    cwd = str(build_dir)
                _ = asv_util.check_output(
                    command,
                    timeout=self._install_timeout,
                    cwd=cwd,
                    env=local_envs,
                    valid_return_codes=return_codes,
                )

            # Replace the env that ASV created with a symlink to the env
            # created/updated by the custom script.
            delegated_env_path = sorted(
                self._delegated_env_parent.glob("*"),
                key=getmtime,
                reverse=True,
            )[0]
            if env_path.resolve() != delegated_env_path:
                try:
                    env_path.unlink(missing_ok=True)
                except IsADirectoryError:
                    rmtree(env_path)
                env_path.symlink_to(delegated_env_path, target_is_directory=True)

            # Check that the environment exists.
            try:
                env_path.resolve(strict=True)
            except FileNotFoundError:
                message = f"Path does not resolve to environment: {env_path}"
                log.error(message)
                raise RuntimeError(message)

            # Restore ASV's files from the cache (if necessary).
            copy_asv_files(asv_cache_path, env_path.resolve())

        # Record new environment information in properties.
        self._update_info()

    def checkout_project(self, repo: Repo, commit_hash: str) -> None:
        """Check out the working tree of the project at the given commit hash."""
        super().checkout_project(repo, commit_hash)
        self._prep_env()
        log.info(f"Environment {self.name} updated to spec at {commit_hash[:8]}")
```