[datasets] Switch CompilerEnv to the new dataset API. #222
Conversation
The branch was force-pushed from 336d959 to ab01348.
LGTM
Review comment on compiler_gym/envs/compiler_env.py (outdated diff), on this docstring:

```
By default, a benchmark will be selected randomly by the service from
the available benchmarks on a call to :func:`reset`. To force a specific
benchmark to be chosen, set this property (or pass
the benchmark as an argument to :func:`reset`):
```
Is there always guaranteed to be at least one available benchmark? What happens if someone uninstalls all the datasets?

The distribution of benchmarks changes based on what you have installed. I still think removing randomness from most of these APIs is better.
Ah, these docs are stale. After our chat on Monday I took your recommendation of removing randomness from the datasets in df5dcdb. Now, an environment has a default benchmark (it defaults to the first available, `next(env.datasets.benchmarks())`). If there are no datasets, a `TypeError` is raised on the call to `reset()`.
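For example, a minimal sketch of the new behavior (assuming the LLVM environment is installed; the env id here is illustrative):

```python
import compiler_gym

env = compiler_gym.make("llvm-v0")
# No benchmark was specified, so reset() uses the deterministic default,
# equivalent to next(env.datasets.benchmarks()).
env.reset()
print(env.benchmark)  # Same benchmark every run; no randomness involved.
env.close()
```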
```diff
@@ -25,10 +24,10 @@ class MinimizationError(OSError):

 # A hypothesis is a callback that accepts as input an enivornment in a given
 # state returns true if a particular hypothesis holds, else false.
-Hypothesis = Callable[[CompilerEnv], bool]
+Hypothesis = Callable[["CompilerEnv"], bool]  # noqa: F821
```
I'm just curious, what do the quotes do in `"CompilerEnv"`?
Using a string literal type hint defers its evaluation. Otherwise, the `CompilerEnv` name would be evaluated here. This is used for forward references or, in this case, to break a circular reference. As of py3.7 there is a better way of handling this, but CompilerGym supports py3.6: https://www.python.org/dev/peps/pep-0563
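A self-contained sketch of what the quotes buy you (the names below are stand-ins, not CompilerGym's own):

```python
from typing import Callable

# "Env" is not defined yet at this point in the module, so the unquoted
# name would raise NameError at import time. The string is stored as a
# typing.ForwardRef and only resolved by static type checkers.
Hypothesis = Callable[["Env"], bool]  # noqa: F821


class Env:
    """Stand-in for a class defined later, or in a module that imports us."""

    def in_initial_state(self) -> bool:
        return True


def check(env: Env, hypothesis: Hypothesis) -> bool:
    # The alias behaves like any other type hint at call sites.
    return hypothesis(env)


assert check(Env(), lambda env: env.in_initial_state())
```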
The branch was force-pushed from 1115a10 to 70bc82c.
This extends the micro-benchmark script to record and report runtimes for operations on the example gym service implementations in C++ and Python. The idea is that this provides a useful reference when evaluating the measurements of other environments. Since the example services do no compilation, the benchmark performances are close to a roofline.
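As an illustration of the kind of measurement involved (a sketch only; the env id and iteration counts are assumptions, and this is not the actual benchmark script):

```python
import timeit

import compiler_gym

env = compiler_gym.make("llvm-v0")
env.reset()

# Average wall-clock time of reset() and of a single step with a random action.
reset_ms = timeit.timeit(env.reset, number=10) / 10 * 1e3
step_ms = timeit.timeit(lambda: env.step(env.action_space.sample()), number=100) / 100 * 1e3
print(f"reset: {reset_ms:.2f} ms, step: {step_ms:.3f} ms")
env.close()
```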
The branch was force-pushed from 70bc82c to 0b34c14.
This switches over the `CompilerEnv` environment to use the new dataset API, dropping the `LegacyDataset` class.

Disclaimer: Sorry for the large diff :-( Two reasons: (1) renaming cBench to cbench causes a lot of churn, and (2) the datasets API touches nearly everything, and the switch from the old to the new API should be atomic.

Background

Since the very first prototype of CompilerGym, a `Benchmark` protocol buffer has been used to provide a serializable representation of benchmarks that can be passed back and forth between the service and the frontend (see CompilerGym/compiler_gym/service/proto/compiler_gym_service.proto, lines 266 to 276 in 49c10d7).

Initially, it was up to the compiler service to maintain the set of available benchmarks, exposing them through a `GetBenchmarks()` RPC method and allowing new benchmarks to be added using an `AddBenchmarks()` method. This was fine for the initial use case of shipping a handful of benchmarks and allowing ad-hoc new benchmarks to be added, but for managing larger sets of benchmarks, a *datasets* abstraction was added.

Initial Datasets abstraction

To add support for managing large sets of programs, a [Dataset](https://github.com/facebookresearch/CompilerGym/blob/49c10d77d1c1b1297a1269604584a13c10434cbb/compiler_gym/datasets/dataset.py#L20) tuple was added that describes a set of programs, along with a link to a tarball containing those programs. The tarball is required to have a JSON file containing metadata and a directory containing the benchmarks, one file per benchmark. A set of operations were added to the frontend command line to make downloading and unpacking these tarballs easier: https://github.com/facebookresearch/CompilerGym/blob/49c10d77d1c1b1297a1269604584a13c10434cbb/compiler_gym/bin/datasets.py#L5-L133

Problems with this approach

(1) **Leaky abstraction.** Both the environment and the backend service have to know about datasets. This means redundant, duplicated logic, and it adds the maintenance burden of keeping the C++/Python logic in sync.

(2) **Inflexible.** Only environments in which a single file represents a benchmark are supported. There is no support for multi-file benchmarks, benchmarks that are compiled on-demand, etc.

(3) **O(n) space and time overhead** on each service instance, where *n* is the total number of benchmarks. At init time, each service needs to recursively scan a directory tree to build a list of available benchmarks. This list must be kept in memory. This adds startup time, and also causes cache invalidation issues when multiple environment instances are modifying the underlying filesystem.

New Dataset API

This pull request changes the ownership model so that the *Environment* owns the benchmarks and datasets, not the service. This uses the new `Dataset` class hierarchy that was added in previous pull requests: #190, #191, #192, #200, #201.

Now, the backend has no knowledge of "datasets". Instead, the service simply keeps a small cache of benchmarks that it has seen. If a session request has a benchmark URI that is not in this cache, the service returns a "resource not found" error, and the frontend logic can respond by sending it a copy of the benchmark as a `Benchmark` proto. The service is free to cache this for future use, and can empty the cache whenever it wants.

This new approach has a few key benefits:

(1) By moving all of the datasets logic into the frontend, it becomes much easier for users to define their own datasets.

(2) It reduces compiler service startup time, as it removes the need for each service to do a recursive filesystem sweep.

(3) It removes the requirement that the set of benchmarks be fully enumerable, allowing for program generators that can produce a theoretically infinite number of benchmarks.

(4) It adds support for lazily-compiled datasets of programs that are generated on-demand.

(5) It removes the need to download datasets ahead of time. Datasets can now be installed on-demand.

Summary of changes

(1) Changes the type of `env.benchmark` from a string to a `Benchmark` instance.

(2) Makes `env.benchmark` a mandatory attribute. If no benchmark is provided at init time, one is chosen deterministically. If you wish to select a random benchmark, use `env.datasets.benchmark()`.

(3) `env.fork()` no longer requires `env.reset()` to have been called first. It will call `env.reset()` if required.

(4) `env.benchmark = None` is no longer a valid way of requesting a random benchmark. If you would like a random benchmark, you must now roll your own random picker using `env.datasets.benchmark_uris()` and similar (see the sketch at the end of this description).

(5) Deprecates all `LegacyDataset` operations, changing their behavior to no-ops, and removes the class.

(6) Renames `cBench` to `cbench` to be consistent with the lower-case naming convention of gym. The old `cBench` datasets are kept around but are marked deprecated to encourage migration.

Migrating to the new interface

To migrate existing code to the new interface:

(1) Update references to `cBench-v[01]` to `cbench-v1`.

(2) Review code that accesses the `env.benchmark` property and update to `env.benchmark.uri` if a string name is required.

(3) Review code that calls `env.reset()` without first setting a benchmark. Previously, calling `env.reset()` would select a random benchmark. Now, `env.reset()` always selects the last used benchmark, or a predetermined default if none is specified.

(4) Review code that relies on `env.benchmark` being `None` to select benchmarks randomly. Now, `env.benchmark` is always set to the previously used benchmark, or a predetermined default benchmark if none has been provided.

(5) Remove calls to `env.require_dataset()`.

Performance impact

A comparison of `benchmarks/bench_test.py` on current `development` vs this PR: [results table not captured in this transcript]

Issue #45.
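Since `env.benchmark = None` is gone, a random picker has to be rolled by hand. A minimal sketch, assuming the LLVM environment is installed and using an arbitrary sample-size cap (since `benchmark_uris()` may be practically infinite):

```python
import random
from itertools import islice

import compiler_gym

env = compiler_gym.make("llvm-v0")
# Draw from a bounded prefix of the URI iterator rather than trying to
# materialize a possibly-unbounded list.
uris = list(islice(env.datasets.benchmark_uris(), 1000))
env.reset(benchmark=random.choice(uris))
print(env.benchmark)
env.close()
```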
The branch was force-pushed from 0b34c14 to 0260132.
This release introduces some significant changes to the way that benchmarks are managed, introducing a new dataset API. This enabled us to add support for millions of new benchmarks and a more efficient implementation for the LLVM environment, but it will require some migration of old code to the new interfaces (see "Migration Checklist" below). Some of the key changes of this release are:

- [Core API change] We have added a Python Benchmark class (#190). The env.benchmark attribute is now an instance of this class rather than a string (#222).
- [Core behavior change] Environments will no longer select benchmarks randomly. env.reset() will now always select the last-used benchmark, unless the benchmark argument is provided or env.benchmark has been set. If no benchmark is specified, a default is used.
- [API deprecations] We have added a new Dataset class hierarchy (#191, #192). All datasets are now available without needing to be downloaded first, and a new Datasets class can be used to iterate over them (#200). We have deprecated the old dataset management operations and the compiler_gym.bin.datasets script, and removed the --dataset and --ls_benchmark flags from the command line tools.
- [RPC interface change] The StartSession RPC endpoint now accepts a list of initial observations to compute. This removes the need for an immediate call to Step, reducing environment reset time by 15-21% (#189).
- [LLVM] We have added several new datasets of benchmarks, including the Csmith and llvm-stress program generators (#207), a dataset of OpenCL kernels (#208), and a dataset of compilable C functions (#210). See the docs for an overview.
- CompilerEnv now takes an optional Logger instance at construction time for fine-grained control over logging output (#187).
- [LLVM] The ModuleID and source_filename of LLVM-IR modules are now anonymized to prevent unintentional overfitting to benchmarks by name (#171).
- [docs] We have added a Feature Stability section to the documentation (#196).
- Numerous bug fixes and improvements.

Please use this checklist when updating code for the previous CompilerGym release:

- Review code that accesses the env.benchmark property and update to env.benchmark.uri if a string name is required. Setting this attribute by string (env.benchmark = "benchmark://a-v0/b") and comparison to string types (env.benchmark == "benchmark://a-v0/b") still work.
- Review code that calls env.reset() without first setting a benchmark. Previously, calling env.reset() would select a random benchmark. Now, env.reset() always selects the last used benchmark, or a predetermined default if none is specified.
- Review code that relies on env.benchmark being None to select benchmarks randomly. Now, env.benchmark is always set to the previously used benchmark, or a predetermined default benchmark if none has been specified. Setting env.benchmark = None will raise an error. Select a benchmark randomly by sampling from the env.datasets.benchmark_uris() iterator.
- Remove calls to env.require_dataset() and related operations. These are no longer required.
- Remove accesses to env.benchmarks. An iterator over available benchmark URIs is now available at env.datasets.benchmark_uris(), but the list of URIs cannot be relied on to be fully enumerable (the LLVM environments have over 2^32 URIs).
- Review code that accesses env.observation_space and update to env.observation_space_spec where necessary (#228).
- Update compiler service implementations to support the updated RPC interface by removing the deprecated GetBenchmarks RPC endpoint and replacing it with Dataset classes. See the example service for details.
- [LLVM] Update references to the poj104-v0 dataset to poj104-v1.
- [LLVM] Update references to the cBench-v1 dataset to cbench-v1.
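For the env.benchmark items in the checklist above, a short migration sketch (the URI is illustrative and assumes the cbench-v1 dataset is available):

```python
import compiler_gym

env = compiler_gym.make("llvm-v0")

# Setting the property by string still works; it takes effect on the next
# reset(), and the property then holds a Benchmark instance.
env.benchmark = "benchmark://cbench-v1/crc32"
env.reset()

# Comparison against a string also still works...
assert env.benchmark == "benchmark://cbench-v1/crc32"

# ...but code that needs an actual string should use the uri attribute.
name = str(env.benchmark.uri)

env.close()
```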