Introduce PR Benchmark Workflow #903
Conversation
benchmarks/asv.conf.json
Outdated

```json
        "python -m build --wheel -o {build_cache_dir} {build_dir}"
    ],
    "environment_type": "virtualenv",
    "show_commit_url": "https://github.com/lapp0/outlines/commit/",
```
Suggested change: `https://github.com/outlines-dev/outlines/commit/`
This change needs to be made.
Can we do without the token/permissions if we get rid of the PR comment feature? We really don't need a comment added to a PR if there's a workflow run with the output that we can check.

Also, we need to confirm that this approach will block merging if the benchmarks don't pass. Ideally we won't need to run the benchmarks on every commit, but would still need them to be run in order to merge. In that scenario, it would be up to the maintainers to add a tag that runs the benchmarks.
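One way to make that label-gated, merge-blocking setup concrete: trigger on label events and guard the job with a label check; marking the job as a required status check in branch protection then blocks merging until the benchmarks have run and passed. A rough sketch only — the workflow name, steps, and label are assumptions from this discussion, not this PR's final workflow:

```yaml
# Sketch: run benchmarks only when a maintainer applies the
# run_benchmarks label to the pull request.
name: Benchmark PR
on:
  pull_request:
    types: [labeled, synchronize]
jobs:
  benchmark:
    if: contains(github.event.pull_request.labels.*.name, 'run_benchmarks')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # asv needs history to compare HEAD against main
      - run: pip install asv
      - run: asv machine --yes && asv continuous main HEAD
        working-directory: benchmarks
```

Because the job simply fails when `asv continuous` detects a regression, no token with write permissions is needed for the blocking behavior itself — only the optional PR-comment feature would require one.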
Hi @lapp0 🙂

This approach looks really good for outlines's benchmarking requirements. Here are a few of my thoughts on this PR.
Force-pushed from ded19a6 to 90f30fa
Force-pushed from edd9789 to 22771de
Thanks for the feedback! As requested I've
Example: terrible performance regression resulting in failure:

```
Benchmarks that have stayed the same:
Benchmarks that have got worse:
Performance degradation detected! Error: Process completed with exit code 1.
```
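For context, the failure above is `asv continuous` exiting nonzero when a benchmark regresses beyond its threshold, which in turn fails the workflow step. The same comparison can be sketched locally (flags taken from this PR's description; assumes asv is installed and `main` is available):

```shell
# Compare benchmark timings between main and the current HEAD.
# --interleave-rounds alternates rounds across the two commits to
# reduce environmental drift; -a repeat=3 repeats each benchmark
# 3 times, tripling the runtime of a single pass.
asv continuous --interleave-rounds -a repeat=3 main HEAD
```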
Force-pushed from 22771de to ecc08c5
Force-pushed from b7ab26c to f1e2bbb
Just some small comments/updates and we can merge this.
benchmarks/bench_numba_compile.py
Outdated

```python
from .common import clear_outlines_cache, setup_tokenizer

outlines.disable_cache()
```
We need to contain global state changes like this in the set-up/tear-down of these benchmarks, or—better yet—avoid them completely.
Force-pushed from f1e2bbb to 309e21f
Force-pushed from 309e21f to 0ea382d
@brandonwillard I've introduced and tested a new
Fixes #883

Changes

This change-set configures `asv` within the repo, along with the `asv_benchmark_pr.yml` workflow to comment benchmark comparisons in each open PR. `tests/benchmarks/` has been moved to `benchmarks/` and converted from `pytest-benchmark` to `asv` format.

Behavior
- Benchmarks compare HEAD against outlines-dev/outlines@main, reporting results for HEAD.
- `--interleave-rounds -a repeat=3` in `asv continuous` mitigates variance due to environmental factors described in Check benchmarks in CI #883, but triples the runtime compared to a single pass.
- Run time (`repeat=3`): 23 minutes (should be close to test run time: 10 minutes)
- Runs only when the `run_benchmarks` label is applied
- Results are written to `$GITHUB_STEP_SUMMARY`
Examples

- Times differ between 1% and 4% due to random variation: Workflow for ASV Benchmarks in PR lapp0/outlines#16 (comment)
- Demo of benchmark output for PR with performance regression: Add time.sleep(0.1) to build_regex_from_schema() lapp0/outlines#18 (comment)
Out of Scope

With this infrastructure we can create useful historical performance dashboards such as https://asv-runner.github.io/asv-collection/pandas/. This requires a stable, dedicated machine that is guaranteed to be idle during benchmark runs.
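For reference, asv can render such a dashboard itself; assuming a dedicated machine with accumulated results, the flow would be roughly the standard asv subcommands (none of this is configured in this PR):

```shell
asv run        # run the suite and record results for this machine
asv publish    # render accumulated results into a static HTML site
asv preview    # serve the generated dashboard locally
```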
Repo Configuration Work

- Create the `run_benchmarks` label

Removed:

> For this workflow we need to set up an access token for the repo with appropriate permissions. Then create a new `asv-benchmarks` environment, and a secret with key = `GH_TOKEN`, value = access token.

Security
I recommend the following setting so arbitrary workflows cannot be run in malicious PRs: https://github.com/outlines-dev/outlines/settings/actions
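In the same spirit, the workflow file itself can pin down token scopes with a top-level `permissions` block. A sketch under the assumption that the PR-comment feature stays in (drop `pull-requests: write` if it is removed):

```yaml
# Minimal token scopes for the benchmark workflow.
permissions:
  contents: read        # checkout only
  pull-requests: write  # needed only for posting the benchmark comment
```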
TODO:

- `asv` configuration
- Convert `pytest-benchmark` to `asv`
- Harden workflow security (e.g. a PR with a new workflow using `GH_TOKEN` could spam the repo using the pull-requests write permissions)
- [ ] Optimize workflow run time (setup is majority of time, not benchmark execution): `--interleave-rounds`
@rlouf / @brandonwillard could you please share your thoughts on features / changes you'd like to see before this is ready for review?