Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce PR Benchmark Workflow #903

Merged
merged 4 commits into from
Jun 4, 2024

Conversation

lapp0
Copy link
Contributor

@lapp0 lapp0 commented May 18, 2024

Fixes #883

Changes

This change-set configures asv within the repo, along with the asv_benchmark_pr.yml workflow to comment benchmark comparisons in each open PR.

tests/benchmarks/ has been moved to benchmarks/ and converted from pytest-benchmark to asv format.

Behavior

Examples

Out of Scope

With this infrastructure we can create useful historical performance dashboards such as https://asv-runner.github.io/asv-collection/pandas/ This requires a stable, dedicated machine which must have a guarantee of being idle during benchmark runs.

Repo Configuration Work

  • Set up branch protection to ensure merging is disabled unless the new workflow is run
  • Add run_benchmarks label

Removed:

For this workflow we need to set up an access token for the repo with appropriate permissions:

  • contents: read
    • for retrieving compared revisions
  • pull-requests: read and write
    • for commenting

Then create a new asv-benchmarks environment, and a secret with key = GH_TOKEN, value = access token.

Security

I recommend the following setting so arbitrary workflows cannot be run in malicious PRs

https://github.com/outlines-dev/outlines/settings/actions

image

Text field

peter-evans/create-or-update-comment@*,
peter-evans/find-comment@*,
pre-commit/action@*,

TODO:

  • asv configuration
  • PR comment workflow
  • migrate benchmarks from pytest-benchmark to asv
  • harden workflow security (e.g. a PR with a new workflow using GH_TOKEN could spam the repo using the pull-requests write permissions)
  • use https://github.com/airspeed-velocity/asv/pull/1263/files
  • update docs
  • [ ] Optimize workflow run time (setup is majority of time, not benchmark execution)
    • We can't improve performance without abandoning --interleave-rounds
  • receive commentary

@rlouf / @brandonwillard could you please share your thoughts on features / changes you'd like to see before this is ready for review?

"python -m build --wheel -o {build_cache_dir} {build_dir}"
],
"environment_type": "virtualenv",
"show_commit_url": "https://github.com/lapp0/outlines/commit/",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/outlines-dev/outlines/commit/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change needs to be made.

@brandonwillard brandonwillard requested a review from kc611 May 21, 2024 22:20
@brandonwillard brandonwillard added the tests Linked to library tests label May 21, 2024
@brandonwillard
Copy link
Member

brandonwillard commented May 23, 2024

Can we do without the token/permissions if we get rid of the PR comment feature? We really don't need a comment added to a PR if there's a workflow run with the output that we can check.

Also, we need to confirm that this approach will block merging if the benchmarks don't pass. Ideally we won't need to run the benchmarks on every commit, but would still need them to be run in order to merge. In that scenario, it would be up to the maintainers to add a tag that runs the benchmarks.

Copy link

@kc611 kc611 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @lapp0 🙂

This approach looks really good for outlines's benchmarking requirements. Here are few of my thoughts on this PR.

.github/workflows/asv_benchmark_pr.yml Show resolved Hide resolved
.github/workflows/asv_benchmark_pr.yml Outdated Show resolved Hide resolved
.github/workflows/asv_benchmark_pr.yml Outdated Show resolved Hide resolved
.github/workflows/asv_benchmark_pr.yml Show resolved Hide resolved
@lapp0 lapp0 force-pushed the introduce-asv-ci-workflow-883 branch 2 times, most recently from edd9789 to 22771de Compare May 31, 2024 22:07
@lapp0
Copy link
Contributor Author

lapp0 commented May 31, 2024

Thanks for the feedback!

As requested I've

  • Removed comment functionality, instead results are shown in the BENCHMARK RESULTS section of workflow.
  • Updated the workflow to ensure it runs only when workflow is manually dispatched OR for every commit when run_benchmarks label is applied.
  • Updated the asv.conf.json build_command to be consistent with current build command defaults.
  • Forced workflow failure if performance decline for any test exceeds 10%

Rendered Docs

Requires #929 merge first for tests to pass.

Example: Terrible performance regression resulting in failure:

Benchmarks that have stayed the same:

Change Before [538f77a] After [31b44ca] Ratio Benchmark (Parameter)
4.16±0.01s 4.29±0.02s 1.03 bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_fsm('complex_schema')
2.08±0s 2.13±0.08s 1.03 bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_fsm('simple_schema')
4.89±0.02s 4.92±0.01s 1.01 bench_numba_compile.NumbaCompileBenchmark.time_compile_numba
591M 592M 1 bench_regex_guide.MemoryRegexGuideBenchmark.peakmem_regex_to_guide('complex_span_constrained_relation_extraction')
497M 497M 1 bench_regex_guide.MemoryRegexGuideBenchmark.peakmem_regex_to_guide('simple_phone')
628±3ms 623±4ms 0.99 bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('complex_phone')
5.69±0s 5.69±0.01s 1 bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('complex_span_constrained_relation_extraction')
353±6ms 352±4ms 1 bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('date')
284±0.9ms 283±2ms 1 bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('email')
245±0.7ms 241±3ms 0.99 bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('ip')
179±2ms 181±2ms 1.01 bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('simple_phone')
129±0.9ms 129±0.6ms 1 bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('ssn')
124±0.7ms 123±2ms 0.99 bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('time')
411±2ms 409±0.5ms 1 bench_regex_guide.RegexGuideBenchmark.time_regex_to_guide('url')

Benchmarks that have got worse:

Change Before [538f77a] After [31b44ca] Ratio Benchmark (Parameter)
+ 79.9±2μs 100±0.01ms 1255.5 bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_regex('complex_schema')
+ 43.8±0.5μs 100±0ms 2288.37 bench_json_schema.JsonSchemaBenchmark.time_json_schema_to_regex('simple_schema')

Performance degradation detected!

Error: Process completed with exit code 1.

Copy link
Member

@brandonwillard brandonwillard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some small comments/updates and we can merge this.

"python -m build --wheel -o {build_cache_dir} {build_dir}"
],
"environment_type": "virtualenv",
"show_commit_url": "https://github.com/lapp0/outlines/commit/",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change needs to be made.


from .common import clear_outlines_cache, setup_tokenizer

outlines.disable_cache()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to contain global state changes like this in the set-up/tear-down of these benchmarks, or—better yet—avoid them completely.

@lapp0 lapp0 force-pushed the introduce-asv-ci-workflow-883 branch from f1e2bbb to 309e21f Compare June 4, 2024 06:33
@lapp0 lapp0 force-pushed the introduce-asv-ci-workflow-883 branch from 309e21f to 0ea382d Compare June 4, 2024 06:33
@lapp0
Copy link
Contributor Author

lapp0 commented Jun 4, 2024

@brandonwillard I've introduced and tested a new outlines.caching.cache_disabled context manager. This decorator is applied to all benchmark tests.

@brandonwillard brandonwillard merged commit b5a2073 into dottxt-ai:main Jun 4, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-benchmarks tests Linked to library tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check benchmarks in CI
3 participants