Semgrep benchmarks

This folder is for running realistic benchmarks for semgrep, as opposed to more focused tests.

The main results can be visualized over time here: https://metabase.corp.r2c.dev/question/560-semgrep-bench-all-history (this is accessible only to Semgrep employees).

Requirements

The semgrep command must be available, as well as generic development tools including git and python3.

If you want to use the --plot-benchmarks option, you also need to install the tk package and to pip install the matplotlib and pandas Python libraries.
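A quick way to check these requirements before running anything is sketched below. It is not part of the benchmark scripts; the helper names are made up for illustration, and it assumes that the tk package is what provides Python's tkinter module.

```python
import importlib.util
import shutil

# Commands named in the Requirements text.
REQUIRED_COMMANDS = ["semgrep", "git", "python3"]
# Modules needed only for --plot-benchmarks.
# Assumption: the "tk" package is what makes the tkinter module available.
PLOT_MODULES = ["tkinter", "matplotlib", "pandas"]


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def missing_modules(modules=PLOT_MODULES):
    """Return the modules Python cannot locate.

    find_spec only checks that a module can be located, which is a
    weaker guarantee than importing it successfully.
    """
    return [mod for mod in modules if importlib.util.find_spec(mod) is None]


if __name__ == "__main__":
    print("missing commands:", missing_commands())
    print("missing plot modules:", missing_modules())
```

An empty list from both helpers suggests the environment is ready, including for plotting.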

Architecture

Each benchmark has a name. For each benchmark, we run the standard semgrep commands as well as variants which disable or enable certain optimizations.

The workspace looks like this:

.
├── bench
│   ├── dummy
│   │   ├── input
│   │   │   ├── dummy
│   │   │   ├── rules
│   │   │   │   └── exec.yaml
│   │   │   └── targets
│   │   │       ├── hello.js
│   │   │       └── malformed.js
│   │   └── prep
│   └── njs
│       ├── input
│       │   ├── juice-shop/ (lots of files)
│       │   └── njsscan/ (lots of files)
│       └── prep
├── rules
│   └── semgrep_fast.yml
├── configs
│   └── ci_medium_repos.yaml
├── Makefile
├── README.md
└── run-benchmarks

The total duration of each benchmark is uploaded to the semgrep dashboard, for example as semgrep.bench.njs.std.total-time. Other metrics such as memory usage could be reported in the future.
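The metric name in the example above appears to follow a benchmark/variant/metric pattern. The sketch below builds such names under that assumption; the function and its parameters are hypothetical and inferred from the single example semgrep.bench.njs.std.total-time, not taken from run-benchmarks.

```python
def metric_name(benchmark: str, variant: str, metric: str = "total-time") -> str:
    """Build a dashboard metric name.

    Assumption: names follow semgrep.bench.<benchmark>.<variant>.<metric>,
    as suggested by the example semgrep.bench.njs.std.total-time.
    """
    return f"semgrep.bench.{benchmark}.{variant}.{metric}"


if __name__ == "__main__":
    print(metric_name("njs", "std"))
```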

You can then run SQL queries on Metabase to slice and visualize those metrics (e.g., https://metabase.corp.r2c.dev/question/560-semgrep-bench-all-history).

The number of parallel jobs equals the number of logical CPUs offered by the host, which is the default for semgrep.
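To see which job count that corresponds to on a given machine, or to pin it when comparing runs across hosts, something like the following could be used. This is an illustrative sketch with hypothetical helper names; it relies only on semgrep's --jobs option and Python's os.cpu_count().

```python
import os


def default_jobs() -> int:
    """Number of logical CPUs reported by the host (at least 1)."""
    # os.cpu_count() can return None when the count is undetermined.
    return os.cpu_count() or 1


def semgrep_jobs_args(jobs=None):
    """Arguments that pin semgrep's parallelism explicitly."""
    n = default_jobs() if jobs is None else jobs
    return ["--jobs", str(n)]


if __name__ == "__main__":
    print(semgrep_jobs_args())
```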

Manual operation

Read and use the Makefile or call ./run-benchmarks directly. The bare make command will use the local semgrep command and overall is safe to use.

$ make

This will not upload the results to the dashboard. Uploading is reserved for CI jobs, which run in a more or less consistent environment.

Reproducing the CI benchmarks

To reproduce the benchmarks as run in CI, see the top-level Makefile. (Updates to this header should be reflected in the comment in that file.)

The CI benchmarks run three scripts (see scripts/run-benchmarks.sh):

perf/run-benchmarks -- runs the benchmarks with a specific version of Semgrep, installed via pip, and then with the local version. When it finishes, the installed version of semgrep is the local one.
perf/compare-perf -- checks how much the benchmark timings deviate and posts an update if the time increases, whether by a lot or a little.
perf/compare-bench-findings -- checks that the findings still match the expected snapshots.
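The kind of check that compare-perf performs can be pictured as a relative-change comparison between a baseline timing and the latest timing. The sketch below illustrates the idea only; the function names and the 5% and 20% thresholds are hypothetical and are not taken from the script.

```python
def relative_change(baseline: float, latest: float) -> float:
    """Fractional change of latest relative to baseline (0.10 means +10%)."""
    if baseline <= 0:
        raise ValueError("baseline timing must be positive")
    return (latest - baseline) / baseline


def classify(baseline: float, latest: float,
             minor: float = 0.05, major: float = 0.20) -> str:
    """Label a timing change. Thresholds are illustrative assumptions."""
    change = relative_change(baseline, latest)
    if change >= major:
        return "major"
    if change >= minor:
        return "minor"
    return "ok"


if __name__ == "__main__":
    # Baseline of 10 s compared with a latest run of 13 s.
    print(classify(10.0, 13.0))
```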

For local debugging, it is often easiest to run just the perf/run-benchmarks command in scripts/run-benchmarks.sh.

TODO: Running the script may modify cli/Pipfile and cli/Pipfile.lock. Those changes should not be committed.

Modifying what benchmarks are run

The benchmarks to run are controlled by the configs, which reside in perf/configs. Below is an example config:

runs:
  - name: zulip # zulip rules on zulip
    repos:
      - url: https://github.com/zulip/zulip
        commit_hash: 829f9272d2c4299a0c0a37a09802248d8136c0a8
    rule_configs:
      - rules/zulip/semgrep.yml
    opts: [--fast]

The opts are the command line arguments passed to Semgrep.
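To make the mapping from config to command line concrete, here is a rough sketch that turns the example config into semgrep invocations. It is not how run-benchmarks is implemented: the config is written as a Python dict to keep the example dependency-free, the helper names are hypothetical, and the target path layout (bench/<name>/input/<repo>) is an assumption based on the workspace tree shown earlier.

```python
# The example config, as it would look once the YAML is parsed.
CONFIG = {
    "runs": [
        {
            "name": "zulip",
            "repos": [
                {
                    "url": "https://github.com/zulip/zulip",
                    "commit_hash": "829f9272d2c4299a0c0a37a09802248d8136c0a8",
                }
            ],
            "rule_configs": ["rules/zulip/semgrep.yml"],
            "opts": ["--fast"],
        }
    ]
}


def repo_dir_name(url: str) -> str:
    """Last path component of the repository URL."""
    return url.rstrip("/").rsplit("/", 1)[-1]


def semgrep_commands(config, workdir="bench"):
    """Build one semgrep command per (run, repo) pair.

    Assumption: repos are checked out under <workdir>/<run name>/input/.
    """
    commands = []
    for run in config["runs"]:
        for repo in run["repos"]:
            target = f"{workdir}/{run['name']}/input/{repo_dir_name(repo['url'])}"
            cmd = ["semgrep"]
            for rule in run["rule_configs"]:
                cmd += ["--config", rule]
            cmd += run.get("opts", [])  # opts are passed through unchanged
            cmd.append(target)
            commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in semgrep_commands(CONFIG):
        print(" ".join(cmd))
```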

In addition, it is possible to compare multiple versions of Semgrep by modifying SEMGREP_VARIANTS and omitting the --std-only flag. For instance, if you made a version of Semgrep that ran with a matching cache (--matching-cache) and one without (--no-matching-cache), you could run both by adding them to SEMGREP_VARIANTS. You can also include semgrep-core options in the same way (see variant.py).
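As a rough picture of what a variant list might look like, consider the sketch below. The Variant class, its field names, and select_variants are hypothetical stand-ins and may not match variant.py; the two cache flags are the ones used as an example in the paragraph above, and the "std" name is taken from the metric name semgrep.bench.njs.std.total-time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical stand-in for a benchmark variant."""
    name: str
    semgrep_extra: tuple = ()       # extra options for the semgrep CLI
    semgrep_core_extra: tuple = ()  # extra options for semgrep-core


SEMGREP_VARIANTS = [
    Variant("std"),  # default settings
    Variant("matching-cache", ("--matching-cache",)),
    Variant("no-matching-cache", ("--no-matching-cache",)),
]


def select_variants(variants, std_only: bool):
    """Keep only the standard variant when --std-only is requested."""
    if std_only:
        return [v for v in variants if v.name == "std"]
    return list(variants)


if __name__ == "__main__":
    for variant in select_variants(SEMGREP_VARIANTS, std_only=False):
        print(variant.name, list(variant.semgrep_extra))
```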

If you have multiple variants and are running locally, you can easily compare the effect visually by adding --plot-benchmarks.

Troubleshooting CI with the semgrep Docker image

CI uses CircleCI or GitHub Actions, configured in the standard places (.circleci, .github/workflows). See those files to determine which jobs run and when.

We maintain a Docker build that comes with semgrep pre-installed. It can also be used for daily benchmarks and other jobs that use the development version of semgrep. The image URL is returntocorp/semgrep:develop. It is built and pushed to DockerHub by a CI job that triggers each time there's a change on the main branch of the semgrep repo.

$ docker pull returntocorp/semgrep:develop     # updates your local copy
$ docker run -it returntocorp/semgrep:develop  # starts bash in container

To test some of your local code inside the container, mount your folder using the -v option. The usage is -v SRC:DST, where SRC is the absolute path to your original folder and DST is the absolute path you want it to have in the container. The home folder for this image is set to /home/semgrep. For example:

$ ls
my_stuff
$ docker run -v "$(pwd)"/my_stuff:/home/semgrep/my_stuff -it returntocorp/semgrep:develop
bash-5.1$ whoami
semgrep
bash-5.1$ ls ~
my_stuff
bash-5.1$ semgrep --version
0.46.0
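Because -v requires an absolute SRC path, a small wrapper can resolve it before calling docker. The sketch below only builds the argument list and does not run anything; the helper name is hypothetical, the image and home folder are the ones given above, and POSIX-style paths are assumed.

```python
from pathlib import Path


def docker_run_args(src: str,
                    image: str = "returntocorp/semgrep:develop",
                    home: str = "/home/semgrep"):
    """Build a docker run command that mounts src into the image's home folder."""
    src_path = Path(src).resolve()    # -v needs an absolute SRC
    dst = f"{home}/{src_path.name}"   # keep the same folder name in the container
    return ["docker", "run", "-v", f"{src_path}:{dst}", "-it", image]


if __name__ == "__main__":
    print(" ".join(docker_run_args("my_stuff")))
```

The resulting list can be passed to subprocess.run if you prefer launching the container from a script.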
