[WIP][single-machine-performance] Introduce regression detector jobs
This PR intends to introduce the Single Machine Performance regression detector
into Agent CI. This builds on work done in #14477 and is peer to #14438. The
Regression Detector is a CI tool that determines, with some statistical
guarantee, whether a change introduced into a project alters its performance
by more than random chance. The Regression Detector is not a
microbenchmarking tool and must operate on the whole Agent. This PR introduces
only 'throughput' as an optimization goal -- how quickly can the Regression
Detector produce load into the Agent -- but other goals are
possible. Regressions are checked per-experiment; please see `test/regression`
for details about how to define an experiment.

The Regression Detector runs today in the vectordotdev/vector project and is
influential in keeping that project's performance consistently high.

REF SMP-208

Signed-off-by: Brian L. Troutwine <[email protected]>
blt committed Dec 16, 2022
1 parent d7f89cd commit 1510016
Showing 14 changed files with 249 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitlab/functional_test.yml
@@ -6,3 +6,4 @@ include:
  - /.gitlab/functional_test/security_agent.yml
  - /.gitlab/functional_test/system_probe.yml
  - /.gitlab/functional_test/serverless.yml
  - /.gitlab/functional_test/regression_detector.yml
52 changes: 52 additions & 0 deletions .gitlab/functional_test/regression_detector.yml
@@ -0,0 +1,52 @@
regression_detector_submit_job:
  stage: functional_test
  image: 486234852809.dkr.ecr.us-east-1.amazonaws.com/ci/datadog-agent-buildimages/docker_x64:$DATADOG_AGENT_BUILDIMAGES
  tags: ["runner:docker"]
  needs:
    - job: docker_build_agent7_single_machine_performance
      artifacts: false
  artifacts:
    expire_in: 1 weeks
    paths:
      - submission_metadata
  variables:
    SMP_VERSION: 0.6.2
    LADING_VERSION: 0.10.2
    TOTAL_SAMPLES: 600
    WARMUP_SECONDS: 45
    REPLICAS: 10
    CPUS: 7
    MEMORY: "30g"
  script:
    # Setup AWS credentials for single-machine-performance AWS account
    - SMP_ACCOUNT_ID=$(aws ssm get-parameter --region us-east-1 --name ci.datadog-agent.single-machine-performance-account-id --with-decryption --query "Parameter.Value" --out text)
    - SMP_ECR_URL=${SMP_ACCOUNT_ID}.dkr.ecr.us-west-2.amazonaws.com
    - SMP_AGENT_TEAM_ID=$(aws ssm get-parameter --region us-east-1 --name ci.datadog-agent.single-machine-performance-agent-team-id --with-decryption --query "Parameter.Value" --out text)
    - aws configure set aws_access_key_id $(aws ssm get-parameter --region us-east-1 --name ci.datadog-agent.single-machine-performance-bot-access-key-id --with-decryption --query "Parameter.Value" --out text) --profile single-machine-performance
    - aws configure set aws_secret_access_key $(aws ssm get-parameter --region us-east-1 --name ci.datadog-agent.single-machine-performance-bot-access-key --with-decryption --query "Parameter.Value" --out text) --profile single-machine-performance
    - aws configure set region us-west-2 --profile single-machine-performance
    # Download smp binary and prepare it for use
    - aws --profile single-machine-performance s3 cp s3://smp-cli-releases/v${SMP_VERSION}/x86_64-unknown-linux-gnu/smp smp
    - chmod +x smp
    # Submit job, using the current main SHA as baseline. This will have been
    # previously submitted in a separate pipeline run. The comparison will have
    # been submitted in this pipeline run.
    - BASELINE_SHA=$(git rev-parse ${CI_MERGE_REQUEST_TARGET_BRANCH_NAME})
    - BASELINE_IMAGE=${SMP_ECR_URL}/${SMP_AGENT_TEAM_ID}-agent:${BASELINE_SHA}-7-amd64
    - COMPARISON_IMAGE=${SMP_ECR_URL}/${SMP_AGENT_TEAM_ID}-agent:${CI_COMMIT_SHA}-7-amd64
    - ./smp --team-id ${SMP_AGENT_TEAM_ID} --aws-named-profile single-machine-performance
      job submit
      --lading-version ${LADING_VERSION}
      --total-samples ${TOTAL_SAMPLES}
      --warmup-seconds ${WARMUP_SECONDS}
      --replicas ${REPLICAS}
      --baseline-image ${BASELINE_IMAGE}
      --comparison-image ${COMPARISON_IMAGE}
      --baseline-sha ${BASELINE_SHA}
      --comparison-sha ${CI_COMMIT_SHA}
      --target-command "/usr/local/bin/agent run --cfgpath /etc/agent"
      --target-config-dir test/regression/
      --target-cpu-allotment ${CPUS}
      --target-memory-allotment ${MEMORY}
      --target-name agent
      --submission-metadata submission_metadata
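
The baseline and comparison image references in the job above share one naming pattern and differ only in the commit SHA. A minimal Python sketch of that composition follows. The function names, the account id, the team id and the SHAs are all hypothetical placeholders; the real account and team values come from SSM parameters that are not shown here.

```python
def ecr_url(account_id: str, region: str = "us-west-2") -> str:
    """Mirror `${SMP_ACCOUNT_ID}.dkr.ecr.us-west-2.amazonaws.com` from the job script."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def image_ref(registry: str, team_id: str, sha: str) -> str:
    """Mirror `${SMP_ECR_URL}/${SMP_AGENT_TEAM_ID}-agent:${SHA}-7-amd64`."""
    return f"{registry}/{team_id}-agent:{sha}-7-amd64"


# Placeholder values for illustration only; none of these are real identifiers.
registry = ecr_url("123456789012")
baseline = image_ref(registry, "team", "aaaa111")    # SHA of the target branch
comparison = image_ref(registry, "team", "bbbb222")  # SHA of the commit under test

print(baseline)
print(comparison)
```

Both images live in the same repository, so only the tag tells the baseline apart from the comparison.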
30 changes: 30 additions & 0 deletions test/regression/README.md
@@ -0,0 +1,30 @@
# Regression Detection

The Regression Detector, owned by Single Machine Performance, is a tool that
detects if there are more-than-random performance changes to a target program --
here, the Agent -- across a variety of experiments and goals. This directory
contains the experiments for the Agent. A similar one exists in [Vector]. Please
do add your own experiments; instructions are below. If you have any questions,
do contact #single-machine-performance; we'll be glad to help.

## Adding an Experiment

In order for SMP's tooling to properly read an experiment directory, please
adhere to the following structure. Starting at the root:

* `cases/` -- __Required__ The directory that contains each regression
  experiment. Each sub-directory is a separate experiment and the name of the
  directory is the name of the experiment, for instance
  `tcp_syslog_to_blackhole`. We call these sub-directories 'cases'.

The structure of each case is as follows:

* `lading/lading.yaml` -- __Required__ The [lading] configuration inside its own
  directory. The directory will be mounted read-only in the container built from
  `Dockerfile` above at `/etc/lading`.
* `agent/` -- __Required__ This is the configuration directory of your
  program. It will be mounted read-only in the container built from `Dockerfile`
  above at `/etc/agent`.
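
The layout rules above can be checked mechanically. The following is a minimal Python sketch, not part of SMP's tooling: the function name `check_experiment_root` and the demo case `incomplete_case` are hypothetical, and the check covers only the required paths listed in this README.

```python
import tempfile
from pathlib import Path
from typing import List


def check_experiment_root(root: Path) -> List[str]:
    """Return layout problems under `root`; an empty list means none were found."""
    cases = root / "cases"
    if not cases.is_dir():
        return [f"missing required directory: {cases}"]
    problems = []
    # Every sub-directory of cases/ is treated as one experiment ("case").
    for case in sorted(p for p in cases.iterdir() if p.is_dir()):
        if not (case / "lading" / "lading.yaml").is_file():
            problems.append(f"{case.name}: missing lading/lading.yaml")
        if not (case / "agent").is_dir():
            problems.append(f"{case.name}: missing agent/ directory")
    return problems


# Demo on a throwaway tree: one complete case and one empty case.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    good = root / "cases" / "tcp_syslog_to_blackhole"
    (good / "lading").mkdir(parents=True)
    (good / "lading" / "lading.yaml").write_text("generator: []\n")
    (good / "agent").mkdir()
    (root / "cases" / "incomplete_case").mkdir()
    demo_problems = check_experiment_root(root)

print(demo_problems)
```

Running a check like this before pushing a new case may catch a missing `agent/` directory earlier than a failed CI submission would.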

[Vector]: https://github.com/vectordotdev/vector/tree/master/regression
[lading]: https://github.com/DataDog/lading
16 changes: 16 additions & 0 deletions test/regression/cases/file_to_blackhole/lading/lading.yaml
@@ -0,0 +1,16 @@
generator:
  - file_gen:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      path_template: "/tmp/file-gen-%NNN%.log"
      duplicates: 4
      variant: "ascii"
      bytes_per_second: "100Mb"
      maximum_bytes_per_file: "100Mb"
      maximum_prebuild_cache_size_bytes: "400Mb"

blackhole:
  - http:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
15 changes: 15 additions & 0 deletions test/regression/cases/tcp_ascii_to_blackhole/lading/lading.yaml
@@ -0,0 +1,15 @@
generator:
  - tcp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "ascii"
      bytes_per_second: "500 Mb"
      block_sizes: ["1Mb", "0.5Mb", "0.25Mb", "0.125Mb", "128Kb"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - tcp:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
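
Each lading configuration in this commit pairs one generator address with two blackhole bindings. The Python sketch below lists those endpoints for the `tcp_ascii_to_blackhole` case and checks that no address is used twice. It assumes the generator's `addr` is where load is sent and each `binding_addr` is where a blackhole listens; that reading is an assumption, not something this commit states. The dictionary is a hand-copied subset of the YAML, and the helper names are hypothetical.

```python
# Hand-copied subset of tcp_ascii_to_blackhole/lading/lading.yaml.
# A real tool would parse the YAML file instead of hard-coding it.
config = {
    "generator": [{"tcp": {"addr": "127.0.0.1:10000", "variant": "ascii"}}],
    "blackhole": [
        {"tcp": {"binding_addr": "127.0.0.1:9091"}},
        {"http": {"binding_addr": "127.0.0.1:9092"}},
    ],
}


def generator_targets(cfg):
    """Return (kind, address) pairs for generators that declare an `addr`."""
    return [
        (kind, body["addr"])
        for entry in cfg["generator"]
        for kind, body in entry.items()
        if "addr" in body
    ]


def blackhole_bindings(cfg):
    """Return (kind, address) pairs for every blackhole entry."""
    return [
        (kind, body["binding_addr"])
        for entry in cfg["blackhole"]
        for kind, body in entry.items()
    ]


targets = generator_targets(config)
sinks = blackhole_bindings(config)

# Two entries sharing one address would presumably conflict, so flag duplicates.
addresses = [addr for _, addr in targets + sinks]
has_duplicates = len(addresses) != len(set(addresses))

print(targets, sinks, has_duplicates)
```

The `if "addr" in body` filter is there because the `file_gen` generator in `file_to_blackhole` has a `path_template` and no `addr`.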
@@ -0,0 +1,15 @@
generator:
  - tcp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "ascii"
      bytes_per_second: "500 Mb"
      block_sizes: ["1Mb", "0.5Mb", "0.25Mb", "0.125Mb", "128Kb"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - http:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
@@ -0,0 +1,15 @@
generator:
  - tcp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "datadog_log"
      bytes_per_second: "500 Mb"
      block_sizes: ["8Kb", "4Kb", "2Kb", "1Kb", "512b", "256b", "128b"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - tcp:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
@@ -0,0 +1,15 @@
generator:
  - tcp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "datadog_log"
      bytes_per_second: "500 Mb"
      block_sizes: ["0.125Mb", "128Kb", "64Kb", "32Kb", "16Kb", "8Kb"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - tcp:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
15 changes: 15 additions & 0 deletions test/regression/cases/tcp_dd_logs_mask/lading/lading.yaml
@@ -0,0 +1,15 @@
generator:
  - tcp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "datadog_log"
      bytes_per_second: "500 Mb"
      block_sizes: ["8Kb", "4Kb", "2Kb", "1Kb", "512b", "256b", "128b"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - tcp:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
15 changes: 15 additions & 0 deletions test/regression/cases/tcp_dd_logs_mask_replace/lading/lading.yaml
@@ -0,0 +1,15 @@
generator:
  - tcp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "datadog_log"
      bytes_per_second: "500 Mb"
      block_sizes: ["8Kb", "4Kb", "2Kb", "1Kb", "512b", "256b", "128b"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - tcp:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
15 changes: 15 additions & 0 deletions test/regression/cases/tcp_json_logs_tiny_blocks/lading/lading.yaml
@@ -0,0 +1,15 @@
generator:
  - tcp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "json"
      bytes_per_second: "500 Mb"
      block_sizes: ["8Kb", "4Kb", "2Kb", "1Kb", "512b", "256b", "128b"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - tcp:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
15 changes: 15 additions & 0 deletions test/regression/cases/tcp_json_to_blackhole/lading/lading.yaml
@@ -0,0 +1,15 @@
generator:
  - tcp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "json"
      bytes_per_second: "500 Mb"
      block_sizes: ["1Mb", "0.5Mb", "0.25Mb", "0.125Mb", "128Kb"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - tcp:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
15 changes: 15 additions & 0 deletions test/regression/cases/tcp_syslog_to_blackhole/lading/lading.yaml
@@ -0,0 +1,15 @@
generator:
  - tcp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "syslog5424"
      bytes_per_second: "500 Mb"
      block_sizes: ["1Mb", "0.5Mb", "0.25Mb", "0.125Mb", "128Kb"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - tcp:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
15 changes: 15 additions & 0 deletions test/regression/cases/udp_json_to_blackhole/lading/lading.yaml
@@ -0,0 +1,15 @@
generator:
  - udp:
      seed: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
             59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103, 107, 109, 113, 127, 131]
      addr: "127.0.0.1:10000"
      variant: "json"
      bytes_per_second: "500 Mb"
      block_sizes: ["1Kb"]
      maximum_prebuild_cache_size_bytes: "256 Mb"

blackhole:
  - http:
      binding_addr: "127.0.0.1:9091"
  - http:
      binding_addr: "127.0.0.1:9092"
