Add scale test mode to example agent #481

Merged
tigrannajaryan merged 26 commits into open-telemetry:main from michel-laterman:enhancement/scale-test on Feb 10, 2026

Conversation

@michel-laterman (Contributor) commented Jan 6, 2026

Add a scale test mode to the example agent that starts a large number of agents (one goroutine per agent) that connect to a server.

This lets the scale test exercise all of the spec capabilities that have been added to the example agent.

codecov bot commented Jan 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.70%. Comparing base (cd3b0ab) to head (ded57ab).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #481   +/-   ##
=======================================
  Coverage   81.70%   81.70%           
=======================================
  Files          27       27           
  Lines        2137     2137           
=======================================
  Hits         1746     1746           
  Misses        266      266           
  Partials      125      125           


@michel-laterman michel-laterman changed the title Add dumb scale test agent Add scale test driver Jan 6, 2026
Comment thread internal/examples/scale/main.go Outdated
@michel-laterman (Contributor, Author) commented Jan 8, 2026

If I limit the (Docker) server image to only 2 CPUs and 256M of memory, enrolling 4000 agents leaves the container memory usage at 80% (as reported by docker container stats).

The server encounters no errors during this time, and all agents are able to report that all sent connection settings have been applied.

I'm using my laptop, a 2023 MacBook Pro with an Apple M3 Pro chip for these tests.

One thing I would like to follow up on: when the client sends a close that is detected as a normal closure, the server connection's OnReadMessageError callback is still invoked.
EDIT: Fixed this as a part of #484

@tigrannajaryan (Member) commented:

The direction of this PR looks good to me. Modify the existing example agent to be reusable for scale testing.

Comment thread internal/examples/makefile
@michel-laterman michel-laterman changed the title Add scale test driver Add scale test mode to example agent Jan 16, 2026
@juandemanjon (Contributor) left a comment


Thank you for adding this feature to the agent example

Please add some tests to validate the new features.

"strconv"
"time"

opampinternal "github.com/open-telemetry/opamp-go/internal"
@juandemanjon (Contributor) commented Jan 17, 2026
This will break if the examples are moved to another repository. Currently there’s no direct dependency on opamp-go/internal even though the examples live under internal/examples.

I recommend duplicating NopLogger in internal/examples/agent/agent/logger.go:

package agent

import "context"

type NopLogger struct{}

func (l *NopLogger) Debugf(ctx context.Context, format string, v ...interface{}) {}
func (l *NopLogger) Errorf(ctx context.Context, format string, v ...interface{}) {}

@michel-laterman (Contributor, Author) replied:

I used the existing NopLogger in order to avoid code duplication.

Comment thread internal/examples/server/opampsrv/opampsrv.go Outdated
Comment thread internal/examples/server/opampsrv/opampsrv.go Outdated
Comment thread internal/examples/agent/README.md Outdated
@michel-laterman (Contributor, Author) commented:

I've split off the example server metrics into: #499

@michel-laterman (Contributor, Author) commented:

Running the server in a Docker container with memory: 256M and cpus: '2' on a 2023 MacBook Pro with an Apple M3 Pro chip:

docker version
Client:
 Version:           29.1.3
 API version:       1.52
 Go version:        go1.25.5
 Git commit:        f52814d
 Built:             Fri Dec 12 14:48:46 2025
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.55.0 (213807)
 Engine:
  Version:          29.1.3
  API version:      1.52 (minimum version 1.44)
  Go version:       go1.25.5
  Git commit:       fbf3ed2
  Built:            Fri Dec 12 14:50:40 2025
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          v2.2.0
  GitCommit:        1c4457e00facac03ce1d75f7b6777a7a851e5c41
 runc:
  Version:          1.3.4
  GitCommit:        v1.3.4-0-gd6d73eb8
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I destroy and recreate the opamp-server for each test; docker container stats for the server shows:

scale-count: 4000
CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O   PIDS
dd0e2dd653ec   opamp-server   1.45%     200.7MiB / 256MiB   78.41%    22.9MB / 26.5MB   0B / 0B     18

scale-count: 5000
CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O         PIDS
b3e90a9afd5c   opamp-server   1.87%     247MiB / 256MiB     96.49%    25.1MB / 30.9MB   54.7MB / 89.8MB   18

scale-count: 6000
CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O       PIDS
c90eed3d64bd   opamp-server   5.82%     250.2MiB / 256MiB   97.75%    30.8MB / 37.4MB   442MB / 509MB   18

The above stats are what generally occur once the server is "at rest" (few messages are exchanged). No server error callbacks are triggered in the above tests. However, CPU usage goes over 200% during the initial message exchanges.

When the scale count is set to 10k we see:

CONTAINER ID   NAME           CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O         PIDS
20843ec87456   opamp-server   6.15%     255.6MiB / 256MiB   99.85%    46.9MB / 57.7MB   1.57GB / 1.82GB   19

However, (read) errors are present in the opamp-server.

Comment thread internal/examples/agent/main.go Outdated
Comment thread internal/examples/agent/main.go Outdated
Comment thread internal/examples/agent/main.go
@tigrannajaryan tigrannajaryan merged commit 7cd1897 into open-telemetry:main Feb 10, 2026
11 checks passed

Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: scale test utility
3 participants