Project Tracking: Performance Benchmarking SIG #1617

Closed
cartersocha opened this issue Jul 27, 2023 · 26 comments

@cartersocha
Contributor

cartersocha commented Jul 27, 2023

Description

As OpenTelemetry adoption grows and larger enterprises deepen their usage of project components, end users keep asking about OpenTelemetry's performance impact. Performance varies with the quirks of each user's environment, but without a project-wide performance standard and a historical data record, no one really knows whether the numbers they're seeing are abnormal or expected. Additionally, there is no comprehensive documentation on tuning project components or on the performance trade-offs available to users, which results in a reliance on vendor support.

Project maintainers need to track the current state of their components and prevent performance regressions when cutting new releases. Customers need a general sense of OpenTelemetry's potential performance impact and confidence that the project takes performance and customer resources seriously. Performance tracking and quantification is a project-wide need that should be addressed with a project-wide effort and automated tooling that minimizes repo-owner effort while providing valuable new data points for all project stakeholders.

Project Board

SIG Charter

charter

Deliverables

  • Evaluate the current performance benchmarking specification, propose an updated benchmarking standard that can apply across project components, and make the requisite specification updates. The benchmarking standard should provide relevant information for maintainers and end users.
  • Develop automated tooling that can be used across project repos to report current performance numbers and track changes as new features / PRs are merged.
  • Write performance tuning documentation for the project website that can help customers make actionable decisions when faced with performance trade-offs or debugging bad component performance.
  • Provide ongoing maintenance as needed on automated tooling and own the underlying assets.

Initial implementation scope would be the core Collector components (main repo), JavaScript / Java / Python SDKs and their core components. No contrib or instrumentation.

Staffing / Help Wanted

Anyone with an opinion on performance standards and testing.

Language maintainers or approvers, as they will be tasked with implementing the changes and following through on the process.

Required staffing

lead - tbd
@jpkrohling domain expert
@cartersocha contributor
@mwear collector sig
@codeboten collector sig implementation
@ocelotl python sig
@martinkuba javascript
@tylerbenson java
@sbaum1994 contributor

@jpkrohling - TC/GC sponsor
@alolita - TC/GC sponsor

Need: more performance domain experts
Need: maintainers or approvers from several language sigs to participate

Meeting Times

TBD

Timeline

Initial scope is the Collector and three SDKs. Output should be delivered by KubeCon NA (November 6, 2023).

Labels

tbd

Linked Issues and PRs

https://opentelemetry.io/docs/collector/benchmarks/
cncf/cluster#245
cncf/cluster#182
https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/performance-benchmark.md
https://opentelemetry.io/docs/specs/otel/performance-benchmark/

@cartersocha
Contributor Author

@puckpuck fyi

@tigrannajaryan
Member

Please delete boilerplate like this from the description to make it easier to read:

A description of what this project is planning to deliver, or is in the process of delivering. This includes all OTEPs and their associated prototypes.

In general, OTEPs are not accepted unless they come with working prototypes available to review in at least two languages. Please discuss these requirements with a TC member before submitting an OTEP.

There is more text like this that seems to be copied from a template and should be deleted or replaced with more specifics.

@tigrannajaryan
Member

  • Evaluate the current performance benchmarking specification

Does this refer to this document?

@jpkrohling
Member

cc @gsoria and @harshita19244, as they worked on performance benchmarks for SDKs at different stages (OpenTracing and OpenTelemetry) and can share their experience in doing so.

@jpkrohling
Member

cc @sh0rez and @frzifus, as they are interested in benchmarking the Collector against other solutions.

@alolita
Member

alolita commented Aug 1, 2023

@cartersocha I'd be happy to be the second GC sponsor supporting this Performance Benchmarking SIG.

I recommend creating a Charter doc for this SIG to map out more details about the mission, goals, deliverables, and logistics. Let's also itemize what is out of scope and which items are non-goals, since performance benchmarking is a subjective area for an open source project of OpenTelemetry's breadth and depth.

Please share the link on this thread.

@harshita19244

harshita19244 commented Aug 1, 2023

Hi, I worked on a performance benchmarking project comparing the OpenTracing and OpenTelemetry libraries as part of my Outreachy internship. All tests were executed on bare metal machines. The GitHub repo is here: https://github.com/harshita19244/opentelemetry-java-benchmarks
Do feel free to reach out to me with any questions.

@brettmc

brettmc commented Aug 2, 2023

Over in the PHP SIG, we've implemented (most of) the documented perf tests, but what I think we lack is a way to run them on consistent hardware and a way to publish the results (or compare against a baseline to track regressions/improvements).

@cartersocha
Contributor Author

@brettmc an ask for bare metal machines was already made and approved. I’ll share the details once we get them: cncf/cluster#245

@frzifus
Member

frzifus commented Aug 2, 2023

Thx @cartersocha for starting this!

Anyone with an opinion on performance standards and testing.

I would be super interested in participating.

Recently @sh0rez started a project to compare the performance of the grafana-agent and the Prometheus agent in collecting metrics. Since it's quite flexible, it wasn't too hard to extend it to include the OpenTelemetry Collector. Maybe it's beneficial for this project; happy to chat about it.

@cartersocha
Contributor Author

Would love to see the data / results or hear about any testing done here @frzifus. Thanks for being willing to share your work 😎

@cartersocha
Contributor Author

Added a charter to the proposal as @alolita suggested.

@ocelotl

ocelotl commented Aug 22, 2023

👍

@vielmetti

Looking forward to seeing this go forward! cc @tobert

@cartersocha
Contributor Author

hey @frzifus @sh0rez @harshita19244 @gsoria @brettmc we now have bare metal machines to run tests on. I wasn't sure how to add all of you on Slack, but we're in the #otel-benchmarking channel in the CNCF Slack.

https://cloud-native.slack.com/archives/C05PEPYQ5L3

@jack-berg
Member

In Java we've taken performance fairly seriously, and we continue to make improvements as we receive feedback. For example, we received an issue about a use case in which millions of distinct metric series may need to be maintained in memory, with feedback that the SDK at the time produced problematic memory churn. Since receiving it, we have worked to reduce metric memory allocation by 80%, and there is work in progress to reduce it by 99.9% (essentially zero memory allocations after the metric SDK reaches a steady state). We also have performance test suites for many sensitive areas and validate that changes to those areas don't degrade performance.

All this is to say that I believe we have a decent performance story today.

However, where I think we could improve is in performance documentation that we can point curious users to. Our performance test suites require quite a bit of context to run and to interpret the results. It would be great if we could extend the spec performance benchmark document to include high-level descriptions of some use cases for each signal, and to provide tooling to run and publish performance results to some central location.

If the above was available, we would have some nice material to point users to who are evaluating the project. We would still keep the nuanced performance tests around for sensitive areas, but it would be good to have something simpler / higher level.

In general, I think performance engineering is going to be very language / implementation dependent. I would caution against too expansive a scope for a cross-language performance group. It would be great to provide some documentation of use cases to evaluate in suites, and tooling for running on bare metal / publishing results. But there are always going to be nuanced language-specific concerns. I think we should raise those issues with the relevant SIGs and let those maintainers / contributors work out solutions.

@jack-berg reopened this Sep 10, 2023
@reyang
Member

reyang commented Sep 14, 2023

I hold a similar position to @jack-berg's.

Taking OpenTelemetry .NET as an example, performance has been taken seriously from the beginning:

Thinking about what could potentially benefit OpenTelemetry .NET: having some perf numbers published in an official document on opentelemetry.io across all programming languages might increase discoverability.

@cartersocha
Contributor Author

Thanks for the context, all. @jack-berg could you share where the Java tests are published and what compute they run on? @reyang could you share what compute you rely on in .NET, and would you consider migrating the test results to the OTel website like the Collector does?

@jack-berg
Member

The tests are scattered throughout the repo in directories next to the source they evaluate; they all live in directories named "jmh". I wrote a quick little script to find them all:

# List every "jmh" source directory, skipping copies under build output
find . -type d | grep "^.*\/jmh$" | grep -v ".*\/build\/.*"

# Results
./context/src/jmh
./exporters/otlp/all/src/jmh
./exporters/otlp/common/src/jmh
./extensions/trace-propagators/src/jmh
./extensions/incubator/src/jmh
./sdk/metrics/src/jmh
./sdk/trace/src/jmh
./sdk/logs/src/jmh
./api/all/src/jmh

They run on each developer's local machine, and only on request. The basic idea is that maintainers / approvers know which areas of the code are sensitive and have JMH test suites. When someone opens a PR that we suspect has performance implications, we ask them to run the performance suite before and after and compare the results (example). It's obviously imperfect, but it has generally been fine.

It would be good if there were an easy way to run a subset of these on stable compute and publish the results to a central place. I think running / publishing all of them might be overwhelming.
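
To make the workflow described above concrete, here is a minimal sketch of what such a JMH suite can look like. The class name, the use of a no-op tracer, and the measurement settings are illustrative assumptions rather than code taken from the repo; the real suites exercise the SDK implementations.

import java.util.concurrent.TimeUnit;

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

// Hypothetical JMH suite measuring the cost of starting and ending a span.
// A no-op tracer keeps the sketch self-contained and dependency-light.
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(1)
@State(Scope.Benchmark)
public class SpanLifecycleBenchmark {

  private final Tracer tracer = OpenTelemetry.noop().getTracer("benchmark");

  @Benchmark
  public Span startAndEndSpan() {
    // JMH reports the average time per operation in nanoseconds.
    Span span = tracer.spanBuilder("benchmark-span").startSpan();
    span.end();
    return span; // returning the result prevents dead-code elimination
  }
}

Assuming the repo wires JMH in through a Gradle plugin (the exact task name here is an assumption), a module's suite could be run before and after a change with something like ./gradlew :sdk:trace:jmh, and the two result sets compared as described above.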

@cartersocha
Contributor Author

Makes sense. Thanks for sharing those details. Let me start a thread in the CNCF Slack to coordinate machine access.

@cwegener

cwegener commented Sep 20, 2023

A random find I just stumbled across: a k6 extension for generating OTel signals, created by an ING Bank engineer: https://github.com/thmshmm/xk6-opentelemetry

I'm not sure what the guidelines on usage of 3rd party tooling are for the Performance Benchmarking SIG.

@cartersocha
Contributor Author

Thanks for sharing @cwegener! The guidelines are yet to be defined, so we'll see, but the general preference is for community tooling (which can also be donated). We're a decentralized project and each language has its quirks, so whatever guidelines are defined would be more of a baseline. If you think this approach would be generally beneficial, we'd love to hear more. Feel free to cross-post in the #otel-benchmarking channel.

@cwegener

If you think this approach would be generally beneficial we’d love to hear more.

I will test drive the k6 extension myself a little bit and report back in Slack.

@tedsuo
Contributor

tedsuo commented Sep 27, 2023

@cartersocha do you mind converting this issue to a PR? We are now placing proposals here: https://github.com/open-telemetry/community/tree/main/projects

@trask
Member

trask commented Sep 17, 2024

@cartersocha do you mind converting this issue to a PR? We are now placing proposals here: https://github.com/open-telemetry/community/tree/main/projects

@cartersocha @jpkrohling @tylerbenson is this SIG currently ongoing, or should we close it / turn it into a project proposal PR until folks are ready to move forward?

@tylerbenson
Member

While the infrastructure is still in place and can continue to be used by individual SIGs, I don't think we got volunteers from anyone outside of Lightstep, and we've all moved on to other projects. I think the proposal can be closed.

@trask closed this as completed Sep 17, 2024