Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics SDK: Filtering metrics attributes #1191

Merged
merged 19 commits into from
Feb 5, 2022

Conversation

lalitb
Copy link
Member

@lalitb lalitb commented Feb 2, 2022

Fixes

#1190

Changes

As per the specs, Views should be able to configure the filtering of attributes reported on metrics, including removing all the attributes. This PR adds a filter attribute-processor for this. The list of attributes to be included in the metrics can be specified through this filter.

Please provide a brief description of the changes here.

For significant contributions please make sure you have completed the following items:

  • CHANGELOG.md updated for non-trivial changes
  • Unit tests have been added
  • Changes in public API reviewed

@lalitb lalitb requested a review from a team February 2, 2022 02:33
@lalitb lalitb changed the title Filter attributes Filtering metrics attributes Feb 2, 2022
@lalitb lalitb changed the title Filtering metrics attributes Metrics SDK: Filtering metrics attributes Feb 2, 2022
@codecov
Copy link

codecov bot commented Feb 2, 2022

Codecov Report

Merging #1191 (fa0b7e2) into main (e1b4a49) will increase coverage by 0.06%.
The diff coverage is 97.06%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1191      +/-   ##
==========================================
+ Coverage   93.29%   93.35%   +0.06%     
==========================================
  Files         174      177       +3     
  Lines        6404     6502      +98     
==========================================
+ Hits         5974     6069      +95     
- Misses        430      433       +3     
Impacted Files Coverage Δ
...clude/opentelemetry/sdk/common/attributemap_hash.h 78.58% <78.58%> (ø)
...include/opentelemetry/sdk/common/attribute_utils.h 98.08% <100.00%> (+0.86%) ⬆️
...ntelemetry/sdk/metrics/view/attributes_processor.h 81.25% <100.00%> (+81.25%) ⬆️
sdk/test/common/attribute_utils_test.cc 100.00% <100.00%> (ø)
sdk/test/common/attributemap_hash_test.cc 100.00% <100.00%> (ø)
sdk/test/metrics/attributes_processor_test.cc 100.00% <100.00%> (ø)

@@ -11,13 +11,25 @@ namespace metrics
{
using MetricAttributes = opentelemetry::sdk::common::AttributeMap;

/**
* The AttributesProcessor is responsible for customizing which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this can be made internal/private to the SDK for now.

The specification only allows very limited manipulation on attributes (e.g. remove certain attributes from the measurement), this generic solution could work but it comes with perf cost (e.g. the virtual AttributesProcessor::process call).

Doesn't have to be a blocker though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I do suspect some perf cost associated with virtual and attributes copy. I have added a benchmark test to get cost statistics and would be helpful if have to consider performance improvement.

Also, I realized that we need to calculate the hash on the attribute map to store metrics data. Have added functionality to calculate the hash based on std::hash and boost::hash_combine`. As hash should be the same irrespective of the order of key/values in the AttributeMap, I have changed the internal storage for AttributeMap from unordered_map to map. This way we can avoid sorting of keys for every measurement.

Let me see how can we make this class internal/private to SDK, as C++ doesn't have a direct way of doing it. We can make AttributeProcessor::process() as private, and other SDK classes as its friend to it but would like to see a better approach if possible.

@reyang
Copy link
Member

reyang commented Feb 2, 2022

Doesn't have to be in this PR, we need to understand the performance characteristics (e.g. heap allocation, memory fragmentation, contention if there is any, etc.) in order to guide the SDK design/implementation.

Here are some general guidelines regarding metrics memory usage https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#memory-management.

Consider two types of performance tests:

  1. benchmark - which tells us when we call instrument.Add/Record, what's the heap allocation in bytes and CPU cycles for each call, and what's the contribution from each component (provider, processor/reader, exporter)
  2. stress test - while running the SDK and utilize all the CPU cores to emit as much as of measurements to the SDK (and probably scrape from the pull exporter at the same time), what's the total amount of memory usage, do we see stable calls/second, what's the scalability on SMP when the number of cores grow, etc.

Copy link
Member

@reyang reyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to unblock this so we can get a working prototype end-to-end.

@lalitb
Copy link
Member Author

lalitb commented Feb 4, 2022

Doesn't have to be in this PR, we need to understand the performance characteristics (e.g. heap allocation, memory fragmentation, contention if there is any, etc.) in order to guide the SDK design/implementation.

Here are some general guidelines regarding metrics memory usage https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/supplementary-guidelines.md#memory-management.

Consider two types of performance tests:

  1. benchmark - which tells us when we call instrument.Add/Record, what's the heap allocation in bytes and CPU cycles for each call, and what's the contribution from each component (provider, processor/reader, exporter)
  2. stress test - while running the SDK and utilize all the CPU cores to emit as much as of measurements to the SDK (and probably scrape from the pull exporter at the same time), what's the total amount of memory usage, do we see stable calls/second, what's the scalability on SMP when the number of cores grow, etc.

Thanks, this is useful. We will discuss these performance attributes in our community meeting and will come up with plan to measure them in a better way.

@lalitb lalitb merged commit b6a28df into open-telemetry:main Feb 5, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants