Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics API #68

Closed
wants to merge 18 commits into from
Closed

Metrics API #68

wants to merge 18 commits into from

Conversation

lzchen
Copy link
Contributor

@lzchen lzchen commented Jul 30, 2019

This PR adds the Metrics API package.

this API takes information from:

Example usage for raw measurement:

_meter = meter
measure = _meter.create_measure("cpu_usage", "cpu usage over time", "percentage", MeasureType.FLOAT)

measurements = []
for i in range(100):
    measurements.append(measure.createMeasurement(psutil.cpu_percent()))
    time.sleep(1)

_meter.record(measurements, distributed_context=DistributedContext.get_current())

Example usage for already aggregated metrics:

_meter = meter
label_keys = [LabelKey("environment", "the environment the application is running in")]
memory_metric = _meter.create_int_gauge("available_memory", "available memory over time", "bytes", label_keys)
label_values = ["Testing"]
memory_metric.setCallBack(lambda: memory_metric.getOrCreateTimeSeries(label_values).set(psutil.virtual_memory().available))

Example usage for simple pre-defined aggregation metrics:

_meter = meter
label_keys = [LabelKey("environment", "the environment the application is running in")]
sum_metric = _meter.create_int_counter("sum numbers", "sum numbers over time", "number", label_keys)
label_values = ["Testing"]
sum_time_series = sum_metric.getOrCreateTimeSeries(label_values)

for i in range(100):
    sum_time_series.add(i)

Comments for Meter

More comments

Add more comments

Fix typos
@reyang
Copy link
Member

reyang commented Jul 30, 2019

Please sign the CLA.

Copy link
Member

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks like it provides everything that was laid out in the spec, which is amazing!

I think along the lines of open-telemetry/opentelemetry-specification#165, we can consider an API that provides the capabilities necessary but ensures the user experience is idiomatic python and succinct. If I understand correctly, the code I would need to create and record a value today is:

ENVIRONMENT_LABEL = LabelKey("environment", "the environment the app is running in")
measure = meter().create_long_counter("request-count", options=MeasureOptions(
    label_keys=[ENVIRONMENT_LABEL]
))
meter().record(
   [measure.create_long_measurement(10)], # not sure how to add label values
   options=RecordOptions(distributed_context=DistributedContext.get_current())
)

Which feels a bit unwieldy (I also may be completely misunderstanding how the APIs should be used here). Many metrics libraries generally create a measure and then record the measure, such as:

ENVIRONMENT_LABEL = LabelKey("environment", "the environment the app is running in")
measure = meter().create_long_counter("request-count", options=MeasureOptions(
    label_keys=[ENVIRONMENT_LABEL]
))
measure.record(10, ["staging"])  # handles creating a measurement object and recording it.

This also makes the measures significantly more powerful, which allows them to be carried around as app variables that might be useful later (e.g. in flask):

app.total_executed_jobs = meter().create_long_counter(...)

def handle_foo():
    app.total_executed_jobs.record(1)

Something that is totally possible to accomplish based on the SDK implementations. But I'm hard-pressed to imagine why you wouldn't want a metrics API to expose that sort of an interface.

opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
A :class:`.Measure`
"""

def record(self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how does this play into the measures? I'm thinking through the interface and I see something like:

measure = meter().create_double_counter("foobar", label_keys=[ENVIRONMENT])
measure.create_double_measurement(10, "staging")

I see that the contexts are not an explicit argument, although it would be good to include those labels. I guess that would be possible depending on the implementation of the measure?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a long conversation about static labels at open-telemetry/oteps#4, not sure if it answers your question though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth discussing whether we need constant labels for measurements since distributed_context is not explicit. I'll bring it up in the meeting.

opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
A new :class:`.Measurement`
"""

def create_long_measurement(self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these apis do not provide a way to add the required label_keys as specified in the MetricOptions, which state that it is defining required labels.

Looking at the spec it doesn't call out a required argument for the label values, but it feels like that's needed here.

Copy link
Contributor Author

@lzchen lzchen Aug 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken from the spec, for measurements, I believe the way to added the required label values (or dimensions) would be through passing in an explicit distributed_context or from the current context itself. This is done not at the creation of the measurement level but when the measurements are recorded, which is why the creating of measurements do not have a way to pass in specific label values.

opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
@toumorokoshi
Copy link
Member

Other general thought: is there strong value in not having standard behavior here?

There's a lot of complex logic that would have to be re-implemented if someone were to choose a separate sdk. Specifically:

  • the strategy by which spans and context data is propagated to the measurement
  • the behavior around adding and removing from the time series
  • the way that recorded measurements are enqueued to be processed.

It feels like to me having a defined way that all of this works would reduce confusion if people plugged in different sdks. But maybe the implicit assumption is that people will almost always consume the sdk along with the API, and thus feel confident about consistent behavior by that convention.


def __init__(self,
value: str) -> None:
self.value = value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just use a string for this instead of introducing a class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing a class gives flexibility in the future if we ever want to add anything to LabelValue.

Copy link
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far! I took a first pass, but there's a lot here still to consider. I'm in favor of losing the options classes, losing the constructors, and splitting this up into multiple modules.

There are some problems rendering the docs, you may want to cherry-pick edbc34a and try generating the docs as you're writing them.

opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
A :class:`.Measure`
"""

def record(self,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a long conversation about static labels at open-telemetry/oteps#4, not sure if it answers your question though.


def create_measure(self,
name: str,
options: typing.Optional['MeasureOptions'] = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the type (int or float) an option here when there are separate methods for int/float counters and gauges?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the specification, a measurement has both createDouble and createLong methods, irregardless of the type of the measure that was used to create the measurement. If we remove the type in MeasureOptions, there will be no validation logic between the measure_type and the type of measurement being created. I think we want to prevent a measure to be able to create any type of measurement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to me like the spec was written to match the java client, and as far as I can tell there's no reason for (java's) Measure to have both createDoubleMeasurement and createLongMeasurement.

I'd imagine something like this instead:

class Meter:
    def create_double_measure(...)
    def create_long_measure(...)

class DoubleMeasure(Measure):
    def create_measurement(value)
        # returns a double-valued Measurement

class LongMeasure(Measure):
    def create_measurement(value)
        # returns a long-valued Measurement

and possibly replacing Measurement with separate DoubleMeasurement and LongMeasurement classes.

I don't mean to suggest deviating from the spec. The spec is underdefined here and writing the python client should help us generate spec changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I was focused too much on following the spec. I also think the ability to create both types of measurements and having to validate the type of measure is redundant as well. I will make these changes and propose them in the spec.

opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
self.resource = resource


class MeasureType:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully this is one of those ugly things we only need in javascript...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we should have some other way of representing the type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something here, but ideally we represent differently typed metrics with different classes.

opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
@lzchen
Copy link
Contributor Author

lzchen commented Jul 31, 2019

I signed it

@lzchen
Copy link
Contributor Author

lzchen commented Aug 6, 2019

@c24t @toumorokoshi
In regards to the repetition of the TimeSeries class for each metric, I created two classes CounterTimeSeries and GaugeTimeSeries to use instead. The behavior is that they can each accept any value (float or int) and then it will be up to the implementation to check for validation. It separates the different logic of add/set for counter vs gauge.

@lzchen
Copy link
Contributor Author

lzchen commented Aug 6, 2019

@toumorokoshi I've posted some sample code on how measures and metrics would be used in a very simple case (in the PR description). Feel free to make some comments and questions!

@c24t
Copy link
Member

c24t commented Aug 7, 2019

@lzchen the examples would be useful in the repo too, and then we can comment on them in this PR.

@Oberon00
Copy link
Member

Oberon00 commented Aug 8, 2019

Just linking #48

"""Used to create raw :class:`.FloatMeasurement` s."""

def create_measurement(self,
value: int,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Float here.

Copy link
Member

@c24t c24t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments from in-person review with @lzchen and @reyang.

opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
:class:`.Metric` s are used for recording pre-defined aggregation, or
already aggregated data. This should be used to report metrics like
cpu/memory usage, in which the type of aggregation is already defined, or
simple metrics like "queue_length".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question for the spec: where do we define the aggregation behavior a la views in OC?

opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
opentelemetry-api/src/opentelemetry/metrics/label_key.py Outdated Show resolved Hide resolved
import typing


class CounterTimeSeries:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note from IRL conversation: we should probably revisit this class structure, decide if we need separate TS for each type combination, whether we need measurement types before export.

Copy link
Member

@toumorokoshi toumorokoshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the examples! I think there is still some more simplification that can be done here. But It's not clear to me how much that will violate what the API that is called for in OpenTelemetry would be.

Specifically, I'm thinking about the user interface exposing these six options:

  • float gauge
  • float counter
  • float measure
  • int gauge
  • int counter
  • int measure

The measures themselves I don't see a lot of value for, they need to work in tandem with some sort of aggregator (which OT calls timeseries, which IMO is a little confusing) to actually produce a measurement that will be enqueued and sent to exporters. I can imagine a measurement being a primitive that Gauge / Counters use, but a consumer will probably not use them directly.

Also not sure about the value of calling out the type in the method signature. A similar flexibility could be done by just passing the type into the factory:

create_counter(name, description, cls=int)

Which would further reduce API definitions for both int and float timeseries.

I think there's a lot of work here that would be best to be contributed back to the OT spec. I think it continues to highlight some issues that are worth hashing out across implementations.

opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
@@ -11,3 +11,14 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


class LabelValue:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this just be a string?

Copy link
Contributor Author

@lzchen lzchen Aug 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we ever needed to add additional fields to LabelValue, we could do so if we used a class without introducing breaking changes. However, label values might be fixed as strings so making them strings might be appropriate.

opentelemetry-api/src/opentelemetry/metrics/__init__.py Outdated Show resolved Hide resolved
def create_int_counter(self,
name: str,
description: str,
unit: str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

random question: why is "unit" it's own field when it could be identified as a label_key?

I feel like not everything measured needs a unit to help understand the value. And even if it did, I don't believe many metric storage systems have the unit as a field they store.

For example, I believe graphite or prometheus would store the "unit" key as a tag or a part of the metric name.

"number",
LABEL_KEYS)
LABEL_VALUES = ["Testing"]
SUM_TIME_SERIES = SUM_METRIC.getOrCreateTimeSeries(LABEL_VALUES)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the example! Things are getting a little clearer for me.

What is the value for having the workflow of creating a metric, then creating a time series?

It feels to me like the metric is not valuable without the time series included. Generally, when emitting metrics, I think of a single measurement as an event that writes to a stream, that then gets aggregated by whatever queue system and submitted to the exporter.

If I was a user, I would like things to be as simple as:

METER = Meter()
LABEL_KEYS = [LabelKey("environment", 
                       "the environment the application is running in")]
SUM_METRIC = METER.create_int_counter("sum numbers", 
                                      "sum numbers over time",
                                      "number",
                                      LABEL_KEYS)
SUM_METRIC.record(10)

@lzchen lzchen mentioned this pull request Aug 14, 2019
@lzchen
Copy link
Contributor Author

lzchen commented Aug 14, 2019

Continued with [#87 ]

@lzchen lzchen closed this Aug 15, 2019
srikanthccv pushed a commit to srikanthccv/opentelemetry-python that referenced this pull request Nov 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants