chore(metrics): restrict `metrics` crate usage to `astria-telemetry` by Fraser999 · Pull Request #1192 · astriaorg/astria

Fraser999 · 2024-06-18T12:42:39Z

Summary

This PR introduces several types in astria-telemetry to support building and recording metrics in our other crates.

Background

There are a few reasons driving the design here:

I wanted to remove the third-party metrics crate as a direct dependency from all our crates except astria-telemetry so that its macros aren't available in our other crates. This precludes accidental addition of a new metric in the future which isn't properly initialized at app startup.
There's no possibility of a description for a given metric being ignored due to an accidental mismatch in the metrics macros being used (e.g. we previously had a case of describe_gauge! being used with a counter! meaning the description was ignored)
We can set buckets on a per-histogram basis
We can set up metrics in our black box tests without requiring an actual http server to be running. Configuring the metrics crate currently doesn't support either passing in an already-bound listener (where we could bind to port 0 and find out what actual port was used before giving the listener to metrics) or querying the running server in metrics to find out the port. This makes reliable testing using an actual http server essentially impossible. With the changes in this PR, the black box tests will retain a handle to the underlying exporter, meaning snapshots can be rendered which yield the same view of the metrics as if the http endpoint had been called.

Changes

Added newtypes for Counter, Gauge and Histogram.
Added a Metrics trait, implemented by each of the crates' Metrics structs to support registering the metrics via telemetry::Config::try_init.
Added metrics::ConfigBuilder to support configuring and initializing metrics outside of telemetry::Config::try_init (for use in black box tests).
Added metrics::RegisteringBuilder to support registering individual metrics via the Metrics trait. To register a metric, this builder returns an appropriate factory, and the factory is then used potentially multiple times to register and return the individual metric with or without groups of labels applied.
Added metrics::BucketBuilder to support registering histogram buckets via the Metrics trait. This is unfortunately a separate builder since the metrics crate doesn't support registering buckets at the same point as registering the histograms. The former must happen via the PrometheusBuilder, but the latter can only be done once the PrometheusBuilder has been consumed and converted to a PrometheusRecorder. I added checks to ensure that buckets set via the BucketBuilder are all related to actual histograms registered later.
Changed the telemetry::Config setters to use a consistent set_ naming convention and merged the two metrics-related setters into a single one, since now we require either both or neither.

Note that these changes aren't mutually exclusive to using the raw metrics crate macros. So while we want to discourage usage in our own crates, dependencies using the macros will continue to function as before assuming we have configured metrics to use the global recorder. Using telemetry::Config::try_init does enable the global recorder, whereas we disable it in our black box tests so they don't conflict.

Testing

Manually checked metrics are still available right after startup, and manually checked the handle provided to black box tests functions as expected. However, proper testing of metrics should now be possible and I'd like to tackle that in follow-up PRs.

Related Issues

Closes #1147.

## Summary This adds further metrics to the sequencer. ## Background This should help diagnose block production slowdown when the sequencer is stress-tested. ## Changes - Added metrics (see below for list). - Enabled `cnidarium` metrics. Note that all histograms are still rendered as Prometheus summaries for now. I have [an open PR](#1192) which will make it simple to provide buckets for histograms, after which they will be rendered as true histograms. ## Testing Testing will likewise be relatively simple once #1192 is merged. In the meantime, I ran the smoke test using the code in this PR and manually checked the new metrics are available and appear sane. ## Metrics - Added `astria_sequencer_check_tx_duration_seconds` histograms with the following labels: - `length check and parse raw tx` - `stateless check` - `nonce check` - `chain id check` - `balance check` - `check for removal` - `insert to app mempool` - Added `astria_sequencer_actions_per_transaction_in_mempool` histogram - Added `astria_sequencer_transaction_in_mempool_size_bytes` histogram - Added `astria_sequencer_transactions_in_mempool_total` gauge - Enabled `cnidarium_get_raw_duration_seconds` histogram - Enabled `cnidarium_nonverifiable_get_raw_duration_seconds` histogram ## Related Issues Closes #1247.

SuperFluffy

I am very sorry this review took so long.

I have to say I am somewhat ambivalent with us taking such a big detour from "normal" Rust metric handling using the global registry and macros only. It's annoying that we end up threading the &'static Metric through every corner of our stack.

On the other hand I understand and welcome that the invidiual metric types are handled more strictly.

crates/astria-bridge-withdrawer/src/metrics.rs

crates/astria-telemetry/src/metrics/into_f64.rs

SuperFluffy · 2024-08-30T13:59:15Z

crates/astria-telemetry/src/metrics/builders.rs

+
+    /// Enables or disables setting the global metrics recorder.
+    #[must_use]
+    pub fn with_global_recorder(mut self, use_global_recorder: bool) -> Self {


bikeshed: for builders, I have used the convention to use e.g. fn global_recorder(mut self) to activate it, but set_global_recorder(mut self, use_global_recorder: bool) to take arguments (same for the other setters).

I think it's a little unconventional to have two ways of doing the same thing in a builder, and a method like global_recorder looks more like a getter than anything else to me. Happy to change the prefix with_ to set_ though :)

Done in 7432b9c.

crates/astria-telemetry/src/metrics/builders.rs

SuperFluffy · 2024-08-30T14:08:38Z

crates/astria-telemetry/src/metrics/builders.rs

+            metrics::set_global_recorder(registering_builder.recorder)
+                .map_err(|_| Error::GlobalMetricsRecorderAlreadySet)?;


Does this need a https://doc.rust-lang.org/stable/std/sync/struct.LazyLock.html similar to what we do when setting up tracing in tests?

I don't know how metrics behaves if one tries to register multiple global recorders - tracing panics. Maybe we can provide an extra flag use_test_protection (or something less asinine) that ensures that no more than 1 global recorder is set?

(In services it's IMO desirable to just panic because one should not have more than 1 global recorder).

It does return an error if we try to register a global recorder more than once. The error is being returned from from astria_telemetry::Config::try_init in the same way as for a tracing init error, so we're covered that way for production code.

In all our tests (where we use the LazyLock for the telemetry config) we're not setting the metrics up as part of the astria_telemetry::configure() flow, since we want to avoid using the global recorder. So in all tests where your question applies, we're doing something like

let (metrics, metrics_handle) = metrics::ConfigBuilder::new() .set_global_recorder(false) .build(&()) .unwrap();

meaning we won't have an error returned since we're not setting the global recorder.

* main: chore: ibc e2e smoke test (#1284) chore(metrics): restrict `metrics` crate usage to `astria-telemetry` (#1192) fix(charts)!: sequencer-relayer chart correct startup env var (#1437) chore(bridge-withdrawer): Add instrumentation (#1324) chore(conductor): Add instrumentation (#1330) fix(cli, bridge-withdrawer): dont fail entire block due to bad withdraw event (#1409) feat(sequencer, bridge-withdrawer)!: enforce withdrawals consumed (#1391)

…1192) ## Summary This PR introduces several types in `astria-telemetry` to support building and recording metrics in our other crates. ## Background There are a few reasons driving the design here: * I wanted to remove the third-party `metrics` crate as a direct dependency from all our crates except `astria-telemetry` so that its macros aren't available in our other crates. This precludes accidental addition of a new metric in the future which isn't properly initialized at app startup. * There's no possibility of a description for a given metric being ignored due to an accidental mismatch in the `metrics` macros being used (e.g. we previously had a case of `describe_gauge!` being used with a `counter!` meaning the description was ignored) * We can set buckets on a per-histogram basis * We can set up metrics in our black box tests without requiring an actual http server to be running. Configuring the `metrics` crate currently doesn't support either passing in an already-bound listener (where we could bind to port 0 and find out what actual port was used before giving the listener to `metrics`) or querying the running server in `metrics` to find out the port. This makes reliable testing using an actual http server essentially impossible. With the changes in this PR, the black box tests will retain a handle to the underlying exporter, meaning snapshots can be rendered which yield the same view of the metrics as if the http endpoint had been called. ## Changes * Added newtypes for `Counter`, `Gauge` and `Histogram`. * Added a `Metrics` trait, implemented by each of the crates' `Metrics` structs to support registering the metrics via `telemetry::Config::try_init`. * Added `metrics::ConfigBuilder` to support configuring and initializing metrics outside of `telemetry::Config::try_init` (for use in black box tests). * Added `metrics::RegisteringBuilder` to support registering individual metrics via the `Metrics` trait. To register a metric, this builder returns an appropriate factory, and the factory is then used potentially multiple times to register and return the individual metric with or without groups of labels applied. * Added `metrics::BucketBuilder` to support registering histogram buckets via the `Metrics` trait. This is unfortunately a separate builder since the `metrics` crate doesn't support registering buckets at the same point as registering the histograms. The former must happen via the `PrometheusBuilder`, but the latter can only be done once the `PrometheusBuilder` has been consumed and converted to a `PrometheusRecorder`. I added checks to ensure that buckets set via the `BucketBuilder` are all related to actual histograms registered later. * Changed the `telemetry::Config` setters to use a consistent `set_` naming convention and merged the two metrics-related setters into a single one, since now we require either both or neither. Note that these changes aren't mutually exclusive to using the raw `metrics` crate macros. So while we want to discourage usage in our own crates, dependencies using the macros will continue to function as before assuming we have configured metrics to use the global recorder. Using `telemetry::Config::try_init` does enable the global recorder, whereas we disable it in our black box tests so they don't conflict. ## Testing Manually checked metrics are still available right after startup, and manually checked the handle provided to black box tests functions as expected. However, proper testing of metrics should now be possible and I'd like to tackle that in follow-up PRs. ## Related Issues Closes #1147.

## Summary This adds further metrics to the sequencer. ## Background This should help diagnose block production slowdown when the sequencer is stress-tested. ## Changes - Added metrics (see below for list). - Enabled `cnidarium` metrics. Note that all histograms are still rendered as Prometheus summaries for now. I have [an open PR](astriaorg/astria#1192) which will make it simple to provide buckets for histograms, after which they will be rendered as true histograms. ## Testing Testing will likewise be relatively simple once #1192 is merged. In the meantime, I ran the smoke test using the code in this PR and manually checked the new metrics are available and appear sane. ## Metrics - Added `astria_sequencer_check_tx_duration_seconds` histograms with the following labels: - `length check and parse raw tx` - `stateless check` - `nonce check` - `chain id check` - `balance check` - `check for removal` - `insert to app mempool` - Added `astria_sequencer_actions_per_transaction_in_mempool` histogram - Added `astria_sequencer_transaction_in_mempool_size_bytes` histogram - Added `astria_sequencer_transactions_in_mempool_total` gauge - Enabled `cnidarium_get_raw_duration_seconds` histogram - Enabled `cnidarium_nonverifiable_get_raw_duration_seconds` histogram ## Related Issues Closes #1247.

Fraser999 requested a review from a team as a code owner June 18, 2024 12:42

Fraser999 requested a review from SuperFluffy June 18, 2024 12:42

github-actions bot added conductor pertaining to the astria-conductor crate sequencer pertaining to the astria-sequencer crate sequencer-relayer pertaining to the astria-sequencer-relayer crate composer pertaining to composer labels Jun 18, 2024

restrict metrics crate usage to astria-telemetry

661cd58

Fraser999 force-pushed the fraser/wrap-metrics branch from eee59d5 to 661cd58 Compare June 18, 2024 12:44

Fraser999 added 2 commits June 20, 2024 11:09

Merge remote-tracking branch 'upstream/main' into fraser/wrap-metrics

130684d

Merge remote-tracking branch 'upstream/main' into fraser/wrap-metrics

1e5414b

Fraser999 mentioned this pull request Jul 8, 2024

chore(sequencer)!: add metrics #1248

Merged

Merge remote-tracking branch 'upstream/main' into fraser/wrap-metrics

d035c4a

Fraser999 mentioned this pull request Jul 15, 2024

ENG-33 chore(sequencer): add cache for immutable check-tx results #1271

Closed

Fraser999 added 3 commits July 15, 2024 15:29

Merge remote-tracking branch 'upstream/main' into fraser/wrap-metrics

908025f

Merge remote-tracking branch 'upstream/main' into fraser/wrap-metrics

2015474

Merge remote-tracking branch 'upstream/main' into fraser/wrap-metrics

1932ae7

Merge remote-tracking branch 'upstream/main' into fraser/wrap-metrics

5b893e6

SuperFluffy approved these changes Aug 30, 2024

View reviewed changes

Fraser999 added 2 commits August 30, 2024 22:21

Merge remote-tracking branch 'upstream/main' into fraser/wrap-metrics

93e6bdf

minor renames and re-export

7432b9c

Fraser999 enabled auto-merge August 30, 2024 21:59

Fraser999 added this pull request to the merge queue Aug 30, 2024

Merged via the queue into main with commit f251316 Aug 30, 2024

Fraser999 deleted the fraser/wrap-metrics branch August 30, 2024 22:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(metrics): restrict `metrics` crate usage to `astria-telemetry`#1192

chore(metrics): restrict `metrics` crate usage to `astria-telemetry`#1192
Fraser999 merged 10 commits intomainfrom
fraser/wrap-metrics

Fraser999 commented Jun 18, 2024 •

edited by sync-by-unito bot

Loading

Uh oh!

SuperFluffy left a comment

Uh oh!

Uh oh!

Uh oh!

SuperFluffy Aug 30, 2024

Uh oh!

Fraser999 Aug 30, 2024

Uh oh!

Fraser999 Aug 30, 2024

Uh oh!

Uh oh!

SuperFluffy Aug 30, 2024

Uh oh!

Fraser999 Aug 30, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		metrics::set_global_recorder(registering_builder.recorder)
		.map_err(\|_\| Error::GlobalMetricsRecorderAlreadySet)?;

Conversation

Fraser999 commented Jun 18, 2024 • edited by sync-by-unito bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Background

Changes

Testing

Related Issues

Uh oh!

SuperFluffy left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

SuperFluffy Aug 30, 2024

Choose a reason for hiding this comment

Uh oh!

Fraser999 Aug 30, 2024

Choose a reason for hiding this comment

Uh oh!

Fraser999 Aug 30, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

SuperFluffy Aug 30, 2024

Choose a reason for hiding this comment

Uh oh!

Fraser999 Aug 30, 2024

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fraser999 commented Jun 18, 2024 •

edited by sync-by-unito bot

Loading