-
Notifications
You must be signed in to change notification settings - Fork 810
feat(reexecute/c): decouple metrics server and collector #4415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The When running locally with
When running locally with
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR decouples the metrics server and collector functionality by replacing the boolean METRICS_ENABLED
parameter with a more granular METRICS_MODE
parameter. This allows users to run only the Prometheus server without the collector, enabling access to VM metrics while using their own monitoring stack.
- Introduces a new
metricsMode
type with three values:disabled
,server-only
, andfull
- Replaces
METRICS_ENABLED
withMETRICS_MODE
across all configuration files and scripts - Updates the
collectRegistry
function to conditionally start the collector based on the metrics mode
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
tests/reexecute/c/vm_reexecute_test.go | Implements the new metricsMode type with validation and updates function signatures |
tests/reexecute/c/README.md | Updates documentation to explain the three metrics mode options |
scripts/benchmark_cchain_range.sh | Changes flag from metrics-enabled to metrics-mode |
Taskfile.yml | Updates default value from "false" to "disabled" for the new metrics mode |
.github/actions/c-chain-reexecution-benchmark/action.yml | Sets CI to use "full" metrics mode instead of boolean true |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
func (m *metricsMode) Set(s string) error { | ||
s = strings.ToLower(strings.TrimSpace(s)) | ||
|
||
switch s { | ||
case "disabled": | ||
*m = MetricsDisabled | ||
case "server-only": | ||
*m = MetricsServerOnly | ||
case "full": | ||
*m = MetricsFull | ||
default: | ||
return fmt.Errorf("invalid metrics mode: %s (valid options: disabled, server-only, full)", s) | ||
} | ||
return nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we just use a string here and perhaps an alias type rather than implementing Set
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simplified the metricsMode
type here: a7cb056
if metricsMode.shouldStartServer() { | ||
collectRegistry(b, log, "c-chain-reexecution", prefixGatherer, labels, metricsMode.shouldStartCollector()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we clean this up a little bit? It seems odd that we decompose metricsMode
into two separate booleans and use one to handle the if condition here and the other half of it as an argument that gets passed in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaned up the logic for starting the metrics server vs starting the metrics server and collector here: 9e319d0
tests/reexecute/c/README.md
Outdated
- `METRICS_MODE=disabled`: no metrics are available. | ||
- `METRICS_MODE=server-only`: starts a Prometheus server exporting VM metrics. A | ||
link to the metrics endpoint is logged during execution. | ||
- `METRICS_MODE=full`: starts both a Prometheus server exporting VM metrics and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given these names are not very self-explanatory (it's not clear what server-only and full refer to without the descriptions), I think it would be better to simply configure the two separately and require that if grafana is enabled, then the prometheus server must be enabled as well.
It's fine imo for the metrics server to be enabled and grafana disabled by default since that won't require setting any extra credentials.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 5160738
If running locally, metrics collection can be customized via the following parameters: | ||
|
||
- `METRICS_SERVER_ENABLED`: starts a Prometheus server exporting VM metrics. | ||
- `METRICS_COLLECTOR_ENABLED`: starts a Prometheus collector (if enabled, then `METRICS_SERVER_ENABLED` must be enabled as well). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this just implicit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On it's own, I'm not opposed to METRICS_COLLECTOR_ENABLED=true
implicitly setting METRICS_SERVER_ENABLED=true
as well. However, considering that this PR will be followed up by #4418 (which adds the ability to configure a port for the metrics server), I think this becomes confusing (i.e. it isn't clear what happens if METRICS_COLLECTOR_ENABLED=true
and METRICS_PORT=X
without reading the description of METRICS_COLLECTOR_ENABLED
).
This could be fixed by renaming METRICS_COLLECTOR_ENABLED
to METRICS_SERVER_AND_COLLECTOR_ENABLED
, but this looks similar to a previous iteration of this PR which received this review comment: #4415 (comment)
Why this should be merged
As mentioned in #4362, splitting up the Prometheus server and collector is ideal for clients of Firewood who want access to VM metrics but would prefer to use their own monitoring stack.
Although #4362 also discusses the option of hardcoding the metrics server port, I've opted to split this out into a separate PR.
How this works
Changes the
METRICS_ENABLED
parameter toMETRICS_MODE
, which has the following three options:disabled
: does not start both the Prometheus server and collectorserver-only
: starts only the Prometheus serverfull
: starts both the Prometheus server and collectorI've opted for keeping a single environment variable rather than splitting
METRICS_ENABLED
since there should never be a case for starting the Prometheus collector but not the Prometheus server. In either theserver-only
orfull
cases, a metrics endpoint or Grafana URL is printed out, respectively.How this was tested
CI
Need to be documented in RELEASES.md?
No