Wire up Vert.x metrics in fetcher docker with Prometheus Otel reader #800
Conversation
Walkthrough

The updates enhance metrics configuration by supporting multiple metrics readers (OTLP HTTP and Prometheus) in the fetcher startup script and service launcher. HTTP/2 server settings in the fetcher service were also tuned to raise the default concurrent-stream limit.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant StartScript as fetcher/start.sh
    participant JavaApp as Java Process
    participant Launcher as ChrononServiceLauncher
    participant Metrics as Metrics System

    StartScript->>JavaApp: Pass METRICS_OPTS with reader type and endpoints
    JavaApp->>Launcher: Start service with VertxOptions
    Launcher->>Metrics: Get metrics reader type
    alt Reader = "http"
        Launcher->>Metrics: configureOtlpHttpMetrics()
    else Reader = "prometheus"
        Launcher->>Metrics: configurePrometheusMetrics()
    else
        Launcher->>Launcher: Log warning, disable metrics
    end
```
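To make the diagram concrete, here is a minimal Java sketch of the reader-type dispatch, using the method names that appear in the diagram (`configureOtlpHttpMetrics`, `configurePrometheusMetrics`). The property key `ai.chronon.metrics.reader` and the `"http"` default are placeholders for illustration, not the launcher's actual configuration keys.

```java
// Rough sketch of the dispatch shown in the diagram; not the repo's actual code.
// The property key below is a placeholder -- in the real code path the value
// comes from OtelMetricsReporter.getMetricsReader().
import io.vertx.core.VertxOptions;

public class MetricsReaderDispatchSketch {

    static void initializeMetrics(VertxOptions options) {
        String reader = System.getProperty("ai.chronon.metrics.reader", "http").toLowerCase();
        switch (reader) {
            case "http":
                configureOtlpHttpMetrics(options);   // push metrics to an OTLP/HTTP collector endpoint
                break;
            case "prometheus":
                configurePrometheusMetrics(options); // expose an embedded Prometheus scrape server
                break;
            default:
                System.err.println("Unknown metrics reader '" + reader + "'; Vert.x metrics disabled");
        }
    }

    static void configureOtlpHttpMetrics(VertxOptions options) { /* see review diffs below */ }

    static void configurePrometheusMetrics(VertxOptions options) { /* see review diffs below */ }
}
```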
Actionable comments posted: 0
🧹 Nitpick comments (5)
online/src/main/scala/ai/chronon/online/metrics/Metrics.scala (1)
131-140: Short-circuit on `metricsEnabled` to avoid needless reporter parsing

If `metricsEnabled` is `false`, we still parse `metricsReporter` and may throw for an unknown value even though metrics are off. Returning a noop client immediately is safer.

```diff
- val reporter: String = System.getProperty(MetricsReporter, "otel")
- reporter.toLowerCase match {
+ if (!metricsEnabled) return new OtelMetricsReporter(OpenTelemetry.noop())
+
+ val reporter: String = System.getProperty(MetricsReporter, "otel")
+ reporter.toLowerCase match {
```

service_commons/src/main/java/ai/chronon/service/ChrononServiceLauncher.java (2)
32-36: Follow Java constant naming conventions
`VertxPrometheusPort` / `DefaultVertxPrometheusPort` should be upper-snake (`VERTX_PROMETHEUS_PORT`) for consistency with the rest of the file.
92-110: Expose scrape path / customise host
`PrometheusHttpServer` defaults to `/metrics` on `0.0.0.0`, while the Vert.x embedded server uses `localhost`. Aligning host/port or making both configurable prevents head-scratching in prod.

Example:

```diff
- .setEmbeddedServerOptions(new HttpServerOptions().setPort(Integer.parseInt(prometheusPort)));
+ .setEmbeddedServerOptions(
+     new HttpServerOptions()
+         .setHost("0.0.0.0")
+         .setPort(Integer.parseInt(prometheusPort)));
```

online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala (2)
88-100: Consider renaming `MetricsReaderHttp` → `MetricsReaderOtlpHttp`

The constant describes an OTLP/HTTP reader, not generic HTTP. Explicit naming avoids confusion with the Prometheus HTTP server introduced below.
102-120: Duplicate port property names may confuse operators

We now have:

- `ai.chronon.metrics.exporter.port` (Prometheus `PrometheusHttpServer`)
- `ai.chronon.vertx.metrics.exporter.port` (Vert.x Prometheus)

A comment or unified prefix would help users remember which one controls which endpoint, for example as sketched below.
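A minimal sketch of reading the two properties side by side with clarifying comments; the default values `"8905"` and `"8906"` are invented for illustration and are not the repo's defaults.

```java
// Illustration only: shows which property drives which endpoint.
// The default port values here are made up, not the project's real defaults.
String chrononPromPort = System.getProperty("ai.chronon.metrics.exporter.port", "8905");
// -> OpenTelemetry PrometheusHttpServer: fetcher / core Chronon metrics

String vertxPromPort = System.getProperty("ai.chronon.vertx.metrics.exporter.port", "8906");
// -> Vert.x Micrometer embedded Prometheus server: Vert.x HTTP / event-loop metrics
```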
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro (Legacy)
📒 Files selected for processing (7)
- docker/fetcher/start.sh (2 hunks)
- maven_install.json (13 hunks)
- online/src/main/scala/ai/chronon/online/metrics/Metrics.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala (1 hunks)
- service_commons/BUILD.bazel (1 hunks)
- service_commons/src/main/java/ai/chronon/service/ChrononServiceLauncher.java (3 hunks)
- tools/build_rules/dependencies/maven_repository.bzl (1 hunks)
✅ Files skipped from review due to trivial changes (3)
- service_commons/BUILD.bazel
- tools/build_rules/dependencies/maven_repository.bzl
- maven_install.json
🚧 Files skipped from review as they are similar to previous changes (1)
- docker/fetcher/start.sh
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: service_tests
- GitHub Check: service_commons_tests
- GitHub Check: api_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: service_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: online_tests
- GitHub Check: online_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: flink_tests
- GitHub Check: api_tests
- GitHub Check: aggregator_tests
- GitHub Check: aggregator_tests
- GitHub Check: flink_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (1)
service_commons/src/main/java/ai/chronon/service/ChrononServiceLauncher.java (1)
50-59: Handle unknown reader more gracefully

When the reader value is unrecognised we log but silently disable Vert.x metrics, leaving `enableMetrics=true`. Consider either:
- Failing fast (invalid config), or
- Falling back to OTLP to avoid running without metrics unexpectedly.
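A hedged sketch of the fail-fast option; the helper name, exception type, and message are illustrative, not the project's actual behaviour.

```java
// Fail-fast variant: unknown reader values abort startup instead of silently
// running without Vert.x metrics. Illustrative sketch only.
static void validateMetricsReader(String reader) {
    if (!"http".equalsIgnoreCase(reader) && !"prometheus".equalsIgnoreCase(reader)) {
        throw new IllegalArgumentException(
            "Unsupported metrics reader '" + reader + "' (expected 'http' or 'prometheus')");
    }
}
```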
Force-pushed from a5ba23c to 2f42eb2 (Compare).
Actionable comments posted: 0
🧹 Nitpick comments (1)
service_commons/src/main/java/ai/chronon/service/ChrononServiceLauncher.java (1)
92-110: Prometheus metrics configuration

Correctly implements Prometheus metrics with an embedded server. Consider adding error handling for port parsing.

```diff
 private void configurePrometheusMetrics(VertxOptions options) {
     // Configure Prometheus using Micrometer's built-in registry
     String prometheusPort = System.getProperty(VertxPrometheusPort, DefaultVertxPrometheusPort);
+    int port;
+    try {
+        port = Integer.parseInt(prometheusPort);
+    } catch (NumberFormatException e) {
+        logger.warn("Invalid Prometheus port specified: {}. Using default: {}", prometheusPort, DefaultVertxPrometheusPort);
+        port = Integer.parseInt(DefaultVertxPrometheusPort);
+    }
     VertxPrometheusOptions promOptions = new VertxPrometheusOptions()
         .setEnabled(true)
         .setStartEmbeddedServer(true)
-        .setEmbeddedServerOptions(new HttpServerOptions().setPort(Integer.parseInt(prometheusPort)));
+        .setEmbeddedServerOptions(new HttpServerOptions().setPort(port));
+    logger.info("Configuring Prometheus metrics server on port {}", port);
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- docker/fetcher/start.sh (2 hunks)
- maven_install.json (13 hunks)
- online/src/main/scala/ai/chronon/online/metrics/Metrics.scala (1 hunks)
- online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala (1 hunks)
- service/src/main/java/ai/chronon/service/FetcherVerticle.java (2 hunks)
- service_commons/BUILD.bazel (1 hunks)
- service_commons/src/main/java/ai/chronon/service/ChrononServiceLauncher.java (3 hunks)
- tools/build_rules/dependencies/maven_repository.bzl (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- service_commons/BUILD.bazel
- tools/build_rules/dependencies/maven_repository.bzl
- service/src/main/java/ai/chronon/service/FetcherVerticle.java
- docker/fetcher/start.sh
- online/src/main/scala/ai/chronon/online/metrics/Metrics.scala
- maven_install.json
- online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala
🧰 Additional context used
🧬 Code Graph Analysis (1)
service_commons/src/main/java/ai/chronon/service/ChrononServiceLauncher.java (2)
online/src/main/scala/ai/chronon/online/metrics/Metrics.scala (1)
- Metrics (26-237)

online/src/main/scala/ai/chronon/online/metrics/OtelMetricsReporter.scala (4)

- OtelMetricsReporter (19-79)
- OtelMetricsReporter (81-151)
- getMetricsReader (98-100)
- getExporterUrl (94-96)
⏰ Context from checks skipped due to timeout of 90000ms (17)
- GitHub Check: service_tests
- GitHub Check: service_commons_tests
- GitHub Check: service_tests
- GitHub Check: online_tests
- GitHub Check: online_tests
- GitHub Check: flink_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: cloud_gcp_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: api_tests
- GitHub Check: cloud_aws_tests
- GitHub Check: flink_tests
- GitHub Check: api_tests
- GitHub Check: aggregator_tests
- GitHub Check: scala_compile_fmt_fix
- GitHub Check: aggregator_tests
- GitHub Check: enforce_triggered_workflows
🔇 Additional comments (6)
service_commons/src/main/java/ai/chronon/service/ChrononServiceLauncher.java (6)
7-8: Appropriate imports added

Added necessary imports for Prometheus integration.

17-20: Logger imports and Prometheus options added

Required imports for the new functionality.

32-34: Port configuration constants

Constants properly define the system property and default port for Prometheus metrics.

35-36: Logger initialization

Standard logger setup for the class.

43-60: Metrics initialization refactoring

Proper implementation of reader type selection logic that aligns with OtelMetricsReporter.

62-90: OTLP HTTP metrics configuration extracted

Clean extraction of the existing logic into a dedicated method.
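For context, here is a minimal sketch of what an OTLP/HTTP Micrometer setup can look like using `micrometer-registry-otlp`. This is an assumption-laden illustration of the general technique, not the extracted `configureOtlpHttpMetrics` implementation from this PR, and the `exporterUrl` parameter is a stand-in for whatever `OtelMetricsReporter.getExporterUrl()` returns.

```java
import io.micrometer.core.instrument.Clock;
import io.micrometer.registry.otlp.OtlpConfig;
import io.micrometer.registry.otlp.OtlpMeterRegistry;
import io.vertx.core.VertxOptions;
import io.vertx.micrometer.MicrometerMetricsOptions;

public class OtlpHttpMetricsSketch {

    // Sketch only; the real launcher may wire this differently.
    static void configureOtlpHttpMetrics(VertxOptions options, String exporterUrl) {
        // Micrometer's OTLP registry periodically pushes metrics to the collector's HTTP endpoint.
        OtlpConfig otlpConfig = key -> "otlp.url".equals(key) ? exporterUrl + "/v1/metrics" : null;
        OtlpMeterRegistry registry = new OtlpMeterRegistry(otlpConfig, Clock.SYSTEM);

        options.setMetricsOptions(new MicrometerMetricsOptions()
                .setEnabled(true)
                .setJvmMetricsEnabled(true)
                .setMicrometerRegistry(registry));
    }
}
```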
Summary
Currently our fetcher Docker container assumes that folks wiring up their OTel metrics are using the HTTP reader. This doesn't work for some users, as they want to use Prometheus to scrape metrics. Because Vert.x's Micrometer integration exposes Prometheus through a set of interfaces and classes that differ a bit from the OTel Prometheus reader setup, we need to expose a Prometheus HTTP server for scraping Vert.x metrics. Thus, for the complete set of metrics, they'll be available across two ports (one for Vert.x and one for the fetcher / core Chronon). In a future iteration we can try a bridge where we spin up just one server and have Vert.x pipe to the fetcher's Prometheus server (or vice versa). I did explore this, and the work seemed a little involved, so I've punted on it for now.
Also added a couple of HTTP/2 settings to bump the concurrent-stream limit above Vert.x's default, as we noticed errors caused by it in a recent load test with an HTTP/2 client.
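For illustration, raising the concurrent-stream limit on a Vert.x HTTP server looks roughly like this; the value 512 is an arbitrary example, not the number used in this PR.

```java
import io.vertx.core.http.Http2Settings;
import io.vertx.core.http.HttpServerOptions;

public class Http2TuningSketch {

    // Example value only; the PR's actual limit may differ.
    static final long EXAMPLE_MAX_CONCURRENT_STREAMS = 512L;

    static HttpServerOptions withHigherConcurrentStreams(HttpServerOptions base) {
        // Vert.x applies these initial settings to each new HTTP/2 connection it accepts.
        return base.setInitialSettings(
                new Http2Settings().setMaxConcurrentStreams(EXAMPLE_MAX_CONCURRENT_STREAMS));
    }
}
```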
Checklist
Summary by CodeRabbit