feat: implement metric backend#13

Merged
StarpTech merged 19 commits into main from dustin/metric_backend on Aug 28, 2023

Conversation

@StarpTech StarpTech commented Aug 26, 2023

Motivation and Context

This PR instruments the Router with OpenTelemetry Metrics. All metrics are exposed through the PrometheusExporter at http://127.0.0.1:8088/metrics (configurable) and are also sent to the otelcollector metrics endpoint.

The PR introduces built-in support for Prometheus. In addition to the default Go and process metrics, we export R.E.D. (rate, errors, duration) metrics for incoming GraphQL traffic:

  • router_http_requests_total: Total count of incoming requests
  • router_http_response_content_length_total: Total bytes of outgoing responses
  • router_http_request_content_length_total: Total bytes of incoming requests
  • router_http_duration_milliseconds: End-to-end duration of incoming requests in milliseconds (histogram)
  • router_http_in_flight_requests: Number of in-flight requests

All metrics are tracked along the following dimensions:

  • operation_name
  • operation_type
  • status_code
  • federated_graph_name
  • config_version
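To make the metric list concrete, here is a minimal sketch of what such instrumentation measures, written as a standard `net/http` middleware. It is not the code in this PR: it keeps values in memory with the standard library instead of using OpenTelemetry, it tracks only two of the dimensions, and the `recorder`, `labels`, `series`, `statusWriter` and `demo` names as well as the `X-Operation-Name` header are hypothetical and exist only for illustration.

```go
package main

import (
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
	"sync/atomic"
	"time"
)

// labels is a hypothetical subset of the dimensions listed above.
type labels struct {
	OperationName string
	StatusCode    int
}

// series holds the values recorded for one label combination.
type series struct {
	Requests      int64     // cf. router_http_requests_total
	RequestBytes  int64     // cf. router_http_request_content_length_total
	ResponseBytes int64     // cf. router_http_response_content_length_total
	DurationsMs   []float64 // cf. router_http_duration_milliseconds
}

// recorder is an in-memory stand-in for a real metrics backend.
type recorder struct {
	inFlight atomic.Int64 // cf. router_http_in_flight_requests
	mu       sync.Mutex
	data     map[labels]*series
}

// statusWriter captures the status code and the number of bytes written.
type statusWriter struct {
	http.ResponseWriter
	status int
	bytes  int64
}

func (w *statusWriter) WriteHeader(code int) {
	w.status = code
	w.ResponseWriter.WriteHeader(code)
}

func (w *statusWriter) Write(p []byte) (int, error) {
	n, err := w.ResponseWriter.Write(p)
	w.bytes += int64(n)
	return n, err
}

// Middleware wraps next and records one observation per request.
func (r *recorder) Middleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		r.inFlight.Add(1)
		defer r.inFlight.Add(-1)

		start := time.Now()
		sw := &statusWriter{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(sw, req)

		// Assumption: the operation name arrives in a header. A real
		// router would take it from the parsed GraphQL operation.
		key := labels{OperationName: req.Header.Get("X-Operation-Name"), StatusCode: sw.status}

		r.mu.Lock()
		defer r.mu.Unlock()
		s := r.data[key]
		if s == nil {
			s = &series{}
			r.data[key] = s
		}
		s.Requests++
		if req.ContentLength > 0 {
			s.RequestBytes += req.ContentLength
		}
		s.ResponseBytes += sw.bytes
		s.DurationsMs = append(s.DurationsMs, float64(time.Since(start).Microseconds())/1000)
	})
}

// demo sends one request through the middleware and returns what was recorded.
func demo() (series, int64) {
	rec := &recorder{data: map[labels]*series{}}
	h := rec.Middleware(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		io.Copy(io.Discard, req.Body)
		io.WriteString(w, `{"data":{}}`)
	}))

	req := httptest.NewRequest(http.MethodPost, "/graphql", strings.NewReader(`{"query":"query Hello { hello }"}`))
	req.Header.Set("X-Operation-Name", "Hello")
	h.ServeHTTP(httptest.NewRecorder(), req)

	return *rec.data[labels{"Hello", http.StatusOK}], rec.inFlight.Load()
}
```

Because the middleware has the plain `func(http.Handler) http.Handler` shape, it can be composed with any router or handler chain that accepts standard library handlers.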

This enables you to answer the following questions:

  • What is the error/success rate of my router or a specific operation?
  • How is the performance of my router or a specific operation?
  • What is the average request/response size of a specific operation?
  • How much traffic went through a router instance?
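As an illustration of how the first and third questions map onto the counters, the sketch below derives an error rate and an average response size from per-status-code counter values. The `sample` type and both helper functions are hypothetical, and treating only 5xx status codes as errors is an assumption. In practice this arithmetic would more likely be expressed as a query against Prometheus than in Go.

```go
package main

// sample is a hypothetical snapshot of counter values for one status_code.
type sample struct {
	StatusCode    int
	Requests      float64 // value of router_http_requests_total
	ResponseBytes float64 // value of router_http_response_content_length_total
}

// errorRate returns the share of requests that ended with a 5xx status.
// Assumption: only 5xx counts as an error; adjust the threshold as needed.
func errorRate(samples []sample) float64 {
	var total, failed float64
	for _, s := range samples {
		total += s.Requests
		if s.StatusCode >= 500 {
			failed += s.Requests
		}
	}
	if total == 0 {
		return 0
	}
	return failed / total
}

// avgResponseBytes returns the mean response size across all samples.
func avgResponseBytes(samples []sample) float64 {
	var requests, bytes float64
	for _, s := range samples {
		requests += s.Requests
		bytes += s.ResponseBytes
	}
	if requests == 0 {
		return 0
	}
	return bytes / requests
}
```

Filtering the samples by `operation_name` before calling these helpers would answer the same questions for a single operation.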

During this work, I found dealing with net/http handlers and custom gin middleware very cumbersome to manage. I used the time to migrate to chi, which is 100% compatible with the standard library. This is also an important step towards custom middleware support in the router. I also refactored the GraphQL handlers to make them easier to test.

TODO

  • Test Helm Charts

@StarpTech StarpTech marked this pull request as ready for review August 26, 2023 17:11
@StarpTech StarpTech merged commit 4c0a790 into main Aug 28, 2023
jensneuse added a commit that referenced this pull request Apr 24, 2026
- #13: start the race-test deadline after workers are scheduled so -race +
  parallel tests don't execute near-zero iterations. Applied in all three
  race tests (TestArticleStoreNoRace, TestListingStoreNoRace,
  TestResolverPathNoRace)
- #14: nil-guard location.Address in the cachegraph Venue query resolver;
  returns "location.address is required" instead of panicking on nil deref
- #15: parameterize `make demo` startup wait via DEMO_STARTUP_ATTEMPTS
  (default 60) and DEMO_STARTUP_SLEEP (default 0.5). Previous hard-coded
  20 × 0.5s was too short on cold checkouts

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>