
feat: circuit breaker implementation#1929

Merged
SkArchon merged 65 commits into main from milinda/eng-7100-simple-circuit-breaker-for-subgraph-requests
Jul 7, 2025

Conversation


@SkArchon SkArchon commented Jun 3, 2025

Motivation and Context

This PR adds circuit breakers to the router. Circuit breakers are off by default, and so are their metrics.
Subgraphs are grouped not by name or ID but by routing URL, and one circuit breaker is created per unique routing URL. The full URL is used, which means that

subgraph1: host.com/url1
subgraph2: host.com/url1
subgraph3: host.com/url2

would result in two circuit breakers, since the entire URL is considered and subgraph1 and subgraph2 share the same one.

Users can, however, force subgraph1 and subgraph2 to use two separate circuit breakers by specifying circuit breaker configurations at the per-subgraph level.
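The grouping rule described above can be sketched roughly as follows. This is a minimal illustration, not the router's actual code: the `Subgraph` type, the `HasOverride` field, and the key prefixes are hypothetical names introduced only for this example.

```go
package main

import "fmt"

// Subgraph is a hypothetical, simplified view of a subgraph for this sketch.
type Subgraph struct {
	Name        string
	RoutingURL  string
	HasOverride bool // true when a per-subgraph circuit_breaker block exists
}

// breakerKey returns the key under which a subgraph's circuit breaker is stored.
// Subgraphs without an override share one breaker per full routing URL;
// a subgraph with an override gets a breaker of its own.
func breakerKey(s Subgraph) string {
	if s.HasOverride {
		return "subgraph:" + s.Name
	}
	return "url:" + s.RoutingURL
}

// groupByBreaker maps each breaker key to the subgraph names that share it.
func groupByBreaker(subgraphs []Subgraph) map[string][]string {
	groups := make(map[string][]string)
	for _, s := range subgraphs {
		key := breakerKey(s)
		groups[key] = append(groups[key], s.Name)
	}
	return groups
}

func main() {
	groups := groupByBreaker([]Subgraph{
		{Name: "subgraph1", RoutingURL: "host.com/url1"},
		{Name: "subgraph2", RoutingURL: "host.com/url1"},
		{Name: "subgraph3", RoutingURL: "host.com/url2"},
	})
	fmt.Println(len(groups)) // 2: one breaker per unique routing URL
}
```

Setting `HasOverride` on subgraph1 and subgraph2 in this sketch would give each of them its own key, which mirrors the per-subgraph behaviour described in the PR.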

Example configuration

Global config only

traffic_shaping: 
  all: 
    circuit_breaker:
      enabled: true
      num_buckets: 100
      ...

Disable the circuit breaker for a single subgraph (subgraph5)

traffic_shaping:
  all:
    circuit_breaker:
      enabled: true
      ...
  subgraphs:
    subgraph5:
      circuit_breaker:
        enabled: false
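
The precedence shown in this configuration (a per-subgraph `enabled` value wins over the global `all` value) can be sketched as below. This is an assumption-laden illustration: `BreakerOverride` and `effectiveEnabled` are hypothetical names, and only the `enabled` flag is modelled, since the PR description does not say how other fields are merged.

```go
package main

import "fmt"

// BreakerOverride is a hypothetical per-subgraph setting.
// Enabled is a pointer so that "not set" (nil) can be told apart from false.
type BreakerOverride struct {
	Enabled *bool
}

// effectiveEnabled reports whether a breaker applies to the named subgraph:
// a per-subgraph value wins, otherwise the global "all" value is used.
func effectiveEnabled(globalEnabled bool, overrides map[string]BreakerOverride, name string) bool {
	if o, ok := overrides[name]; ok && o.Enabled != nil {
		return *o.Enabled
	}
	return globalEnabled
}

func main() {
	off := false
	overrides := map[string]BreakerOverride{"subgraph5": {Enabled: &off}}
	fmt.Println(effectiveEnabled(true, overrides, "subgraph5")) // false
	fmt.Println(effectiveEnabled(true, overrides, "subgraph1")) // true
}
```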

Subgraph overrides
Even when the routing URL is the same, this results in separate circuit breakers being created for subgraph1 and subgraph2.

traffic_shaping:
  all:
    circuit_breaker:
      enabled: true
  subgraphs:
    subgraph1:
      circuit_breaker:
        enabled: true
    subgraph2:
      circuit_breaker:
        enabled: true

Options Included

This PR includes a configurable circuit breaker with the following options:

  • error_threshold_percentage: The percentage of failed requests that must be reached to trip (open) the circuit breaker.
  • request_threshold: The minimum number of requests required before the circuit breaker starts deciding whether it should open.
  • sleep_window: The period after the circuit breaker has tripped during which it does not accept any requests.
  • half_open_attempts: How many failed attempts are allowed after the sleep window before requests are blocked again.
  • required_successful: How many successful attempts are required to move the circuit breaker from half-open to closed.
  • rolling_duration: The duration of the sliding window.
  • num_buckets: How many buckets are allocated for the sliding window duration. Requests are recorded into buckets, which are evicted as the window rolls forward.
  • execution_timeout: How long the circuit breaker waits before timing out a request. Timeouts are counted as errors.
  • max_concurrent_requests: The maximum number of concurrent requests allowed through the circuit breaker. A value of -1 disables this limit.

In addition, the PR introduces the following metrics:

  • router.circuit_breaker.state: The state of the circuit breaker, sliced by feature flag and subgraph.
  • router.circuit_breaker.short_circuits: The number of requests that were not executed because the circuit breaker was open, sliced by feature flag and subgraph.
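
As a rough illustration of what "sliced by feature flag and subgraph" means for the short-circuit counter, the sketch below keeps one count per label combination. It uses a plain in-memory map rather than the OpenTelemetry or Prometheus APIs; `labels` and `shortCircuitCounter` are hypothetical names, and the type is not safe for concurrent use.

```go
package main

import "fmt"

// labels identifies one metric series; the two dimensions come from the PR description.
type labels struct {
	FeatureFlag string
	Subgraph    string
}

// shortCircuitCounter is an in-memory stand-in for a counter metric.
type shortCircuitCounter struct {
	counts map[labels]int64
}

func newShortCircuitCounter() *shortCircuitCounter {
	return &shortCircuitCounter{counts: make(map[labels]int64)}
}

// Inc records one request that was rejected by an open circuit breaker.
func (c *shortCircuitCounter) Inc(featureFlag, subgraph string) {
	c.counts[labels{FeatureFlag: featureFlag, Subgraph: subgraph}]++
}

func main() {
	c := newShortCircuitCounter()
	c.Inc("", "subgraph1")
	c.Inc("", "subgraph1")
	c.Inc("myff", "subgraph1")
	fmt.Println(c.counts[labels{Subgraph: "subgraph1"}])                      // 2
	fmt.Println(c.counts[labels{FeatureFlag: "myff", Subgraph: "subgraph1"}]) // 1
}
```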

Docs PR: https://github.com/wundergraph/cosmo-docs/pull/104/files

Checklist

Summary by CodeRabbit

  • New Features

    • Introduced configurable circuit breaker support for subgraph requests, allowing granular control over error thresholds, request limits, and recovery behavior.
    • Added metrics and observability for circuit breaker state and short-circuit events, with support for OpenTelemetry and Prometheus.
    • Enabled per-subgraph circuit breaker configuration via new configuration options and schema.
    • Integrated circuit breaker logic into the request pipeline and HTTP transport for improved resilience.
    • Added circuit breaker management supporting grouping by routing URLs and feature flag scoping.
  • Tests

    • Added comprehensive integration tests to validate circuit breaker behavior, state transitions, feature flag handling, and metrics emission.
    • Added configuration schema validation tests for circuit breaker settings.
    • Added unit tests for circuit breaker manager and routing URL grouping logic.
  • Documentation

    • Updated configuration schema and example files to illustrate new circuit breaker settings.

@github-actions github-actions Bot added the router label Jun 3, 2025
github-actions Bot commented Jun 3, 2025

Router image scan passed

✅ No security vulnerabilities found in image:

ghcr.io/wundergraph/cosmo/router:sha-5e078dde203921be1aa911275854af04b735409b

@SkArchon SkArchon force-pushed the milinda/eng-7100-simple-circuit-breaker-for-subgraph-requests branch from 859fdd5 to 6c1a657 June 17, 2025 18:16
Comment thread router/internal/traceclient/traceclient.go Outdated
@SkArchon SkArchon changed the title feat: circuit [WIP] feat: circuit breaker implementation Jun 17, 2025
@SkArchon SkArchon marked this pull request as ready for review June 17, 2025 18:57
Comment thread router/internal/circuit/breaker.go Outdated
Comment thread router/pkg/metric/circuitbreaker.go
Comment thread router-tests/circuit_breaker_test.go Outdated

SkArchon commented Jul 7, 2025

@coderabbitai resolve


coderabbitai Bot commented Jul 7, 2025

✅ Actions performed

Comments resolved.

@SkArchon SkArchon merged commit c63f83b into main Jul 7, 2025
43 of 46 checks passed
@SkArchon SkArchon deleted the milinda/eng-7100-simple-circuit-breaker-for-subgraph-requests branch July 7, 2025 13:53
yuzoonc1 pushed a commit to yuzoonc1/cosmo that referenced this pull request Jul 9, 2025
Co-authored-by: Dustin Deus <deusdustin@gmail.com>
@coderabbitai coderabbitai Bot mentioned this pull request Aug 17, 2025
5 tasks
@coderabbitai coderabbitai Bot mentioned this pull request Feb 9, 2026
3 participants