diff --git a/Index.md b/Index.md
index 7e4cf07..9db8b26 100644
--- a/Index.md
+++ b/Index.md
@@ -30,7 +30,7 @@ A Kubernetes-native Go microservice framework for building production-grade gRPC
 | **Distributed Tracing** | [OpenTelemetry] and [New Relic] support with automatic span creation in interceptors — traces can be sent to any OTLP-compatible backend including [Jaeger] |
 | **Prometheus Metrics** | Built-in request latency, error rate, and circuit breaker metrics at `/metrics` |
 | **Error Tracking** | Stack traces, gRPC status codes, and async notification to [Sentry], Rollbar, or Airbrake |
-| **Resilience** | Client-side circuit breaking and retries via interceptors |
+| **Rate Limiting** | Per-pod token bucket rate limiter — disabled by default, pluggable via custom [`ratelimit.Limiter`](https://pkg.go.dev/github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/ratelimit#Limiter) interface for distributed or per-tenant rate limiting. Config: `RATE_LIMIT_PER_SECOND`. See [interceptors howto](/howto/interceptors#rate-limiting) |
 | **Fast Serialization** | [vtprotobuf] codec enabled by default — faster gRPC marshalling with automatic fallback to standard protobuf |
 | **Kubernetes-native** | Health/ready probes, graceful SIGTERM shutdown, structured JSON logs, Prometheus metrics — all wired automatically |
 | **Swagger / OpenAPI** | Interactive API docs auto-served at `/swagger/` from your protobuf definitions |
diff --git a/architecture.md b/architecture.md
index d2e9488..2ac0a1d 100644
--- a/architecture.md
+++ b/architecture.md
@@ -152,13 +152,16 @@ When a request arrives at a ColdBrew service, it flows through several layers:
 │  ┌──────────────────────────────────────────┐  │
 │  │         Server Interceptor Chain         │  │
 │  │                                          │  │
-│  │   1. Response Time Logging               │  │
-│  │   2. Trace ID Injection                  │  │
-│  │   3. Proto Validate                      │  │
-│  │   4. Prometheus Metrics                  │  │
-│  │   5. Error Notification (Sentry/Rollbar) │  │
-│  │   6. New Relic Transaction               │  │
-│  │   7. Panic Recovery                      │  │
+│  │   1. Default Timeout (60s deadline)      │  │
+│  │   2. Rate Limiting (token bucket)        │  │
+│  │   3. Response Time Logging               │  │
+│  │   4. Trace ID Injection                  │  │
+│  │   5. Debug Log (per-request level)       │  │
+│  │   6. Proto Validate                      │  │
+│  │   7. Prometheus Metrics                  │  │
+│  │   8. Error Notification (Sentry/Rollbar) │  │
+│  │   9. New Relic Transaction               │  │
+│  │  10. Panic Recovery                      │  │
 │  │   (OTEL tracing via gRPC stats handler)  │  │
 │  │                                          │  │
 │  └────────────────────┬─────────────────────┘  │
@@ -199,13 +202,16 @@ Interceptors are gRPC middleware that run on every request. ColdBrew chains them
 
 | Order | Interceptor | Package | What It Does |
 |-------|------------|---------|--------------|
-| 1 | Response Time Logging | `interceptors` | Logs method name, duration, and status code |
-| 2 | Trace ID | `interceptors` | Generates a trace ID (or reads it from the `x-trace-id` HTTP header or a `trace_id` proto field) and propagates it to structured logs, Sentry/Rollbar error reports, and OpenTelemetry spans (as the `coldbrew.trace_id` attribute) |
-| 3 | Proto Validate | `interceptors` | Validates incoming messages using [protovalidate](https://github.com/bufbuild/protovalidate) annotations. Returns `InvalidArgument` on failure. Disable with `DISABLE_PROTO_VALIDATE` |
-| 4 | Prometheus | `interceptors` | Records request count, latency histogram, and status codes |
-| 5 | Error Notification | `interceptors` | Sends errors to Sentry/Rollbar/Airbrake asynchronously |
-| 6 | New Relic | `interceptors` | Creates a New Relic transaction for APM |
-| 7 | Panic Recovery | `interceptors` | Catches panics and converts them to gRPC errors |
+| 1 | Default Timeout | `interceptors` | Applies a 60s deadline to unary RPCs without one. Prevents resource exhaustion from clients that don't set deadlines. Config: `GRPC_SERVER_DEFAULT_TIMEOUT_IN_SECONDS` |
+| 2 | Rate Limiting | `interceptors` | Per-pod token bucket rate limiter. Returns `ResourceExhausted` when exceeded. Disabled by default. Config: `RATE_LIMIT_PER_SECOND`, `RATE_LIMIT_BURST` |
+| 3 | Response Time Logging | `interceptors` | Logs method name, duration, and status code |
+| 4 | Trace ID | `interceptors` | Generates a trace ID (or reads it from the `x-trace-id` HTTP header or a `trace_id` proto field) and propagates it to structured logs, Sentry/Rollbar error reports, and OpenTelemetry spans (as the `coldbrew.trace_id` attribute) |
+| 5 | Debug Log | `interceptors` | Enables per-request log level override via `bool debug` or `bool enable_debug` proto field, or `x-debug-log-level` metadata header. Config: `DISABLE_DEBUG_LOG_INTERCEPTOR`, `DEBUG_LOG_HEADER_NAME` |
+| 6 | Proto Validate | `interceptors` | Validates incoming messages using [protovalidate](https://github.com/bufbuild/protovalidate) annotations. Returns `InvalidArgument` on failure. Config: `DISABLE_PROTO_VALIDATE` |
+| 7 | Prometheus | `interceptors` | Records request count, latency histogram, and status codes |
+| 8 | Error Notification | `interceptors` | Sends errors to Sentry/Rollbar/Airbrake asynchronously |
+| 9 | New Relic | `interceptors` | Creates a New Relic transaction for APM |
+| 10 | Panic Recovery | `interceptors` | Catches panics and converts them to gRPC errors |
 
 {: .note }
 OpenTelemetry tracing spans are created by the `otelgrpc` stats handler configured at the gRPC server/client level, not as an interceptor in the chain.
diff --git a/config-reference.md b/config-reference.md
index 0296393..ba598dc 100644
--- a/config-reference.md
+++ b/config-reference.md
@@ -55,6 +55,9 @@ cfg := config.GetColdBrewConfig()
 | `DISABLE_DEBUG_LOG_INTERCEPTOR` | bool | `false` | Disable the DebugLogInterceptor. When disabled, proto `debug`/`enable_debug` fields and `x-debug-log-level` headers will not trigger per-request debug logging |
 | `DEBUG_LOG_HEADER_NAME` | string | `x-debug-log-level` | gRPC metadata / HTTP header name for per-request debug logging. The header value should be a valid log level (`debug`, `info`, `warn`, `error`). See [Log How-To](/howto/Log/#production-debugging-with-overrideloglevel--trace-id) |
 | `GRPC_SERVER_DEFAULT_TIMEOUT_IN_SECONDS` | int | `60` | Default timeout for incoming unary gRPC requests without a deadline. Set to `0` to disable. Does not apply to stream RPCs |
+| `RATE_LIMIT_PER_SECOND` | float64 | `0` | Maximum incoming requests per second for this pod (per-pod in-memory token bucket). Set to `0` to disable (default). With N pods, effective cluster-wide limit is N × this value. For distributed rate limiting, use `interceptors.SetRateLimiter()` with a custom implementation |
+| `RATE_LIMIT_BURST` | int | `1` | Maximum burst size for the token bucket rate limiter. Only takes effect when `RATE_LIMIT_PER_SECOND > 0` |
+| `DISABLE_RATE_LIMIT` | bool | `false` | Disable the rate limiting interceptor entirely |
 
 ## gRPC TLS
 
diff --git a/howto/interceptors.md b/howto/interceptors.md
index 4642ee6..df1a54e 100644
--- a/howto/interceptors.md
+++ b/howto/interceptors.md
@@ -171,6 +171,96 @@ func init() {
 
 Set `DISABLE_PROTO_VALIDATE=true` to skip validation entirely.
 
+## Rate limiting
+
+ColdBrew includes a built-in per-pod token bucket rate limiter. It is **disabled by default** and must be explicitly enabled.
+
+### Enabling via environment variables
+
+```yaml
+env:
+  - name: RATE_LIMIT_PER_SECOND
+    value: "100"  # 100 requests per second per pod
+  - name: RATE_LIMIT_BURST
+    value: "50"   # allow bursts up to 50
+```
+
+{: .important }
+This is a **per-pod in-memory limit**. With N pods, the effective cluster-wide limit is N × `RATE_LIMIT_PER_SECOND`. For cluster-wide rate limiting, use a custom limiter (see below) or your load balancer.
+
+When a request exceeds the rate limit, the interceptor returns a `ResourceExhausted` gRPC status code.
+
+### Custom per-API rate limiter
+
+For different rate limits per API method, implement the [`ratelimit.Limiter`](https://pkg.go.dev/github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/ratelimit#Limiter) interface and register it during initialization:
+
+```go
+import (
+	"context"
+	"fmt"
+
+	"github.com/go-coldbrew/interceptors"
+	ratelimit "github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/ratelimit"
+	"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
+	"golang.org/x/time/rate"
+	"google.golang.org/grpc"
+)
+
+// Compile-time check that perMethodLimiter implements the interface.
+var _ ratelimit.Limiter = (*perMethodLimiter)(nil)
+
+type perMethodLimiter struct {
+	limiters map[string]*rate.Limiter
+	fallback *rate.Limiter
+}
+
+func (l *perMethodLimiter) Limit(ctx context.Context) error {
+	// grpc.Method works for native gRPC calls;
+	// runtime.RPCMethod works for HTTP→gRPC via grpc-gateway
+	method, ok := grpc.Method(ctx)
+	if !ok {
+		method, ok = runtime.RPCMethod(ctx)
+	}
+	if !ok {
+		method = "unknown"
+	}
+	limiter, found := l.limiters[method]
+	if !found {
+		limiter = l.fallback
+	}
+	if !limiter.Allow() {
+		return fmt.Errorf("rate limit exceeded for %s", method)
+	}
+	return nil
+}
+
+func init() {
+	interceptors.SetRateLimiter(&perMethodLimiter{
+		limiters: map[string]*rate.Limiter{
+			"/myservice.v1.UserService/CreateUser": rate.NewLimiter(10, 5),  // 10 rps
+			"/myservice.v1.UserService/ListUsers":  rate.NewLimiter(100, 50), // 100 rps
+		},
+		fallback: rate.NewLimiter(50, 25), // 50 rps default
+	})
+}
+```
+
+### Distributed rate limiting
+
+For rate limiting across pods or per-tenant, implement `ratelimit.Limiter` with a distributed backend. Libraries that can back a custom `ratelimit.Limiter` implementation:
+
+| Library | Backend | Notes |
+|---------|---------|-------|
+| [mennanov/limiters](https://github.com/mennanov/limiters) | Redis, etcd, DynamoDB, memory | Most flexible — has explicit gRPC example, multiple algorithms |
+| [go-redis/redis_rate](https://github.com/go-redis/redis_rate) | Redis | GCRA algorithm, good if you already use go-redis (last release 2023 — check for activity) |
+| [sethvargo/go-limiter](https://github.com/sethvargo/go-limiter) | Redis, memory | Clean API, actively maintained |
+
+For large-scale multi-service rate limiting, consider a dedicated rate limiting service like [gubernator](https://github.com/gubernator-io/gubernator) (peer-to-peer, no Redis) or [Envoy ratelimit](https://github.com/envoyproxy/ratelimit) (Redis-backed).
+
+### Disabling
+
+Set `DISABLE_RATE_LIMIT=true` to remove the rate limiting interceptor from the chain entirely.
+
 ## Adding custom interceptors to Default interceptors
 
 You can add your own interceptors to the [Default Interceptors] by appending to the list of interceptors.
diff --git a/howto/production.md b/howto/production.md
index 0205c6b..e467824 100644
--- a/howto/production.md
+++ b/howto/production.md
@@ -473,6 +473,11 @@ env:
   # Never use debug level on public services — may log request payloads
   - name: LOG_LEVEL
     value: "info"
+  # Rate limit incoming requests (per-pod). Adjust to your service's capacity.
+  - name: RATE_LIMIT_PER_SECOND
+    value: "1000"
+  - name: RATE_LIMIT_BURST
+    value: "50"
   # GRPC_MAX_SEND_MSG_SIZE limits response size FROM your service (default ~2GB).
   # GRPC_MAX_RECV_MSG_SIZE limits request size TO your service (default 4MB).
   # Consider reducing send size for public APIs; use streaming for large payloads.
@@ -531,7 +536,7 @@ These are your responsibility to handle at the infrastructure level:
 
 - **CORS** — ColdBrew does not handle CORS headers. Use a reverse proxy (Nginx, Envoy, Istio) or add CORS middleware to the HTTP gateway.
 - **Authentication/authorization** — Admin endpoints (`/debug/pprof`, `/metrics`, `/swagger`) have no built-in auth. Disable them for public services or restrict access at the load balancer.
-- **Rate limiting** — No built-in rate limiting on any endpoint. Use your load balancer, service mesh, or a rate-limiting proxy.
+- **Cluster-wide rate limiting** — Built-in rate limiting (`RATE_LIMIT_PER_SECOND`) is per-pod only. For cluster-wide or per-tenant rate limiting, use `interceptors.SetRateLimiter()` with a custom implementation or your load balancer. See [Interceptors How-To](/howto/interceptors#rate-limiting).
 - **HTTP header forwarding** — `HTTP_HEADER_PREFIXES` forwards matching HTTP headers to gRPC metadata. Never add `authorization`, `cookie`, or `x-api-key` prefixes unless you are intentionally doing header-based gRPC auth.
 
 ## Production checklist
@@ -557,6 +562,7 @@ These are your responsibility to handle at the infrastructure level:
 - [ ] `DISABLE_SWAGGER=true` — disable API documentation
 - [ ] `DISABLE_GRPC_REFLECTION=true` — disable service discovery
 - [ ] `DISABLE_DEBUG_LOG_INTERCEPTOR=true` — disable header-based debug logging
+- [ ] Enable rate limiting — `RATE_LIMIT_PER_SECOND` + `RATE_LIMIT_BURST` (per-pod, adjust to capacity). See [interceptors howto](/howto/interceptors#rate-limiting)
 - [ ] Consider reducing `GRPC_MAX_SEND_MSG_SIZE` from its ~2GB default if responses are small
 - [ ] Restrict `/metrics` access at the load balancer
 - [ ] `LOG_LEVEL=info` or higher (never `debug`)