go-coldbrew · ankurs · Apr 9, 2026
diff --git a/Index.md b/Index.md
@@ -30,7 +30,7 @@ A Kubernetes-native Go microservice framework for building production-grade gRPC
 | **Distributed Tracing** | [OpenTelemetry] and [New Relic] support with automatic span creation in interceptors — traces can be sent to any OTLP-compatible backend including [Jaeger] |
 | **Prometheus Metrics** | Built-in request latency, error rate, and circuit breaker metrics at `/metrics` |
 | **Error Tracking** | Stack traces, gRPC status codes, and async notification to [Sentry], Rollbar, or Airbrake |
-| **Resilience** | Client-side circuit breaking and retries via interceptors |
+| **Rate Limiting** | Per-pod token bucket rate limiter, disabled by default. Pluggable via a custom [`ratelimit.Limiter`](https://pkg.go.dev/github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/ratelimit#Limiter) implementation for distributed or per-tenant rate limiting. Config: `RATE_LIMIT_PER_SECOND`, `RATE_LIMIT_BURST`. See [interceptors howto](/howto/interceptors#rate-limiting) |
 | **Fast Serialization** | [vtprotobuf] codec enabled by default — faster gRPC marshalling with automatic fallback to standard protobuf |
 | **Kubernetes-native** | Health/ready probes, graceful SIGTERM shutdown, structured JSON logs, Prometheus metrics — all wired automatically |
 | **Swagger / OpenAPI** | Interactive API docs auto-served at `/swagger/` from your protobuf definitions |

diff --git a/architecture.md b/architecture.md
@@ -152,13 +152,16 @@ When a request arrives at a ColdBrew service, it flows through several layers:
   │  ┌──────────────────────────────────────────┐   │
   │  │          Server Interceptor Chain         │   │
   │  │                                           │   │
-  │  │  1. Response Time Logging                 │   │
-  │  │  2. Trace ID Injection                    │   │
-  │  │  3. Proto Validate                        │   │
-  │  │  4. Prometheus Metrics                    │   │
-  │  │  5. Error Notification (Sentry/Rollbar)   │   │
-  │  │  6. New Relic Transaction                 │   │
-  │  │  7. Panic Recovery                        │   │
+  │  │   1. Default Timeout (60s deadline)       │   │
+  │  │   2. Rate Limiting (token bucket)         │   │
+  │  │   3. Response Time Logging                │   │
+  │  │   4. Trace ID Injection                   │   │
+  │  │   5. Debug Log (per-request level)        │   │
+  │  │   6. Proto Validate                       │   │
+  │  │   7. Prometheus Metrics                   │   │
+  │  │   8. Error Notification (Sentry/Rollbar)  │   │
+  │  │   9. New Relic Transaction                │   │
+  │  │  10. Panic Recovery                       │   │
   │  │  (OTEL tracing via gRPC stats handler)    │   │
   │  │                                           │   │
   │  └────────────────────┬─────────────────────┘   │
@@ -199,13 +202,16 @@ Interceptors are gRPC middleware that run on every request. ColdBrew chains them
 
 | Order | Interceptor | Package | What It Does |
 |-------|------------|---------|--------------|
-| 1 | Response Time Logging | `interceptors` | Logs method name, duration, and status code |
-| 2 | Trace ID | `interceptors` | Generates a trace ID (or reads it from the `x-trace-id` HTTP header or a `trace_id` proto field) and propagates it to structured logs, Sentry/Rollbar error reports, and OpenTelemetry spans (as the `coldbrew.trace_id` attribute) |
-| 3 | Proto Validate | `interceptors` | Validates incoming messages using [protovalidate](https://github.com/bufbuild/protovalidate) annotations. Returns `InvalidArgument` on failure. Disable with `DISABLE_PROTO_VALIDATE` |
-| 4 | Prometheus | `interceptors` | Records request count, latency histogram, and status codes |
-| 5 | Error Notification | `interceptors` | Sends errors to Sentry/Rollbar/Airbrake asynchronously |
-| 6 | New Relic | `interceptors` | Creates a New Relic transaction for APM |
-| 7 | Panic Recovery | `interceptors` | Catches panics and converts them to gRPC errors |
+| 1 | Default Timeout | `interceptors` | Applies a 60s deadline to unary RPCs without one. Prevents resource exhaustion from clients that don't set deadlines. Config: `GRPC_SERVER_DEFAULT_TIMEOUT_IN_SECONDS` |
+| 2 | Rate Limiting | `interceptors` | Per-pod token bucket rate limiter. Returns `ResourceExhausted` when the limit is exceeded. Disabled by default. Config: `RATE_LIMIT_PER_SECOND`, `RATE_LIMIT_BURST`, `DISABLE_RATE_LIMIT` |
+| 3 | Response Time Logging | `interceptors` | Logs method name, duration, and status code |
+| 4 | Trace ID | `interceptors` | Generates a trace ID (or reads it from the `x-trace-id` HTTP header or a `trace_id` proto field) and propagates it to structured logs, Sentry/Rollbar error reports, and OpenTelemetry spans (as the `coldbrew.trace_id` attribute) |
+| 5 | Debug Log | `interceptors` | Enables per-request log level override via `bool debug` or `bool enable_debug` proto field, or `x-debug-log-level` metadata header. Config: `DISABLE_DEBUG_LOG_INTERCEPTOR`, `DEBUG_LOG_HEADER_NAME` |
+| 6 | Proto Validate | `interceptors` | Validates incoming messages using [protovalidate](https://github.com/bufbuild/protovalidate) annotations. Returns `InvalidArgument` on failure. Config: `DISABLE_PROTO_VALIDATE` |
+| 7 | Prometheus | `interceptors` | Records request count, latency histogram, and status codes |
+| 8 | Error Notification | `interceptors` | Sends errors to Sentry/Rollbar/Airbrake asynchronously |
+| 9 | New Relic | `interceptors` | Creates a New Relic transaction for APM |
+| 10 | Panic Recovery | `interceptors` | Catches panics and converts them to gRPC errors |
 
 {: .note }
 OpenTelemetry tracing spans are created by the `otelgrpc` stats handler configured at the gRPC server/client level, not as an interceptor in the chain.

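Review note on the new ordering: the sketch below illustrates why the position of the rate limiter in the chain matters. It uses simplified stand-in types rather than the real `grpc.UnaryServerInterceptor` signature, and it assumes the order in the table is outermost-first. The names `chain`, `named`, `rateLimit`, and `trace` are hypothetical and do not exist in ColdBrew.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"strings"
)

// handler is a simplified stand-in for a gRPC unary handler.
type handler func(ctx context.Context) error

// interceptor is a simplified stand-in for grpc.UnaryServerInterceptor.
type interceptor func(ctx context.Context, next handler) error

// chain composes interceptors so that the first one listed runs outermost.
func chain(h handler, ics ...interceptor) handler {
	for i := len(ics) - 1; i >= 0; i-- {
		ic, next := ics[i], h
		h = func(ctx context.Context) error { return ic(ctx, next) }
	}
	return h
}

// named returns an interceptor that records its name and calls the next stage.
func named(name string, visited *[]string) interceptor {
	return func(ctx context.Context, next handler) error {
		*visited = append(*visited, name)
		return next(ctx)
	}
}

// rateLimit records its name and rejects the request when limited is true.
func rateLimit(limited bool, visited *[]string) interceptor {
	return func(ctx context.Context, next handler) error {
		*visited = append(*visited, "ratelimit")
		if limited {
			return errors.New("resource exhausted")
		}
		return next(ctx)
	}
}

// trace runs one request through a three-stage chain and reports which stages ran.
func trace(limited bool) string {
	var visited []string
	h := chain(
		func(ctx context.Context) error { visited = append(visited, "handler"); return nil },
		named("timeout", &visited),
		rateLimit(limited, &visited),
		named("logging", &visited),
	)
	_ = h(context.Background())
	return strings.Join(visited, ">")
}

func main() {
	fmt.Println(trace(false)) // timeout>ratelimit>logging>handler
	fmt.Println(trace(true))  // timeout>ratelimit
}
```

If the real chain behaves like this sketch, a request rejected at position 2 never reaches Response Time Logging or Prometheus Metrics, so rejected calls may be absent from those logs and metrics. That is worth confirming against the actual implementation before relying on it.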
diff --git a/config-reference.md b/config-reference.md
@@ -55,6 +55,9 @@ cfg := config.GetColdBrewConfig()
 | `DISABLE_DEBUG_LOG_INTERCEPTOR` | bool | `false` | Disable the DebugLogInterceptor. When disabled, proto `debug`/`enable_debug` fields and `x-debug-log-level` headers will not trigger per-request debug logging |
 | `DEBUG_LOG_HEADER_NAME` | string | `x-debug-log-level` | gRPC metadata / HTTP header name for per-request debug logging. The header value should be a valid log level (`debug`, `info`, `warn`, `error`). See [Log How-To](/howto/Log/#production-debugging-with-overrideloglevel--trace-id) |
 | `GRPC_SERVER_DEFAULT_TIMEOUT_IN_SECONDS` | int | `60` | Default timeout for incoming unary gRPC requests without a deadline. Set to `0` to disable. Does not apply to stream RPCs |
+| `RATE_LIMIT_PER_SECOND` | float64 | `0` | Maximum incoming requests per second for this pod (per-pod in-memory token bucket). Set to `0` to disable (default). With N pods, the effective cluster-wide limit is N × this value. For distributed rate limiting, use `interceptors.SetRateLimiter()` with a custom implementation |
+| `RATE_LIMIT_BURST` | int | `1` | Maximum burst size for the token bucket rate limiter. Only takes effect when `RATE_LIMIT_PER_SECOND > 0` |
+| `DISABLE_RATE_LIMIT` | bool | `false` | Disable the rate limiting interceptor entirely |
 
 ## gRPC TLS
 

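Review note on `RATE_LIMIT_PER_SECOND` and `RATE_LIMIT_BURST`: the sketch below shows generic token bucket semantics, to make the interaction between the two settings concrete. It is not ColdBrew's implementation; the `bucket` type and its helpers are hypothetical, and time is passed in as plain seconds to keep the example deterministic.

```go
package main

import "fmt"

// bucket is an illustrative token bucket; it is not ColdBrew's implementation.
type bucket struct {
	rate   float64 // tokens added per second (the role of RATE_LIMIT_PER_SECOND)
	burst  float64 // bucket capacity (the role of RATE_LIMIT_BURST)
	tokens float64 // tokens currently available
	last   float64 // time of the previous call, in seconds
}

// newBucket returns a bucket that starts full.
func newBucket(rate float64, burst int) *bucket {
	return &bucket{rate: rate, burst: float64(burst), tokens: float64(burst)}
}

// allowAt reports whether a request arriving at time now (seconds) is admitted.
func (b *bucket) allowAt(now float64) bool {
	// Refill for the elapsed time, capped at the burst size.
	b.tokens += (now - b.last) * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// admitted counts how many of n simultaneous requests at time now are allowed.
func admitted(b *bucket, n int, now float64) int {
	count := 0
	for i := 0; i < n; i++ {
		if b.allowAt(now) {
			count++
		}
	}
	return count
}

func main() {
	b := newBucket(100, 50)
	fmt.Println(admitted(b, 80, 0))    // 50: the full burst is spent at once
	fmt.Println(admitted(b, 80, 0.25)) // 25: only 0.25s x 100/s tokens were refilled
}
```

Under these semantics the default burst of `1` admits at most one request at any instant, however high the rate is set, so services with bursty traffic would likely want to raise `RATE_LIMIT_BURST` together with `RATE_LIMIT_PER_SECOND`.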
diff --git a/howto/interceptors.md b/howto/interceptors.md
@@ -171,6 +171,96 @@ func init() {
 
 Set `DISABLE_PROTO_VALIDATE=true` to skip validation entirely.
 
+## Rate limiting
+
+ColdBrew includes a built-in per-pod token bucket rate limiter. It is **disabled by default**; set `RATE_LIMIT_PER_SECOND` to a value greater than `0` to enable it.
+
+### Enabling via environment variables
+
+```yaml
+env:
+  - name: RATE_LIMIT_PER_SECOND
+    value: "100"   # 100 requests per second per pod
+  - name: RATE_LIMIT_BURST
+    value: "50"    # allow bursts up to 50
+```
+
+{: .important }
+This is a **per-pod in-memory limit**. With N pods, the effective cluster-wide limit is N × `RATE_LIMIT_PER_SECOND`. For cluster-wide rate limiting, use a custom limiter (see below) or your load balancer.
+
+When a request exceeds the rate limit, the interceptor returns a `ResourceExhausted` gRPC status code.
+
+### Custom per-API rate limiter
+
+For different rate limits per API method, implement the [`ratelimit.Limiter`](https://pkg.go.dev/github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/ratelimit#Limiter) interface and register it during initialization:
+
+```go
+import (
+    "context"
+    "fmt"
+
+    "github.com/go-coldbrew/interceptors"
+    ratelimit "github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/ratelimit"
+    "github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
+    "golang.org/x/time/rate"
+    "google.golang.org/grpc"
+)
+
+// Compile-time check that perMethodLimiter implements the interface.
+var _ ratelimit.Limiter = (*perMethodLimiter)(nil)
+
+type perMethodLimiter struct {
+    limiters map[string]*rate.Limiter
+    fallback *rate.Limiter
+}
+
+func (l *perMethodLimiter) Limit(ctx context.Context) error {
+    // grpc.Method works for native gRPC calls;
+    // runtime.RPCMethod works for HTTP→gRPC via grpc-gateway
+    method, ok := grpc.Method(ctx)
+    if !ok {
+        method, ok = runtime.RPCMethod(ctx)
+    }
+    if !ok {
+        method = "unknown"
+    }
+    limiter, found := l.limiters[method]
+    if !found {
+        limiter = l.fallback
+    }
+    if !limiter.Allow() {
+        return fmt.Errorf("rate limit exceeded for %s", method)
+    }
+    return nil
+}
+
+func init() {
+    interceptors.SetRateLimiter(&perMethodLimiter{
+        limiters: map[string]*rate.Limiter{
+            "/myservice.v1.UserService/CreateUser": rate.NewLimiter(10, 5),   // 10 rps, burst 5
+            "/myservice.v1.UserService/ListUsers":  rate.NewLimiter(100, 50), // 100 rps, burst 50
+        },
+        fallback: rate.NewLimiter(50, 25), // default for unlisted methods: 50 rps, burst 25
+    })
+}
+```
+
+### Distributed rate limiting
+
+For cross-pod or per-tenant rate limiting, implement `ratelimit.Limiter` with a distributed backend. Libraries that work well with ColdBrew's limiter interface:
+
+| Library | Backend | Notes |
+|---------|---------|-------|
+| [mennanov/limiters](https://github.com/mennanov/limiters) | Redis, etcd, DynamoDB, memory | Most flexible — has explicit gRPC example, multiple algorithms |
+| [go-redis/redis_rate](https://github.com/go-redis/redis_rate) | Redis | GCRA algorithm, good if you already use go-redis (last release 2023 — check for activity) |
+| [sethvargo/go-limiter](https://github.com/sethvargo/go-limiter) | Redis, memory | Clean API, actively maintained |
+
+For large-scale multi-service rate limiting, consider a dedicated rate limiting service like [gubernator](https://github.com/gubernator-io/gubernator) (peer-to-peer, no Redis) or [Envoy ratelimit](https://github.com/envoyproxy/ratelimit) (Redis-backed).
+
+### Disabling
+
+Set `DISABLE_RATE_LIMIT=true` to remove the rate limiting interceptor from the chain entirely.
+
 ## Adding custom interceptors to Default interceptors
 
 You can add your own interceptors to the [Default Interceptors] by appending to the list of interceptors.

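Review note on per-tenant limiting: the howto mentions per-tenant limits but only shows a per-method limiter. A minimal per-tenant sketch with the same `Limit(ctx context.Context) error` method could look like the following. It is a fixed-window counter kept in memory, so it is still per-pod; a distributed version would replace the `counts` map with a shared store. The `tenantKey` context key, `tenantContext`, and `demoLimiter` are hypothetical helpers. Because the rate limiter runs early in the chain, a real service would more likely read the tenant ID from incoming gRPC metadata.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

// tenantKey is a hypothetical context key used only for this illustration.
type tenantKey struct{}

// tenantLimiter allows at most limit requests per tenant in each fixed window.
type tenantLimiter struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	now    func() time.Time // clock, replaceable for deterministic tests
	start  time.Time        // start of the current window
	counts map[string]int   // requests seen per tenant in the current window
}

func newTenantLimiter(limit int, window time.Duration) *tenantLimiter {
	return &tenantLimiter{limit: limit, window: window, now: time.Now, counts: map[string]int{}}
}

// Limit returns an error when the caller's tenant has used up its window budget.
func (l *tenantLimiter) Limit(ctx context.Context) error {
	tenant, _ := ctx.Value(tenantKey{}).(string) // "" groups unidentified callers
	l.mu.Lock()
	defer l.mu.Unlock()
	// Start a fresh window, and forget old counts, once the current one has elapsed.
	if now := l.now(); now.Sub(l.start) >= l.window {
		l.start, l.counts = now, map[string]int{}
	}
	if l.counts[tenant] >= l.limit {
		return errors.New("rate limit exceeded for tenant " + tenant)
	}
	l.counts[tenant]++
	return nil
}

// tenantContext builds a context carrying a tenant ID (illustration only).
func tenantContext(id string) context.Context {
	return context.WithValue(context.Background(), tenantKey{}, id)
}

// demoLimiter allows two requests per tenant per minute.
func demoLimiter() *tenantLimiter {
	return newTenantLimiter(2, time.Minute)
}

func main() {
	l := demoLimiter()
	for i := 0; i < 3; i++ {
		fmt.Println(l.Limit(tenantContext("acme")))
	}
}
```

Going by the registration pattern shown in the howto, such a type would presumably be passed to `interceptors.SetRateLimiter()` in the same way as `perMethodLimiter`.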
diff --git a/howto/production.md b/howto/production.md
@@ -473,6 +473,11 @@ env:
   # Never use debug level on public services — may log request payloads
   - name: LOG_LEVEL
     value: "info"
+  # Rate limit incoming requests (per-pod). Adjust to your service's capacity.
+  - name: RATE_LIMIT_PER_SECOND
+    value: "1000"
+  - name: RATE_LIMIT_BURST
+    value: "50"
   # GRPC_MAX_SEND_MSG_SIZE limits response size FROM your service (default ~2GB).
   # GRPC_MAX_RECV_MSG_SIZE limits request size TO your service (default 4MB).
   # Consider reducing send size for public APIs; use streaming for large payloads.
@@ -531,7 +536,7 @@ These are your responsibility to handle at the infrastructure level:
 
 - **CORS** — ColdBrew does not handle CORS headers. Use a reverse proxy (Nginx, Envoy, Istio) or add CORS middleware to the HTTP gateway.
 - **Authentication/authorization** — Admin endpoints (`/debug/pprof`, `/metrics`, `/swagger`) have no built-in auth. Disable them for public services or restrict access at the load balancer.
-- **Rate limiting** — No built-in rate limiting on any endpoint. Use your load balancer, service mesh, or a rate-limiting proxy.
+- **Cluster-wide rate limiting** — Built-in rate limiting (`RATE_LIMIT_PER_SECOND`) is per-pod only. For cluster-wide or per-tenant rate limiting, use `interceptors.SetRateLimiter()` with a custom implementation or your load balancer. See [Interceptors How-To](/howto/interceptors#rate-limiting).
 - **HTTP header forwarding** — `HTTP_HEADER_PREFIXES` forwards matching HTTP headers to gRPC metadata. Never add `authorization`, `cookie`, or `x-api-key` prefixes unless you are intentionally doing header-based gRPC auth.
 
 ## Production checklist
@@ -557,6 +562,7 @@ These are your responsibility to handle at the infrastructure level:
 - [ ] `DISABLE_SWAGGER=true` — disable API documentation
 - [ ] `DISABLE_GRPC_REFLECTION=true` — disable service discovery
 - [ ] `DISABLE_DEBUG_LOG_INTERCEPTOR=true` — disable header-based debug logging
+- [ ] Enable rate limiting — `RATE_LIMIT_PER_SECOND` + `RATE_LIMIT_BURST` (per-pod, adjust to capacity). See [interceptors howto](/howto/interceptors#rate-limiting)
 - [ ] Consider reducing `GRPC_MAX_SEND_MSG_SIZE` from its ~2GB default if responses are small
 - [ ] Restrict `/metrics` access at the load balancer
 - [ ] `LOG_LEVEL=info` or higher (never `debug`)
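
Review note on sizing the per-pod limit: since the limit is enforced per pod, the cluster-wide ceiling moves with the replica count. The sketch below works through that arithmetic. The function names are hypothetical and the numbers are examples only.

```go
package main

import "fmt"

// perPodLimit splits a cluster-wide budget evenly across pods.
// A pod count below 1 is treated as 1 so the result is never 0,
// because 0 disables the built-in limiter.
func perPodLimit(clusterRPS float64, pods int) float64 {
	if pods < 1 {
		pods = 1
	}
	return clusterRPS / float64(pods)
}

// effectiveClusterLimit is the aggregate ceiling when every pod enforces perPod.
func effectiveClusterLimit(perPod float64, pods int) float64 {
	return perPod * float64(pods)
}

func main() {
	fmt.Println(perPodLimit(3000, 3))           // 1000
	fmt.Println(effectiveClusterLimit(1000, 6)) // 6000
}
```

With `RATE_LIMIT_PER_SECOND=1000` as in the example manifest, scaling from 3 to 6 replicas doubles the effective ceiling from 3000 to 6000 requests per second. If a fixed cluster-wide budget is needed under autoscaling, a custom limiter or the load balancer is the safer place to enforce it.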