go-coldbrew · ankurs · Apr 9, 2026
diff --git a/Index.md b/Index.md
@@ -30,7 +30,7 @@ A Kubernetes-native Go microservice framework for building production-grade gRPC
 | **Distributed Tracing** | [OpenTelemetry] and [New Relic] support with automatic span creation in interceptors — traces can be sent to any OTLP-compatible backend including [Jaeger] |
 | **Prometheus Metrics** | Built-in request latency, error rate, and circuit breaker metrics at `/metrics` |
 | **Error Tracking** | Stack traces, gRPC status codes, and async notification to [Sentry], Rollbar, or Airbrake |
-| **Resilience** | Client-side circuit breaking and retries via interceptors |
+| **Rate Limiting** | Per-pod token bucket rate limiter, disabled by default. Pluggable through a custom `Limiter` implementation for distributed or per-tenant rate limiting. Config: `RATE_LIMIT_PER_SECOND` |
 | **Fast Serialization** | [vtprotobuf] codec enabled by default — faster gRPC marshalling with automatic fallback to standard protobuf |
 | **Kubernetes-native** | Health/ready probes, graceful SIGTERM shutdown, structured JSON logs, Prometheus metrics — all wired automatically |
 | **Swagger / OpenAPI** | Interactive API docs auto-served at `/swagger/` from your protobuf definitions |

diff --git a/architecture.md b/architecture.md
@@ -199,13 +199,16 @@ Interceptors are gRPC middleware that run on every request. ColdBrew chains them
 
 | Order | Interceptor | Package | What It Does |
 |-------|------------|---------|--------------|
-| 1 | Response Time Logging | `interceptors` | Logs method name, duration, and status code |
-| 2 | Trace ID | `interceptors` | Generates a trace ID (or reads it from the `x-trace-id` HTTP header or a `trace_id` proto field) and propagates it to structured logs, Sentry/Rollbar error reports, and OpenTelemetry spans (as the `coldbrew.trace_id` attribute) |
-| 3 | Proto Validate | `interceptors` | Validates incoming messages using [protovalidate](https://github.com/bufbuild/protovalidate) annotations. Returns `InvalidArgument` on failure. Disable with `DISABLE_PROTO_VALIDATE` |
-| 4 | Prometheus | `interceptors` | Records request count, latency histogram, and status codes |
-| 5 | Error Notification | `interceptors` | Sends errors to Sentry/Rollbar/Airbrake asynchronously |
-| 6 | New Relic | `interceptors` | Creates a New Relic transaction for APM |
-| 7 | Panic Recovery | `interceptors` | Catches panics and converts them to gRPC errors |
+| 1 | Default Timeout | `interceptors` | Applies a default deadline (60s) to incoming unary RPCs that arrive without one, preventing resource exhaustion from clients that never set deadlines. Stream RPCs are not affected. Config: `GRPC_SERVER_DEFAULT_TIMEOUT_IN_SECONDS` (`0` disables) |
+| 2 | Rate Limiting | `interceptors` | Per-pod token bucket rate limiter. Returns `ResourceExhausted` when the limit is exceeded. Disabled by default. Config: `RATE_LIMIT_PER_SECOND`, `RATE_LIMIT_BURST`, `DISABLE_RATE_LIMIT` |
+| 3 | Response Time Logging | `interceptors` | Logs method name, duration, and status code |
+| 4 | Trace ID | `interceptors` | Generates a trace ID (or reads it from the `x-trace-id` HTTP header or a `trace_id` proto field) and propagates it to structured logs, Sentry/Rollbar error reports, and OpenTelemetry spans (as the `coldbrew.trace_id` attribute) |
+| 5 | Debug Log | `interceptors` | Enables a per-request log level override via a `bool debug` (or `enable_debug`) proto field or the `x-debug-log-level` metadata header. Config: `DISABLE_DEBUG_LOG_INTERCEPTOR`, `DEBUG_LOG_HEADER_NAME` |
+| 6 | Proto Validate | `interceptors` | Validates incoming messages using [protovalidate](https://github.com/bufbuild/protovalidate) annotations. Returns `InvalidArgument` on failure. Config: `DISABLE_PROTO_VALIDATE` |
+| 7 | Prometheus | `interceptors` | Records request count, latency histogram, and status codes |
+| 8 | Error Notification | `interceptors` | Sends errors to Sentry/Rollbar/Airbrake asynchronously |
+| 9 | New Relic | `interceptors` | Creates a New Relic transaction for APM |
+| 10 | Panic Recovery | `interceptors` | Catches panics and converts them to gRPC errors |
 
 {: .note }
 OpenTelemetry tracing spans are created by the `otelgrpc` stats handler configured at the gRPC server/client level, not as an interceptor in the chain.

diff --git a/config-reference.md b/config-reference.md
@@ -55,6 +55,9 @@ cfg := config.GetColdBrewConfig()
 | `DISABLE_DEBUG_LOG_INTERCEPTOR` | bool | `false` | Disable the DebugLogInterceptor. When disabled, proto `debug`/`enable_debug` fields and `x-debug-log-level` headers will not trigger per-request debug logging |
 | `DEBUG_LOG_HEADER_NAME` | string | `x-debug-log-level` | gRPC metadata / HTTP header name for per-request debug logging. The header value should be a valid log level (`debug`, `info`, `warn`, `error`). See [Log How-To](/howto/Log/#production-debugging-with-overrideloglevel--trace-id) |
 | `GRPC_SERVER_DEFAULT_TIMEOUT_IN_SECONDS` | int | `60` | Default timeout for incoming unary gRPC requests without a deadline. Set to `0` to disable. Does not apply to stream RPCs |
+| `RATE_LIMIT_PER_SECOND` | float64 | `0` | Maximum incoming requests per second for this pod (per-pod in-memory token bucket). `0` disables rate limiting (the default). With N pods, the effective cluster-wide limit is N × this value. For distributed rate limiting, use `interceptors.SetRateLimiter()` with a custom implementation |
+| `RATE_LIMIT_BURST` | int | `1` | Maximum burst size for the token bucket rate limiter. Only takes effect when `RATE_LIMIT_PER_SECOND > 0` |
+| `DISABLE_RATE_LIMIT` | bool | `false` | Disable the rate limiting interceptor entirely |
 
 ## gRPC TLS
 

diff --git a/howto/interceptors.md b/howto/interceptors.md
@@ -171,6 +171,73 @@ func init() {
 
 Set `DISABLE_PROTO_VALIDATE=true` to skip validation entirely.
 
+## Rate limiting
+
+ColdBrew includes a built-in per-pod token bucket rate limiter. It is **disabled by default** and must be explicitly enabled.
+
+### Enabling via environment variables
+
+```yaml
+env:
+  - name: RATE_LIMIT_PER_SECOND
+    value: "100"   # 100 requests per second per pod
+  - name: RATE_LIMIT_BURST
+    value: "50"    # allow bursts up to 50
+```
+
+{: .important }
+This is a **per-pod in-memory limit**. With N pods, the effective cluster-wide limit is N × `RATE_LIMIT_PER_SECOND`. For cluster-wide rate limiting, use a custom limiter (see below) or your load balancer.
+
+When a request exceeds the rate limit, the interceptor returns a `ResourceExhausted` gRPC status code.
+
+### Custom per-API rate limiter
+
+To apply different rate limits per API method, implement the `ratelimit.Limiter` interface (a single `Limit(ctx context.Context) error` method) and register it during initialization:
+
+```go
+import (
+    "context"
+    "fmt"
+
+    "github.com/go-coldbrew/interceptors"
+    "golang.org/x/time/rate"
+    "google.golang.org/grpc"
+)
+
+type perMethodLimiter struct {
+    limiters map[string]*rate.Limiter
+    fallback *rate.Limiter
+}
+
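+// Limit is invoked for each incoming request; a non-nil error rejects it.
+func (l *perMethodLimiter) Limit(ctx context.Context) error {
+    // grpc.Method returns the full method name ("/package.Service/Method")
+    // stored in the server-side request context.
+    method, _ := grpc.Method(ctx)
+    limiter, ok := l.limiters[method]
+    if !ok {
+        limiter = l.fallback // methods without their own entry share the fallback
+    }
+    // Allow consumes one token if available and never blocks.
+    if !limiter.Allow() {
+        return fmt.Errorf("rate limit exceeded for %s", method)
+    }
+    return nil
+}
+
+func init() {
+    interceptors.SetRateLimiter(&perMethodLimiter{
+        // rate.NewLimiter(r, b): refill at r tokens/second, hold at most b tokens.
+        // The map is read-only after init, and *rate.Limiter is safe for
+        // concurrent use, so no extra locking is needed.
+        limiters: map[string]*rate.Limiter{
+            "/myservice.v1.UserService/CreateUser": rate.NewLimiter(10, 5),   // 10 rps, burst 5
+            "/myservice.v1.UserService/ListUsers":  rate.NewLimiter(100, 50), // 100 rps, burst 50
+        },
+        fallback: rate.NewLimiter(50, 25), // 50 rps, burst 25 for all other methods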
+    })
+}
+```
+
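+For cluster-wide rate limiting, implement the same interface on top of a shared store such as Redis so that every pod draws from one budget. Per-tenant limits follow the same pattern, keyed on a tenant identifier instead of the method name.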
+
+### Disabling
+
+Set `DISABLE_RATE_LIMIT=true` to remove the rate limiting interceptor from the chain entirely.
+
 ## Adding custom interceptors to Default interceptors
 
 You can add your own interceptors to the [Default Interceptors] by appending to the list of interceptors.

diff --git a/howto/production.md b/howto/production.md
@@ -473,6 +473,11 @@ env:
   # Never use debug level on public services — may log request payloads
   - name: LOG_LEVEL
     value: "info"
+  # Rate limit incoming requests (per-pod). Adjust to your service's capacity.
+  - name: RATE_LIMIT_PER_SECOND
+    value: "1000"
+  - name: RATE_LIMIT_BURST
+    value: "50"
   # GRPC_MAX_SEND_MSG_SIZE limits response size FROM your service (default ~2GB).
   # GRPC_MAX_RECV_MSG_SIZE limits request size TO your service (default 4MB).
   # Consider reducing send size for public APIs; use streaming for large payloads.
@@ -531,7 +536,7 @@ These are your responsibility to handle at the infrastructure level:
 
 - **CORS** — ColdBrew does not handle CORS headers. Use a reverse proxy (Nginx, Envoy, Istio) or add CORS middleware to the HTTP gateway.
 - **Authentication/authorization** — Admin endpoints (`/debug/pprof`, `/metrics`, `/swagger`) have no built-in auth. Disable them for public services or restrict access at the load balancer.
-- **Rate limiting** — No built-in rate limiting on any endpoint. Use your load balancer, service mesh, or a rate-limiting proxy.
+- **Cluster-wide rate limiting** — Built-in rate limiting (`RATE_LIMIT_PER_SECOND`) is per-pod only. For cluster-wide or per-tenant limits, register a custom implementation with `interceptors.SetRateLimiter()`, or enforce limits at your load balancer. See [Interceptors How-To](/howto/interceptors#rate-limiting).
 - **HTTP header forwarding** — `HTTP_HEADER_PREFIXES` forwards matching HTTP headers to gRPC metadata. Never add `authorization`, `cookie`, or `x-api-key` prefixes unless you are intentionally doing header-based gRPC auth.
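
Because `RATE_LIMIT_PER_SECOND` is enforced per pod, one way to pick a value is to divide a cluster-wide budget by the replica count. This is a minimal sketch; `perPodRateLimit` is a hypothetical helper, and it assumes the load balancer spreads traffic evenly across pods.

```go
package main

import "math"

// perPodRateLimit converts a cluster-wide requests-per-second budget into a
// per-pod RATE_LIMIT_PER_SECOND value, assuming traffic is spread evenly.
func perPodRateLimit(clusterRPS float64, replicas int) float64 {
	if replicas < 1 {
		replicas = 1 // treat a missing or zero replica count as a single pod
	}
	// Round down so the sum across pods never exceeds the cluster budget.
	return math.Floor(clusterRPS / float64(replicas))
}
```

If the replica count changes (for example under a HorizontalPodAutoscaler), the per-pod environment variable stays fixed, so the effective cluster-wide limit of N × `RATE_LIMIT_PER_SECOND` moves with N.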
 
 ## Production checklist
@@ -557,6 +562,7 @@ These are your responsibility to handle at the infrastructure level:
 - [ ] `DISABLE_SWAGGER=true` — disable API documentation
 - [ ] `DISABLE_GRPC_REFLECTION=true` — disable service discovery
 - [ ] `DISABLE_DEBUG_LOG_INTERCEPTOR=true` — disable header-based debug logging
+- [ ] Set `RATE_LIMIT_PER_SECOND` to enable rate limiting (disabled by default; the limit is per-pod, so size it to each pod's capacity)
 - [ ] Consider reducing `GRPC_MAX_SEND_MSG_SIZE` from its ~2GB default if responses are small
 - [ ] Restrict `/metrics` access at the load balancer
 - [ ] `LOG_LEVEL=info` or higher (never `debug`)