2 changes: 1 addition & 1 deletion Index.md
@@ -30,7 +30,7 @@ A Kubernetes-native Go microservice framework for building production-grade gRPC
| **Distributed Tracing** | [OpenTelemetry] and [New Relic] support with automatic span creation in interceptors — traces can be sent to any OTLP-compatible backend including [Jaeger] |
| **Prometheus Metrics** | Built-in request latency, error rate, and circuit breaker metrics at `/metrics` |
| **Error Tracking** | Stack traces, gRPC status codes, and async notification to [Sentry], Rollbar, or Airbrake |
| **Resilience** | Client-side circuit breaking and retries via interceptors |
| **Rate Limiting** | Per-pod token bucket rate limiter — disabled by default, pluggable via custom [`ratelimit.Limiter`](https://pkg.go.dev/github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/ratelimit#Limiter) interface for distributed or per-tenant rate limiting. Config: `RATE_LIMIT_PER_SECOND`. See [interceptors howto](/howto/interceptors#rate-limiting) |
| **Fast Serialization** | [vtprotobuf] codec enabled by default — faster gRPC marshalling with automatic fallback to standard protobuf |
| **Kubernetes-native** | Health/ready probes, graceful SIGTERM shutdown, structured JSON logs, Prometheus metrics — all wired automatically |
| **Swagger / OpenAPI** | Interactive API docs auto-served at `/swagger/` from your protobuf definitions |
34 changes: 20 additions & 14 deletions architecture.md
@@ -152,13 +152,16 @@ When a request arrives at a ColdBrew service, it flows through several layers:
│ ┌──────────────────────────────────────────┐ │
│ │ Server Interceptor Chain │ │
│ │ │ │
│ │ 1. Response Time Logging │ │
│ │ 2. Trace ID Injection │ │
│ │ 3. Proto Validate │ │
│ │ 4. Prometheus Metrics │ │
│ │ 5. Error Notification (Sentry/Rollbar) │ │
│ │ 6. New Relic Transaction │ │
│ │ 7. Panic Recovery │ │
│ │ 1. Default Timeout (60s deadline) │ │
│ │ 2. Rate Limiting (token bucket) │ │
│ │ 3. Response Time Logging │ │
│ │ 4. Trace ID Injection │ │
│ │ 5. Debug Log (per-request level) │ │
│ │ 6. Proto Validate │ │
│ │ 7. Prometheus Metrics │ │
│ │ 8. Error Notification (Sentry/Rollbar) │ │
│ │ 9. New Relic Transaction │ │
│ │ 10. Panic Recovery │ │
│ │ (OTEL tracing via gRPC stats handler) │ │
│ │ │ │
│ └────────────────────┬─────────────────────┘ │
@@ -199,13 +202,16 @@ Interceptors are gRPC middleware that run on every request. ColdBrew chains them

| Order | Interceptor | Package | What It Does |
|-------|------------|---------|--------------|
| 1 | Response Time Logging | `interceptors` | Logs method name, duration, and status code |
| 2 | Trace ID | `interceptors` | Generates a trace ID (or reads it from the `x-trace-id` HTTP header or a `trace_id` proto field) and propagates it to structured logs, Sentry/Rollbar error reports, and OpenTelemetry spans (as the `coldbrew.trace_id` attribute) |
| 3 | Proto Validate | `interceptors` | Validates incoming messages using [protovalidate](https://github.com/bufbuild/protovalidate) annotations. Returns `InvalidArgument` on failure. Disable with `DISABLE_PROTO_VALIDATE` |
| 4 | Prometheus | `interceptors` | Records request count, latency histogram, and status codes |
| 5 | Error Notification | `interceptors` | Sends errors to Sentry/Rollbar/Airbrake asynchronously |
| 6 | New Relic | `interceptors` | Creates a New Relic transaction for APM |
| 7 | Panic Recovery | `interceptors` | Catches panics and converts them to gRPC errors |
| 1 | Default Timeout | `interceptors` | Applies a 60s deadline to unary RPCs without one. Prevents resource exhaustion from clients that don't set deadlines. Config: `GRPC_SERVER_DEFAULT_TIMEOUT_IN_SECONDS` |
| 2 | Rate Limiting | `interceptors` | Per-pod token bucket rate limiter. Returns `ResourceExhausted` when exceeded. Disabled by default. Config: `RATE_LIMIT_PER_SECOND`, `RATE_LIMIT_BURST` |
| 3 | Response Time Logging | `interceptors` | Logs method name, duration, and status code |
| 4 | Trace ID | `interceptors` | Generates a trace ID (or reads it from the `x-trace-id` HTTP header or a `trace_id` proto field) and propagates it to structured logs, Sentry/Rollbar error reports, and OpenTelemetry spans (as the `coldbrew.trace_id` attribute) |
| 5 | Debug Log | `interceptors` | Enables per-request log level override via `bool debug` or `bool enable_debug` proto field, or `x-debug-log-level` metadata header. Config: `DISABLE_DEBUG_LOG_INTERCEPTOR`, `DEBUG_LOG_HEADER_NAME` |
| 6 | Proto Validate | `interceptors` | Validates incoming messages using [protovalidate](https://github.com/bufbuild/protovalidate) annotations. Returns `InvalidArgument` on failure. Config: `DISABLE_PROTO_VALIDATE` |
| 7 | Prometheus | `interceptors` | Records request count, latency histogram, and status codes |
| 8 | Error Notification | `interceptors` | Sends errors to Sentry/Rollbar/Airbrake asynchronously |
| 9 | New Relic | `interceptors` | Creates a New Relic transaction for APM |
| 10 | Panic Recovery | `interceptors` | Catches panics and converts them to gRPC errors |

{: .note }
OpenTelemetry tracing spans are created by the `otelgrpc` stats handler configured at the gRPC server/client level, not as an interceptor in the chain.
3 changes: 3 additions & 0 deletions config-reference.md
@@ -55,6 +55,9 @@ cfg := config.GetColdBrewConfig()
| `DISABLE_DEBUG_LOG_INTERCEPTOR` | bool | `false` | Disable the DebugLogInterceptor. When disabled, proto `debug`/`enable_debug` fields and `x-debug-log-level` headers will not trigger per-request debug logging |
| `DEBUG_LOG_HEADER_NAME` | string | `x-debug-log-level` | gRPC metadata / HTTP header name for per-request debug logging. The header value should be a valid log level (`debug`, `info`, `warn`, `error`). See [Log How-To](/howto/Log/#production-debugging-with-overrideloglevel--trace-id) |
| `GRPC_SERVER_DEFAULT_TIMEOUT_IN_SECONDS` | int | `60` | Default timeout for incoming unary gRPC requests without a deadline. Set to `0` to disable. Does not apply to stream RPCs |
| `RATE_LIMIT_PER_SECOND` | float64 | `0` | Maximum incoming requests per second for this pod (per-pod in-memory token bucket). Set to `0` to disable (default). With N pods, effective cluster-wide limit is N × this value. For distributed rate limiting, use `interceptors.SetRateLimiter()` with a custom implementation |
| `RATE_LIMIT_BURST` | int | `1` | Maximum burst size for the token bucket rate limiter. Only takes effect when `RATE_LIMIT_PER_SECOND > 0` |
| `DISABLE_RATE_LIMIT` | bool | `false` | Disable the rate limiting interceptor entirely |

## gRPC TLS

90 changes: 90 additions & 0 deletions howto/interceptors.md
@@ -171,6 +171,96 @@ func init() {

Set `DISABLE_PROTO_VALIDATE=true` to skip validation entirely.

## Rate limiting

ColdBrew includes a built-in per-pod token bucket rate limiter. It is **disabled by default** and must be explicitly enabled.

### Enabling via environment variables

```yaml
env:
  - name: RATE_LIMIT_PER_SECOND
    value: "100"  # 100 requests per second per pod
  - name: RATE_LIMIT_BURST
    value: "50"   # allow bursts up to 50
```

{: .important }
This is a **per-pod in-memory limit**. With N pods, the effective cluster-wide limit is N × `RATE_LIMIT_PER_SECOND`. For cluster-wide rate limiting, use a custom limiter (see below) or your load balancer.
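
The arithmetic in the note above can be written out as a small sketch. Both function names are hypothetical, and the calculation assumes a fixed pod count with evenly balanced traffic; with autoscaling, the effective cluster-wide limit changes as pods are added or removed.

```go
package main

import (
	"errors"
	"fmt"
)

// effectiveClusterLimit returns the aggregate limit across pods: N × per-pod.
func effectiveClusterLimit(perPod float64, pods int) float64 {
	return perPod * float64(pods)
}

// perPodLimit is a hypothetical helper that derives a RATE_LIMIT_PER_SECOND
// value from a cluster-wide target. It returns an error instead of 0 for an
// invalid pod count, because a value of 0 would disable the limiter.
func perPodLimit(clusterWide float64, pods int) (float64, error) {
	if pods <= 0 {
		return 0, errors.New("pods must be positive")
	}
	return clusterWide / float64(pods), nil
}

func main() {
	// 4 pods, each configured with RATE_LIMIT_PER_SECOND=100.
	fmt.Println(effectiveClusterLimit(100, 4))

	// Target of 1000 rps across 4 pods.
	limit, err := perPodLimit(1000, 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(limit)
}
```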

When a request exceeds the rate limit, the interceptor returns a `ResourceExhausted` gRPC status code.

### Custom per-API rate limiter

For different rate limits per API method, implement the [`ratelimit.Limiter`](https://pkg.go.dev/github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/ratelimit#Limiter) interface and register it during initialization:

```go
import (
	"context"
	"fmt"

	"github.com/go-coldbrew/interceptors"
	ratelimit "github.com/grpc-ecosystem/go-grpc-middleware/v2/interceptors/ratelimit"
	"github.com/grpc-ecosystem/grpc-gateway/v2/runtime"
	"golang.org/x/time/rate"
	"google.golang.org/grpc"
)

// Compile-time check that perMethodLimiter implements the interface.
var _ ratelimit.Limiter = (*perMethodLimiter)(nil)

type perMethodLimiter struct {
	limiters map[string]*rate.Limiter
	fallback *rate.Limiter
}

func (l *perMethodLimiter) Limit(ctx context.Context) error {
	// grpc.Method works for native gRPC calls;
	// runtime.RPCMethod works for HTTP→gRPC via grpc-gateway
	method, ok := grpc.Method(ctx)
	if !ok {
		method, ok = runtime.RPCMethod(ctx)
	}
	if !ok {
		method = "unknown"
	}
	limiter, found := l.limiters[method]
	if !found {
		limiter = l.fallback
	}
	if !limiter.Allow() {
		return fmt.Errorf("rate limit exceeded for %s", method)
	}
	return nil
}

func init() {
	interceptors.SetRateLimiter(&perMethodLimiter{
		limiters: map[string]*rate.Limiter{
			"/myservice.v1.UserService/CreateUser": rate.NewLimiter(10, 5),   // 10 rps
			"/myservice.v1.UserService/ListUsers":  rate.NewLimiter(100, 50), // 100 rps
		},
		fallback: rate.NewLimiter(50, 25), // 50 rps default
	})
}
```

### Distributed rate limiting

For rate limiting across pods or per-tenant, implement `ratelimit.Limiter` with a distributed backend. Libraries that work well with ColdBrew's limiter interface:

| Library | Backend | Notes |
|---------|---------|-------|
| [mennanov/limiters](https://github.com/mennanov/limiters) | Redis, etcd, DynamoDB, memory | Most flexible — has explicit gRPC example, multiple algorithms |
| [go-redis/redis_rate](https://github.com/go-redis/redis_rate) | Redis | GCRA algorithm, good if you already use go-redis (last release 2023 — check for activity) |
| [sethvargo/go-limiter](https://github.com/sethvargo/go-limiter) | Redis, memory | Clean API, actively maintained |

For large-scale multi-service rate limiting, consider a dedicated rate limiting service like [gubernator](https://github.com/gubernator-io/gubernator) (peer-to-peer, no Redis) or [Envoy ratelimit](https://github.com/envoyproxy/ratelimit) (Redis-backed).

### Disabling

Set `DISABLE_RATE_LIMIT=true` to remove the rate limiting interceptor from the chain entirely.

## Adding custom interceptors to Default interceptors

You can add your own interceptors to the [Default Interceptors] by appending to the list of interceptors.
8 changes: 7 additions & 1 deletion howto/production.md
@@ -473,6 +473,11 @@ env:
# Never use debug level on public services — may log request payloads
- name: LOG_LEVEL
value: "info"
# Rate limit incoming requests (per-pod). Adjust to your service's capacity.
- name: RATE_LIMIT_PER_SECOND
value: "1000"
- name: RATE_LIMIT_BURST
value: "50"
# GRPC_MAX_SEND_MSG_SIZE limits response size FROM your service (default ~2GB).
# GRPC_MAX_RECV_MSG_SIZE limits request size TO your service (default 4MB).
# Consider reducing send size for public APIs; use streaming for large payloads.
@@ -531,7 +536,7 @@ These are your responsibility to handle at the infrastructure level:

- **CORS** — ColdBrew does not handle CORS headers. Use a reverse proxy (Nginx, Envoy, Istio) or add CORS middleware to the HTTP gateway.
- **Authentication/authorization** — Admin endpoints (`/debug/pprof`, `/metrics`, `/swagger`) have no built-in auth. Disable them for public services or restrict access at the load balancer.
- **Rate limiting** — No built-in rate limiting on any endpoint. Use your load balancer, service mesh, or a rate-limiting proxy.
- **Cluster-wide rate limiting** — Built-in rate limiting (`RATE_LIMIT_PER_SECOND`) is per-pod only. For cluster-wide or per-tenant rate limiting, use `interceptors.SetRateLimiter()` with a custom implementation or your load balancer. See [Interceptors How-To](/howto/interceptors#rate-limiting).
- **HTTP header forwarding** — `HTTP_HEADER_PREFIXES` forwards matching HTTP headers to gRPC metadata. Never add `authorization`, `cookie`, or `x-api-key` prefixes unless you are intentionally doing header-based gRPC auth.

## Production checklist
@@ -557,6 +562,7 @@ These are your responsibility to handle at the infrastructure level:
- [ ] `DISABLE_SWAGGER=true` — disable API documentation
- [ ] `DISABLE_GRPC_REFLECTION=true` — disable service discovery
- [ ] `DISABLE_DEBUG_LOG_INTERCEPTOR=true` — disable header-based debug logging
- [ ] Enable rate limiting — `RATE_LIMIT_PER_SECOND` + `RATE_LIMIT_BURST` (per-pod, adjust to capacity). See [interceptors howto](/howto/interceptors#rate-limiting)
- [ ] Consider reducing `GRPC_MAX_SEND_MSG_SIZE` from its ~2GB default if responses are small
- [ ] Restrict `/metrics` access at the load balancer
- [ ] `LOG_LEVEL=info` or higher (never `debug`)