diff --git a/Index.md b/Index.md
index 9db8b26..4768187 100644
--- a/Index.md
+++ b/Index.md
@@ -72,6 +72,8 @@ Your service starts with all of these endpoints ready:
 | `localhost:9091/swagger/` | Swagger UI |
 | `localhost:9091/debug/pprof/` | Go pprof profiling |
 
+> **Tip:** Set `ADMIN_PORT` to serve metrics, profiling, and swagger on a dedicated port for [security isolation](/howto/production/#security-hardening).
+
 ## Define Once, Get Everything
 
 Your API is defined once in protobuf — ColdBrew generates everything else:
diff --git a/architecture.md b/architecture.md
index 2ac0a1d..963945c 100644
--- a/architecture.md
+++ b/architecture.md
@@ -23,7 +23,7 @@ ColdBrew follows [12-factor app](https://12factor.net/) methodology and is desig
 | 12-Factor Principle | How ColdBrew Implements It |
 |--------------------|-----------------------------|
 | **Config** | All configuration via environment variables ([envconfig](https://github.com/kelseyhightower/envconfig)) — no config files, no YAML. See [Configuration Reference](/config-reference) |
-| **Port binding** | Self-contained HTTP (`:9091`) and gRPC (`:9090`) servers, no external app server needed |
+| **Port binding** | Self-contained HTTP (`:9091`) and gRPC (`:9090`) servers, optional dedicated admin port (`ADMIN_PORT`) for endpoint isolation |
 | **Logs** | Structured JSON to stdout by default — ready for any log aggregator (Fluentd, Loki, CloudWatch) |
 | **Disposability** | Graceful SIGTERM handling with configurable drain periods. See [Signals](/howto/signals) |
 | **Dev/prod parity** | Same binary, same config mechanism, same observability in every environment |
@@ -55,6 +55,8 @@ Each output maps to a self-documenting endpoint:
 | Metrics | `:9091/metrics` | Prometheus self-describing exposition format with HELP lines |
 | Profiling | `:9091/debug/pprof/` | Standard Go pprof index page |
 
+> **Tip:** Set `ADMIN_PORT` to serve metrics, profiling, and swagger on a dedicated port. Health and readiness endpoints remain on `:9091`. See [Security hardening](/howto/production/#security-hardening).
+
 **Every client gets documentation for free:**
 - **gRPC clients** use server reflection to discover services and methods without proto files
 - **REST clients** use the interactive Swagger UI or import the OpenAPI spec
@@ -173,9 +175,11 @@ When a request arrives at a ColdBrew service, it flows through several layers:
    │  └─────────────────┘                            │
    │                                                 │
    │  Built-in Endpoints:                            │
-   │    /metrics       - Prometheus                  │
    │    /healthcheck   - Liveness probe              │
    │    /readycheck    - Readiness probe             │
+   │                                                 │
+   │  Admin Endpoints (movable to ADMIN_PORT):       │
+   │    /metrics       - Prometheus                  │
    │    /debug/pprof/  - Go profiling                │
    │    /swagger/      - OpenAPI docs                │
    └─────────────────────────────────────────────────┘
@@ -324,7 +328,7 @@ ColdBrew is designed for Kubernetes deployments:
 - **Readiness probe:** `GET /readycheck` — returns the same version JSON when ready for traffic, or an error if the service hasn't called `SetReady()` yet
 - **gRPC health protocol:** Implements `grpc.health.v1.Health` ([standard gRPC health checking](https://github.com/grpc/grpc/blob/master/doc/health-checking.md)) on the gRPC port — used by gRPC load balancers, Envoy, Istio, and other service meshes for native health checking
 - **Graceful shutdown:** On SIGTERM, the service marks itself as not ready, drains in-flight requests, then exits cleanly
-- **Metrics scraping:** Prometheus scrapes `/metrics` on the HTTP port
+- **Metrics scraping:** Prometheus scrapes `/metrics` on the HTTP port (or `ADMIN_PORT` when configured)
 
 ### Gateway Performance Options
diff --git a/config-reference.md b/config-reference.md
index ba598dc..b412d19 100644
--- a/config-reference.md
+++ b/config-reference.md
@@ -31,6 +31,7 @@ cfg := config.GetColdBrewConfig()
 | `LISTEN_HOST` | string | `0.0.0.0` | Host address to listen on |
 | `GRPC_PORT` | int | `9090` | gRPC server port |
 | `HTTP_PORT` | int | `9091` | HTTP gateway port |
+| `ADMIN_PORT` | int | `0` (disabled) | Dedicated port for admin endpoints (pprof, metrics, swagger). When set to a non-zero value, these endpoints are served on this port instead of `HTTP_PORT`, enabling network-level isolation via Kubernetes NetworkPolicy. See [Security hardening](/howto/production/#security-hardening) |
 | `APP_NAME` | string | `""` | Application name (used in logs, metrics, New Relic) |
 | `ENVIRONMENT` | string | `""` | Environment name (e.g., production, staging, development) |
 | `RELEASE_NAME` | string | `""` | Release/version name |
diff --git a/howto/Debugging.md b/howto/Debugging.md
index 1633099..78f1376 100644
--- a/howto/Debugging.md
+++ b/howto/Debugging.md
@@ -15,14 +15,18 @@ description: "Debugging ColdBrew services with pprof and log overrides"
 Golang provides a built-in profiler called [pprof](https://golang.org/pkg/net/http/pprof/). It is a tool that can be used to collect CPU and memory profiles. It can be used to collect profiles from a running application and then analyze them to find the root cause of performance issues.
 
-ColdBrew exposes `/debug/pprof/` endpoint on the HTTP port that can be used to collect profiles. The endpoint is only available when the [configuration option] `DisableDebug` is set to `false` (which is the default behaviour).
+ColdBrew exposes the `/debug/pprof/` endpoint on the HTTP port, which can be used to collect profiles. When `ADMIN_PORT` is configured, pprof is served on the admin port instead. The endpoint is only available when the [configuration option] `DisableDebug` is set to `false` (which is the default behaviour).
 
 ### Collecting profiles
 
 To collect a profile, you can use the `go tool pprof` command. For example, to collect a CPU profile, you can run the following command:
 
 ```bash
+# Default (no ADMIN_PORT):
 $ go tool pprof http://localhost:9091/debug/pprof/profile
+
+# With ADMIN_PORT configured (e.g., ADMIN_PORT=9092):
+$ go tool pprof http://localhost:9092/debug/pprof/profile
 ```
 
 This will open an interactive shell where you can run commands to analyze the profile. For example, to see the top 10 functions that are consuming the most CPU, you can run the following command:
diff --git a/howto/production.md b/howto/production.md
index e467824..bb40a24 100644
--- a/howto/production.md
+++ b/howto/production.md
@@ -448,15 +448,68 @@ This section provides general security guidance for ColdBrew configuration. Alwa
 
 ColdBrew's defaults are tuned for **internal services** — debug endpoints, API docs, and gRPC reflection are enabled by default. Public-facing services need different settings.
 
+### Dedicated admin port (recommended)
+
+The **preferred approach** is to serve admin endpoints (pprof, metrics, swagger) on a **separate port** using `ADMIN_PORT`. This keeps profiling and metrics available for operations while isolating them from external traffic via Kubernetes NetworkPolicy:
+
+```yaml
+env:
+  # Serve admin endpoints on a dedicated internal port
+  - name: ADMIN_PORT
+    value: "9092"
+```
+
+When `ADMIN_PORT` is set:
+- **Port 9090** (gRPC): gRPC server — expose as needed
+- **Port 9091** (HTTP): gRPC-gateway + health/readiness probes — expose with path allowlisting
+- **Admin port** (e.g., 9092): pprof, metrics, swagger — restrict via NetworkPolicy
+
+```yaml
+# Kubernetes NetworkPolicy — restricts admin port (9092) to monitoring namespace
+# while leaving app ports (9090/9091) open. Add further restrictions to
+# 9090/9091 if you need to limit app traffic sources too.
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: restrict-admin-port
+spec:
+  podSelector:
+    matchLabels:
+      app: my-service
+  policyTypes:
+    - Ingress
+  ingress:
+    # Allow app traffic (gRPC + HTTP gateway) from anywhere
+    - ports:
+        - port: 9090
+        - port: 9091
+    # Restrict admin port to monitoring namespace only
+    - from:
+        - namespaceSelector:
+            matchLabels:
+              kubernetes.io/metadata.name: monitoring
+      ports:
+        - port: 9092
+```
+
+This approach is better than disabling endpoints entirely because:
+- Prometheus can still scrape `/metrics` on the admin port
+- Operations can still access pprof for production debugging
+- No application-level auth is needed — network isolation handles it
+
 ### Public-facing services
 
-Services exposed to external traffic (API gateways, user-facing endpoints) should whitelist only the API paths that need to be public and disable discovery and debug features:
+For services exposed to external traffic where a separate admin port is not sufficient, disable discovery and debug features entirely:
 
 {: .important }
-The most effective security measure is to **whitelist public API paths** at your load balancer or reverse proxy and block everything else. ColdBrew serves the HTTP gateway, debug, metrics, and swagger on the HTTP port (default 9091) and gRPC on a separate port (default 9090). Only your application's API routes (e.g., `/api/v1/*`) should be exposed externally — block `/debug/*`, `/metrics`, `/swagger/*`, and any other internal paths at the infrastructure level.
+The most effective security measure is to **use `ADMIN_PORT`** to separate admin endpoints, or to **whitelist public API paths** at your load balancer and block everything else. ColdBrew serves the HTTP gateway on the HTTP port (default 9091) and gRPC on a separate port (default 9090). When `ADMIN_PORT` is not set, admin endpoints (debug, metrics, swagger) share the HTTP port. Only your application's API routes (e.g., `/api/v1/*`) should be exposed externally.
 
 ```yaml
 env:
+  # Option 1 (preferred): Separate admin port
+  - name: ADMIN_PORT
+    value: "9092"
+  # Option 2: Disable admin endpoints entirely
   # Disable pprof — exposes CPU/memory profiling data
   - name: DISABLE_DEBUG
     value: "true"
@@ -486,7 +539,7 @@ env:
 ```
 
 {: .important }
-The `/metrics` endpoint exposes request counts, latency distributions, and Go runtime stats. For public-facing services, restrict access to `/metrics` at the load balancer level (IP whitelist or path-based routing) rather than disabling Prometheus entirely.
+The `/metrics` endpoint exposes request counts, latency distributions, and Go runtime stats. When using `ADMIN_PORT`, metrics are automatically served on the admin port only. Without `ADMIN_PORT`, restrict access to `/metrics` at the load balancer level (IP whitelist or path-based routing) rather than disabling Prometheus entirely.
 
 ### Internal services