Merged
9 changes: 5 additions & 4 deletions FAQ.md
@@ -65,11 +65,12 @@ See the [Tracing How-To](/howto/Tracing/#trace-id-propagation) for details.

## How do I migrate from OpenTracing to OpenTelemetry?

- The `tracing` package supports both. To switch:
+ The OpenTracing bridge has been removed. ColdBrew now uses OpenTelemetry natively:

- 1. Update your tracing initialization to use OpenTelemetry's SDK
- 2. The `tracing.NewInternalSpan()`, `tracing.NewDatastoreSpan()`, and `tracing.NewExternalSpan()` functions work with both backends
- 3. See the [Tracing How-To](/howto/Tracing/) and [Integrations](/integrations) guides for setup details
+ 1. Remove any direct `opentracing.GlobalTracer()` calls — use `otel.Tracer("my-service")` instead
+ 2. The `tracing.NewInternalSpan()`, `tracing.NewDatastoreSpan()`, and `tracing.NewExternalSpan()` functions use OpenTelemetry natively
+ 3. If you had `OTLP_USE_OPENTRACING_BRIDGE=true`, remove it — the setting is now ignored (a warning is logged if set to `true`)
+ 4. See the [Tracing How-To](/howto/Tracing/) and [Integrations](/integrations) guides for setup details

## What is vtprotobuf and why does ColdBrew use it?

7 changes: 5 additions & 2 deletions config-reference.md
@@ -115,7 +115,10 @@ When `OTLP_ENDPOINT` is set, it takes precedence over New Relic OpenTelemetry co
| `OTLP_COMPRESSION` | string | `gzip` | Compression type: `gzip` or `none` |
| `OTLP_INSECURE` | bool | `false` | Disable TLS for OTLP connection (development only) |
| `OTLP_SAMPLING_RATIO` | float64 | `0.1` | Trace sampling ratio (0.0–1.0, where 1.0 = sample all) |
- | `OTLP_USE_OPENTRACING_BRIDGE` | bool | `false` | **Deprecated.** Enable legacy OpenTracing bridge — only needed for services with unmigrated OpenTracing instrumentation |
+ | `OTLP_USE_OPENTRACING_BRIDGE` | bool | `false` | **Deprecated.** Ignored — OpenTracing bridge has been removed. If set to `true`, a warning is logged at startup |
+ | `OTEL_USE_LEGACY_INSTRUMENTATION` | bool | `false` | Revert to legacy `otelgrpc`-based gRPC OpenTelemetry instrumentation. Set to `true` only for rollback |
+ | `ENABLE_OTEL_METRICS` | bool | `false` | Enable OpenTelemetry metrics export via OTLP alongside Prometheus. Does not replace Prometheus |
+ | `OTEL_METRICS_INTERVAL` | int | `60` | Export interval in seconds for OTEL metrics (only applies when `ENABLE_OTEL_METRICS=true`) |

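To make the `OTLP_SAMPLING_RATIO` semantics concrete, here is a minimal Go sketch of the general trace-ID-ratio technique: a trace is kept when a number derived from its ID falls below `ratio × 2^63`, so every service seeing the same trace ID reaches the same decision. `shouldSample` is a hypothetical helper written for illustration. Whether ColdBrew configures exactly this kind of sampler is an assumption, not something the table states.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// shouldSample is a hypothetical helper (not a ColdBrew or OpenTelemetry API).
// It keeps a trace when the low 8 bytes of its ID, read as a 63-bit number,
// fall below ratio * 2^63, so the decision is deterministic per trace ID.
func shouldSample(traceID [16]byte, ratio float64) bool {
	if ratio >= 1 {
		return true // 1.0 = sample all
	}
	if ratio <= 0 {
		return false // 0.0 = sample none
	}
	bound := uint64(ratio * (1 << 63))
	x := binary.BigEndian.Uint64(traceID[8:16]) >> 1
	return x < bound
}

func main() {
	var low, high [16]byte // low stays all zeros
	for i := range high {
		high[i] = 0xff
	}
	fmt.Println(shouldSample(low, 0.1), shouldSample(high, 0.1))
}
```

With the default of `0.1`, roughly one trace in ten would be kept under this scheme; raising the ratio only widens the accepted range, so traces sampled at a lower ratio stay sampled at a higher one.
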
## Error Tracking

@@ -150,7 +153,7 @@ When `OTLP_ENDPOINT` is set, it takes precedence over New Relic OpenTelemetry co
|----------|------------|-------|
| `HTTP_HEADER_PREFIX` | `HTTP_HEADER_PREFIXES` | Single prefix replaced by comma-separated list |
| `DISABLE_PORMETHEUS` | `DISABLE_PROMETHEUS` | Typo variant — both work, use the correct spelling |
- | `OTLP_USE_OPENTRACING_BRIDGE` | Remove | Legacy OpenTracing bridge — remove once all instrumentation uses OpenTelemetry |
+ | `OTLP_USE_OPENTRACING_BRIDGE` | Remove | OpenTracing bridge has been removed — this field is now ignored (logs a warning if set to `true`) |

---

4 changes: 2 additions & 2 deletions howto/Debugging.md
@@ -65,7 +65,7 @@ Here's what a typical ColdBrew CPU profile looks like under load (Apple M1 Pro,
| Go runtime (scheduling, GC) | ~15% | Goroutine scheduling, garbage collection |
| `TraceIdInterceptor` | ~6% | Trace ID extraction and propagation |
| `errors/notifier.SetTraceIdWithValue` | ~5% | Setting trace ID on error notifier context |
- | `otelgrpc.TagRPC` | ~1% | OpenTelemetry span creation |
+ | gRPC OpenTelemetry stats handler | ~1% | OpenTelemetry span creation |
| Prometheus metrics | ~1% | Histogram bucket recording |

{: .important }
@@ -79,7 +79,7 @@ curl -s "http://localhost:9091/debug/pprof/heap?debug=0" -o heap.prof
go tool pprof -alloc_objects -top heap.prof
```

- Top allocation sources under load are gRPC metadata copying (~27%), otelgrpc span creation (~13%), and options context store (~10%). These are largely inherent to gRPC's per-request metadata model.
+ Top allocation sources under load are gRPC metadata copying (~27%), OpenTelemetry span creation (~13%), and options context store (~10%). These are largely inherent to gRPC's per-request metadata model.

### Analyzing profiles

45 changes: 44 additions & 1 deletion howto/Metrics.md
@@ -3,7 +3,7 @@ layout: default
title: "Metrics"
parent: "How To"
nav_order: 6
- description: "Prometheus metrics and custom metrics in ColdBrew: default runtime metrics, custom counters and histograms, and Hystrix circuit breaker monitoring"
+ description: "Prometheus and OpenTelemetry metrics in ColdBrew: default runtime metrics, OTLP export, custom counters and histograms, and Hystrix circuit breaker monitoring"
---
## Table of contents
{: .no_toc .text-delta }
@@ -52,6 +52,49 @@ These metrics will be automatically collected and exposed by ColdBrew on the `/m
{: .note .note-info }
To learn more about Prometheus and the metric types it supports, see the [Prometheus metric types documentation](https://prometheus.io/docs/concepts/metric_types/).

## OpenTelemetry Metrics (OTLP Export)

In addition to Prometheus, ColdBrew can export gRPC metrics via OpenTelemetry's OTLP protocol. This is useful when your observability stack uses an OTLP-compatible backend (Grafana Cloud, Datadog, Honeycomb, etc.) and you want metrics alongside traces in the same pipeline.

{: .important }
OTEL metrics export is **opt-in** and runs **alongside** Prometheus — it does not replace the `/metrics` endpoint. Both can be active at the same time.

### Enabling OTEL Metrics

Set the following environment variables:

```bash
export ENABLE_OTEL_METRICS=true
export OTEL_METRICS_INTERVAL=60 # export interval in seconds (default: 60)
export OTLP_ENDPOINT=localhost:4317 # same endpoint used for traces
```

When enabled, ColdBrew exports standard [gRPC OpenTelemetry metrics](https://grpc.io/docs/guides/opentelemetry-metrics/) via the native `grpc/stats/opentelemetry` package:

| Metric | Type | Description |
|--------|------|-------------|
| `grpc.server.call.started` | Counter | Server RPCs started |
| `grpc.server.call.duration` | Histogram | Server RPC duration |
| `grpc.server.call.sent_total_compressed_message_size` | Histogram | Server response size |
| `grpc.server.call.rcvd_total_compressed_message_size` | Histogram | Server request size |
| `grpc.client.call.duration` | Histogram | Client RPC duration |
| `grpc.client.attempt.started` | Counter | Client RPC attempts |

{: .note }
Health check, readiness, and server reflection RPCs are bucketed under a generic `"other"` method label to reduce cardinality — they still generate data points but won't create high-cardinality method attributes.
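
The bucketing described in the note can be sketched as a predicate over the full method name. grpc-go's `stats/opentelemetry` `MetricsOptions` has a `MethodAttributeFilter func(string) bool` hook of this shape; whether ColdBrew wires it exactly like this, and the prefix list used here, are assumptions. `recordMethodName` and `methodLabel` are hypothetical names, and the readiness RPC is left out because its method path is not given in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// lowValueServices lists service-name prefixes assumed to be bucketed.
// The readiness RPC is omitted because its path is service-specific.
var lowValueServices = []string{
	"grpc.health.v1.Health/",
	"grpc.reflection.", // matches the v1 and v1alpha reflection services
}

// recordMethodName reports whether the method name should be kept as a
// metric attribute. A leading slash is tolerated but not required.
func recordMethodName(fullMethod string) bool {
	name := strings.TrimPrefix(fullMethod, "/")
	for _, p := range lowValueServices {
		if strings.HasPrefix(name, p) {
			return false
		}
	}
	return true
}

// methodLabel returns the attribute value a data point would carry:
// the method itself, or the generic "other" bucket.
func methodLabel(fullMethod string) string {
	if recordMethodName(fullMethod) {
		return fullMethod
	}
	return "other"
}

func main() {
	for _, m := range []string{"/grpc.health.v1.Health/Check", "/myapp.v1.Greeter/SayHello"} {
		fmt.Printf("%s -> %s\n", m, methodLabel(m))
	}
}
```

The data point is still recorded in both cases. Only the method attribute changes, which keeps probe traffic from adding label values.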

### How it relates to Prometheus

| Aspect | Prometheus (`/metrics`) | OTEL Metrics (OTLP) |
|--------|------------------------|---------------------|
| Protocol | Pull (scrape) | Push (OTLP gRPC) |
| Metric names | `grpc_server_handled_total`, etc. | `grpc.server.call.duration`, etc. |
| Custom app metrics | `promauto.NewCounter(...)` | Not exported (Prometheus only) |
| Enabled by default | Yes | No (`ENABLE_OTEL_METRICS=true`) |
| Endpoint config | None (built-in) | `OTLP_ENDPOINT` (shared with traces) |

Both export pipelines use independent metric names and registries, so there is no conflict or double-counting.

## How to use Hystrix Metrics in Prometheus

{: .warning }
16 changes: 16 additions & 0 deletions howto/production.md
@@ -311,6 +311,22 @@ env:
value: "0.2"
```

### OTEL metrics (alongside Prometheus)

To export gRPC metrics via OTLP alongside Prometheus scraping, enable OTEL metrics on the same endpoint used for tracing:

```yaml
env:
- name: ENABLE_OTEL_METRICS
value: "true"
- name: OTEL_METRICS_INTERVAL
value: "60" # seconds between OTLP metric exports
# OTLP_ENDPOINT is already set for tracing above
```

{: .note }
This does not replace Prometheus — both `/metrics` scraping and OTLP push run in parallel. See the [Metrics How-To](/howto/Metrics/#opentelemetry-metrics-otlp-export) for details on exported metric names.

### What gets traced

ColdBrew automatically creates spans for:
56 changes: 36 additions & 20 deletions integrations.md
@@ -71,6 +71,25 @@ If you are using ColdBrew packages in your app, you need to initialise Prometheu
ColdBrew uses the [prometheus/client_golang] package to collect metrics. To see how to use it check out the [metrics documentation].


## OpenTelemetry Metrics

ColdBrew can export gRPC metrics via OTLP alongside Prometheus. This uses the native `grpc/stats/opentelemetry` package and shares the same OTLP endpoint as tracing.

### Configuring

Set the following environment variables as defined in [Config]:
- `ENABLE_OTEL_METRICS`: Set to `true` to enable OTLP metrics export
- `OTEL_METRICS_INTERVAL`: Export interval in seconds (default: `60`)
- `OTLP_ENDPOINT`: OTLP gRPC endpoint (shared with tracing)

### Using

OTEL metrics are exported automatically when enabled — no code changes required. Standard gRPC server/client metrics (`grpc.server.call.duration`, `grpc.client.call.duration`, etc.) are exported via OTLP.

Custom application metrics registered with `promauto` are **not** exported via OTLP — they remain Prometheus-only. To export custom metrics via OTLP, use the [OpenTelemetry Go SDK](https://opentelemetry.io/docs/languages/go/) directly with the global MeterProvider (available via `otel.GetMeterProvider()` or `core.OTELMeterProvider()`).

See the [Metrics How-To](/howto/Metrics/) for details on which metrics are exported and how OTEL metrics relate to Prometheus.

## Sentry

[Sentry] is an error tracking tool that helps to monitor and fix crashes in real time. It collects data about the errors and displays it in a dashboard. It also provides alerts when the service is not performing well.
@@ -156,14 +175,13 @@ To configure generic OpenTelemetry, you can use the [OTLPConfig] struct:

```go
type OTLPConfig struct {
-    Endpoint             string            // OTLP gRPC endpoint (e.g., "localhost:4317")
-    Headers              map[string]string // Custom headers (e.g., API keys)
-    ServiceName          string            // Name of your service
-    ServiceVersion       string            // Version of your service
-    SamplingRatio        float64           // Sampling ratio (0.0 to 1.0)
-    Compression          string            // "gzip" or "none"
-    UseOpenTracingBridge bool              // Deprecated: enable legacy OpenTracing bridge
-    Insecure             bool              // Disable TLS (for local development)
+    Endpoint       string            // OTLP gRPC endpoint (e.g., "localhost:4317")
+    Headers        map[string]string // Custom headers (e.g., API keys)
+    ServiceName    string            // Name of your service
+    ServiceVersion string            // Version of your service
+    SamplingRatio  float64           // Sampling ratio (0.0 to 1.0)
+    Compression    string            // "gzip" or "none"
+    Insecure       bool              // Disable TLS (for local development)
}
```

@@ -174,12 +192,11 @@ import "github.com/go-coldbrew/core"

func main() {
config := core.OTLPConfig{
-    Endpoint:       "localhost:4317",
-    ServiceName:    "my-service",
-    ServiceVersion: "v1.0.0",
-    SamplingRatio:  0.1,
-    // UseOpenTracingBridge: true, // only needed for legacy OpenTracing code
-    Insecure:       true, // for local development
+    Endpoint:       "localhost:4317",
+    ServiceName:    "my-service",
+    ServiceVersion: "v1.0.0",
+    SamplingRatio:  0.1,
+    Insecure:       true, // for local development
}
err := core.SetupOpenTelemetry(config)
if err != nil {
@@ -201,12 +218,11 @@ import "github.com/go-coldbrew/core"

func main() {
config := core.OTLPConfig{
-    Endpoint:       "localhost:4317", // Jaeger OTLP endpoint
-    ServiceName:    "my-service",
-    ServiceVersion: "v1.0.0",
-    SamplingRatio:  0.1,
-    // UseOpenTracingBridge: true, // only needed for legacy OpenTracing code
-    Insecure:       true,
+    Endpoint:       "localhost:4317", // Jaeger OTLP endpoint
+    ServiceName:    "my-service",
+    ServiceVersion: "v1.0.0",
+    SamplingRatio:  0.1,
+    Insecure:       true,
}
err := core.SetupOpenTelemetry(config)
if err != nil {
8 changes: 5 additions & 3 deletions tests/links.spec.ts
@@ -82,10 +82,12 @@ test.describe("External Links (sample)", () => {

for (const url of pkgLinks) {
const response = await request.get(url);
+ // Accept 429 (rate limited) — the URL exists, the server is just throttling CI runners.
+ const status = response.status();
expect(
-   response.status(),
-   `${url} returned ${response.status()}`
- ).toBeLessThan(400);
+   status < 400 || status === 429,
+   `${url} returned ${status}`
+ ).toBeTruthy();
}
});
