5 changes: 5 additions & 0 deletions benchmarks/.gitignore
@@ -0,0 +1,5 @@
src/gen
src/gen-protobufjs
node_modules
dist
bench-results.json
214 changes: 214 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,214 @@
# protobuf-es Benchmark Suite

## Context

This directory contains microbenchmarks for `@bufbuild/protobuf` serialization workloads. It addresses the measurement gap discussed in [#333](https://github.com/bufbuild/protobuf-es/issues/333) and [#1035](https://github.com/bufbuild/protobuf-es/issues/1035), where performance arguments have historically relied on ad-hoc, user-provided numbers without a reproducible suite living alongside the library.

The OTLP-like fixture in `proto/nested.proto` is modelled on the real-world workload that produced [open-telemetry/opentelemetry-js#6221](https://github.com/open-telemetry/opentelemetry-js/issues/6221) (a ~13x serialization regression when protobuf-es was briefly adopted via the `fromJsonString + toBinary` path). Exercising the same message shape under controlled conditions makes that regression class observable against future protobuf-es versions.
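
The two code paths at issue, as a minimal sketch against the `@bufbuild/protobuf` v2 API (the schema import and message name are illustrative; the generated code for this suite lands under `src/gen`):

```ts
import { create, fromJsonString, toBinary } from "@bufbuild/protobuf";
// Illustrative import; the actual schema is generated from proto/nested.proto.
import { ExportTraceServiceRequestSchema } from "./gen/nested_pb.js";

declare const jsonPayload: string; // JSON-encoded export request from upstream

// The #6221 regression shape: parse a JSON-shaped tree, then encode it.
const viaJson = toBinary(
  ExportTraceServiceRequestSchema,
  fromJsonString(ExportTraceServiceRequestSchema, jsonPayload),
);

// The idiomatic path: construct the message directly, then encode it.
const direct = toBinary(
  ExportTraceServiceRequestSchema,
  create(ExportTraceServiceRequestSchema, { resourceSpans: [] }),
);
```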

## Running

```bash
# From the monorepo root (first time only)
npm ci
npx turbo run build --filter=@bufbuild/protobuf
npx turbo run generate --filter=@bufbuild/protobuf-benchmarks

# Run the full suite
npm run bench -w @bufbuild/protobuf-benchmarks

# Or run individual suites
npm run bench:create -w @bufbuild/protobuf-benchmarks
npm run bench:toBinary -w @bufbuild/protobuf-benchmarks
npm run bench:create-toBinary -w @bufbuild/protobuf-benchmarks
npm run bench:fromBinary -w @bufbuild/protobuf-benchmarks
npm run bench:fromJson-path -w @bufbuild/protobuf-benchmarks
npm run bench:comparison -w @bufbuild/protobuf-benchmarks
npm run bench:matrix -w @bufbuild/protobuf-benchmarks
npm run bench:memory -w @bufbuild/protobuf-benchmarks
```

## Benchmarks

| File | What it measures |
|------|------------------|
| `bench-create.ts` | Cost of `create(Schema, init)` in isolation — small flat message vs. full OTLP-like tree constructed via many nested `create()` calls |
| `bench-toBinary.ts` | Cost of `toBinary(Schema, message)` on pre-built messages — serialization-only, no allocation of the message graph |
| `bench-create-toBinary.ts` | Combined workload: build the message graph fresh each iteration, then serialize. This is the end-to-end shape of one OTLP export call |
| `bench-fromBinary.ts` | Cost of `fromBinary(Schema, bytes)` on pre-encoded payloads — reflective decoder walk in isolation |
| `bench-fromJson-path.ts` | `fromJsonString + toBinary` and `fromJson + toBinary` paths on the same fixture. The first one is the #6221 regression shape; the second is the partial-fix midpoint |
| `bench-comparison-protobufjs.ts` | Cross-library comparison: protobuf-es vs `protobufjs` (pbjs static codegen) on the same `.proto` fixture. Covers full roundtrip, encode-only, decode-only |
| `bench-matrix.ts` | `toBinary` + `fromBinary` across the full realistic-fixture matrix (OTel traces/metrics/logs, K8s Pod list, GraphQL request/response, RPC envelope, stress). Emits a JSON summary on stdout for CI diffing |
| `bench-memory.ts` | Heap allocations per operation (`heapUsed` delta after forced GC) for both libraries. Requires `--expose-gc` |

## Methodology

- Uses [tinybench](https://github.com/tinylibs/tinybench) for sampling, confidence intervals, and summary statistics; see the sketch below this list.
- 500 ms warmup, 2000 ms measurement per case.
- Node version taken from `.nvmrc` at repo root.
- Results are sensitive to host load. For tighter numbers pin to a single core:
```bash
taskset -c 0 npm run bench -w @bufbuild/protobuf-benchmarks
```
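
A minimal sketch of how each suite drives tinybench, using the timings above (the fixture imports are placeholders for what `src/fixtures.ts` and `src/gen` provide):

```ts
import { Bench } from "tinybench";
import { toBinary } from "@bufbuild/protobuf";
// Placeholder imports; real suites pull schemas from src/gen and
// pre-built messages from src/fixtures.ts.
import { ExportTraceRequestSchema, prebuiltRequest } from "./fixtures.js";

const bench = new Bench({ warmupTime: 500, time: 2000 }); // ms, as above
bench.add("toBinary() ExportTraceRequest (pre-built)", () => {
  toBinary(ExportTraceRequestSchema, prebuiltRequest);
});
await bench.run();
console.table(bench.table());
```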

## Fixtures

The suite runs across a matrix of payload shapes so a regression can be
attributed to a class of workload rather than lumped into a single "encoder
is slower" result. All fixtures live under `proto/` and are built by
helpers in `src/fixtures.ts`.

| Fixture | `.proto` | Shape | Typical encoded size | Notes |
|---------|----------|-------|---------------------:|-------|
| `SimpleMessage` | `small.proto` | 3 scalar fields | ~19 B | per-call overhead baseline |
| `ExportTraceRequest` | `nested.proto` | OTel traces: 100 spans × 10 attrs, fixed64 timestamps, bytes IDs | ~35 KB | repro of [open-telemetry/opentelemetry-js#6221](https://github.com/open-telemetry/opentelemetry-js/issues/6221) |
| `ExportMetricsRequest` | `otel-metrics.proto` | OTel metrics: 50 series with Gauge/Sum/Histogram mix, explicit bucket bounds | ~17 KB | exercises the `oneof data` dispatch + repeated doubles/uint64s |
| `ExportLogsRequest` | `otel-logs.proto` | OTel logs: 100 LogRecords, severity, string body, trace/span IDs | ~21 KB | string-heavy with attribute maps |
| `K8sPodList` | `k8s-pod.proto` | 20 Pods with labels/annotations, 2 containers each, ports + env + resource limits | ~29 KB | map-dominant config payload |
| `GraphQLRequest` | `graphql.proto` | Long query string + JSON-encoded variables map | ~0.6 KB | mixes a large string with `map<string,bytes>` |
| `GraphQLResponse` | `graphql.proto` | JSON-encoded `data` + structured errors | ~1.4 KB | bytes + repeated messages with string paths |
| `RpcRequest` | `rpc-simple.proto` | Routing fields + header map + 256-byte payload | ~0.5 KB | baseline RPC envelope |
| `RpcResponse` | `rpc-simple.proto` | Status + header map + 512-byte payload | ~0.6 KB | baseline RPC response |
| `StressMessage` | `stress.proto` | Depth-8 self-nested + 200-wide int32/string/attr arrays + 4KB blob + every scalar type | ~13 KB | synthetic — surfaces per-scalar-type regressions |

### Design notes

- **Map-heavy vs. list-heavy.** Kubernetes payloads stress `map<string,string>`
  encode paths; OTel payloads stress repeated nested messages. Both shapes show
  up in production consumers, and their encoder walks differ.
- **Deep nesting.** The stress fixture recurses through `StressMessage.child`
  eight levels deep. The encoder pays a length-prefix per level (fork buffer +
  measure + prefix), so depth is a distinct failure mode from total size; see
  the sketch after this list.
- **All scalar types exactly once.** `StressMessage` declares each proto3
scalar in a fixed slot so a regression specific to `sfixed64` or `sint32`
varint zig-zag is visible in this fixture but not in realistic ones.
- **GraphQL/RPC payloads use `bytes` for opaque data** rather than structured
sub-messages because real clients carry JSON-encoded variables and
opaque RPC payloads as bytes on the wire; the benchmark reflects that.
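
A hypothetical sketch of the depth recursion in `src/fixtures.ts` (schema and field names are illustrative, not the actual generated identifiers); each level of `child` costs the encoder one more length-prefixed sub-message:

```ts
import { create } from "@bufbuild/protobuf";
// Illustrative imports; the real schema is generated from proto/stress.proto.
import { StressMessageSchema } from "./gen/stress_pb.js";
import type { StressMessage } from "./gen/stress_pb.js";

// Build the depth-8, width-200 shape described in the fixture table.
function makeStress(depth: number): StressMessage {
  return create(StressMessageSchema, {
    int32Values: Array.from({ length: 200 }, (_, i) => i),
    blob: new Uint8Array(4096),
    child: depth > 0 ? makeStress(depth - 1) : undefined,
  });
}

const stress = makeStress(8);
```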

### Future work

- `bench-matrix` currently measures `toBinary` + `fromBinary` on pre-built
messages. A follow-up pass should add `create + toBinary` (full roundtrip)
and `fromJsonString + toBinary` paths across the matrix to catch
regressions in the JSON-input code paths that the existing
`bench-fromJson-path` only exercises on the OTLP traces fixture.
- GraphQL variables are currently modelled as `map<string,bytes>` with JSON
blobs per value. A richer fixture using a `google.protobuf.Struct`-like
shape would exercise the same code paths the `@bufbuild/protobuf/wkt`
`Value` type uses in real services.

## Report snapshot

Generated by `npm run bench:report -w @bufbuild/protobuf-benchmarks`. The
script writes `bench-results.json`, `chart.svg`, and the table below. See
`src/report.ts` for the generator and `src/report-helpers.ts` for the
rendering helpers.

![chart](./chart.svg)

<!--BENCHMARK_TABLE_START-->

| Fixture | Bytes | toBinary | toBinaryFast | protobufjs | Best |
| ---------------------------------- | -----: | -------: | -----------: | ---------: | -------------------- |
| SimpleMessage | 19 | 1.06M | 1.23M | - | toBinaryFast (1.16x) |
| ExportTraceRequest (100 spans) | 32,926 | 1,778 | 3,218 | 2,676 | toBinaryFast (1.20x) |
| ExportMetricsRequest (50 series) | 17,696 | 3,316 | 6,429 | - | toBinaryFast (1.94x) |
| ExportLogsRequest (100 records) | 21,319 | 3,257 | 6,032 | - | toBinaryFast (1.85x) |
| K8sPodList (20 pods) | 28,900 | 3,458 | 5,444 | - | toBinaryFast (1.57x) |
| GraphQLRequest | 624 | 256,431 | 312,765 | - | toBinaryFast (1.22x) |
| GraphQLResponse | 1,366 | 293,549 | 402,902 | - | toBinaryFast (1.37x) |
| RpcRequest | 501 | 379,472 | 455,529 | - | toBinaryFast (1.20x) |
| RpcResponse | 602 | 579,153 | 766,432 | - | toBinaryFast (1.32x) |
| StressMessage (depth=8, width=200) | 12,868 | 10,848 | 18,034 | - | toBinaryFast (1.66x) |

<!--BENCHMARK_TABLE_END-->

## Current results

First local run on Node v25.8.1, linux/x64 (non-isolated host; margins of error are as expected for an unpinned benchmark machine — numbers are directionally stable but will shift on quieter hardware). All ops/sec are medians from `tinybench.table()`.

### create() cost

| Case | ops/sec (median) | ± |
|------|------------------|---|
| `create() SimpleMessage (3 scalar fields)` | 2,123,142 | 16% |
| `create() ExportTraceRequest nested (100 spans, 10 attrs each)` | 3,674 | 3.4% |

### toBinary() cost on pre-built messages

| Case | ops/sec (median) | ± |
|------|------------------|---|
| `toBinary() SimpleMessage (pre-built)` | 690,608 | 18% |
| `toBinary() ExportTraceRequest (pre-built, 100 spans)` | 267 | 26% |

### create() + toBinary() combined workload

| Case | ops/sec (median) | ± |
|------|------------------|---|
| `create() + toBinary() SimpleMessage` | 402,091 | 42% |
| `create() + toBinary() ExportTraceRequest (100 spans, OTel-like)` | 285 | 19% |

### fromJson / fromJsonString + toBinary paths

| Case | ops/sec (median) | ± |
|------|------------------|---|
| `fromJsonString + toBinary (100 spans) — OTel #6221 shape` | 235 | 15% |
| `fromJson + toBinary (100 spans) — plainObject path` | 275 | 12% |

### fromBinary() parsing cost

| Case | ops/sec (median) | ± |
|------|------------------|---|
| `fromBinary() SimpleMessage (19 B)` | 1,663,894 | 0.02% |
| `fromBinary() ExportTraceRequest (100 spans, 35,283 B)` | 1,501 | 0.40% |

Parsing (decode) is materially faster than encoding on the same nested workload: ~1,500 ops/s decode vs ~600 ops/s encode. The encode walk pays for varint length prefixing (writing a length-delimited sub-message requires allocating a fork buffer, encoding into it, then measuring it to write the length prefix), a cost with no symmetric counterpart in the decode path.
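
A simplified sketch of that asymmetry using the public wire-level API, with two hand-written fields standing in for the reflective walk:

```ts
import { BinaryWriter, WireType } from "@bufbuild/protobuf/wire";

const w = new BinaryWriter();
// Entering a nested message: fork() starts a fresh chunk for the child.
w.tag(1, WireType.LengthDelimited).fork();
// Encode the child's fields into the forked chunk.
w.tag(1, WireType.LengthDelimited).string("span name");
// join() measures the finished chunk, then writes the varint length
// prefix followed by the chunk itself. Every nesting level repeats this.
w.join();
const bytes = w.finish();
```

The decoder needs none of that buffer juggling: it reads the length prefix and bounds the child parse with simple offset arithmetic.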

### protobuf-es vs protobufjs (same `.proto`, same host)

Run on the OTLP-like 100-span fixture, with the protobufjs code generated via `pbjs -t static-module -w commonjs --force-long`. Numbers below are medians from standalone runs of `bench:comparison` (host not isolated; margins of error on the protobuf-es cases are wider than on protobufjs because each protobuf-es iteration takes longer, yielding fewer samples in the 2000 ms measurement window).

| Workload | protobuf-es ops/s | protobufjs ops/s | Ratio |
|----------|---------------------:|--------------------:|--------:|
| create + encode (100 spans) | 666 | 3,788 | **5.7x slower** |
| encode pre-built (100 spans) | 622 | 4,041 | **6.5x slower** |
| decode 100 spans | 1,501 | 6,868 | **4.6x slower** |

Run-to-run variance on an unpinned host moves these ratios by roughly ±20%. Observed ranges across multiple runs: 5.3x–6.3x on create+encode, 4.4x–6.5x on decode. For tighter numbers pin to a single core.

### Memory (heap bytes per operation)

Coarse measurement via `heapUsed` delta across 1,000 iterations after forced GC. Requires `--expose-gc`.
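
A minimal sketch of that pattern (assumes `node --expose-gc`; the workload callback is a placeholder):

```ts
// Average heap bytes allocated per call of `run`. Coarse by design: garbage
// produced during the loop but not yet collected is counted as allocation.
function bytesPerOp(run: () => void, iterations = 1_000): number {
  (globalThis as { gc?: () => void }).gc?.(); // settle the heap first
  const before = process.memoryUsage().heapUsed;
  for (let i = 0; i < iterations; i++) {
    run();
  }
  const after = process.memoryUsage().heapUsed;
  return (after - before) / iterations;
}
```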

| Case | Bytes/op (avg) | Ratio vs protobufjs |
|------|----------------:|---------------------:|
| protobuf-es: create + toBinary (100 spans) | 23,524 | 3.2x more |
| protobufjs: create + encode (100 spans) | 7,449 | baseline |
| protobuf-es: fromBinary (100 spans) | 31,569 | 0.94x (less) |
| protobufjs: decode (100 spans) | 33,594 | baseline |

Observations:
- Encode side: protobuf-es allocates ~3x more heap per operation than protobufjs, consistent with the reflective encoder path constructing intermediate length-prefix buffers via `BinaryWriter.fork()` plus array joins per sub-message.
- Decode side: allocations are within jitter — both libraries materialize the message tree, and the decoded object graph dominates the delta.

### Reading these numbers

- On the 100-span nested workload, `toBinary` dominates: pre-built `toBinary` (267 ops/s) and combined `create() + toBinary()` (285 ops/s) are within jitter of each other, i.e. constructing the message graph is cheap compared to the reflective binary encode walk.
- The `fromJsonString + toBinary` path is roughly 20% slower than direct `create + toBinary` on this fixture (235 vs 285 ops/s median). The OTel incident report observed ~13x — most of that gap is the extra transformer-level traversals building the JSON-shaped object tree upstream of `fromJsonString`, which this benchmark does not exercise. Here we isolate just the `fromJsonString + toBinary` step, so the observed ratio is the lower bound on the regression's protobuf-es-side contribution.
- The `SimpleMessage` numbers illustrate per-call overhead on a trivial shape. Relevant when many small messages are serialized in tight loops (e.g. gRPC unary call payloads).
- The comparison vs protobufjs is consistent with the OTel report's directional claim (protobuf-es is slower on this shape), but the observed ratio here is ~5–7x, not the 13x–30x sometimes cited from external measurements. Two factors explain the gap: (a) this suite isolates the encoder/decoder walk, with pbjs static-module codegen as the ahead-of-time baseline, while real-world numbers also include app-level traversal, JSON conversion, BigInt handling, and allocator pressure that this suite deliberately does not measure; (b) Node versions and host conditions differ. This suite reports what protobuf-es actually spends on encode/decode under controlled conditions — use those numbers for tracking, not for headline claims.

## Methodology notes

- The `bench-fromJson-path` cases deliberately reproduce a known-pathological pattern. Do not read the numbers there as "protobuf-es is slow" — they show the cost of an unnecessary extra traversal. See `bench-create-toBinary` for the idiomatic path.
- `create()` is called per sub-message in the nested benchmark (every `KeyValue`, `Span`, `ScopeSpans`, etc.) because protobuf-es's reflective `toBinary` relies on the `$typeName`-tagged prototype established by `create` — this matches the real-world cost of constructing an OTLP-like tree; see the sketch after this list.
- The comparison benchmark uses pbjs static-module codegen (ahead-of-time encoder/decoder), which is the protobufjs mode most commonly adopted in production. pbjs reflection-mode numbers would be slower and not representative of what protobufjs users actually deploy.
- The memory benchmark uses a `heapUsed` delta across 1,000 iterations with `gc()` sandwiching the measurement. This is coarse — it does not separate young-gen allocations cleared between minor GCs from steady-state retained memory — but it is internally consistent across the libraries compared here. For finer attribution use `node --heap-prof` and inspect the resulting `.heapprofile` in Chrome DevTools.
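
A hypothetical fragment of that per-node construction (message and field names are illustrative OTLP-like shapes, not the actual generated identifiers):

```ts
import { create } from "@bufbuild/protobuf";
// Illustrative imports; the real schemas are generated from proto/nested.proto.
import { SpanSchema, KeyValueSchema } from "./gen/nested_pb.js";

// Every node goes through create(), so the reflective toBinary() can
// dispatch on the $typeName-tagged prototype that create() establishes.
const span = create(SpanSchema, {
  name: "GET /api/items",
  attributes: Array.from({ length: 10 }, (_, i) =>
    create(KeyValueSchema, { key: `attr.${i}`, value: `value-${i}` }),
  ),
});
```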

## Future work

- CI integration — run on PR and publish trends; flag regressions above a configurable threshold.
- `ts-proto` comparison on the same fixtures (separate package, opt-in dependency). Would round out the "ahead-of-time codegen" comparison group alongside protobufjs.
- Streaming write benchmarks (`sizeDelimitedEncode`) for gRPC-style workloads.
- Allocation tracking via `node --heap-prof` for per-call-site attribution (replacing the coarse `heapUsed` delta in `bench-memory`).
7 changes: 7 additions & 0 deletions benchmarks/biome.json
@@ -0,0 +1,7 @@
{
"$schema": "https://biomejs.dev/schemas/1.9.4/schema.json",
"extends": ["../biome.base.json"],
"files": {
"ignore": ["dist", "src/gen", "src/gen-protobufjs"]
}
}
10 changes: 10 additions & 0 deletions benchmarks/buf.gen.yaml
@@ -0,0 +1,10 @@
# Learn more: https://buf.build/docs/configuration/v2/buf-gen-yaml
version: v2
inputs:
- directory: proto
clean: true
plugins:
- local: protoc-gen-es
opt: target=ts
out: src/gen
include_imports: true