From e5e20954a50c7c858d7e81264da4bea68630c6f6 Mon Sep 17 00:00:00 2001
From: Cursor Agent
Date: Fri, 30 Jan 2026 17:50:09 +0000
Subject: [PATCH 1/2] Add /realtime API benchmarks to Benchmarks documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Added new section showing performance improvements for /realtime endpoint
- Included before/after metrics showing 182× faster p99 latency
- Added test setup specifications and key optimizations
- Referenced from v1.80.5-stable release notes

Co-authored-by: ishaan
---
 docs/my-website/docs/benchmarks.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/docs/my-website/docs/benchmarks.md b/docs/my-website/docs/benchmarks.md
index 640212808bd..00f45c5d4b3 100644
--- a/docs/my-website/docs/benchmarks.md
+++ b/docs/my-website/docs/benchmarks.md
@@ -48,6 +48,34 @@ In these tests the baseline latency characteristics are measured against a fake-
 - High-percentile latencies drop significantly: P95 630 ms → 150 ms, P99 1,200 ms → 240 ms.
 - Setting workers equal to CPU count gives optimal performance.
 
+## `/realtime` API Benchmarks
+
+LiteLLM's `/realtime` endpoint has been optimized for low-latency WebSocket connections, achieving significant performance improvements through removal of redundant encodings, SSL context reuse, and caching of formatting strings.
+
+### Performance Metrics
+
+| Metric | Before | After | Improvement |
+| --------------- | --------- | --------- | -------------------------- |
+| Median latency | 2,200 ms | **59 ms** | **−97% (~37× faster)** |
+| p95 latency | 8,500 ms | **67 ms** | **−99% (~127× faster)** |
+| p99 latency | 18,000 ms | **99 ms** | **−99% (~182× faster)** |
+| Average latency | 3,214 ms | **63 ms** | **−98% (~51× faster)** |
+| RPS | 165 | **1,207** | **+631% (~7.3× increase)** |
+
+### Test Setup
+
+| Category | Specification |
+|----------|---------------|
+| **Load Testing** | Locust: 1,000 concurrent users, 500 ramp-up |
+| **System** | 4 vCPUs, 8 GB RAM, 4 workers, 4 instances |
+| **Database** | PostgreSQL (Redis unused) |
+
+### Key Optimizations
+
+- Removed redundant encodings on the hot path
+- Reused shared SSL contexts to prevent excessive memory allocation
+- Cached formatting strings that were being regenerated twice per request
+
 ## Machine Spec used for testing
 
 Each machine deploying LiteLLM had the following specs:

From 5b7458234b6d2683655077d2cabd293c98186dae Mon Sep 17 00:00:00 2001
From: Cursor Agent
Date: Fri, 30 Jan 2026 18:14:51 +0000
Subject: [PATCH 2/2] Update /realtime benchmarks to show current performance only

- Removed before/after comparison, showing only current metrics
- Clarified that benchmarks are e2e latency against fake realtime endpoint
- Simplified table format for better readability

Co-authored-by: ishaan
---
 docs/my-website/docs/benchmarks.md | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/docs/my-website/docs/benchmarks.md b/docs/my-website/docs/benchmarks.md
index 00f45c5d4b3..a1489081b4c 100644
--- a/docs/my-website/docs/benchmarks.md
+++ b/docs/my-website/docs/benchmarks.md
@@ -50,17 +50,17 @@ In these tests the baseline latency characteristics are measured against a fake-
 
 ## `/realtime` API Benchmarks
 
-LiteLLM's `/realtime` endpoint has been optimized for low-latency WebSocket connections, achieving significant performance improvements through removal of redundant encodings, SSL context reuse, and caching of formatting strings.
+End-to-end latency benchmarks for the `/realtime` endpoint tested against a fake realtime endpoint.
 
 ### Performance Metrics
 
-| Metric | Before | After | Improvement |
-| --------------- | --------- | --------- | -------------------------- |
-| Median latency | 2,200 ms | **59 ms** | **−97% (~37× faster)** |
-| p95 latency | 8,500 ms | **67 ms** | **−99% (~127× faster)** |
-| p99 latency | 18,000 ms | **99 ms** | **−99% (~182× faster)** |
-| Average latency | 3,214 ms | **63 ms** | **−98% (~51× faster)** |
-| RPS | 165 | **1,207** | **+631% (~7.3× increase)** |
+| Metric | Value |
+| --------------- | ---------- |
+| Median latency | 59 ms |
+| p95 latency | 67 ms |
+| p99 latency | 99 ms |
+| Average latency | 63 ms |
+| RPS | 1,207 |
 
 ### Test Setup
 
@@ -70,12 +70,6 @@ LiteLLM's `/realtime` endpoint has been optimized for low-latency WebSocket conn
 | **System** | 4 vCPUs, 8 GB RAM, 4 workers, 4 instances |
 | **Database** | PostgreSQL (Redis unused) |
 
-### Key Optimizations
-
-- Removed redundant encodings on the hot path
-- Reused shared SSL contexts to prevent excessive memory allocation
-- Cached formatting strings that were being regenerated twice per request
-
 ## Machine Spec used for testing
 
 Each machine deploying LiteLLM had the following specs:
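
Review note: the "Improvement" column introduced in PATCH 1/2 (and dropped again in PATCH 2/2) can be recomputed from the Before/After values as a quick sanity check. The following is a minimal sketch in Python; the helper name `improvement` and the `METRICS` dictionary are illustrative assumptions and are not part of LiteLLM or of this patch.

```python
# Illustrative helper, not part of LiteLLM: recompute the "Improvement"
# column of the PATCH 1/2 table from its Before/After values.

def improvement(before: float, after: float) -> tuple[float, float]:
    """Return (percent change, ratio of the larger value to the smaller)."""
    pct_change = (after - before) / before * 100.0
    ratio = max(before, after) / min(before, after)
    return pct_change, ratio


# Before/After values copied from the PATCH 1/2 table.
# Latencies are in milliseconds; RPS is requests per second.
METRICS = {
    "Median latency": (2200, 59),
    "p95 latency": (8500, 67),
    "p99 latency": (18000, 99),
    "Average latency": (3214, 63),
    "RPS": (165, 1207),
}

if __name__ == "__main__":
    for name, (before, after) in METRICS.items():
        pct, ratio = improvement(before, after)
        # A negative percentage means the value went down (lower latency).
        print(f"{name}: {pct:+.1f}% (~{ratio:.1f}x)")
```

Run this way, the ratios agree with the rounded factors quoted in the table (for example, roughly 182× for p99 latency and roughly 7.3× for RPS).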