TensorZero creates a feedback loop for optimizing LLM applications — turning production data into smarter, faster, and cheaper models.
- Integrate our model gateway
- Send metrics or feedback
- Optimize prompts, models, and inference strategies
- Watch your LLMs improve over time
It provides a data & learning flywheel for LLMs by unifying:
- Inference: one API for all LLMs, with <1ms P99 overhead
- Observability: inference & feedback → your database
- Optimization: from prompts to fine-tuning and RL
- Experimentation: built-in A/B testing, routing, fallbacks
Website · Docs · Twitter · Slack · Discord
Quick Start (5min) · Comprehensive Tutorial · Deployment Guide · API Reference · Configuration Reference
Integrate with TensorZero once and access every major LLM provider.
Model Providers
The TensorZero Gateway natively supports every major LLM provider. Need something else? Your provider is most likely supported because TensorZero integrates with any OpenAI-compatible API (e.g. Ollama).
Features
The TensorZero Gateway supports advanced features like A/B testing, routing, and fallbacks, and is written in Rust 🦀 with performance in mind (<1ms P99 latency overhead @ 10k QPS). See Benchmarks.
You can run inference using the TensorZero client (recommended), the OpenAI client, or the HTTP API.
Usage: TensorZero Python Client (Recommended)
You can access any provider using the TensorZero Python client.
- Deploy `tensorzero/gateway` using Docker. Detailed instructions →
- Optional: Set up the TensorZero configuration.
- Run inference:
from tensorzero import TensorZeroGateway

# The gateway URL below assumes the Docker deployment from step 1 (http://localhost:3000).
with TensorZeroGateway("http://localhost:3000") as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "Write a haiku about artificial intelligence.",
                }
            ]
        },
    )
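The returned object includes the generated content plus identifiers that tie the inference to observability and feedback. As a rough sketch (the attribute names below reflect the client's chat response shape and are worth verifying against the API Reference):

```python
# Rough sketch of reading the response (attribute names assumed from the chat response shape)
print(response.inference_id)     # ID you can reference later, e.g. when sending feedback
print(response.content[0].text)  # the generated haiku (first content block)
```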
See Quick Start for more information.
Usage: OpenAI Python Client
You can access any provider using the OpenAI Python client with TensorZero.
- Deploy `tensorzero/gateway` using Docker. Detailed instructions →
- Set up the TensorZero configuration.
- Run inference:
from openai import OpenAI

with OpenAI(base_url="http://localhost:3000/openai/v1") as client:
    response = client.chat.completions.create(
        model="tensorzero::your_function_name",  # defined in configuration (step 2)
        messages=[
            {
                "role": "user",
                "content": "Write a haiku about artificial intelligence.",
            }
        ],
    )
See Quick Start for more information.
Usage: Other Languages & Platforms (HTTP)
TensorZero supports virtually any programming language or platform via its HTTP API.
- Deploy `tensorzero/gateway` using Docker. Detailed instructions →
- Optional: Set up the TensorZero configuration.
- Run inference:
curl -X POST "http://localhost:3000/inference" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "openai::gpt-4o-mini",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "Write a haiku about artificial intelligence."
        }
      ]
    }
  }'
See Quick Start for more information.
Send production metrics and human feedback to easily optimize your prompts, models, and inference strategies — using the UI or programmatically.
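For example, feedback can be attached to an inference programmatically with the TensorZero client. The snippet below is a minimal sketch: it assumes a boolean metric named `task_success` has been defined in your configuration, and it reuses the `inference_id` returned by the inference call.

```python
from tensorzero import TensorZeroGateway

# Minimal sketch: assumes a boolean metric named `task_success` is defined in your
# TensorZero configuration; the gateway URL matches the Docker deployment above.
with TensorZeroGateway("http://localhost:3000") as client:
    response = client.inference(
        model_name="openai::gpt-4o-mini",
        input={"messages": [{"role": "user", "content": "Write a haiku about artificial intelligence."}]},
    )

    client.feedback(
        metric_name="task_success",          # metric name defined in your configuration
        inference_id=response.inference_id,  # ties the feedback to the inference above
        value=True,                          # e.g. the user accepted the generated haiku
    )
```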
Optimize closed-source and open-source models using supervised fine-tuning (SFT) and preference fine-tuning (DPO).
- Supervised Fine-tuning — UI
- Preference Fine-tuning (DPO) — Jupyter Notebook
Boost performance by dynamically updating your prompts with relevant examples, combining responses from multiple inferences, and more. (A conceptual sketch of best-of-N sampling follows the list below.)
- Best-of-N Sampling
- Mixture-of-N Sampling
- Dynamic In-Context Learning (DICL)
- More coming soon...
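In TensorZero these optimizations are built-in, configuration-driven variant types. The sketch below only illustrates the idea behind best-of-N sampling using the gateway's inference API; it is not TensorZero's implementation, and the judge prompt is a made-up example.

```python
from tensorzero import TensorZeroGateway

# Conceptual sketch of best-of-N sampling (not TensorZero's built-in implementation):
# generate N candidates, then ask a judge to pick the most promising one.
N = 3
prompt = {"messages": [{"role": "user", "content": "Write a haiku about artificial intelligence."}]}

with TensorZeroGateway("http://localhost:3000") as client:
    # `.content[0].text` is assumed to hold the generated text, as in the earlier sketch.
    candidates = [
        client.inference(model_name="openai::gpt-4o-mini", input=prompt).content[0].text
        for _ in range(N)
    ]

    numbered = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(candidates))
    judge = client.inference(
        model_name="openai::gpt-4o-mini",
        input={"messages": [{"role": "user", "content": f"Pick the best haiku below. Reply with its number only.\n{numbered}"}]},
    )
    # A robust implementation would validate the judge's reply before indexing.
    best = candidates[int(judge.content[0].text.strip()) - 1]
    print(best)
```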
Optimize your prompts programmatically using research-driven optimization techniques.
Today we provide a sample integration with DSPy.
More coming soon...
Zoom in to debug individual API calls, or zoom out to monitor metrics across models and prompts over time — all using the open-source TensorZero UI.
- Observability » Inference
- Observability » Function
Watch LLMs get better at data extraction in real time with TensorZero!
Dynamic in-context learning (DICL) is a powerful inference-time optimization available out of the box with TensorZero. It enhances LLM performance by automatically incorporating relevant historical examples into the prompt, without the need for model fine-tuning.
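Conceptually, DICL retrieves historical examples that are similar to the incoming query (e.g. by embedding similarity) and inserts them into the prompt as few-shot demonstrations. The sketch below only illustrates that general idea; it is not TensorZero's implementation, and the embedding model and helper names are placeholder assumptions.

```python
import numpy as np
from openai import OpenAI

# Conceptual sketch of dynamic in-context learning (DICL), not TensorZero's implementation.
# `history` is assumed to be (input, good_output) pairs curated from past inferences.
client = OpenAI()

def embed(text: str) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(result.data[0].embedding)

def dicl_messages(query: str, history: list[tuple[str, str]], k: int = 3) -> list[dict]:
    # Rank historical examples by cosine similarity to the query...
    query_vec = embed(query)
    example_vecs = [embed(example_input) for example_input, _ in history]
    scores = [
        float(np.dot(vec, query_vec) / (np.linalg.norm(vec) * np.linalg.norm(query_vec)))
        for vec in example_vecs
    ]
    top_k = sorted(range(len(history)), key=lambda i: scores[i], reverse=True)[:k]

    # ...and insert the most relevant ones as few-shot demonstrations before the query.
    messages = []
    for i in top_k:
        example_input, example_output = history[i]
        messages.append({"role": "user", "content": example_input})
        messages.append({"role": "assistant", "content": example_output})
    messages.append({"role": "user", "content": query})
    return messages
```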
- The TensorZero Gateway is a high-performance model gateway written in Rust 🦀 that provides a unified API interface for all major LLM providers, allowing for seamless cross-platform integration and fallbacks.
- It handles structured schema-based inference with <1ms P99 latency overhead (see Benchmarks) and built-in observability, experimentation, and inference-time optimizations.
- It also collects downstream metrics and feedback associated with these inferences, with first-class support for multi-step LLM systems.
- Everything is stored in a ClickHouse data warehouse that you control for real-time, scalable, and developer-friendly analytics (see the query sketch after this list).
- Over time, TensorZero Recipes leverage this structured dataset to optimize your prompts and models: run pre-built recipes for common workflows like fine-tuning, or create your own with complete flexibility using any language and platform.
- Finally, the gateway's experimentation features and GitOps orchestration enable you to iterate and deploy with confidence, be it a single LLM or thousands of LLMs.
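For instance, you can analyze the stored inferences with any ClickHouse client. This is only a rough sketch: the database, table, and column names (`tensorzero`, `ChatInference`, `function_name`, `variant_name`) are assumptions for illustration, so verify them against your deployment's actual schema.

```python
import clickhouse_connect

# Rough sketch of querying the TensorZero ClickHouse database directly.
# Database, table, and column names are assumptions; check your deployment's schema.
client = clickhouse_connect.get_client(
    host="localhost", port=8123, username="default", database="tensorzero"
)

rows = client.query(
    "SELECT function_name, variant_name, count() AS inferences "
    "FROM ChatInference GROUP BY function_name, variant_name"
).result_rows

for function_name, variant_name, inferences in rows:
    print(f"{function_name} / {variant_name}: {inferences} inferences")
```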
Our goal is to help engineers build, manage, and optimize the next generation of LLM applications: systems that learn from real-world experience. Read more about our Vision & Roadmap.
Start building today. The Quick Start shows it's easy to set up an LLM application with TensorZero. If you want to dive deeper, the Tutorial teaches how to build a simple chatbot, an email copilot, a weather RAG system, and a structured data extraction pipeline.
Questions? Ask us on Slack or Discord.
Using TensorZero at work? Email us at [email protected] to set up a Slack or Teams channel with your team (free).
Work with us. We're hiring in NYC. We'd also welcome open-source contributions!
We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.
Optimizing Data Extraction (NER) with TensorZero
This example shows how to use TensorZero to optimize a data extraction pipeline. We demonstrate techniques like fine-tuning and dynamic in-context learning (DICL). In the end, an optimized GPT-4o Mini model outperforms GPT-4o on this task — at a fraction of the cost and latency — using a small amount of training data.
Writing Haikus to Satisfy a Judge with Hidden Preferences
This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants. You'll see the progress as the LLM is fine-tuned multiple times.
Improving LLM Chess Ability with Best-of-N Sampling
This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.
Improving Math Reasoning with a Custom Recipe for Automated Prompt Engineering (DSPy)
TensorZero provides a number of pre-built optimization recipes covering common LLM engineering workflows. But you can also easily create your own recipes and workflows! This example shows how to optimize a TensorZero function using an arbitrary tool — here, DSPy.
& many more on the way!