Poor man's LLM eval platform. Still under active development.
We support comparing across:
- multiple LLM providers (e.g. OpenAI vs Anthropic)
- the same model on different providers (e.g. Llama on Fireworks vs on Together)
- sampling strategies (e.g. varied temperature)
- different versions of a prompt

Results can be rated with a simple thumbs up / down.
Current:
- MIT license
- Autogenerate prompts from a task description
- Templated variable replacement (see the sketch after this list)
- Autogenerate test cases from a prompt (generated values for the template variables)
- Run test cases against multiple LLM versions / sampling strategies
- XML output formatting
- Ordinal ranking (thumbs up / down, unpaired)
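To make the templating and test-case generation ideas above concrete, here is a minimal sketch using Go's `text/template`. The placeholder syntax and variable names are illustrative only and may not match what this project actually uses.

```go
package main

import (
	"os"
	"text/template"
)

func main() {
	// Hypothetical prompt template with two variables; the project's real
	// template syntax may differ.
	prompt := template.Must(template.New("prompt").Parse(
		"Write a one-sentence product description for {{.product}} aimed at {{.audience}}.\n",
	))

	// An autogenerated test case supplies concrete values for each template variable.
	testCase := map[string]string{
		"product":  "a reusable water bottle",
		"audience": "college students",
	}

	if err := prompt.Execute(os.Stdout, testCase); err != nil {
		panic(err)
	}
}
```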
Future:
- Streaming output
- Latency statistics
- Multimodal input
- Human-rated test cases (pairwise)
- Automatic LLM grading of test cases
We use a Vite + React frontend and a Go backend. Communication is handled via Connect / Protobufs.
We use hermit to manage dependencies.
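If hermit isn't already active in your shell, the usual pattern (assuming the standard hermit layout with an activation script under `bin/`) is:

```sh
. ./bin/activate-hermit
```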
Run the dev servers:

```sh
# backend (Go)
go run cmd/serverd/main.go

# frontend (Vite + React)
cd frontend && bun install && bun run dev
```
The Go server must be able to load env vars corresponding to your LLM provider API keys (e.g. `$OPENAI_API_KEY`). These can be set via a `.env` file or via the environment. If the keys are found, the server will automatically load support for that LLM provider and make it available to the frontend.
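For example, a minimal `.env` might look like the following. Only `OPENAI_API_KEY` is named above; the other key name is an assumption based on the providers listed earlier.

```
# Only providers whose keys are present will be enabled.
OPENAI_API_KEY=sk-...
# Assumed key name for Anthropic; adjust to whatever the server expects.
ANTHROPIC_API_KEY=...
```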
To regenerate protobufs after a change, you unfortunately need a second `bun install`. This is an artifact of needing protoc-gen-[typescript] in the CLI path, which requires a `bun install` to have been run.
```sh
# optional: I think these are available in hermit too, but not the typescript generators
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install connectrpc.com/connect/cmd/protoc-gen-connect-go@latest

bun install
bun x buf generate
```
Production is left as an exercise for the reader.