80 changes: 23 additions & 57 deletions README.md
@@ -4,8 +4,7 @@
[![PyPI version](https://badge.fury.io/py/mockllm.svg)](https://badge.fury.io/py/mockllm)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

A FastAPI-based mock LLM server that mimics OpenAI and Anthropic API formats. Instead of calling actual language models,
it uses predefined responses from a YAML configuration file.
A FastAPI-based mock LLM server that mimics OpenAI and Anthropic API formats. Instead of calling actual language models, it uses predefined responses from a YAML configuration file.

This is useful when you want a deterministic response for testing or development purposes.

@@ -17,8 +16,6 @@ Check out the [CodeGate](https://github.com/stacklok/codegate) project when you'
- Streaming support (character-by-character response streaming)
- Configurable responses via YAML file
- Hot-reloading of response configurations
- JSON logging
- Error handling
- Mock token counting

## Installation
@@ -128,10 +125,11 @@ curl -X POST http://localhost:8000/v1/messages \

### Response Configuration

Responses are configured in `responses.yml`. The file has two main sections:
Responses are configured in `responses.yml`. The file has three main sections:

1. `responses`: Maps input prompts to predefined responses
2. `defaults`: Contains default configurations like the unknown response message
3. `settings`: Contains server behavior settings like network lag simulation

Example `responses.yml`:
```yaml
@@ -141,71 +139,39 @@ responses:

defaults:
  unknown_response: "I don't know the answer to that. This is a mock response."
```

### Hot Reloading

The server automatically detects changes to `responses.yml` and reloads the configuration without requiring a restart.

## Development

The project uses Poetry for dependency management and includes a Makefile to help with common development tasks:

```bash
# Set up development environment
make setup

# Run all checks (setup, lint, test)
make all

# Run tests
make test

# Format code
make format

# Run all linting and type checking
make lint

# Clean up build artifacts
make clean

# See all available commands
make help
settings:
  lag_enabled: true
  lag_factor: 10 # Higher values = faster responses (10 = fast, 1 = slow)
```

### Development Commands
### Network Lag Simulation

- `make setup`: Install all development dependencies with Poetry
- `make test`: Run the test suite
- `make format`: Format code with black and isort
- `make lint`: Run all code quality checks (format, lint, type)
- `make build`: Build the package with Poetry
- `make clean`: Remove build artifacts and cache files
- `make install-dev`: Install package with development dependencies
The server can simulate network latency for more realistic testing scenarios. This is controlled by two settings:

For more details on available commands, run `make help`.
- `lag_enabled`: When true, enables artificial network lag
- `lag_factor`: Controls the speed of responses
  - Higher values (e.g., 10) result in faster responses
  - Lower values (e.g., 1) result in slower responses
  - Affects both streaming and non-streaming responses

## Error Handling
For streaming responses, the lag is applied per-character with slight random variations to simulate realistic network conditions.
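
As a rough guide to the timing, here is a minimal standalone sketch of the delay math. It mirrors the formulas in the `config.py` changes further down in this diff; the function names themselves are illustrative only.

```python
import random

LAG_FACTOR = 10  # higher = faster responses


def non_streaming_delay(response: str, lag_factor: int = LAG_FACTOR) -> float:
    """One delay for the whole response, proportional to its length."""
    return len(response) / (lag_factor * 10)


def per_character_delay(lag_factor: int = LAG_FACTOR) -> float:
    """Per-character delay with +/-50% random jitter, floored at zero."""
    base_delay = 1 / (lag_factor * 10)  # 0.01s per character at lag_factor=10
    variation = random.uniform(-0.5, 0.5) * base_delay
    return max(0.0, base_delay + variation)


print(non_streaming_delay("x" * 100))  # 1.0s for a 100-character response at lag_factor=10
print(per_character_delay())           # somewhere between 0.005s and 0.015s
```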

The server includes comprehensive error handling:

- Invalid requests return 400 status codes with descriptive messages
- Server errors return 500 status codes with error details
- All errors are logged using JSON format
### Hot Reloading

## Logging
The server automatically detects changes to `responses.yml` and reloads the configuration without restarting the server.
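
Under the hood this is a modification-time check (see `last_modified` / `current_mtime` in the `config.py` diff below). A minimal sketch of the idea, assuming a local `responses.yml` exists:

```python
from pathlib import Path

yaml_path = Path("responses.yml")
last_modified = 0


def config_changed() -> bool:
    """Return True when responses.yml has been modified since the last check."""
    global last_modified
    current_mtime = int(yaml_path.stat().st_mtime)
    if current_mtime != last_modified:
        last_modified = current_mtime
        return True  # the server would re-parse the YAML at this point
    return False


print(config_changed())  # True on the first call
print(config_changed())  # False until the file is touched again
```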

The server uses JSON-formatted logging for:
## Testing

- Incoming request details
- Response configuration loading
- Error messages and stack traces
To run the tests:
```bash
poetry run pytest
```
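
A hypothetical test for the new lag behaviour could look like the sketch below. It is not part of this PR and assumes the package is importable as `mockllm` with a `responses.yml` in the working directory.

```python
# test_lag_sketch.py - illustrative only, not included in this PR
import asyncio
import time

from mockllm.config import ResponseConfig  # module path assumed from src/mockllm/config.py


def test_non_streaming_lag_adds_delay() -> None:
    config = ResponseConfig("responses.yml")
    config.lag_enabled = True
    config.lag_factor = 10  # delay = len(response) / 100 seconds

    start = time.monotonic()
    response = asyncio.run(
        config.get_response_with_lag("what is the meaning of life?")
    )
    elapsed = time.monotonic() - start

    # asyncio.sleep waits at least the requested time, so the elapsed time
    # should be no less than the computed delay.
    assert elapsed >= len(response) / (config.lag_factor * 10)
```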

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
Contributions are welcome! Please open an issue or submit a PR.

## License

This project is licensed under the Apache License, Version 2.0 - see the [LICENSE](LICENSE) file for details.
This project is licensed under the [Apache 2.0 License](LICENSE).
6 changes: 5 additions & 1 deletion example.responses.yml
@@ -5,4 +5,8 @@ responses:
"what is the meaning of life?": "According to this mock response, the meaning of life is to write better mock servers."

defaults:
unknown_response: "I don't know the answer to that. This is a mock response."
unknown_response: "I don't know the answer to that. This is a mock response."

settings:
lag_enabled: true
lag_factor: 10 # Higher values = faster responses (10 = fast, 1 = slow)
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "mockllm"
version = "0.0.6"
version = "0.0.7"
description = "A mock server that mimics OpenAI and Anthropic API formats for testing"
authors = ["Luke Hinds <[email protected]>"]
license = "Apache-2.0"
2 changes: 1 addition & 1 deletion src/mockllm/__init__.py
@@ -2,4 +2,4 @@
Mock LLM Server - You will do what I tell you!
"""

__version__ = "0.0.6"
__version__ = "0.0.7"
41 changes: 40 additions & 1 deletion src/mockllm/config.py
@@ -1,6 +1,8 @@
import asyncio
import logging
import random
from pathlib import Path
from typing import Dict, Generator, Optional
from typing import AsyncGenerator, Dict, Generator, Optional

import yaml
from fastapi import HTTPException
@@ -20,6 +22,8 @@ def __init__(self, yaml_path: str = "responses.yml"):
self.last_modified = 0
self.responses: Dict[str, str] = {}
self.default_response = "I don't know the answer to that."
self.lag_enabled = False
self.lag_factor = 10
self.load_responses()

def load_responses(self) -> None:
@@ -33,6 +37,9 @@ def load_responses(self) -> None:
self.default_response = data.get("defaults", {}).get(
"unknown_response", self.default_response
)
settings = data.get("settings", {})
self.lag_enabled = settings.get("lag_enabled", False)
self.lag_factor = settings.get("lag_factor", 10)
self.last_modified = int(current_mtime)
logger.info(
f"Loaded {len(self.responses)} responses from {self.yaml_path}"
@@ -62,3 +69,35 @@ def get_streaming_response(
        # Yield response character by character
        for char in response:
            yield char

    async def get_response_with_lag(self, prompt: str) -> str:
        """Get response with artificial lag for non-streaming responses."""
        response = self.get_response(prompt)
        if self.lag_enabled:
            # Base delay on response length and lag factor
            delay = len(response) / (self.lag_factor * 10)
            await asyncio.sleep(delay)
        return response

    async def get_streaming_response_with_lag(
        self, prompt: str, chunk_size: Optional[int] = None
    ) -> AsyncGenerator[str, None]:
        """Generator that yields response content with artificial lag."""
        response = self.get_response(prompt)

        if chunk_size:
            for i in range(0, len(response), chunk_size):
                chunk = response[i : i + chunk_size]
                if self.lag_enabled:
                    delay = len(chunk) / (self.lag_factor * 10)
                    await asyncio.sleep(delay)
                yield chunk
        else:
            for char in response:
                if self.lag_enabled:
                    # Add random variation to character delay
                    base_delay = 1 / (self.lag_factor * 10)
                    variation = random.uniform(-0.5, 0.5) * base_delay
                    delay = max(0, base_delay + variation)
                    await asyncio.sleep(delay)
                yield char
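
For reviewers who want to poke at the two new helpers directly, outside the FastAPI app, something along these lines should work — assuming the package is installed and a `responses.yml` with the example prompts sits in the working directory:

```python
import asyncio

from mockllm.config import ResponseConfig  # module path assumed from src/mockllm/config.py


async def main() -> None:
    config = ResponseConfig("responses.yml")

    # Non-streaming: a single sleep proportional to the full response length.
    print(await config.get_response_with_lag("what is the meaning of life?"))

    # Streaming: per-character sleeps with random jitter,
    # or per-chunk sleeps when chunk_size is given.
    async for chunk in config.get_streaming_response_with_lag(
        "what is the meaning of life?", chunk_size=8
    ):
        print(chunk, end="", flush=True)
    print()


asyncio.run(main())
```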
15 changes: 9 additions & 6 deletions src/mockllm/server.py
@@ -30,15 +30,14 @@

async def openai_stream_response(content: str, model: str) -> AsyncGenerator[str, None]:
"""Generate OpenAI-style streaming response in SSE format."""
# Send the first message with role
first_chunk = OpenAIStreamResponse(
model=model,
choices=[OpenAIStreamChoice(delta=OpenAIDeltaMessage(role="assistant"))],
)
yield f"data: {first_chunk.model_dump_json()}\n\n"

# Stream the content character by character
for chunk in response_config.get_streaming_response(content):
# Stream the content character by character with lag
async for chunk in response_config.get_streaming_response_with_lag(content):
chunk_response = OpenAIStreamResponse(
model=model,
choices=[OpenAIStreamChoice(delta=OpenAIDeltaMessage(content=chunk))],
@@ -58,7 +57,7 @@ async def anthropic_stream_response(
content: str, model: str
) -> AsyncGenerator[str, None]:
"""Generate Anthropic-style streaming response in SSE format."""
for chunk in response_config.get_streaming_response(content):
async for chunk in response_config.get_streaming_response_with_lag(content):
stream_response = AnthropicStreamResponse(
delta=AnthropicStreamDelta(delta={"text": chunk})
)
@@ -98,7 +97,9 @@ async def openai_chat_completion(
media_type="text/event-stream",
)

response_content = response_config.get_response(last_message.content)
response_content = await response_config.get_response_with_lag(
last_message.content
)

# Calculate mock token counts
prompt_tokens = len(str(request.messages).split())
@@ -159,7 +160,9 @@ async def anthropic_chat_completion(
media_type="text/event-stream",
)

response_content = response_config.get_response(last_message.content)
response_content = await response_config.get_response_with_lag(
last_message.content
)

# Calculate mock token counts
prompt_tokens = len(str(request.messages).split())
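
To see the simulated lag end to end, a client can stream from the running server. The sketch below uses the `/v1/messages` path from the README's curl example, but the payload fields (`model`, `messages`, `stream`) and the model name are assumptions about the Anthropic-style request format the mock accepts:

```python
import requests  # third-party HTTP client, used here only for illustration

# Assumes the mock server is running locally on port 8000.
resp = requests.post(
    "http://localhost:8000/v1/messages",
    json={
        "model": "claude-3-sonnet-20240229",  # any string; the mock does not call a real model
        "messages": [{"role": "user", "content": "what is the meaning of life?"}],
        "stream": True,
    },
    stream=True,
    timeout=30,
)

# With lag_enabled, the SSE events should trickle in instead of arriving all at once.
for line in resp.iter_lines(decode_unicode=True):
    if line:
        print(line)
```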