THIS PROJECT IS UNDER ACTIVE DEVELOPMENT. It is not yet ready for production use, but it serves as a good reference for how to run Whisper on Qualcomm and Intel NPUs.
A portable, packaged OpenAI-compatible server for Windows desktop applications. LLM chat completions, audio transcription, embeddings, and a vector DB, all out of the box.
Note that this requires you to run the .exe as a separate async process, like a local server running alongside your application, and to make HTTP requests to it for inference.
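For example, a host application can launch the server as a child process, wait for the health endpoint to come up, and then send requests. The sketch below is a minimal Python illustration; the executable path, host, and port are assumptions for a typical setup, not fixed values.

```python
import subprocess
import time
import urllib.request

# Assumed paths/ports for this sketch; adjust for your install.
SERVER_EXE = r".\dist\fluid-server.exe"
BASE_URL = "http://localhost:8080"

# Launch fluid-server as a separate process alongside the application.
server = subprocess.Popen([SERVER_EXE, "--host", "127.0.0.1", "--port", "8080"])

# Poll the health endpoint until the server is ready to accept requests.
for _ in range(60):
    try:
        with urllib.request.urlopen(f"{BASE_URL}/health", timeout=1) as resp:
            if resp.status == 200:
                break
    except OSError:
        pass
    time.sleep(1)
else:
    server.terminate()
    raise RuntimeError("fluid-server did not become healthy in time")

# ... make OpenAI-compatible requests against BASE_URL/v1 here ...

# Shut the server down when the application exits.
server.terminate()
```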
Core Capabilities
- LLM Chat Completions - OpenAI-compatible API with streaming, backed by llama.cpp and OpenVINO
- Audio Transcription - Whisper models with NPU acceleration, backed by OpenVINO and Qualcomm QNN
- Text Embeddings - Vector embeddings for search and RAG (a short example follows this list)
- Vector Database - LanceDB integration for multimodal storage
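Chat and transcription calls are shown further down; as a rough sketch of the embeddings capability, the snippet below posts to the OpenAI-compatible /v1/embeddings endpoint. The model name is a placeholder assumption; check /v1/models for what is actually available.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

# "my-embedding-model" is a placeholder; pick a real model from /v1/models.
response = client.embeddings.create(
    model="my-embedding-model",
    input=["fluid-server runs LLMs, Whisper, and embeddings locally"],
)
vector = response.data[0].embedding
print(len(vector), "dimensions")
```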
Hardware Acceleration
- Intel NPU via OpenVINO backend
- Qualcomm NPU via QNN (Snapdragon X Elite)
- Vulkan GPU via llama-cpp
Option A: Download Release
- Download fluid-server.exe from releases
Option B: Run from Source
```
# Install dependencies and run
uv sync
uv run
```
Running the Server
```
# Run with default settings
.\dist\fluid-server.exe

# Or with custom options
.\dist\fluid-server.exe --host 127.0.0.1 --port 8080
```
- Health Check: http://localhost:8080/health
- API Docs: http://localhost:8080/docs
- Models: http://localhost:8080/v1/models
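A quick way to confirm the server is reachable is to list the loaded models through the OpenAI client (a small sketch; it simply wraps GET /v1/models):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")

# Lists whatever models the server has discovered (GET /v1/models).
for model in client.models.list():
    print(model.id)
```

The chat completion and transcription examples below use the same base URL.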
```
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-8b-int8-ov", "messages": [{"role": "user", "content": "Hello!"}]}'
```
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="local")
# Chat with streaming
for chunk in client.chat.completions.create(
    model="qwen3-8b-int8-ov",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True
):
    print(chunk.choices[0].delta.content or "", end="")
```
```
curl -X POST http://localhost:8080/v1/audio/transcriptions \
  -F "[email protected]" \
  -F "model=whisper-large-v3-turbo-qnn"
```
📖 Comprehensive Guides
- NPU Support Guide - Intel & Qualcomm NPU configuration
- Integration Guide - Python, .NET, Node.js examples
- Development Guide - Setup, building, and contributing
- LanceDB Integration - Vector database and embeddings
- GGUF Model Support - Using any GGUF model
- Compilation Guide - Build system details
Why Python? Best ML ecosystem support and PyInstaller packaging.
Why not llama.cpp? We support multiple runtimes and AI accelerators beyond GGML.
Built using ty, FastAPI, Pydantic, ONNX Runtime, OpenAI Whisper, and various other AI libraries.
Runtime Technologies:
- OpenVINO - Intel NPU and GPU acceleration
- Qualcomm QNN - Snapdragon NPU optimization with HTP backend
- ONNX Runtime - Cross-platform AI inference