tetherto · lauripiisang · May 14, 2026 · May 13, 2026
@@ -12,6 +12,7 @@ This package is published to npm as **`@qvac/cli`** and lives in the QVAC monore
   - [`bundle sdk`](#bundle-sdk)
   - [`verify deps`](#verify-deps)
   - [`verify bundle`](#verify-bundle)
+  - [`serve openai`](#serve-openai)
 - [Configuration](#configuration)
 - [System Requirements](#system-requirements)
 - [Development](#development)
@@ -297,6 +298,16 @@ on-device runtime version from a mobile dependency tree. **Pass
 strict ABI verification; otherwise mobile bundles will emit
 `unknown-runtime-version` and skip the ABI check pass.
 
+### `serve openai`
+
+Run an **OpenAI-compatible HTTP server** backed by locally configured QVAC models (`serve.models` in `qvac.config.*`).
+
+```bash
+qvac serve openai [options]
+```
+
+See **[docs/serve-openai.md](./docs/serve-openai.md)** for supported `/v1/...` routes, multipart request shapes, and how to register models — including **`whispercpp-audio-translation`** for `POST /v1/audio/translations` (Whisper translate-to-English).
+
 ## Configuration
 
 The CLI reads configuration from `qvac.config.{json,js,mjs,ts}` in your project root.

@@ -0,0 +1,110 @@
+# `qvac serve openai`
+
+The CLI exposes an **OpenAI-compatible HTTP API** (`qvac serve openai`) so tools and SDKs that target OpenAI can run against local QVAC models.
+
+This document describes the supported routes and how to configure `serve.models` for each capability. For general CLI usage, see [README.md](../README.md).
+
+## Implemented endpoints (today)
+
+| Method | Path | Notes |
+|--------|------|--------|
+| `GET` | `/v1/models` | Lists **loaded** models |
+| `GET` | `/v1/models/{id}` | Model metadata |
+| `DELETE` | `/v1/models/{id}` | Unload |
+| `POST` | `/v1/chat/completions` | Chat |
+| `POST` | `/v1/embeddings` | Embeddings |
+| `POST` | `/v1/audio/transcriptions` | Speech-to-text (source language) |
+| `POST` | `/v1/audio/translations` | Speech-to-text **into English** (Whisper translate task) |
+
+Other OpenAI routes may be added over time; this file is updated when they ship.
+
+## `POST /v1/audio/translations`
+
+OpenAI’s **translations** endpoint always returns **English text**. It maps to Whisper’s **translate** task (not “transcribe then run a text translator”).
+
+### Request
+
+- **Content-Type:** `multipart/form-data`
+- **Fields:**
+  - `file` (required) — audio file (same as transcriptions)
+  - `model` (required) — must name a `serve.models` alias whose **endpoint category** is `audio-translation` (see below)
+  - `prompt` (optional) — passed through to the SDK transcribe path (Whisper initial prompt where supported)
+  - `response_format` (optional) — `json` (default) or `text`. `srt`, `vtt`, and `verbose_json` are not implemented yet.
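+- **Not supported:** `language`. OpenAI’s translations API has no per-request language selection and the output is always English, so the server rejects the field with `400 unsupported_param`. Use `/v1/audio/transcriptions` if you need non-English text.
+- **Ignored:** `temperature`. The field is accepted, but the server logs a warning and does not forward it to the SDK.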
+
+### Registering a translation model (`whispercpp-audio-translation`)
+
+Use the virtual SDK type **`whispercpp-audio-translation`** in `serve.models`. The CLI resolves it to the real engine **`whispercpp-transcription`** and **forces** `translate: true` on the **loadModel** `modelConfig` (Whisper translate-to-English). Nested `whisperConfig: { … }` in JSON is flattened into the top-level `modelConfig` for this alias so it matches what `@qvac/sdk` expects.
+
+You may omit `translate`. If you set `translate: false` (top-level or under `whisperConfig`), it is **overridden to `true`** with a console warning.
+
+The recommended shape is the same `"model": "<SDK_CONSTANT>"` shorthand used elsewhere in `serve.models`, with `type` set to the virtual translation type. The constant resolves to its registry `src`; `type` switches the alias from the constant's natural addon (`whispercpp-transcription`) to `whispercpp-audio-translation`.
+
+**Minimal JSON — same weights as a transcription alias, second alias for translate:**
+
+```json
+{
+  "serve": {
+    "models": {
+      "whisper-transcribe": { "model": "WHISPER_EN_TINY_Q8_0", "preload": true },
+      "whisper-translate": {
+        "model": "WHISPER_EN_TINY_Q8_0",
+        "type": "whispercpp-audio-translation",
+        "preload": true
+      }
+    }
+  }
+}
+```
+
+An optional `config` block uses the same **flat** Whisper keys as other `serve.models` Whisper entries: `language`, `n_threads`, `strategy`, …, alongside `contextParams` / `miscConfig` if needed (see the [changelog example](./changelog/0.2.2/api.md)). You may also nest tuning under `whisperConfig`; for **`whispercpp-audio-translation` only**, those keys are merged to the top level before load.
+
+**Example with extra Whisper tuning (flat keys, same style as transcriptions):**
+
+```json
+{
+  "serve": {
+    "models": {
+      "whisper-1": {
+        "model": "WHISPER_EN_TINY_Q8_0",
+        "type": "whispercpp-audio-translation",
+        "preload": true,
+        "config": {
+          "language": "auto",
+          "n_threads": 4,
+          "strategy": "greedy",
+          "contextParams": { "use_gpu": true },
+          "miscConfig": { "caption_enabled": false }
+        }
+      }
+    }
+  }
+}
+```
+
+If you need to point at non-registry weights (a local path, `https://…`, `registry://…`, etc.), drop the `model` shorthand and use the explicit `{ "type": "whispercpp-audio-translation", "src": "<weights>" }` form. `src` is passed to `@qvac/sdk` as `modelSrc` verbatim, so it cannot be an SDK constant name in that form — use the `model` shorthand above when you want constant resolution.
+
+### Example (`curl`)
+
+```bash
+curl -s http://127.0.0.1:11434/v1/audio/translations \
+  -F model=whisper-translate \
+  -F file=@./sample.wav \
+  -F response_format=json
+```
+
+Response (`json`): `{ "text": "..." }`  
+Response (`text`): body is plain UTF-8 text.
+
+### Same weights as transcriptions
+
+You normally use the **same** underlying weights for both transcription and translation; register **two aliases** that share the same `"model": "WHISPER_…"` constant — one without `type` (defaults to transcription) and one with `type: "whispercpp-audio-translation"`.
+
+### Errors
+
+| HTTP | `error.code` | When |
+|------|----------------|------|
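+| 400 | `invalid_content_type` | Not `multipart/form-data` |
+| 400 | `invalid_multipart` | Multipart body could not be parsed |
+| 400 | `missing_file` / `missing_model` | Required fields missing |
+| 400 | `unsupported_param` | `language` field present |
+| 400 | `unsupported_response_format` | `srt`, `vtt`, `verbose_json` |
+| 400 | `invalid_response_format` | Any other unrecognized `response_format` value |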
+| 400 | `invalid_model_type` | Alias is not an `audio-translation` model (use `type: whispercpp-audio-translation` in `serve.models`) |
+| 404 | `model_not_found` | Unknown alias |
+| 503 | `model_not_ready` | Model not loaded yet |
+| 500 | `translation_error` | SDK / engine failure |
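
The `whisperConfig` flattening and forced `translate: true` described under "Registering a translation model" can be pictured with a small sketch. This is not the CLI's actual code: `normalizeTranslationConfig` and `isPlainObject` are hypothetical names, and letting top-level keys win over nested ones is an assumption, since the docs do not state the precedence.

```typescript
// Hypothetical sketch of the documented behaviour, not the CLI implementation.
type WhisperConfig = Record<string, unknown>

interface NormalizedConfig {
  modelConfig: WhisperConfig
  warnings: string[]
}

function isPlainObject (value: unknown): value is WhisperConfig {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

function normalizeTranslationConfig (config: WhisperConfig = {}): NormalizedConfig {
  const { whisperConfig, ...topLevel } = config
  const nested: WhisperConfig = isPlainObject(whisperConfig) ? whisperConfig : {}
  const warnings: string[] = []

  // Assumption: top-level keys take precedence over keys nested under whisperConfig.
  const modelConfig: WhisperConfig = { ...nested, ...topLevel }

  // The docs say translate: false is overridden with a console warning.
  if (topLevel['translate'] === false || nested['translate'] === false) {
    warnings.push('translate: false is overridden to true for whispercpp-audio-translation')
  }
  modelConfig['translate'] = true

  return { modelConfig, warnings }
}
```

Called with `{ n_threads: 4, whisperConfig: { language: 'auto', translate: false } }`, the sketch yields a flat `modelConfig` containing `n_threads`, `language`, and `translate: true`, plus one warning.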
@@ -13,6 +13,7 @@
   "files": [
     "dist/**/*",
     "README.md",
+    "docs/**/*.md",
     "LICENSE",
     "NOTICE"
   ],

@@ -50,6 +50,12 @@ export function createOpenAIAdapter (): APIAdapter {
         return true
       }
 
+      if (method === 'POST' && path === '/v1/audio/translations') {
+        const { handleTranslations } = await import('./routes/translations.js')
+        await handleTranslations(req, res, ctx)
+        return true
+      }
+
       if (method === 'POST' && path === '/v1/images/generations') {
         const { handleImagesGenerations } = await import('./routes/images.js')
         await handleImagesGenerations(req, res, ctx)
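
For callers who prefer TypeScript over `curl`, a minimal client sketch for the new `POST /v1/audio/translations` route might look like the following. It assumes a runtime with global `fetch`, `FormData`, and `Blob` (for example Node 18+) and reuses the base URL and the `whisper-translate` alias from the docs' `curl` example. `buildTranslationRequest` and `translateAudio` are illustrative names, not part of `@qvac/cli`.

```typescript
interface TranslationRequest {
  url: string
  init: { method: 'POST', body: FormData }
}

// Builds the multipart request without sending it, so it can be inspected or tested.
function buildTranslationRequest (
  baseUrl: string,
  model: string,
  audio: Blob,
  fileName: string,
  responseFormat: 'json' | 'text' = 'json'
): TranslationRequest {
  const form = new FormData()
  form.append('model', model)
  form.append('file', audio, fileName)
  form.append('response_format', responseFormat)
  // No "language" field: the handler rejects it with 400 unsupported_param.
  // No explicit Content-Type header: fetch derives the multipart boundary from the body.
  return { url: `${baseUrl}/v1/audio/translations`, init: { method: 'POST', body: form } }
}

// Sends the request and returns the translated English text from the JSON response.
async function translateAudio (baseUrl: string, model: string, audio: Blob, fileName: string): Promise<string> {
  const { url, init } = buildTranslationRequest(baseUrl, model, audio, fileName, 'json')
  const res = await fetch(url, init)
  if (!res.ok) {
    throw new Error(`translation request failed: HTTP ${res.status}`)
  }
  const body = (await res.json()) as { text: string }
  return body.text
}
```

Splitting request construction from the network call keeps the field layout checkable without a running server.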

@@ -0,0 +1,122 @@
+import type { IncomingMessage, ServerResponse } from 'node:http'
+import { sendJson, sendText, sendError } from '../../../http.js'
+import { readMultipart } from '../../../multipart.js'
+import { resolveModelAlias } from '../../../config.js'
+import { sdkTranscribe } from '../../../core/sdk.js'
+import type { RouteContext } from '../../types.js'
+
+const SUPPORTED_RESPONSE_FORMATS = new Set(['json', 'text'])
+const UNSUPPORTED_RESPONSE_FORMATS = new Set(['srt', 'vtt', 'verbose_json'])
+
+export async function handleTranslations (req: IncomingMessage, res: ServerResponse, ctx: RouteContext): Promise<void> {
+  // Media type names are case-insensitive, so normalize before matching.
+  const contentType = (req.headers['content-type'] ?? '').toLowerCase()
+  if (!contentType.includes('multipart/form-data')) {
+    sendError(res, 400, 'invalid_content_type', 'Content-Type must be multipart/form-data.')
+    return
+  }
+
+  let fields: Map<string, string>
+  let file: { fieldName: string; fileName: string; contentType: string; data: Buffer } | null
+
+  try {
+    const result = await readMultipart(req)
+    fields = result.fields
+    file = result.file
+  } catch (err) {
+    const message = err instanceof Error ? err.message : String(err)
+    ctx.logger.error(`Multipart parse error: ${message}`)
+    sendError(res, 400, 'invalid_multipart', 'Failed to parse multipart request.')
+    return
+  }
+
+  if (!file || file.fieldName !== 'file') {
+    sendError(res, 400, 'missing_file', '"file" field is required.')
+    return
+  }
+
+  const modelName = fields.get('model')
+  if (!modelName) {
+    sendError(res, 400, 'missing_model', '"model" field is required.')
+    return
+  }
+
+  if (fields.has('language')) {
+    sendError(
+      res,
+      400,
+      'unsupported_param',
+      'The "language" field is not supported on /v1/audio/translations. Output is always English.'
+    )
+    return
+  }
+
+  const responseFormat = fields.get('response_format') ?? 'json'
+  if (UNSUPPORTED_RESPONSE_FORMATS.has(responseFormat)) {
+    sendError(res, 400, 'unsupported_response_format', `response_format "${responseFormat}" is not supported. Use "json" or "text".`)
+    return
+  }
+  if (!SUPPORTED_RESPONSE_FORMATS.has(responseFormat)) {
+    sendError(res, 400, 'invalid_response_format', `Unknown response_format "${responseFormat}". Use "json" or "text".`)
+    return
+  }
+
+  const prompt = fields.get('prompt')
+  const temperature = fields.get('temperature')
+
+  const modelEntry = resolveModelAlias(ctx.serveConfig, modelName) ?? ctx.registry.getEntry(modelName)
+
+  if (!modelEntry) {
+    sendError(res, 404, 'model_not_found', `Model "${modelName}" is not available. Check serve.models config.`)
+    return
+  }
+
+  const endpointCategory = 'endpointCategory' in modelEntry ? modelEntry.endpointCategory : undefined
+  if (endpointCategory !== 'audio-translation') {
+    sendError(
+      res,
+      400,
+      'invalid_model_type',
+      `Model "${modelName}" is not registered for audio translation. Register an alias with type "whispercpp-audio-translation" in serve.models.`
+    )
+    return
+  }
+
+  const alias = 'alias' in modelEntry ? (modelEntry.alias as string) : modelEntry.id
+  const registryEntry = ctx.registry.getEntry(alias)
+  if (!registryEntry || registryEntry.state !== ctx.registry.STATES.READY) {
+    sendError(res, 503, 'model_not_ready', `Model "${modelName}" is not loaded yet.`)
+    return
+  }
+
+  if (temperature) {
+    ctx.logger.warn(`Ignoring unsupported param: temperature=${temperature}`)
+  }
+
+  const sdkModelId = registryEntry.sdkModelId ?? registryEntry.id
+  const fileSizeKB = Math.round(file.data.length / 1024)
+
+  ctx.logger.info(`  translate model=${alias} file=${file.fileName} size=${fileSizeKB}KB format=${responseFormat}${prompt ? ' prompt=yes' : ''}`)
+
+  const transcribe = ctx.transcribeOverride ?? sdkTranscribe
+
+  try {
+    const text = await transcribe({
+      modelId: sdkModelId,
+      audioChunk: file.data,
+      fileName: file.fileName,
+      prompt
+    })
+
+    ctx.logger.info(`  translate done chars=${text.length}`)
+
+    if (responseFormat === 'text') {
+      sendText(res, 200, text)
+    } else {
+      sendJson(res, 200, { text })
+    }
+  } catch (err) {
+    const message = err instanceof Error ? err.message : String(err)
+    ctx.logger.error(`Translation error for "${alias}": ${message}`)
+    sendError(res, 500, 'translation_error', 'An internal error occurred during audio translation.')
+  }
+}
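
The order of the field checks in `handleTranslations` decides which `error.code` a malformed request receives. The sketch below restates that order as a pure function for reasoning and testing. `validateTranslationFields` and `ValidationError` are hypothetical names, and the `hasFile` flag simplifies the handler's `file.fieldName` check.

```typescript
interface ValidationError {
  status: number
  code: string
}

const SUPPORTED_FORMATS = new Set(['json', 'text'])
const NOT_YET_IMPLEMENTED_FORMATS = new Set(['srt', 'vtt', 'verbose_json'])

// Returns the first validation error in the same order the handler checks, or null if the fields pass.
function validateTranslationFields (fields: Map<string, string>, hasFile: boolean): ValidationError | null {
  if (!hasFile) return { status: 400, code: 'missing_file' }
  if (!fields.get('model')) return { status: 400, code: 'missing_model' }
  if (fields.has('language')) return { status: 400, code: 'unsupported_param' }

  const format = fields.get('response_format') ?? 'json'
  // Known OpenAI formats that are not implemented are reported separately from unknown values.
  if (NOT_YET_IMPLEMENTED_FORMATS.has(format)) return { status: 400, code: 'unsupported_response_format' }
  if (!SUPPORTED_FORMATS.has(format)) return { status: 400, code: 'invalid_response_format' }

  return null
}
```

A request that both omits `model` and sends `language` is therefore reported as `missing_model`, because that check runs first.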
@@ -6,6 +6,13 @@ export interface RouteContext {
   registry: ModelRegistry
   serveConfig: ServeConfig
   logger: Logger
+  /** @internal Unit tests only — replaces sdkTranscribe when set */
+  transcribeOverride?: (opts: {
+    modelId: string
+    audioChunk: Buffer
+    fileName: string
+    prompt?: string | undefined
+  }) => Promise<string>
 }
 
 export type RouteHandler = (req: IncomingMessage, res: ServerResponse, ctx: RouteContext) => Promise<void> | void
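
A unit test could use `transcribeOverride` by supplying a recording stub, roughly as sketched here. `createTranscribeStub` and `TranscribeOpts` are hypothetical names. `audioChunk` is typed as `Uint8Array` only to keep the sketch free of Node typings; `Buffer` is a `Uint8Array` subclass, so the stub should still fit the `Buffer`-based signature above.

```typescript
interface TranscribeOpts {
  modelId: string
  audioChunk: Uint8Array
  fileName: string
  prompt?: string | undefined
}

// Creates a fake transcribe function that records its arguments and resolves to a fixed text.
function createTranscribeStub (text: string) {
  const calls: TranscribeOpts[] = []
  const transcribe = async (opts: TranscribeOpts): Promise<string> => {
    calls.push(opts)
    return text
  }
  return { transcribe, calls }
}
```

In a test, `stub.transcribe` would be assigned to `ctx.transcribeOverride` before calling `handleTranslations`, and `stub.calls` inspected afterwards to confirm the `modelId`, `fileName`, and `prompt` that reached the SDK boundary.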