diff --git a/packages/cli/README.md b/packages/cli/README.md index 8c613ad32a..b19003e985 100644 --- a/packages/cli/README.md +++ b/packages/cli/README.md @@ -12,6 +12,7 @@ This package is published to npm as **`@qvac/cli`** and lives in the QVAC monore - [`bundle sdk`](#bundle-sdk) - [`verify deps`](#verify-deps) - [`verify bundle`](#verify-bundle) + - [`serve openai`](#serve-openai) - [Configuration](#configuration) - [System Requirements](#system-requirements) - [Development](#development) @@ -297,6 +298,16 @@ on-device runtime version from a mobile dependency tree. **Pass strict ABI verification; otherwise mobile bundles will emit `unknown-runtime-version` and skip the ABI check pass. +### `serve openai` + +Run an **OpenAI-compatible HTTP server** backed by locally configured QVAC models (`serve.models` in `qvac.config.*`). + +```bash +qvac serve openai [options] +``` + +See **[docs/serve-openai.md](./docs/serve-openai.md)** for supported `/v1/...` routes, multipart request shapes, and how to register models — including **`whispercpp-audio-translation`** for `POST /v1/audio/translations` (Whisper translate-to-English). + ## Configuration The CLI reads configuration from `qvac.config.{json,js,mjs,ts}` in your project root. diff --git a/packages/cli/docs/serve-openai.md b/packages/cli/docs/serve-openai.md new file mode 100644 index 0000000000..f41224d58d --- /dev/null +++ b/packages/cli/docs/serve-openai.md @@ -0,0 +1,110 @@ +# `qvac serve openai` + +The CLI exposes an **OpenAI-compatible HTTP API** (`qvac serve openai`) so tools and SDKs that target OpenAI can run against local QVAC models. + +This document describes the supported routes and how to configure `serve.models` for each capability. For general CLI usage, see [README.md](../README.md). 
+ +## Implemented endpoints (today) + +| Method | Path | Notes | +|--------|------|--------| +| `GET` | `/v1/models` | Lists **loaded** models | +| `GET` | `/v1/models/{id}` | Model metadata | +| `DELETE` | `/v1/models/{id}` | Unload | +| `POST` | `/v1/chat/completions` | Chat | +| `POST` | `/v1/embeddings` | Embeddings | +| `POST` | `/v1/audio/transcriptions` | Speech-to-text (source language) | +| `POST` | `/v1/audio/translations` | Speech-to-text **into English** (Whisper translate task) | + +Other OpenAI routes may be added over time; this file is updated when they ship. + +## `POST /v1/audio/translations` + +OpenAI’s **translations** endpoint always returns **English text**. It maps to Whisper’s **translate** task (not “transcribe then run a text translator”). + +### Request + +- **Content-Type:** `multipart/form-data` +- **Fields:** + - `file` (required) — audio file (same as transcriptions) + - `model` (required) — must name a `serve.models` alias whose **endpoint category** is `audio-translation` (see below) + - `prompt` (optional) — passed through to the SDK transcribe path (Whisper initial prompt where supported) + - `response_format` (optional) — `json` (default) or `text`. `srt`, `vtt`, and `verbose_json` are not implemented yet. +- **Not supported:** `language`. Per-request language selection is not part of OpenAI’s translations API; output is always English. Use `/v1/audio/transcriptions` if you need non-English text. + +### Registering a translation model (`whispercpp-audio-translation`) + +Use the virtual SDK type **`whispercpp-audio-translation`** in `serve.models`. The CLI resolves it to the real engine **`whispercpp-transcription`** and **forces** `translate: true` on the **loadModel** `modelConfig` (Whisper translate-to-English). Nested `whisperConfig: { … }` in JSON is flattened into the top-level `modelConfig` for this alias so it matches what `@qvac/sdk` expects. + +You may omit `translate`. 
If you set `translate: false` (top-level or under `whisperConfig`), it is **overridden to `true`** with a console warning. + +The recommended shape is the same `"model": ""` shorthand used elsewhere in `serve.models`, with `type` set to the virtual translation type. The constant resolves to its registry `src`; `type` switches the alias from the constant's natural addon (`whispercpp-transcription`) to `whispercpp-audio-translation`. + +**Minimal JSON — same weights as a transcription alias, second alias for translate:** + +```json +{ + "serve": { + "models": { + "whisper-transcribe": { "model": "WHISPER_EN_TINY_Q8_0", "preload": true }, + "whisper-translate": { + "model": "WHISPER_EN_TINY_Q8_0", + "type": "whispercpp-audio-translation", + "preload": true + } + } + } +} +``` + +**Optional full `config`** uses the same **flat** Whisper keys as other `serve.models` Whisper entries (see [changelog example](./changelog/0.2.2/api.md): `language`, `n_threads`, `strategy`, … alongside `contextParams` / `miscConfig` if needed). You may also nest tuning under `whisperConfig`; for **`whispercpp-audio-translation` only**, those keys are merged to the top level before load. + +**Example with extra Whisper tuning (flat keys, same style as transcriptions):** + +```yaml +serve: + models: + whisper-1: + model: WHISPER_EN_TINY_Q8_0 + type: whispercpp-audio-translation + preload: true + config: + language: auto + n_threads: 4 + strategy: greedy + contextParams: + use_gpu: true + miscConfig: + caption_enabled: false +``` + +If you need to point at non-registry weights (a local path, `https://…`, `registry://…`, etc.), drop the `model` shorthand and use the explicit `{ "type": "whispercpp-audio-translation", "src": "" }` form. `src` is passed to `@qvac/sdk` as `modelSrc` verbatim, so it cannot be an SDK constant name in that form — use the `model` shorthand above when you want constant resolution. 
+ +### Example (`curl`) + +```bash +curl -s http://127.0.0.1:11434/v1/audio/translations \ + -F model=whisper-translate \ + -F file=@./sample.wav \ + -F response_format=json +``` + +Response (`json`): `{ "text": "..." }` +Response (`text`): body is plain UTF-8 text. + +### Same weights as transcriptions + +You normally use the **same** underlying weights for both transcription and translation; register **two aliases** that share the same `"model": "WHISPER_…"` constant — one without `type` (defaults to transcription) and one with `type: "whispercpp-audio-translation"`. + +### Errors + +| HTTP | `error.code` | When | +|------|----------------|------| +| 400 | `invalid_content_type` | Not `multipart/form-data` | +| 400 | `missing_file` / `missing_model` | Required fields missing | +| 400 | `unsupported_param` | e.g. `language` present | +| 400 | `unsupported_response_format` | `srt`, `vtt`, `verbose_json` | +| 400 | `invalid_model_type` | Alias is not an `audio-translation` model (use `type: whispercpp-audio-translation` in `serve.models`) | +| 404 | `model_not_found` | Unknown alias | +| 503 | `model_not_ready` | Model not loaded yet | +| 500 | `translation_error` | SDK / engine failure | diff --git a/packages/cli/package.json b/packages/cli/package.json index 1c5bdfb255..37e062fa1b 100644 --- a/packages/cli/package.json +++ b/packages/cli/package.json @@ -13,6 +13,7 @@ "files": [ "dist/**/*", "README.md", + "docs/**/*.md", "LICENSE", "NOTICE" ], diff --git a/packages/cli/src/serve/adapters/openai/index.ts b/packages/cli/src/serve/adapters/openai/index.ts index 1f90f8f93e..c2810deacf 100644 --- a/packages/cli/src/serve/adapters/openai/index.ts +++ b/packages/cli/src/serve/adapters/openai/index.ts @@ -50,6 +50,12 @@ export function createOpenAIAdapter (): APIAdapter { return true } + if (method === 'POST' && path === '/v1/audio/translations') { + const { handleTranslations } = await import('./routes/translations.js') + await handleTranslations(req, res, ctx) + return 
true + } + if (method === 'POST' && path === '/v1/images/generations') { const { handleImagesGenerations } = await import('./routes/images.js') await handleImagesGenerations(req, res, ctx) diff --git a/packages/cli/src/serve/adapters/openai/routes/translations.ts b/packages/cli/src/serve/adapters/openai/routes/translations.ts new file mode 100644 index 0000000000..2c01c9eed8 --- /dev/null +++ b/packages/cli/src/serve/adapters/openai/routes/translations.ts @@ -0,0 +1,122 @@ +import type { IncomingMessage, ServerResponse } from 'node:http' +import { sendJson, sendText, sendError } from '../../../http.js' +import { readMultipart } from '../../../multipart.js' +import { resolveModelAlias } from '../../../config.js' +import { sdkTranscribe } from '../../../core/sdk.js' +import type { RouteContext } from '../../types.js' + +const SUPPORTED_RESPONSE_FORMATS = new Set(['json', 'text']) +const UNSUPPORTED_RESPONSE_FORMATS = new Set(['srt', 'vtt', 'verbose_json']) + +export async function handleTranslations (req: IncomingMessage, res: ServerResponse, ctx: RouteContext): Promise { + const contentType = req.headers['content-type'] ?? '' + if (!contentType.includes('multipart/form-data')) { + sendError(res, 400, 'invalid_content_type', 'Content-Type must be multipart/form-data.') + return + } + + let fields: Map + let file: { fieldName: string; fileName: string; contentType: string; data: Buffer } | null + + try { + const result = await readMultipart(req) + fields = result.fields + file = result.file + } catch (err) { + const message = err instanceof Error ? 
err.message : String(err) + ctx.logger.error(`Multipart parse error: ${message}`) + sendError(res, 400, 'invalid_multipart', 'Failed to parse multipart request.') + return + } + + if (!file || file.fieldName !== 'file') { + sendError(res, 400, 'missing_file', '"file" field is required.') + return + } + + const modelName = fields.get('model') + if (!modelName) { + sendError(res, 400, 'missing_model', '"model" field is required.') + return + } + + if (fields.has('language')) { + sendError( + res, + 400, + 'unsupported_param', + 'The "language" field is not supported on /v1/audio/translations. Output is always English.' + ) + return + } + + const responseFormat = fields.get('response_format') ?? 'json' + if (UNSUPPORTED_RESPONSE_FORMATS.has(responseFormat)) { + sendError(res, 400, 'unsupported_response_format', `response_format "${responseFormat}" is not supported. Use "json" or "text".`) + return + } + if (!SUPPORTED_RESPONSE_FORMATS.has(responseFormat)) { + sendError(res, 400, 'invalid_response_format', `Unknown response_format "${responseFormat}". Use "json" or "text".`) + return + } + + const prompt = fields.get('prompt') + const temperature = fields.get('temperature') + + const modelEntry = resolveModelAlias(ctx.serveConfig, modelName) ?? ctx.registry.getEntry(modelName) + + if (!modelEntry) { + sendError(res, 404, 'model_not_found', `Model "${modelName}" is not available. Check serve.models config.`) + return + } + + const endpointCategory = 'endpointCategory' in modelEntry ? modelEntry.endpointCategory : undefined + if (endpointCategory !== 'audio-translation') { + sendError( + res, + 400, + 'invalid_model_type', + `Model "${modelName}" is not registered for audio translation. Register an alias with type "whispercpp-audio-translation" in serve.models.` + ) + return + } + + const alias = 'alias' in modelEntry ? 
(modelEntry.alias as string) : modelEntry.id + const registryEntry = ctx.registry.getEntry(alias) + if (!registryEntry || registryEntry.state !== ctx.registry.STATES.READY) { + sendError(res, 503, 'model_not_ready', `Model "${modelName}" is not loaded yet.`) + return + } + + if (temperature) { + ctx.logger.warn(`Ignoring unsupported param: temperature=${temperature}`) + } + + const sdkModelId = registryEntry.sdkModelId ?? registryEntry.id + const fileSizeKB = Math.round(file.data.length / 1024) + + ctx.logger.info(` translate model=${alias} file=${file.fileName} size=${fileSizeKB}KB format=${responseFormat}${prompt ? ' prompt=yes' : ''}`) + + const transcribe = ctx.transcribeOverride ?? sdkTranscribe + + try { + const text = await transcribe({ + modelId: sdkModelId, + audioChunk: file.data, + fileName: file.fileName, + prompt + }) + + ctx.logger.info(` translate done chars=${text.length}`) + + if (responseFormat === 'text') { + sendText(res, 200, text) + } else { + sendJson(res, 200, { text }) + } + } catch (err) { + const message = err instanceof Error ? 
err.message : String(err) + ctx.logger.error(`Translation error for "${alias}": ${message}`) + sendError(res, 500, 'translation_error', 'An internal error occurred during audio translation.') + } +} diff --git a/packages/cli/src/serve/adapters/types.ts b/packages/cli/src/serve/adapters/types.ts index d9d20f3e4d..568d77585b 100644 --- a/packages/cli/src/serve/adapters/types.ts +++ b/packages/cli/src/serve/adapters/types.ts @@ -6,6 +6,13 @@ export interface RouteContext { registry: ModelRegistry serveConfig: ServeConfig logger: Logger + /** @internal Unit tests only — replaces sdkTranscribe when set */ + transcribeOverride?: (opts: { + modelId: string + audioChunk: Buffer + fileName: string + prompt?: string | undefined + }) => Promise } export type RouteHandler = (req: IncomingMessage, res: ServerResponse, ctx: RouteContext) => Promise | void diff --git a/packages/cli/src/serve/config.ts b/packages/cli/src/serve/config.ts index 37425bc556..4d81a16a15 100644 --- a/packages/cli/src/serve/config.ts +++ b/packages/cli/src/serve/config.ts @@ -9,6 +9,7 @@ const ENDPOINT_CATEGORY: Record = { 'llamacpp-embedding': 'embedding', whisper: 'transcription', 'whispercpp-transcription': 'transcription', + 'whispercpp-audio-translation': 'audio-translation', parakeet: 'transcription', 'parakeet-transcription': 'transcription', nmt: 'translation', @@ -29,6 +30,7 @@ interface RawServeConfig { interface ConstantModelEntry { model: string + type?: string default?: boolean preload?: boolean config?: Record @@ -92,6 +94,50 @@ export function normalizeEndpointCategory (sdkType: string): string { return ENDPOINT_CATEGORY[sdkType] ?? sdkType } +const VIRTUAL_SDK_WHISPER_AUDIO_TRANSLATION = 'whispercpp-audio-translation' + +/** + * Resolves explicit serve.models entries: maps the virtual whisper translation + * alias to whispercpp-transcription + forces translate=true for SDK loadModel + * (whisper modelConfig is flat whisper fields, not a nested whisperConfig object). 
+ * Exported for unit tests. + */ +export function resolveExplicitServeModel (type: string, config: Record): { + sdkType: string + endpointCategory: string + config: Record +} { + if (type !== VIRTUAL_SDK_WHISPER_AUDIO_TRANSLATION) { + return { + sdkType: type, + endpointCategory: normalizeEndpointCategory(type), + config: { ...config } + } + } + + const out: Record = { ...config } + const nested = out['whisperConfig'] + if (nested !== null && typeof nested === 'object' && !Array.isArray(nested)) { + for (const [k, v] of Object.entries(nested as Record)) { + out[k] = v + } + delete out['whisperConfig'] + } + + if (out['translate'] === false) { + console.warn( + 'serve.models: whispercpp-audio-translation forces translate=true (ignoring translate=false)' + ) + } + out['translate'] = true + + return { + sdkType: 'whispercpp-transcription', + endpointCategory: 'audio-translation', + config: out + } +} + function isConstantModelEntry (entry: unknown): entry is ConstantModelEntry { return ( entry !== null && @@ -101,7 +147,7 @@ function isConstantModelEntry (entry: unknown): entry is ConstantModelEntry { ) } -function resolveModelConstant (alias: string, constantName: string, registry: Map, overrides?: ConstantModelEntry): ResolvedModelEntry { +export function resolveModelConstant (alias: string, constantName: string, registry: Map, overrides?: ConstantModelEntry): ResolvedModelEntry { const model = registry.get(constantName) if (!model) { throw new Error( @@ -110,14 +156,23 @@ function resolveModelConstant (alias: string, constantName: string, registry: Ma ) } + const rawConfig = overrides?.config ?? {} + const resolved = overrides?.type + ? 
resolveExplicitServeModel(overrides.type, rawConfig) + : { + sdkType: model.addon, + endpointCategory: normalizeEndpointCategory(model.addon), + config: rawConfig + } + return { alias, src: model.src, - sdkType: model.addon, - endpointCategory: normalizeEndpointCategory(model.addon), + sdkType: resolved.sdkType, + endpointCategory: resolved.endpointCategory, isDefault: overrides?.default === true, preload: overrides?.preload !== false, - config: overrides?.config ?? {} + config: resolved.config } } @@ -129,14 +184,17 @@ function parseExplicitEntry (alias: string, entry: ExplicitModelEntry): Resolved throw new Error(`serve.models.${alias}: "type" is required`) } + const rawConfig = entry.config ?? {} + const resolved = resolveExplicitServeModel(entry.type, rawConfig) + return { alias, src: entry.src, - sdkType: entry.type, - endpointCategory: normalizeEndpointCategory(entry.type), + sdkType: resolved.sdkType, + endpointCategory: resolved.endpointCategory, isDefault: entry.default === true, preload: entry.preload === true, - config: entry.config ?? 
{} + config: resolved.config } } diff --git a/packages/cli/src/serve/index.ts b/packages/cli/src/serve/index.ts index 03515b0bdb..ebcfd4ce47 100644 --- a/packages/cli/src/serve/index.ts +++ b/packages/cli/src/serve/index.ts @@ -113,6 +113,7 @@ const CATEGORY_ENDPOINTS: Record = { chat: ['POST /v1/chat/completions'], embedding: ['POST /v1/embeddings'], transcription: ['POST /v1/audio/transcriptions'], + 'audio-translation': ['POST /v1/audio/translations'], image: ['POST /v1/images/generations'] } @@ -126,6 +127,7 @@ const CATEGORY_LABELS: Record = { chat: 'chat', embedding: 'embedding', transcription: 'transcription', + 'audio-translation': 'audio translation', translation: 'translation', speech: 'speech', ocr: 'ocr', diff --git a/packages/cli/test/cli.bats b/packages/cli/test/cli.bats index e3da6ded24..5695a43692 100755 --- a/packages/cli/test/cli.bats +++ b/packages/cli/test/cli.bats @@ -15,7 +15,7 @@ setup_file() { for name in default auth nocors; do local dir="${FILE_TMPDIR}/${name}" mkdir -p "${dir}" - echo '{ "serve": { "models": {} } }' > "${dir}/qvac.config.json" + echo '{"serve":{"models":{"fake-transcribe":{"type":"whispercpp-transcription","src":"hyper://example.invalid/model","preload":false}}}}' > "${dir}/qvac.config.json" done cd "${FILE_TMPDIR}/default" @@ -355,6 +355,56 @@ http_status() { assert_error "${body}" "model_not_found" } +# ── Serve: translations validation ────────────────────────────────── + +@test "translations: JSON content-type returns 400" { + local body + body=$(curl -s "http://127.0.0.1:19920/v1/audio/translations" \ + -H "Content-Type: application/json" -d '{"model":"test"}') + assert_error "${body}" "invalid_content_type" +} + +@test "translations: missing file returns 400" { + local body + body=$(curl -s "http://127.0.0.1:19920/v1/audio/translations" -F "model=test") + assert_error "${body}" "missing_file" +} + +@test "translations: missing model returns 400" { + local body + body=$(curl -s 
"http://127.0.0.1:19920/v1/audio/translations" \ + -F "file=@/dev/null;filename=audio.wav") + assert_error "${body}" "missing_model" +} + +@test "translations: language field returns 400" { + local body + body=$(curl -s "http://127.0.0.1:19920/v1/audio/translations" \ + -F "model=fake-transcribe" -F "language=es" -F "file=@/dev/null;filename=audio.wav") + assert_error "${body}" "unsupported_param" +} + +@test "translations: unsupported srt format returns 400" { + local body + body=$(curl -s "http://127.0.0.1:19920/v1/audio/translations" \ + -F "model=fake-transcribe" -F "response_format=srt" -F "file=@/dev/null;filename=audio.wav") + assert_error "${body}" "unsupported_response_format" +} + +@test "translations: transcription-only model returns invalid_model_type" { + local body + body=$(curl -s "http://127.0.0.1:19920/v1/audio/translations" \ + -F "model=fake-transcribe" -F "file=@/dev/null;filename=audio.wav") + assert_error "${body}" "invalid_model_type" +} + +@test "translations: unknown model returns 404" { + local body + body=$(curl -s "http://127.0.0.1:19920/v1/audio/translations" \ + -F "model=nonexistent" -F "file=@/dev/null;filename=audio.wav") + assert_error "${body}" "model_not_found" +} + # ── Serve: routing ──────────────────────────────────────────────────── @test "GET /unknown returns 404" { diff --git a/packages/cli/test/config.test.ts b/packages/cli/test/config.test.ts new file mode 100644 index 0000000000..bfe1c789bb --- /dev/null +++ b/packages/cli/test/config.test.ts @@ -0,0 +1,94 @@ +import { describe, it } from 'node:test' +import assert from 'node:assert/strict' +import { resolveExplicitServeModel, resolveModelConstant } from '../src/serve/config.js' + +const WHISPER_CONST = { + src: 'registry://whisper-en-tiny-q8_0', + addon: 'whispercpp-transcription', + name: 'WHISPER_EN_TINY_Q8_0' +} +const LLM_CONST = { + src: 'registry://qwen3-600m-inst-q4', + addon: 'llamacpp-completion', + name: 'QWEN3_600M_INST_Q4' +} + +function makeRegistry () { + 
const m = new Map() + m.set('WHISPER_EN_TINY_Q8_0', WHISPER_CONST) + m.set('QWEN3_600M_INST_Q4', LLM_CONST) + return m +} + +describe('resolveExplicitServeModel', () => { + it('maps whispercpp-audio-translation to whispercpp-transcription and audio-translation', () => { + const r = resolveExplicitServeModel('whispercpp-audio-translation', { + whisperConfig: { language: 'auto', n_threads: 4 } + }) + assert.equal(r.sdkType, 'whispercpp-transcription') + assert.equal(r.endpointCategory, 'audio-translation') + assert.equal(r.config['translate'], true) + assert.equal(r.config['language'], 'auto') + assert.equal(r.config['n_threads'], 4) + assert.equal('whisperConfig' in r.config, false) + }) + + it('creates translate when config was empty', () => { + const r = resolveExplicitServeModel('whispercpp-audio-translation', {}) + assert.equal(r.config['translate'], true) + }) + + it('forces translate true when operator set translate false (nested)', () => { + const r = resolveExplicitServeModel('whispercpp-audio-translation', { + whisperConfig: { translate: false } + }) + assert.equal(r.config['translate'], true) + assert.equal('whisperConfig' in r.config, false) + }) + + it('forces translate true when operator set translate false (top-level)', () => { + const r = resolveExplicitServeModel('whispercpp-audio-translation', { + translate: false + }) + assert.equal(r.config['translate'], true) + }) + + it('passes through non-virtual types unchanged', () => { + const r = resolveExplicitServeModel('whispercpp-transcription', { + whisperConfig: { translate: false } + }) + assert.equal(r.sdkType, 'whispercpp-transcription') + assert.equal(r.endpointCategory, 'transcription') + assert.equal((r.config.whisperConfig as Record).translate, false) + }) +}) + +describe('resolveModelConstant', () => { + it('resolves a constant to its registry src and natural addon', () => { + const r = resolveModelConstant('alias', 'WHISPER_EN_TINY_Q8_0', makeRegistry()) + assert.equal(r.src, 
WHISPER_CONST.src) + assert.equal(r.sdkType, 'whispercpp-transcription') + assert.equal(r.endpointCategory, 'transcription') + }) + + it('honors a type override on a constant entry (whisper → audio-translation)', () => { + const r = resolveModelConstant('alias', 'WHISPER_EN_TINY_Q8_0', makeRegistry(), { + model: 'WHISPER_EN_TINY_Q8_0', + type: 'whispercpp-audio-translation', + config: { language: 'auto' } + }) + assert.equal(r.src, WHISPER_CONST.src) + assert.equal(r.sdkType, 'whispercpp-transcription') + assert.equal(r.endpointCategory, 'audio-translation') + assert.equal(r.config['translate'], true) + assert.equal(r.config['language'], 'auto') + assert.equal('whisperConfig' in r.config, false) + }) + + it('throws on unknown constant names', () => { + assert.throws( + () => resolveModelConstant('alias', 'NOT_A_REAL_CONST', makeRegistry()), + /unknown model constant "NOT_A_REAL_CONST"/ + ) + }) +}) diff --git a/packages/cli/test/e2e.bats b/packages/cli/test/e2e.bats index a947fd1d48..3996d2695c 100644 --- a/packages/cli/test/e2e.bats +++ b/packages/cli/test/e2e.bats @@ -1,6 +1,6 @@ #!/usr/bin/env bats -# End-to-end tests with real models (LLM, embedding, whisper). +# End-to-end tests with real models (LLM, embedding, whisper transcription + translation). # Requires: npm run build, jq, @qvac/sdk installed as devDependency. # These tests download small models and run real inference — expect ~5-10 min on first run. 
@@ -12,6 +12,7 @@ BASE="http://127.0.0.1:${E2E_PORT}" LLM_ALIAS="test-llm" EMBED_ALIAS="test-embed" WHISPER_ALIAS="test-whisper" +WHISPER_TRANSLATE_ALIAS="test-whisper-translate" # ── Server lifecycle (once per file) ────────────────────────────────── @@ -36,6 +37,11 @@ setup_file() { "test-whisper": { "model": "WHISPER_EN_TINY_Q8_0", "preload": true + }, + "test-whisper-translate": { + "model": "WHISPER_EN_TINY_Q8_0", + "type": "whispercpp-audio-translation", + "preload": true } } } @@ -65,7 +71,7 @@ CONF while [[ "${elapsed}" -lt "${max_wait}" ]]; do local count count=$(curl -sf "${BASE}/v1/models" 2>/dev/null | jq '.data | length' 2>/dev/null || echo 0) - [[ "${count}" -ge 3 ]] && break + [[ "${count}" -ge 4 ]] && break sleep 2 elapsed=$((elapsed + 2)) done @@ -97,15 +103,15 @@ json_post() { # ── Models ──────────────────────────────────────────────────────────── -@test "GET /v1/models lists all 3 loaded models" { +@test "GET /v1/models lists all 4 loaded models" { local body body=$(curl -sf "${BASE}/v1/models") echo "${body}" | jq -e '.object == "list"' >/dev/null - echo "${body}" | jq -e '.data | length == 3' >/dev/null + echo "${body}" | jq -e '.data | length == 4' >/dev/null local ids ids=$(echo "${body}" | jq -r '[.data[].id] | sort | join(",")') - [[ "${ids}" == "test-embed,test-llm,test-whisper" ]] + [[ "${ids}" == "test-embed,test-llm,test-whisper,test-whisper-translate" ]] echo "${body}" | jq -e '.data | all(.object == "model")' >/dev/null echo "${body}" | jq -e '.data | all(.owned_by == "qvac")' >/dev/null @@ -226,6 +232,35 @@ json_post() { ! echo "${body}" | jq -e '.' 
>/dev/null 2>&1 || [[ $(echo "${body}" | jq -r 'type' 2>/dev/null) == "string" ]] } +# ── Translations (Whisper translate-to-English) ───────────────────── + +@test "translations: returns JSON with text field" { + local body + body=$(curl -s "${BASE}/v1/audio/translations" \ + -F "model=${WHISPER_TRANSLATE_ALIAS}" \ + -F "file=@${BATS_FILE_TMPDIR}/silence.wav;filename=silence.wav") + + echo "${body}" | jq -e '.text | type == "string"' >/dev/null +} + +@test "translations: response_format=text returns plain text" { + local body + body=$(curl -s "${BASE}/v1/audio/translations" \ + -F "model=${WHISPER_TRANSLATE_ALIAS}" \ + -F "response_format=text" \ + -F "file=@${BATS_FILE_TMPDIR}/silence.wav;filename=silence.wav") + + ! echo "${body}" | jq -e '.' >/dev/null 2>&1 || [[ $(echo "${body}" | jq -r 'type' 2>/dev/null) == "string" ]] +} + +@test "translations: rejects transcription-only alias" { + local body + body=$(curl -s "${BASE}/v1/audio/translations" \ + -F "model=${WHISPER_ALIAS}" \ + -F "file=@${BATS_FILE_TMPDIR}/silence.wav;filename=silence.wav") + assert_error "${body}" "invalid_model_type" +} + # ── Cross-endpoint model type validation ────────────────────────────── @test "cross-type: chat endpoint rejects embedding model" { @@ -250,11 +285,23 @@ json_post() { assert_error "${body}" "invalid_model_type" } +@test "cross-type: translations endpoint rejects chat model" { + local body + body=$(curl -s "${BASE}/v1/audio/translations" \ + -F "model=${LLM_ALIAS}" \ + -F "file=@${BATS_FILE_TMPDIR}/silence.wav;filename=audio.wav") + assert_error "${body}" "invalid_model_type" +} + # ── Model lifecycle ─────────────────────────────────────────────────── # Run last — unloading a model affects subsequent tests. 
@test "DELETE /v1/models/:id unloads model" { local body + body=$(curl -s -X DELETE "${BASE}/v1/models/${WHISPER_TRANSLATE_ALIAS}") + echo "${body}" | jq -e ".id == \"${WHISPER_TRANSLATE_ALIAS}\"" >/dev/null + echo "${body}" | jq -e '.deleted == true' >/dev/null + body=$(curl -s -X DELETE "${BASE}/v1/models/${WHISPER_ALIAS}") echo "${body}" | jq -e ".id == \"${WHISPER_ALIAS}\"" >/dev/null echo "${body}" | jq -e '.deleted == true' >/dev/null @@ -263,4 +310,5 @@ json_post() { list=$(curl -sf "${BASE}/v1/models") echo "${list}" | jq -e '.data | length == 2' >/dev/null echo "${list}" | jq -e "[.data[].id] | index(\"${WHISPER_ALIAS}\") | not" >/dev/null + echo "${list}" | jq -e "[.data[].id] | index(\"${WHISPER_TRANSLATE_ALIAS}\") | not" >/dev/null } diff --git a/packages/cli/test/translations.test.ts b/packages/cli/test/translations.test.ts new file mode 100644 index 0000000000..79296aa394 --- /dev/null +++ b/packages/cli/test/translations.test.ts @@ -0,0 +1,404 @@ +import { describe, it } from 'node:test' +import assert from 'node:assert/strict' +import { PassThrough } from 'node:stream' +import type { IncomingMessage } from 'node:http' +import type { ServerResponse } from 'node:http' +import { handleTranslations } from '../src/serve/adapters/openai/routes/translations.js' +import { createModelRegistry } from '../src/serve/core/model-registry.js' +import type { ServeConfig, ResolvedModelEntry } from '../src/serve/core/model-registry.js' +import type { Logger } from '../src/logger.js' + +function buildMultipart ( + boundary: string, + fields: Record, + file?: { fieldName: string; fileName: string; data: Buffer } +): Buffer { + const parts: Buffer[] = [] + for (const [k, v] of Object.entries(fields)) { + parts.push(Buffer.from( + `--${boundary}\r\nContent-Disposition: form-data; name="${k}"\r\n\r\n${v}\r\n` + )) + } + if (file) { + parts.push(Buffer.from( + `--${boundary}\r\nContent-Disposition: form-data; name="${file.fieldName}"; filename="${file.fileName}"\r\n` + + 
'Content-Type: application/octet-stream\r\n\r\n' + )) + parts.push(file.data) + parts.push(Buffer.from('\r\n')) + } + parts.push(Buffer.from(`--${boundary}--\r\n`)) + return Buffer.concat(parts) +} + +function makeMultipartRequest (body: Buffer, boundary: string): IncomingMessage { + const stream = new PassThrough() + const req = stream as unknown as IncomingMessage + req.headers = { + 'content-type': `multipart/form-data; boundary=${boundary}` + } + req.method = 'POST' + req.url = '/v1/audio/translations' + queueMicrotask(() => { + stream.end(body) + }) + return req +} + +function createMockRes (): ServerResponse & { getPayload: () => string; getStatus: () => number } { + let payload = '' + let status = 0 + const res = { + statusCode: 200, + headersSent: false, + writeHead (code: number) { + this.statusCode = code + status = code + }, + end (data?: string | Buffer) { + if (typeof data === 'string') payload = data + else if (Buffer.isBuffer(data)) payload = data.toString('utf8') + else payload = '' + }, + getPayload () { + return payload + }, + getStatus () { + return status + } + } + return res as unknown as ServerResponse & { getPayload: () => string; getStatus: () => number } +} + +function makeLogger (): Logger & { warns: string[] } { + const warns: string[] = [] + return { + error () {}, + warn (m: string) { + warns.push(m) + }, + info () {}, + debug () {}, + warns + } +} + +function resolvedEntry (overrides: Partial): ResolvedModelEntry { + return { + alias: 'en', + src: 'hyper://example/model', + sdkType: 'whispercpp-transcription', + endpointCategory: 'audio-translation', + isDefault: false, + preload: false, + config: {}, + ...overrides + } +} + +describe('handleTranslations', () => { + const boundary = 'qvac-test' + + it('rejects non-multipart Content-Type', async () => { + const stream = new PassThrough() + const req = stream as unknown as IncomingMessage + req.headers = { 'content-type': 'application/json' } + req.method = 'POST' + queueMicrotask(() => 
stream.end('{}')) + const res = createMockRes() + const registry = createModelRegistry() + const serveConfig: ServeConfig = { models: new Map(), defaults: new Map() } + await handleTranslations(req, res, { + registry, + serveConfig, + logger: makeLogger() + }) + assert.equal(res.getStatus(), 400) + const body = JSON.parse(res.getPayload()) as { error: { code: string } } + assert.equal(body.error.code, 'invalid_content_type') + }) + + it('rejects missing file field', async () => { + const body = buildMultipart(boundary, { model: 'en' }) + const req = makeMultipartRequest(body, boundary) + const res = createMockRes() + const registry = createModelRegistry() + const serveConfig: ServeConfig = { + models: new Map([['en', resolvedEntry({})]]), + defaults: new Map() + } + await handleTranslations(req, res, { + registry, + serveConfig, + logger: makeLogger() + }) + assert.equal(res.getStatus(), 400) + const j = JSON.parse(res.getPayload()) as { error: { code: string } } + assert.equal(j.error.code, 'missing_file') + }) + + it('rejects missing model field', async () => { + const body = buildMultipart(boundary, {}, { fieldName: 'file', fileName: 'a.wav', data: Buffer.alloc(4) }) + const req = makeMultipartRequest(body, boundary) + const res = createMockRes() + const registry = createModelRegistry() + const serveConfig: ServeConfig = { models: new Map(), defaults: new Map() } + await handleTranslations(req, res, { + registry, + serveConfig, + logger: makeLogger() + }) + assert.equal(res.getStatus(), 400) + const j = JSON.parse(res.getPayload()) as { error: { code: string } } + assert.equal(j.error.code, 'missing_model') + }) + + it('returns 404 for unknown model', async () => { + const body = buildMultipart(boundary, { model: 'missing' }, { fieldName: 'file', fileName: 'a.wav', data: Buffer.alloc(1) }) + const req = makeMultipartRequest(body, boundary) + const res = createMockRes() + const registry = createModelRegistry() + const serveConfig: ServeConfig = { models: new 
Map(), defaults: new Map() }
+    await handleTranslations(req, res, {
+      registry,
+      serveConfig,
+      logger: makeLogger()
+    })
+    assert.equal(res.getStatus(), 404)
+    const j = JSON.parse(res.getPayload()) as { error: { code: string } }
+    assert.equal(j.error.code, 'model_not_found')
+  })
+
+  it('returns 400 when model is not audio-translation category', async () => {
+    const body = buildMultipart(boundary, { model: 'tr' }, { fieldName: 'file', fileName: 'a.wav', data: Buffer.alloc(1) })
+    const req = makeMultipartRequest(body, boundary)
+    const res = createMockRes()
+    const registry = createModelRegistry()
+    const tr: ResolvedModelEntry = {
+      alias: 'tr',
+      src: 'hyper://x',
+      sdkType: 'whispercpp-transcription',
+      endpointCategory: 'transcription',
+      isDefault: false,
+      preload: false,
+      config: {}
+    }
+    const serveConfig: ServeConfig = {
+      models: new Map([['tr', tr]]),
+      defaults: new Map()
+    }
+    await handleTranslations(req, res, {
+      registry,
+      serveConfig,
+      logger: makeLogger()
+    })
+    assert.equal(res.getStatus(), 400)
+    const j = JSON.parse(res.getPayload()) as { error: { code: string } }
+    assert.equal(j.error.code, 'invalid_model_type')
+  })
+
+  it('returns 503 when model is not ready', async () => {
+    const body = buildMultipart(boundary, { model: 'en' }, { fieldName: 'file', fileName: 'a.wav', data: Buffer.alloc(1) })
+    const req = makeMultipartRequest(body, boundary)
+    const res = createMockRes()
+    const registry = createModelRegistry()
+    registry.register('en', {
+      src: 'hyper://x',
+      sdkType: 'whispercpp-transcription',
+      endpointCategory: 'audio-translation',
+      config: {}
+    })
+    const serveConfig: ServeConfig = {
+      models: new Map([['en', resolvedEntry({})]]),
+      defaults: new Map()
+    }
+    await handleTranslations(req, res, {
+      registry,
+      serveConfig,
+      logger: makeLogger()
+    })
+    assert.equal(res.getStatus(), 503)
+    const j = JSON.parse(res.getPayload()) as { error: { code: string } }
+    assert.equal(j.error.code, 'model_not_ready')
+  })
+
+  it('rejects 
language field', async () => {
+    const body = buildMultipart(
+      boundary,
+      { model: 'en', language: 'es' },
+      { fieldName: 'file', fileName: 'a.wav', data: Buffer.alloc(1) }
+    )
+    const req = makeMultipartRequest(body, boundary)
+    const res = createMockRes()
+    const registry = createModelRegistry()
+    registry.register('en', {
+      src: 'hyper://x',
+      sdkType: 'whispercpp-transcription',
+      endpointCategory: 'audio-translation',
+      config: {}
+    })
+    registry.setReady('en', 'mid')
+    const serveConfig: ServeConfig = {
+      models: new Map([['en', resolvedEntry({})]]),
+      defaults: new Map()
+    }
+    await handleTranslations(req, res, {
+      registry,
+      serveConfig,
+      logger: makeLogger()
+    })
+    assert.equal(res.getStatus(), 400)
+    const j = JSON.parse(res.getPayload()) as { error: { code: string } }
+    assert.equal(j.error.code, 'unsupported_param')
+  })
+
+  it('rejects unsupported response_format srt', async () => {
+    const body = buildMultipart(
+      boundary,
+      { model: 'en', response_format: 'srt' },
+      { fieldName: 'file', fileName: 'a.wav', data: Buffer.alloc(1) }
+    )
+    const req = makeMultipartRequest(body, boundary)
+    const res = createMockRes()
+    const registry = createModelRegistry()
+    registry.register('en', {
+      src: 'hyper://x',
+      sdkType: 'whispercpp-transcription',
+      endpointCategory: 'audio-translation',
+      config: {}
+    })
+    registry.setReady('en', 'mid')
+    const serveConfig: ServeConfig = {
+      models: new Map([['en', resolvedEntry({})]]),
+      defaults: new Map()
+    }
+    await handleTranslations(req, res, {
+      registry,
+      serveConfig,
+      logger: makeLogger()
+    })
+    assert.equal(res.getStatus(), 400)
+    const j = JSON.parse(res.getPayload()) as { error: { code: string } }
+    assert.equal(j.error.code, 'unsupported_response_format')
+  })
+
+  it('rejects invalid response_format', async () => {
+    const body = buildMultipart(
+      boundary,
+      { model: 'en', response_format: 'xml' },
+      { fieldName: 'file', fileName: 'a.wav', data: Buffer.alloc(1) }
+    )
+    const req =
makeMultipartRequest(body, boundary)
+    const res = createMockRes()
+    const registry = createModelRegistry()
+    registry.register('en', {
+      src: 'hyper://x',
+      sdkType: 'whispercpp-transcription',
+      endpointCategory: 'audio-translation',
+      config: {}
+    })
+    registry.setReady('en', 'mid')
+    const serveConfig: ServeConfig = {
+      models: new Map([['en', resolvedEntry({})]]),
+      defaults: new Map()
+    }
+    await handleTranslations(req, res, {
+      registry,
+      serveConfig,
+      logger: makeLogger()
+    })
+    assert.equal(res.getStatus(), 400)
+    const j = JSON.parse(res.getPayload()) as { error: { code: string } }
+    assert.equal(j.error.code, 'invalid_response_format')
+  })
+
+  it('warns on temperature but still succeeds with override', async () => {
+    const body = buildMultipart(
+      boundary,
+      { model: 'en', temperature: '0.7' },
+      { fieldName: 'file', fileName: 'a.wav', data: Buffer.alloc(1) }
+    )
+    const req = makeMultipartRequest(body, boundary)
+    const res = createMockRes()
+    const registry = createModelRegistry()
+    registry.register('en', {
+      src: 'hyper://x',
+      sdkType: 'whispercpp-transcription',
+      endpointCategory: 'audio-translation',
+      config: {}
+    })
+    registry.setReady('en', 'mid')
+    const serveConfig: ServeConfig = {
+      models: new Map([['en', resolvedEntry({})]]),
+      defaults: new Map()
+    }
+    const logger = makeLogger()
+    await handleTranslations(req, res, {
+      registry,
+      serveConfig,
+      logger,
+      transcribeOverride: async () => 'hello'
+    })
+    assert.equal(res.getStatus(), 200)
+    const j = JSON.parse(res.getPayload()) as { text: string }
+    assert.equal(j.text, 'hello')
+    assert.ok((logger as Logger & { warns: string[] }).warns.some((w) => w.includes('temperature')))
+  })
+
+  it('returns JSON text on success', async () => {
+    const body = buildMultipart(boundary, { model: 'en' }, { fieldName: 'file', fileName: 'a.wav', data: Buffer.alloc(2) })
+    const req = makeMultipartRequest(body, boundary)
+    const res = createMockRes()
+    const registry = createModelRegistry()
+    registry.register('en', {
+      src: 'hyper://x',
+      sdkType: 'whispercpp-transcription',
+      endpointCategory: 'audio-translation',
+      config: {}
+    })
+    registry.setReady('en', 'mid')
+    const serveConfig: ServeConfig = {
+      models: new Map([['en', resolvedEntry({})]]),
+      defaults: new Map()
+    }
+    await handleTranslations(req, res, {
+      registry,
+      serveConfig,
+      logger: makeLogger(),
+      transcribeOverride: async () => 'out'
+    })
+    assert.equal(res.getStatus(), 200)
+    const j = JSON.parse(res.getPayload()) as { text: string }
+    assert.equal(j.text, 'out')
+  })
+
+  it('returns plain text when response_format is text', async () => {
+    const body = buildMultipart(
+      boundary,
+      { model: 'en', response_format: 'text' },
+      { fieldName: 'file', fileName: 'a.wav', data: Buffer.alloc(1) }
+    )
+    const req = makeMultipartRequest(body, boundary)
+    const res = createMockRes()
+    const registry = createModelRegistry()
+    registry.register('en', {
+      src: 'hyper://x',
+      sdkType: 'whispercpp-transcription',
+      endpointCategory: 'audio-translation',
+      config: {}
+    })
+    registry.setReady('en', 'mid')
+    const serveConfig: ServeConfig = {
+      models: new Map([['en', resolvedEntry({})]]),
+      defaults: new Map()
+    }
+    await handleTranslations(req, res, {
+      registry,
+      serveConfig,
+      logger: makeLogger(),
+      transcribeOverride: async () => 'plain'
+    })
+    assert.equal(res.getStatus(), 200)
+    assert.equal(res.getPayload(), 'plain')
+  })
+})