fix oai-compat embedding API #220

Merged
ngxson merged 3 commits into master from xsn/fix_oai_embd
May 13, 2026
Conversation

@ngxson ngxson commented May 13, 2026

Owner

Fix #219

Missing format_embeddings_response_oaicompat to convert llama.cpp-specific response to OAI-compat response

Summary by CodeRabbit

  • Bug Fixes

    • Embedding responses now follow an OpenAI-compatible shape (embedding vectors under data[0].embedding).
  • Examples & Tests

    • Updated examples and tests to parse embeddings from the new response shape and validate similarity checks.
  • Chores

    • Package version bumped.
    • Default CDN WebAssembly URL and native backend reference updated.
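
For consumers, the practical effect of the fix is that the vector moves from `response.embedding` to `response.data[0].embedding`. A minimal sketch of reading the new shape follows. The `EmbeddingResponse` type and the `firstEmbedding` helper are hypothetical names introduced for illustration, not part of the wllama API, and only the `data[0].embedding` path is confirmed by this PR.

```typescript
// Hypothetical types modelling the OpenAI-style embedding response.
// Only the `data[i].embedding` path is confirmed by this PR.
interface EmbeddingItem {
  embedding: number[];
  index?: number;
}

interface EmbeddingResponse {
  data: EmbeddingItem[];
}

// Return the first embedding vector, failing loudly on an unexpected shape.
function firstEmbedding(res: EmbeddingResponse): number[] {
  const item = res.data[0];
  if (!item || !Array.isArray(item.embedding)) {
    throw new Error('unexpected embedding response shape');
  }
  return item.embedding;
}

// Stand-in response with made-up values; a real one would come from createEmbedding().
const sample: EmbeddingResponse = {
  data: [{ embedding: [0.1, 0.2, 0.3], index: 0 }],
};
const vector = firstEmbedding(sample);
console.log(vector.length); // 3
```

Checking the shape before use makes a regression to the old flat layout fail with a clear error rather than an `undefined` vector.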

coderabbitai Bot commented May 13, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 094fd17c-1df3-404f-986c-d88febb00397

📥 Commits

Reviewing files that changed from the base of the PR and between 4fed0d9 and e086d89.

⛔ Files ignored due to path filters (1)
  • src/wasm/wllama.wasm is excluded by !**/*.wasm
📒 Files selected for processing (2)
  • cpp/wllama-context.h
  • src/wllama.test.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/wllama.test.ts
  • cpp/wllama-context.h

📝 Walkthrough

C++ now returns typed server task results and formats embedding outputs into OpenAI-compatible responses; TypeScript uses GlueMsgEmbeddingRes for embeddings; examples and tests read vectors from response.data[0].embedding; package and generated version identifiers advanced.

Changes

Embedding Response Format Standardization

| Layer / File(s) | Summary |
| --- | --- |
| C++ Result Pointer Conversion and Embedding Formatting: cpp/wllama-context.h | get_next_result() now returns server_task_result_ptr and error flag; action_get_result() detects embedding results, uses format_embeddings_response_oaicompat with model metadata for embeddings, otherwise result->to_json(), and sets res.data_json.value (or "") and res.is_error.value. |
| TypeScript Embedding Type Correction: src/wllama.ts | GlueMsgEmbeddingRes imported and used as the response type for createEmbedding() (calls proxy.wllamaAction with GlueMsgEmbeddingRes). |
| Example and Test Consumer Updates: examples/basic/index.html, examples/embeddings/index.html, src/wllama.test.ts | Examples and tests updated to extract embedding vectors from response.data[0].embedding instead of response.embedding. |
| Version and Generated Artifact Updates: llama.cpp, package.json, src/wasm-from-cdn.ts, src/workers-code/generated.ts | llama.cpp subproject commit advanced; package.json version bumped to 3.1.1; CDN wasm URL updated to @wllama/wllama@3.1.1; LIBLLAMA_VERSION updated. |
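
To illustrate what "formatting into an OpenAI-compatible response" means here, the sketch below wraps raw vectors in an OpenAI-style envelope. It is written in TypeScript for readability and is not the C++ code from this PR. The `toOaiEmbeddingResponse` helper is hypothetical, and the field set (`object`, `model`, `data[].index`) is an assumption based on the public OpenAI embeddings response format; the actual output of format_embeddings_response_oaicompat may carry additional fields.

```typescript
// Assumed OpenAI-style envelope; the real formatter may include more fields.
interface OaiEmbeddingResponse {
  object: 'list';
  model: string;
  data: { object: 'embedding'; index: number; embedding: number[] }[];
}

// Hypothetical helper: wrap one vector per input into the envelope,
// preserving input order through the `index` field.
function toOaiEmbeddingResponse(
  model: string,
  vectors: number[][],
): OaiEmbeddingResponse {
  return {
    object: 'list',
    model,
    data: vectors.map((embedding, index) => ({
      object: 'embedding' as const,
      index,
      embedding,
    })),
  };
}

// Made-up model name and vector, for illustration only.
const wrapped = toOaiEmbeddingResponse('example-model', [[0.5, 0.5]]);
console.log(wrapped.data[0].embedding); // [ 0.5, 0.5 ]
```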

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • ngxson/wllama#187: also updates the llama.cpp subproject commit reference and overlaps at the submodule sync point.
  • ngxson/wllama#194: touches LIBLLAMA_VERSION and the llama.cpp submodule pointer similar to this PR.

"A rabbit hums in code so neat,
Embeddings nested, tidy and sweet.
Types aligned, examples cheer,
Versions bumped — release is near.
Hop, test, and ship! 🐇✨"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix oai-compat embedding API' accurately and concisely describes the main objective: fixing the OpenAI-compatible embedding API response format. |
| Linked Issues check | ✅ Passed | The pull request addresses issue #219 by implementing OpenAI-compatible embedding response formatting in the C++ backend and updating all frontend code to parse the corrected response structure. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the embedding API response format: C++ backend changes implement OAI-compatible formatting, TypeScript/HTML frontend changes consume the new response structure, and version bumps are appropriate for a bug fix. |

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/wllama-context.h`:
- Around line 183-189: In get_next_result(), avoid calling result->is_error()
after moving result; compute the boolean before the move (e.g., bool is_err =
result->is_error()) and then return {std::move(result), is_err}; reference the
server_task_result_ptr variable result and the is_error() call so the check
happens prior to std::move.

In `@src/wllama.test.ts`:
- Around line 183-185: The test accidentally reads the second embedding from the
first response; update the extraction to use res2 instead of res when building
embedding2 so the comparison uses two separate calls. Locate the second call to
wllama.createEmbedding (res2) and change the line that sets embedding2 to read
from res2.data[0].embedding (keeping the cast to number[]), ensuring the dot
product uses embedding and embedding2 as intended.
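
The test finding above comes down to which response each vector is read from. A minimal sketch of the corrected pattern follows, using made-up stand-in responses in place of real `wllama.createEmbedding` calls. The `EmbRes` type and the `dot` and `cosineSimilarity` helpers are hypothetical and written for this illustration; they are not taken from src/wllama.test.ts.

```typescript
// Hypothetical stand-in for the response type; only data[0].embedding is used.
type EmbRes = { data: { embedding: number[] }[] };

// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Cosine similarity: dot product normalised by both vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  return dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b));
}

// Made-up responses standing in for two separate createEmbedding() calls.
const res: EmbRes = { data: [{ embedding: [1, 0] }] };
const res2: EmbRes = { data: [{ embedding: [0, 1] }] };

const embedding = res.data[0].embedding as number[];
// Read from res2, not res; reading res twice compares a vector with itself.
const embedding2 = res2.data[0].embedding as number[];

console.log(cosineSimilarity(embedding, embedding2)); // 0
console.log(cosineSimilarity(embedding, embedding)); // 1 (what the bug would measure)
```

With the original mistake, the similarity check would always report a perfect match and could never fail, whatever the second input was.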

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fe04dd23-1921-45fe-ae66-01ff6bc653b4

📥 Commits

Reviewing files that changed from the base of the PR and between b19148a and cb50257.

⛔ Files ignored due to path filters (1)
  • src/wasm/wllama.wasm is excluded by !**/*.wasm
📒 Files selected for processing (5)
  • cpp/wllama-context.h
  • examples/basic/index.html
  • examples/embeddings/index.html
  • src/wllama.test.ts
  • src/wllama.ts

Comment thread cpp/wllama-context.h
Comment thread src/wllama.test.ts
@ngxson ngxson merged commit e923fba into master May 13, 2026
5 of 6 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug][v3.1.0] Return type of createEmbedding() doesn't meet the actual returns

1 participant