feat: add multi-GPU pipeline parallelism via split-mode config to embed addon by donriddo · Pull Request #1833 · tetherto/qvac

donriddo · 2026-04-30T12:59:45Z

🎯 What problem does this PR solve?

The embed addon currently pins model inference to a single GPU device. On multi-GPU systems this leaves additional GPUs idle, preventing users from leveraging pipeline (layer) or tensor (row) parallelism for embedding workloads.

📝 How does it solve it?

Adds split-mode config option (none | layer | row) to BertModel setupParams, accepting both hyphen and underscore variants and case-insensitive values. When set to layer or row, the --device flag is omitted so llama.cpp distributes embedding layers across all available GPUs.
Adds tensor-split config key (comma-separated proportions) for controlling per-GPU layer allocation in split mode.
Updates tryMainGpuFromMap in BackendSelection to accept both main-gpu and main_gpu variants with conflict detection, matching the LLM addon.
Adds getEffectiveGpuDeviceCount() to BackendSelection for GPU inventory queries.
Adds BertModel::getCommonParams() accessor to expose parsed params for testing.
Adds split-mode and tensor-split fields to GGMLConfig in index.d.ts.
Bumps version to 0.15.0.

🧪 How was it tested?

C++ unit tests (test_bert_model.cpp): 8 new split-mode cases covering none, layer, row, case-insensitive input, underscore variant, CPU fallback clears GPU params, invalid value rejection, and both keys reject.
C++ unit tests (test_backend_selection.cpp): 9 new getEffectiveGpuDeviceCount cases covering zero devices, CPU-only, single dGPU, single iGPU, two dGPUs, dGPU+iGPU, two dGPUs+iGPU, two iGPUs, and accel/CPU ignored.
JS integration tests (multi-gpu.test.js): 4 tests gated on QVAC_HAS_MULTI_GPU=1 exercising split-mode=layer, split-mode=row, default single-device, and split-mode=layer with tensor-split and main-gpu.

🔌 API Changes

// GGMLBertArgs.config now accepts:
{
  'split-mode': 'none' | 'layer' | 'row',    // optional, default none
  'tensor-split': '1,1',                     // optional, proportional split across GPUs
  'main-gpu': 0 | 'integrated' | 'dedicated' // optional, primary GPU selection
}

CPU fallback in setupParams was missing two details present in the final LLM implementation: - Set params.main_gpu = -1 on CPU fallback so llama.cpp does not retain a stale GPU index. - Reset the local splitMode variable to LLAMA_SPLIT_MODE_NONE after the CPU-fallback warning so the --device gate below emits --device correctly instead of silently suppressing it when the requested split mode was layer or row. Also add two missing BackendSelection unit tests for the main_gpu underscore alias and both-key rejection introduced in tryMainGpuFromMap, mirroring the coverage in the LLM package.

gianni-cor

Thanks for porting this over from the LLM addon. The core C++ changes look aligned with the upstream LLM implementation, but one important test-wiring difference needs fixing before merge.

packages/qvac-lib-infer-llamacpp-embed/test/integration/multi-gpu.test.js is added, but the embed package's test:integration script still runs only bare test/integration/addon.test.js. The embed PR workflow calls that script directly, so the new multi-GPU integration coverage will not run in desktop CI. In the LLM addon upstream change, test:integration generates/runs the full test/integration/*.test.js set, which is why its new multi-gpu.test.js is exercised.

Please update the embed integration test command/generation path so multi-gpu.test.js is included in CI, or otherwise wire this file into the existing integration test runner.

gianni-cor · 2026-05-01T06:28:38Z

Please also fix the failing/blocked cpp-lint check before this is re-reviewed.

github-actions · 2026-05-01T06:29:06Z

Tier-based Approval Status

**PR Tier:** TIER1

**Current Status:** ✅ APPROVED

**Requirements:**
- 1 Team Member approval ✅ (2/1)
- 1 Team Lead OR Management approval ✅ (2/1)

**Bypass rule:** Triggered (2+ Team Lead approvals (Tier 1 exception)). This PR is approved regardless of tier.

---
*This comment is automatically updated when reviews change.*

test:integration was hardcoded to addon.test.js, so multi-gpu.test.js and multi-instance.test.js were never executed in desktop CI. Switch to the same generate-then-run-all pattern used by the LLM addon: brittle -r generates test/integration/all.js from the full *.test.js glob, then bare runs it.

Apply clang-format and clang-tidy fixes flagged by the cpp-lint job: - Use std::ranges::transform in BackendSelection.cpp and BertModel.cpp - Drop else-after-return in parseMainGpu - Rename short iterator names (it -> foundIt/configIt/splitModeIt) - Use designated initializers for BackendInterface and BertEmbeddings::Layout - Drop redundant (void) on BackendInterface function pointer - Move pointer-arithmetic NOLINT to the diagnostic line in batchDecode - Extract parseSplitMode helper to bring setupParams cognitive complexity back under the threshold - Suppress non-const-global and macro-usage diagnostics in logging.hpp - Reorder includes in test_bert_model.cpp and collapse getCommonParams to a single line for clang-format

gianni-cor · 2026-05-02T08:23:45Z

/review

gianni-cor · 2026-05-02T14:02:16Z

/review

donriddo requested review from a team as code owners April 30, 2026 12:59

donriddo had a problem deploying to release April 30, 2026 13:00 — with GitHub Actions Failure

donriddo had a problem deploying to release April 30, 2026 13:01 — with GitHub Actions Failure

donriddo had a problem deploying to release April 30, 2026 13:13 — with GitHub Actions Failure

donriddo had a problem deploying to release April 30, 2026 13:14 — with GitHub Actions Failure

donriddo had a problem deploying to release April 30, 2026 13:15 — with GitHub Actions Failure

donriddo had a problem deploying to release April 30, 2026 13:21 — with GitHub Actions Failure

donriddo had a problem deploying to release April 30, 2026 13:35 — with GitHub Actions Failure

donriddo had a problem deploying to release April 30, 2026 13:38 — with GitHub Actions Failure

Merge branch 'main' into feat/embed-multi-gpu-pipeline-parallelism

b837901

gianni-cor requested changes May 1, 2026

View reviewed changes

donriddo added 5 commits May 1, 2026 12:06

Merge branch 'main' into feat/embed-multi-gpu-pipeline-parallelism

d983ea8

Merge branch 'main' into feat/embed-multi-gpu-pipeline-parallelism

58aea90

Merge branch 'main' into feat/embed-multi-gpu-pipeline-parallelism

6131b75

gianni-cor approved these changes May 2, 2026

View reviewed changes

moromisato approved these changes May 2, 2026

View reviewed changes

dev-nid approved these changes May 2, 2026

View reviewed changes

aegioscy approved these changes May 2, 2026

View reviewed changes

Merge branch 'main' into feat/embed-multi-gpu-pipeline-parallelism

de0a7f7

gianni-cor mentioned this pull request May 4, 2026

fix: improve mobile download robustness and prevent unhandled rejection crashes (embed) #1817

Merged

donriddo mentioned this pull request May 4, 2026

QVAC-17487 feat[api]: expose split-mode and tensor-split in SDK LLM config #1759

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add multi-GPU pipeline parallelism via split-mode config to embed addon#1833

feat: add multi-GPU pipeline parallelism via split-mode config to embed addon#1833
gianni-cor merged 10 commits into
tetherto:mainfrom
donriddo:feat/embed-multi-gpu-pipeline-parallelism

donriddo commented Apr 30, 2026 •

edited

Loading

Uh oh!

gianni-cor left a comment

Uh oh!

gianni-cor commented May 1, 2026

Uh oh!

github-actions Bot commented May 1, 2026 •

edited

Loading

Uh oh!

gianni-cor commented May 2, 2026

Uh oh!

gianni-cor commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

donriddo commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🎯 What problem does this PR solve?

📝 How does it solve it?

🧪 How was it tested?

🔌 API Changes

Uh oh!

gianni-cor left a comment

Choose a reason for hiding this comment

Uh oh!

gianni-cor commented May 1, 2026

Uh oh!

github-actions Bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Tier-based Approval Status

Uh oh!

gianni-cor commented May 2, 2026

Uh oh!

gianni-cor commented May 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

donriddo commented Apr 30, 2026 •

edited

Loading

github-actions Bot commented May 1, 2026 •

edited

Loading