feat[mod]: route MedPsy GGUF models through their embedded chat template by gianni-cor · Pull Request #1985 · tetherto/qvac

gianni-cor · 2026-05-11T16:44:57Z

Summary

MedPsy ships its own Jinja chat template inside the GGUF (it injects the "You are MedPsy..." persona system prompt). Without changes, the llm addon substituted the hardcoded Qwen3 templates whenever general.architecture == qwen3, which dropped MedPsy's persona and broke its identity at runtime.

This PR detects MedPsy via general.basename = MedPsy (case-insensitive) and:

ChatTemplateUtils::getChatTemplateForModel returns "" for MedPsy so common_chat_templates_init falls through to the embedded GGUF template instead of substituting the hardcoded Qwen3 templates. The Qwen3 reasoning state and EOS handling in TextLlmContext continue to apply because the architecture is still qwen3.
LlamaModel::commonParamsParse auto-enables params.use_jinja when it detects the MedPsy basename, so the embedded Jinja template applies even when callers do not pass tools: 'true'. The auto-enable is gated on !use_jinja, so passing tools: 'true' continues to work.

After the fix, MedPsy self-identifies correctly at runtime (e.g. "I'm MedPsy, a medical and healthcare AI assistant developed by QVAC.").

Bumps @qvac/llm-llamacpp to 0.20.1.

This is a port of 0560272 onto current main (0.20.0). The structural change is identical; the only adaptation was that the getModelName -> readMetadataString refactor part of the source PR was pruned to just adding readMetadataString + getModelBasename, since getModelName was already removed in 0.20.0 (Qwen3 detection became architecture-only).

Test plan

cd packages/llm-llamacpp && npm run test:cpp — new gtest cases (IsMedPsyModelWithNullptr, IsMedPsyBasenameNullopt, IsMedPsyBasenameEmpty, IsMedPsyBasenameExactMatch, IsMedPsyBasenameCaseInsensitive, IsMedPsyBasenameRejectsOtherNames) pass.
cd packages/llm-llamacpp && npm run test:integration — existing integration suites continue to pass.
Smoke load a MedPsy GGUF without tools: 'true'; verify the addon logs "[LlamaModel] MedPsy basename detected; auto-enabling jinja" and "[ChatTemplateUtils] MedPsy basename detected; using embedded chat template", and that the model self-identifies as MedPsy.
Load a Qwen3 model and confirm Qwen3 templates are still selected (no regression for non-MedPsy qwen3 architecture models).

Made with Cursor

MedPsy ships its own Jinja chat template inside the GGUF (it injects the "You are MedPsy..." persona system prompt). Without changes, the llm addon substituted the hardcoded Qwen3 templates whenever the model architecture was qwen3, dropping MedPsy's persona and breaking its identity. Detect models via `general.basename = MedPsy` (case-insensitive) and: - ChatTemplateUtils::getChatTemplateForModel returns "" for MedPsy so common_chat_templates_init falls through to the embedded GGUF template instead of substituting the hardcoded Qwen3 templates. - LlamaModel::commonParamsParse auto-enables `params.use_jinja` when it detects the MedPsy basename, so the embedded Jinja template applies even when callers do not pass `tools: 'true'`. Adds C++ unit tests for the new isMedPsyBasename helper (null, empty, exact, mixed case, near-misses such as `MedPsy-7B` and `NotMedPsy`). Bumps @qvac/llm-llamacpp to 0.20.1. Co-authored-by: Cursor <cursoragent@cursor.com>

- toLower: resize() + transform-from-source instead of copy + in-place transform, per reviewer suggestion (avoids the value->lowered char-by-char copy and gives the optimizer a cleaner pattern to vectorize). - isMedPsyBasename: collapse early-return + comparison into a single short-circuit return. - isMedPsyModel: drop the redundant `model == nullptr` guard (getModelBasename() -> readMetadataString() already returns nullopt for a null model, and isMedPsyBasename(nullopt) returns false). The unit test IsMedPsyModelWithNullptr still passes through this path. Co-authored-by: Cursor <cursoragent@cursor.com>

jpgaribotti · 2026-05-12T08:50:11Z

Minor cleanup

Run the clang-format command to clean up the cpp-lint formatting error: git-clang-format --binary clang-format --extensions c,cc,cpp,cxx,h,hh,hpp,hxx adcfe7254fbd476011932cea6bd908ccadf5eb7a -- packages/llm-llamacpp

Prefer `std::string_view` to `const std::string&`

1. The new `toLower` helper

std::string toLower(std::string_view value) { // std::string_view param instead of string ref
  std::string lowered(value.size(), '\0'); // initializes with sized buffer filled with zeros, instead of calling resize later
  std::ranges::transform(value, lowered.begin(), // using ranges avoids begin/end iterator pairs
                         [](unsigned char ch) { return std::tolower(ch); });
  return lowered;
}

2. The new `kMedPsyBasenameLower` constant (added by the PR)

It's currently constexpr const char*. Switching to string_view lets the comparison in isMedPsyBasename stay alloc-free:

inline constexpr std::string_view MEDPSY_BASENAME_LOWER{"medpsy"}; // also aligns with readability-identifier-naming checks

toLower(basename.value()) == MEDPSY_BASENAME_LOWER compiles unchanged because std::string::operator== overloads with string_view.

3. `isMedPsyBasename` itself (added by the PR) — optional API change

Currently:

bool isMedPsyBasename(const std::optional<std::string>& basename);

Both std::nullopt and std::string("") already return false, so the optional/empty distinction is purely cosmetic. Collapsing into:

bool isMedPsyBasename(std::string_view basename);

simplifies the call site in LlamaModel::commonParamsParse:

isMedPsyBasename(metadata_.tryGetString("general.basename").value_or(""))

…and reduces the unit-test surface (drop IsMedPsyBasenameNullopt, keep IsMedPsyBasenameEmpty). This is a public-ish helper exposed in the header, so it's an API call — fine to leave as optional<string> if the explicit unset/empty distinction has signalling value for future callers.

Pre-existing in `ChatTemplateUtils.cpp` (touched file, free wins)

These weren't introduced by the PR but live in the same translation unit and are uncovered as lint failures because the PR touched the file. Switching them now means no separate cleanup PR.

4. `normalizeArchitecture` — line 19

std::string normalizeArchitecture(const std::string& architecture) {
  std::string normalized = architecture;
  std::transform(
      normalized.begin(),
      normalized.end(),
      normalized.begin(),
      [](unsigned char c) { return std::tolower(c); });
  return normalized;
}

Already makes a copy (normalized = architecture). If you also do change #1 above, this whole function collapses to a single line:

std::string normalizeArchitecture(std::string_view architecture) {
  return toLower(architecture);
}

5. `isQwen3Architecture` — line 29 & `isHarmonyArchitecture` — line 34

Trivial parameter swap; both bodies just call normalizeArchitecture and compare against a literal.

bool isQwen3Architecture(std::string_view architecture) {
  return normalizeArchitecture(architecture) == "qwen3";
}

bool isHarmonyArchitecture(std::string_view architecture) {
  return normalizeArchitecture(architecture) == "gpt-oss";
}

The const std::string archStr = ... temporary becomes unnecessary.

Apply review feedback from PR tetherto#1985: - toLower(): take std::string_view and use std::ranges::transform; init the destination via std::string(size, '\0') so the buffer is sized in one allocation instead of resize() + transform(). - MEDPSY_BASENAME_LOWER: switch from `constexpr const char*` to `inline constexpr std::string_view` so the equality comparison stays alloc-free and the identifier matches the readability-identifier-naming GlobalConstantCase=UPPER_CASE convention. - isMedPsyBasename(): collapse `const std::optional<std::string>&` to `std::string_view`. Both std::nullopt and the empty string already returned false, so the optional/empty distinction was purely cosmetic. Update the LlamaModel.cpp call site to pass `value_or("")`. - normalizeArchitecture / isQwen3Architecture / isHarmonyArchitecture: pre-existing free wins in the same TU — switch to string_view, drop the temporary std::string copy in normalizeArchitecture, and make getModelArchitecture() pass the raw `arch` buffer through without an intermediate std::string allocation. - getModelArchitecture(): tighten the bounds check to static_cast<size_t>(len) < sizeof(arch) to mirror readMetadataString and silence the implicit-narrowing warning surfaced by clang-tidy 22. - Tests: drop IsMedPsyBasenameNullopt (subsumed by IsMedPsyBasenameEmpty) and rewrite the remaining isMedPsyBasename cases against string-view literals. Also re-run git-clang-format on the diff range to clear the cpp-lint format step. Co-authored-by: Cursor <cursoragent@cursor.com>

gianni-cor · 2026-05-12T09:04:54Z

Thanks for the careful pass — pushed 2e65be42f addressing all of your suggestions:

git-clang-format — re-ran on the diff range (adcfe7254...HEAD) and the format step now reports clang-format did not modify any files.
toLower → string_view + std::ranges::transform — applied verbatim, including the std::string lowered(value.size(), '\0') constructor so the buffer is sized in a single allocation instead of the previous resize() + std::transform.
MEDPSY_BASENAME_LOWER — switched from constexpr const char* to inline constexpr std::string_view. Bonus: also satisfies the readability-identifier-naming.GlobalConstantCase = UPPER_CASE rule from .clang-tidy.
isMedPsyBasename(std::string_view) — collapsed the optional/empty distinction (both already returned false). The LlamaModel.cpp call site now uses metadata_.tryGetString("general.basename").value_or(""), and the unit-test surface dropped IsMedPsyBasenameNullopt and rewrote the rest against string-view literals.
Pre-existing free wins in the same TU — applied to normalizeArchitecture, isQwen3Architecture, and isHarmonyArchitecture (all now string_view and one-liners), plus getModelArchitecture no longer wraps the local arch buffer in a temporary std::string. Tightened the bounds check there to static_cast<size_t>(len) < sizeof(arch) to mirror readMetadataString and silence the implicit-narrowing warning surfaced by clang-tidy 22.

The pre-existing clang-tidy errors that landed in the same job (cppcoreguidelines-avoid-do-while on QLOG_IF expansions, modernize-concat-nested-namespaces across half a dozen utils/*.hpp headers, the UTF8TokenBuffer / ScopeGuard / SharedSnapshot / LlamaFinetuningHelpers / CacheManager findings, etc.) are unrelated to this PR — they were dormant until clang-tidy 22 + --warnings-as-errors='*' started running PR-side, and --header-filter='.*' pulls them in via the touched TUs. Happy to follow up with a separate cleanup PR for those if you'd like, but I left them out of #1985 to keep the diff tight.

gianni-cor · 2026-05-12T09:47:43Z

/review

gianni-cor · 2026-05-12T13:11:59Z

/review

gianni-cor · 2026-05-12T13:32:34Z

/review

gianni-cor · 2026-05-12T13:58:07Z

/review

gianni-cor requested review from a team as code owners May 11, 2026 16:44

gianni-cor had a problem deploying to release May 11, 2026 16:45 — with GitHub Actions Failure

gianni-cor had a problem deploying to release May 11, 2026 16:49 — with GitHub Actions Failure

jesusmb1995 requested changes May 11, 2026

View reviewed changes

Comment thread packages/llm-llamacpp/addon/src/utils/ChatTemplateUtils.cpp Outdated

jesusmb1995 requested changes May 11, 2026

View reviewed changes

Comment thread packages/llm-llamacpp/addon/src/utils/ChatTemplateUtils.cpp Outdated

jesusmb1995 requested changes May 11, 2026

View reviewed changes

Comment thread packages/llm-llamacpp/addon/src/utils/ChatTemplateUtils.cpp Outdated

gianni-cor had a problem deploying to release May 11, 2026 17:41 — with GitHub Actions Failure

gianni-cor had a problem deploying to release May 11, 2026 17:43 — with GitHub Actions Failure

gianni-cor temporarily deployed to release May 11, 2026 19:00 — with GitHub Actions Inactive

gianni-cor temporarily deployed to release May 11, 2026 20:14 — with GitHub Actions Inactive

gianni-cor had a problem deploying to release May 11, 2026 20:14 — with GitHub Actions Failure

gianni-cor had a problem deploying to release May 11, 2026 20:14 — with GitHub Actions Error

gianni-cor temporarily deployed to release May 11, 2026 20:14 — with GitHub Actions Inactive

gianni-cor had a problem deploying to release May 11, 2026 20:14 — with GitHub Actions Failure

gianni-cor had a problem deploying to release May 12, 2026 07:33 — with GitHub Actions Failure

gianni-cor had a problem deploying to release May 12, 2026 07:33 — with GitHub Actions Error

gianni-cor temporarily deployed to release May 12, 2026 07:33 — with GitHub Actions Inactive

gianni-cor had a problem deploying to release May 12, 2026 07:33 — with GitHub Actions Failure

gianni-cor had a problem deploying to release May 12, 2026 07:33 — with GitHub Actions Error

jpgaribotti approved these changes May 12, 2026

View reviewed changes

jesusmb1995 approved these changes May 12, 2026

View reviewed changes

Merge branch 'main' into feat/medpsy-route-embedded-chat-template

3587285

Merge branch 'main' into feat/medpsy-route-embedded-chat-template

672387a

Merge branch 'main' into feat/medpsy-route-embedded-chat-template

dcc89df

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat[mod]: route MedPsy GGUF models through their embedded chat template#1985

feat[mod]: route MedPsy GGUF models through their embedded chat template#1985
gianni-cor merged 8 commits into
tetherto:mainfrom
gianni-cor:feat/medpsy-route-embedded-chat-template

gianni-cor commented May 11, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jpgaribotti commented May 12, 2026 •

edited

Loading

Uh oh!

gianni-cor commented May 12, 2026

Uh oh!

gianni-cor commented May 12, 2026

Uh oh!

gianni-cor commented May 12, 2026

Uh oh!

gianni-cor commented May 12, 2026

Uh oh!

gianni-cor commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

gianni-cor commented May 11, 2026

Summary

Test plan

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jpgaribotti commented May 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Minor cleanup

Prefer std::string_view to const std::string&

1. The new toLower helper

2. The new kMedPsyBasenameLower constant (added by the PR)

3. isMedPsyBasename itself (added by the PR) — optional API change

Pre-existing in ChatTemplateUtils.cpp (touched file, free wins)

4. normalizeArchitecture — line 19

5. isQwen3Architecture — line 29 & isHarmonyArchitecture — line 34

Uh oh!

gianni-cor commented May 12, 2026

Uh oh!

gianni-cor commented May 12, 2026

Uh oh!

gianni-cor commented May 12, 2026

Uh oh!

gianni-cor commented May 12, 2026

Uh oh!

gianni-cor commented May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jpgaribotti commented May 12, 2026 •

edited

Loading

Prefer `std::string_view` to `const std::string&`

1. The new `toLower` helper

2. The new `kMedPsyBasenameLower` constant (added by the PR)

3. `isMedPsyBasename` itself (added by the PR) — optional API change

Pre-existing in `ChatTemplateUtils.cpp` (touched file, free wins)

4. `normalizeArchitecture` — line 19

5. `isQwen3Architecture` — line 29 & `isHarmonyArchitecture` — line 34