Skip to content

feat[mod]: route MedPsy GGUF models through their embedded chat template#1985

Merged
gianni-cor merged 8 commits into
tetherto:mainfrom
gianni-cor:feat/medpsy-route-embedded-chat-template
May 12, 2026
Merged

feat[mod]: route MedPsy GGUF models through their embedded chat template#1985
gianni-cor merged 8 commits into
tetherto:mainfrom
gianni-cor:feat/medpsy-route-embedded-chat-template

Conversation

@gianni-cor

Copy link
Copy Markdown
Contributor

Summary

MedPsy ships its own Jinja chat template inside the GGUF (it injects the "You are MedPsy..." persona system prompt). Without changes, the llm addon substituted the hardcoded Qwen3 templates whenever general.architecture == qwen3, which dropped MedPsy's persona and broke its identity at runtime.

This PR detects MedPsy via general.basename = MedPsy (case-insensitive) and:

  • ChatTemplateUtils::getChatTemplateForModel returns "" for MedPsy so common_chat_templates_init falls through to the embedded GGUF template instead of substituting the hardcoded Qwen3 templates. The Qwen3 reasoning state and EOS handling in TextLlmContext continue to apply because the architecture is still qwen3.
  • LlamaModel::commonParamsParse auto-enables params.use_jinja when it detects the MedPsy basename, so the embedded Jinja template applies even when callers do not pass tools: 'true'. The auto-enable is gated on !use_jinja, so passing tools: 'true' continues to work.

After the fix, MedPsy self-identifies correctly at runtime (e.g. "I'm MedPsy, a medical and healthcare AI assistant developed by QVAC.").

Bumps @qvac/llm-llamacpp to 0.20.1.

This is a port of 0560272 onto current main (0.20.0). The structural change is identical; the only adaptation was that the getModelName -> readMetadataString refactor part of the source PR was pruned to just adding readMetadataString + getModelBasename, since getModelName was already removed in 0.20.0 (Qwen3 detection became architecture-only).

Test plan

  • cd packages/llm-llamacpp && npm run test:cpp — new gtest cases (IsMedPsyModelWithNullptr, IsMedPsyBasenameNullopt, IsMedPsyBasenameEmpty, IsMedPsyBasenameExactMatch, IsMedPsyBasenameCaseInsensitive, IsMedPsyBasenameRejectsOtherNames) pass.
  • cd packages/llm-llamacpp && npm run test:integration — existing integration suites continue to pass.
  • Smoke load a MedPsy GGUF without tools: 'true'; verify the addon logs "[LlamaModel] MedPsy basename detected; auto-enabling jinja" and "[ChatTemplateUtils] MedPsy basename detected; using embedded chat template", and that the model self-identifies as MedPsy.
  • Load a Qwen3 model and confirm Qwen3 templates are still selected (no regression for non-MedPsy qwen3 architecture models).

Made with Cursor

MedPsy ships its own Jinja chat template inside the GGUF (it injects the
"You are MedPsy..." persona system prompt). Without changes, the llm
addon substituted the hardcoded Qwen3 templates whenever the model
architecture was qwen3, dropping MedPsy's persona and breaking its
identity.

Detect models via `general.basename = MedPsy` (case-insensitive) and:

- ChatTemplateUtils::getChatTemplateForModel returns "" for MedPsy so
  common_chat_templates_init falls through to the embedded GGUF template
  instead of substituting the hardcoded Qwen3 templates.
- LlamaModel::commonParamsParse auto-enables `params.use_jinja` when it
  detects the MedPsy basename, so the embedded Jinja template applies
  even when callers do not pass `tools: 'true'`.

Adds C++ unit tests for the new isMedPsyBasename helper (null, empty,
exact, mixed case, near-misses such as `MedPsy-7B` and `NotMedPsy`).

Bumps @qvac/llm-llamacpp to 0.20.1.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread packages/llm-llamacpp/addon/src/utils/ChatTemplateUtils.cpp Outdated
Comment thread packages/llm-llamacpp/addon/src/utils/ChatTemplateUtils.cpp Outdated
Comment thread packages/llm-llamacpp/addon/src/utils/ChatTemplateUtils.cpp Outdated
- toLower: resize() + transform-from-source instead of copy + in-place
  transform, per reviewer suggestion (avoids the value->lowered char-by-char
  copy and gives the optimizer a cleaner pattern to vectorize).
- isMedPsyBasename: collapse early-return + comparison into a single
  short-circuit return.
- isMedPsyModel: drop the redundant `model == nullptr` guard
  (getModelBasename() -> readMetadataString() already returns nullopt for
  a null model, and isMedPsyBasename(nullopt) returns false). The unit
  test IsMedPsyModelWithNullptr still passes through this path.

Co-authored-by: Cursor <cursoragent@cursor.com>
@jpgaribotti

jpgaribotti commented May 12, 2026

Copy link
Copy Markdown
Contributor

Minor cleanup

Run the clang-format command to clean up the cpp-lint formatting error: git-clang-format --binary clang-format --extensions c,cc,cpp,cxx,h,hh,hpp,hxx adcfe7254fbd476011932cea6bd908ccadf5eb7a -- packages/llm-llamacpp

Prefer std::string_view to const std::string&

1. The new toLower helper

std::string toLower(std::string_view value) { // std::string_view param instead of string ref
  std::string lowered(value.size(), '\0'); // initializes with sized buffer filled with zeros, instead of calling resize later
  std::ranges::transform(value, lowered.begin(), // using ranges avoids begin/end iterator pairs
                         [](unsigned char ch) { return std::tolower(ch); });
  return lowered;
}

2. The new kMedPsyBasenameLower constant (added by the PR)

It's currently constexpr const char*. Switching to string_view lets the comparison in isMedPsyBasename stay alloc-free:

inline constexpr std::string_view MEDPSY_BASENAME_LOWER{"medpsy"}; // also aligns with readability-identifier-naming checks

toLower(basename.value()) == MEDPSY_BASENAME_LOWER compiles unchanged because std::string::operator== overloads with string_view.

3. isMedPsyBasename itself (added by the PR) — optional API change

Currently:

bool isMedPsyBasename(const std::optional<std::string>& basename);

Both std::nullopt and std::string("") already return false, so the optional/empty distinction is purely cosmetic. Collapsing into:

bool isMedPsyBasename(std::string_view basename);

simplifies the call site in LlamaModel::commonParamsParse:

isMedPsyBasename(metadata_.tryGetString("general.basename").value_or(""))

…and reduces the unit-test surface (drop IsMedPsyBasenameNullopt, keep IsMedPsyBasenameEmpty). This is a public-ish helper exposed in the header, so it's an API call — fine to leave as optional<string> if the explicit unset/empty distinction has signalling value for future callers.

Pre-existing in ChatTemplateUtils.cpp (touched file, free wins)

These weren't introduced by the PR but live in the same translation unit and are uncovered as lint failures because the PR touched the file. Switching them now means no separate cleanup PR.

4. normalizeArchitecture — line 19

std::string normalizeArchitecture(const std::string& architecture) {
  std::string normalized = architecture;
  std::transform(
      normalized.begin(),
      normalized.end(),
      normalized.begin(),
      [](unsigned char c) { return std::tolower(c); });
  return normalized;
}

Already makes a copy (normalized = architecture). If you also do change #1 above, this whole function collapses to a single line:

std::string normalizeArchitecture(std::string_view architecture) {
  return toLower(architecture);
}

5. isQwen3Architecture — line 29 & isHarmonyArchitecture — line 34

Trivial parameter swap; both bodies just call normalizeArchitecture and compare against a literal.

bool isQwen3Architecture(std::string_view architecture) {
  return normalizeArchitecture(architecture) == "qwen3";
}

bool isHarmonyArchitecture(std::string_view architecture) {
  return normalizeArchitecture(architecture) == "gpt-oss";
}

The const std::string archStr = ... temporary becomes unnecessary.

Apply review feedback from PR tetherto#1985:

- toLower(): take std::string_view and use std::ranges::transform; init
  the destination via std::string(size, '\0') so the buffer is sized in
  one allocation instead of resize() + transform().
- MEDPSY_BASENAME_LOWER: switch from `constexpr const char*` to
  `inline constexpr std::string_view` so the equality comparison stays
  alloc-free and the identifier matches the
  readability-identifier-naming GlobalConstantCase=UPPER_CASE convention.
- isMedPsyBasename(): collapse `const std::optional<std::string>&` to
  `std::string_view`. Both std::nullopt and the empty string already
  returned false, so the optional/empty distinction was purely cosmetic.
  Update the LlamaModel.cpp call site to pass `value_or("")`.
- normalizeArchitecture / isQwen3Architecture / isHarmonyArchitecture:
  pre-existing free wins in the same TU — switch to string_view, drop
  the temporary std::string copy in normalizeArchitecture, and make
  getModelArchitecture() pass the raw `arch` buffer through without an
  intermediate std::string allocation.
- getModelArchitecture(): tighten the bounds check to
  static_cast<size_t>(len) < sizeof(arch) to mirror readMetadataString
  and silence the implicit-narrowing warning surfaced by clang-tidy 22.
- Tests: drop IsMedPsyBasenameNullopt (subsumed by IsMedPsyBasenameEmpty)
  and rewrite the remaining isMedPsyBasename cases against string-view
  literals.

Also re-run git-clang-format on the diff range to clear the
cpp-lint format step.

Co-authored-by: Cursor <cursoragent@cursor.com>
@gianni-cor

Copy link
Copy Markdown
Contributor Author

Thanks for the careful pass — pushed 2e65be42f addressing all of your suggestions:

  1. git-clang-format — re-ran on the diff range (adcfe7254...HEAD) and the format step now reports clang-format did not modify any files.
  2. toLowerstring_view + std::ranges::transform — applied verbatim, including the std::string lowered(value.size(), '\0') constructor so the buffer is sized in a single allocation instead of the previous resize() + std::transform.
  3. MEDPSY_BASENAME_LOWER — switched from constexpr const char* to inline constexpr std::string_view. Bonus: also satisfies the readability-identifier-naming.GlobalConstantCase = UPPER_CASE rule from .clang-tidy.
  4. isMedPsyBasename(std::string_view) — collapsed the optional/empty distinction (both already returned false). The LlamaModel.cpp call site now uses metadata_.tryGetString("general.basename").value_or(""), and the unit-test surface dropped IsMedPsyBasenameNullopt and rewrote the rest against string-view literals.
  5. Pre-existing free wins in the same TU — applied to normalizeArchitecture, isQwen3Architecture, and isHarmonyArchitecture (all now string_view and one-liners), plus getModelArchitecture no longer wraps the local arch buffer in a temporary std::string. Tightened the bounds check there to static_cast<size_t>(len) < sizeof(arch) to mirror readMetadataString and silence the implicit-narrowing warning surfaced by clang-tidy 22.

The pre-existing clang-tidy errors that landed in the same job (cppcoreguidelines-avoid-do-while on QLOG_IF expansions, modernize-concat-nested-namespaces across half a dozen utils/*.hpp headers, the UTF8TokenBuffer / ScopeGuard / SharedSnapshot / LlamaFinetuningHelpers / CacheManager findings, etc.) are unrelated to this PR — they were dormant until clang-tidy 22 + --warnings-as-errors='*' started running PR-side, and --header-filter='.*' pulls them in via the touched TUs. Happy to follow up with a separate cleanup PR for those if you'd like, but I left them out of #1985 to keep the diff tight.

@gianni-cor

Copy link
Copy Markdown
Contributor Author

/review

@gianni-cor

Copy link
Copy Markdown
Contributor Author

/review

1 similar comment
@gianni-cor

Copy link
Copy Markdown
Contributor Author

/review

@gianni-cor

Copy link
Copy Markdown
Contributor Author

/review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants