feat[mod]: route MedPsy GGUF models through their embedded chat template #1985
MedPsy ships its own Jinja chat template inside the GGUF (it injects the "You are MedPsy..." persona system prompt). Without this change, the llm addon substituted the hardcoded Qwen3 templates whenever the model architecture was qwen3, dropping MedPsy's persona and breaking its identity.

Detect MedPsy models via `general.basename = MedPsy` (case-insensitive) and:

- `ChatTemplateUtils::getChatTemplateForModel` returns `""` for MedPsy, so `common_chat_templates_init` falls through to the embedded GGUF template instead of substituting the hardcoded Qwen3 templates.
- `LlamaModel::commonParamsParse` auto-enables `params.use_jinja` when it detects the MedPsy basename, so the embedded Jinja template applies even when callers do not pass `tools: 'true'`.

Adds C++ unit tests for the new `isMedPsyBasename` helper (null, empty, exact, mixed case, and near-misses such as `MedPsy-7B` and `NotMedPsy`).

Bumps `@qvac/llm-llamacpp` to 0.20.1.

Co-authored-by: Cursor <cursoragent@cursor.com>
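A minimal sketch of the routing rule described above, under stated assumptions: the real `ChatTemplateUtils::getChatTemplateForModel` reads metadata from the loaded model, whereas this version takes the architecture and basename directly, and `kQwen3TemplatePlaceholder` is a hypothetical stand-in for the hardcoded Qwen3 template text. An empty return value signals "no override", which is what lets `common_chat_templates_init` fall back to the template embedded in the GGUF.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical placeholder for the hardcoded Qwen3 template; the real
// template text lives in ChatTemplateUtils and is not reproduced here.
const std::string kQwen3TemplatePlaceholder = "<hardcoded qwen3 template>";

// Case-insensitive exact match on general.basename: "MedPsy", "medpsy" and
// "MEDPSY" match; "MedPsy-7B" and "NotMedPsy" do not.
bool isMedPsyBasename(const std::optional<std::string>& basename) {
  if (!basename) {
    return false;  // key missing from the GGUF metadata
  }
  std::string lowered = *basename;  // copy, then lowercase in place
  std::transform(lowered.begin(), lowered.end(), lowered.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return lowered == "medpsy";
}

// Hypothetical simplified signature. Returning "" means "no override", so the
// caller falls back to the chat template embedded in the GGUF.
std::string getChatTemplateForModel(const std::string& architecture,
                                    const std::optional<std::string>& basename) {
  if (isMedPsyBasename(basename)) {
    return "";  // checked first: MedPsy also reports architecture "qwen3"
  }
  if (architecture == "qwen3") {
    return kQwen3TemplatePlaceholder;
  }
  return "";
}
```

The order of the checks is the point of the change: MedPsy reports `qwen3` as its architecture, so an architecture-only check would still pick the hardcoded template.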
- `toLower`: use `resize()` + transform-from-source instead of copy + in-place transform, per reviewer suggestion (avoids the char-by-char copy from `value` into `lowered` and gives the optimizer a cleaner pattern to vectorize).
- `isMedPsyBasename`: collapse the early return and the comparison into a single short-circuit return.
- `isMedPsyModel`: drop the redundant `model == nullptr` guard (`getModelBasename()` -> `readMetadataString()` already returns `nullopt` for a null model, and `isMedPsyBasename(nullopt)` returns `false`). The unit test `IsMedPsyModelWithNullptr` still passes through this path.

Co-authored-by: Cursor <cursoragent@cursor.com>
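A sketch of the three cleanups, assuming the helpers keep the `std::optional<std::string>` signature this commit still uses. `FakeModel` and this `getModelBasename` are hypothetical stand-ins: the real code reads `general.basename` through `readMetadataString()`, which the commit says already returns `nullopt` for a null model.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Size the destination once, then transform straight from the source
// (instead of copying first and lowercasing the copy in place).
std::string toLower(const std::string& value) {
  std::string lowered;
  lowered.resize(value.size());
  std::transform(value.begin(), value.end(), lowered.begin(),
                 [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
  return lowered;
}

// Single short-circuit return: nullopt never reaches toLower().
bool isMedPsyBasename(const std::optional<std::string>& basename) {
  return basename.has_value() && toLower(*basename) == "medpsy";
}

// Hypothetical stand-in for a loaded model and its GGUF metadata.
struct FakeModel {
  std::optional<std::string> basename;
};

// Mirrors the described contract: a null model yields std::nullopt.
std::optional<std::string> getModelBasename(const FakeModel* model) {
  if (model == nullptr) {
    return std::nullopt;
  }
  return model->basename;
}

// No explicit nullptr guard: getModelBasename(nullptr) returns nullopt,
// and isMedPsyBasename(nullopt) returns false.
bool isMedPsyModel(const FakeModel* model) {
  return isMedPsyBasename(getModelBasename(model));
}
```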
Minor cleanup. Run the clang-format command to clean up the cpp-lint formatting error:
Prefer
Apply review feedback from PR tetherto#1985:

- `toLower()`: take `std::string_view` and use `std::ranges::transform`; initialize the destination via `std::string(size, '\0')` so the buffer is sized in one allocation instead of `resize()` + `transform()`.
- `MEDPSY_BASENAME_LOWER`: switch from `constexpr const char*` to `inline constexpr std::string_view` so the equality comparison stays allocation-free and the identifier matches the readability-identifier-naming `GlobalConstantCase=UPPER_CASE` convention.
- `isMedPsyBasename()`: collapse `const std::optional<std::string>&` to `std::string_view`. Both `std::nullopt` and the empty string already returned `false`, so the optional/empty distinction was purely cosmetic. Update the `LlamaModel.cpp` call site to pass `value_or("")`.
- `normalizeArchitecture` / `isQwen3Architecture` / `isHarmonyArchitecture` (pre-existing free wins in the same TU): switch to `string_view`, drop the temporary `std::string` copy in `normalizeArchitecture`, and make `getModelArchitecture()` pass the raw `arch` buffer through without an intermediate `std::string` allocation.
- `getModelArchitecture()`: tighten the bounds check to `static_cast<size_t>(len) < sizeof(arch)` to mirror `readMetadataString` and silence the implicit-narrowing warning surfaced by clang-tidy 22.
- Tests: drop `IsMedPsyBasenameNullopt` (subsumed by `IsMedPsyBasenameEmpty`) and rewrite the remaining `isMedPsyBasename` cases against string-view literals.

Also re-run git-clang-format on the diff range to clear the cpp-lint format step.

Co-authored-by: Cursor <cursoragent@cursor.com>
Thanks for the careful pass; pushed
The pre-existing clang-tidy errors that landed in the same job (
/review
Summary
MedPsy ships its own Jinja chat template inside the GGUF (it injects the "You are MedPsy..." persona system prompt). Without this change, the llm addon substituted the hardcoded Qwen3 templates whenever `general.architecture == qwen3`, which dropped MedPsy's persona and broke its identity at runtime.

This PR detects MedPsy via `general.basename = MedPsy` (case-insensitive) and:

- `ChatTemplateUtils::getChatTemplateForModel` returns `""` for MedPsy, so `common_chat_templates_init` falls through to the embedded GGUF template instead of substituting the hardcoded Qwen3 templates. The Qwen3 reasoning state and EOS handling in `TextLlmContext` continue to apply because the architecture is still `qwen3`.
- `LlamaModel::commonParamsParse` auto-enables `params.use_jinja` when it detects the MedPsy basename, so the embedded Jinja template applies even when callers do not pass `tools: 'true'`. The auto-enable is gated on `!use_jinja`, so passing `tools: 'true'` continues to work.

After the fix, MedPsy self-identifies correctly at runtime (e.g. "I'm MedPsy, a medical and healthcare AI assistant developed by QVAC.").

Bumps `@qvac/llm-llamacpp` to 0.20.1.

This is a port of 0560272 onto current `main` (0.20.0). The structural change is identical; the only adaptation is that the part of the source PR that refactored `getModelName` -> `readMetadataString` was pruned to just adding `readMetadataString` + `getModelBasename`, since `getModelName` was already removed in 0.20.0 (Qwen3 detection became architecture-only).

Test plan
- `cd packages/llm-llamacpp && npm run test:cpp`: the new gtest cases (`IsMedPsyModelWithNullptr`, `IsMedPsyBasenameNullopt`, `IsMedPsyBasenameEmpty`, `IsMedPsyBasenameExactMatch`, `IsMedPsyBasenameCaseInsensitive`, `IsMedPsyBasenameRejectsOtherNames`) pass.
- `cd packages/llm-llamacpp && npm run test:integration`: existing integration suites continue to pass.
- `tools: 'true'`; verify that the addon logs `"[LlamaModel] MedPsy basename detected; auto-enabling jinja"` and `"[ChatTemplateUtils] MedPsy basename detected; using embedded chat template"`, and that the model self-identifies as MedPsy.
- `qwen3` architecture models).

Made with Cursor
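The manual log check in the test plan depends on the `!use_jinja` gate described in the summary. A minimal sketch of that gate, assuming a hypothetical `CommonParamsSketch` struct in place of the real params object and `std::clog` in place of the addon's logger; only the log text is taken from the test plan. This sketch logs only when it actually flips the flag, which the PR does not state explicitly.

```cpp
#include <cassert>
#include <iostream>

// Hypothetical subset of the params object: only the flag discussed here.
struct CommonParamsSketch {
  bool use_jinja = false;
};

// Returns true only when this call turned use_jinja on.
bool autoEnableJinjaForMedPsy(CommonParamsSketch& params, bool isMedPsy) {
  // Gate on !use_jinja: callers that already enabled Jinja (for example via
  // tools: 'true') are left untouched.
  if (!isMedPsy || params.use_jinja) {
    return false;
  }
  params.use_jinja = true;
  std::clog << "[LlamaModel] MedPsy basename detected; auto-enabling jinja\n";
  return true;
}
```

Because the gate only ever sets the flag to `true`, an explicit `tools: 'true'` and the MedPsy auto-enable end in the same state, which is why the two paths do not conflict.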