ci: unblock Studio Windows + Linux + Mac smoke (supersedes #5733, #5734, #5738) by danielhanchen · Pull Request #5741 · unslothai/unsloth

danielhanchen · 2026-05-23T13:20:36Z

Bundles three independent CI regressions hitting every maintainer PR in the recent backlog. Each fix is verified end-to-end on a staging fork (danielhanchen/unsloth-staging-2 PR #141) against real Ubuntu / macOS / Windows GitHub-hosted runners before this lands.

Findings on the last 20 PRs

Across PRs #5712–#5739, the same Windows Studio CI jobs fail with the same signature:

SystemError: The installed pydantic-core version (2.47.0) is incompatible with
the current pydantic version, which requires 2.46.4.

The five always-failing Windows checks are:

Windows Studio API CI / Studio API & Auth Tests
Windows Studio GGUF CI / OpenAI, Anthropic API tests
Windows Studio GGUF CI / Tool calling Tests
Windows Studio GGUF CI / JSON, images
Windows Studio UI CI / Chat UI Tests

PR #5733 already targets exactly this with a deterministic-install fix; PR #5734 then removes the "Patch Studio venv with full typer / pydantic dep trees" workaround steps from the four Windows smoke workflows. Neither has merged yet, so every newer PR carries the same failure. This PR lands both together, plus fixes for two additional regressions that #5733 + #5734 do not cover.

Bundled fixes

1. Windows --no-torch install: pydantic-core mismatch (supersedes #5733 + #5734)

studio/backend/requirements/no-torch-runtime.txt listed pydantic and pydantic-core unpinned. Every installer entry point then ran:

uv pip install --no-deps -r no-torch-runtime.txt

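so each package was resolved independently to its latest release, and --no-deps skipped the dependency check that would have kept the two compatible. The result was: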

pydantic 2.13.4 (latest, requires pydantic-core==2.46.4)
pydantic-core 2.47.0 (latest, mismatched)

The next import of pydantic ran _ensure_pydantic_core_version() and raised. Linux and Mac runners frequently had a matching pair cached from a prior unrelated install, so the failure hit Windows most consistently but was not Windows-only.
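For illustration, here is a minimal Python sketch of the compatibility rule involved: pydantic declares an exact pydantic-core pin in its Requires-Dist metadata, and an install whose pydantic-core differs from that pin is rejected. The helper names (pinned_core_version, core_matches, check_installed_pair) are hypothetical, and this is not pydantic's own _ensure_pydantic_core_version(); it only reads the pin through the standard importlib.metadata API.

```python
from importlib import metadata


def pinned_core_version(requires):
    """Return the exact pydantic-core version pinned by a list of requirement
    strings (such as pydantic's Requires-Dist), or None if there is no == pin."""
    for req in requires or []:
        # Drop any environment marker (e.g. '; extra == "email"') and spaces.
        spec = req.split(";", 1)[0].replace(" ", "")
        name, sep, version = spec.partition("==")
        if sep and name.lower().replace("_", "-") == "pydantic-core":
            return version
    return None


def core_matches(requires, installed_core):
    """True if installed_core equals the pinned version (or nothing is pinned)."""
    pinned = pinned_core_version(requires)
    return pinned is None or pinned == installed_core


def check_installed_pair():
    """Check the live environment; needs pydantic and pydantic-core installed."""
    return core_matches(metadata.requires("pydantic"), metadata.version("pydantic-core"))
```

With the pair from the failing runs, core_matches(["pydantic-core==2.46.4"], "2.47.0") is False, which is the mismatch the import-time check rejects.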

Fix: Resolve pydantic WITH deps in a focused pip call before the --no-deps no-torch-runtime pass. pip then pins pydantic-core to the exact version that pydantic's metadata declares, and the next import works. Applied in install.sh (migrated + first-install branches), install.ps1 (same), and studio/install_python_stack.py (update path). Removed pydantic and pydantic-core from no-torch-runtime.txt, with a header comment explaining the split. Dropped the now-redundant "Patch Studio venv with full typer / pydantic dep trees" step from studio-windows-{api,inference,ui,update}-smoke.yml (six occurrences across the four files).

pydantic's transitive deps (annotated-types, pydantic-core, typing-extensions, typing-inspection) are torch-free, so installing WITH deps does not pull torch.
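The two-pass ordering can be sketched as follows. This assumes the focused pass is a plain uv pip install pydantic and that the requirements file is otherwise installed unchanged; the function names and the sample requirement lines are hypothetical, not code from install.sh, install.ps1, or install_python_stack.py.

```python
import re

# Hypothetical: what the focused first pass installs, and what it therefore owns.
FIRST_PASS = ["pydantic"]
OWNED_BY_FIRST_PASS = {"pydantic", "pydantic-core"}

_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def project_name(line):
    """Normalized project name of a requirement line; None for comments/blanks."""
    m = _NAME_RE.match(line)
    return re.sub(r"[-_.]+", "-", m.group(1)).lower() if m else None


def strip_first_pass(lines):
    """Drop packages owned by the first pass from the --no-deps requirements."""
    return [line for line in lines if project_name(line) not in OWNED_BY_FIRST_PASS]


def install_commands(req_file):
    """argv lists in run order: resolve pydantic WITH deps, then the --no-deps pass."""
    return [
        # Pass 1: the resolver picks the pydantic-core that pydantic pins.
        ["uv", "pip", "install", *FIRST_PASS],
        # Pass 2: the rest of the no-torch runtime, without dependency resolution.
        ["uv", "pip", "install", "--no-deps", "-r", req_file],
    ]
```

Because the second pass never sees pydantic or pydantic-core, it cannot reintroduce an unmatched pair.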

2. Linux Studio Update CI: llama.cpp falls back to source build

Upstream llama.cpp b9261+ split each binary's entry code into a paired libllama-<binary>-impl.so shared library. llama-server and llama-quantize are NEEDED-linked against libllama-server-impl.so / libllama-quantize-impl.so with RUNPATH=$ORIGIN. The Linux entry in runtime_patterns_for_choice didn't glob the impl libs, which set off a chain: copy_globs skipped them, ldd reported them missing, preflight raised PrebuiltFallback, the installer fell back to a source build, and studio-update-smoke annotated "setup.sh idempotency regressed".

I verified by extracting llama-b9291-bin-ubuntu-x64.tar.gz directly:

$ readelf -d build/bin/llama-server | grep -E "NEEDED|RUNPATH"
NEEDED  [libllama-server-impl.so]
RUNPATH [$ORIGIN]
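The same inspection can be scripted. Here is a minimal Python sketch that parses readelf -d output and lists bundled NEEDED libraries missing from the binary's directory when RUNPATH is $ORIGIN. BUNDLED_PREFIXES and the helper names are assumptions for illustration, not the installer's preflight code (which, per the description above, relies on ldd).

```python
import re

# Matches full `readelf -d` lines such as
#   0x0000000000000001 (NEEDED)  Shared library: [libfoo.so]
# and abbreviated "NEEDED  [libfoo.so]" excerpts.
_ENTRY_RE = re.compile(r"\(?(NEEDED|RUNPATH|RPATH)\)?\s+[^\[]*\[([^\]]*)\]")

# Assumption: name prefixes of libraries the prebuilt bundle ships itself.
BUNDLED_PREFIXES = ("libllama", "libggml", "libmtmd")


def parse_dynamic(readelf_text):
    """Return (needed, runpath) lists from `readelf -d` output."""
    needed, runpath = [], []
    for line in readelf_text.splitlines():
        m = _ENTRY_RE.search(line)
        if not m:
            continue
        tag, value = m.groups()
        if tag == "NEEDED":
            needed.append(value)
        else:
            runpath.extend(value.split(":"))
    return needed, runpath


def missing_bundled(needed, runpath, present, prefixes=BUNDLED_PREFIXES):
    """Bundled NEEDED libs absent from `present` (the binary's directory listing).
    Only checked when the binary resolves them via RUNPATH=$ORIGIN."""
    if "$ORIGIN" not in runpath:
        return []
    files = set(present)
    return [lib for lib in needed if lib.startswith(prefixes) and lib not in files]
```

System libraries such as libc.so.6 are ignored by the prefix filter, so only a missing impl lib (or another bundled lib) is reported.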

Fix: Add libllama-*-impl.so* to the Linux runtime overlay pattern, locked in by tests/studio/install/test_rocm_support.py::TestRuntimePatterns::test_linux_cpu_patterns. The macOS pattern is already lib*.dylib and the Windows pattern is already *.dll, both broad enough to pick up the impl libs, so neither needs a change.
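To see why the impl libs were skipped, here is a small fnmatch sketch of pattern-union selection. The pattern lists and filenames are illustrative assumptions, not the real entries in runtime_patterns_for_choice; fnmatchcase is used so matching is case-sensitive on every OS.

```python
from fnmatch import fnmatchcase

# Illustrative only: not the literal lists in runtime_patterns_for_choice.
LINUX_BEFORE = ["llama-server", "llama-quantize", "libllama.so*", "libggml*.so*", "libmtmd.so*"]
LINUX_AFTER = LINUX_BEFORE + ["libllama-*-impl.so*"]


def copied(names, patterns):
    """Names matched by ANY pattern (a union, as copy_globs is described)."""
    return sorted(n for n in names if any(fnmatchcase(n, p) for p in patterns))


release = [
    "llama-server", "llama-quantize", "llama-cli", "LICENSE",
    "libllama.so", "libggml-base.so",
    "libllama-server-impl.so", "libllama-quantize-impl.so",
]
newly_copied = set(copied(release, LINUX_AFTER)) - set(copied(release, LINUX_BEFORE))
print(sorted(newly_copied))  # ['libllama-quantize-impl.so', 'libllama-server-impl.so']
```

Note that "libllama.so*" cannot match "libllama-server-impl.so": the literal "." after "libllama" fails against "-", so a dedicated impl pattern (or a broader glob) is needed.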

3. Mac Studio UI Chat: change-password submit clicked while disabled (supersedes #5738)

The disable gate in auth-form.tsx only checked newPassword.length, the confirm match, and that the new password differs from the current one. Playwright's first click landed before the current-password field's React state had committed, so the form was logically invalid (current_password empty) and the button was disabled. Playwright then timed out with <button disabled type="submit">.

Fix: Tighten the disable gate to require currentPassword.length >= 8 and mirror the same check in the submit handler so Enter / autofill cannot bypass.
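A Python sketch of the tightened gate, written as a plain predicate (the real change is TypeScript in auth-form.tsx). The 8-character minimum is stated for currentPassword; applying the same minimum to the new password, and the payload shape returned by handle_submit, are assumptions.

```python
MIN_LEN = 8  # stated for currentPassword; assumed to apply to the new password too


def submit_disabled(current, new, confirm, min_len=MIN_LEN):
    """True means the submit button stays disabled."""
    return not (
        len(current) >= min_len   # new check: current password must be filled in
        and len(new) >= min_len   # new-password length check
        and new == confirm        # confirm must match
        and new != current        # new must differ from current
    )


def handle_submit(current, new, confirm):
    """The submit path re-runs the same predicate, so Enter/autofill cannot bypass it."""
    if submit_disabled(current, new, confirm):
        return None  # ignore the submit instead of sending an empty current_password
    return {"current_password": current, "new_password": new}  # hypothetical payload
```

Keeping one predicate for both the button state and the handler means the two paths cannot drift apart.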

Validation

The staging fork PR runs three minimal jobs, one per runner OS, that exercise the installer fixes end-to-end:

✅ Staging Install (Linux) - asserts the installer never falls back to source build, confirms libllama-server-impl.so + libllama-quantize-impl.so landed in the binary's directory, ldd is clean on llama-server / llama-quantize, and llama-server --version runs purely on RUNPATH=$ORIGIN.
✅ Staging Install (Windows) - asserts there is no "incompatible with the current pydantic version" log line and that pydantic.VERSION and pydantic_core.__version__ report 2.13.4 / 2.46.4 (a matched pair).
✅ Staging Install (macOS) - sanity-checks the shared install.sh change does not regress macOS.

Closes

Bundles three independent CI regressions hitting the maintainer PR backlog. Each one is verified end-to-end on a staging fork against real Ubuntu / macOS / Windows GitHub-hosted runners before this lands.

1. Windows --no-torch install: pydantic and pydantic-core drift to incompatible versions under `uv pip install --no-deps -r no-torch-runtime.txt`, because each is resolved independently to its latest release. pydantic 2.13.4 pins pydantic-core==2.46.4, but pydantic-core 2.47.0 was the newest published wheel, so `import pydantic` raised `SystemError: pydantic-core 2.47.0 is incompatible with the current pydantic version`. Resolve pydantic WITH deps in a focused pip call (install.sh, install.ps1, install_python_stack.py) before the --no-deps no-torch-runtime pass, so pip pins pydantic-core to the version pydantic declares. pydantic's transitive deps (annotated-types, pydantic-core, typing-extensions, typing-inspection) are torch-free. Drop the redundant `Patch Studio venv with full typer / pydantic dep trees` workaround from the four Windows smoke YAMLs. Supersedes #5733 + #5734.

2. Linux Studio Update CI: upstream llama.cpp b9261+ split each binary's entry code into a paired `libllama-<binary>-impl.so` shared library. `llama-server` and `llama-quantize` are NEEDED-linked against `libllama-server-impl.so` / `libllama-quantize-impl.so` with RUNPATH `$ORIGIN`, so the prebuilt overlay must copy those alongside the binaries. Without those libs, ldd reports them missing, preflight rejects the prebuilt, the installer falls back to a source build, and studio-update-smoke annotates `setup.sh idempotency regressed`. Add `libllama-*-impl.so*` to the Linux runtime patterns and lock the pattern in test_rocm_support.TestRuntimePatterns.

3. Mac Studio UI Chat: change-password submit clicked while disabled. The disable gate only checked new + confirm password length, but Playwright's first click landed before the current-password field's React state had committed, so the form was logically invalid (current_password empty) and the button was disabled. Tighten the gate to require `currentPassword.length >= 8` and mirror the same check in the submit handler so Enter / autofill cannot bypass it. Supersedes #5738.

gemini-code-assist

Code Review

This pull request updates the installation scripts for Windows, Linux, and Python to install pydantic with its dependencies separately. This change ensures that pydantic-core is pinned to a compatible version, preventing errors that occur when installing with --no-deps. Correspondingly, pydantic and its dependencies were removed from the no-torch-runtime.txt requirements file. The PR also updates the llama.cpp prebuilt installer to include libllama-*-impl.so* files, which are now required by upstream changes to prevent installation fallbacks. Additionally, the frontend AuthForm was updated to enforce a minimum length for the current password during password changes. I have no feedback to provide as there were no review comments to evaluate.

…5741 comments (#5746)

* ci: broaden Linux llama.cpp runtime pattern to lib*.so*

  #5741 patched the explicit Linux pattern list to add ``libllama-*-impl.so*`` after ggml-org/llama.cpp#23462 (between b9279 and b9283) split each binary's entry code into a paired ``lib<binary>-impl.so`` shared library. The same class of upstream repackaging will hit us again whenever a new shared lib is added. Mirror what macOS already does and replace the per-lib list with a single ``lib*.so*`` glob. ``copy_globs`` (line 3614) unions patterns, so the per-variant ``libggml-cuda.so*`` / ``libggml-hip.so*`` entries were never filtering anything; the spec lives in ``runtime_payload_health_groups`` (line 5209), which keeps the explicit minimum-required list per variant. Dry-run against b9296-bin-ubuntu-x64.tar.gz: 40 files copied (all ggml, llama, mtmd, impl variants + the two binaries we ship), 22 skipped (other CLIs, rpc-server, LICENSE). Functionally equal to the post-#5741 set.

* cleanup: trim #5741 comments on the pydantic split

  Comments added in #5741 explained the original bug in full each time. They are mostly redundant with the commit message and the PR. Trim them to one short paragraph per site. No behavior change.

* ci: narrow Windows runtime pattern to llama-server.exe + llama-quantize.exe

  Studio only invokes llama-server and llama-quantize. Mac and Linux already filter to those two binaries; Windows was the odd one out, with ``*.exe`` copying every CLI upstream ships (llama-cli, llama-bench, llama-mtmd-cli, ...). Dry-run on b9296 (win cpu-x64, cpu-arm64, cuda-13.1, hip-radeon): 20 unused EXEs skipped per variant, all DLLs (incl. the new llama-*-impl.dll family) still copied via ``*.dll``. ``existing_install_matches_choice`` already checks that llama-server.exe exists explicitly (line 5297), so the health gate is unchanged.
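The union semantics behind both follow-up changes can be checked with the same kind of fnmatch sketch. The pattern lists and filenames here are illustrative assumptions modeled on the commit message, not the installer's literal configuration.

```python
from fnmatch import fnmatchcase

# Illustrative pattern sets (assumptions, not the real runtime_patterns_for_choice).
LINUX_BROAD = ["llama-server", "llama-quantize", "lib*.so*"]
WINDOWS_OLD = ["*.exe", "*.dll"]
WINDOWS_NEW = ["llama-server.exe", "llama-quantize.exe", "*.dll"]


def copied(names, patterns):
    """Union semantics: a file is copied if ANY pattern matches it."""
    return sorted(n for n in names if any(fnmatchcase(n, p) for p in patterns))


linux = ["llama-server", "llama-cli", "libggml-cuda.so", "libllama-server-impl.so", "libmtmd.so.0"]
# Under a union, a narrower per-variant entry adds nothing once lib*.so* is present.
assert copied(linux, LINUX_BROAD) == copied(linux, LINUX_BROAD + ["libggml-cuda.so*"])

win = ["llama-server.exe", "llama-quantize.exe", "llama-cli.exe", "ggml.dll", "llama-server-impl.dll"]
print(sorted(set(copied(win, WINDOWS_OLD)) - set(copied(win, WINDOWS_NEW))))  # ['llama-cli.exe']
```

This is also why the minimum-required list has to live in a separate health check: a union of globs can only widen what is copied, never assert that a specific file is present.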

…5713) * Studio: rewrite OpenAI Responses citation markers to markdown links OpenAI's /v1/responses stream interleaves text deltas with inline citation markers built from private-use codepoints (U+E200 / U+E201 / U+E202) shaped like `citeSOURCE_ID`. The codepoints render as garbled "E202" glyphs or empty boxes in most fonts, and the markdown layer further strips them, leaving run-on text like "citeturn1view0turn1view1turn3view0...". The url list still arrived in the Sources panel via url_citation annotations, but the inline cite hand-off into the prose was unreadable. Rewrite each marker into `[N](URL)` when the matching url_citation has already been recorded on this stream, and drop the marker silently otherwise. The lookup uses a new `source_id` field captured on `_record_url_citation` (accepts source_id / id / locator across Responses API revisions). Annotations are now applied BEFORE the delta text is rewritten so that markers and their resolving annotation arriving in the same SSE event still resolve. Reference: https://developers.openai.com/api/docs/guides/citation-formatting * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Preserve every source_id alias for a deduplicated url_citation OpenAI's Responses stream cites the same URL under multiple source_id markers when the model references different spans of the same page. The previous dedup-by-URL kept only the first alias and dropped the rest, so subsequent markers for the same URL never resolved and got stripped from the prose. Switch the citation record to a ``source_ids`` list and append new aliases on every duplicate. The rewriter resolves any alias back to the same citation number so the inline markers all collapse onto one footnote rather than fanning out into bogus repeats. Also collapse the two passes over ``all_url_citations`` in ``_record_url_citation`` into a single loop for clarity. 
Adds two regression tests covering the alias-collision and mixed-shape cases. * ci: re-trigger after flake in Studio GGUF Tool calling (rebased on main #5741 already) * ci: re-run after transient CodeQL Python checkout auth flake * Fix split-marker buffer + multi-source ids for PR #5713 The original rewriter only handles markers that arrive whole inside a single response.output_text.delta event. OpenAI's stream chunks text on byte-buffer boundaries with no awareness of the marker grammar, so a marker can straddle two deltas (delta-1 ends with "citetu", delta-2 starts with "rn0view0"). Each delta was rewritten in isolation, so the half-marker leaked as garbled "E200/E202" glyphs in the rendered prose. Buffer the unterminated tail across deltas and concatenate it onto the front of the next one so the rewriter sees a complete marker. Flush the held-over tail on response.completed / response.incomplete / [DONE], stripping any leftover private-use bytes so a never-closed marker (truncated stream, missing annotation) never leaks. Also handle the multi-source marker shape from the OpenAI docs -- citeid1id2 should expand to one bracket link per resolvable id. The previous regex captured only the first source id and silently dropped id2/id3. Reference: https://developers.openai.com/api/docs/guides/citation-formatting Tests: 21 new cases covering multi-source, locator suffix, marker split across two and three deltas, unterminated marker on truncation, late annotation resolving a buffered marker, idempotency, and the head/tail split helper directly. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Defer citation segments until url_citation annotation arrives The split-marker buffer already concatenates a marker that straddles two response.output_text.delta events. 
But when the annotation event for a url_citation arrives AFTER the delta that contains its inline marker (the typical OpenAI Responses ordering), the rewriter still saw an empty lookup table at delta time and silently stripped the marker. The URL kept showing up in the sources panel but the inline link reference was permanently gone. Add _rewrite_citation_markers_partial which leaves an unresolved marker verbatim and reports has_unresolved=True. The streaming loop buffers any closed segment that contains an unresolved marker into a pending_citation_segments FIFO and drains the queue on every later annotation event, on response.completed, on response.incomplete, and on the [DONE] sentinel. Drain order is preserved so later clean text does not leapfrog an earlier deferred segment. End-of-stream forces a strip so no codepoint leaks if the annotation never arrived. Add six regression tests covering single-pass resolution, the late- annotation two-pass case, multi-source markers with partial resolution, mixed known and pending markers in one segment, and idempotency on marker-free input. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Drop unterminated citation tail to prevent cite-prefix plain-text leak `_flush_pending_marker_tail` stripped the three private-use citation codepoints from the held-over buffer, but left the literal ``cite`` keyword plus the source id behind as plain text. A stream ending mid-marker therefore emitted user-visible garbage like ``Some text citeturn0view0`` instead of the intended clean prose. ``pending_marker_tail`` is by construction the suffix that starts at an unclosed ``\\ue200`` opener -- the split helper guarantees there is no closing ``\\ue201`` byte. Without that close the marker is meaningless: the source id cannot be resolved to a URL and the user prose before the opener was already emitted as ``head`` on the originating delta. 
Bail out before the strip step and return the empty string. As a belt-and-braces measure also drop any orphan ``cite<sid>`` literal at the head of the buffer in case a future caller passes a partially-terminated tail. Update the matching ``_simulate_delta_stream`` harness in the edge tests so it mirrors the new flush logic, and add four regression tests covering unterminated marker with surrounding prose, marker- only inputs, prefix-only outputs, and the split-then-close path that still must resolve to a link. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Defer multi-source markers until all ids resolve for PR #5713 `_rewrite_citation_markers_partial` previously treated a marker as resolved when even one token in a multi-source marker resolved, dropping any still-pending source ids. In streamed Responses events the annotations for a multi-source marker can arrive across separate `annotation.added` chunks, so the caller no longer buffered that segment for retry and the late source id was lost from the inline citation entirely. Flag the marker unresolved whenever any token misses the lookup so the streamer keeps the segment pending. End-of-stream force flush still drops unresolved tokens through `_replace_openai_citation_markers` so locator-style suffixes (which look like unresolved ids at the token level but only appear at end-of-stream) render cleanly. Updated the multi-source test to assert the new pending-then-flush behavior; locator output now lands at force-flush rather than mid stream. * Shorten citation marker comments for PR #5713 --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Studio: expose --parallel / -np on `unsloth studio run` The CLI was hardcoding `llama_parallel_slots=4` in `run_kwargs` at `unsloth_cli/commands/studio.py`, leaving users unable to tune the concurrent decode slot count even though the engine, KV-cache math, and `studio.backend.run.run_server(llama_parallel_slots=...)` plumbing all already accepted any N. This change adds a `--parallel` / `--n-parallel` / `-np` typer option (default 4 -- matches the previous hardcoded value), forwards it into `run_kwargs`, and pins the new surface with 4 unit tests. Per-request state in `routes/inference.py` is already isolated (`cancel_event` and `prev_text` are per-request locals in every streaming handler; the `_lock` / `_serial_load_lock` only wrap load/unload, not chat completions), so no concurrency refactor is needed alongside this -- the engine layer already handles N concurrent requests on one loaded model when llama-server is told to. Range guards: 1 <= N <= 64. With higher N each slot gets ctx/N KV cache; users tuning this should be aware that per-call context shrinks proportionally. `unsloth studio` (the bare default command, no subcommand) still defaults to llama_parallel_slots=1 via `run_server`'s own default; this PR does not change that path -- it only exposes the knob on the one-liner `studio run` command that already silently used 4. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Forward --parallel through venv re-exec and drop colliding short aliases `unsloth studio run` re-execs into the Studio venv when invoked from outside it (the common path). The arg-builder forwards every typer option but the new --parallel, so the child re-execs at the default 4 and any user value is silently dropped. Worse: pre-PR users who already pass `-np N` as a pass-through extra (where llama.cpp's last-wins parsing made it stick) silently lose N after this PR lands. Forward --parallel explicitly in the re-exec arg list. 
While auditing the re-exec path, also drop the colliding 1-char short aliases -m (--model) and -f (--frontend) plus the redundant -hfr. Click's short-option clustering had been silently mis-parsing ~11 llama-server short flags via the pass-through path: -fa as `-f a`, -mg 0 as `-m g` + stray 0, -fitt 1024 as `-f itt` + stray 1024, -hff path as `-f f` + stray `-h path`, -cmoe / -cram / -sm / -ncmoe etc. The docstring promise ("any flag this command does not recognize is forwarded verbatim") was silently violated. -hf (2-char) is kept because Click treats multi-char shorts atomically (no clustering of -hff / -hfv / -hffv / -hft) and -hf is documented in basics/api/README.md. --model / --hf-repo / --frontend long forms all unchanged. studio_default keeps -f because it has no pass-through. Tests: - test_studio_run_parallel_flag.py: 8 new re-exec coverage cases (all 3 aliases, 3 platforms via sys.platform mock, pre-PR `-np` regression, mixed with pass-through extras). - test_studio_run_short_alias_clashes.py (new): surface checks that the removed shorts cannot reappear, plus 11 parametrized cases proving each previously-broken llama-server short flag now passes through verbatim, plus a happy-path test that documented -hf still works for `org/repo:variant` syntax. All 27 tests pass. Negative test (revert either fix) shows the new tests catch the regression. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix stale studio run docstring describing rejected llama-server flags The pre-PR docstring listed --port, -c / --ctx-size, --api-key, -ngl, --jinja, --flash-attn, --no-context-shift as "rejected with HTTP 400", but only --port and --api-key (plus other networking / auth / model identity / single-model UI flags) are actually in studio/backend/core/inference/llama_server_args.py's denylist. -c / -ngl / --jinja / --flash-attn / --no-context-shift are pass-through and last-wins-override Studio's auto-set value. 
Rewrite the docstring to match the real denylist groups and point at the canonical source. Also add --parallel to one of the examples now that it is a first-class flag. * ci: broaden Linux + narrow Windows llama.cpp runtime patterns + trim #5741 comments (#5746) * ci: broaden Linux llama.cpp runtime pattern to lib*.so* #5741 patched the explicit Linux pattern list to add ``libllama-*-impl.so*`` after ggml-org/llama.cpp#23462 (between b9279 and b9283) split each binary's entry code into a paired ``lib<binary>-impl.so`` shared library. Same class of upstream repackaging will hit us again whenever a new shared lib is added. Mirror what macOS already does and replace the per-lib list with a single ``lib*.so*`` glob. ``copy_globs`` (line 3614) unions patterns, so the per-variant ``libggml-cuda.so*`` / ``libggml-hip.so*`` entries were never filtering anything; the spec lives in ``runtime_payload_health_groups`` (line 5209) which keeps the explicit minimum-required list per variant. Dry-run against b9296-bin-ubuntu-x64.tar.gz: 40 files copied (all ggml, llama, mtmd, impl variants + the two binaries we ship), 22 skipped (other CLIs, rpc-server, LICENSE). Functionally equal to the post-#5741 set. * cleanup: trim #5741 comments on the pydantic split Comments added in #5741 explained the original bug in full each time. They are mostly redundant with the commit message and the PR. Trim them to one short paragraph per site. No behavior change. * ci: narrow Windows runtime pattern to llama-server.exe + llama-quantize.exe Studio only invokes llama-server and llama-quantize. Mac and Linux already filter to those two binaries; Windows was the odd one out with ``*.exe`` copying every CLI upstream ships (llama-cli, llama-bench, llama-mtmd-cli, ...). Dry-run on b9296 (win cpu-x64, cpu-arm64, cuda-13.1, hip-radeon): 20 unused EXEs skipped per variant, all DLLs (incl. the new llama-*-impl.dll family) still copied via ``*.dll``. 
``existing_install_matches_choice`` already checks llama-server.exe exists explicitly (line 5297), so the health gate is unchanged. * Lower default weight_decay in RL config from 0.01 to 0.001 (#5747) In full FT, AdamW weight decay shrinks the parameter directly so the implicit prior is W -> 0. In LoRA the trained parameters are A and B while the effective weight is W = W_init + (alpha/r) * B @ A; decaying A and B separately drives BA -> 0, hence W -> W_init rather than 0. The previous default of 0.01 inherited from full-FT recipes adds a measurable pull on the merged adapter back toward the base model over a few thousand steps. 0.001 keeps a small Frobenius-norm prior on ||A||^2 + ||B||^2 for numerical stability without meaningfully biasing the merged weight toward init, and aligns with the value used across the unsloth notebook templates. * Studio: strip orphan tool_call XML leaking into visible content (#5735) * Studio: strip orphan tool_call XML from streamed visible content The speculative-buffer state machine in `studio/backend/core/inference/llama_cpp.py` can slice a tool_call XML block between the silent DRAINING path and the user-visible content_accum, depending on when in the model's emission the BUFFERING -> STREAMING -> DRAINING transitions fire. Three leak shapes were observed in a 2026-05-22 sweep of 900 Qwen3.5 / Qwen3.6 GGUF runs: Pre-fix XML leak rate: 20/900 (2.22%), concentrated 6.7% on the larger Q8 / MTP configs: Qwen3.6-35B-A3B Q8_0 4/60 (6.7%) Qwen3.6-35B-A3B-MTP Q4 4/60 (6.7%) Qwen3.5-35B-A3B Q8_0 3/60 (5.0%) Qwen3.6-27B Q8_0 3/60 (5.0%) The existing `_TOOL_XML_RE` only matched well-formed `<tool_call>...</tool_call>` and `<function=...></function>` pairs, so unterminated openings (close was DRAINED) and orphan closes (opening was DRAINED) survived the strip and reached the user. Fix relaxes the regex to also strip: 1. Orphan opening up to end-of-string: `(?:</tool_call>|\Z)` 2. 
Orphan closing tag: bare `</tool_call>` / `</function>` Verified on the full sweep: 20/900 -> 0/900 (100% of detected leaks eliminated). 16 unit tests in `test_tool_xml_strip.py` pin all three leak shapes plus the well-formed cases, plus parametrised checks on the 5 actual real-world leak samples from the sweep data. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: strip tail-only </parameter> orphan + tighten regex The 2026-05-22 gdpval sweep surfaced a 4th XML-leak shape not caught by the earlier regex: a bare `</parameter>\n\n` at end-of-buffer (7 of 192 trials, all Qwen3.5-27B + a few Qwen3.6-27B). The model emits the full `<tool_call><function=...><parameter=...>...content... </parameter></function></tool_call>` envelope, the speculative buffer DRAINS the opening tags as intended, but EOS (max_tokens cutoff) truncates the outer `</function></tool_call>` close, leaving just `</parameter>` as the visible tail. We strip this ONLY when end-anchored (`\s*\Z`) so legitimate mid-text uses (user code samples, documentation discussing the Qwen tool-call XML shape) survive. Verified on the 192-trial gdpval corpus: before=7, after=0. While at it, fold the five top-level alternations into three by sharing tag-name and prefix subgroups: <tool_call>... + <function=\w+>... + --> <(?:tool_call|function=\w+)>... </tool_call> | </function> --> </(?:tool_call|function)> Semantically identical (verified by replay over the 192-trial corpus + adversarial inputs, 0 diffs) and 1.34x faster on real workloads. Backtracking-safety pinned by two new perf guards (256KB '<' spam, 1000x orphan opens). Tests: 16 -> 28 (6 new functional + 4 well-formed-vs-orphan + 2 perf guards). * Tighten comments in XML-strip regex and tests Code says what it does; comments were repeating it. Strip the verbose explanations down to the WHY-only bits (engine quirk, tail-anchor rationale, real-world source of each test sample). No code changes. 
inference.py: 21 -> 12 lines around _TOOL_XML_RE
test_tool_xml_strip.py: 343 -> 259 lines (-84)

Tests: 28/28 still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Address review: deny pass-through --parallel, preserve legacy short aliases, fix test harness

Round 1 review fixes for #5737:

1. Deny --parallel / --n-parallel / -np in the pass-through validator. Without this, `unsloth studio run --model X --parallel 8 -- --parallel 999` would last-win-override the running llama-server slot count while Studio's app.state.llama_parallel_slots and KV-cache fitting stay at the typer value (8), so the resource plan and the running process disagree. Also bypasses the typer 1..64 range guard. Reject so the only path is the first-class typer flag.
2. Backwards-compat shim for -m / -hfr / -f. Dropping the short aliases from typer broke any script using `unsloth studio run -m X` or `-hfr Y` or `-f dist`. Add _consume_legacy_short_aliases which pops EXACT whole-token matches (or `-x=value` inline form) from ctx.args into the corresponding typer parameter. Clustered tokens (`-fa`, `-mg`, `-fitt`, ...) are left in the pass-through tail unchanged. --model becomes Optional with an explicit missing-required check after the preprocessor so legacy `-m X` still satisfies the "must specify a model" requirement.
3. Drop mix_stderr from CliRunner. Typer 0.25.1 / Click 8.4.1 removed the kwarg; the test harness raised TypeError before exercising the PR behaviour. Tests run cleanly on current and older Typer/Click.
4. Correct the -np regression test docstring. Pre-PR `-np 8` was clustered by Click as `-p 8` (port=8) + stray `-n`, silently breaking the port binding -- not "passed through as 8 slots". The post-PR assertion (child gets --parallel 8) is unchanged.
5. Update studio run docstring listing rejected flags so it now correctly includes --parallel / -np / --n-parallel.

New tests:
- test_llama_server_args.py: parametrized denylist coverage for --parallel / --n-parallel / -np including equals-form, including out-of-range bypass attempts (999, 0). is_managed_flag flips True.
- test_studio_run_short_alias_clashes.py: legacy -m / -hfr / -f promote to typer params; --model X + -m Y conflict errors; clustered -mg / -fa / -fitt still pass through (the original bug fix holds).

132 tests pass (98 backend + 34 cli).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Extend legacy-alias shim tests for repo:variant, inline value form, and missing model

Three additional edge cases for the -m / -hfr / -f preprocessor:
- `-m unsloth/foo:UD-Q4_K_XL` round-trips through both the preprocessor and _split_repo_variant so the child sees --model + --gguf-variant.
- `-m=foo` inline value form is promoted just like `-m foo`.
- Missing --model after the preprocessor raises typer.Exit(2) cleanly (replacing typer's pre-PR required-flag enforcement now that --model is Optional to allow the legacy promotion path).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Scrub .github/workflows for staging push (matches staging base)

* Fix studio CLI argv handling and pass-through docstring drift

- studio/backend/core/inference/llama_server_args.py: drop the stale ``-np``/``--parallel`` entry from the docstring's pass-through tunable list. These flags moved into _DENYLIST_GROUPS so the docstring now contradicts the validator and would mislead future maintainers debugging the ValueError from validate_extra_args(["--parallel","8"]). The deleted wording was introduced by dbea77e ("Studio: forward llama-server args from `unsloth studio run`, activate `unsloth run`, and allow passing model:quant to load models") when --parallel was still a documented pass-through; the same commit's "quant" reference is about the model:quant syntax, unrelated to the parallel slot wording being deleted here.
- unsloth_cli/commands/studio.py: add _expand_attached_np_short next to _consume_legacy_short_aliases. Both work around Click's short-option clustering for this command -- the legacy preprocessor for `-m` / `-f` / `-hfr` and this one for the attached `-np<N>` form. Click clusters `-np8` as `-n -p 8` because `-p` is the typer short for `--port`, silently setting port=8 and dropping the parallel value; rewriting the attached form into separated `-np <N>` in sys.argv before Click parses preserves the user's value. Space/equals forms (`-np 8`, `-np=8`) already work and are left alone.
- unsloth_cli/__init__.py: import _expand_attached_np_short from the studio command and run it only when argv[0] looks like the unsloth console-script or workspace cli.py, so importing this module from a notebook or pytest run does not mutate the caller's argv.

* Tighten the -np canonicaliser comments

Drop the helper's co-location sentence (location is self-evident from grep) and shorten the entry-gate rationale to one short sentence covering the why.

* Sync .github/workflows with upstream author branch

* Sync .github/workflows with upstream author branch

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Bump install.sh / install.ps1 pin to unsloth>=2026.5.7 (#5753)

PyPI release unsloth 2026.5.7 is now live. Bumps the pinned floor in install.sh and install.ps1 from unsloth>=2026.5.6 to unsloth>=2026.5.7 so fresh installs resolve to the new wheel. Tagged on main as v0.1.416-beta.
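To make the clustering workaround concrete, here is a minimal sketch of an attached-form rewriter in the spirit of `_expand_attached_np_short`. It is a hypothetical reimplementation, not the PR's code: the function name, the regex, and returning a new list instead of mutating `sys.argv` are all assumptions. It handles only unsigned `-np<digits>` and also stops at a bare `--`, the end-of-options rule that a later round in this log adds.

```python
import re

# Hypothetical sketch, not the PR's implementation.
# Matches only the attached form "-np<digits>", e.g. "-np8" or "-np64".
_ATTACHED_NP = re.compile(r"^-np(\d+)$")


def expand_attached_np_short(argv: list[str]) -> list[str]:
    """Return a copy of argv with "-np<N>" split into "-np", "<N>".

    Separated ("-np 8") and equals ("-np=8") forms are left untouched,
    as is everything after a bare "--" end-of-options marker.
    """
    out: list[str] = []
    for i, tok in enumerate(argv):
        if tok == "--":
            # Everything after "--" is raw pass-through payload.
            out.extend(argv[i:])
            break
        match = _ATTACHED_NP.match(tok)
        if match:
            out.extend(["-np", match.group(1)])
        else:
            out.append(tok)
    return out


print(expand_attached_np_short(["studio", "run", "-np8", "--", "-np4"]))
# ['studio', 'run', '-np', '8', '--', '-np4']
```

Returning a copy keeps the helper pure and easy to unit-test; the caller decides whether to write the result back into `sys.argv` before Click parses it.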
* Catch attached `-np<N>` form in backend pass-through validator

The CLI-side `_expand_attached_np_short` rewrites `-np8` to `-np 8` before Click parses, but HTTP /load `llama_extra_args=["-np8"]` goes straight to `validate_extra_args` which only matched the exact token.

Reproducer: `validate_extra_args(["-np8"])` previously returned `["-np8"]` instead of raising; once forwarded to llama-server it last-win-overrode Studio's slot count while `app.state.llama_parallel_slots` stayed at the typer value.

Normalise `-np<digits>` to `-np` in `_flag_name` so the denylist catches the attached form alongside `-np`, `-np=8`, `--parallel`, `--parallel=8`, and `--n-parallel`. Tests parametrize the new form including out-of-range values.

* Restore _consume_legacy_short_aliases unit tests + _expand_attached_np_short tests

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Restore .github/workflows from origin/main

Earlier merge from claude_review's staging-scrub commits accidentally deleted production CI workflows. Restore them to main's state.

* Scrub .github/workflows for staging push (matches staging base)

* Sync .github/workflows with upstream author branch

* Round 5+6: broaden -np gate to exact basenames + runtime parallel test

Reviewer-flagged improvements squashed into one commit so the auto-push review bot doesn't keep stomping the branch:
- unsloth_cli/__init__.py: exact-basename match instead of endswith('cli.py'). Covers unsloth, unsloth.exe, unsloth-cli, unsloth-cli.exe, cli.py, unsloth-cli.py. A third-party mycli.py that happens to import unsloth_cli no longer has its argv mutated.
- unsloth_cli/tests/test_studio_run_parallel_flag.py: parametrised runtime test (N in {1, 4, 8, 64}) that fakes the in-venv path and asserts run_server is invoked with llama_parallel_slots=N. Complements the existing source-text check so refactors that preserve runtime semantics don't trip a false failure.
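The normalise-then-look-up idea behind `_flag_name` and `validate_extra_args` can be sketched as follows. This assumes a denylist holding only the three parallel aliases; `normalise_flag` and `check_extra_args` are hypothetical names, not the backend's API, and the regex already folds in the signed and trailing-junk `-np` forms that later rounds in this log add.

```python
import re

# Hypothetical stand-in for one _DENYLIST_GROUPS entry; the real list is longer.
DENYLIST = frozenset({"--parallel", "--n-parallel", "-np"})

# "-np" directly followed by an optionally signed digit: -np8, -np-1, -np+1, -np8x.
_ATTACHED_NP = re.compile(r"^-np[+-]?\d")


def normalise_flag(token: str) -> str:
    """Reduce a pass-through token to the bare flag name used for lookup."""
    name = token.strip().split("=", 1)[0]  # " --parallel=8" -> "--parallel"
    if _ATTACHED_NP.match(name):
        return "-np"  # attached value: "-np8" -> "-np"
    return name


def check_extra_args(args: list[str]) -> list[str]:
    """Raise if any token names a Studio-managed flag; otherwise return a copy."""
    for tok in args:
        if normalise_flag(tok) in DENYLIST:
            raise ValueError(f"{tok!r} is managed by Studio; use the first-class option")
    return list(args)
```

Routing every check through one normaliser is what keeps a helper like `is_managed_flag` and the validator from disagreeing on forms such as `-np8` or `--parallel=8`.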
* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Round 7: respect '--' end-of-options and reject flag-as-value

Round 7 reviewer flagged three legitimate edge cases:
- _expand_attached_np_short rewrote post-'--' tokens. Convention: '--' ends option processing; payload after it is raw. Stop the loop there.
- _consume_legacy_short_aliases promoted post-'--' legacy aliases for the same reason. Treat post-'--' tail as raw.
- Legacy '-m -fa' silently consumed '-fa' as the model name, hiding the real CLI shape error. Reject any next-token that starts with '-' (except the lone '-' stdin/path sentinel) with a clear BadParameter.

Also expanded the missing-model error string to mention the still-supported legacy '-m' / '-hfr' aliases so users hitting that diagnostic on legacy scripts get the right migration hint.

Added four regression tests covering each new behaviour.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Round 8: soften flag-as-value to long-form only + normalise is_managed_flag

Round 8 reviewer flagged two cleanups:
- _consume_legacy_short_aliases rejected any next token starting with '-' as a flag, which would break legitimate values like '-foo' (path or model name with leading dash). Narrow the rejection to '--long' tokens only; '-x' short forms still pass through.
- is_managed_flag did raw _DENYLIST membership while validate_extra_args goes through _flag_name first, so '-np8' / '--parallel=8' / '--port=9000' classified as not-managed by the helper but rejected by the validator. Route is_managed_flag through _flag_name so the two helpers agree on every form callers might use.
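A legacy-alias preprocessor with the Round 7 and Round 8 rules (whole-token or `-x=value` matches only, raw tail after `--`, reject a `--long` flag as a value while allowing `-foo`) might look like the sketch below. Everything here is hypothetical: `LegacyAliasError` stands in for `click.BadParameter`, and the alias-to-parameter mapping is an assumption inferred from this log (`-m` and `-hfr` feed `--model`, `-f` feeds `--frontend`). The empty-inline rejection mirrors Round 10; the `--model X` plus `-m Y` conflict check is omitted.

```python
class LegacyAliasError(ValueError):
    """Stands in for click.BadParameter in this sketch."""


# Assumed mapping: -m and -hfr both feed --model, -f feeds --frontend.
LEGACY_ALIASES = {"-m": "model", "-hfr": "model", "-f": "frontend"}


def consume_legacy_short_aliases(args: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split args into (promoted typer params, remaining pass-through tokens).

    Only exact whole-token aliases ("-m X") or inline forms ("-m=X") are
    promoted; clustered tokens such as "-fa" or "-mg" stay in the tail.
    """
    promoted: dict[str, str] = {}
    rest: list[str] = []
    i = 0
    while i < len(args):
        tok = args[i]
        if tok == "--":
            rest.extend(args[i:])  # payload after end-of-options stays raw
            break
        flag, sep, inline = tok.partition("=")
        if flag not in LEGACY_ALIASES:
            rest.append(tok)
            i += 1
            continue
        if sep:  # inline form: "-m=value"
            if not inline:
                raise LegacyAliasError(f"{flag}= needs a non-empty value")
            value, i = inline, i + 1
        else:  # separated form: "-m value"
            if i + 1 >= len(args):
                raise LegacyAliasError(f"{flag} needs a value")
            value = args[i + 1]
            if value.startswith("--"):  # a long flag, not a value; "-foo" is allowed
                raise LegacyAliasError(f"{flag} got flag {value!r} instead of a value")
            i += 2
        promoted[LEGACY_ALIASES[flag]] = value
    return promoted, rest
```

Matching on the whole token (after splitting off `=value`) is the key design choice: `-fa` never equals `-f`, so llama-server's clustered short flags survive untouched in the pass-through tail.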
* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Round 9: also catch -np-1 / -np+1 signed attached forms in denylist

Round 9 reviewer noticed _flag_name normalised -np<digits> but missed signed variants -np-1 and -np+1, so validate_extra_args waved them through while rejecting --parallel -1. llama.cpp would error out on negative slot counts anyway, but the validator should classify every form of the managed flag identically so the boundary is consistent.

* Round 10: signed -np in CLI canonicaliser + reject empty inline aliases

Round 10 reviewer flagged two real issues:
- _expand_attached_np_short rewrote only -np<digits>; signed forms -np-1 / -np+1 fell through. Backend _flag_name already classifies them as managed, so the CLI rewriter must too -- otherwise Click clusters -np-1 into -n -p -1 (port=-1) and never reaches the backend validator at all.
- -m= / -hfr= / -f= empty inline forms were accepted and produced --model '' / --frontend '' (then Path('') silently became '.') on re-exec. Reject empty inline values at the preprocessor with a clear BadParameter so the malformed input fails fast.

Both behaviours pinned with parametrised regression tests.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Expose --parallel on plain `unsloth studio` for API-path parity

The PR added --parallel to `unsloth studio run` but the plain `unsloth studio` callback (used for API-only / bare-server launches) still hardcoded llama_parallel_slots to its run_server default. With --parallel now denied as a llama_extra_args pass-through, that flow had no first-class way to raise concurrency.
- unsloth_cli/commands/studio.py: add --parallel / --n-parallel typer Option (default 4, range 1..64) to studio_default, forward through the venv re-exec, and pass llama_parallel_slots= to run_server in the in-venv path.
- studio/backend/run.py: argparse --parallel / --n-parallel with the same range guard so the spawned child accepts the forwarded flag.
- unsloth_cli/tests/test_studio_run_parallel_flag.py: test pins the new option presence, aliases, default and range guards.

* Round 12: narrow entry-point gate, preserve pre-PR plain-studio default, drop brittle source-text test

Three Opus subagent reviewers (security / backcompat / code-quality) flagged the same handful of real issues. Consensus fixes:
- unsloth_cli/__init__.py: narrow the -np canonicaliser gate to just {unsloth, unsloth.exe} (the only pyproject-declared console_script). The previous cli.py / unsloth-cli.py entries would silently rewrite sys.argv for any third-party myproj/cli.py that happens to import unsloth_cli. Dev users running python cli.py ... -np N still work via the space form, which parses without the rewrite.
- unsloth_cli/commands/studio.py + studio/backend/run.py: restore the pre-PR llama_parallel_slots default of 1 on plain unsloth studio and python studio/backend/run.py. unsloth studio run keeps its pre-PR hardcoded default of 4. Without this, my earlier API-path parity commit silently dropped per-call context to ctx/4 for the plain-studio flow.
- unsloth_cli/tests/test_studio_run_parallel_flag.py: drop the brittle source-text grep test (test_run_kwargs_use_parallel_value). The parametrised runtime test test_in_venv_path_passes_parallel_to_run_server already pins the same intent against actual behaviour.
- unsloth_cli/tests/test_studio_run_short_alias_clashes.py: pin the narrow entry-point gate with a parametrised negative test covering seven third-party argv[0] basenames (cli.py, /path/myproj/cli.py, pytest, unsloth-cli, etc.). Re-broadening the gate now trips a test instead of silently mutating an unrelated CLI's argv.
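The Round 12 entry-point gate reduces to an exact basename check on `argv[0]`. The sketch below is a hypothetical helper, not the code in `unsloth_cli/__init__.py`; it uses `PureWindowsPath` (an assumption on my part) because that class splits on both `/` and `\`, so the same check handles POSIX paths and Windows launcher paths regardless of the host OS.

```python
from pathlib import PureWindowsPath

# Per Round 12, only the pyproject console script may trigger argv rewriting.
ENTRY_BASENAMES = frozenset({"unsloth", "unsloth.exe"})


def should_rewrite_argv(argv0: str) -> bool:
    """True only when argv[0] names the unsloth console script itself.

    PureWindowsPath treats both "/" and "\\" as separators, so the same
    check works for POSIX paths and Windows launcher paths.
    """
    return PureWindowsPath(argv0).name in ENTRY_BASENAMES
```

A caller would then apply the rewriter only when `should_rewrite_argv(sys.argv[0])` is true, leaving argv alone for pytest, notebooks, or an unrelated `myproj/cli.py`.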
* Round 13: shared parallel constants, denylist invariant test, defence-in-depth

Three Opus subagent reviewers (adversarial-user / maintenance / cross-file consistency) flagged a consistent set of cleanups; folded into one commit to avoid the pre-commit.ci force-push race.

unsloth_cli/commands/studio.py:
- Extract _PARALLEL_MIN / _PARALLEL_MAX / _PARALLEL_DEFAULT_RUN / _PARALLEL_DEFAULT_PLAIN module-level constants and use them in both typer Options (plain studio_default = 1, studio run = 4).
- _expand_attached_np_short now rewrites -np<junk> when the suffix starts with a digit (or signed digit) so '-np8x' surfaces as a clean '-np takes an int' typer error instead of a baffling '--port invalid' complaint after Click clusters '-n -p 8x'.
- Re-exec forwarding emits --load-in-4bit / --no-load-in-4bit explicitly in both directions; previously the True default relied on both layers sharing the same default forever.
- run() docstring now explicitly says --parallel / -np pass-through via llama_extra_args is denied (use the typer flag above).

studio/backend/run.py:
- Mirror the parallel constants and route the argparse default, range check, and error message through them. Help text mentions the asymmetry with 'unsloth studio run' so direct-launch dev users aren't confused by Default 1 in isolation.

studio/backend/core/inference/llama_server_args.py:
- _flag_name strips surrounding whitespace before denylist lookup so a caller can't slip a managed flag past the boundary with a trailing space (the trimmed form is what downstream parsers see).

Tests:
- New typer-aliases-subset-of-denylist invariant: every alias the typer Option claims as --parallel on run() MUST be in the backend parallel denylist group. Catches the failure mode where someone adds a new alias and forgets the boundary.
- Extended denylist parametrize to cover ~14 previously untested aliases (-mu, -dr, -hfv/-hfrv/-hffv family, -mmu, full --ui group, --models-preset / --models-autoload / --no-models-autoload).
- Whitespace-padded denylist rejection (' --parallel', '-np ', etc).
- --load-in-4bit re-exec test pinning both polarities + default.
- -np<junk> argv rewriter regression tests.
- Cross-reference headers between the two test files.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: repair mlx studio base export save_method (#5727)

* Round 14: align backend -np recogniser with CLI rewriter + reject parent --parallel

Round 14 (reviewer.py --parallel 20 with gpt-5.3-codex-spark) flagged two real P1s and a stale-rebase warning. All three addressed.
- studio/backend/core/inference/llama_server_args.py: widen _flag_name so -np<digit-prefix> with trailing junk (-np8x, -np-1foo, -np+1bar, -np9zzz) classifies as managed flag -np, matching the CLI _expand_attached_np_short rewriter. Without this, POST /api/inference/load with llama_extra_args=['-np8x'] slipped past the boundary while the CLI canonicalised the same form. The two sides now agree on every digit-prefix form.
- unsloth_cli/commands/studio.py: reject --parallel on the studio group when a subcommand is invoked. Pre-PR the studio callback had no --parallel; my Round 12 addition made 'unsloth studio --parallel 8 run ...' silently drop the 8 because typer doesn't propagate parent options into subcommand kwargs. Now errors with exit 2 and a message pointing the operator at the correct invocation ('unsloth studio run --parallel 8 ...').
- Picked up origin/main via merge (parent commit 0caf0526): the pre-flight stale-rebase detector found 2 lines on main in studio/backend/core/export/export.py missing from PR HEAD. Merged cleanly with no conflicts.

Tests:
- Parametrised denylist coverage for -np<digit-prefix>+junk forms.
- New runtime test confirms exit 2 + helpful error when the group --parallel is supplied alongside an invoked subcommand.
- Test that the default group --parallel value still lets a subcommand resolve (no false-positive rejection).
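The Round 13 shared constants and the aliases-subset-of-denylist invariant can be sketched in a few lines. The constant values come from this log; the two alias sets are assumptions standing in for the typer Option's declared names and the backend's parallel denylist group, and all names here are hypothetical mirrors rather than the repository's code.

```python
# Hypothetical mirrors of the Round 13 constants; values taken from this log.
PARALLEL_MIN = 1
PARALLEL_MAX = 64
PARALLEL_DEFAULT_PLAIN = 1  # plain `unsloth studio`
PARALLEL_DEFAULT_RUN = 4    # `unsloth studio run`

# Assumed alias sets: what the typer option declares vs. the backend denylist group.
TYPER_PARALLEL_ALIASES = ("--parallel", "--n-parallel", "-np")
BACKEND_PARALLEL_GROUP = frozenset({"--parallel", "--n-parallel", "-np"})


def check_parallel(value: int) -> int:
    """Range guard shared by the CLI option and the backend argparse flag."""
    if not PARALLEL_MIN <= value <= PARALLEL_MAX:
        raise ValueError(f"--parallel must be in [{PARALLEL_MIN}, {PARALLEL_MAX}], got {value}")
    return value


def test_typer_aliases_subset_of_denylist() -> None:
    """Fails if a CLI alias for --parallel is not also denied as a pass-through."""
    missing = set(TYPER_PARALLEL_ALIASES) - BACKEND_PARALLEL_GROUP
    assert not missing, f"aliases not denied at the boundary: {sorted(missing)}"
```

The invariant is a subset check rather than equality, so the backend group is free to deny extra spellings that the CLI never exposes.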
* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: tighten code comments across --parallel PR

Comment-only pass over the seven PR-touched files; trim verbose docstrings, collapse multi-line section dividers, and drop redundant prose that the code already conveys. No behaviour change.

* Studio: trim remaining verbose docstrings missed in last pass

Shorten the test_studio_run_parallel_flag.py module docstring and the `Re-exec arg-builder coverage` block. No behaviour change.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: second comment-tightening pass across PR-touched code

Trim docstrings and inline comments in studio.py, run.py, llama_server_args.py, and unsloth_cli/__init__.py. No behaviour change; all 215 tests still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: deny --embedding / --rerank / --tools pass-through

`--embedding` and `--rerank` flip llama-server into single-endpoint mode, which breaks Studio's /v1/chat/completions hop. llama-server's own `--tools` flag silently stacks on top of Studio's tool policy resolved by `--enable-tools` / `--disable-tools`. Add all three (plus the `--embeddings` / `--reranking` plural aliases) to the boundary denylist so HTTP /load and pass-through extras both reject them cleanly instead of silently desyncing the server surface.

Test added to the existing `test_denylist_rejects_all_aliases` parametrize. 220 tests pass.

* Studio: make PR-touched tests robust to minimal envs + Windows

Two cross-OS CI findings:

1. `test_typer_parallel_aliases_are_subset_of_backend_denylist` was doing `from core.inference.llama_server_args import _DENYLIST_GROUPS` which triggers `core/inference/__init__.py` and pulls in the full backend chain (fastapi / structlog / loggers / utils.hardware). The invariant only needs the constants tuple, so load the module directly via `importlib.util.spec_from_file_location` -- the test now runs with just typer + pytest installed.
2. `test_legacy_frontend_alias_still_promotes_to_frontend` asserted the literal string `"/tmp/dist"` after the value round-trips through `Path()`. On Windows `str(Path("/tmp/dist"))` is `"\tmp\dist"`, so the assertion tripped on the same logical path. Compare via `Path(x) == Path("/tmp/dist")` so the test passes on every OS.

Both surfaced by the staging-4 cross-OS CI; no production-code change. 220 tests still pass locally.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: load llama_server_args.py directly in its unit tests

Same fix as the previous CLI-test commit: import the module via `importlib.util.spec_from_file_location` instead of `from core.inference.llama_server_args import ...`, so the test no longer needs the full backend chain (fastapi / structlog / loggers / utils.hardware) installed via `core/inference/__init__.py`. The boundary validator is intentionally dependency-free; its unit tests should reflect that.

* Fix test_main_composer_has_dir_auto anchor after PR #5784

PR #5784 ("Improve image generation UI") rewrote the message-input textarea's static `aria-label="Message input"` into a JSX conditional `aria-label={overlay ? "Image edit instructions" : "Message input"}` but did not update the RTL bidi-attribute regression test, leaving the literal-string `find('aria-label="Message input"')` anchor with no match. The `Repo tests (CPU)` job has been red on main since.

Anchor on the inner `"Message input"` string literal instead -- it survives both spellings and still pins the same textarea element so the `dir="auto"` assertion has the right block to inspect.

Verified by re-running the exact CI command: 954 passed, 3 skipped, 23 deselected (was 948 passed, 1 failed).
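Loading a single file by path, so that its package's `__init__.py` (and the heavy backend imports it triggers) never runs, is standard `importlib` machinery. Below is a minimal sketch of that pattern plus the separator-agnostic path comparison from the Windows fix; the helper names are hypothetical, and the real tests presumably compute the path to `llama_server_args.py` relative to the test file, which this log does not show.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType


def load_module_from_path(name: str, path: Path) -> ModuleType:
    """Import one .py file directly, bypassing its package's __init__.py."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so code that looks itself up in sys.modules
    # (e.g. dataclasses) can find the module.
    sys.modules[name] = module
    spec.loader.exec_module(module)
    return module


def same_path(a: str, b: str) -> bool:
    """Compare as Path objects, which is separator-agnostic unlike str(Path(...))."""
    return Path(a) == Path(b)
```

Only the target file executes, so a dependency-free module stays testable with nothing beyond the standard library and pytest installed.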
---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Long Yixing <longyixing331@gmail.com>

Bundles three independent CI regressions hitting the maintainer PR backlog. Each one is verified end-to-end on a staging fork against real Ubuntu / macOS / Windows GitHub-hosted runners before this lands.

1. Windows --no-torch install: pydantic + pydantic-core drift to incompatible versions under `uv pip install --no-deps -r no-torch-runtime.txt` because pip resolves each independently from latest. pydantic.VERSION 2.13.4 pins pydantic-core==2.46.4 but pydantic-core 2.47.0 was the freshest published wheel, so `import pydantic` raised `SystemError: pydantic-core 2.47.0 is incompatible with the current pydantic version`. Resolve pydantic WITH deps in a focused pip call (install.sh, install.ps1, install_python_stack.py) before the --no-deps no-torch-runtime pass so pip pins pydantic-core to the version pydantic declares. pydantic's transitive deps (annotated-types, pydantic-core, typing-extensions, typing-inspection) are torch-free. Drop the redundant `Patch Studio venv with full typer / pydantic dep trees` workaround from the four Windows smoke YAMLs. Supersedes unslothai#5733 + unslothai#5734.

2. Linux Studio Update CI: upstream llama.cpp b9261+ split each binary's entry code into a paired `libllama-<binary>-impl.so` shared library. `llama-server` and `llama-quantize` NEEDED-link against `libllama-server-impl.so` / `libllama-quantize-impl.so` with RUNPATH `$ORIGIN`, so the prebuilt overlay must copy those alongside the binaries. Without that, ldd reports them missing, preflight rejects, the installer falls back to source build, and studio-update-smoke annotates `setup.sh idempotency regressed`. Add `libllama-*-impl.so*` to the Linux runtime patterns and lock the pattern in test_rocm_support.TestRuntimePatterns.

3. Mac Studio UI Chat: change-password submit clicked while disabled. The disable gate only checked new + confirm password length, but Playwright's first click landed before the current-password field's React state had committed, so the form was simultaneously logically-invalid (current_password empty) and the button was disabled. Tighten the gate to require `currentPassword.length >= 8` and mirror the same check in the submit handler so Enter / autofill cannot bypass. Supersedes unslothai#5738.
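The fix itself lives in the shell and PowerShell installers, but the invariant it restores is easy to check from Python after an install. The sketch below is a post-install sanity check, not part of this PR: it reads pydantic's `Requires-Dist` pin for pydantic-core via `importlib.metadata` and compares it with the installed pydantic-core. The function names are hypothetical, and the exact form of the first install command in the header comment is an assumption.

```python
# Install order described above (exact first command is an assumption):
#   uv pip install pydantic                            # resolves pydantic-core with deps
#   uv pip install --no-deps -r no-torch-runtime.txt   # everything else, no resolution
import re
from importlib.metadata import PackageNotFoundError, requires, version
from typing import Optional

# Matches an exact pin such as "pydantic-core==2.46.4" in Requires-Dist.
_CORE_PIN = re.compile(r"^pydantic[-_]core\s*==\s*([^\s;,]+)")


def expected_core_version(pydantic_requires: list[str]) -> Optional[str]:
    """Return the pydantic-core version pydantic's metadata pins, if any."""
    for req in pydantic_requires:
        match = _CORE_PIN.match(req)
        if match:
            return match.group(1)
    return None


def core_mismatch(pydantic_requires: list[str], installed_core: str) -> Optional[str]:
    """Describe a pydantic / pydantic-core mismatch, or return None if consistent."""
    want = expected_core_version(pydantic_requires)
    if want is not None and want != installed_core:
        return f"pydantic wants pydantic-core=={want}, found {installed_core}"
    return None


def check_installed_pair() -> Optional[str]:
    """Run the same check against whatever is installed in this interpreter."""
    try:
        return core_mismatch(requires("pydantic") or [], version("pydantic-core"))
    except PackageNotFoundError as exc:
        return f"not installed: {exc}"
```

Reading the installed metadata instead of importing pydantic avoids tripping the very `SystemError` the check is meant to diagnose.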

…nslothai#5741 comments (unslothai#5746)

* ci: broaden Linux llama.cpp runtime pattern to lib*.so*

unslothai#5741 patched the explicit Linux pattern list to add ``libllama-*-impl.so*`` after ggml-org/llama.cpp#23462 (between b9279 and b9283) split each binary's entry code into a paired ``lib<binary>-impl.so`` shared library. The same class of upstream repackaging will hit us again whenever a new shared lib is added. Mirror what macOS already does and replace the per-lib list with a single ``lib*.so*`` glob.

``copy_globs`` (line 3614) unions patterns, so the per-variant ``libggml-cuda.so*`` / ``libggml-hip.so*`` entries were never filtering anything; the spec lives in ``runtime_payload_health_groups`` (line 5209) which keeps the explicit minimum-required list per variant.

Dry-run against b9296-bin-ubuntu-x64.tar.gz: 40 files copied (all ggml, llama, mtmd, impl variants + the two binaries we ship), 22 skipped (other CLIs, rpc-server, LICENSE). Functionally equal to the post-unslothai#5741 set.

* cleanup: trim unslothai#5741 comments on the pydantic split

Comments added in unslothai#5741 explained the original bug in full each time. They are mostly redundant with the commit message and the PR. Trim them to one short paragraph per site. No behavior change.

* ci: narrow Windows runtime pattern to llama-server.exe + llama-quantize.exe

Studio only invokes llama-server and llama-quantize. Mac and Linux already filter to those two binaries; Windows was the odd one out with ``*.exe`` copying every CLI upstream ships (llama-cli, llama-bench, llama-mtmd-cli, ...).

Dry-run on b9296 (win cpu-x64, cpu-arm64, cuda-13.1, hip-radeon): 20 unused EXEs skipped per variant, all DLLs (incl. the new llama-*-impl.dll family) still copied via ``*.dll``. ``existing_install_matches_choice`` already checks llama-server.exe exists explicitly (line 5297), so the health gate is unchanged.
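Because `copy_globs` unions its patterns, a file is copied if it matches any one of them. The sketch below illustrates that selection with `fnmatch.fnmatchcase` (case-sensitive on every OS, unlike `fnmatch.fnmatch`). The pattern tuples paraphrase this commit message and are not the installer's literal lists; `select_runtime_files` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Paraphrased from the commit message above; not the installer's literal lists.
LINUX_PATTERNS = ("llama-server", "llama-quantize", "lib*.so*")
WINDOWS_PATTERNS = ("llama-server.exe", "llama-quantize.exe", "*.dll")


def select_runtime_files(names: list[str], patterns: tuple[str, ...]) -> list[str]:
    """Keep a file if it matches ANY pattern (patterns are unioned)."""
    return sorted(n for n in names if any(fnmatchcase(n, p) for p in patterns))
```

With a union, one broad glob such as `lib*.so*` subsumes narrower per-library entries like `libggml-cuda.so*`, which is why those entries were never filtering anything; the minimum-required check has to live elsewhere, as the commit notes.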

danielhanchen requested a review from rolandtannous as a code owner May 23, 2026 13:20

gemini-code-assist Bot reviewed May 23, 2026

danielhanchen merged commit 83b2097 into main May 23, 2026
49 of 51 checks passed

danielhanchen deleted the fix/ci-pydantic-llamacpp-changepw branch May 23, 2026 13:59

This was referenced May 23, 2026

Studio install: deterministic pydantic install on no-torch path #5733

Closed

ci: drop redundant pydantic/typer patch step from Windows smoke jobs #5734

Closed

Studio: gate change-password submit on current_password length #5738

Closed

danielhanchen added a commit that referenced this pull request May 23, 2026

ci: re-trigger after flake in Studio GGUF Tool calling (rebased on main #5741 already)

a2416fc

danielhanchen mentioned this pull request May 24, 2026

ci: broaden Linux + narrow Windows llama.cpp runtime patterns + trim #5741 comments #5746

Merged

ci: unblock Studio Windows + Linux + Mac smoke (supersedes #5733, #5734, #5738)#5741

danielhanchen merged 1 commit into main from fix/ci-pydantic-llamacpp-changepw

danielhanchen commented May 23, 2026

gemini-code-assist Bot left a comment

danielhanchen commented May 23, 2026

Findings on the last 20 PRs

Bundled fixes

1. Windows --no-torch install: pydantic-core mismatch (supersedes #5733 + #5734)

2. Linux Studio Update CI: llama.cpp falls back to source build

3. Mac Studio UI Chat: change-password submit clicked while disabled (supersedes #5738)

Validation

Closes
