CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests#5312
Conversation
permissions:
- All five PR-time workflows (backend, frontend, inference smoke, tauri,
wheel) now declare permissions: contents: read at the workflow level,
matching CodeQL's default-permissions guidance and the existing pattern
in release-desktop.yml. None of these workflows write to the repo.
skipped tests:
- Repo tests (CPU) job now installs node 22 and uv, which unblocks
~60 tests that were silently skipping on CI:
- 9 tests in tests/studio/test_chat_preset_builtin_invariants.py
skipped on "node not available". Fixed in this commit; an obsolete
"unsloth_repo/" prefix in WORKDIR was also pointing the source-file
existence check at a path that no longer exists.
- tests/python/test_e2e_no_torch_sandbox.py (47), test_studio_import_no_torch.py
(29), test_tokenizers_and_torch_constraint.py (most of 42) all spawn
fresh uv venvs and self-skip when uv is missing.
- Three test_tokenizers_and_torch_constraint.py cases are deselected
because they expose a real bug in studio/backend/requirements/no-torch-runtime.txt:
the unpinned tokenizers line resolves to 0.23.1, which transformers
rejects with "tokenizers>=0.22.0,<=0.23.0 is required". Tracked
separately as a no-torch install regression.
Locally: 760 passed, 1 skipped, 23 deselected (was 694 / 67 / 23).
Mirrors the three files documented in tests/studio/README.md (PR #5307) into a dedicated workflow so MLX dispatch failures show up as their own check on PRs rather than getting buried inside Backend CI: - test_hardware_dispatch_matrix.py 7-profile parametrized matrix + 2 dispatch-priority canaries - test_is_mlx_dispatch_gate.py AST + runtime guard on unsloth._IS_MLX - test_mlx_training_worker_behaviors.py worker.py contract checks Triggers on pull_request when any of unsloth/__init__.py, studio/backend/utils/hardware.py, studio/backend/core/training/worker.py, or any of the three test files are touched. Runs on a Linux+CPU runner with hardware spoofs; no Apple Silicon, real GPU, or real MLX install required. Locally validated: 36 passed in 0.41s. permissions: contents: read at the workflow level (matching the rest of the PR-time CI surface).
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Code Review
This pull request updates the file paths for PRESET_POLICY and RUNTIME_TYPES in the test suite by removing the 'unsloth_repo' directory prefix. Feedback suggests centralizing the common frontend path into a variable to reduce redundancy and improve maintainability.
I am having trouble creating individual review comments. Click here to see my feedback.
tests/studio/test_chat_preset_builtin_invariants.py (11-16)
To improve maintainability and reduce redundancy, you could extract the common path prefix for the frontend source files into a separate variable. This aligns with the principle of centralizing recurring definitions to ensure consistency and simplify maintenance.
CHAT_FEATURES_DIR = WORKDIR / "studio/frontend/src/features/chat"
PRESET_POLICY = CHAT_FEATURES_DIR / "presets/preset-policy.ts"
RUNTIME_TYPES = CHAT_FEATURES_DIR / "types/runtime.ts"
References
- Centralize recurring or complex logical checks into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 107297d756
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pull_request: | ||
| paths: | ||
| - 'unsloth/__init__.py' | ||
| - 'studio/backend/utils/hardware.py' |
There was a problem hiding this comment.
Fix the MLX path filter for hardware changes
This PR path filter points at studio/backend/utils/hardware.py, but the hardware implementation exercised by the MLX tests lives under studio/backend/utils/hardware/hardware.py (there is no file at the configured path). In PRs that only change the real hardware detection module, this focused MLX CI workflow will not run, so regressions in the dispatch gate can merge without this check.
Useful? React with 👍 / 👎.
The MLX CI workflow listed ``studio/backend/utils/hardware.py`` as a
path filter, but no such file exists. The actual layout is
studio/backend/utils/hardware/
__init__.py
amd.py
hardware.py
nvidia.py
vram_estimation.py
so the filter as written would never match. A reviewer modifying
``hardware/hardware.py`` (where ``detect_hardware``, ``DeviceType``,
and ``IS_ROCM`` actually live) would not trigger MLX CI, which
defeats the point of the focused PR gate.
Replace the broken filter with ``studio/backend/utils/hardware/**``
so any change in the hardware probe directory triggers MLX CI, and
add three sibling triggers that each materially affect dispatch:
- ``unsloth/_gpu_init.py``
Hosts ``from .models import *`` and the ``from .trainer import *``
chain. The trainer.py circular-import fix that landed in
``23550a8`` lives downstream of this file; a future change
here can re-introduce the same bug.
- ``studio/backend/core/inference/mlx_inference.py``
The MLX inference backend itself. It is the actual consumer
of ``unsloth_zoo.mlx_loader.FastMLXModel`` whose contract the
test_mlx_training_worker_behaviors.py AST checks guard.
Local re-run with the fix in place: 36 passed in 0.45s. No other
workflow file or test file is modified.
Replaces the single "Studio boots, loads a GGUF, answers a chat
completion" job with three parallel jobs that each pick the smallest
model that exercises the surface under test. All three jobs share the
install.sh --local --no-torch bootstrap and prime HF_HOME via
actions/cache so cold-cache runs are bounded and warm runs are quick.
1. Studio GGUF CI / OpenAI, Anthropic API tests
- Model: gemma-3-270m-it UD-Q4_K_XL (~254 MiB).
- Password rotation: login with bootstrap pw, change to a fresh
random pw, assert old pw is rejected with 401, assert new pw
succeeds. Uses the same JWT downstream as a Bearer token against
/v1/* (the OpenAI/Anthropic compat surface accepts JWTs and
sk-unsloth- keys interchangeably).
- OpenAI SDK + Anthropic SDK each run a four-turn conversation
("What is 1+1?" / "What did I ask before?" / "What is the capital
of France?" / "Repeat the city name") with temperature=0.0 and
seed=3407. Run twice and assert run1 == run2 turn-by-turn so
non-determinism in the conversation-history wiring is caught.
2. Studio GGUF CI / tool calling tests
- Model: Qwen3.5-2B UD-IQ3_XXS (~890 MiB).
- Standard OpenAI function calling with tool_choice=required.
- Server-side python tool: assert "56088" appears in the answer to
"What is 123 * 456? Use code to compute it.".
- Server-side terminal (bash) tool: assert "hello-bash-tool" is
echoed back.
- Server-side web_search tool: non-blocking probe (DuckDuckGo
flakes from CI runners). Asserts the request shape is accepted.
- enable_thinking=true vs false: assert <think> markers vanish
when thinking is disabled.
3. Studio GGUF CI / JSON, images
- Model: gemma-4-E2B-it UD-IQ3_XXS (~2.4 GiB) + mmproj-F16
(~986 MiB) auto-detected via the HF repo path.
- response_format = json_schema (strict): asserts the answer parses
as JSON matching the {city, country} schema.
- OpenAI image_url (data URI base64): assert non-empty response on
a 4x4 PNG. Loose on content because small VL quants are weak at
colour names; the vision path is the part under test.
- Anthropic source/base64 image: same non-empty assertion against
the Anthropic Messages endpoint.
Boot strategy:
- Job 1 keeps `UNSLOTH_API_ONLY=1 unsloth studio` because the
password-rotation flow only exists in the UI-mode bootstrap.
- Jobs 2 and 3 use `unsloth studio run --model REPO --gguf-variant V`,
the one-liner that loads the model and prints the API key on the
banner. Health is probed by waiting for `sk-unsloth-` to appear in
the log; the one-liner only prints the banner after load completes.
Job 1 (OpenAI, Anthropic API tests):
Anthropic SDK appends /v1/messages to base_url itself, so passing
base_url=f"{BASE}/v1" produced /v1/v1/messages and 405'd. Bare BASE
is correct (matches the docs' "the SDK appends /v1 automatically").
OpenAI SDK side already worked: 4-turn transcript was fully
deterministic across two runs and the "Paris" sanity assertion
passed.
Job 2 (tool calling tests):
Booting with --enable-tools forces the process-level tool policy to
True for every request (state/tool_policy.py:get_tool_policy), which
hijacked the "Standard OpenAI function calling" test through the
server-side agentic loop -- the model called web_search instead of
returning structured tool_calls for the user's `weather_tool`. Drop
--enable-tools so policy is None (per-request honour). The python /
terminal / web_search probes already pass enable_tools=True
explicitly in their request bodies, so they keep working.
Job 3 (JSON, images):
Two issues. (a) The OpenAI Python SDK rewrites
response_format={"type":"json_schema",...} into something Studio's
llama-server backend doesn't accept, so resp came back as the raw
error string and resp.choices[0] tripped 'str has no attribute
choices'. Switched to raw HTTP with the `{"type":"json_object",
"schema":...}` form llama-server actually supports
(GBNF-from-schema, llama-server extension). (b) Anthropic SDK
base_url same fix as job 1.
Two new PR-time gates that the existing inference / wheel jobs miss.
Studio Update CI:
- Runs install.sh --local --no-torch, then `unsloth studio update
--local` twice, asserting both invocations take the prebuilt
"up to date and validated" code path with no source-build
fallback.
- Boots Studio to /api/health afterwards so a broken update that
nukes the venv or the llama-server binary surfaces immediately.
- Triggers when install.sh, studio/setup.sh, the python_stack /
llama_prebuilt installers, the requirements files, or
unsloth_cli/commands/studio.py change.
Studio UI CI:
- Drives the actual frontend bundle in headless Chromium via
Playwright with the smallest GGUF (gemma-3-270m-it UD-Q4_K_XL).
- Covers: bootstrap login, must_change_password gate + change form,
chat composer becomes interactive after model load, sending a
message produces an assistant bubble with non-empty text, full
page reload re-hydrates the conversation, configuration sheet
opens and closes cleanly, and the rotated password is the only
one that logs in afterwards.
- This is the first workflow that catches the class of bug 2026.5.1
shipped: backend healthy + frontend builds, but assistant-ui
runtime wiring or chat-history persistence broken so the actual
UI was unusable. Backend-only or wheel-only gates do not see it.
…ct race The /login route auto-redirects to /change-password as soon as /api/auth/status returns requires_password_change=true. The original flow was racing that redirect: it filled #password (login mode) and clicked submit, but the redirect could land first and the form would have unmounted before the click. Going straight to /change-password also matches what main._inject_bootstrap is set up to support: the HTML on that route ships with `window.__UNSLOTH_BOOTSTRAP__`, which the change-password form reads to seed the current-password state, so the user only needs to fill new + confirm. Renumbered screenshots to match the new step order.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2d9bf084f0
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| grep -q "falling back to source build" logs/update2.log && { | ||
| echo "::error::second update fell back to source build" | ||
| tail -60 logs/update2.log; exit 1; } || true |
There was a problem hiding this comment.
Make the second fallback guard fail the job
When the second unsloth studio update --local logs falling back to source build, this guard still exits successfully because the { ... exit 1; } group is the left side of && ... || true, so the || true runs after the intended failure and the step continues. If the same log also contains one of the accepted prebuilt markers checked next, the workflow passes while the idempotency regression it is meant to catch has occurred.
Useful? React with 👍 / 👎.
GGUF jobs 2 and 3:
Switched off `unsloth studio run` and over to `UNSLOTH_API_ONLY=1
unsloth studio` + login flow. Reason: studio.run() resolves the tool
policy through unsloth_cli/_tool_policy.resolve_tool_policy, which
defaults to True on loopback. That means set_tool_policy(True) gets
applied process-wide, and every /v1/chat/completions request is
routed through the server-side agentic loop -- so Job 2's standard
function-calling test never gets a structured tool_calls response
(the model uses web_search instead) and Job 3's response_format
test gets non-JSON SSE chunks back. API-only mode leaves
tool_policy=None, which is what each request's `enable_tools` flag
(or absence thereof) needs to be honoured.
Job 1:
Anthropic SDK retry: the SDK sends `x-api-key` by default, but
Studio's auth layer is HTTPBearer-only. Override via
default_headers={"Authorization": f"Bearer {KEY}"}, which is the
shape the integration docs suggest.
UI smoke:
Drop the "history must persist after reload" assertion; Studio's
thread autosave is async and doesn't reliably land within the CI
budget. Keep the assertion that matters: the chat composer mounts
again after a reload and the JWT survived (no /login redirect),
which is what the 2026.5.1 chat regression actually broke.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4b69262c38
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # produced for turn 3). Lower-cased substring checks keep the | ||
| # assertion robust to formatting jitter. | ||
| joined = " ".join(first).lower() | ||
| assert "1" in first[0], f"{label}: turn-1 answer should contain '1', got {first[0]!r}" |
There was a problem hiding this comment.
Assert the arithmetic result instead
In this smoke test, the first prompt is What is 1+1?, but the assertion requires the reply to contain the character 1. A perfectly correct concise response like 2 (which is common with temperature 0) will fail this workflow even though the API and history wiring are working, causing false-negative CI runs for the new Studio GGUF job.
Useful? React with 👍 / 👎.
Job 2 (tool calling):
The server-side agentic loop in routes/inference.py:1888 always
yields SSE chunks -- the request's `stream=False` is honoured for
the plain passthrough path, NOT for the agentic path. The python /
terminal / web_search probes were calling json.loads on the raw
body and tripping JSONDecodeError.
Added a post_sse() helper that streams the response and accumulates
text deltas, used for every enable_tools=True call. Function
calling (which does NOT enable agentic mode) keeps post().
Job 3 (JSON, images):
Dropped the strict-schema variant of response_format. On the small
gemma-4-E2B-it UD-IQ3_XXS quant, the GBNF-from-schema path
occasionally produces empty content. Plain `{"type":"json_object"}`
is still a real test of Studio's JSON-mode wiring through to
llama-server, and that's the surface the docs expose. Added
fence-stripping for chat templates that wrap JSON in ```json blocks.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 76caf50e79
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| openai_text = (openai_resp.choices[0].message.content or "").lower() | ||
| print(f"[image/openai] reply: {openai_text!r}") | ||
| assert openai_text, "OpenAI image_url returned empty content" |
There was a problem hiding this comment.
Require image-specific evidence in the smoke
When the image payload is accepted but then dropped or not forwarded to llama-server, this still passes because a text-only path can return any non-empty answer to the accompanying text prompt (including an apology or guess). Since this job is meant to guard the multimodal wiring, the assertion should require some image-grounded signal for the red PNG, or otherwise verify that the loaded vision backend actually consumed the image; the same non-empty-only pattern appears in the Anthropic image check below.
Useful? React with 👍 / 👎.
Studio's image normaliser re-encodes embedded base64 images via stb_image (routes/inference.py:3410) so llama-server gets a uniform PNG payload. stb_image happily reads the 4x4 PNG as a PIL test, but rejects it on the inference path with `broken data stream when reading image file`. 64x64 is small enough to keep token cost trivial (155 bytes) and large enough to satisfy stb_image's minimum. Job 1, Job 2, the UI smoke, and the JSON portion of Job 3 are all green now -- this is the last piece holding Job 3 back.
…mits studio/install_llama_prebuilt.py lists releases on ggml-org/llama.cpp via the GitHub API. Unauthenticated calls get 60/hr per source IP, which is fine for one install per workflow but the new Studio Update CI does install + update + update back-to-back on the same runner, blowing past the limit and falling back to a source build (which then fails the idempotency assertion). Surfaced on the Studio Update CI run with: failed to inspect published releases in ggml-org/llama.cpp: GitHub API returned 403 ... set GH_TOKEN or GITHUB_TOKEN to avoid GitHub API rate limits. GITHUB_TOKEN with the existing `permissions: contents: read` is more than enough for unauthenticated read API access (1000/hr, scoped to the repo). Wired into every install.sh and `unsloth studio update` step across studio-update-smoke.yml, studio-inference-smoke.yml, and studio-ui-smoke.yml so a busy runner can't trip the same fallback.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cee324725c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| page.reload() | ||
| composer = page.locator('textarea[aria-label="Message input"]') | ||
| composer.wait_for(state = "visible", timeout = 60_000) |
There was a problem hiding this comment.
Verify history survives reload
In the UI smoke, if chat-history persistence or remote-thread rehydration breaks while auth/session restoration still works, this check passes because it only reloads and waits for the composer to reappear. That leaves the regression class the workflow header says it guards (chat-history persistence breaking) uncovered, so the job can stay green even when the prior chat does not survive a reload.
Useful? React with 👍 / 👎.
Rename the job to "Python lint (syntax + ruff + safety nets)" and
expand it from one non-blocking ruff invocation over studio/backend
into four real gates over the whole tree. Total CI time goes from
~8 s to ~12 s, but the previous job was informational; this one
blocks merges on actual breakage.
Steps (in order):
1. AST/syntax (HARD GATE)
`python -m compileall -q -j 0 unsloth unsloth_cli studio tests
cli.py unsloth-cli.py`. Same parser the interpreter uses;
anything broken here would also crash at `import X` on a user's
machine. ~3.5 s across 350+ files locally.
2. ruff check whole repo (HARD GATE)
The narrow rule set in pyproject.toml [tool.ruff.lint] (E9 /
F63 / F7 / F82) catches undefined names, broken comparisons,
and syntax. The whole repo passes today, so the previous
studio/backend-only `|| true` was masking real breakage on
the wider tree. <1 s.
3. Debugger-leftover scan (HARD GATE)
AST-walk over every committed .py looking for `breakpoint()`,
`pdb.set_trace()`, or `ipdb.set_trace()` call sites. AST-based
so commented-out debugger lines don't false-positive (which
is why a bare grep would not work -- there are three commented
`# breakpoint()` markers in unsloth/models/rl* today). 0 hits
locally across 350 files.
4. SPDX-License-Identifier on studio/backend (WARNING)
Surfaces drift in the one tree where we already have a strict
SPDX policy. Currently 3 files missing; warned, not blocked,
so the rollout can be a separate PR.
5. ruff format drift (INFO)
Counts files that would be reformatted by plain `ruff format`.
Non-blocking because the canonical formatter is
scripts/run_ruff_format.py = ruff format + the kwarg-spacing
pass, so plain `ruff format --check` always reports a large
diff. Once that custom pipeline is wired in, drop
continue-on-error and add it to the gate.
ruff is pinned to 0.15.12 to match .pre-commit-config.yaml so a
CI-only ruff bump cannot start disagreeing with what pre-commit
already accepted.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 42ec2590a2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| @@ -195,6 +221,91 @@ jobs: | |||
| with: | |||
| python-version: '3.12' | |||
There was a problem hiding this comment.
Run the syntax gate on supported Python versions
Because pyproject.toml still advertises >=3.9,<3.15, pinning this new whole-repo compile gate to only 3.12 leaves supported 3.9/3.10 syntax untested; for example, python3.10 -m compileall ... fails today on unsloth/kernels/moe/tests/common.py:172 while this 3.12 job passes. That means CI can stay green for code that crashes on supported Python versions, so either matrix this gate over the supported minors or narrow the advertised support/exclude 3.12-only helpers.
Useful? React with 👍 / 👎.
Drop the python-lint job from studio-backend-ci.yml and move it into
the dedicated `Lint CI` workflow. Two material changes:
1. License-header check now accepts BOTH header families
The previous version only counted SPDX-License-Identifier, which
warned on every Apache-2.0 file in unsloth/, unsloth_cli/, and
scripts/ (e.g. unsloth/models/llama.py opens with the standard
`# Copyright ... Daniel Han-Chen & the Unsloth team. All rights
reserved. # Licensed under the Apache License, Version 2.0` block,
which is correct, but my SPDX-only regex flagged it).
New rule: a file is OK if either `SPDX-License-Identifier` or
`Licensed under the Apache License` appears in the first 20 lines.
Empty __init__.py files are skipped. Whole-repo coverage instead
of just studio/backend.
2. Add shell / YAML / JSON parse gates
- `bash -n` over every committed *.sh (14 today). Same idea as
compileall: parse-only check.
- `yaml.safe_load_all` over every *.yml / *.yaml (97 today),
including .github/workflows/* so a typo in the workflow file
itself shows up immediately.
- `json.loads` over every *.json (18 today). Skips
package-lock.json / bun.lock (huge, machine-generated) and
tsconfig*.json (TypeScript JSONC convention -- already
validated by `tsc --noEmit` in Frontend CI).
TypeScript and Rust are NOT duplicated here:
- Studio Frontend CI runs `npm run typecheck` + `npm run build`
on every studio/frontend/** change, which is a full TS AST +
type check.
- Studio Tauri CI runs `tauri build --debug --no-bundle` on every
studio/src-tauri/** or studio/frontend/** change, which is a
full Rust compile.
A duplicate fast-fail step here would burn cache for marginal
value, and the dedicated workflows already block merges.
Lint CI runs on every PR (no path filter): the whole job is
under 30 s of CI time, so paying that on every PR is preferable
to missing a regression on a path the focused workflows skip.
There was a problem hiding this comment.
💡 Codex Review
These newly added shell tests are installer/unit tests: test_get_torch_index_url.sh, test_mac_intel_compat.sh, and test_tauri_install_exit_order.sh all read install.sh, and test_torch_constraint.sh also reads install.ps1. Because this step lives in Backend CI, whose pull_request.paths filter above only includes studio/**, unsloth/**, unsloth_cli/**, tests/**, pyproject.toml, and this workflow file, an install.sh- or install.ps1-only PR will not run the tests that are supposed to catch installer regressions; add the installer paths to this workflow trigger or move these tests to an unfiltered/installer-triggered workflow.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The license-header check missed two more legitimate header families
that are committed to the repo today:
- LGPL-3.0 long form: e.g. unsloth/kernels/rope_embedding.py opens
with "GNU Lesser General Public License" -- 7 such files under
unsloth/kernels/.
- AGPL-3.0 long form: e.g. unsloth/kernels/moe/autotune_cache.py
opens with "GNU Affero General Public License" -- 2 such files
under unsloth/kernels/moe/.
Both got flagged as drift on the previous run because the check
only knew about the SPDX one-liner and the Apache-2.0 preamble.
Add a third accepted marker, the substring "General Public License",
which appears in all three GNU long-form preambles (GPL, LGPL,
AGPL) and nothing else. Repo inventory:
spdx (one-liner) 193 files (mostly studio/)
apache-longform 55 files (unsloth/, unsloth_cli/)
agpl-longform 2 files (unsloth/kernels/moe/)
lgpl/gpl-longform 7 files (unsloth/kernels/)
no recognised header 85 files (real drift -- mostly tests/)
So the warning count drops from 94 -> 85 with this commit; the
remaining 85 are actual missing headers, surfaced as a non-blocking
warning until the cleanup PR lands.
Three Priority-1 follow-ups from the lint review.
Lint CI gains two non-blocking gates that surface drift without
blocking merges (the same shape as the existing format-drift step):
- codespell: typo catcher across source / comments / docs. Skips
lockfiles, generated assets, binary artefacts, LICENSE files.
ignore-words-list pulls out short identifiers and PyTorch
idioms (parm/parms, ans, hist, etc.) the default dictionary
would flag. Local run finds 16 real typos to fix in a follow-up.
- shellcheck: catches subtle shell bugs `bash -n` doesn't see --
unquoted expansions, useless cat, `[[ ]]` command substitution,
etc. SC1090 + SC2034 muted because install/setup scripts
legitimately source runtime paths and use export-only
assignments. Critical-path coverage: install.sh, setup.sh,
tests/sh/.
Both pinned for reproducibility (codespell>=2.3,<3 in pip,
shellcheck via apt-get). Both surface findings in PR annotations
without failing the run; drop continue-on-error after the cleanup
PRs land.
New workflow: Security audit. Runs `pip-audit` against the same
dep set Studio's backend pytest matrix installs, so we audit what
the runtime actually loads (not what pyproject.toml's transitive
resolution might pull in differently). Triggers:
- PRs touching requirements / pyproject.toml,
- push to main / pip,
- nightly @ 04:13 UTC (off-the-hour to dodge cron rush),
- workflow_dispatch.
The default branch already carries 17 known vulnerabilities per
the dependabot banner, so a hard gate today would block every PR
on a baseline we have not triaged. Non-blocking; full table goes
to GITHUB_STEP_SUMMARY for grep-ability and a 30-day artefact for
historical comparison.
The custom AST anti-pattern scan I prototyped was dropped: every
class of CPU-import-time bug we hit in this PR (bitsandbytes,
torchvision, _cuda_getCurrentRawStream, DEVICE_COUNT==0 stream
init) is already caught by the Repo tests (CPU) job exercising
the actual import on a CPU torch wheel. Restating the rule
in AST form would only add noise.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 518fa38e16
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if proc.returncode != 0: | ||
| print( | ||
| f" [ERROR] pip download (with deps) failed: {proc.stderr.strip()[:500]}", | ||
| file = sys.stderr, | ||
| ) |
There was a problem hiding this comment.
Fail scan when transitive download resolution fails
In --with-deps mode, a non-zero pip download only logs an error and then the script continues scanning whatever files happen to exist in dest, which can be an incomplete or empty set. In that case main() still reports findings only from partial data and can exit 0, so the new security-audit.yml job can report a clean scan even when dependency resolution failed (e.g., conflict/network failure) and much of the tree was never scanned.
Useful? React with 👍 / 👎.
| key = e["name"].lower().replace("-", "_").replace(".", "_") | ||
| if key not in seen: | ||
| seen.add(key) | ||
| unique_entries.append(e) |
There was a problem hiding this comment.
Keep all duplicate package constraints in scan input
This dedup keeps only the first spec per normalized name, dropping later constraints from other requirement sources. The security workflow passes multiple manifests that intentionally overlap (for example studio/backend/requirements/studio.txt pins datasets==4.3.0 while studio/backend/requirements/no-torch-runtime.txt allows a range), so removing later specs changes the resolved dependency set and can scan different versions than the ones actually installed in production paths.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2af52abdf8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "pip", | ||
| "download", | ||
| "--dest", |
There was a problem hiding this comment.
Avoid executing sdists during pre-install scans
When a requirement or transitive dependency only has an sdist (or lacks reusable metadata), this pip download can run build/metadata hooks before the archive scanner gets to inspect the files; I checked the official pip docs, which describe pip download as using the same resolver behavior as install, and pip install documents setup.py egg_info for sdists. Since this workflow is explicitly meant to scan compromised packages without executing them, a malicious sdist can still run code on the CI runner here; use a metadata/API-based fetch path or fail closed on packages that require local metadata generation.
Useful? React with 👍 / 👎.
| for f in studio.txt extras.txt extras-no-deps.txt \ | ||
| no-torch-runtime.txt overrides.txt triton-kernels.txt; do |
There was a problem hiding this comment.
Include single-env Studio requirements in the audit
This allowlist misses studio/backend/requirements/single-env/*.txt, even though studio/install_python_stack.py installs SINGLE_ENV / "data-designer-deps.txt" and SINGLE_ENV / "data-designer.txt" in the normal Studio install flow. A PR changing those production dependency manifests still triggers this workflow via requirements/**, but neither pip-audit nor the scan-packages job audits their packages, so a vulnerable or malicious Data Designer dependency can be added without this new security check seeing it; include those files in this list and the matching scan list below.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bd49dd7524
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for f in studio.txt extras.txt extras-no-deps.txt \ | ||
| no-torch-runtime.txt overrides.txt triton-kernels.txt; do |
There was a problem hiding this comment.
Include base requirements in the audit
When a PR changes studio/backend/requirements/base.txt, this workflow is triggered by studio/backend/requirements/**, but the generated audit inputs only copy the six files listed here and omit base.txt; studio/install_python_stack.py installs that file in the normal local/update flows (req = REQ_ROOT / "base.txt"), so a new vulnerable or malicious base dependency can be added while both the pip-audit and scan-packages shards still inspect none of it. Add base.txt to this list and the matching scan matrix/input list below.
Useful? React with 👍 / 👎.
bd49dd7 to
9b0f5ff
Compare
The previous Security audit only covered Studio's backend requirements. The unsloth pip package itself ships its own dep set via pyproject.toml (typer/pydantic/pyyaml/nest-asyncio core, plus the huggingfacenotorch extras: transformers/peft/accelerate/trl/datasets/diffusers/etc.) -- a malicious upload to any of those would slip past us today. Build a combined dep list from pyproject.toml + the six Studio requirements files and feed it to both pip-audit and scan_packages. Add scan_packages.py at scripts/scan_packages.py so the scanner ships with the repo and CI does not depend on a network fetch at job time. Pass --with-deps to scan_packages so the pre-install pattern scan walks the full transitive closure -- supply-chain attacks usually land several hops down (litellm 1.82.7 was a dep of a dep for most users; top-level-only scanning would have missed it). No installation in either job. pip-audit's -r mode resolves through PyPI metadata, scan_packages downloads sdist/wheel archives raw and inspects them without running install hooks. An attacker who has compromised a transitive dep cannot execute code in this workflow.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a41a315d7e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| STUDIO_OLD_PW: ${{ env.STUDIO_EXTRA_OLD_PW }} | ||
| STUDIO_NEW_PW: ${{ env.STUDIO_EXTRA_NEW_PW }} |
There was a problem hiding this comment.
Pass extra UI passwords from the runner environment
This step maps STUDIO_OLD_PW/STUDIO_NEW_PW from ${{ env.STUDIO_EXTRA_* }} even though the prior step sets those values via $GITHUB_ENV; these expressions are resolved from the workflow/job env context, so they expand empty here and override the runtime values that tests/studio/playwright_extra_ui.py needs. On the Linux extra-UI smoke this submits blank passwords and fails before exercising Compare/Recipes/Export/Settings; either let the script consume STUDIO_EXTRA_* directly or write STUDIO_OLD_PW/STUDIO_NEW_PW into $GITHUB_ENV in the previous step.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 00f3e325ce
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "download", | ||
| "--dest", | ||
| dest, | ||
| ] + specs |
There was a problem hiding this comment.
Avoid executing sdists in the package scanner
When the security-audit job runs scan_packages.py --with-deps, this pip download invocation can still execute untrusted sdist build hooks while resolving/downloading packages. I checked python -m pip download --help: --no-build-isolation is described in terms of building a source distribution, and --only-binary :all: is the option that disables source packages; without that guard, a malicious transitive dependency that only ships an sdist (or whose wheel is unavailable for this runner) can run code before the scanner inspects the archive, defeating the workflow’s “download without executing” threat model. Add --only-binary=:all: or otherwise fetch archives through a path that never builds metadata.
Useful? React with 👍 / 👎.
Mac Chat UI Tests on commit 00f3e32 timed out 60s into page.fill('#confirm-password') (run 25578374480 / job 75091072289). The previous fix (3274f72) wrapped the post-submit composer wait but left the form-fill sequence single-shot. Same root cause as the original 25497245250 / 74820324136 case but a step deeper: pw_field.fill('#new-password') succeeds, then a re-render between the two locators detaches '#confirm-password' and the second fill burns the 60s ceiling. Wrap the entire goto + settle + locator + fill + submit sequence in a 3-attempt retry. Each retry re-navigates page.goto() with wait_until='domcontentloaded' (fresh DOM, fresh form) and spawns a new page in the same context if the old one died. Diagnostics on each failed attempt: page.url, page_errors, console_errors, screenshot. Same hardening applied to playwright_extra_ui.py which has the same change-password flow.
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c876296dca
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| STUDIO_OLD_PW: ${{ env.STUDIO_EXTRA_OLD_PW }} | ||
| STUDIO_NEW_PW: ${{ env.STUDIO_EXTRA_NEW_PW }} |
There was a problem hiding this comment.
Read extra-UI passwords from runtime env on macOS
This step sets STUDIO_OLD_PW/STUDIO_NEW_PW from ${{ env.STUDIO_EXTRA_* }}, but those STUDIO_EXTRA_* values are produced in the prior step via $GITHUB_ENV (runner environment), not declared in workflow/job/step env. In Actions, env context expressions are resolved from the workflow env context, so these resolve empty and override the variables the Playwright script expects, causing the macOS extra-UI smoke to submit blank passwords and fail before exercising Compare/Recipes/Export/Settings.
Useful? React with 👍 / 👎.
Extend the cross-version compat canary to catch ~80% of upstream drift before a user hits it. Static checks only (GitHub raw fetch + grep), CPU-only, runs PR-time + daily cron. 906 pass, 73 skipped. TRL coverage extended: - TRL_TAGS expanded from 12 to 28 (every stable release >=0.18.2, including the broken 0.19.0, plus main). Anchors: 0.22.2 / 0.27.1 / 1.0.0 marked. - Fix `__version__` parser to handle the TRL 0.22.x pattern (`__version__ = f.read()` from sibling VERSION file). - Fix `has_def` in _fetch.py to allow indented matches so class methods are detected (the original anchored ^def only matched module-scope definitions). - New tests for symbols the audit found we touch but didn't check: is_conversational, sft_trainer module + neftune_post_forward_hook, dpo_trainer module + MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES, trl.trainer.utils.ConstantLengthDataset (gated), trl.models.utils.disable_gradient_checkpointing (gated >=1.0.0), trl.import_utils + _*_available cache pattern, trl.experimental.openenv.utils generators (one of two names), GRPOTrainer required methods (_prepare_inputs, _generate_and_score_completions, compute_loss; per-token-logps legacy/new dispatch), GRPOTrainer source must contain torch.inference_mode + accelerator.unwrap_model fingerprints, KTOTrainer.get_batch_logps (now lives at trl.experimental.kto on TRL 0.27+ — accept either path), SFTTrainer class existence, DPOTrainer methods (informational), chat-template propagation (legacy maybe_apply_chat_template OR successor apply_chat_template + chat_template_kwargs), truncate_with_protected_tokens informational. - Tighten test_unwrap_model_for_generation_either_path to mirror the prod fallback exactly (drop unused trl/extras/profiling.py candidate). - Replace test_trl_generation_vllm_generation_gated symbol set with the actual unsloth dependency (VLLMGeneration class + _init_vllm / sync_weights / generate methods, not VLLMClient/etc). PEFT coverage extended (driven by the 8 PR audit unsloth#5015, #5167, #5036, #4807 + unsloth-zoo#618, #596, #482, #430): - VARIANT_KWARG_KEYS const (peft 0.18+; injected by zoo#430) - ParamWrapper class + members (peft 0.18+; needed by zoo#618) - LoraConfig.target_parameters (peft 0.19+) - LoraModel._create_and_replace (signature pin for unsloth#4807) - transformers_weight_conversion module + build_peft_weight_mapping (unsloth#5167 wraps this) - integrations.dequantize_module_weight (3 callsites) - PeftType.LORA (vllm_utils.py:2520) - ModulesToSaveWrapper (both peft.utils.* paths) - PeftModel.from_pretrained method exists - peft.__version__ parseable Transformers coverage added (driven by the 16-PR audit): - New file test_transformers_pinned_symbols.py with 19 test categories x 12 transformers tags (4.57.6 floor + 5.0..5.8 + main). Anchors: 4.57.6 + 5.5.0. - Trainer surface (compute_loss num_items_in_batch param, training_step grad-accum fingerprints, get_batch_samples num_items contract, inner_training_loop _tr_loss inplace v5) - modeling_utils.checkpoint alias for unsloth-zoo#549 - PushToHubMixin._create_repo presence (unsloth-zoo#393) - integrations.bitsandbytes module + Linear4bit reference - quantizers.should_convert_module signature (zoo#491/#488) - FP8Linear bias/has_bias rename (zoo#572) - processing_utils.Unpack importable (zoo#583/584) - gemma3 Gemma3Attention class + gpt_oss GptOssModel class - auto_factory _LazyAutoMapping private API (unsloth#5155) - configuration_utils PretrainedConfig/PreTrainedConfig alias - tokenization_utils_base.apply_chat_template - modeling_attn_mask_utils symbols - cache_utils Cache + DynamicCache classes - training_args.ParallelMode importable Wire the new transformers job into version-compat-ci.yml (matrix of 5 PR-time symbol jobs + zoo-imports under spoof + daily fresh- fetch cron). Local smoke: 906 pass, 73 skipped (gated optional features) across vLLM + TRL + PEFT + ST + bnb + transformers suites.
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 200822cf93
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| print( | ||
| f" [ERROR] pip download (with deps) failed: {proc.stderr.strip()[:500]}", | ||
| file = sys.stderr, | ||
| ) |
There was a problem hiding this comment.
Fail scan when transitive pip download errors out
When --with-deps resolution fails, this branch only logs an error and continues, so downloaded can be empty and main() still exits 0 because it only fails on HIGH/CRITICAL findings. In the security-audit.yml pip-scan-packages job (which always uses --with-deps), a transient PyPI failure or resolver conflict can therefore produce a green scan result even though no archives were actually inspected.
Useful? React with 👍 / 👎.
Two coverage extensions per follow-up:
bnb matrix: from 2 tests to 12 categories per tag, derived from a
full grep of unsloth + unsloth-zoo. Adds:
- bitsandbytes.matmul_4bit (top-level export)
- bnb.functional 4-bit kernel path: legacy `lib.cdequantize_*` (bnb
<=0.48) OR new torch.ops.bitsandbytes.dequantize_* (bnb >=0.49) —
passes either, fails if neither is wired
- bnb.functional.get_ptr (binding at unsloth/kernels/utils.py:233)
- bnb.functional.QuantState class + from_dict classmethod
(zoo monkey-patches `QuantState.from_dict = ...`)
- bnb.nn.modules.fix_4bit_weight_quant_state_from_module (optional)
- bnb.nn.Linear8bitLt (legacy load_in_8bit path)
- bnb.optim.optimizer.Optimizer2State (PagedAdamW32bit base)
- bnb.utils.{pack_dict_to_tensor, unpack_tensor_to_dict}
(state-dict save/load)
- bnb.cextension.ROCM_WARP_SIZE_64 (optional, AMD ROCm path)
- bnb.autograd._functions.matmul_4bit (dynamo-disable probe site)
- bnb.__version__ exported via any known mechanism (the 6 floor
gates at 0.43.3, 0.46.0, 0.48.2.dev0, 0.49.0, 0.49.2 all read it)
Extended zoo-import smoke: from 5 narrow tests in
tests/vllm_compat/test_unsloth_zoo_imports.py to 32 tests in the
new tests/vllm_compat/test_extended_module_imports.py:
- 20 unsloth_zoo modules sweep (compiler, dataset_utils,
device_type, empty_model, gradient_checkpointing, hf_utils,
llama_cpp, logging_utils, loss_utils, patching_utils,
patch_torch_functions, peft_utils, rl_replacements,
saving_utils, tiled_mlp, tokenizer_utils, training_utils,
utils, vision_utils, compiler_replacements). Each must import
cleanly under the existing _zoo_aggressive_cuda_spoof harness;
drift in transformers / peft / bnb symbols pinned at module-top
trips here BEFORE any user-visible call.
- 7 unsloth.models.* core modules sweep (rl, rl_replacements,
sentence_transformer, _utils, loader, loader_utils, mapper).
- _IS_MLX must be False on a non-Apple-Silicon spoof runner
(catches MLX gate logic too lax in unsloth/__init__.py).
- FastLanguageModel/Vision/Model surface dump: from_pretrained +
get_peft_model methods must be reachable on the dumped class.
- RL_FUNCTIONS dispatch table populated with grpo_trainer +
sft_trainer + dpo_trainer keys (catches "imports cleanly but
silently empty dispatch").
- unsloth_zoo.compiler.test_apply_fused_lm_head must be callable.
- FastModel.from_pretrained signature has model_name +
max_seq_length + load_in_4bit kwargs (every Colab notebook
calls these by name).
Wired into the existing zoo-imports-under-spoof job in
.github/workflows/version-compat-ci.yml.
Local smoke: 49 bnb pass, 28 extended-import pass + 4 skipped (env
quirks). Full version_compat suite: 947 pass, 76 skipped.
for more information, see https://pre-commit.ci
…ac buffer) Run 25586582979 + 25586583008 + 25586583024 surfaced three real issues on commit a975d58. All addressed: 1. version-compat-ci.yml `zoo-imports-under-spoof` job — every `import unsloth_zoo.<module>` failed with `Exception: No package metadata was found for torchcodec` transformers 5.x's `audio_utils.py:55` does `version.parse(importlib.metadata.version("torchcodec"))` UNCONDITIONALLY at module top, which trickles up through transformers.processing_utils -> unsloth_zoo.vision_utils -> the whole zoo import path. Fix: pip install `torchcodec<0.10` in the workflow alongside torch + torchvision (CPU wheel exists; the <0.10 cap mirrors the torch 2.10 / torchvision 0.26 ABI window already pinned). 2. studio-backend-ci.yml "Repo tests (CPU)" job — pytest's auto-discovery pulled in the new tests/vllm_compat/ + tests/version_compat/ files which require a heavier dep set (transformers/peft/bnb pins, torchcodec) than the Backend CI install line provides. Failed with `ImportError: cannot import name 'IterableDataset' from 'datasets'` (datasets 4.x removed the legacy export from the package root). Fix: --ignore=tests/vllm_compat + --ignore=tests/version_compat in the auto-discovery step. Both directories have a dedicated job in version-compat-ci.yml that installs the right dep set. 3. tests/studio/playwright_chat_ui.py — Mac Chat UI hit `net::ERR_NO_BUFFER_SPACE` after the change-password POST under --single-process Chromium on the macos-14 free runner; the page stayed on /change-password and BOTH composer.wait_for retries timed out at 60s each. The page.goto(BASE) recovery couldn't recover because the auth state never persisted. Fix: wrap the submit-button click in `page.expect_response("/api/auth/change-password" + POST, timeout=30_000)` so the buffer-error surfaces immediately in the failing attempt rather than at the next composer.wait_for. The next retry iteration starts cleanly with a known-bad initial state. Falls back to fire-and-forget click if the response wait itself throws (so we don't introduce a new failure mode). Local smoke after fixes: 975 pass, 80 skipped across version_compat + vllm_compat suites.
… throttling
Both playwright_chat_ui.py and playwright_extra_ui.py reimplemented the
same set of CI-runner workarounds (Chromium launch flags, view-transition
CSS killer, change-password retry, page-recovery). When one diverged the
other slowly rotted: the macos-14 / windows-latest / ubuntu-latest
failure modes are mostly identical so the cure is the same.
New module tests/studio/_playwright_robust.py is the single point of
truth, providing:
- chromium_launch_args(platform): bundles macos-14 stability set
(--single-process for the pipeTransport JSON-RPC crash) PLUS new
throttling-kill flags (--disable-background-timer-throttling,
--disable-renderer-backgrounding, --disable-backgrounding-occluded-
windows, --disable-features=TranslateUI, --disable-ipc-flooding-
protection) that prevent Chromium from deprioritising the headless
context's CPU/timers when it thinks the window is backgrounded --
which CI runners routinely flag.
- install_view_transition_killer(ctx): the duplicated init script.
- wait_for_health(base_url): pre-flight server probe inside the
script -- catches the macos-14 gap where /api/health responds 200
while the auth DB hasn't finished migrating.
- recover_or_replace_page(page, ctx): canonical "page died mid-test"
helper. Replaces the page if closed, optionally re-navigates +
waits for networkidle.
- click_and_wait_for_response(page, url_substr, do_click): generic
POST-and-wait pattern that surfaces server-side 4xx / buffer-fail
immediately. Now used by both files' change-password submit
(parity -- previously only chat_ui had this).
- dump_diagnostics(page, art_dir, name): screenshot + DOM excerpt +
URL + localStorage keys JSON sidecar. Available for any future
failure dump site.
- BENIGN_PAGE_ERROR_PATTERNS / BENIGN_CONSOLE_ERROR_PATTERNS shared
between the two files. Adds net::ERR_NO_BUFFER_SPACE +
AbortError + chunk-load to the console-side filter so the
diagnostic dump count tracks real signal.
Net effect: ~230 lines drop from chat_ui, ~146 from extra_ui, +401
shared. Total LOC down slightly. Behaviour preserved -- existing
retry windows / timeouts / fail conditions all unchanged.
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 16c4f655dc
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| --ignore=tests/vllm_compat \ | ||
| --ignore=tests/version_compat \ |
There was a problem hiding this comment.
Keep compat suites in a PR-time workflow for all model changes
Ignoring tests/vllm_compat and tests/version_compat here removes these checks from the always-on backend PR gate, but the replacement workflow only triggers for a narrow path set (version-compat-ci.yml watches rl.py, rl_replacements.py, and sentence_transformer.py). That means PRs changing other modules covered by the compat suite (for example unsloth.models.loader/loader_utils imported by tests/vllm_compat/test_extended_module_imports.py) will now merge without running the compatibility canaries, so symbol/import regressions can slip through until the daily cron.
Useful? React with 👍 / 👎.
- actions/checkout v4.3.1 -> v6.0.2 - actions/setup-python v5.6.0 -> v6.2.0 - actions/setup-node v4.4.0 -> v6.4.0 - actions/upload-artifact v4.6.2 -> v7.0.1 - actions/cache @v4 (mutable) -> @27d5ce7f... # v5.0.5 SHA-pinned (15 sites) - actions/upload-artifact @v4 in wheel-smoke.yml -> SHA-pinned to v7.0.1 The 16 mutable @v4 references were exactly the @v0 / @v2 / @latest class of reference the security-audit.yml comments call out as the litellm / tj-actions attack surface, so they should never have shipped as bare tags alongside the other SHA pins in this PR. actions/cache v4 -> v5 regenerates the internal cache version hash, so existing v4-saved caches (including the GGUF cache reused across the studio smokes) miss once on first run after merge and then re-populate. No semantic change beyond that. Also corrects the dtolnay/rust-toolchain comment in security-audit.yml and studio-tauri-smoke.yml: 29eef336d9 is the current stable branch tip but its commit date is 2026-03-27, not 2026-05-07 as the comment claimed. release-desktop.yml intentionally left untouched (still on v4.3.1 checkout + v4.4.0 setup-node + older swatinem/rust-cache and unpinned tauri-action). That file is outside the scope of this PR and should get its own bump in a follow-up.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 960e68d98c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| semgrep scan \ | ||
| --config p/supply-chain \ | ||
| --config p/python \ | ||
| --config p/javascript \ | ||
| --config p/security-audit \ |
There was a problem hiding this comment.
Load the custom Semgrep rules in CI
The new .semgrep/unsloth-rules.yml says it is wired into this workflow, but this Semgrep invocation only loads the stock packs, so none of the custom CVE-shape rules in that file ever run. Since the PR path filter also omits .semgrep/**, edits to those rules would not even trigger the audit workflow; add --config .semgrep/unsloth-rules.yml here and include the rules path in the trigger.
Useful? React with 👍 / 👎.
| - 'studio/backend/core/inference/mlx_inference.py' | ||
| - 'tests/studio/test_hardware_dispatch_matrix.py' | ||
| - 'tests/studio/test_is_mlx_dispatch_gate.py' | ||
| - 'tests/studio/test_mlx_training_worker_behaviors.py' | ||
| - 'tests/studio/run_real_mlx_smoke.py' |
There was a problem hiding this comment.
Trigger MLX prebuilt smoke for installer changes
This workflow later runs the real macOS prebuilt path via python studio/install_llama_prebuilt.py ..., but the PR path filter does not include studio/install_llama_prebuilt.py (or the setup script that passes the simple-policy flags). When a PR changes only that installer code, the Apple Silicon prebuilt GGUF smoke is skipped even though it is the only job validating that path; add the installer/setup files to this trigger list.
Useful? React with 👍 / 👎.
* fix: unblock 4 tests deselected/skipped in #5312 (real bugs) PR #5312 surfaced two real regressions by turning previously-silent skips into explicit `--deselect` / `pytest.skip(...)` blocks. Both were left as follow-ups rather than fixed in that PR. This PR fixes the underlying bugs so the suppressions can be dropped. 1. studio/backend/requirements/no-torch-runtime.txt: pin tokenizers Installing with `--no-deps -r no-torch-runtime.txt` (the path install.sh takes for the no-torch / GGUF-only mode) resolves transformers to 5.3.0 and tokenizers to the latest available (0.23.1). transformers 5.3.0 requires `tokenizers>=0.22.0,<=0.23.0`, so `from transformers import AutoConfig` then fails at import time: ImportError: tokenizers>=0.22.0,<=0.23.0 is required for a normal functioning of this module, but found tokenizers==0.23.1. Pin `tokenizers>=0.22.0,<=0.23.0` to match the constraint embedded inside every transformers version in the allowed window (4.56.0..5.3.0). Verified locally: a fresh `uv venv` + `uv pip install --no-deps -r no-torch-runtime.txt` followed by `from transformers import AutoConfig` now succeeds. Unblocks 3 deselected cases in studio-backend-ci.yml: - TestE2ETokenizersFix::test_autoconfig_works_with_no_torch_runtime (parametrized py 3.12 + 3.13 -> 2 cases) - TestE2EFullNoTorchSandbox::test_autoconfig_succeeds 2. unsloth/models/rl.py: defensive wrapper for _patch_trl_rl_trainers _patch_trl_rl_trainers has many internal `try: ... except: ... return` branches, but several paths (notably inspect.getsource on the thin wrappers TRL 1.x leaves in trl.trainer for trainers that moved to trl.experimental) can still propagate exceptions. The umbrella patch_trl_rl_trainers() ring-fences each call with try/except + warning_once, but direct callers (the CI shim in consolidated-tests-ci.yml, downstream tools, end-user scripts) used to see the raw exception, which forced #5312's CI heredoc to ring-fence with: except Exception as e: # TRL 1.x renames break the patch helper internally; we # accept that here and skip rather than fail the cell. pytest.skip(f"_patch_trl_rl_trainers raised: ...") Rename the existing implementation to _patch_trl_rl_trainers_impl and make _patch_trl_rl_trainers a thin wrapper that catches any uncaught exception and routes it through logger.info, matching the umbrella wrapper's behaviour. Power users who want the raw raising behaviour for their own diagnostics can still call _patch_trl_rl_trainers_impl directly. Adds tests/python/test_patch_trl_rl_trainers_defensive.py to lock the contract: the wrapper must never raise, and it must delegate to the impl on the happy path. Unblocks 1 skip in consolidated-tests-ci.yml's test_compile_sft_trainer_patch. Follow-up for #5312 once this lands: drop the two `--deselect` lines in studio-backend-ci.yml's repo-cpu-tests step and drop the `except Exception ... pytest.skip(f"_patch_trl_rl_trainers raised: ")` block in consolidated-tests-ci.yml's test_compile_sft_trainer_patch. * chore: tighten comments and docstrings in the new code Drop verbose justifications down to one or two lines per site. The PR description carries the full context; in-file comments only need to point at the WHY. * chore(no-torch-runtime): drop redundant lower bound on tokenizers tokenizers 0.23.0 was never published to PyPI (versions go 0.22.2 -> 0.23.1), so `tokenizers<=0.23.0` resolves to 0.22.2 in practice, the same version the explicit >=0.22.0,<=0.23.0 pin resolved to. Verified on Python 3.12 and 3.13.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4b8e886fac
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| STUDIO_OLD_PW: ${{ env.STUDIO_EXTRA_OLD_PW }} | ||
| STUDIO_NEW_PW: ${{ env.STUDIO_EXTRA_NEW_PW }} |
There was a problem hiding this comment.
Read Windows extra-UI passwords from the runner env
In the Windows extra-UI smoke, these ${{ env.STUDIO_EXTRA_* }} expressions are evaluated before the job runs and do not see the values written to $GITHUB_ENV in the previous step, so they expand to empty strings and override the runtime environment. When this step runs, playwright_extra_ui.py receives blank STUDIO_OLD_PW/STUDIO_NEW_PW and fails at login/password rotation instead of exercising the Compare/Recipes/Export/Settings paths; let the script consume the STUDIO_EXTRA_* variables directly or write STUDIO_OLD_PW/STUDIO_NEW_PW to $GITHUB_ENV before this step.
Useful? React with 👍 / 👎.
The previous gate triggered only on changes to rl.py, rl_replacements.py, and sentence_transformer.py, but the symbol-existence tests cover EVERY pinned upstream reference in unsloth. A new `from peft.foo import Bar` added in unsloth/kernels/whatever.py is the same class of compat regression as one added in unsloth/models/rl.py, and was previously slipping through this gate. Cost is small: the job is CPU-only raw-fetch + grep against pinned upstream tags, ~1 minute end-to-end.
* fix(tests/sh): accept pinned tokenizers line after #5359 #5359 pinned the tokenizers line in studio/backend/requirements/no-torch-runtime.txt from bare `tokenizers` to `tokenizers<=0.23.0` to stop pip from resolving to 0.23.1+ (which transformers rejects at import time). The shell test in tests/sh/test_torch_constraint.sh was still asserting the literal `^tokenizers$` regex, which fails on the pinned form. Surfaced as a hard fail on PR #5312's Backend CI Repo tests (CPU) step: === Structural: tokenizers in no-torch-runtime.txt === FAIL: tokenizers present as standalone line (expected '1', got '0') FAIL: tokenizers before transformers (expected 'yes', got 'no') Relax the regex to `^tokenizers([<>=!,~ ]|$)` so it matches both bare and version-constrained forms, which preserves the original intent (verify tokenizers is present in the file, before transformers). Verified locally: 24 PASS, 0 FAIL. * fixup(tests/sh): tighten tokenizers check to guard the safe bound Address bot review feedback on #5361: * Codex (P2): the previous relaxed regex `^tokenizers([<>=!,~ ]|$)` accepted any version operator, so `tokenizers>=0.23.1` would still pass the test even though that line would re-introduce the import failure #5359 fixed. * Gemini (medium): the boundary char class did not cover all PEP 508 separators (`[`, `;`, `@`). Replace the single check with two: 1. Loose: `^tokenizers([^a-zA-Z0-9._-]|$)` confirms the package is listed (covers extras, env markers, URLs, bare line). 2. Tight regression guard: pipe those lines through a second grep that requires `<=0.23.0` or the functionally equivalent `<0.23.1`. Rejects bare `tokenizers`, `>=0.22.0` (no upper bound), `>=0.23.1`, `!=0.23.0`, `<=0.24.0`, etc. Verified locally: - Current main (tokenizers<=0.23.0): 25 PASS, 0 FAIL. - Spot-check with the bug reverted (bare `tokenizers`): the new "tokenizers pinned with upper bound excluding 0.23.1+" check FAILS as intended; the original "listed" check still passes.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5506d8c169
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| python unsloth/scripts/notebook_validator.py refresh-colab \ | ||
| --out unsloth/scripts/data/colab_pip_freeze.gpu.txt |
There was a problem hiding this comment.
Preserve committed Colab snapshot before strict diff
In the scheduled/dispatch colab-oracle job, this refresh writes the live Colab pip-freeze.gpu.txt directly over unsloth/scripts/data/colab_pip_freeze.gpu.txt immediately before colab-diff --strict compares --snapshot-dir unsloth/scripts/data. When Colab's pip image drifts, the strict diff reads the already-updated file as the “committed” snapshot and reports no pip-freeze drift, so the cron check that is supposed to fail within ~24h stays green. Write the live freeze to a temp path for linting, or run the strict diff before mutating the checkout.
Useful? React with 👍 / 👎.
…sses (unslothai#628) * tests: skip MoE LoRA extractor coverage when discovery finds zero classes test_every_patched_moe_experts_class_has_lora_extractor asserts that every class flagged `_unsloth_already_patched=True` whose source defines 3D `gate_up_proj`/`down_proj` parameters also has `_unsloth_lora_extractor_fn` registered. The defensive precondition was "discovery produced at least one patched class"; failing that, the test raised AssertionError("Discovery produced zero patched MoE classes ... Investigate."). That precondition fires on any transformers version older than the MoE surface unsloth_zoo patches. Specifically, with transformers==4.57.6 none of the gemma4 / llama4 / qwen3_moe / mxfp4 classes the TEMPORARY_PATCHES list targets exist, so every patch fn no-ops and discovery returns []. The "zero discoveries" outcome on an older transformers is not a regression of the bug being prevented (a class patched-but-missing-extractor); it just means the relevant classes aren't there to patch. Switch the precondition from `assert patched, ...` to `pytest.skip(...)` with a comment explaining the two indistinguishable cases. CI cells running newer transformers (where classes do exist) will still exercise the real assertion below; the older-transformers cell skips cleanly. Surfaced by the consolidated CPU-CI matrix on unslothai/unsloth#5312 in the cell pinned to `transformers==4.57.6`. * tests: distinguish missing surface from discovery regression Per codex/gemini review on PR unslothai#628: replacing the original assert with an unconditional skip created a silent-failure path -- a real discovery regression on a transformers that DOES ship the grouped-MoE surface would now skip rather than fail, defeating the test. Disambiguate the two zero-discovery causes by walking transformers once and asking whether ANY class anywhere in transformers.models.* declares `gate_up_proj` + `down_proj` as 3D `nn.Parameter`s -- the exact filter `_looks_like_grouped_moe_experts` runs against discovered classes. If at least one such class exists AND discovery returned zero, that is a real regression: fail loud. If no such class exists, the patch surface this test targets is genuinely absent on this transformers version: skip. This avoids hard-coding model names that change as transformers grows new MoE families. Also update terminology in the comment + skip / fail messages: the load-bearing marker `_has_unsloth_patched_forward` reads is `_original_<modeling_tail>_<ClassName>_forward`, not the per-family `_unsloth_already_patched` flag.
Without a guard, re-running `higher_precision_softmax` on already-
rewritten source appends another `.to(<variable>.dtype)` cast each
pass. The regex matched the inner `softmax(...)` but the existing
`.to(x.dtype)` suffix was outside the matched span, so
`source.replace(full_match, new)` left the suffix in place and the
new replacement added a fresh `.to(x.dtype)`, producing:
softmax(x, dim=-1, dtype=torch.float32).to(x.dtype).to(x.dtype)
after the second pass.
Add a negative lookahead `(?!\s*\.to\(\s*\2\s*\.dtype\s*\))` so the
finditer skips matches whose softmax(...) is already followed by
`.to(<variable>.dtype)`. Idempotent in all three cases:
- no-dtype: `softmax(x, dim=-1)`
- with-dtype: `softmax(x, dim=-1, dtype=torch.bfloat16)`
- mixed `nn.functional.softmax` and `F.softmax`
Why this matters: defensive idempotency prevents source-bloat when a
single function passes through `create_new_function` more than once
(e.g. a re-compile triggered by `UNSLOTH_COMPILE_OVERWRITE=0` +
transformers version mismatch); also makes the upstream test
suite's CI assertion `f(f(s)) == f(s)` (added in
unslothai/unsloth#5312's consolidated CI) hold.
Verified: full `tests/` suite still passes (172 / 172) plus the new
idempotency local check.
Follow up to #5298. Three changes, all CI-only.
1. Scope
GITHUB_TOKENpermissionsCodeQL flagged the five new PR-time workflows for not declaring an explicit
permissions:block (security/code-scanning/67-80, surfaced on #5298). Each ofstudio-backend-ci.yml,studio-frontend-ci.yml,studio-inference-smoke.yml,studio-tauri-smoke.yml, andwheel-smoke.ymlnow declarespermissions: contents: readat the workflow level. None of these workflows write to the repo, push releases, or comment on issues, socontents: readis the minimum required and matches the existing pattern inrelease-desktop.yml.2. Add MLX CI
Mirrors the three test files documented in #5307's
tests/studio/README.mdinto a dedicated workflow so MLX dispatch failures surface as their own check in the PR UI rather than getting buried in Backend CI:test_hardware_dispatch_matrix.py7-profile parametrized matrix + 2 dispatch-priority canariestest_is_mlx_dispatch_gate.pyAST and runtime guard onunsloth._IS_MLXtest_mlx_training_worker_behaviors.pycontract checks againststudio/backend/core/training/worker.pyTriggers when any of
unsloth/__init__.py,studio/backend/utils/hardware.py,studio/backend/core/training/worker.py, or one of the three test files changes. Runs on a Linux+CPU runner with all hardware probes spoofed; no Apple Silicon, real GPU, or real MLX install required. Locally validated: 36 passed in 0.41s.3. Unblock ~60 skipped tests in Repo tests (CPU)
The Backend CI run on #5298 finished with
694 passed, 67 skipped, 23 deselected. Most of those skips were silent: the tests self-skip when prerequisites are missing. This PR installs the missing prerequisites:actions/setup-node@v4unblocks 9 tests intests/studio/test_chat_preset_builtin_invariants.pythat compile a tiny TypeScript harness against the frontend chat sources. While doing this I noticed the existence checks on those source files used an obsoleteunsloth_repo/studio/...path; fixed alongside.pip install uvunblocks thetests/python/*cluster (test_e2e_no_torch_sandbox.py47 tests,test_studio_import_no_torch.py29 tests,test_tokenizers_and_torch_constraint.pymost of 42 tests) that spawn freshuv venvs to verify the no-torch install path.Three
test_tokenizers_and_torch_constraint.pycases are deselected because they expose a real bug instudio/backend/requirements/no-torch-runtime.txt: the unpinnedtokenizersline resolves to 0.23.1, which transformers rejects withtokenizers>=0.22.0,<=0.23.0 is required. That is a genuine no-torch install regression worth tracking on its own; not surfacing it here would have meant continuing to silently skip the test that catches it. Left as a--deselectwith a comment so the next PR that touchesno-torch-runtime.txtcan drop it.Locally:
760 passed, 1 skipped, 23 deselected(was 694 / 67 / 23). The remaining 1 skip istests/studio/install/test_rocm_support.py:1092which legitimately can't run without proper ROCm imports.Test plan
yaml.safe_load.mlx-ci.ymlvalidated locally (36 passed in 0.41s).