Skip to content

CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests#5312

Merged
danielhanchen merged 172 commits into
mainfrom
ci/workflow-permissions-and-fewer-skips
May 11, 2026
Merged

CI: scope GITHUB_TOKEN permissions, add MLX CI, unblock ~60 skipped tests#5312
danielhanchen merged 172 commits into
mainfrom
ci/workflow-permissions-and-fewer-skips

Conversation

@danielhanchen
Copy link
Copy Markdown
Member

Follow up to #5298. Three changes, all CI-only.

1. Scope GITHUB_TOKEN permissions

CodeQL flagged the five new PR-time workflows for not declaring an explicit permissions: block (security/code-scanning/67-80, surfaced on #5298). Each of studio-backend-ci.yml, studio-frontend-ci.yml, studio-inference-smoke.yml, studio-tauri-smoke.yml, and wheel-smoke.yml now declares permissions: contents: read at the workflow level. None of these workflows write to the repo, push releases, or comment on issues, so contents: read is the minimum required and matches the existing pattern in release-desktop.yml.

2. Add MLX CI

Mirrors the three test files documented in #5307's tests/studio/README.md into a dedicated workflow so MLX dispatch failures surface as their own check in the PR UI rather than getting buried in Backend CI:

  • test_hardware_dispatch_matrix.py 7-profile parametrized matrix + 2 dispatch-priority canaries
  • test_is_mlx_dispatch_gate.py AST and runtime guard on unsloth._IS_MLX
  • test_mlx_training_worker_behaviors.py contract checks against studio/backend/core/training/worker.py

Triggers when any of unsloth/__init__.py, studio/backend/utils/hardware.py, studio/backend/core/training/worker.py, or one of the three test files changes. Runs on a Linux+CPU runner with all hardware probes spoofed; no Apple Silicon, real GPU, or real MLX install required. Locally validated: 36 passed in 0.41s.

3. Unblock ~60 skipped tests in Repo tests (CPU)

The Backend CI run on #5298 finished with 694 passed, 67 skipped, 23 deselected. Most of those skips were silent: the tests self-skip when prerequisites are missing. This PR installs the missing prerequisites:

  • actions/setup-node@v4 unblocks 9 tests in tests/studio/test_chat_preset_builtin_invariants.py that compile a tiny TypeScript harness against the frontend chat sources. While doing this I noticed the existence checks on those source files used an obsolete unsloth_repo/studio/... path; fixed alongside.
  • pip install uv unblocks the tests/python/* cluster (test_e2e_no_torch_sandbox.py 47 tests, test_studio_import_no_torch.py 29 tests, test_tokenizers_and_torch_constraint.py most of 42 tests) that spawn fresh uv venvs to verify the no-torch install path.

Three test_tokenizers_and_torch_constraint.py cases are deselected because they expose a real bug in studio/backend/requirements/no-torch-runtime.txt: the unpinned tokenizers line resolves to 0.23.1, which transformers rejects with tokenizers>=0.22.0,<=0.23.0 is required. That is a genuine no-torch install regression worth tracking on its own; not surfacing it here would have meant continuing to silently skip the test that catches it. Left as a --deselect with a comment so the next PR that touches no-torch-runtime.txt can drop it.

Locally: 760 passed, 1 skipped, 23 deselected (was 694 / 67 / 23). The remaining 1 skip is tests/studio/install/test_rocm_support.py:1092 which legitimately can't run without proper ROCm imports.

Test plan

  • All five PR-time workflows pass yaml.safe_load.
  • mlx-ci.yml validated locally (36 passed in 0.41s).
  • Full Repo tests (CPU) suite passes locally with the new deselects (760 passed, 1 skipped, 23 deselected).
  • CI green on this PR.

permissions:
- All five PR-time workflows (backend, frontend, inference smoke, tauri,
  wheel) now declare permissions: contents: read at the workflow level,
  matching CodeQL's default-permissions guidance and the existing pattern
  in release-desktop.yml. None of these workflows write to the repo.

skipped tests:
- Repo tests (CPU) job now installs node 22 and uv, which unblocks
  ~60 tests that were silently skipping on CI:
  - 9 tests in tests/studio/test_chat_preset_builtin_invariants.py
    skipped on "node not available". Fixed in this commit; an obsolete
    "unsloth_repo/" prefix in WORKDIR was also pointing the source-file
    existence check at a path that no longer exists.
  - tests/python/test_e2e_no_torch_sandbox.py (47), test_studio_import_no_torch.py
    (29), test_tokenizers_and_torch_constraint.py (most of 42) all spawn
    fresh uv venvs and self-skip when uv is missing.
- Three test_tokenizers_and_torch_constraint.py cases are deselected
  because they expose a real bug in studio/backend/requirements/no-torch-runtime.txt:
  the unpinned tokenizers line resolves to 0.23.1, which transformers
  rejects with "tokenizers>=0.22.0,<=0.23.0 is required". Tracked
  separately as a no-torch install regression.

Locally: 760 passed, 1 skipped, 23 deselected (was 694 / 67 / 23).
Mirrors the three files documented in tests/studio/README.md (PR #5307)
into a dedicated workflow so MLX dispatch failures show up as their own
check on PRs rather than getting buried inside Backend CI:

  - test_hardware_dispatch_matrix.py    7-profile parametrized matrix
                                        + 2 dispatch-priority canaries
  - test_is_mlx_dispatch_gate.py        AST + runtime guard on
                                        unsloth._IS_MLX
  - test_mlx_training_worker_behaviors.py  worker.py contract checks

Triggers on pull_request when any of unsloth/__init__.py,
studio/backend/utils/hardware.py, studio/backend/core/training/worker.py,
or any of the three test files are touched. Runs on a Linux+CPU runner
with hardware spoofs; no Apple Silicon, real GPU, or real MLX install
required. Locally validated: 36 passed in 0.41s.

permissions: contents: read at the workflow level (matching the rest of
the PR-time CI surface).
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the file paths for PRESET_POLICY and RUNTIME_TYPES in the test suite by removing the 'unsloth_repo' directory prefix. Feedback suggests centralizing the common frontend path into a variable to reduce redundancy and improve maintainability.

I am having trouble creating individual review comments. Click here to see my feedback.

tests/studio/test_chat_preset_builtin_invariants.py (11-16)

medium

To improve maintainability and reduce redundancy, you could extract the common path prefix for the frontend source files into a separate variable. This aligns with the principle of centralizing recurring definitions to ensure consistency and simplify maintenance.

CHAT_FEATURES_DIR = WORKDIR / "studio/frontend/src/features/chat"
PRESET_POLICY = CHAT_FEATURES_DIR / "presets/preset-policy.ts"
RUNTIME_TYPES = CHAT_FEATURES_DIR / "types/runtime.ts"
References
  1. Centralize recurring or complex logical checks into a single helper function and reuse it across the codebase to ensure consistency and simplify maintenance.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 107297d756

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/mlx-ci.yml Outdated
pull_request:
paths:
- 'unsloth/__init__.py'
- 'studio/backend/utils/hardware.py'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fix the MLX path filter for hardware changes

This PR path filter points at studio/backend/utils/hardware.py, but the hardware implementation exercised by the MLX tests lives under studio/backend/utils/hardware/hardware.py (there is no file at the configured path). In PRs that only change the real hardware detection module, this focused MLX CI workflow will not run, so regressions in the dispatch gate can merge without this check.

Useful? React with 👍 / 👎.

The MLX CI workflow listed ``studio/backend/utils/hardware.py`` as a
path filter, but no such file exists. The actual layout is

    studio/backend/utils/hardware/
        __init__.py
        amd.py
        hardware.py
        nvidia.py
        vram_estimation.py

so the filter as written would never match. A reviewer modifying
``hardware/hardware.py`` (where ``detect_hardware``, ``DeviceType``,
and ``IS_ROCM`` actually live) would not trigger MLX CI, which
defeats the point of the focused PR gate.

Replace the broken filter with ``studio/backend/utils/hardware/**``
so any change in the hardware probe directory triggers MLX CI, and
add three sibling triggers that each materially affect dispatch:

  - ``unsloth/_gpu_init.py``
        Hosts ``from .models import *`` and the ``from .trainer import *``
        chain. The trainer.py circular-import fix that landed in
        ``23550a8`` lives downstream of this file; a future change
        here can re-introduce the same bug.
  - ``studio/backend/core/inference/mlx_inference.py``
        The MLX inference backend itself. It is the actual consumer
        of ``unsloth_zoo.mlx_loader.FastMLXModel`` whose contract the
        test_mlx_training_worker_behaviors.py AST checks guard.

Local re-run with the fix in place: 36 passed in 0.45s. No other
workflow file or test file is modified.
Replaces the single "Studio boots, loads a GGUF, answers a chat
completion" job with three parallel jobs that each pick the smallest
model that exercises the surface under test. All three jobs share the
install.sh --local --no-torch bootstrap and prime HF_HOME via
actions/cache so cold-cache runs are bounded and warm runs are quick.

1. Studio GGUF CI / OpenAI, Anthropic API tests
   - Model: gemma-3-270m-it UD-Q4_K_XL (~254 MiB).
   - Password rotation: login with bootstrap pw, change to a fresh
     random pw, assert old pw is rejected with 401, assert new pw
     succeeds. Uses the same JWT downstream as a Bearer token against
     /v1/* (the OpenAI/Anthropic compat surface accepts JWTs and
     sk-unsloth- keys interchangeably).
   - OpenAI SDK + Anthropic SDK each run a four-turn conversation
     ("What is 1+1?" / "What did I ask before?" / "What is the capital
     of France?" / "Repeat the city name") with temperature=0.0 and
     seed=3407. Run twice and assert run1 == run2 turn-by-turn so
     non-determinism in the conversation-history wiring is caught.

2. Studio GGUF CI / tool calling tests
   - Model: Qwen3.5-2B UD-IQ3_XXS (~890 MiB).
   - Standard OpenAI function calling with tool_choice=required.
   - Server-side python tool: assert "56088" appears in the answer to
     "What is 123 * 456? Use code to compute it.".
   - Server-side terminal (bash) tool: assert "hello-bash-tool" is
     echoed back.
   - Server-side web_search tool: non-blocking probe (DuckDuckGo
     flakes from CI runners). Asserts the request shape is accepted.
   - enable_thinking=true vs false: assert <think> markers vanish
     when thinking is disabled.

3. Studio GGUF CI / JSON, images
   - Model: gemma-4-E2B-it UD-IQ3_XXS (~2.4 GiB) + mmproj-F16
     (~986 MiB) auto-detected via the HF repo path.
   - response_format = json_schema (strict): asserts the answer parses
     as JSON matching the {city, country} schema.
   - OpenAI image_url (data URI base64): assert non-empty response on
     a 4x4 PNG. Loose on content because small VL quants are weak at
     colour names; the vision path is the part under test.
   - Anthropic source/base64 image: same non-empty assertion against
     the Anthropic Messages endpoint.

Boot strategy:
  - Job 1 keeps `UNSLOTH_API_ONLY=1 unsloth studio` because the
    password-rotation flow only exists in the UI-mode bootstrap.
  - Jobs 2 and 3 use `unsloth studio run --model REPO --gguf-variant V`,
    the one-liner that loads the model and prints the API key on the
    banner. Health is probed by waiting for `sk-unsloth-` to appear in
    the log; the one-liner only prints the banner after load completes.
Job 1 (OpenAI, Anthropic API tests):
  Anthropic SDK appends /v1/messages to base_url itself, so passing
  base_url=f"{BASE}/v1" produced /v1/v1/messages and 405'd. Bare BASE
  is correct (matches the docs' "the SDK appends /v1 automatically").
  OpenAI SDK side already worked: 4-turn transcript was fully
  deterministic across two runs and the "Paris" sanity assertion
  passed.

Job 2 (tool calling tests):
  Booting with --enable-tools forces the process-level tool policy to
  True for every request (state/tool_policy.py:get_tool_policy), which
  hijacked the "Standard OpenAI function calling" test through the
  server-side agentic loop -- the model called web_search instead of
  returning structured tool_calls for the user's `weather_tool`. Drop
  --enable-tools so policy is None (per-request honour). The python /
  terminal / web_search probes already pass enable_tools=True
  explicitly in their request bodies, so they keep working.

Job 3 (JSON, images):
  Two issues. (a) The OpenAI Python SDK rewrites
  response_format={"type":"json_schema",...} into something Studio's
  llama-server backend doesn't accept, so resp came back as the raw
  error string and resp.choices[0] tripped 'str has no attribute
  choices'. Switched to raw HTTP with the `{"type":"json_object",
  "schema":...}` form llama-server actually supports
  (GBNF-from-schema, llama-server extension). (b) Anthropic SDK
  base_url same fix as job 1.
Two new PR-time gates that the existing inference / wheel jobs miss.

Studio Update CI:
  - Runs install.sh --local --no-torch, then `unsloth studio update
    --local` twice, asserting both invocations take the prebuilt
    "up to date and validated" code path with no source-build
    fallback.
  - Boots Studio to /api/health afterwards so a broken update that
    nukes the venv or the llama-server binary surfaces immediately.
  - Triggers when install.sh, studio/setup.sh, the python_stack /
    llama_prebuilt installers, the requirements files, or
    unsloth_cli/commands/studio.py change.

Studio UI CI:
  - Drives the actual frontend bundle in headless Chromium via
    Playwright with the smallest GGUF (gemma-3-270m-it UD-Q4_K_XL).
  - Covers: bootstrap login, must_change_password gate + change form,
    chat composer becomes interactive after model load, sending a
    message produces an assistant bubble with non-empty text, full
    page reload re-hydrates the conversation, configuration sheet
    opens and closes cleanly, and the rotated password is the only
    one that logs in afterwards.
  - This is the first workflow that catches the class of bug 2026.5.1
    shipped: backend healthy + frontend builds, but assistant-ui
    runtime wiring or chat-history persistence broken so the actual
    UI was unusable. Backend-only or wheel-only gates do not see it.
…ct race

The /login route auto-redirects to /change-password as soon as
/api/auth/status returns requires_password_change=true. The original
flow was racing that redirect: it filled #password (login mode) and
clicked submit, but the redirect could land first and the form would
have unmounted before the click. Going straight to /change-password
also matches what main._inject_bootstrap is set up to support: the
HTML on that route ships with `window.__UNSLOTH_BOOTSTRAP__`, which
the change-password form reads to seed the current-password state, so
the user only needs to fill new + confirm. Renumbered screenshots to
match the new step order.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2d9bf084f0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +97 to +99
grep -q "falling back to source build" logs/update2.log && {
echo "::error::second update fell back to source build"
tail -60 logs/update2.log; exit 1; } || true
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Make the second fallback guard fail the job

When the second unsloth studio update --local logs falling back to source build, this guard still exits successfully because the { ... exit 1; } group is the left side of && ... || true, so the || true runs after the intended failure and the step continues. If the same log also contains one of the accepted prebuilt markers checked next, the workflow passes while the idempotency regression it is meant to catch has occurred.

Useful? React with 👍 / 👎.

GGUF jobs 2 and 3:
  Switched off `unsloth studio run` and over to `UNSLOTH_API_ONLY=1
  unsloth studio` + login flow. Reason: studio.run() resolves the tool
  policy through unsloth_cli/_tool_policy.resolve_tool_policy, which
  defaults to True on loopback. That means set_tool_policy(True) gets
  applied process-wide, and every /v1/chat/completions request is
  routed through the server-side agentic loop -- so Job 2's standard
  function-calling test never gets a structured tool_calls response
  (the model uses web_search instead) and Job 3's response_format
  test gets non-JSON SSE chunks back. API-only mode leaves
  tool_policy=None, which is what each request's `enable_tools` flag
  (or absence thereof) needs to be honoured.

Job 1:
  Anthropic SDK retry: the SDK sends `x-api-key` by default, but
  Studio's auth layer is HTTPBearer-only. Override via
  default_headers={"Authorization": f"Bearer {KEY}"}, which is the
  shape the integration docs suggest.

UI smoke:
  Drop the "history must persist after reload" assertion; Studio's
  thread autosave is async and doesn't reliably land within the CI
  budget. Keep the assertion that matters: the chat composer mounts
  again after a reload and the JWT survived (no /login redirect),
  which is what the 2026.5.1 chat regression actually broke.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b69262c38

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

# produced for turn 3). Lower-cased substring checks keep the
# assertion robust to formatting jitter.
joined = " ".join(first).lower()
assert "1" in first[0], f"{label}: turn-1 answer should contain '1', got {first[0]!r}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Assert the arithmetic result instead

In this smoke test, the first prompt is What is 1+1?, but the assertion requires the reply to contain the character 1. A perfectly correct concise response like 2 (which is common with temperature 0) will fail this workflow even though the API and history wiring are working, causing false-negative CI runs for the new Studio GGUF job.

Useful? React with 👍 / 👎.

Job 2 (tool calling):
  The server-side agentic loop in routes/inference.py:1888 always
  yields SSE chunks -- the request's `stream=False` is honoured for
  the plain passthrough path, NOT for the agentic path. The python /
  terminal / web_search probes were calling json.loads on the raw
  body and tripping JSONDecodeError.
  Added a post_sse() helper that streams the response and accumulates
  text deltas, used for every enable_tools=True call. Function
  calling (which does NOT enable agentic mode) keeps post().

Job 3 (JSON, images):
  Dropped the strict-schema variant of response_format. On the small
  gemma-4-E2B-it UD-IQ3_XXS quant, the GBNF-from-schema path
  occasionally produces empty content. Plain `{"type":"json_object"}`
  is still a real test of Studio's JSON-mode wiring through to
  llama-server, and that's the surface the docs expose. Added
  fence-stripping for chat templates that wrap JSON in ```json blocks.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 76caf50e79

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +776 to +778
openai_text = (openai_resp.choices[0].message.content or "").lower()
print(f"[image/openai] reply: {openai_text!r}")
assert openai_text, "OpenAI image_url returned empty content"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Require image-specific evidence in the smoke

When the image payload is accepted but then dropped or not forwarded to llama-server, this still passes because a text-only path can return any non-empty answer to the accompanying text prompt (including an apology or guess). Since this job is meant to guard the multimodal wiring, the assertion should require some image-grounded signal for the red PNG, or otherwise verify that the loaded vision backend actually consumed the image; the same non-empty-only pattern appears in the Anthropic image check below.

Useful? React with 👍 / 👎.

Studio's image normaliser re-encodes embedded base64 images via
stb_image (routes/inference.py:3410) so llama-server gets a uniform
PNG payload. stb_image happily reads the 4x4 PNG as a PIL test, but
rejects it on the inference path with `broken data stream when
reading image file`. 64x64 is small enough to keep token cost
trivial (155 bytes) and large enough to satisfy stb_image's minimum.

Job 1, Job 2, the UI smoke, and the JSON portion of Job 3 are all
green now -- this is the last piece holding Job 3 back.
…mits

studio/install_llama_prebuilt.py lists releases on
ggml-org/llama.cpp via the GitHub API. Unauthenticated calls get
60/hr per source IP, which is fine for one install per workflow but
the new Studio Update CI does install + update + update back-to-back
on the same runner, blowing past the limit and falling back to a
source build (which then fails the idempotency assertion).

Surfaced on the Studio Update CI run with:
  failed to inspect published releases in ggml-org/llama.cpp:
  GitHub API returned 403 ...
  set GH_TOKEN or GITHUB_TOKEN to avoid GitHub API rate limits.

GITHUB_TOKEN with the existing `permissions: contents: read` is more
than enough for unauthenticated read API access (1000/hr, scoped to
the repo). Wired into every install.sh and `unsloth studio update`
step across studio-update-smoke.yml, studio-inference-smoke.yml, and
studio-ui-smoke.yml so a busy runner can't trip the same fallback.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cee324725c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/studio-ui-smoke.yml Outdated
Comment on lines +221 to +223
page.reload()
composer = page.locator('textarea[aria-label="Message input"]')
composer.wait_for(state = "visible", timeout = 60_000)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Verify history survives reload

In the UI smoke, if chat-history persistence or remote-thread rehydration breaks while auth/session restoration still works, this check passes because it only reloads and waits for the composer to reappear. That leaves the regression class the workflow header says it guards (chat-history persistence breaking) uncovered, so the job can stay green even when the prior chat does not survive a reload.

Useful? React with 👍 / 👎.

Rename the job to "Python lint (syntax + ruff + safety nets)" and
expand it from one non-blocking ruff invocation over studio/backend
into four real gates over the whole tree. Total CI time goes from
~8 s to ~12 s, but the previous job was informational; this one
blocks merges on actual breakage.

Steps (in order):
  1. AST/syntax (HARD GATE)
     `python -m compileall -q -j 0 unsloth unsloth_cli studio tests
      cli.py unsloth-cli.py`. Same parser the interpreter uses;
     anything broken here would also crash at `import X` on a user's
     machine. ~3.5 s across 350+ files locally.

  2. ruff check whole repo (HARD GATE)
     The narrow rule set in pyproject.toml [tool.ruff.lint] (E9 /
     F63 / F7 / F82) catches undefined names, broken comparisons,
     and syntax. The whole repo passes today, so the previous
     studio/backend-only `|| true` was masking real breakage on
     the wider tree. <1 s.

  3. Debugger-leftover scan (HARD GATE)
     AST-walk over every committed .py looking for `breakpoint()`,
     `pdb.set_trace()`, or `ipdb.set_trace()` call sites. AST-based
     so commented-out debugger lines don't false-positive (which
     is why a bare grep would not work -- there are three commented
     `# breakpoint()` markers in unsloth/models/rl* today). 0 hits
     locally across 350 files.

  4. SPDX-License-Identifier on studio/backend (WARNING)
     Surfaces drift in the one tree where we already have a strict
     SPDX policy. Currently 3 files missing; warned, not blocked,
     so the rollout can be a separate PR.

  5. ruff format drift (INFO)
     Counts files that would be reformatted by plain `ruff format`.
     Non-blocking because the canonical formatter is
     scripts/run_ruff_format.py = ruff format + the kwarg-spacing
     pass, so plain `ruff format --check` always reports a large
     diff. Once that custom pipeline is wired in, drop
     continue-on-error and add it to the gate.

ruff is pinned to 0.15.12 to match .pre-commit-config.yaml so a
CI-only ruff bump cannot start disagreeing with what pre-commit
already accepted.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42ec2590a2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread .github/workflows/studio-backend-ci.yml Outdated
@@ -195,6 +221,91 @@ jobs:
with:
python-version: '3.12'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Run the syntax gate on supported Python versions

Because pyproject.toml still advertises >=3.9,<3.15, pinning this new whole-repo compile gate to only 3.12 leaves supported 3.9/3.10 syntax untested; for example, python3.10 -m compileall ... fails today on unsloth/kernels/moe/tests/common.py:172 while this 3.12 job passes. That means CI can stay green for code that crashes on supported Python versions, so either matrix this gate over the supported minors or narrow the advertised support/exclude 3.12-only helpers.

Useful? React with 👍 / 👎.

Drop the python-lint job from studio-backend-ci.yml and move it into
the dedicated `Lint CI` workflow. Two material changes:

1. License-header check now accepts BOTH header families
   The previous version only counted SPDX-License-Identifier, which
   warned on every Apache-2.0 file in unsloth/, unsloth_cli/, and
   scripts/ (e.g. unsloth/models/llama.py opens with the standard
   `# Copyright ... Daniel Han-Chen & the Unsloth team. All rights
   reserved. # Licensed under the Apache License, Version 2.0` block,
   which is correct, but my SPDX-only regex flagged it).
   New rule: a file is OK if either `SPDX-License-Identifier` or
   `Licensed under the Apache License` appears in the first 20 lines.
   Empty __init__.py files are skipped. Whole-repo coverage instead
   of just studio/backend.

2. Add shell / YAML / JSON parse gates
   - `bash -n` over every committed *.sh (14 today). Same idea as
     compileall: parse-only check.
   - `yaml.safe_load_all` over every *.yml / *.yaml (97 today),
     including .github/workflows/* so a typo in the workflow file
     itself shows up immediately.
   - `json.loads` over every *.json (18 today). Skips
     package-lock.json / bun.lock (huge, machine-generated) and
     tsconfig*.json (TypeScript JSONC convention -- already
     validated by `tsc --noEmit` in Frontend CI).

TypeScript and Rust are NOT duplicated here:
  - Studio Frontend CI runs `npm run typecheck` + `npm run build`
    on every studio/frontend/** change, which is a full TS AST +
    type check.
  - Studio Tauri CI runs `tauri build --debug --no-bundle` on every
    studio/src-tauri/** or studio/frontend/** change, which is a
    full Rust compile.
A duplicate fast-fail step here would burn cache for marginal
value, and the dedicated workflows already block merges.

Lint CI runs on every PR (no path filter): the whole job is
under 30 s of CI time, so paying that on every PR is preferable
to missing a regression on a path the focused workflows skip.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

tests/sh/test_get_torch_index_url.sh \

P2 Badge Trigger installer shell tests on installer changes

These newly added shell tests are installer/unit tests: test_get_torch_index_url.sh, test_mac_intel_compat.sh, and test_tauri_install_exit_order.sh all read install.sh, and test_torch_constraint.sh also reads install.ps1. Because this step lives in Backend CI, whose pull_request.paths filter above only includes studio/**, unsloth/**, unsloth_cli/**, tests/**, pyproject.toml, and this workflow file, an install.sh- or install.ps1-only PR will not run the tests that are supposed to catch installer regressions; add the installer paths to this workflow trigger or move these tests to an unfiltered/installer-triggered workflow.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

The license-header check missed two more legitimate header families
that are committed to the repo today:

  - LGPL-3.0 long form: e.g. unsloth/kernels/rope_embedding.py opens
    with "GNU Lesser General Public License" -- 7 such files under
    unsloth/kernels/.
  - AGPL-3.0 long form: e.g. unsloth/kernels/moe/autotune_cache.py
    opens with "GNU Affero General Public License" -- 2 such files
    under unsloth/kernels/moe/.

Both got flagged as drift on the previous run because the check
only knew about the SPDX one-liner and the Apache-2.0 preamble.
Add a third accepted marker, the substring "General Public License",
which appears in all three GNU long-form preambles (GPL, LGPL,
AGPL) and nothing else. Repo inventory:

   spdx (one-liner)        193 files (mostly studio/)
   apache-longform          55 files (unsloth/, unsloth_cli/)
   agpl-longform             2 files (unsloth/kernels/moe/)
   lgpl/gpl-longform         7 files (unsloth/kernels/)
   no recognised header     85 files (real drift -- mostly tests/)

So the warning count drops from 94 -> 85 with this commit; the
remaining 85 are actual missing headers, surfaced as a non-blocking
warning until the cleanup PR lands.
Three Priority-1 follow-ups from the lint review.

Lint CI gains two non-blocking gates that surface drift without
blocking merges (the same shape as the existing format-drift step):

  - codespell: typo catcher across source / comments / docs. Skips
    lockfiles, generated assets, binary artefacts, LICENSE files.
    ignore-words-list pulls out short identifiers and PyTorch
    idioms (parm/parms, ans, hist, etc.) the default dictionary
    would flag. Local run finds 16 real typos to fix in a follow-up.

  - shellcheck: catches subtle shell bugs `bash -n` doesn't see --
    unquoted expansions, useless cat, `[[ ]]` command substitution,
    etc. SC1090 + SC2034 muted because install/setup scripts
    legitimately source runtime paths and use export-only
    assignments. Critical-path coverage: install.sh, setup.sh,
    tests/sh/.

Both pinned for reproducibility (codespell>=2.3,<3 in pip,
shellcheck via apt-get). Both surface findings in PR annotations
without failing the run; drop continue-on-error after the cleanup
PRs land.

New workflow: Security audit. Runs `pip-audit` against the same
dep set Studio's backend pytest matrix installs, so we audit what
the runtime actually loads (not what pyproject.toml's transitive
resolution might pull in differently). Triggers:
  - PRs touching requirements / pyproject.toml,
  - push to main / pip,
  - nightly @ 04:13 UTC (off-the-hour to dodge cron rush),
  - workflow_dispatch.

The default branch already carries 17 known vulnerabilities per
the dependabot banner, so a hard gate today would block every PR
on a baseline we have not triaged. Non-blocking; full table goes
to GITHUB_STEP_SUMMARY for grep-ability and a 30-day artefact for
historical comparison.

The custom AST anti-pattern scan I prototyped was dropped: every
class of CPU-import-time bug we hit in this PR (bitsandbytes,
torchvision, _cuda_getCurrentRawStream, DEVICE_COUNT==0 stream
init) is already caught by the Repo tests (CPU) job exercising
the actual import on a CPU torch wheel. Restating the rule
in AST form would only add noise.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 518fa38e16

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread scripts/scan_packages.py
Comment on lines +966 to +970
if proc.returncode != 0:
print(
f" [ERROR] pip download (with deps) failed: {proc.stderr.strip()[:500]}",
file = sys.stderr,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail scan when transitive download resolution fails

In --with-deps mode, a non-zero pip download only logs an error and then the script continues scanning whatever files happen to exist in dest, which can be an incomplete or empty set. In that case main() still reports findings only from partial data and can exit 0, so the new security-audit.yml job can report a clean scan even when dependency resolution failed (e.g., conflict/network failure) and much of the tree was never scanned.

Useful? React with 👍 / 👎.

Comment thread scripts/scan_packages.py
Comment on lines +1578 to +1581
key = e["name"].lower().replace("-", "_").replace(".", "_")
if key not in seen:
seen.add(key)
unique_entries.append(e)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep all duplicate package constraints in scan input

This dedup keeps only the first spec per normalized name, dropping later constraints from other requirement sources. The security workflow passes multiple manifests that intentionally overlap (for example studio/backend/requirements/studio.txt pins datasets==4.3.0 while studio/backend/requirements/no-torch-runtime.txt allows a range), so removing later specs changes the resolved dependency set and can scan different versions than the ones actually installed in production paths.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2af52abdf8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/scan_packages.py
Comment on lines +954 to +956
"pip",
"download",
"--dest",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid executing sdists during pre-install scans

When a requirement or transitive dependency only has an sdist (or lacks reusable metadata), this pip download can run build/metadata hooks before the archive scanner gets to inspect the files; I checked the official pip docs, which describe pip download as using the same resolver behavior as install, and pip install documents setup.py egg_info for sdists. Since this workflow is explicitly meant to scan compromised packages without executing them, a malicious sdist can still run code on the CI runner here; use a metadata/API-based fetch path or fail closed on packages that require local metadata generation.

Useful? React with 👍 / 👎.

Comment on lines +111 to +112
for f in studio.txt extras.txt extras-no-deps.txt \
no-torch-runtime.txt overrides.txt triton-kernels.txt; do
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include single-env Studio requirements in the audit

This allowlist misses studio/backend/requirements/single-env/*.txt, even though studio/install_python_stack.py installs SINGLE_ENV / "data-designer-deps.txt" and SINGLE_ENV / "data-designer.txt" in the normal Studio install flow. A PR changing those production dependency manifests still triggers this workflow via requirements/**, but neither pip-audit nor the scan-packages job audits their packages, so a vulnerable or malicious Data Designer dependency can be added without this new security check seeing it; include those files in this list and the matching scan list below.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bd49dd7524

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +111 to +112
for f in studio.txt extras.txt extras-no-deps.txt \
no-torch-runtime.txt overrides.txt triton-kernels.txt; do
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include base requirements in the audit

When a PR changes studio/backend/requirements/base.txt, this workflow is triggered by studio/backend/requirements/**, but the generated audit inputs only copy the six files listed here and omit base.txt; studio/install_python_stack.py installs that file in the normal local/update flows (req = REQ_ROOT / "base.txt"), so a new vulnerable or malicious base dependency can be added while both the pip-audit and scan-packages shards still inspect none of it. Add base.txt to this list and the matching scan matrix/input list below.

Useful? React with 👍 / 👎.

@danielhanchen danielhanchen force-pushed the ci/workflow-permissions-and-fewer-skips branch from bd49dd7 to 9b0f5ff Compare May 6, 2026 23:07
The previous Security audit only covered Studio's backend requirements.
The unsloth pip package itself ships its own dep set via pyproject.toml
(typer/pydantic/pyyaml/nest-asyncio core, plus the huggingfacenotorch
extras: transformers/peft/accelerate/trl/datasets/diffusers/etc.) -- a
malicious upload to any of those would slip past us today. Build a
combined dep list from pyproject.toml + the six Studio requirements
files and feed it to both pip-audit and scan_packages.

Add scan_packages.py at scripts/scan_packages.py so the scanner ships
with the repo and CI does not depend on a network fetch at job time.

Pass --with-deps to scan_packages so the pre-install pattern scan
walks the full transitive closure -- supply-chain attacks usually land
several hops down (litellm 1.82.7 was a dep of a dep for most users;
top-level-only scanning would have missed it).

No installation in either job. pip-audit's -r mode resolves through
PyPI metadata, scan_packages downloads sdist/wheel archives raw and
inspects them without running install hooks. An attacker who has
compromised a transitive dep cannot execute code in this workflow.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a41a315d7e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +206 to +207
STUDIO_OLD_PW: ${{ env.STUDIO_EXTRA_OLD_PW }}
STUDIO_NEW_PW: ${{ env.STUDIO_EXTRA_NEW_PW }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Pass extra UI passwords from the runner environment

This step maps STUDIO_OLD_PW/STUDIO_NEW_PW from ${{ env.STUDIO_EXTRA_* }} even though the prior step sets those values via $GITHUB_ENV; these expressions are resolved from the workflow/job env context, so they expand empty here and override the runtime values that tests/studio/playwright_extra_ui.py needs. On the Linux extra-UI smoke this submits blank passwords and fails before exercising Compare/Recipes/Export/Settings; either let the script consume STUDIO_EXTRA_* directly or write STUDIO_OLD_PW/STUDIO_NEW_PW into $GITHUB_ENV in the previous step.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 00f3e325ce

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread scripts/scan_packages.py
Comment on lines +1211 to +1214
"download",
"--dest",
dest,
] + specs
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid executing sdists in the package scanner

When the security-audit job runs scan_packages.py --with-deps, this pip download invocation can still execute untrusted sdist build hooks while resolving/downloading packages. I checked python -m pip download --help: --no-build-isolation is described in terms of building a source distribution, and --only-binary :all: is the option that disables source packages; without that guard, a malicious transitive dependency that only ships an sdist (or whose wheel is unavailable for this runner) can run code before the scanner inspects the archive, defeating the workflow’s “download without executing” threat model. Add --only-binary=:all: or otherwise fetch archives through a path that never builds metadata.

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits May 8, 2026 23:33
Mac Chat UI Tests on commit 00f3e32 timed out 60s into
page.fill('#confirm-password') (run 25578374480 / job 75091072289).
The previous fix (3274f72) wrapped the post-submit composer wait
but left the form-fill sequence single-shot. Same root cause as
the original 25497245250 / 74820324136 case but a step deeper:
pw_field.fill('#new-password') succeeds, then a re-render
between the two locators detaches '#confirm-password' and the
second fill burns the 60s ceiling.

Wrap the entire goto + settle + locator + fill + submit sequence
in a 3-attempt retry. Each retry re-navigates page.goto() with
wait_until='domcontentloaded' (fresh DOM, fresh form) and spawns
a new page in the same context if the old one died. Diagnostics
on each failed attempt: page.url, page_errors, console_errors,
screenshot.

Same hardening applied to playwright_extra_ui.py which has the
same change-password flow.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c876296dca

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +265 to +266
STUDIO_OLD_PW: ${{ env.STUDIO_EXTRA_OLD_PW }}
STUDIO_NEW_PW: ${{ env.STUDIO_EXTRA_NEW_PW }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Read extra-UI passwords from runtime env on macOS

This step sets STUDIO_OLD_PW/STUDIO_NEW_PW from ${{ env.STUDIO_EXTRA_* }}, but those STUDIO_EXTRA_* values are produced in the prior step via $GITHUB_ENV (runner environment), not declared in workflow/job/step env. In Actions, env context expressions are resolved from the workflow env context, so these resolve empty and override the variables the Playwright script expects, causing the macOS extra-UI smoke to submit blank passwords and fail before exercising Compare/Recipes/Export/Settings.

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits May 9, 2026 00:02
Extend the cross-version compat canary to catch ~80% of upstream
drift before a user hits it. Static checks only (GitHub raw fetch +
grep), CPU-only, runs PR-time + daily cron. 906 pass, 73 skipped.

TRL coverage extended:
- TRL_TAGS expanded from 12 to 28 (every stable release >=0.18.2,
  including the broken 0.19.0, plus main). Anchors: 0.22.2 / 0.27.1
  / 1.0.0 marked.
- Fix `__version__` parser to handle the TRL 0.22.x pattern
  (`__version__ = f.read()` from sibling VERSION file).
- Fix `has_def` in _fetch.py to allow indented matches so class
  methods are detected (the original anchored ^def only matched
  module-scope definitions).
- New tests for symbols the audit found we touch but didn't check:
  is_conversational, sft_trainer module + neftune_post_forward_hook,
  dpo_trainer module + MODEL_FOR_VISION_2_SEQ_MAPPING_NAMES,
  trl.trainer.utils.ConstantLengthDataset (gated),
  trl.models.utils.disable_gradient_checkpointing (gated >=1.0.0),
  trl.import_utils + _*_available cache pattern,
  trl.experimental.openenv.utils generators (one of two names),
  GRPOTrainer required methods (_prepare_inputs,
  _generate_and_score_completions, compute_loss; per-token-logps
  legacy/new dispatch), GRPOTrainer source must contain
  torch.inference_mode + accelerator.unwrap_model fingerprints,
  KTOTrainer.get_batch_logps (now lives at trl.experimental.kto
  on TRL 0.27+ — accept either path),
  SFTTrainer class existence, DPOTrainer methods (informational),
  chat-template propagation (legacy maybe_apply_chat_template OR
  successor apply_chat_template + chat_template_kwargs),
  truncate_with_protected_tokens informational.
- Tighten test_unwrap_model_for_generation_either_path to mirror
  the prod fallback exactly (drop unused trl/extras/profiling.py
  candidate).
- Replace test_trl_generation_vllm_generation_gated symbol set with
  the actual unsloth dependency (VLLMGeneration class + _init_vllm
  / sync_weights / generate methods, not VLLMClient/etc).

PEFT coverage extended (driven by the 8 PR audit unsloth#5015,
#5167, #5036, #4807 + unsloth-zoo#618, #596, #482, #430):
- VARIANT_KWARG_KEYS const (peft 0.18+; injected by zoo#430)
- ParamWrapper class + members (peft 0.18+; needed by zoo#618)
- LoraConfig.target_parameters (peft 0.19+)
- LoraModel._create_and_replace (signature pin for unsloth#4807)
- transformers_weight_conversion module + build_peft_weight_mapping
  (unsloth#5167 wraps this)
- integrations.dequantize_module_weight (3 callsites)
- PeftType.LORA (vllm_utils.py:2520)
- ModulesToSaveWrapper (both peft.utils.* paths)
- PeftModel.from_pretrained method exists
- peft.__version__ parseable

Transformers coverage added (driven by the 16-PR audit):
- New file test_transformers_pinned_symbols.py with 19 test
  categories x 12 transformers tags (4.57.6 floor + 5.0..5.8 + main).
  Anchors: 4.57.6 + 5.5.0.
- Trainer surface (compute_loss num_items_in_batch param,
  training_step grad-accum fingerprints, get_batch_samples
  num_items contract, inner_training_loop _tr_loss inplace v5)
- modeling_utils.checkpoint alias for unsloth-zoo#549
- PushToHubMixin._create_repo presence (unsloth-zoo#393)
- integrations.bitsandbytes module + Linear4bit reference
- quantizers.should_convert_module signature (zoo#491/#488)
- FP8Linear bias/has_bias rename (zoo#572)
- processing_utils.Unpack importable (zoo#583/584)
- gemma3 Gemma3Attention class + gpt_oss GptOssModel class
- auto_factory _LazyAutoMapping private API (unsloth#5155)
- configuration_utils PretrainedConfig/PreTrainedConfig alias
- tokenization_utils_base.apply_chat_template
- modeling_attn_mask_utils symbols
- cache_utils Cache + DynamicCache classes
- training_args.ParallelMode importable

Wire the new transformers job into version-compat-ci.yml (matrix
of 5 PR-time symbol jobs + zoo-imports under spoof + daily fresh-
fetch cron).

Local smoke: 906 pass, 73 skipped (gated optional features) across
vLLM + TRL + PEFT + ST + bnb + transformers suites.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 200822cf93

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread scripts/scan_packages.py
Comment on lines +1223 to +1226
print(
f" [ERROR] pip download (with deps) failed: {proc.stderr.strip()[:500]}",
file = sys.stderr,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail scan when transitive pip download errors out

When --with-deps resolution fails, this branch only logs an error and continues, so downloaded can be empty and main() still exits 0 because it only fails on HIGH/CRITICAL findings. In the security-audit.yml pip-scan-packages job (which always uses --with-deps), a transient PyPI failure or resolver conflict can therefore produce a green scan result even though no archives were actually inspected.

Useful? React with 👍 / 👎.

danielhanchen and others added 5 commits May 9, 2026 00:41
Two coverage extensions per follow-up:

bnb matrix: from 2 tests to 12 categories per tag, derived from a
full grep of unsloth + unsloth-zoo. Adds:
- bitsandbytes.matmul_4bit (top-level export)
- bnb.functional 4-bit kernel path: legacy `lib.cdequantize_*` (bnb
  <=0.48) OR new torch.ops.bitsandbytes.dequantize_* (bnb >=0.49) —
  passes either, fails if neither is wired
- bnb.functional.get_ptr (binding at unsloth/kernels/utils.py:233)
- bnb.functional.QuantState class + from_dict classmethod
  (zoo monkey-patches `QuantState.from_dict = ...`)
- bnb.nn.modules.fix_4bit_weight_quant_state_from_module (optional)
- bnb.nn.Linear8bitLt (legacy load_in_8bit path)
- bnb.optim.optimizer.Optimizer2State (PagedAdamW32bit base)
- bnb.utils.{pack_dict_to_tensor, unpack_tensor_to_dict}
  (state-dict save/load)
- bnb.cextension.ROCM_WARP_SIZE_64 (optional, AMD ROCm path)
- bnb.autograd._functions.matmul_4bit (dynamo-disable probe site)
- bnb.__version__ exported via any known mechanism (the 6 floor
  gates at 0.43.3, 0.46.0, 0.48.2.dev0, 0.49.0, 0.49.2 all read it)

Extended zoo-import smoke: from 5 narrow tests in
tests/vllm_compat/test_unsloth_zoo_imports.py to 32 tests in the
new tests/vllm_compat/test_extended_module_imports.py:
- 20 unsloth_zoo modules sweep (compiler, dataset_utils,
  device_type, empty_model, gradient_checkpointing, hf_utils,
  llama_cpp, logging_utils, loss_utils, patching_utils,
  patch_torch_functions, peft_utils, rl_replacements,
  saving_utils, tiled_mlp, tokenizer_utils, training_utils,
  utils, vision_utils, compiler_replacements). Each must import
  cleanly under the existing _zoo_aggressive_cuda_spoof harness;
  drift in transformers / peft / bnb symbols pinned at module-top
  trips here BEFORE any user-visible call.
- 7 unsloth.models.* core modules sweep (rl, rl_replacements,
  sentence_transformer, _utils, loader, loader_utils, mapper).
- _IS_MLX must be False on a non-Apple-Silicon spoof runner
  (catches MLX gate logic too lax in unsloth/__init__.py).
- FastLanguageModel/Vision/Model surface dump: from_pretrained +
  get_peft_model methods must be reachable on the dumped class.
- RL_FUNCTIONS dispatch table populated with grpo_trainer +
  sft_trainer + dpo_trainer keys (catches "imports cleanly but
  silently empty dispatch").
- unsloth_zoo.compiler.test_apply_fused_lm_head must be callable.
- FastModel.from_pretrained signature has model_name +
  max_seq_length + load_in_4bit kwargs (every Colab notebook
  calls these by name).

Wired into the existing zoo-imports-under-spoof job in
.github/workflows/version-compat-ci.yml.

Local smoke: 49 bnb pass, 28 extended-import pass + 4 skipped (env
quirks). Full version_compat suite: 947 pass, 76 skipped.
…ac buffer)

Run 25586582979 + 25586583008 + 25586583024 surfaced three real issues
on commit a975d58. All addressed:

1. version-compat-ci.yml `zoo-imports-under-spoof` job — every
   `import unsloth_zoo.<module>` failed with
     `Exception: No package metadata was found for torchcodec`
   transformers 5.x's `audio_utils.py:55` does
     `version.parse(importlib.metadata.version("torchcodec"))`
   UNCONDITIONALLY at module top, which trickles up through
   transformers.processing_utils -> unsloth_zoo.vision_utils -> the
   whole zoo import path. Fix: pip install `torchcodec<0.10` in the
   workflow alongside torch + torchvision (CPU wheel exists; the
   <0.10 cap mirrors the torch 2.10 / torchvision 0.26 ABI window
   already pinned).

2. studio-backend-ci.yml "Repo tests (CPU)" job — pytest's
   auto-discovery pulled in the new tests/vllm_compat/ +
   tests/version_compat/ files which require a heavier dep set
   (transformers/peft/bnb pins, torchcodec) than the Backend CI
   install line provides. Failed with
     `ImportError: cannot import name 'IterableDataset' from 'datasets'`
   (datasets 4.x removed the legacy export from the package root).
   Fix: --ignore=tests/vllm_compat + --ignore=tests/version_compat
   in the auto-discovery step. Both directories have a dedicated
   job in version-compat-ci.yml that installs the right dep set.

3. tests/studio/playwright_chat_ui.py — Mac Chat UI hit
     `net::ERR_NO_BUFFER_SPACE` after the change-password POST
   under --single-process Chromium on the macos-14 free runner; the
   page stayed on /change-password and BOTH composer.wait_for
   retries timed out at 60s each. The page.goto(BASE) recovery
   couldn't recover because the auth state never persisted. Fix:
   wrap the submit-button click in
     `page.expect_response("/api/auth/change-password" + POST,
                           timeout=30_000)`
   so the buffer-error surfaces immediately in the failing attempt
   rather than at the next composer.wait_for. The next retry
   iteration starts cleanly with a known-bad initial state. Falls
   back to fire-and-forget click if the response wait itself
   throws (so we don't introduce a new failure mode).

Local smoke after fixes: 975 pass, 80 skipped across version_compat
+ vllm_compat suites.
… throttling

Both playwright_chat_ui.py and playwright_extra_ui.py reimplemented the
same set of CI-runner workarounds (Chromium launch flags, view-transition
CSS killer, change-password retry, page-recovery). When one diverged the
other slowly rotted: the macos-14 / windows-latest / ubuntu-latest
failure modes are mostly identical so the cure is the same.

New module tests/studio/_playwright_robust.py is the single point of
truth, providing:

  - chromium_launch_args(platform): bundles macos-14 stability set
    (--single-process for the pipeTransport JSON-RPC crash) PLUS new
    throttling-kill flags (--disable-background-timer-throttling,
    --disable-renderer-backgrounding, --disable-backgrounding-occluded-
    windows, --disable-features=TranslateUI, --disable-ipc-flooding-
    protection) that prevent Chromium from deprioritising the headless
    context's CPU/timers when it thinks the window is backgrounded --
    which CI runners routinely flag.
  - install_view_transition_killer(ctx): the duplicated init script.
  - wait_for_health(base_url): pre-flight server probe inside the
    script -- catches the macos-14 gap where /api/health responds 200
    while the auth DB hasn't finished migrating.
  - recover_or_replace_page(page, ctx): canonical "page died mid-test"
    helper. Replaces the page if closed, optionally re-navigates +
    waits for networkidle.
  - click_and_wait_for_response(page, url_substr, do_click): generic
    POST-and-wait pattern that surfaces server-side 4xx / buffer-fail
    immediately. Now used by both files' change-password submit
    (parity -- previously only chat_ui had this).
  - dump_diagnostics(page, art_dir, name): screenshot + DOM excerpt +
    URL + localStorage keys JSON sidecar. Available for any future
    failure dump site.
  - BENIGN_PAGE_ERROR_PATTERNS / BENIGN_CONSOLE_ERROR_PATTERNS shared
    between the two files. Adds net::ERR_NO_BUFFER_SPACE +
    AbortError + chunk-load to the console-side filter so the
    diagnostic dump count tracks real signal.

Net effect: ~230 lines drop from chat_ui, ~146 from extra_ui, +401
shared. Total LOC down slightly. Behaviour preserved -- existing
retry windows / timeouts / fail conditions all unchanged.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16c4f655dc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +181 to +182
--ignore=tests/vllm_compat \
--ignore=tests/version_compat \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep compat suites in a PR-time workflow for all model changes

Ignoring tests/vllm_compat and tests/version_compat here removes these checks from the always-on backend PR gate, but the replacement workflow only triggers for a narrow path set (version-compat-ci.yml watches rl.py, rl_replacements.py, and sentence_transformer.py). That means PRs changing other modules covered by the compat suite (for example unsloth.models.loader/loader_utils imported by tests/vllm_compat/test_extended_module_imports.py) will now merge without running the compatibility canaries, so symbol/import regressions can slip through until the daily cron.

Useful? React with 👍 / 👎.

- actions/checkout v4.3.1 -> v6.0.2
- actions/setup-python v5.6.0 -> v6.2.0
- actions/setup-node v4.4.0 -> v6.4.0
- actions/upload-artifact v4.6.2 -> v7.0.1
- actions/cache @v4 (mutable) -> @27d5ce7f...  # v5.0.5 SHA-pinned (15 sites)
- actions/upload-artifact @v4 in wheel-smoke.yml -> SHA-pinned to v7.0.1

The 16 mutable @v4 references were exactly the @v0 / @v2 / @latest
class of reference the security-audit.yml comments call out as the
litellm / tj-actions attack surface, so they should never have shipped
as bare tags alongside the other SHA pins in this PR.

actions/cache v4 -> v5 regenerates the internal cache version hash,
so existing v4-saved caches (including the GGUF cache reused across
the studio smokes) miss once on first run after merge and then
re-populate. No semantic change beyond that.

Also corrects the dtolnay/rust-toolchain comment in security-audit.yml
and studio-tauri-smoke.yml: 29eef336d9 is the current stable branch
tip but its commit date is 2026-03-27, not 2026-05-07 as the comment
claimed.

release-desktop.yml intentionally left untouched (still on v4.3.1
checkout + v4.4.0 setup-node + older swatinem/rust-cache and unpinned
tauri-action). That file is outside the scope of this PR and should
get its own bump in a follow-up.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 960e68d98c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +341 to +345
semgrep scan \
--config p/supply-chain \
--config p/python \
--config p/javascript \
--config p/security-audit \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Load the custom Semgrep rules in CI

The new .semgrep/unsloth-rules.yml says it is wired into this workflow, but this Semgrep invocation only loads the stock packs, so none of the custom CVE-shape rules in that file ever run. Since the PR path filter also omits .semgrep/**, edits to those rules would not even trigger the audit workflow; add --config .semgrep/unsloth-rules.yml here and include the rules path in the trigger.

Useful? React with 👍 / 👎.

Comment on lines +66 to +70
- 'studio/backend/core/inference/mlx_inference.py'
- 'tests/studio/test_hardware_dispatch_matrix.py'
- 'tests/studio/test_is_mlx_dispatch_gate.py'
- 'tests/studio/test_mlx_training_worker_behaviors.py'
- 'tests/studio/run_real_mlx_smoke.py'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Trigger MLX prebuilt smoke for installer changes

This workflow later runs the real macOS prebuilt path via python studio/install_llama_prebuilt.py ..., but the PR path filter does not include studio/install_llama_prebuilt.py (or the setup script that passes the simple-policy flags). When a PR changes only that installer code, the Apple Silicon prebuilt GGUF smoke is skipped even though it is the only job validating that path; add the installer/setup files to this trigger list.

Useful? React with 👍 / 👎.

danielhanchen added a commit that referenced this pull request May 11, 2026
* fix: unblock 4 tests deselected/skipped in #5312 (real bugs)

PR #5312 surfaced two real regressions by turning previously-silent
skips into explicit `--deselect` / `pytest.skip(...)` blocks. Both
were left as follow-ups rather than fixed in that PR. This PR fixes
the underlying bugs so the suppressions can be dropped.

1. studio/backend/requirements/no-torch-runtime.txt: pin tokenizers

   Installing with `--no-deps -r no-torch-runtime.txt` (the path
   install.sh takes for the no-torch / GGUF-only mode) resolves
   transformers to 5.3.0 and tokenizers to the latest available
   (0.23.1). transformers 5.3.0 requires
   `tokenizers>=0.22.0,<=0.23.0`, so `from transformers import
   AutoConfig` then fails at import time:

       ImportError: tokenizers>=0.22.0,<=0.23.0 is required for a
       normal functioning of this module, but found
       tokenizers==0.23.1.

   Pin `tokenizers>=0.22.0,<=0.23.0` to match the constraint
   embedded inside every transformers version in the allowed window
   (4.56.0..5.3.0). Verified locally: a fresh `uv venv` + `uv pip
   install --no-deps -r no-torch-runtime.txt` followed by
   `from transformers import AutoConfig` now succeeds.

   Unblocks 3 deselected cases in studio-backend-ci.yml:
     - TestE2ETokenizersFix::test_autoconfig_works_with_no_torch_runtime
       (parametrized py 3.12 + 3.13 -> 2 cases)
     - TestE2EFullNoTorchSandbox::test_autoconfig_succeeds

2. unsloth/models/rl.py: defensive wrapper for _patch_trl_rl_trainers

   _patch_trl_rl_trainers has many internal `try: ... except: ...
   return` branches, but several paths (notably inspect.getsource on
   the thin wrappers TRL 1.x leaves in trl.trainer for trainers that
   moved to trl.experimental) can still propagate exceptions. The
   umbrella patch_trl_rl_trainers() ring-fences each call with
   try/except + warning_once, but direct callers (the CI shim in
   consolidated-tests-ci.yml, downstream tools, end-user scripts)
   used to see the raw exception, which forced #5312's CI heredoc to
   ring-fence with:

       except Exception as e:
           # TRL 1.x renames break the patch helper internally; we
           # accept that here and skip rather than fail the cell.
           pytest.skip(f"_patch_trl_rl_trainers raised: ...")

   Rename the existing implementation to _patch_trl_rl_trainers_impl
   and make _patch_trl_rl_trainers a thin wrapper that catches any
   uncaught exception and routes it through logger.info, matching
   the umbrella wrapper's behaviour. Power users who want the raw
   raising behaviour for their own diagnostics can still call
   _patch_trl_rl_trainers_impl directly.

   Adds tests/python/test_patch_trl_rl_trainers_defensive.py to lock
   the contract: the wrapper must never raise, and it must delegate
   to the impl on the happy path.

   Unblocks 1 skip in consolidated-tests-ci.yml's
   test_compile_sft_trainer_patch.

Follow-up for #5312 once this lands: drop the two `--deselect` lines
in studio-backend-ci.yml's repo-cpu-tests step and drop the
`except Exception ... pytest.skip(f"_patch_trl_rl_trainers raised: ")`
block in consolidated-tests-ci.yml's test_compile_sft_trainer_patch.

* chore: tighten comments and docstrings in the new code

Drop verbose justifications down to one or two lines per site.
The PR description carries the full context; in-file comments
only need to point at the WHY.

* chore(no-torch-runtime): drop redundant lower bound on tokenizers

tokenizers 0.23.0 was never published to PyPI (versions go 0.22.2 ->
0.23.1), so `tokenizers<=0.23.0` resolves to 0.22.2 in practice, the
same version the explicit >=0.22.0,<=0.23.0 pin resolved to. Verified
on Python 3.12 and 3.13.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4b8e886fac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +297 to +298
STUDIO_OLD_PW: ${{ env.STUDIO_EXTRA_OLD_PW }}
STUDIO_NEW_PW: ${{ env.STUDIO_EXTRA_NEW_PW }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Read Windows extra-UI passwords from the runner env

In the Windows extra-UI smoke, these ${{ env.STUDIO_EXTRA_* }} expressions are evaluated before the job runs and do not see the values written to $GITHUB_ENV in the previous step, so they expand to empty strings and override the runtime environment. When this step runs, playwright_extra_ui.py receives blank STUDIO_OLD_PW/STUDIO_NEW_PW and fails at login/password rotation instead of exercising the Compare/Recipes/Export/Settings paths; let the script consume the STUDIO_EXTRA_* variables directly or write STUDIO_OLD_PW/STUDIO_NEW_PW to $GITHUB_ENV before this step.

Useful? React with 👍 / 👎.

The previous gate triggered only on changes to rl.py, rl_replacements.py,
and sentence_transformer.py, but the symbol-existence tests cover EVERY
pinned upstream reference in unsloth. A new `from peft.foo import Bar`
added in unsloth/kernels/whatever.py is the same class of compat
regression as one added in unsloth/models/rl.py, and was previously
slipping through this gate.

Cost is small: the job is CPU-only raw-fetch + grep against pinned
upstream tags, ~1 minute end-to-end.
danielhanchen added a commit that referenced this pull request May 11, 2026
* fix(tests/sh): accept pinned tokenizers line after #5359

#5359 pinned the tokenizers line in
studio/backend/requirements/no-torch-runtime.txt from bare
`tokenizers` to `tokenizers<=0.23.0` to stop pip from resolving to
0.23.1+ (which transformers rejects at import time). The shell test
in tests/sh/test_torch_constraint.sh was still asserting the literal
`^tokenizers$` regex, which fails on the pinned form. Surfaced as
a hard fail on PR #5312's Backend CI Repo tests (CPU) step:

  === Structural: tokenizers in no-torch-runtime.txt ===
    FAIL: tokenizers present as standalone line (expected '1', got '0')
    FAIL: tokenizers before transformers (expected 'yes', got 'no')

Relax the regex to `^tokenizers([<>=!,~ ]|$)` so it matches both bare
and version-constrained forms, which preserves the original intent
(verify tokenizers is present in the file, before transformers).
Verified locally: 24 PASS, 0 FAIL.

* fixup(tests/sh): tighten tokenizers check to guard the safe bound

Address bot review feedback on #5361:

* Codex (P2): the previous relaxed regex `^tokenizers([<>=!,~ ]|$)`
  accepted any version operator, so `tokenizers>=0.23.1` would still
  pass the test even though that line would re-introduce the import
  failure #5359 fixed.

* Gemini (medium): the boundary char class did not cover all PEP 508
  separators (`[`, `;`, `@`).

Replace the single check with two:

1. Loose: `^tokenizers([^a-zA-Z0-9._-]|$)` confirms the package is
   listed (covers extras, env markers, URLs, bare line).

2. Tight regression guard: pipe those lines through a second grep
   that requires `<=0.23.0` or the functionally equivalent
   `<0.23.1`. Rejects bare `tokenizers`, `>=0.22.0` (no upper
   bound), `>=0.23.1`, `!=0.23.0`, `<=0.24.0`, etc.

Verified locally:
- Current main (tokenizers<=0.23.0): 25 PASS, 0 FAIL.
- Spot-check with the bug reverted (bare `tokenizers`): the new
  "tokenizers pinned with upper bound excluding 0.23.1+" check
  FAILS as intended; the original "listed" check still passes.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5506d8c169

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +182 to +183
python unsloth/scripts/notebook_validator.py refresh-colab \
--out unsloth/scripts/data/colab_pip_freeze.gpu.txt
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve committed Colab snapshot before strict diff

In the scheduled/dispatch colab-oracle job, this refresh writes the live Colab pip-freeze.gpu.txt directly over unsloth/scripts/data/colab_pip_freeze.gpu.txt immediately before colab-diff --strict compares --snapshot-dir unsloth/scripts/data. When Colab's pip image drifts, the strict diff reads the already-updated file as the “committed” snapshot and reports no pip-freeze drift, so the cron check that is supposed to fail within ~24h stays green. Write the live freeze to a temp path for linting, or run the strict diff before mutating the checkout.

Useful? React with 👍 / 👎.

@danielhanchen danielhanchen merged commit 6d4e6f2 into main May 11, 2026
54 checks passed
@danielhanchen danielhanchen deleted the ci/workflow-permissions-and-fewer-skips branch May 11, 2026 10:19
CodeMan62 pushed a commit to CodeMan62/unsloth-zoo that referenced this pull request May 14, 2026
…sses (unslothai#628)

* tests: skip MoE LoRA extractor coverage when discovery finds zero classes

test_every_patched_moe_experts_class_has_lora_extractor asserts that
every class flagged `_unsloth_already_patched=True` whose source defines
3D `gate_up_proj`/`down_proj` parameters also has
`_unsloth_lora_extractor_fn` registered. The defensive precondition was
"discovery produced at least one patched class"; failing that, the
test raised AssertionError("Discovery produced zero patched MoE classes
... Investigate.").

That precondition fires on any transformers version older than the MoE
surface unsloth_zoo patches. Specifically, with transformers==4.57.6
none of the gemma4 / llama4 / qwen3_moe / mxfp4 classes the
TEMPORARY_PATCHES list targets exist, so every patch fn no-ops and
discovery returns []. The "zero discoveries" outcome on an older
transformers is not a regression of the bug being prevented (a class
patched-but-missing-extractor); it just means the relevant classes
aren't there to patch.

Switch the precondition from `assert patched, ...` to `pytest.skip(...)`
with a comment explaining the two indistinguishable cases. CI cells
running newer transformers (where classes do exist) will still exercise
the real assertion below; the older-transformers cell skips cleanly.

Surfaced by the consolidated CPU-CI matrix on unslothai/unsloth#5312
in the cell pinned to `transformers==4.57.6`.

* tests: distinguish missing surface from discovery regression

Per codex/gemini review on PR unslothai#628: replacing the original assert with
an unconditional skip created a silent-failure path -- a real
discovery regression on a transformers that DOES ship the grouped-MoE
surface would now skip rather than fail, defeating the test.

Disambiguate the two zero-discovery causes by walking transformers
once and asking whether ANY class anywhere in transformers.models.*
declares `gate_up_proj` + `down_proj` as 3D `nn.Parameter`s -- the
exact filter `_looks_like_grouped_moe_experts` runs against discovered
classes. If at least one such class exists AND discovery returned
zero, that is a real regression: fail loud. If no such class exists,
the patch surface this test targets is genuinely absent on this
transformers version: skip. This avoids hard-coding model names that
change as transformers grows new MoE families.

Also update terminology in the comment + skip / fail messages: the
load-bearing marker `_has_unsloth_patched_forward` reads is
`_original_<modeling_tail>_<ClassName>_forward`, not the per-family
`_unsloth_already_patched` flag.
CodeMan62 pushed a commit to CodeMan62/unsloth-zoo that referenced this pull request May 14, 2026
Without a guard, re-running `higher_precision_softmax` on already-
rewritten source appends another `.to(<variable>.dtype)` cast each
pass. The regex matched the inner `softmax(...)` but the existing
`.to(x.dtype)` suffix was outside the matched span, so
`source.replace(full_match, new)` left the suffix in place and the
new replacement added a fresh `.to(x.dtype)`, producing:

    softmax(x, dim=-1, dtype=torch.float32).to(x.dtype).to(x.dtype)

after the second pass.

Add a negative lookahead `(?!\s*\.to\(\s*\2\s*\.dtype\s*\))` so the
finditer skips matches whose softmax(...) is already followed by
`.to(<variable>.dtype)`. Idempotent in all three cases:
  - no-dtype: `softmax(x, dim=-1)`
  - with-dtype: `softmax(x, dim=-1, dtype=torch.bfloat16)`
  - mixed `nn.functional.softmax` and `F.softmax`

Why this matters: defensive idempotency prevents source-bloat when a
single function passes through `create_new_function` more than once
(e.g. a re-compile triggered by `UNSLOTH_COMPILE_OVERWRITE=0` +
transformers version mismatch); also makes the upstream test
suite's CI assertion `f(f(s)) == f(s)` (added in
unslothai/unsloth#5312's consolidated CI) hold.

Verified: full `tests/` suite still passes (172 / 172) plus the new
idempotency local check.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants