tests: unblock three stale assertions broken on main (MLX CI + Backend CI) #5803
Code Review
This pull request introduces test fixes and improvements across several test files. In test_desktop_auth.py, the guarded_import helper was updated to only block absolute imports (where level == 0), preventing relative imports within third-party packages from being incorrectly blocked. In run_real_mlx_smoke.py, the hardcoded assertion for logged steps was replaced with a dynamic check against config.max_steps. Lastly, in test_composer_rtl_bidi_attribute.py, the search anchor for the main composer test was updated to target the inner string literal "Message input" instead of the full JSX attribute. No review comments were provided for these changes.
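The `level == 0` gate described in the review can be sketched as follows. This is a minimal illustration, not the test's actual helper: the context manager `block_absolute_imports`, the `BLOCKED` set, and the choice to match on the top-level package name are assumptions made for the example.

```python
import builtins
import contextlib

# Names taken from the PR description; the container itself is hypothetical.
BLOCKED = frozenset({"auth", "fastapi", "structlog", "utils"})


@contextlib.contextmanager
def block_absolute_imports(blocked=BLOCKED):
    """Temporarily reject absolute imports of the blocked top-level names."""
    real_import = builtins.__import__

    def guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
        # level == 0 is an absolute import ("import utils").
        # level >= 1 is a relative import ("from .utils import echo"),
        # which belongs to whatever package issued it, so let it through.
        if level == 0 and name.split(".")[0] in blocked:
            raise ImportError(f"blocked absolute import: {name}")
        return real_import(name, globals, locals, fromlist, level)

    builtins.__import__ = guarded_import
    try:
        yield
    finally:
        # Always restore the real importer, even if the body raised.
        builtins.__import__ = real_import
```

Inside the `with` block, a third-party package that runs `from .utils import echo` still imports normally, while a bare `import utils` raises `ImportError`.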
MLX CI on Mac M1 + Backend CI (both Repo tests CPU and Python 3.10/11/12/13) have been red on every push to main for days. None of the underlying code is wrong; three test files have stale anchors / assertions left behind by PR #5537 (max_steps bump) and PR #5775 (composer + provision-desktop-auth).

1. tests/studio/run_real_mlx_smoke.py:393
   PR #5537 bumped max_steps from 7 to 30 for seed-robust convergence but left `assert len(losses_per_step) == 7`. With logging_steps=1 the callback fires once per step; 30 entries, not 7. Track config.max_steps so the gate auto-follows future bumps.

2. tests/studio/test_composer_rtl_bidi_attribute.py:29
   PR #5775 changed the composer aria-label from the literal `aria-label="Message input"` to a JSX ternary `aria-label={overlay ? "Image edit instructions" : "Message input"}`. Anchor on the inner string literal `"Message input"` instead.

3. studio/backend/tests/test_desktop_auth.py:487
   The guarded_import in test_provision_desktop_auth_writes_secret_and_creates_db_without_backend_deps blocks any import whose name == "utils", including the relative `from .utils import echo` inside typer._click.decorators (typer 0.25+). Gate the block on level == 0 so only absolute imports of `utils` / `auth` / `fastapi` / `structlog` are rejected; relative imports inside third-party packages pass through.

All three tests pass locally; the MLX one is a mechanical 7->config.max_steps swap and will be exercised by MLX CI on this PR.
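To see why the old anchor in item 2 stops matching, here is a small sketch using `str.find`. The two source snippets are hypothetical stand-ins for the relevant line of `thread.tsx`; the `<textarea>` element and the `dir="auto"` attribute are assumptions, and only the two `aria-label` forms come from the commit message.

```python
# Anchor used by the stale test vs. the anchor proposed in this PR.
OLD_ANCHOR = 'aria-label="Message input"'
NEW_ANCHOR = '"Message input"'

# Hypothetical one-line stand-ins for the composer markup before and after PR #5775.
old_src = '<textarea aria-label="Message input" dir="auto" />'
new_src = (
    '<textarea aria-label={overlay ? "Image edit instructions" '
    ': "Message input"} dir="auto" />'
)


def locate(src: str, anchor: str) -> int:
    """Return the index of the anchor in src, or -1 when it is absent."""
    return src.find(anchor)


old_anchor_on_new_src = locate(new_src, OLD_ANCHOR)  # -1: literal form is gone
new_anchor_on_new_src = locate(new_src, NEW_ANCHOR)  # found inside the ternary
new_anchor_on_old_src = locate(old_src, NEW_ANCHOR)  # also found in the old form
```

Anchoring on the inner string literal matches both forms, so the test no longer depends on how the attribute value is written.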
for more information, see https://pre-commit.ci
…k-glm-kimi

Resolves three conflicts against the updated 5620 base (which itself merged main after main moved on with PRs unslothai#5735 / unslothai#5775 / unslothai#5803 etc. touching the same routes/inference.py and tool_call_parser.py surface):

* studio/backend/core/inference/tool_call_parser.py
  ``_TOOL_CLOSED_PATS``: kept 5624's full set (Mistral pre-v11 array, Mistral v11+ name{json}, DeepSeek envelope, Kimi section) on top of the 3-pattern base. New 5620 base reverted to the 3 base patterns because main never carried the tool-format extensions.

* studio/backend/routes/inference.py
  Merged the two regex bodies: kept 5624's elaborate python_tag ``(?:[^<]|<(?!\|))*`` clause and the new-family closed-pair patterns (DeepSeek envelope, Kimi section). DROPPED the inlined Mistral patterns in favour of the base's ``_strip_tool_xml`` helper which delegates Mistral handling to the parser module's ``_strip_mistral_closed_calls`` -- the non-greedy ``\{.*?\}`` form truncates at the first ``}`` of a nested JSON arg, so balanced brace/bracket scanning is correct here. Also kept the base's orphan-close and tail-only ``</parameter>`` patterns from the speculative-buffer split boundary work. Net: 9 call sites continue to use ``_strip_tool_xml(...)`` (Mistral-safe).

* studio/backend/tests/test_safetensors_tool_loop.py
  ``TestRoutesPythonTagStrip``: kept the base's wording for the section header (the two were near-identical) and switched the helper ``_strip`` back to ``_strip_tool_xml`` since the helper is restored.

Also retargeted ``test_pr5624_regressions.py``'s routes-layer strip tests to ``_strip_tool_xml`` for consistency with the restored helper.
Tests:
  pytest studio/backend/tests/test_safetensors_tool_loop.py studio/backend/tests/test_safetensors_capability_advertise.py studio/backend/tests/test_pr5624_regressions.py -q
  -> 170 passed in 1.93s
  pytest studio/backend/tests/ -q -k 'not gpu and not llama_cpp_integration'
  -> 2034 passed, 15 failed (pre-existing on the 5620 base; same set as before the merge: test_training_worker_flash_attn, test_desktop_auth, test_studio_api integration shims).
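The remark above about the non-greedy ``\{.*?\}`` form can be illustrated with a short sketch. This is not the repository's ``_strip_mistral_closed_calls``; the function `balanced_json_end` and the sample tool-call string are hypothetical, written only to show the truncation and one way a balanced scan avoids it.

```python
import json
import re

# Non-greedy match: stops at the FIRST closing brace it meets.
NON_GREEDY = re.compile(r"\{.*?\}", re.DOTALL)


def balanced_json_end(text: str, start: int) -> int:
    """Return the index just past the bracket that closes text[start], or -1.

    Tracks nesting depth for braces and brackets and skips over anything
    inside double-quoted strings (including escaped quotes).
    """
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0:
                return i + 1
    return -1


# Hypothetical name{json} style call with a nested JSON argument.
sample = 'get_weather{"location": {"city": "Paris"}, "unit": "c"} trailing text'
start = sample.index("{")

regex_span = NON_GREEDY.search(sample).group(0)
balanced_span = sample[start:balanced_json_end(sample, start)]
```

Here `regex_span` ends right after the inner `{"city": "Paris"}` and is not valid JSON, while `balanced_span` covers the whole argument object and parses with `json.loads`.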
Summary
`MLX CI on Mac M1` and `Backend CI` have been red on every push to `main` for days. The underlying code is fine. Three test files have stale anchors / assertions left behind by recently-landed PRs.

This PR fixes all three so PR-time CI on every open PR (including #5743) goes back to green without rebasing past these regressions.
1. MLX CI: `tests/studio/run_real_mlx_smoke.py:393`

PR #5537 bumped `max_steps` from 7 to 30 to make the convergence gate seed-robust but left the assertion at `== 7`. With `logging_steps = 1`, the callback fires once per step, so the run logs 30 entries, not 7. The test has failed since 2026-05-18.

Fix: track `config.max_steps` so the gate auto-follows future bumps.

2. Backend CI / Repo tests (CPU): `tests/studio/test_composer_rtl_bidi_attribute.py:29`

PR #5775 changed `studio/frontend/src/components/assistant-ui/thread.tsx:537` from the literal `aria-label="Message input"` to a JSX ternary `aria-label={overlay ? "Image edit instructions" : "Message input"}`, but the test still anchors on the old literal form, so `src.find(...)` returns `-1`.

Fix: anchor on the inner string literal `"Message input"`.

3. Backend CI / Python 3.10-3.13: `studio/backend/tests/test_desktop_auth.py:487`

`test_provision_desktop_auth_writes_secret_and_creates_db_without_backend_deps` installs a `builtins.__import__` guard that blocks `("auth", "fastapi", "structlog", "utils")` to prove the CLI does not pull in Studio backend deps. The guard is too broad: it matches the relative `from .utils import echo` inside `typer._click.decorators` (typer 0.25+), which calls `__import__("utils", ..., level=1)`.

Fix: gate the block on `level == 0` so only absolute imports of `auth` / `fastapi` / `structlog` / `utils` get rejected; relative imports inside third-party packages pass through. This preserves the test's intent (no Studio backend deps) without misclassifying every `from .utils import x` everywhere on `sys.path`.

Verification

The MLX one cannot be exercised on Linux but the change is mechanical (`7` -> `config.max_steps`); it will be validated by `MLX CI on Mac M1` on this PR.

Test plan

- `MLX CI on Mac M1` goes green on this PR
- `Backend CI` (Repo tests CPU + all Python matrix variants) goes green
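As a rough illustration of the MLX fix, the sketch below contrasts a hardcoded step count with one derived from the config. `SmokeConfig` and `simulate_logged_losses` are hypothetical stand-ins for the smoke script's real config and logging callback; only the names `max_steps`, `logging_steps`, and `losses_per_step` come from the PR text.

```python
from dataclasses import dataclass


@dataclass
class SmokeConfig:
    """Hypothetical stand-in for the smoke test's training config."""

    max_steps: int = 30
    logging_steps: int = 1


def simulate_logged_losses(config: SmokeConfig) -> list:
    """Pretend to train: record one fake loss every `logging_steps` steps."""
    return [
        1.0 / step
        for step in range(1, config.max_steps + 1)
        if step % config.logging_steps == 0
    ]


config = SmokeConfig()
losses_per_step = simulate_logged_losses(config)

# Stale form: breaks as soon as max_steps changes.
#   assert len(losses_per_step) == 7
# Form that follows the config (valid when logging_steps == 1):
assert len(losses_per_step) == config.max_steps
```

The assertion keeps holding if `max_steps` is bumped again, as long as `logging_steps` stays at 1.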