Skip to content

tests: unblock three stale assertions broken on main (MLX CI + Backend CI)#5803

Merged
danielhanchen merged 2 commits into
mainfrom
ci/unblock-main-three-stale-tests
May 27, 2026
Merged

tests: unblock three stale assertions broken on main (MLX CI + Backend CI)#5803
danielhanchen merged 2 commits into
mainfrom
ci/unblock-main-three-stale-tests

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Summary

MLX CI on Mac M1 and Backend CI have been red on every push to main for days. The underlying code is fine. Three test files have stale anchors / assertions left behind by recently-landed PRs.

This PR fixes all three so PR-time CI on every open PR (including #5743) goes back to green without rebasing past these regressions.

1. MLX CI — tests/studio/run_real_mlx_smoke.py:393

PR #5537 bumped max_steps from 7 to 30 to make the convergence gate seed-robust but left the assertion at == 7. With logging_steps = 1, the callback fires once per step, so 30 entries, not 7. Failure since 2026-05-18:

AssertionError: expected 7 logged steps, got
  [10.5475, 5.4663, 3.1851, 1.6394, ..., 0.0]   (30 entries)

Fix: track config.max_steps so the gate auto-follows future bumps.

2. Backend CI / Repo tests (CPU) — tests/studio/test_composer_rtl_bidi_attribute.py:29

PR #5775 changed studio/frontend/src/components/assistant-ui/thread.tsx:537 from the literal

aria-label="Message input"

to a JSX ternary

aria-label={overlay ? "Image edit instructions" : "Message input"}

but the test still anchors on the old literal form, so src.find(...) returns -1:

AssertionError: anchor 'aria-label="Message input"' not found
assert -1 != -1

Fix: anchor on the inner string literal "Message input".

3. Backend CI / Python 3.10-3.13 — studio/backend/tests/test_desktop_auth.py:487

test_provision_desktop_auth_writes_secret_and_creates_db_without_backend_deps installs a builtins.__import__ guard that blocks ("auth", "fastapi", "structlog", "utils") to prove the CLI does not pull in Studio backend deps. The guard is too broad: it matches the relative from .utils import echo inside typer._click.decorators (typer 0.25+), which calls __import__("utils", ..., level=1):

File ".../typer/_click/decorators.py", line 5, in <module>
File "<string>", line 13, in guarded_import
ModuleNotFoundError: utils

Fix: gate the block on level == 0 so only absolute imports of auth / fastapi / structlog / utils get rejected; relative imports inside third-party packages pass through. This preserves the test's intent (no Studio backend deps) without misclassifying every from .utils import x everywhere on sys.path.

Verification

$ pytest tests/studio/test_composer_rtl_bidi_attribute.py -v
12 passed in 0.03s

$ pytest studio/backend/tests/test_desktop_auth.py::test_provision_desktop_auth_writes_secret_and_creates_db_without_backend_deps -v
1 passed in 0.69s

The MLX one cannot be exercised on Linux but the change is mechanical (7 -> config.max_steps); it will be validated by MLX CI on Mac M1 on this PR.

Test plan

  • MLX CI on Mac M1 goes green on this PR
  • Backend CI (Repo tests CPU + all Python matrix variants) goes green
  • No new regressions in any other workflow

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@danielhanchen danielhanchen force-pushed the ci/unblock-main-three-stale-tests branch from bbe2d0e to 1500b14 Compare May 27, 2026 05:42

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces test fixes and improvements across several test files. In test_desktop_auth.py, the guarded_import helper was updated to only block absolute imports (where level == 0), preventing relative imports within third-party packages from being incorrectly blocked. In run_real_mlx_smoke.py, the hardcoded assertion for logged steps was replaced with a dynamic check against config.max_steps. Lastly, in test_composer_rtl_bidi_attribute.py, the search anchor for the main composer test was updated to target the inner string literal "Message input" instead of the full JSX attribute. No review comments were provided for these changes.

danielhanchen and others added 2 commits May 27, 2026 06:27
MLX CI on Mac M1 + Backend CI (both Repo tests CPU and Python 3.10/11/12/13)
have been red on every push to main for days. None of the underlying code
is wrong; three test files have stale anchors / assertions left behind by
PR #5537 (max_steps bump) and PR #5775 (composer + provision-desktop-auth).

1. tests/studio/run_real_mlx_smoke.py:393
   PR #5537 bumped max_steps from 7 to 30 for seed-robust convergence but
   left `assert len(losses_per_step) == 7`. With logging_steps=1 the
   callback fires once per step; 30 entries, not 7. Track config.max_steps
   so the gate auto-follows future bumps.

2. tests/studio/test_composer_rtl_bidi_attribute.py:29
   PR #5775 changed the composer aria-label from the literal
   `aria-label="Message input"` to a JSX ternary
   `aria-label={overlay ? "Image edit instructions" : "Message input"}`.
   Anchor on the inner string literal `"Message input"` instead.

3. studio/backend/tests/test_desktop_auth.py:487
   The guarded_import in test_provision_desktop_auth_writes_secret_and_creates_db_without_backend_deps
   blocks any import whose name == "utils", including the relative
   `from .utils import echo` inside typer._click.decorators (typer 0.25+).
   Gate the block on level == 0 so only absolute imports of `utils` /
   `auth` / `fastapi` / `structlog` are rejected; relative imports
   inside third-party packages pass through.

All three tests pass locally; the MLX one is a mechanical 7->config.max_steps
swap and will be exercised by MLX CI on this PR.
@danielhanchen danielhanchen force-pushed the ci/unblock-main-three-stale-tests branch from 5b6bc15 to 69f37f8 Compare May 27, 2026 06:28
@danielhanchen danielhanchen merged commit 02ea6c9 into main May 27, 2026
33 checks passed
@danielhanchen danielhanchen deleted the ci/unblock-main-three-stale-tests branch May 27, 2026 07:30
rhsCZ pushed a commit to rhsCZ/unsloth that referenced this pull request May 27, 2026
…k-glm-kimi

Resolves three conflicts against the updated 5620 base (which itself
merged main after main moved on with PRs unslothai#5735 / unslothai#5775 / unslothai#5803 etc.
touching the same routes/inference.py and tool_call_parser.py surface):

* studio/backend/core/inference/tool_call_parser.py
  ``_TOOL_CLOSED_PATS``: kept 5624's full set (Mistral pre-v11 array,
  Mistral v11+ name{json}, DeepSeek envelope, Kimi section) on top of
  the 3-pattern base. New 5620 base reverted to the 3 base patterns
  because main never carried the tool-format extensions.

* studio/backend/routes/inference.py
  Merged the two regex bodies: kept 5624's elaborate python_tag
  ``(?:[^<]|<(?!\|))*`` clause and the new-family closed-pair
  patterns (DeepSeek envelope, Kimi section). DROPPED the inlined
  Mistral patterns in favour of the base's ``_strip_tool_xml`` helper
  which delegates Mistral handling to the parser module's
  ``_strip_mistral_closed_calls`` -- the non-greedy ``\{.*?\}`` form
  truncates at the first ``}`` of a nested JSON arg, so balanced
  brace/bracket scanning is correct here. Also kept the base's
  orphan-close and tail-only ``</parameter>`` patterns from the
  speculative-buffer split boundary work. Net: 9 call sites continue
  to use ``_strip_tool_xml(...)`` (Mistral-safe).

* studio/backend/tests/test_safetensors_tool_loop.py
  ``TestRoutesPythonTagStrip``: kept the base's wording for the
  section header (the two were near-identical) and switched the helper
  ``_strip`` back to ``_strip_tool_xml`` since the helper is restored.

Also retargeted ``test_pr5624_regressions.py``'s routes-layer strip
tests to ``_strip_tool_xml`` for consistency with the restored helper.

Tests:
  pytest studio/backend/tests/test_safetensors_tool_loop.py
         studio/backend/tests/test_safetensors_capability_advertise.py
         studio/backend/tests/test_pr5624_regressions.py -q
  -> 170 passed in 1.93s

  pytest studio/backend/tests/ -q -k 'not gpu and not llama_cpp_integration'
  -> 2034 passed, 15 failed (pre-existing on the 5620 base; same
     set as before the merge: test_training_worker_flash_attn,
     test_desktop_auth, test_studio_api integration shims).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant