PR #697 Studio probe on macos-14 (do not merge)#157
Conversation
Companion to the pr697-cross-os branch (which exercises the 23 author tests via the torch shim). This branch drives Unsloth Studio on Apple Silicon: 1. install.sh --local --no-torch (Studio install + venv). 2. Force-reinstall unsloth-zoo from PR unslothai#697 head (Lyxot:fix/mlx-save-gguf-export-parity). 3. Install mlx / mlx-lm / mlx-vlm wheels (skipped by --no-torch). 4. Run tests/pr697/probe_real_mlx.py against REAL Apple Silicon mlx kernels -- 17 probes covering subpackage imports, every PR-697 helper, and the contract of each of the 11 fixes. 5. Boot Studio (UNSLOTH_API_ONLY=1) and confirm /api/health stays healthy with the PR-697 unsloth-zoo overlaid -- no boot regression. Sweep all 26 existing staging-2 workflows to keep the run focused. studio_test_kit/ vendored from workspace root for future Playwright UI walkthroughs (not exercised in this iteration to fit the macos-14 budget). Throwaway branch. Do not merge.
There was a problem hiding this comment.
Code Review
This pull request introduces the studio_test_kit package, which provides a comprehensive suite of Playwright-based tools and examples for driving Unsloth Studio end-to-end. The review feedback focuses on improving cross-platform compatibility (specifically removing setsid to support macOS), enhancing CI robustness (using force checkout and streaming long-running installation logs to prevent memory buffering), and improving error diagnostics by capturing and printing stderr on subprocess and network failures.
| cmd = ["setsid", "-f", "bash", "-c", | ||
| f'{shlex.quote(bin_path)} studio -p {port} ' | ||
| f'2>&1 | tee -a {shlex.quote(str(log_path))}'] |
There was a problem hiding this comment.
The setsid command-line utility is not available by default on macOS, which will cause launch_studio to fail with a FileNotFoundError on macOS systems. Since subprocess.Popen is already called with start_new_session=True (which internally invokes the setsid() system call to start the process in a new session and process group), prefixing the command with setsid -f is redundant and breaks macOS compatibility. Removing setsid -f ensures the tool runs successfully on macOS.
| cmd = ["setsid", "-f", "bash", "-c", | |
| f'{shlex.quote(bin_path)} studio -p {port} ' | |
| f'2>&1 | tee -a {shlex.quote(str(log_path))}'] | |
| cmd = ["bash", "-c", | |
| f'{shlex.quote(bin_path)} studio -p {port} ' | |
| f'2>&1 | tee -a {shlex.quote(str(log_path))}'] |
| def _run(cmd: str | list[str], cwd: Optional[Path] = None, env: Optional[dict] = None, | ||
| check: bool = True, timeout: Optional[int] = None) -> subprocess.CompletedProcess: | ||
| if isinstance(cmd, str): | ||
| cmd_list = shlex.split(cmd) | ||
| else: | ||
| cmd_list = cmd | ||
| full_env = {**os.environ, **(env or {})} | ||
| return subprocess.run( | ||
| cmd_list, cwd=cwd, env=full_env, check=check, timeout=timeout, | ||
| text=True, capture_output=True, | ||
| ) |
There was a problem hiding this comment.
Running subprocess.run with capture_output=True for long-running commands like install_studio (which can take up to 30 minutes) buffers all stdout and stderr in memory. This not only consumes unnecessary memory but also prevents the installation progress from being streamed to the console/CI logs, making it look like the process is hung and extremely difficult to debug if it fails. Making capture_output configurable allows us to disable it for the installation command.
| def _run(cmd: str | list[str], cwd: Optional[Path] = None, env: Optional[dict] = None, | |
| check: bool = True, timeout: Optional[int] = None) -> subprocess.CompletedProcess: | |
| if isinstance(cmd, str): | |
| cmd_list = shlex.split(cmd) | |
| else: | |
| cmd_list = cmd | |
| full_env = {**os.environ, **(env or {})} | |
| return subprocess.run( | |
| cmd_list, cwd=cwd, env=full_env, check=check, timeout=timeout, | |
| text=True, capture_output=True, | |
| ) | |
| def _run(cmd: str | list[str], cwd: Optional[Path] = None, env: Optional[dict] = None, | |
| check: bool = True, timeout: Optional[int] = None, capture_output: bool = True) -> subprocess.CompletedProcess: | |
| if isinstance(cmd, str): | |
| cmd_list = shlex.split(cmd) | |
| else: | |
| cmd_list = cmd | |
| full_env = {**os.environ, **(env or {})} | |
| return subprocess.run( | |
| cmd_list, cwd=cwd, env=full_env, check=check, timeout=timeout, | |
| text=True, capture_output=capture_output, | |
| ) |
| _run( | ||
| ["bash", str(install_sh), "--local"], | ||
| cwd=repo, | ||
| env={"UNSLOTH_STUDIO_HOME": str(home)}, | ||
| timeout=60 * 30, | ||
| ) |
There was a problem hiding this comment.
Pass capture_output=False to _run so that the installation progress is streamed directly to the console/CI logs instead of being buffered in memory for up to 30 minutes.
| _run( | |
| ["bash", str(install_sh), "--local"], | |
| cwd=repo, | |
| env={"UNSLOTH_STUDIO_HOME": str(home)}, | |
| timeout=60 * 30, | |
| ) | |
| _run( | |
| ["bash", str(install_sh), "--local"], | |
| cwd=repo, | |
| env={"UNSLOTH_STUDIO_HOME": str(home)}, | |
| timeout=60 * 30, | |
| capture_output=False, | |
| ) |
|
|
||
| if reuse_clone and (repo / ".git").exists(): | ||
| _run(["git", "fetch", "origin", branch], cwd=repo) | ||
| _run(["git", "checkout", branch], cwd=repo) |
There was a problem hiding this comment.
In automated environments or CI pipelines, git checkout can fail if there are untracked files or local modifications that would be overwritten. Using the --force (-f) flag ensures the checkout succeeds reliably.
| _run(["git", "checkout", branch], cwd=repo) | |
| _run(["git", "checkout", "-f", branch], cwd=repo) |
| "-c:v", "libx264", "-pix_fmt", "yuv420p", "-crf", "23", | ||
| str(out_mp4), | ||
| ] | ||
| subprocess.run(cmd, check=True, capture_output=True, text=True) |
There was a problem hiding this comment.
When ffmpeg fails, subprocess.run raises a CalledProcessError. Because capture_output=True is set, the actual error output from ffmpeg is hidden from the console/CI logs, making it difficult to diagnose transcoding issues. Catching CalledProcessError and printing e.stderr before re-raising provides much better visibility.
| subprocess.run(cmd, check=True, capture_output=True, text=True) | |
| try: | |
| subprocess.run(cmd, check=True, capture_output=True, text=True) | |
| except subprocess.CalledProcessError as e: | |
| import sys | |
| print(f"ffmpeg hstack failed: {e.stderr}", file=sys.stderr) | |
| raise |
| subprocess.run( | ||
| ["ffmpeg", "-y", "-i", str(src_webm), | ||
| "-c:v", "libx264", "-pix_fmt", "yuv420p", "-crf", "23", str(out_mp4)], | ||
| check=True, capture_output=True, text=True, | ||
| ) |
There was a problem hiding this comment.
Catch CalledProcessError and print e.stderr to make troubleshooting ffmpeg failures in CI easier.
| subprocess.run( | |
| ["ffmpeg", "-y", "-i", str(src_webm), | |
| "-c:v", "libx264", "-pix_fmt", "yuv420p", "-crf", "23", str(out_mp4)], | |
| check=True, capture_output=True, text=True, | |
| ) | |
| try: | |
| subprocess.run( | |
| ["ffmpeg", "-y", "-i", str(src_webm), | |
| "-c:v", "libx264", "-pix_fmt", "yuv420p", "-crf", "23", str(out_mp4)], | |
| check=True, capture_output=True, text=True, | |
| ) | |
| except subprocess.CalledProcessError as e: | |
| import sys | |
| print(f"ffmpeg transcode failed: {e.stderr}", file=sys.stderr) | |
| raise |
| r.raise_for_status() | ||
| b = r.json() |
There was a problem hiding this comment.
If the authentication request fails (e.g., due to invalid credentials), r.raise_for_status() will raise an httpx.HTTPStatusError. Catching this error and including the response body (which often contains the error details like {"detail": "..."}) makes debugging authentication failures much easier.
try:
r.raise_for_status()
except httpx.HTTPStatusError as e:
raise RuntimeError(f"Login failed ({e.response.status_code}): {e.response.text}") from e
b = r.json()…studio install.sh creates the venv directly at $UNSLOTH_STUDIO_HOME/unsloth_studio/ (not the .venv_* pattern from older docs). Confirmed via install log on the prior run. Also expose $UNSLOTH_STUDIO_HOME/bin on PATH so the unsloth CLI shim is visible to subsequent steps.
Companion to staging-2#156 (cross-OS shim path). This run drives Unsloth Studio on real Apple Silicon macos-14 with unslothai/unsloth-zoo#697 overlaid on top of the install -- the highest-fidelity validation we can get without local Mac hardware.
What this CI does
bash install.sh --local --no-torch-- canonical Studio install on macOS.unsloth-zoofrom PR docker images unslothai/unsloth#697 head (Lyxot:fix/mlx-save-gguf-export-parity).pip install mlx mlx-lm mlx-vlm-- skipped by--no-torch, needed for real-MLX probes.tests/pr697/probe_real_mlx.py-- 17 probes exercising every PR-697 helper against real Apple Silicon mlx kernels (not the torch shim).UNSLOTH_API_ONLY=1) and confirm/api/healthstays healthy with PR-697 overlaid.Why this complements staging-2#156
/api/healthThrowaway branch. Do not merge.