fix: 7 codex-discovered issues (#118-#124) #126
Merged
raullenchai merged 3 commits into main on Apr 16, 2026
Conversation
Batch fix for pre-existing bugs surfaced by codex CLI review during the doctor harness PR. Each was filed as a separate issue; fixing them together because they're all small, independent, and low-risk.

**#118 — Wheel missing agent profile YAMLs**
`pyproject.toml` package-data now includes `agents/profiles/*.yaml`. Without this, pip-installed wheels had no built-in agent profiles and `rapid-mlx agents` showed nothing.

**#119 — 'all' extra self-references rapid-mlx**
Expanded the union directly instead of self-referencing `rapid-mlx[vision,chat,embeddings]`. The self-dependency broke `pip install .[all]` and editable installs.

**#120 — Chat template fallback drops tools**
The no-`apply_chat_template` fallback now calls `_inject_tools_into_messages` before formatting, matching the TypeError fallback path. Without this, models hitting the fallback lost all tool-calling context.

**#121 — MTP misses nested text_config**
Extracted `_read_num_mtp_layers()`, which checks both the top-level and `text_config.num_nextn_predict_layers`. It is used by both `_try_inject_mtp` (the strict=False path) and `_try_inject_mtp_post_load`. Without this, VLM+MTP checkpoints silently lost their MTP head.

**#122 — enable_thinking dropped during tool fallback retry**
Step 1 of the TypeError cascade pops `enable_thinking` to test whether the template rejects it; step 2 pops `tools`. But `enable_thinking` was never restored for the tool-injection retry, so thinking-capable models (Qwen, DeepSeek) silently lost that feature on tool-enabled requests. Now restores `enable_thinking` before the step-2 retry, with a final fallback that strips both if the template rejects everything. Updated the existing test and added a new test for the "only tools rejected" happy path (`enable_thinking` survives).

**#123 — Gemma 4 OOM during multimodal shard load**
Sanitize per shard instead of loading the entire multimodal checkpoint and then sanitizing. On Gemma 4, the vision/audio shards can be more than 2× the language-model weights, so transient memory was far higher than needed for text-only mode.
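The #122 retry cascade can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `apply_with_fallbacks` and `_inject_tools` are hypothetical names, and the real template callable is the tokenizer's `apply_chat_template`.

```python
def _inject_tools(messages, tools):
    # Hypothetical helper: fold tool schemas into a system message so the
    # template never has to see a `tools` kwarg.
    if not tools:
        return list(messages)
    return [{"role": "system", "content": f"Available tools: {tools}"}] + list(messages)


def apply_with_fallbacks(apply_template, messages, tools=None, enable_thinking=True):
    # Step 0: try the full kwarg set.
    try:
        return apply_template(messages, tools=tools, enable_thinking=enable_thinking)
    except TypeError:
        pass
    # Step 1: maybe the template rejects enable_thinking -- drop only that.
    try:
        return apply_template(messages, tools=tools)
    except TypeError:
        pass
    # Step 2: only `tools` was rejected. Inject tools into the messages and,
    # crucially (the #122 fix), restore enable_thinking for this retry.
    try:
        return apply_template(_inject_tools(messages, tools),
                              enable_thinking=enable_thinking)
    except TypeError:
        # Final fallback: the template rejects everything -- strip both kwargs.
        return apply_template(_inject_tools(messages, tools))
```

The pre-fix behavior corresponds to leaving `enable_thinking` out of the step-2 call, which is why thinking-capable models lost the feature exactly when tools were in play.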
**#124 — vllm-mlx-chat published without gradio**
There was already a try/except, but the error message was terse. Expanded it to explain the `[chat]` extra requirement clearly, and changed `raise SystemExit(1)` to `sys.exit(1)`.

All 2075 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
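The #124 pattern of a friendly optional-dependency guard looks roughly like this. It is a generic sketch, not the project's code: `require_extra` is a hypothetical name, and the actual guard imports `gradio` for the `[chat]` extra.

```python
import importlib
import sys


def require_extra(module_name: str, extra: str, package: str = "vllm-mlx-chat"):
    # Hypothetical guard: import an optional dependency, exiting with an
    # actionable message (instead of a bare traceback) when it is missing.
    try:
        return importlib.import_module(module_name)
    except ImportError:
        sys.stderr.write(
            f"{module_name} is not installed, but is required for this command.\n"
            f"Install the optional extra with: pip install '{package}[{extra}]'\n"
        )
        sys.exit(1)  # sys.exit over `raise SystemExit(1)`, per the #124 change
```

`sys.exit(1)` and `raise SystemExit(1)` are functionally equivalent; the change is a style normalization alongside the clearer message.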
inject_mtp_support() reads config["num_nextn_predict_layers"] directly. For VLM checkpoints where that field lives under text_config, the previous fix (#121) correctly *detected* the nested layout but still passed the original nested config — the injector saw 0 and silently bailed. Now surfaces num_nextn_predict_layers to the top level of the config dict before calling inject_mtp_support, so the injector gets the right value regardless of nesting depth. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
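The surfacing step described above can be sketched like this. `surface_mtp_layers` is a hypothetical name for illustration; the real code performs this normalization before calling `inject_mtp_support`.

```python
def surface_mtp_layers(config: dict) -> dict:
    # Hypothetical sketch: copy num_nextn_predict_layers from text_config
    # (the VLM layout) to the top level, where the injector reads it,
    # without mutating the caller's config dict.
    n = config.get("num_nextn_predict_layers")
    if not n:
        n = config.get("text_config", {}).get("num_nextn_predict_layers", 0)
    if n:
        config = {**config, "num_nextn_predict_layers": n}
    return config
```

With only the #121 detection fix, the nested value was found but the injector still read the top-level key and saw nothing; surfacing the value makes both layouts look identical to the injector.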
…tion

Two pre-existing bugs found by codex review round 3 on PR #126:

1. The `mlx_lm.load()` success path skipped `_try_inject_mtp_post_load()`, so MTP-capable models whose `mtp.*` weights get `sanitize()`'d by mlx-lm came back without MTP heads. Only the strict=False fallback path ran the post-load check. Now both paths call it.
2. The `chunk.token` truthiness check dropped token ID 0 (the pad token on some tokenizers). Changed to `chunk.token is not None` so token ID 0 is preserved in `GenerationOutput.tokens`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
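The truthiness bug in point 2 is easy to reproduce in isolation. A minimal sketch, with `Chunk` and `collect_tokens` as stand-ins for the real streaming types:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    # Stand-in for a streamed generation chunk; real code uses its own type.
    token: Optional[int] = None
    text: str = ""


def collect_tokens(chunks: List[Chunk]) -> List[int]:
    # `if c.token:` would also skip token ID 0 (a valid pad token on some
    # tokenizers); the explicit None check keeps it.
    return [c.token for c in chunks if c.token is not None]
```

Since 0 is falsy in Python, `if chunk.token:` conflates "no token" with "token ID 0", which is exactly the class of bug the `is not None` check avoids.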
Summary
Batch fix for 7 pre-existing bugs surfaced by codex CLI review during
the doctor harness PR (#125). Each was filed as a separate issue; fixed
together because they're all small, independent, and low-risk.
Changes
Codex review trail
Surfaced `num_nextn_predict_layers` to the top level before calling the injector.
Test plan
Closes #118, #119, #120, #121, #122, #123, #124.
🤖 Generated with Claude Code