fix: 7 codex-discovered issues (#118-#124) #126
Merged
raullenchai merged 3 commits into main on Apr 16, 2026
Conversation
Batch fix for pre-existing bugs surfaced by codex CLI review during the doctor harness PR. Each was filed as a separate issue; fixing them together because they're all small, independent, and low-risk.

**#118 — Wheel missing agent profile YAMLs**
`pyproject.toml` package-data now includes `agents/profiles/*.yaml`. Without this, pip-installed wheels had no built-in agent profiles and `rapid-mlx agents` showed nothing.

**#119 — 'all' extra self-references rapid-mlx**
Expanded the union directly instead of self-referencing `rapid-mlx[vision,chat,embeddings]`. The self-dependency broke `pip install .[all]` and editable installs.

**#120 — Chat template fallback drops tools**
The no-`apply_chat_template` fallback now calls `_inject_tools_into_messages` before formatting, matching the TypeError fallback path. Without this, models hitting the fallback lost all tool-calling context.

**#121 — MTP misses nested text_config**
Extracted `_read_num_mtp_layers()`, which checks both the top-level and `text_config.num_nextn_predict_layers`. It is used by both `_try_inject_mtp` (the strict=False path) and `_try_inject_mtp_post_load`. Without this, VLM+MTP checkpoints silently lost their MTP head.

**#122 — enable_thinking dropped during tool fallback retry**
Step 1 of the TypeError cascade pops `enable_thinking` to test whether the template rejects it; step 2 pops `tools`. But `enable_thinking` was never restored for the tool-injection retry, so thinking-capable models (Qwen, DeepSeek) silently lost that feature on tool-enabled requests. Now restores `enable_thinking` before the step-2 retry, with a final fallback that strips both if the template rejects everything. Updated the existing test and added a new test for the "only tools rejected" happy path (`enable_thinking` survives).

**#123 — Gemma 4 OOM during multimodal shard load**
Sanitize per shard instead of loading the entire multimodal checkpoint and then sanitizing. On Gemma 4, the vision/audio shards can be more than 2× the language-model weights, so transient memory was far higher than needed for text-only mode.
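The #122 retry cascade can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `apply_with_fallbacks` and `_inject_tools` are hypothetical names, and the real template callable is the tokenizer's `apply_chat_template`.

```python
def _inject_tools(messages, tools):
    # Hypothetical helper: fold tool schemas into a system message so the
    # template never has to see a `tools` kwarg.
    if not tools:
        return list(messages)
    return [{"role": "system", "content": f"Available tools: {tools}"}] + list(messages)


def apply_with_fallbacks(apply_template, messages, tools=None, enable_thinking=True):
    # Step 0: try the full kwarg set.
    try:
        return apply_template(messages, tools=tools, enable_thinking=enable_thinking)
    except TypeError:
        pass
    # Step 1: maybe the template rejects enable_thinking -- drop only that.
    try:
        return apply_template(messages, tools=tools)
    except TypeError:
        pass
    # Step 2: only `tools` was rejected. Inject tools into the messages and,
    # crucially (the #122 fix), restore enable_thinking for this retry.
    try:
        return apply_template(_inject_tools(messages, tools),
                              enable_thinking=enable_thinking)
    except TypeError:
        # Final fallback: the template rejects everything -- strip both kwargs.
        return apply_template(_inject_tools(messages, tools))
```

The pre-fix behavior corresponds to leaving `enable_thinking` out of the step-2 call, which is why thinking-capable models lost the feature exactly when tools were in play.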
**#124 — vllm-mlx-chat published without gradio**
There was already a try/except, but the error message was terse. Expanded it to explain the `[chat]` extra requirement clearly, and changed `raise SystemExit(1)` to `sys.exit(1)`.

All 2075 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
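The #124 pattern of a friendly optional-dependency guard looks roughly like this. It is a generic sketch, not the project's code: `require_extra` is a hypothetical name, and the actual guard imports `gradio` for the `[chat]` extra.

```python
import importlib
import sys


def require_extra(module_name: str, extra: str, package: str = "vllm-mlx-chat"):
    # Hypothetical guard: import an optional dependency, exiting with an
    # actionable message (instead of a bare traceback) when it is missing.
    try:
        return importlib.import_module(module_name)
    except ImportError:
        sys.stderr.write(
            f"{module_name} is not installed, but is required for this command.\n"
            f"Install the optional extra with: pip install '{package}[{extra}]'\n"
        )
        sys.exit(1)  # sys.exit over `raise SystemExit(1)`, per the #124 change
```

`sys.exit(1)` and `raise SystemExit(1)` are functionally equivalent; the change is a style normalization alongside the clearer message.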
inject_mtp_support() reads config["num_nextn_predict_layers"] directly. For VLM checkpoints where that field lives under text_config, the previous fix (#121) correctly *detected* the nested layout but still passed the original nested config — the injector saw 0 and silently bailed. Now surfaces num_nextn_predict_layers to the top level of the config dict before calling inject_mtp_support, so the injector gets the right value regardless of nesting depth. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
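The surfacing step described above can be sketched like this. `surface_mtp_layers` is a hypothetical name for illustration; the real code performs this normalization before calling `inject_mtp_support`.

```python
def surface_mtp_layers(config: dict) -> dict:
    # Hypothetical sketch: copy num_nextn_predict_layers from text_config
    # (the VLM layout) to the top level, where the injector reads it,
    # without mutating the caller's config dict.
    n = config.get("num_nextn_predict_layers")
    if not n:
        n = config.get("text_config", {}).get("num_nextn_predict_layers", 0)
    if n:
        config = {**config, "num_nextn_predict_layers": n}
    return config
```

With only the #121 detection fix, the nested value was found but the injector still read the top-level key and saw nothing; surfacing the value makes both layouts look identical to the injector.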
…tion

Two pre-existing bugs found by codex review round 3 on PR #126:

1. The `mlx_lm.load()` success path skipped `_try_inject_mtp_post_load()`, so MTP-capable models whose `mtp.*` weights get `sanitize()`'d by mlx-lm came back without MTP heads. Only the strict=False fallback path ran the post-load check. Now both paths call it.
2. The `chunk.token` truthiness check dropped token ID 0 (the pad token on some tokenizers). Changed to `chunk.token is not None` so token ID 0 is preserved in `GenerationOutput.tokens`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
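The truthiness bug in point 2 is easy to reproduce in isolation. A minimal sketch, with `Chunk` and `collect_tokens` as stand-ins for the real streaming types:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Chunk:
    # Stand-in for a streamed generation chunk; real code uses its own type.
    token: Optional[int] = None
    text: str = ""


def collect_tokens(chunks: List[Chunk]) -> List[int]:
    # `if c.token:` would also skip token ID 0 (a valid pad token on some
    # tokenizers); the explicit None check keeps it.
    return [c.token for c in chunks if c.token is not None]
```

Since 0 is falsy in Python, `if chunk.token:` conflates "no token" with "token ID 0", which is exactly the class of bug the `is not None` check avoids.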
Summary
Batch fix for 7 pre-existing bugs surfaced by codex CLI review during
the doctor harness PR (#125). Each was filed as a separate issue; fixed
together because they're all small, independent, and low-risk.
Changes
Codex review trail
Surfaced `num_nextn_predict_layers` to the top level before calling the injector.
Test plan
Closes #118, #119, #120, #121, #122, #123, #124.
🤖 Generated with Claude Code