fix: 7 codex-discovered issues (#118-#124)#126

Merged
raullenchai merged 3 commits into main from fix/codex-discovered-issues on Apr 16, 2026
Conversation

@raullenchai
Owner

Summary

Batch fix for 7 pre-existing bugs surfaced by codex CLI review during
the doctor harness PR (#125). Each was filed as a separate issue; fixed
together because they're all small, independent, and low-risk.

Changes

| Issue | What | File |
| --- | --- | --- |
| #118 | Wheel missing agent profile YAMLs | `pyproject.toml` |
| #119 | `all` extra self-references `rapid-mlx` | `pyproject.toml` |
| #120 | Chat template fallback drops tools | `utils/chat_template.py` |
| #121 | MTP misses nested `text_config` | `utils/tokenizer.py` |
| #122 | `enable_thinking` dropped on tool retry | `utils/chat_template.py` |
| #123 | Gemma 4 OOM during multimodal shard load | `models/gemma4_text.py` |
| #124 | `vllm-mlx-chat` without gradio | `gradio_app.py` |

Codex review trail

  • Round 1: found a P1 — MTP detection worked, but the injector still saw 0 from the nested config. Fixed by surfacing num_nextn_predict_layers to the top level before calling the injector.
  • Round 2: clean.
  • Round 3 (full diff): flagged 2 pre-existing issues not in this PR scope.

Test plan

Closes #118, #119, #120, #121, #122, #123, #124.

🤖 Generated with Claude Code

Your Name and others added 3 commits April 16, 2026 09:40
Batch fix for pre-existing bugs surfaced by codex CLI review during
the doctor harness PR. Each was filed as a separate issue; fixing
them together because they're all small, independent, and low-risk.

#118 — Wheel missing agent profile YAMLs
  pyproject.toml package-data now includes agents/profiles/*.yaml.
  Without this, pip-installed wheels had no built-in agent profiles
  and `rapid-mlx agents` showed nothing.

#119 — 'all' extra self-references rapid-mlx
  Expanded the union directly instead of self-referencing
  rapid-mlx[vision,chat,embeddings]. The self-dep broke
  `pip install .[all]` and editable installs.

#120 — Chat template fallback drops tools
  The no-apply_chat_template fallback now calls
  _inject_tools_into_messages before formatting, same as the
  TypeError fallback path. Without this, models hitting the
  fallback lost all tool-calling context.
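
The #120 fix in rough outline: the manual-formatting fallback now runs the same tool-injection step as the TypeError fallback path. This is a sketch only — the real `_inject_tools_into_messages` lives in `utils/chat_template.py`, and its internals plus the message shapes here are assumptions:

```python
import json

def _inject_tools_into_messages(messages, tools):
    """Assumed behavior: prepend a system message listing the tools."""
    if not tools:
        return messages
    tool_text = "Available tools:\n" + json.dumps(tools, indent=2)
    return [{"role": "system", "content": tool_text}] + list(messages)

def format_prompt_fallback(messages, tools=None):
    """Manual formatting used when apply_chat_template is unavailable.
    The fix: inject tools here too, matching the TypeError fallback path."""
    messages = _inject_tools_into_messages(messages, tools)
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)
```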

#121 — MTP misses nested text_config
  Extracted _read_num_mtp_layers() that checks both top-level
  and text_config.num_nextn_predict_layers. Used by both
  _try_inject_mtp (strict=False path) and _try_inject_mtp_post_load.
  Without this, VLM+MTP checkpoints silently lost their MTP head.

#122 — enable_thinking dropped during tool fallback retry
  Step 1 of the TypeError cascade pops enable_thinking to test if
  the template rejects it. Step 2 pops tools. But enable_thinking
  was never restored for the tool-injection retry, so thinking-
  capable models (Qwen, DeepSeek) silently lost that feature on
  tool-enabled requests. Now restores enable_thinking before the
  step-2 retry, with a final fallback that strips both if the
  template rejects everything.

  Updated existing test + added new test for the "only tools
  rejected" happy path (enable_thinking survives).
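
The TypeError cascade described above can be sketched as follows. Function and helper names here are illustrative, not the actual code in `utils/chat_template.py`; the point is that step 2 now restores `enable_thinking` instead of leaving it popped:

```python
def _inject_tools_into_messages(messages, tools):
    # Assumed helper: fold tool definitions into the message list.
    if not tools:
        return messages
    return [{"role": "system", "content": f"tools: {tools}"}] + list(messages)

def apply_with_fallbacks(apply, messages, tools=None, enable_thinking=True):
    try:
        return apply(messages, tools=tools, enable_thinking=enable_thinking)
    except TypeError:
        pass
    # Step 1: maybe the template rejects enable_thinking.
    try:
        return apply(messages, tools=tools)
    except TypeError:
        pass
    # Step 2: maybe it rejects tools. Restore enable_thinking (the #122 fix)
    # and fold the tools into the messages instead.
    messages = _inject_tools_into_messages(messages, tools)
    try:
        return apply(messages, enable_thinking=enable_thinking)
    except TypeError:
        # Final fallback: the template rejects everything; strip both kwargs.
        return apply(messages)
```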

#123 — Gemma 4 OOM during multimodal shard load
  Sanitize per-shard instead of loading the entire multimodal
  checkpoint then sanitizing. On Gemma 4, the vision/audio shards
  can be >2× the language-model weight, so transient memory was
  way higher than needed for text-only mode.
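
The per-shard approach can be sketched like this (key prefixes and the loader signature are assumptions; the real sanitize lives in `models/gemma4_text.py`):

```python
def load_text_only_weights(shard_paths, load_shard,
                           text_prefixes=("language_model.",)):
    """Load a multimodal checkpoint shard by shard, keeping only
    language-model tensors so vision/audio weights never accumulate
    in memory alongside the full checkpoint."""
    weights = {}
    for path in shard_paths:
        shard = load_shard(path)  # {param_name: tensor} for one shard
        # Sanitize per shard: filter before the next shard is read, instead
        # of loading the entire checkpoint and sanitizing afterwards.
        weights.update(
            {k: v for k, v in shard.items() if k.startswith(text_prefixes)}
        )
    return weights
```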

#124 — vllm-mlx-chat published without gradio
  Already had a try/except, but the error message was terse.
  Expanded to explain the [chat] extra requirement clearly.
  Changed from `raise SystemExit(1)` to `sys.exit(1)`.
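
Roughly what the guard looks like after the change (message wording and the helper name are illustrative):

```python
import sys

def require_chat_extra(importer=__import__):
    """Import gradio or exit with an actionable message."""
    try:
        return importer("gradio")
    except ImportError:
        print(
            "gradio is not installed, so the chat UI cannot start.\n"
            "Install it with the [chat] extra:\n"
            "    pip install 'rapid-mlx[chat]'",
            file=sys.stderr,
        )
        sys.exit(1)  # was `raise SystemExit(1)`
```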

All 2075 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
inject_mtp_support() reads config["num_nextn_predict_layers"]
directly.  For VLM checkpoints where that field lives under
text_config, the previous fix (#121) correctly *detected* the
nested layout but still passed the original nested config — the
injector saw 0 and silently bailed.

Now surfaces num_nextn_predict_layers to the top level of the
config dict before calling inject_mtp_support, so the injector
gets the right value regardless of nesting depth.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Two pre-existing bugs found by codex review round 3 on PR #126:

1. mlx_lm.load() success path skipped _try_inject_mtp_post_load(),
   so MTP-capable models whose mtp.* weights get sanitize()'d by
   mlx-lm came back without MTP heads. Only the strict=False
   fallback path ran the post-load check. Now both paths call it.

2. `chunk.token` truthiness check dropped token ID 0 (pad token on
   some tokenizers). Changed to `chunk.token is not None` so token
   ID 0 is preserved in GenerationOutput.tokens.
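
The truthiness bug in miniature (the `Chunk` shape here is a stand-in for the real streaming type):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Chunk:
    token: Optional[int] = None

def collect_tokens(chunks) -> List[int]:
    tokens = []
    for chunk in chunks:
        # The buggy check was `if chunk.token:`, which also skips token ID 0.
        if chunk.token is not None:
            tokens.append(chunk.token)
    return tokens
```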

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>