Skip to content

tests: follow MoE merge wrapper delegation in drift detector#653

Merged
danielhanchen merged 1 commit into
mainfrom
fix-moe-expert-merges-drift-detector-wrappers
May 15, 2026
Merged

tests: follow MoE merge wrapper delegation in drift detector#653
danielhanchen merged 1 commit into
mainfrom
fix-moe-expert-merges-drift-detector-wrappers

Conversation

@danielhanchen
Copy link
Copy Markdown
Member

Summary

  • After saving: layout-aware MoE LoRA merge + loud-fail on fallback (#5410) #647 the gate/up MoE merge entry points are thin wrappers that delegate to _merge_moe_gate_or_up_expert; the drift detector in test_moe_expert_merges_call_active_merge_device only inspected the wrapper source and so reported a false-positive regression.
  • Walk one level of delegation through sibling _merge_moe_* helpers via a small regex-driven source-union helper, so the assertion still requires _active_merge_device() and forbids hardcoded .to('cuda', ...) across the dispatch surface that actually does the merge.

Test plan

  • pytest tests/test_upstream_pinned_symbols_accelerator.py::test_moe_expert_merges_call_active_merge_device -xvs
  • Core matrix on dependent PRs (unsloth#5430, #5432, #5433, #5434)

After zoo#647 the gate/up MoE merge entry points became thin wrappers
delegating to _merge_moe_gate_or_up_expert. The drift detector test
only inspected the wrapper source and so reported a false-positive
regression (no .to('cuda') call, no _active_merge_device call inside
the wrapper body itself).

Update test_moe_expert_merges_call_active_merge_device to follow one
level of delegation through sibling _merge_moe_* helpers via a small
regex-walked source-union helper, so the union still requires
_active_merge_device() and forbids hardcoded .to('cuda').

Local: pytest -k test_moe_expert_merges_call_active_merge_device
passes (1 passed in 0.26s).
@danielhanchen danielhanchen merged commit 57bbdc0 into main May 15, 2026
14 checks passed
@danielhanchen danielhanchen deleted the fix-moe-expert-merges-drift-detector-wrappers branch May 15, 2026 13:50
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 87138e24fb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

)
src = inspect.getsource(fn)
src = _effective_source(name, set())
assert "_active_merge_device(" in src, (
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep the active-device check on the entry point

Because _effective_source() concatenates every sibling helper before this assertion, any target can now satisfy the check via a callee even if the entry point itself starts doing its own device move with DEVICE_TYPE_TORCH or another non-helper device before delegating. That reopens the exact drift the test documents for non-thin helpers or wrappers that grow logic: the union still contains _active_merge_device( from _merge_moe_gate_or_up_expert, so the regression is missed unless it is a literal .to('cuda').

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates the MoE expert merge tests to support code delegation by introducing a recursive source code inspection utility. This ensures that both entry points and their delegated helpers are correctly validated for device-agnostic implementations. A review comment suggested improving the robustness of the regular expression used to identify helper calls by allowing for optional whitespace.

"_merge_moe_fused_gate_up_expert",
"_merge_moe_fused_down_proj_expert",
]
_helper_call_re = re.compile(r"\b(_merge_moe_[A-Za-z0-9_]+)\(")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regex for detecting helper calls is slightly fragile as it doesn't account for optional whitespace between the function name and the opening parenthesis. While the current codebase follows a strict style, making the regex more robust ensures the test doesn't fail on valid Python syntax variations (e.g., _merge_moe_helper (...)). This minor improvement is preferred as it increases robustness without requiring a large-scale refactor.

Suggested change
_helper_call_re = re.compile(r"\b(_merge_moe_[A-Za-z0-9_]+)\(")
_helper_call_re = re.compile(r"\b(_merge_moe_[A-Za-z0-9_]+)\s*\(")
References
  1. The codebase follows a strict style regarding whitespace in function calls. (link)
  2. Fragile string-matching is acceptable for code patching only if it maintains consistency and avoids large-scale refactors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant