additional import try except handling for mlx #654

Merged
mmathew23 merged 1 commit into unslothai:main from mmathew23:fix/mlximport
May 15, 2026

Conversation

@mmathew23

Summary

Fix ModuleNotFoundError: No module named 'mlx' raised inside any torch.compile
/ inductor compile path on non-Apple hosts. The bug surfaces as
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: ModuleNotFoundError: No module named 'mlx' and breaks training for any model
whose attention implementation triggers an inductor compile (Qwen3 FFT through
the compiled-cache shim, Gemma3 default through flex_attention, etc.) on
machines where mlx is not installed (the typical Linux/CUDA host).

Root cause

unsloth_zoo/__init__.py registers six legacy-name shim modules in
sys.modules at import time, one per unsloth_zoo.mlx_* legacy name:

for _old_name in _LazyMLXAlias._LEGACY_TO_NEW:
    sys.modules[_old_name] = _LazyMLXAlias(_old_name)

Each shim's __getattr__ lazy-imports the real unsloth_zoo.mlx.* submodule
on first attribute access, which transitively does import mlx.core as mx at
the top of unsloth_zoo/mlx/compile.py. The intent is to support Apple
Silicon scripts that still use the flat module names without paying the mlx
import cost on every host.

The shim filters dunder attribute probes (__file__, __path__, and the
lookups made by inspect.getmodule, hasattr(m, '__name__'), etc.), so torch's
own sys.modules walks during init never trigger the lazy import.
However, pickle.whichmodule() does not restrict itself to dunder
attributes. From CPython:

def whichmodule(obj, name):
    # name is the qualname of a function/class being pickled, e.g.
    # "Qwen3Attention_fast_forward" or some inductor-generated symbol.
    for module_name, module in sys.modules.copy().items():
        if module_name == '__main__' or module is None: continue
        try:
            if _getattribute(module, name)[0] is obj:
                return module_name
        except (AttributeError, KeyError):
            pass

It iterates over every entry in sys.modules and calls
getattr(module, name) with a real, non-dunder attribute name. When it hits
one of our six shims, the dunder filter does not catch the call, so
_resolve() runs, the real unsloth_zoo.mlx.* submodule is imported (pulling in
unsloth_zoo/mlx/compile.py), and its import mlx.core raises
ModuleNotFoundError, which the except clause above does not catch.

torch._inductor.codecache.compiled_fx_graph_hash calls pickler.dumps(obj)
on the FX graph being compiled, which in turn invokes pickle.whichmodule()
for any function or class node. Result: every inductor-bound training path
crashes during graph hash on Linux.
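
The failure mode can be reproduced without torch or mlx. The sketch below is an illustration under assumptions, not the unsloth_zoo code: LazyAlias, walk_modules, and the target name no_such_backend_pkg.core are hypothetical stand-ins, and walk_modules only imitates the loop quoted above on a small mapping rather than calling pickle itself.

```python
import importlib
import types


class LazyAlias(types.ModuleType):
    """Hypothetical stand-in for the pre-fix shim: lazy resolve, no guard."""

    def __init__(self, name, target):
        super().__init__(name)
        self._target = target  # dotted name of the real module, imported lazily

    def _resolve(self):
        return importlib.import_module(self._target)

    def __getattr__(self, name):
        # Dunder probes are filtered, as described for the original shim.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        # Any other name triggers the real import, which may fail.
        return getattr(self._resolve(), name)


def walk_modules(modules, obj, name):
    """Simplified imitation of the sys.modules walk in pickle.whichmodule."""
    for module_name, module in dict(modules).items():
        if module_name == "__main__" or module is None:
            continue
        try:
            if getattr(module, name) is obj:
                return module_name
        except AttributeError:
            pass
    return "__main__"


def some_function():
    pass


shim = LazyAlias("legacy_shim", "no_such_backend_pkg.core")
try:
    walk_modules({"legacy_shim": shim}, some_function, "some_function")
    outcome = "walk completed"
except ModuleNotFoundError as exc:
    # The walk tolerates AttributeError only, so the import failure escapes.
    outcome = f"walk crashed: {exc}"
print(outcome)
```

The walk never asks for a dunder name, so the filter in __getattr__ offers no protection here.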

Fix

One try/except in _LazyMLXAlias.__getattr__. If the real submodule import
fails because mlx is not installed, surface the failure as an
AttributeError so pickle.whichmodule's
except (AttributeError, KeyError) handler catches it cleanly and pickle
moves on to the next module:

def __getattr__(self, name):
    if name.startswith("__") and name.endswith("__"):
        raise AttributeError(name)
    try:
        real = self._resolve()
    except ModuleNotFoundError:
        raise AttributeError(name)
    return getattr(real, name)

Real user attribute access on a non-mlx host still surfaces a useful error: an
AttributeError on the legacy shim instead of a wrapped BackendCompilerFailed.
For a from-import such as from unsloth_zoo.mlx_loader import FastMLXModel,
Python reports that AttributeError as an ImportError ("cannot import name").
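
A minimal sketch of what callers observe once the guard is in place, assuming a shim shaped like the __getattr__ above. GuardedLazyAlias, legacy_shim, FastModel, and no_such_backend_pkg.core are hypothetical names chosen for illustration; hasattr stands in for any caller that tolerates only AttributeError.

```python
import importlib
import sys
import types


class GuardedLazyAlias(types.ModuleType):
    """Hypothetical shim mirroring the guarded __getattr__ shown above."""

    def __init__(self, name, target):
        super().__init__(name)
        self._target = target

    def _resolve(self):
        return importlib.import_module(self._target)

    def __getattr__(self, name):
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        try:
            real = self._resolve()
        except ModuleNotFoundError:
            # Missing backend: report "no such attribute" instead of crashing.
            raise AttributeError(name)
        return getattr(real, name)


shim = GuardedLazyAlias("legacy_shim", "no_such_backend_pkg.core")

# hasattr() swallows AttributeError only, like the except clause in the walk.
probe_result = hasattr(shim, "some_function")

# Direct attribute access fails with AttributeError.
try:
    shim.FastModel
    direct_error = None
except AttributeError as exc:
    direct_error = type(exc).__name__

# A from-import of the same name is reported by Python as ImportError.
sys.modules["legacy_shim"] = shim
try:
    from legacy_shim import FastModel
    from_import_error = None
except ImportError as exc:
    from_import_error = type(exc).__name__
finally:
    del sys.modules["legacy_shim"]  # leave sys.modules as it was

print(probe_result, direct_error, from_import_error)
```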

Reproduction

Minimal repro on any Linux/CUDA host without mlx installed, transformers 5.5
or 4.57.6, torch 2.10:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from unsloth import FastLanguageModel
import torch

model, tok = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-0.6B",
    max_seq_length=1024,
    load_in_16bit=True,
    full_finetuning=True,
    dtype=torch.bfloat16,
)
# ... run a single SFTTrainer step. Crashes inside compile_fx_inner ->
# FxGraphCache.prepare_key -> compiled_fx_graph_hash -> pickle.dumps
# -> pickle.whichmodule -> _LazyMLXAlias.__getattr__ -> import mlx.core

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dea4e16d75

Comment thread unsloth_zoo/__init__.py
-        return getattr(self._resolve(), name)
+        try:
+            real = self._resolve()
+        except ModuleNotFoundError:

P2: Restrict ModuleNotFoundError catch to missing mlx only

Catching every ModuleNotFoundError here and converting it to AttributeError masks unrelated import failures inside the real target module (for example, if another dependency is missing or a sub-import path regresses). In those cases, legacy imports like from unsloth_zoo.mlx_loader import ... now fail as a generic missing-attribute error instead of surfacing the true missing module, which makes real breakages hard to diagnose and can hide regressions on MLX-capable environments.
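
One way to act on this suggestion is to inspect the name attribute that ModuleNotFoundError carries and convert only failures whose top-level package is mlx. This is a sketch, not the merged code: resolve_attr, classify, and the two resolve callbacks are hypothetical helpers standing in for the body of _LazyMLXAlias.__getattr__, and the mlx failure is simulated so the example behaves the same whether or not mlx is installed.

```python
import importlib


def resolve_attr(resolve, name, optional_pkg="mlx"):
    """Hypothetical helper: look up `name` on the module returned by resolve().

    Only a missing `optional_pkg` becomes AttributeError; any other missing
    module propagates unchanged so real breakages stay visible.
    """
    try:
        real = resolve()
    except ModuleNotFoundError as exc:
        missing = (exc.name or "").split(".")[0]
        if missing != optional_pkg:
            raise
        raise AttributeError(name) from exc
    return getattr(real, name)


def missing_optional():
    # Simulates a submodule whose import fails because mlx is absent.
    raise ModuleNotFoundError("No module named 'mlx'", name="mlx")


def missing_other():
    # Simulates an unrelated dependency that failed to import.
    return importlib.import_module("no_such_unrelated_dependency")


def classify(resolve):
    try:
        resolve_attr(resolve, "FastMLXModel")
    except AttributeError:
        return "skipped as AttributeError"
    except ModuleNotFoundError as exc:
        return f"propagated: missing {exc.name}"
    return "resolved"


print(classify(missing_optional))
print(classify(missing_other))
```

The trade-off is that a sys.modules walk would crash again if some other dependency of the real submodule were missing, which is arguably the point of the suggestion: that case indicates a genuine breakage.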

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modifies the _LazyMLXAlias.__getattr__ method in unsloth_zoo/__init__.py to handle cases where the mlx submodule cannot be resolved on non-Apple environments. By catching the import failure and raising an AttributeError, the code allows external tools like pickle.whichmodule to skip the stub gracefully. The reviewer suggested catching ImportError instead of ModuleNotFoundError to improve robustness against various loading failures, such as incompatible binaries or missing system dependencies.

Comment thread unsloth_zoo/__init__.py
-        return getattr(self._resolve(), name)
+        try:
+            real = self._resolve()
+        except ModuleNotFoundError:

medium

Consider catching ImportError instead of ModuleNotFoundError. While ModuleNotFoundError specifically handles the case where the package is missing, ImportError is more general and will also catch cases where the package is found but fails to load (e.g., due to missing system dependencies or incompatible binaries). This provides better robustness for the goal of allowing pickle.whichmodule to skip the shim on non-supported environments. This also aligns with the error handling patterns used elsewhere in this repository.

Suggested change
-        except ModuleNotFoundError:
+        except ImportError:
References
  1. The project's general rules and existing code patterns favor using ImportError for checking the availability of optional or platform-specific dependencies.
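
The difference between the two exception types can be checked directly. ModuleNotFoundError is a subclass of ImportError, so the broader clause catches both a missing package and other import failures. The sketch below uses hypothetical helper names (caught_by, import_missing_module, import_missing_name) and a deliberately nonexistent package.

```python
def caught_by(exc_type, action):
    """Return True if action() raises something `except exc_type` would catch."""
    try:
        action()
    except exc_type:
        return True
    except Exception:
        return False
    return False


def import_missing_module():
    import no_such_backend_pkg  # raises ModuleNotFoundError


def import_missing_name():
    from math import no_such_name  # raises a plain ImportError


results = {
    "ModuleNotFoundError / missing module": caught_by(ModuleNotFoundError, import_missing_module),
    "ModuleNotFoundError / missing name": caught_by(ModuleNotFoundError, import_missing_name),
    "ImportError / missing module": caught_by(ImportError, import_missing_module),
    "ImportError / missing name": caught_by(ImportError, import_missing_name),
}
for label, caught in results.items():
    print(f"{label}: {caught}")
```

The broader clause is also more exposed to the concern raised in the Codex review: the more failure kinds are converted to AttributeError, the more unrelated import breakages can be hidden.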

@mmathew23 mmathew23 merged commit 7b90fec into unslothai:main May 15, 2026
11 checks passed

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request modifies the lazy module resolution in unsloth_zoo/__init__.py to catch ModuleNotFoundError during attribute access, preventing crashes in torch.compile on non-Apple platforms. The reviewer suggests catching the broader ImportError and improving the AttributeError message to include the original exception context, ensuring better clarity for users when resolution fails.

Comment thread unsloth_zoo/__init__.py
Comment on lines +164 to +175
try:
    real = self._resolve()
except ModuleNotFoundError:
    # mlx is Apple-only. On non-mlx hosts the real submodule import
    # fails. Surface as AttributeError so callers that walk sys.modules
    # (notably pickle.whichmodule, used by torch._inductor's FX graph
    # hash pickler) skip this stub cleanly instead of crashing the
    # whole compile. Real user attribute access on a non-mlx host
    # still surfaces a useful error -- they will see the AttributeError
    # rather than a torch._dynamo.exc.BackendCompilerFailed wrapper.
    raise AttributeError(name)
return getattr(real, name)

medium

While the logic correctly addresses the torch.compile crash by surfacing an AttributeError for pickle.whichmodule, the current implementation loses significant context for human users. Raising a generic AttributeError(name) results in a message like AttributeError: FastMLXModel, which is unhelpful for a user on a non-Apple host who might be confused as to why the attribute is missing.

I suggest:

  1. Catching ImportError instead of ModuleNotFoundError to be more robust against various import failures (e.g., corrupted installs or circular dependencies) that should also be skipped by pickle.
  2. Preserving the original error message and using from e to provide a clear explanation to the user while still satisfying the AttributeError requirement for pickle.
Suggested change
- try:
-     real = self._resolve()
- except ModuleNotFoundError:
-     # mlx is Apple-only. On non-mlx hosts the real submodule import
-     # fails. Surface as AttributeError so callers that walk sys.modules
-     # (notably pickle.whichmodule, used by torch._inductor's FX graph
-     # hash pickler) skip this stub cleanly instead of crashing the
-     # whole compile. Real user attribute access on a non-mlx host
-     # still surfaces a useful error -- they will see the AttributeError
-     # rather than a torch._dynamo.exc.BackendCompilerFailed wrapper.
-     raise AttributeError(name)
- return getattr(real, name)
+ try:
+     real = self._resolve()
+ except ImportError as e:
+     # mlx is Apple-only. On non-mlx hosts the real submodule import
+     # fails. Surface as AttributeError so callers that walk sys.modules
+     # (notably pickle.whichmodule, used by torch._inductor's FX graph
+     # hash pickler) skip this stub cleanly instead of crashing the
+     # whole compile. Real user attribute access on a non-mlx host
+     # still surfaces a useful error -- they will see the AttributeError
+     # rather than a torch._dynamo.exc.BackendCompilerFailed wrapper.
+     raise AttributeError(f"module '{self.__name__}' has no attribute '{name}' (resolution failed: {e})") from e
+ return getattr(real, name)
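
A small sketch of what the from e form preserves for a human reader. The function lookup and the module name legacy_shim are hypothetical, and the mlx import failure is simulated so the example does not depend on the host.

```python
def lookup(name, module_name="legacy_shim"):
    """Hypothetical stand-in for the suggested except branch."""
    try:
        # Simulated failure of the lazy import on a host without mlx.
        raise ModuleNotFoundError("No module named 'mlx'", name="mlx")
    except ImportError as e:
        raise AttributeError(
            f"module '{module_name}' has no attribute '{name}' "
            f"(resolution failed: {e})"
        ) from e


try:
    lookup("FastMLXModel")
except AttributeError as err:
    message = str(err)
    cause = err.__cause__

print(message)
print(type(cause).__name__)
```

Callers that only catch AttributeError behave the same as with the bare AttributeError(name), while a traceback shown to a user also includes the original import failure through __cause__.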
