Bug fixes by danielhanchen · Pull Request #11 · unslothai/unsloth-zoo

danielhanchen · 2024-11-05T21:25:37Z

No description provided.

KEY DISCOVERY: cache_position corruption starts early, with 1156-element tensors appearing where single positions are expected. The corruption then spreads and corrupts GPU memory.

- Detect corrupted cache_position shapes early (call #11)
- Fix by extracting the last position and creating a proper tensor
- Robust flex_attention fallbacks when the GPU is corrupted
- CPU tensor fallbacks when even torch.ones() fails on the GPU

This should prevent the CUDA illegal memory access by fixing the root cause before it spreads.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
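The shape check and repair described in the first two bullets can be sketched as follows. This is a minimal illustration, not the commit's code. The function name `repair_decode_cache_position` is hypothetical, and the sketch assumes a decode step in which `cache_position` should hold exactly one position per new token. It relies only on `len()` and negative slicing, so it accepts a list, a `range`, or a 1-D `torch.Tensor`.

```python
from typing import Sequence, Tuple


def repair_decode_cache_position(
    cache_position: Sequence[int], new_tokens: int = 1
) -> Tuple[Sequence[int], bool]:
    """Hypothetical sketch: sanitize cache_position for one decode step.

    Assumption: during decoding, cache_position should contain exactly
    `new_tokens` positions. If it is longer (for example 1156 entries
    instead of 1), keep only the trailing `new_tokens` entries, which end
    at the most recent position. Returns (positions, was_repaired).
    """
    if new_tokens < 1:
        raise ValueError("new_tokens must be at least 1")
    n = len(cache_position)
    if n == 0:
        # Nothing to recover from; fail loudly rather than guess a position.
        raise ValueError("cache_position is empty")
    if n == new_tokens:
        # Expected shape: pass through untouched.
        return cache_position, False
    if n < new_tokens:
        raise ValueError(
            f"cache_position has {n} entries, expected at least {new_tokens}"
        )
    # Oversized: keep only the last `new_tokens` positions.
    return cache_position[-new_tokens:], True


# Usage: an oversized 1156-entry position list collapses to its last entry.
positions, repaired = repair_decode_cache_position(list(range(1156)))
print(positions, repaired)  # [1155] True
```

Slicing a tensor returns a view of the possibly corrupted storage. The bullet's "creating a proper tensor" suggests the actual fix builds a fresh tensor (for example with `torch.tensor`) on a known-good device, which this sketch does not attempt.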

Multi-reviewer pass on the autocast wrapper / norm-upcast path:

- Instance-level forward (#2): an instance attribute `model.forward` (Unsloth runtime forward patching) shadows class-method overrides, so mutating `__class__` silently bypassed the wrapper; an fp32 norm then met a bf16 linear with no autocast and crashed. Now wrap the instance attribute when present; otherwise subclass as before.
- Wrapper gating (#5, #7): install the wrapper iff fp32 norm params actually exist (from our upcast, the legacy env upcast, or an external `_pre_set_compute_dtype` policy), not on the upcast DECISION. This fixes the rollback path leaving external fp32 norms exposed and stops wrapping models with no fp32 norm. Add `_unwrap_forward_in_bf16_autocast` for re-prepare (#10).
- `config.architectures` leak (#8/#9): keep the original `__name__` on the generated subclass (unique `__qualname__` for registration) so `save_pretrained` records the base architecture.
- Device detection (#11): recurse into mapping/list/tuple batches and fall back to the model's parameter device instead of defaulting to "cuda".
- Legacy `UNSLOTH_UPCAST_LAYERNORM` (#1/#3/#4): route through the shared `_cast_named_module` + union matcher and honour the external-policy deferral.
- Recursive external-ownership guard (#6): record descendants of tagged modules (the external policy casts recursively).
- Fresh-interpreter pickle test (#12): real subprocess load.

Shared helpers: `_find_tensor_device_type`, `_call_forward_with_bf16_autocast`, `_canonical_module_name`, `_cast_named_module`.

Unit suite: 25 passed.

danielhanchen added 23 commits November 3, 2024 00:23

- Update __init__.py (a811533)
- Merge branch 'main' into nightly (65c9233)
- Update __init__.py (37c7074)
- Update __init__.py (b200f44)
- Create patching_utils.py (8c379f9)
- Bug fixes (fa83697)
- Update pyproject.toml (54e0c64)
- Update __init__.py (5178b39)
- O3 (ac18419)
- Update tokenizer_utils.py (8b3aade)
- Post patch (dc79f8d)
- Update patching_utils.py (8a27007)
- Update patching_utils.py (589697a)
- Update gradient_checkpointing.py (ac1334e)
- Update patching_utils.py (a6892c7)
- Update patching_utils.py (a768c1b)
- Update patching_utils.py (f2a8878)
- Update patching_utils.py (beee64a)
- Update patching_utils.py (408f327)
- Update patching_utils.py (0a8385b)
- Update patching_utils.py (664fd43)
- Update patching_utils.py (0c12090)
- Update loss_utils.py (03b91cd)

danielhanchen merged commit e035111 into main Nov 5, 2024

danielhanchen merged 23 commits into main from nightly
