forked from axolotl-ai-cloud/axolotl
Phase 2: ProTrain integration with Axolotl perf features (M0–M6C closed) #21
Closed: thad0ctor wants to merge 43 commits into protrain-optim-checkpoint-phase2-mode-c from protrain-phase2-integration
Commits (43):
All commits authored by thad0ctor.
- 12f1a12 chore(protrain): fix pre-existing mypy on model_wrapper:386 + formatting
- 6c3fcb1 feat(protrain): enable FlashAttention in canonical LoRA example (M4)
- 45a934f feat(protrain): allow load_in_8bit / load_in_4bit (M2+M3 Mode A)
- 1fe8ddb feat(protrain): integrate fused LoRA kernels via container hooks (M1)
- a6dfc37 feat(protrain): add bnb 8-bit AdamW optimizer adapter (M2.5)
- 823b4db feat(protrain): reject unsupported optimizers at config load (M6B)
- c857675 test(protrain): PEFT edge-case smoke tests (M6A)
- 3ce55a8 test(protrain): cross-mode (A↔C) resume smoke tests (M6C)
- 6cb5c84 fix(protrain): force_all_persistent suppresses trace-pass on-demand e…
- a868978 test(protrain): pin bnb 8-bit/4-bit + ProTrain offload-mode (M3 audit…
- 91e0912 fix(p2p): rank-symmetric check_cuda_p2p_support + measure_nccl barrier
- a00df59 test(protrain): real multi-GPU cross-mode resume xfail tests (M6C)
- 016dac8 docs(protrain): document mode-pinned checkpoints + Mode C plain LoRA gap
- 4856090 feat(protrain): per-container PEFT-LoRA gather in on-demand profiler …
- a71f26e feat(protrain): cross-mode resume hook for HF Trainer load_checkpoint…
- 4eb6da6 feat(protrain): skip profiler trace pass when explicit override knobs…
- 32663f3 feat(protrain): runtime-side per-LoRA-container gather hooks (M6C-fix-3)
- 008b62e docs(protrain): update Mode C PEFT-LoRA section per M6C-fix-3 close
- b5ffa3d refactor(protrain): synchronous gather in ensure_chunks_resident (M6C…
- b787acb feat(protrain): late-NCCL-re-search skip on overrides + autocast diag…
- 0f44bfb feat(protrain): per-LoRA-container post-fwd/bwd hooks (M6C-fix-6 hard…
- 55d9237 docs(protrain): formalize M6C-fix end-of-chain limitation in DESIGN.md
- c0da428 feat(protrain): shape-preserving release-state placeholder (M6C-fix-7…
- 17ffb8d feat(protrain): close M6C chain — DDP init-sync bypass for chunk-mana…
- 6febed8 docs(protrain): close M6C limitation section — multi-GPU plain LoRA M…
- 2fcc1fc feat(protrain): per-dtype α fragmentation factor (Coverage audit Bloc…
- f74c559 test(protrain): regress paged_adamw_8bit + Mode C multi-GPU @ seq=2048
- d1ef2dd chore(protrain): address CodeRabbit PR #21 quick-win nits
- 3aff348 chore(protrain): apply CodeRabbit re-review quick-win nits (round 2)
- 09cf657 feat(protrain): in-process rebuild lifecycle (D1/D2/D3) + P2P fail-cl…
- d7624fb test(protrain): address remaining CodeRabbit test-quality deferrals (…
- 6961490 feat(protrain): scheduler SWAP-stream safety barrier (R3-#1) + resume…
- e6d8a1a test(protrain): CodeRabbit R3 test-quality fixes (R3-#2, #3, #4, #5, #8)
- b61f04e feat(protrain): predict iter-1 init-transient peak (audit Block G)
- aa0c6ba fix(protrain): Mode-C steady-peak CKPT-chain accounting (audit Block G)
- c996ce9 fix(protrain): close CodeRabbit R4 review (1 Critical + 2 Major + 1 M…
- f09be09 chore(protrain): apply ruff-format reformats to cost/runtime + test_c…
- 55377e5 chore(protrain): normalize confusable unicode in commentary/docstring…
- 69eb152 fix(protrain): CodeRabbit full-review Majors — 4 real correctness gap…
- 40bb8ad chore(protrain): CodeRabbit full-review Minors — docs consistency + t…
- 67372c3 fix(test): test_chunk_optim_shutdown caplog → mock.patch on LOG (CI f…
- db094b5 chore(protrain): trim non-WHY comments and address CodeRabbit findings
- cc72ca4 docs(protrain): document deferred non-compute α decomposition (ticket B)
Files changed:
- src/axolotl/integrations/protrain/api/model_wrapper.py: 678 changes (635 additions, 43 deletions)