Issues: Lightning-AI/lightning-thunder
Label tracking meta-issue (edit me to get automatically CC'ed...
#72
opened Mar 25, 2024 by
carmocca
Open
Thunder saves too many tensors for backward for a Transformer's residual connection pattern
bug
Something isn't working
memory use
rematerialization
#1368
opened Oct 30, 2024 by
IvanYashchuk
ThunderFX is slower than torch.compile for vicuna-33b-v1.3, Platypus-30B and falcon-40b with FSDP & zero3.
mixology
Issues that the mixology team has surfaced
#1366
opened Oct 30, 2024 by
mpatel31415
Support Qwen2.5-7B-Instruct model
blocks NeMo
huggingface
For supporting HF models
nemo
Issues needed to support NVIDIA NeMo models.
program-coverage
Requests for model and program coverage
thunderfx
for things that could be applicable to the dynamo+thunder frontend
#1286
opened Oct 10, 2024 by
tfogal
OOM for training on 4 nodes for falcon-40b and vicuna-33b-v1.3
memory use
mixology
Issues that the mixology team has surfaced
#1233
opened Oct 1, 2024 by
mpatel31415
Add documentation for the priority order of executors
enhancement
New feature or request
executors
#1217
opened Sep 30, 2024 by
IvanYashchuk
FSDP2 & Thunder looks memory hungrier than thunder.distributed.fsdp for certain models
distributed
memory use
thunderfx
for things that could be applicable to the dynamo+thunder frontend
#1176
opened Sep 20, 2024 by
crcrpar
Thunder seems to use way more memory when litgpt.Config.parallel_residual=True
memory use
thunderfx
for things that could be applicable to the dynamo+thunder frontend
#1175
opened Sep 20, 2024 by
crcrpar
TypeError when calling rmsnorm_fwd_noalloc from Megatron TransformerBlock
program-coverage
Requests for model and program coverage
triage review
nemo
Issues needed to support NVIDIA NeMo models.
#1053
opened Aug 26, 2024 by
riccardofelluga
Add a notebook demonstrating usage of Thunder as a Dynamo backend
dynamo
#963
opened Aug 13, 2024 by
IvanYashchuk
Split saved for backward information based on types
autograd
#959
opened Aug 12, 2024 by
IvanYashchuk
sdpa_ex - Incorrect device in trace vs from actual computation
sdpa
triage review
#950
opened Aug 9, 2024 by
kshitij12345
Different shapes, values of model weights and losses between FSDP training in Eager mode and with Thunder
bug
Something isn't working
distributed
mixology
Issues that the mixology team has surfaced
#866
opened Jul 25, 2024 by
mpatel31415
Raise an error when PyTorch's activation checkpointing is used with Thunder-jitted model
warnings & errors
#770
opened Jul 15, 2024 by
IvanYashchuk
[TransformerEngine] Support backward(retain_graph=True)
autograd
TransformerEngine
#701
opened Jul 3, 2024 by
kshitij12345
have a method to compare speed of different parts of training between compilation backends
enhancement
New feature or request
#444
opened May 22, 2024 by
mpatel31415
Distributed and Bucketing Performance Improvements
bug
Something isn't working
distributed
enhancement
New feature or request
performance
#348
opened May 2, 2024 by
parthmannan
Thunder + Inductor gives OOM for stablecode-completion-alpha-3b model from LitGPT
bug
Something isn't working
memory use
mixology
Issues that the mixology team has surfaced
#246
opened Apr 22, 2024 by
mpatel31415
Enable xfailed tests from test_apex_executor.py
apex
bug
Something isn't working
ci / tests
#220
opened Apr 18, 2024 by
IvanYashchuk