
GPU Middle Class? #2161

Open
EugenHotaj opened this issue Dec 16, 2024 · 3 comments
Labels: discussion, distributed, triaged

Comments

EugenHotaj commented Dec 16, 2024

Does torchtune have any plans to support "GPU middle class" users?

We're trying to evaluate torchtune for post-training, especially since it already implements many useful features (RLHF, LoRA, etc.). However, one big sticking point is that the system seems heavily geared towards single-node training. Are there plans to support multi-node training (e.g. 16-64 nodes) and things like model parallelism, 128k-context training, etc.?

If not, is torchtitan the recommended system to use?

Thanks!

@joecummings added the discussion, distributed, and triaged labels on Dec 16, 2024
@joecummings (Contributor)

Hey @EugenHotaj - glad you're checking out torchtune. Up until now, we've managed to provide a pretty extensive set of offerings -- long context, large models up to 405B, and RLHF -- all on a single node. This has allowed people with smaller GPU budgets to fine-tune some pretty incredible models, and it's let us develop new features faster because single node is much easier to debug.

Now, all that said, torchtune technically already supports multi-node for FSDP. And we plan on adding tensor parallel + model parallel very soon. The absolute latest we will have these features in torchtune is end of January, but I would bet on sooner!
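
For concreteness, a multi-node run today is basically the standard torchrun rendezvous setup wrapped around one of our distributed recipes. A minimal sketch for two nodes is below -- the recipe and config paths are placeholders, and you'd adjust ports/flags for your cluster:

```bash
# Minimal sketch (not an official example): run the same command on every node.
# torchrun's c10d rendezvous forms a single process group spanning both nodes,
# and the FSDP recipe shards parameters/optimizer state across all 16 ranks.
torchrun \
  --nnodes 2 \
  --nproc_per_node 8 \
  --rdzv_backend c10d \
  --rdzv_endpoint "${HEAD_NODE_IP}:29500" \
  recipes/full_finetune_distributed.py \
  --config recipes/configs/llama3_1/70B_full.yaml   # placeholder recipe/config
```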

Would you need anything beyond these parallelism techniques, e.g. pipeline parallel? Are you running on MAST or something like SLURM?

@EugenHotaj (Author)

> Now, all that said, torchtune technically already supports multi-node for FSDP. And we plan on adding tensor parallel + model parallel very soon. The absolute latest we will have these features in torchtune is end of January, but I would bet on sooner!

Thanks @joecummings, that's awesome to hear!

> Would you need anything beyond these parallelism techniques, e.g. pipeline parallel? Are you running on MAST or something like SLURM?

Yes, we use SLURM -- I'm currently trying to hack together a multi-node run from your suggestions on #2018 and from torchtitan, so having some examples in torchtune would be super useful IMO. We'd also take all the parallelisms we can get 😃, e.g. model, pipeline, and attention parallelism for longer context.
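
For context, what I'm hacking together right now looks roughly like the sbatch wrapper below (the recipe, config, and flags are placeholders and will need tweaking per cluster) -- a blessed version of something like this shipping with torchtune would go a long way:

```bash
#!/bin/bash
#SBATCH --job-name=tune-multinode
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1      # one torchrun launcher per node
#SBATCH --gpus-per-node=8

# Use the first node in the allocation as the rendezvous host.
nodes=( $(scontrol show hostnames "$SLURM_JOB_NODELIST") )
head_node=${nodes[0]}
head_node_ip=$(srun --nodes=1 --ntasks=1 -w "$head_node" hostname --ip-address)

# srun starts one torchrun per node; torchrun spawns 8 workers on each node,
# and all ranks join one process group via the c10d rendezvous endpoint.
srun torchrun \
  --nnodes "$SLURM_NNODES" \
  --nproc_per_node 8 \
  --rdzv_id "$SLURM_JOB_ID" \
  --rdzv_backend c10d \
  --rdzv_endpoint "${head_node_ip}:29500" \
  recipes/full_finetune_distributed.py \
  --config recipes/configs/llama3_1/70B_full.yaml   # placeholder recipe/config
```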

tginart commented Dec 17, 2024

I second SLURM! I have also been trying to hack this into torchtune since the single-node experience is quite good.
