Investigate DeepSpeed Inference #845
Comments
I would be more than happy to take this on. Are there any resources available for these tests, or is it up to me to find them?
If by “resources” you mean computing resources, then yes, we can easily make GPUs available for testing this PR.
Hi Stella, thanks for your reply. Yup, that's exactly what I meant; sorry about the poor wording. Great, in that case I'd love to take this on. Let me know how you'd like to arrange access to the GPUs and I'll get going!
+1
Hey @Quentin-Anthony, I would love the chance to work on this!
Some initial numbers are promising. With the current configs/125M.yml and text_generation.yml, I consistently see duration_seconds drop from ~2.4 s to ~1.4 s on a single-GPU node (A10G). Will share more numbers once I can get compute.
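For reference, a duration_seconds-style number like the one above is just end-to-end wall-clock time around the generation call. Here is a minimal sketch of that kind of measurement, assuming a `generate(prompt)` callable standing in for the NeoX text-generation entry point (the callable name and prompt are hypothetical):

```python
import time

def benchmark(generate, prompt, n_warmup=2, n_runs=10):
    """Return mean seconds per call for a text-generation callable.

    `generate` is a hypothetical stand-in for whatever entry point
    text_generation.yml drives; any blocking callable works.
    """
    # Warm-up runs exclude one-time costs (CUDA context init,
    # kernel compilation) from the measurement.
    for _ in range(n_warmup):
        generate(prompt)
    start = time.perf_counter()
    for _ in range(n_runs):
        generate(prompt)
    return (time.perf_counter() - start) / n_runs

# e.g. mean_s = benchmark(my_generate, "EleutherAI is")
```

Running the same harness with and without DeepSpeed-Inference enabled gives an apples-to-apples duration comparison.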
DeepSpeed wins most of the inference benchmarks I see, and we should test their claims on NeoX models. EleutherAI spends a significant amount of compute running inference, so any improvement in inference performance would be high-impact. What I would like to see is a verification of these claims on our models, with before/after numbers.
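For context, DeepSpeed-Inference is enabled by wrapping an existing PyTorch model with `deepspeed.init_inference`, which injects fused transformer kernels and can tensor-parallelize the model across GPUs. A minimal sketch using a Hugging Face checkpoint as a stand-in for a NeoX model (the checkpoint name is illustrative, and the exact `init_inference` arguments vary between DeepSpeed versions):

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative small checkpoint; any causal LM works the same way.
MODEL_NAME = "EleutherAI/gpt-neo-125M"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Wrap the model with DeepSpeed-Inference. replace_with_kernel_inject=True
# swaps the transformer layers for DeepSpeed's fused CUDA kernels;
# mp_size > 1 would additionally shard the model across GPUs.
ds_engine = deepspeed.init_inference(
    model,
    mp_size=1,                       # tensor-parallel degree
    dtype=torch.half,                # fp16 inference
    replace_with_kernel_inject=True,
)
model = ds_engine.module

inputs = tokenizer("EleutherAI is", return_tensors="pt").to("cuda")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

With mp_size > 1 the same call shards the model across GPUs, which is the multi-GPU case worth benchmarking for the larger NeoX configs.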