
Need to be able to assign to each GPU user-defined number of layers. #4656

Closed
4 tasks done
phalexo opened this issue Dec 27, 2023 · 3 comments
Labels
enhancement New feature or request stale

Comments


phalexo commented Dec 27, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

I would like to be able to explicitly define the number of layers to place on each available GPU, depending on how much VRAM it has and whether that GPU is also being used for something else, e.g. other models, context, etc.

Motivation

Avoid OOMs when GPU 0 ends up holding more than the others: its share of the model weights plus the context. If one could put fewer layers on GPU 0 and more layers on other GPUs with free VRAM, more users would be able to run larger models.

Possible Implementation

It should be easy to provide a flag to llama.cpp such as "--fractions 4,9,9,9" to put 4 layers on GPU 0 and 9 layers each on GPUs 1, 2, and 3. This would free up VRAM on GPU 0 for the context, scratch buffers, etc.


@phalexo phalexo added the enhancement New feature or request label Dec 27, 2023
@phalexo phalexo changed the title Need control to assign to each GPU user-defined number of layers. Need to be able to assign to each GPU user-defined number of layers. Dec 27, 2023
@ggerganov ggerganov reopened this Dec 31, 2023
ggerganov (Owner) commented:

I think #4766 will address most of this issue.


This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Mar 18, 2024

github-actions bot commented Apr 2, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Apr 2, 2024