Enable Llama2 70B to run with hqt on single card (#50) #780
Add a disk_offload flag that controls device_map=auto. Setting this flag enables offloading weights to disk when CPU memory runs out. Add a const serialization path flag that takes the path where const sections are serialized, so that if there is not enough space on the device to hold all const sections, they are offloaded to disk.
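A minimal sketch of how the two flags described above might be wired up, not the actual code from this PR. The spellings `--disk_offload` and `--const_serialization_path`, the helper `model_load_kwargs`, and the `--offload_folder` option with its default path are assumptions made for illustration. `device_map` and `offload_folder` are existing keyword arguments of `from_pretrained` in transformers. How the const serialization path is consumed by the runtime is not shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two flags from the PR description (names assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--disk_offload",
        action="store_true",
        help="Load with device_map='auto' so weights can spill to disk when CPU memory runs out.",
    )
    parser.add_argument(
        "--const_serialization_path",
        type=str,
        default=None,
        help="Directory where const sections are serialized when they do not fit on the device.",
    )
    # Hypothetical helper option: where offloaded weights are written.
    parser.add_argument("--offload_folder", type=str, default="/tmp/offload_folder")
    return parser


def model_load_kwargs(args: argparse.Namespace) -> dict:
    """Translate parsed flags into keyword arguments for from_pretrained."""
    kwargs = {}
    if args.disk_offload:
        # device_map='auto' lets accelerate place weights on device, CPU, then disk.
        kwargs["device_map"] = "auto"
        kwargs["offload_folder"] = args.offload_folder
    return kwargs


args = build_parser().parse_args(["--disk_offload"])
load_kwargs = model_load_kwargs(args)
# The result would then be passed along, for example:
# AutoModelForCausalLM.from_pretrained(model_name, **load_kwargs)
```

Without `--disk_offload`, the helper returns an empty dict, so the model loads along its default path with no `device_map` set.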
We'll update this
@HolyFalafel I guess this PR wouldn't work with Synapse 1.14 right?
You are right, v1.14 doesn't support some of the changes done here, and llama 70B won't be able to run on a single card
@regisss if you can try to run it on 1.14, it would be good, it might still work, we're not sure
This branch replaces the branch from PR #762