Three weeks ago #6387 removed mmap() support for MoE models. This causes Mixtral 8x7B F16 to take ~30x longer to load on my Threadripper w/ 5200 MT/s RAM. It used to take 2 seconds to load. Now it takes 56 seconds.
Can we reconsider this? I would rather have 3D tensor creation be a one-time cost in the conversion script, rather than happening each time the llama.cpp process spawns.
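For context on why the gap is so large, here is a minimal sketch of what mmap() buys the loader. It is not the actual llama.cpp loader; the file name and tensor offset are hypothetical placeholders:

```cpp
// Sketch: mmap() maps the file into the address space without copying; pages
// are faulted in lazily (or are already hot in the page cache from a prior
// run), so the call returns almost immediately regardless of file size.
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    const char * path = "mixtral-8x7b-f16.gguf";   // hypothetical file name
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    // Map the whole file read-only. No bytes are read until a page is touched.
    void * addr = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED) { perror("mmap"); return 1; }

    // Tensor data can then be used in place via pointer arithmetic
    // (hypothetical offset -- real offsets come from the model metadata):
    const uint8_t * tensor_data = static_cast<const uint8_t *>(addr) + 4096;
    (void) tensor_data;

    // By contrast, a read()-and-transform loader must copy the full F16
    // weights into freshly allocated buffers before inference can start.

    munmap(addr, st.st_size);
    close(fd);
    return 0;
}
```

If the expert tensors were already stored in their final 3D layout by the conversion script, the runtime could map them in place like this instead of rebuilding them on every startup.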