Local Models - What Works, What Doesn't? #126
-
Also wanted to post this blog here; it gives some advice on choosing the best model!
-
I started using Pieces yesterday with the local model Mistral 7B. I am on a MacBook Pro 14" with an M2 Pro and 16GB of RAM.
-
I'm using the latest version as of September 2nd, 2024 (GMT+7). I have an AMD RX 6800 with 16GB of VRAM, so I should be able to load models entirely in VRAM. It looks like the program is choosing to split the LLM across CPU/RAM and GPU. Is there a way to load it completely on the GPU, or is there a limitation that I've overlooked?
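For anyone comparing against a standalone setup: in llama.cpp-based runtimes the CPU/GPU split is usually controlled by a layer-offload setting. Below is a minimal sketch assuming the llama-cpp-python bindings and a hypothetical local GGUF file; Pieces may use a different runtime internally, so treat this as illustration of the concept rather than its actual configuration.

```python
# Sketch only: assumes llama-cpp-python with a GPU-enabled build (ROCm/Vulkan for an RX 6800).
# The model path is a hypothetical local file, not something shipped with Pieces.
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,  # -1 offloads all layers to the GPU; a smaller value forces a CPU/GPU split
    n_ctx=4096,       # context window; larger values consume more VRAM
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

If the runtime only offloads part of the model, the usual reasons are a partial-offload default, a CPU-only build of the backend, or VRAM headroom reserved for the context/KV cache.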
-
Hi everyone! We have a lot of local models available now to set as the copilot's runtime, and we're getting a lot of feedback about them, good and bad. Since these are highly experimental, we would love your help in making this experience the best it can be.
Please share your experience in this discussion so that we have better data for recommendations. For example:
- I am running a 2022 MacBook Air (24GB, Apple M2 chip). I find that Phi-2 GPU runs the fastest, but Mistral 7B GPU is pretty good.
- I am running a 2020 Windows machine with 8GB of RAM, and Mistral 7B gives me an "I'm sorry, something went wrong with processing" error.