-
Hi, sorry for another message. I am writing a backend for Tenstorrent's chips. I've been on commit 0fff7fd for 3 weeks and wanted to pull changes from upstream llama.cpp. Since about 2 weeks ago, I've been failing to pull upstream changes and make them work. My implementation of the CPY operator has some limitations; I did some initial debugging but cannot wiggle myself out of the problem. Even if I force-allow the operator to pass, I see a massive performance regression (5x slower) and the LLM is not coherent (expected, since I banned some copies for this reason). I am unsure whether this is a bug in my backend or in GGML. Even if my backend refuses to run CPY, I believe GGML could copy both tensors to the CPU, do the copy there, and write the result back to the device. I know that is slow, but it should work nevertheless. Can someone point me in some direction? And under what conditions does the error show up?
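(For context, a ggml backend reports which ops it can run through a supports_op callback, and returning false there is how a backend "refuses" an op. Below is a minimal sketch of what rejecting unsupported CPY cases might look like; the function name, the exact signature, and the contiguity restriction are illustrative assumptions, not the actual Tenstorrent backend code, and the callback signature has changed across ggml versions.)

```c
#include "ggml.h"
#include "ggml-backend.h"

// Illustrative sketch only: a backend's supports_op callback. Returning
// false tells the ggml scheduler that this backend cannot run the op, so
// the scheduler has to place it on another backend (e.g. the CPU).
static bool ggml_backend_tt_supports_op(ggml_backend_dev_t dev,
                                        const struct ggml_tensor * op) {
    (void) dev; // unused in this sketch

    switch (op->op) {
        case GGML_OP_CPY:
            // Hypothetical restriction: only same-type, contiguous copies.
            return op->type == op->src[0]->type && ggml_is_contiguous(op);
        default:
            return true;
    }
}
```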
-
I am not aware of any recent changes to this behavior, other than the occasional minor bug fix. It would be possible to copy the tensor to the CPU and copy it back again after the operation, but that's not implemented. You can try using `-nkvo` to avoid offloading the KV cache until you have implemented the CPY operation.
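For example, with the llama-cli example binary (the model path and prompt are placeholders, not from the thread):

```sh
# -nkvo (--no-kv-offload) keeps the KV cache in host memory, so the
# KV-cache copy ops never reach the device backend.
./llama-cli -m models/your-model.gguf -nkvo -p "Hello"
```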