b2950 broke RPC mode #7427

Closed
steampunque opened this issue May 21, 2024 · 3 comments

@steampunque

Please include information about your system, the steps to reproduce the bug, and the version of llama.cpp that you are using. If possible, please provide a minimal code example that reproduces the bug.

After the b2950 patch, RPC functionality is broken. When offloading to 3 machines, the first server crashes with the message below. Reverting back to b2949 fixes the problem.

ll_startrpc
create_backend: using CUDA backend
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: yes
ggml_cuda_init: CUDA_USE_TENSOR_CORES: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce GTX 1070, compute capability 6.1, VMM: yes
Starting RPC server on 0.0.0.0:50052, backend memory: 8022 MB
Accepted client connection, free_mem=8412266496, total_mem=8500477952
GGML_ASSERT: /usr/local/src/ai/llamacpp/llama.cpp/ggml-backend.c:226: offset + size <= ggml_nbytes(tensor) && "tensor write out of bounds"
[New LWP 11678]
[New LWP 11684]
[New LWP 11685]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f66930dc3c7 in wait4 () from /lib64/libc.so.6
#0 0x00007f66930dc3c7 in wait4 () from /lib64/libc.so.6
#1 0x0000000000411f4b in ggml_print_backtrace ()
#2 0x000000000046639a in ggml_backend_tensor_set ()
#3 0x0000000000541d20 in start_rpc_server ()
#4 0x0000000000406ebc in main ()
[Inferior 1 (process 11677) detached]
/usr/local/bin/ll_startrpc: line 14: 11677 Aborted rpc-server -H 0.0.0.0 -p 50052

@rgerganov
Collaborator

You are most probably running an old rpc-server with a new build of llama.cpp. We have added #pragma pack(push, 1) to rpc_tensor and now it is serialized into 292 bytes instead of 296 bytes. Make sure that you are building rpc-server from the same source tree as the rest of the binaries.
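
For illustration, here is a minimal sketch of why the packing change matters; the fields below are made up and are not the actual rpc_tensor layout. Packing removes the padding the compiler would otherwise insert, so the struct shrinks on the wire, and an old rpc-server that still expects the padded layout misreads offsets and sizes, which is consistent with the out-of-bounds assert above:

```cpp
// Hypothetical struct, for illustration only -- not the real rpc_tensor fields.
#include <cstdint>
#include <cstdio>

struct tensor_example {            // old layout: compiler inserts padding for alignment
    uint64_t id;
    uint32_t type;                 // 4 bytes of padding follow so 'offset' stays 8-byte aligned
    uint64_t offset;
};

#pragma pack(push, 1)
struct tensor_example_packed {     // new layout: packing removes the padding
    uint64_t id;
    uint32_t type;
    uint64_t offset;
};
#pragma pack(pop)

int main() {
    std::printf("unpacked: %zu bytes\n", sizeof(tensor_example));        // typically 24
    std::printf("packed:   %zu bytes\n", sizeof(tensor_example_packed)); // 20
}
```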

I may add a new HELLO command which advertises the version of the rpc-server when a new client connects. This may prevent problems like this in the long term.
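
A rough sketch of what such a handshake could look like; this is purely illustrative, and the message layout, constant, and function names below are assumptions rather than the actual llama.cpp RPC protocol:

```cpp
// Illustrative version handshake: the server announces its protocol version
// in a HELLO message, and the client refuses to proceed on a mismatch
// instead of sending tensors the server will misparse.
#include <cstdint>
#include <cstdio>

constexpr uint32_t RPC_PROTO_VERSION = 1;  // hypothetical version constant, bumped on wire-format changes

struct hello_msg {                         // hypothetical HELLO message layout
    uint32_t magic;                        // identifies the stream as an RPC handshake
    uint32_t version;                      // protocol version of the sender
};

// Client-side check after receiving the server's HELLO.
bool check_server_version(const hello_msg & msg) {
    if (msg.version != RPC_PROTO_VERSION) {
        std::fprintf(stderr,
            "rpc-server speaks protocol v%u, client expects v%u - rebuild both from the same tree\n",
            (unsigned) msg.version, (unsigned) RPC_PROTO_VERSION);
        return false;
    }
    return true;
}
```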

@steampunque
Author

steampunque commented May 21, 2024

I am pretty sure client and server were both built from the same source tree, b2950. I upgraded all 3 machines to b2950 at the same time.

There is a small chance the source tree didn't sync right to the 1070 machine; I will double-check and rebuild it tonight.

@steampunque
Author

That was the problem: my source tree got out of sync on the 1070 machine somehow. Thanks.
