rpc: fix register position #11424

thxCode · 2025-01-26T07:23:53Z

Make sure to read the contributing guidelines before submitting a PR

PR #11262 reverted the changes introduced by PR #9296, changed the output layer assigned device to the remote RPC server.

this PR will keep assigning the output layer to the local device.

-m ../Qwen/Qwen2.5-0.5B-Instruct-GGUF/qwen2.5-0.5b-instruct-fp16.gguf --tensor-split 1,9 --rpc 127.0.0.1:50052

current

load_tensors: layer   0 assigned to device Metal
load_tensors: layer   1 assigned to device Metal
load_tensors: layer   2 assigned to device Metal
load_tensors: layer   3 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer   4 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer   5 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer   6 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer   7 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer   8 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer   9 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  10 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  11 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  12 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  13 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  14 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  15 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  16 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  17 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  18 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  19 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  20 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  21 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  22 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  23 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer  24 assigned to device RPC[127.0.0.1:50052]

after this pr

load_tensors: layer   0 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer   1 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer   2 assigned to device RPC[127.0.0.1:50052]
load_tensors: layer   3 assigned to device Metal
load_tensors: layer   4 assigned to device Metal
load_tensors: layer   5 assigned to device Metal
load_tensors: layer   6 assigned to device Metal
load_tensors: layer   7 assigned to device Metal
load_tensors: layer   8 assigned to device Metal
load_tensors: layer   9 assigned to device Metal
load_tensors: layer  10 assigned to device Metal
load_tensors: layer  11 assigned to device Metal
load_tensors: layer  12 assigned to device Metal
load_tensors: layer  13 assigned to device Metal
load_tensors: layer  14 assigned to device Metal
load_tensors: layer  15 assigned to device Metal
load_tensors: layer  16 assigned to device Metal
load_tensors: layer  17 assigned to device Metal
load_tensors: layer  18 assigned to device Metal
load_tensors: layer  19 assigned to device Metal
load_tensors: layer  20 assigned to device Metal
load_tensors: layer  21 assigned to device Metal
load_tensors: layer  22 assigned to device Metal
load_tensors: layer  23 assigned to device Metal
load_tensors: layer  24 assigned to device Metal

rgerganov · 2025-01-26T09:49:41Z

Thanks for catching this! Changes look fine to me but please wait for @slaren's approval before merging.

slaren

Applications should not depend on the order of devices, and we definitely should not modify the ggml-backend API in this way. Instead, either pass a sorted list of devices to llama.cpp, or add the necessary logic to sort the device list here:
https://github.com/ggerganov/llama.cpp/blob/2cc9b8c32c78d09cd1b4df0aaa605ab2d0176243/src/llama.cpp#L9407-L9422

Signed-off-by: thxCode <[email protected]>

thxCode · 2025-01-26T13:55:25Z

@slaren PTAL

lucyknada · 2025-01-30T17:20:46Z

@slaren
user-facing this becomes an issue because if you --list-devices you see an entirely different order (rpc devices last) than youre supposed to pass to --tensor-split to (rpc first), so you are left to guess what the order ends up being instead of just getting it from --list-devices, through sheer bruteforce, is there a way to still adjust this to match?

slaren · 2025-01-30T17:24:25Z

Modify --list-devices so that it also sorts the device list in the same way as llama.cpp. Or as already suggested:

pass a sorted list of devices to llama.cpp

lucyknada · 2025-01-30T17:27:22Z

pass a sorted list of devices to llama.cpp

is that a user-facing option or implementation detail?

slaren · 2025-01-30T17:30:37Z

Both. The user can pass a custom list of devices with -dev, but the llama.cpp examples can also pass a list instead of relying on the default list of devices that llama.cpp uses.

abc-nix · 2025-01-30T20:12:19Z

For those that don't get it (like me initially), you first need to check the device names using the --list-devices option (example below):

 $ llama.cpp/build/bin/llama-server --rpc <IP1>:<PORT1> --rpc <IP2>:<PORT2> --list-devices
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX XXXX, compute capability 8.6, VMM: yes
  Device 1: NVIDIA GeForce GTX YYYY, compute capability 7.5, VMM: yes
Available devices:
  CUDA0: NVIDIA GeForce RTX XXXX (A MiB, B MiB free)
  CUDA1: NVIDIA GeForce GTX YYYY (A MiB, B MiB free)
  RPC[IP1:PORT1]: RPC[IP1:PORT1] (A MiB, B MiB free)
  RPC[IP2:PORT2]: RPC[IP2:PORT2] (A MiB, B MiB free)

It is under Available devices where you get the device names. Next time you launch llama-server, you will use the --device option with the order you want for your devices. An example:

$ llama.cpp/build/bin/llama-server --rpc <IP1>:<PORT1> --rpc <IP2>:<PORT2> \
--device RPC[IP1:PORT1],CUDA0,CUDA1,RPC[IP2:PORT2] \
-ngl 33 --tensor_split 3/20/10/0 --device-draft CUDA1,RPC[IP2:PORT2] -ngld 99 [...]

This way, you can set up the order however you want. In the complicated example above, the main model is offloaded to the first RPC device (using IP1:PORT1 address), mostly on the CUDA0 device, and partially to the CUDA1 device, while the draft model is offloaded to the CUDA1 device and the second RPC device (using IP2:PORT2 address).

Signed-off-by: thxCode <[email protected]>

github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jan 26, 2025

rgerganov approved these changes Jan 26, 2025

View reviewed changes

slaren requested changes Jan 26, 2025

View reviewed changes

rpc: fix register position

2ee90d3

Signed-off-by: thxCode <[email protected]>

thxCode force-pushed the rpc-reg branch from b83034a to 2ee90d3 Compare January 26, 2025 13:54

slaren approved these changes Jan 26, 2025

View reviewed changes

slaren merged commit 1d8ee06 into ggml-org:master Jan 26, 2025
44 of 45 checks passed

rgerganov mentioned this pull request Jan 30, 2025

Misc. bug: llama-server with rpc oom's allocation even though plenty left on devices #11435

Open

abc-nix mentioned this pull request Feb 3, 2025

add --rpc-layers flag to explicitly set RPC layers #11606

Closed

jukofyork mentioned this pull request Feb 5, 2025

llama : add option to override model tensor buffers #11397

Draft

2 tasks

tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025

rpc: fix register position (ggml-org#11424)

ed257b1

Signed-off-by: thxCode <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rpc: fix register position #11424

rpc: fix register position #11424

thxCode commented Jan 26, 2025

rgerganov commented Jan 26, 2025

slaren left a comment

thxCode commented Jan 26, 2025

lucyknada commented Jan 30, 2025

slaren commented Jan 30, 2025

lucyknada commented Jan 30, 2025

slaren commented Jan 30, 2025

abc-nix commented Jan 30, 2025 •

edited

Loading

rpc: fix register position #11424

rpc: fix register position #11424

Conversation

thxCode commented Jan 26, 2025

current

after this pr

rgerganov commented Jan 26, 2025

slaren left a comment

Choose a reason for hiding this comment

thxCode commented Jan 26, 2025

lucyknada commented Jan 30, 2025

slaren commented Jan 30, 2025

lucyknada commented Jan 30, 2025

slaren commented Jan 30, 2025

abc-nix commented Jan 30, 2025 • edited Loading

abc-nix commented Jan 30, 2025 •

edited

Loading