Hey, I'm not sure if this is a bug or if I just need to change a config somewhere, but since the update I get this error when I try to run models split across two CUDA devices:
```
ValueError: Attempt to split tensors that exceed maximum supported devices. Current LLAMA_MAX_DEVICES=1
```
Is that related to this? ggml-org/llama.cpp#5240
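In case it helps triage, here is a quick check of what device limit my installed build actually reports. This is a sketch assuming the Python bindings expose either the newer `llama_max_devices()` function (after ggml-org/llama.cpp#5240 made the limit a runtime query) or the older compile-time constant `LLAMA_MAX_DEVICES`; the exact name depends on the llama-cpp-python version.

```python
# Sketch: report the device limit of the installed llama.cpp build.
# Assumption: the bindings expose either llama_max_devices() (newer)
# or the older compile-time constant LLAMA_MAX_DEVICES.
import llama_cpp

if hasattr(llama_cpp, "llama_max_devices"):
    print("llama_max_devices():", llama_cpp.llama_max_devices())
elif hasattr(llama_cpp, "LLAMA_MAX_DEVICES"):
    print("LLAMA_MAX_DEVICES:", llama_cpp.LLAMA_MAX_DEVICES)
else:
    print("No device-limit symbol found in this version of the bindings.")

# If this prints 1, the build only supports a single device, so a
# --tensor_split across two GPUs is rejected before the model loads.
```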
Here is the command I ran:
```shell
python3 -m llama_cpp.server --model /Models/deepseek-coder-33b-instruct.Q4_K_M.gguf \
  --n_gpu_layers 56 --tensor_split 64 36 --offload_kqv false \
  --n_ctx 8000 --n_batch 56 --chat_format chatml
```