Error: Attempt to split tensors that exceed maximum supported devices. Current LLAMA_MAX_DEVICES=1 #1166

Closed
@irthomasthomas

Description

Hey, I'm not sure if this is a bug or I just need to change a config somewhere, but since the update, I get this error when I try to run models split across two CUDA devices:

ValueError: Attempt to split tensors that exceed maximum supported devices. Current LLAMA_MAX_DEVICES=1

Is that related to this? ggml-org/llama.cpp#5240
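If it helps narrow it down, here is a quick check of what the runtime reports. This is a sketch assuming the Python bindings re-export the runtime llama_max_devices() query that PR introduced in place of the old compile-time constant:

import llama_cpp
# Assumption: the bindings expose the runtime query added by ggml-org/llama.cpp#5240.
# A CPU-only wheel reports 1 here; a CUDA-enabled build should report more.
print(llama_cpp.llama_max_devices())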

Here is the command I ran:
python3 -m llama_cpp.server --model /Models/deepseek-coder-33b-instruct.Q4_K_M.gguf --n_gpu_layers 56 --tensor_split 64 36 --offload_kqv false --n_ctx 8000 --n_batch 56 --chat_format chatml
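For reference, the equivalent in-process call that trips the same check (a sketch; keyword names taken from the llama_cpp.Llama constructor):

from llama_cpp import Llama

llm = Llama(
    model_path="/Models/deepseek-coder-33b-instruct.Q4_K_M.gguf",
    n_gpu_layers=56,
    tensor_split=[64, 36],  # proportions across the two CUDA devices
    offload_kqv=False,
    n_ctx=8000,
    n_batch=56,
    chat_format="chatml",
)
# Should raise the same ValueError when the loaded build reports only one device.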

Labels: bug (Something isn't working)
