Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Implement tensor parallelism over RPC. At the moment, setting --split-mode row has no effect when used with the RPC server.
Could you provide me with a rough outline on how I would best go about it?
What steps would I have to take to extend the functionality of the rpc server?
Motivation
I love your project, it's everything I was looking for. You guys are true heroes, the antidote to NVIDIA's corporate greed.
I am running two Tesla P100s at home on old gaming mainboards, connected via an InfiniBand NIC in Ethernet mode. The NIC is dirt cheap, as are the Tesla P100s. If we can get this to work, you could easily run 8B models at 60+ tps with just two cards.
This would unlock the full potential of homelabs and smaller enterprises.
Love you guys
Possible Implementation
I just started looking into it and found your implementation for row splitting on a single host:
    if (split_mode == LLAMA_SPLIT_MODE_ROW) {
        ggml_backend_reg_t reg = ggml_backend_dev_backend_reg(dev);
        auto ggml_backend_split_buffer_type_fn = (ggml_backend_split_buffer_type_t)
            ggml_backend_reg_get_proc_address(reg, "ggml_backend_split_buffer_type");
        if (ggml_backend_split_buffer_type_fn) {
            size_t dev_index = [&]() {
                auto * reg = ggml_backend_dev_backend_reg(dev);
                for (size_t i = 0; i < ggml_backend_reg_dev_count(reg); ++i) {
                    if (ggml_backend_reg_dev_get(reg, i) == dev) {
                        return i;
                    }
                }
                throw std::runtime_error(format("device %s not found in its backend reg", ggml_backend_dev_name(dev)));
            }();
            auto * buft = ggml_backend_split_buffer_type_fn(dev_index, tensor_split);
            if (buft != nullptr) {
                buft_list.emplace_back(dev, buft);
            }
        }
    }
The goal would be to distribute the splits via RPC to different hosts for computation. Which files/folders would I need to look at? I am asking for some general guidance.
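To make the request concrete, here is a rough, CPU-only sketch of what row splitting means conceptually: each device owns a contiguous slice of a weight matrix's rows, computes its part of the matrix-vector product, and the partial outputs are concatenated. This is not llama.cpp's implementation. The names `split_rows`, `matvec_rows` and `matvec_row_split` are hypothetical, and the proportions vector only mirrors the idea behind `tensor_split`. Over RPC, each slice would presumably live on a different host and the final concatenation would go over the network.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper (not llama.cpp API): turn per-device proportions into
// contiguous [begin, end) row ranges that together cover all n_rows rows.
std::vector<std::pair<size_t, size_t>> split_rows(size_t n_rows, const std::vector<float> & proportions) {
    assert(!proportions.empty());
    const float total = std::accumulate(proportions.begin(), proportions.end(), 0.0f);
    assert(total > 0.0f);

    std::vector<std::pair<size_t, size_t>> ranges;
    float  cumulative = 0.0f;
    size_t begin      = 0;
    for (size_t i = 0; i < proportions.size(); ++i) {
        cumulative += proportions[i];
        // The last device always ends at n_rows so rounding never drops a row.
        size_t end = (i + 1 == proportions.size())
            ? n_rows
            : static_cast<size_t>(n_rows * (cumulative / total));
        if (end < begin) {
            end = begin;
        }
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}

// Work done by one "device": multiply only rows [begin, end) of the
// row-major matrix W (n_cols columns) with the input vector x.
std::vector<float> matvec_rows(const std::vector<float> & W, size_t n_cols,
                               const std::vector<float> & x, size_t begin, size_t end) {
    std::vector<float> y(end - begin, 0.0f);
    for (size_t r = begin; r < end; ++r) {
        for (size_t c = 0; c < n_cols; ++c) {
            y[r - begin] += W[r * n_cols + c] * x[c];
        }
    }
    return y;
}

// Gather step: concatenating the per-device partial outputs in order
// yields the same result as the unsplit matrix-vector product.
std::vector<float> matvec_row_split(const std::vector<float> & W, size_t n_rows, size_t n_cols,
                                    const std::vector<float> & x, const std::vector<float> & proportions) {
    std::vector<float> y;
    for (const auto & range : split_rows(n_rows, proportions)) {
        const std::vector<float> part = matvec_rows(W, n_cols, x, range.first, range.second);
        y.insert(y.end(), part.begin(), part.end());
    }
    return y;
}
```

In this sketch every device needs the full input vector `x` but only its own rows of `W`, so the per-token network traffic would be the input broadcast plus the gathered output slices. Whether that maps cleanly onto the existing RPC backend is exactly the open question.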