Feature Description
Add CPUSet support to llama.cpp, enabling better core selection and usage.
Works on Windows and Linux (x64) with up to 64 logical cores.
Motivation
Faster (about 10%) and more efficient inference.
Keep the system responsive while using llama.cpp.
Possible Implementation
Problems addressed:
- Uses only physical cores
- Filters out the E-Cores on Intel platforms
- Stays within a single Last Level Cache (e.g. the L3 of one CCD on dual-CCD AMD processors)
- Selects cores by their scheduler priority (default: worst to best)
- Allocates compute threads only on the selected cores
- Disables Windows power management throttling (Power, Timer, Memory)
- Excludes Core 0 by default
- Supports a custom CPU bitmask
- Optionally includes Core 0 or the threaded (SMT) cores
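The selection rules listed above can be illustrated with a short Python sketch. It is not the code from this proposal: `CoreInfo`, `select_cores` and their fields are hypothetical, loosely modeled on the per-logical-processor information Windows reports through its CPU Sets API. The sketch assumes that a higher efficiency class marks a P-core, that a higher scheduling class marks a better core, and that "Core 0" means physical core 0. The keyword arguments stand in for the `-bco`, `-llct`, `-acz` and `-atc` options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreInfo:
    """Hypothetical per-logical-processor record, loosely modeled on Windows CPU Sets."""
    logical_index: int     # logical processor number (bit position in an affinity mask)
    core_index: int        # physical core this logical processor belongs to
    llc_index: int         # last level cache group (e.g. the L3 of one AMD CCD)
    efficiency_class: int  # assumption: higher value = performance core (P-core)
    scheduling_class: int  # assumption: higher value = core preferred by the scheduler


def select_cores(cpus, n_threads, best_core_order=False, llc_traversal=False,
                 allow_core_zero=False, allow_threaded=False):
    """Return logical processor indices to pin compute threads to (hypothetical helper)."""
    # Anything below the highest efficiency class present is treated as an E-core.
    max_eff = max((c.efficiency_class for c in cpus), default=0)

    candidates, seen_cores = [], set()
    for c in cpus:
        if c.efficiency_class < max_eff:
            continue  # filter out E-cores
        if c.core_index == 0 and not allow_core_zero:
            continue  # physical core 0 is skipped unless allowed (-acz)
        if c.core_index in seen_cores and not allow_threaded:
            continue  # physical cores only: skip SMT siblings unless allowed (-atc)
        seen_cores.add(c.core_index)
        candidates.append(c)

    # Default order is worst to best; best_core_order (-bco) inverts it.
    # list.sort is stable, also with reverse=True, so ties keep input order.
    candidates.sort(key=lambda c: c.scheduling_class, reverse=best_core_order)

    if not candidates:
        return []
    first_llc = candidates[0].llc_index
    selected = []
    for c in candidates:
        if len(selected) == n_threads:
            break
        # Unless LLC traversal (-llct) is allowed, stay in the first core's LLC group.
        if llc_traversal or c.llc_index == first_llc:
            selected.append(c.logical_index)
    return selected
```

In this sketch the E-core and Core 0 filters run before ordering, so the LLC restriction is anchored on whichever eligible core comes first in the chosen order; with the default worst-to-best order that is the LLC group of the weakest eligible core.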
These command-line options have been added:
-bco: Best Core Order; set to 1 to invert the default order, so cores are selected from best to worst
-llct: Last Level Cache Traversal; set to 1 to allow core selection to traverse Last Level Cache indices
-acz: Allow Core Zero; set to 1 to allow selection of Core 0
-atc: Allow Threaded Cores; set to 1 to allow selection of threaded (non-physical) cores
-ccm: Custom CPU Mask; sets a custom CPU affinity bitmask, given as an integer
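Because the proposal is limited to 64 logical cores, a core selection (or the integer passed to `-ccm`) fits in a single 64-bit affinity mask with one bit per logical processor. The Python sketch below shows that mapping. `indices_to_mask` and `mask_to_indices` are hypothetical helpers, and the sketch assumes bit *i* corresponds to logical processor *i*, as in Windows affinity masks within one processor group and Linux CPU sets.

```python
MAX_LOGICAL_CORES = 64  # the proposal targets up to 64 logical cores (one 64-bit mask)


def indices_to_mask(indices):
    """Pack logical processor indices into a 64-bit affinity bitmask (hypothetical helper)."""
    mask = 0
    for i in indices:
        if not 0 <= i < MAX_LOGICAL_CORES:
            raise ValueError(f"logical processor {i} does not fit in a 64-bit mask")
        mask |= 1 << i  # set bit i for logical processor i
    return mask


def mask_to_indices(mask):
    """Expand an integer bitmask (e.g. a -ccm value) into logical processor indices."""
    if not 0 <= mask < (1 << MAX_LOGICAL_CORES):
        raise ValueError("mask must be a non-negative integer below 2**64")
    return [i for i in range(MAX_LOGICAL_CORES) if (mask >> i) & 1]
```

For example, a mask of 60 (binary `111100`) corresponds to logical processors 2, 3, 4 and 5. On Linux, such an index list could be passed to Python's `os.sched_setaffinity(0, ...)` to try the same pinning on the current process.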
Please test whether the default settings are respected and whether the options behave as expected.
In particular, check whether the E-Cores on Intel are correctly detected and excluded.
Compare the prompt processing and eval speed against the master branch and report the results.
Thanks