
Help test CPUSet patch for Windows and Linux #6927

Closed
@mann1x


Feature Description

Adds CPUSet support to llama.cpp, enabling better core selection and usage.
Works on Windows and Linux x64 with up to 64 logical cores.

Motivation

Faster (about 10%) and more efficient inference.
Keep the system responsive while using llama.cpp.

Possible Implementation

#6832

Problems addressed:

  • Uses only physical cores
  • Filters out the E-Cores on Intel platforms
  • Sticks to the same Last Level Cache (e.g. the L3 of a single CCD on dual-CCD AMD processors)
  • Selects cores based on their scheduler priority (default: worst to best cores)
  • Allocates compute threads only on the selected cores
  • Disables Windows power management throttling (Power, Timer, Memory)
  • Excludes Core 0 by default
  • Supports a custom CPU bitmask
  • Optionally includes Core 0 or the threaded (non-physical) cores
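The selection rules listed above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code from #6832: the `cpu_info` and `select_opts` types and the `select_cores` function are hypothetical, the per-CPU data (SMT flag, E-core flag, LLC group, scheduler rank) is assumed to have already been read from the OS, "Core 0" is treated as logical CPU 0, and the Last Level Cache to stay on is assumed to be the one of the first core picked.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-logical-CPU description; gathering it from the OS is not shown. */
typedef struct {
    int  id;        /* logical CPU index, 0..63 */
    bool is_smt;    /* threaded (non-physical) sibling of a core */
    bool is_ecore;  /* Intel efficiency core */
    int  llc_id;    /* Last Level Cache group */
    int  rank;      /* scheduler priority, higher = better core (assumption) */
} cpu_info;

/* Hypothetical mirror of the command line switches. */
typedef struct {
    bool best_core_order;  /* -bco */
    bool llc_traversal;    /* -llct */
    bool allow_core_zero;  /* -acz */
    bool allow_threaded;   /* -atc */
} select_opts;

static int by_rank_asc(const void *a, const void *b) {
    const cpu_info *x = a, *y = b;
    return (x->rank > y->rank) - (x->rank < y->rank);
}

static int by_rank_desc(const void *a, const void *b) {
    return by_rank_asc(b, a);
}

/* Returns an affinity mask with at most n_threads bits set.
   Note: sorts the cpus array in place. */
uint64_t select_cores(cpu_info *cpus, int n_cpus, int n_threads, select_opts o) {
    /* Default: worst to best; -bco flips to best to worst. */
    qsort(cpus, (size_t)n_cpus, sizeof *cpus, o.best_core_order ? by_rank_desc : by_rank_asc);

    uint64_t mask = 0;
    int picked = 0;
    int llc = -1;  /* LLC of the first picked core, -1 until then */

    for (int i = 0; i < n_cpus && picked < n_threads; i++) {
        const cpu_info *c = &cpus[i];
        if (c->id < 0 || c->id > 63) continue;               /* outside a 64-bit mask */
        if (c->is_ecore) continue;                           /* E-cores always skipped */
        if (c->is_smt && !o.allow_threaded) continue;        /* physical cores only */
        if (c->id == 0 && !o.allow_core_zero) continue;      /* skip Core 0 */
        if (!o.llc_traversal) {
            if (llc == -1) llc = c->llc_id;                  /* first pick fixes the LLC */
            else if (c->llc_id != llc) continue;             /* stay on that LLC */
        }
        mask |= UINT64_C(1) << c->id;
        picked++;
    }
    return mask;
}
```

In this sketch `-bco` only changes the sort direction, while the filters (E-cores, SMT siblings, Core 0, LLC) are applied in the same pass; a 64-bit mask fits the 64 logical core limit stated for the patch.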

These command line options have been added:

  • -bco: Best Core Order. When set to 1, inverts the default order so that cores are selected from best to worst
  • -llct: Last Level Cache Traversal. When set to 1, allows the core selection to traverse the Last Level Cache index (i.e. pick cores outside a single LLC)
  • -acz: Allow Core Zero. When set to 1, allows Core 0 to be selected
  • -atc: Allow Threaded Cores. When set to 1, allows threaded (non-physical) cores to be selected
  • -ccm: Custom CPU Mask. Sets a custom CPU affinity bitmask, given as an integer
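For -ccm, one plausible way to turn the integer into an affinity on Linux is to expand its bits into a `cpu_set_t` and pass it to `sched_setaffinity`. This is a hedged sketch, not the patch's code: `parse_cpu_mask`, `mask_to_cpuset` and `apply_custom_mask` are hypothetical names, decimal input is assumed, and bit i is assumed to map to logical CPU i. The Windows side is not shown.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: parse a decimal -ccm value into a 64-bit mask.
   Returns 0 on success, -1 on empty, negative, malformed, out-of-range or zero input. */
int parse_cpu_mask(const char *arg, uint64_t *out) {
    if (arg == NULL || *arg == '\0' || *arg == '-') return -1;
    char *end = NULL;
    errno = 0;
    unsigned long long value = strtoull(arg, &end, 10);
    if (errno == ERANGE || end == arg || *end != '\0' || value == 0) return -1;
    *out = (uint64_t)value;
    return 0;
}

/* Bit i of the mask set => logical CPU i allowed. */
void mask_to_cpuset(uint64_t mask, cpu_set_t *set) {
    CPU_ZERO(set);
    for (int cpu = 0; cpu < 64; cpu++) {
        if (mask & (UINT64_C(1) << cpu)) {
            CPU_SET(cpu, set);
        }
    }
}

/* Pins the calling thread (pid 0) to the CPUs in the mask.
   Returns 0 on success, -1 on parse error or if sched_setaffinity fails (errno set). */
int apply_custom_mask(const char *arg) {
    uint64_t mask;
    if (parse_cpu_mask(arg, &mask) != 0) return -1;
    cpu_set_t set;
    mask_to_cpuset(mask, &set);
    return sched_setaffinity(0, sizeof set, &set);
}
```

For example, a value of 6 (binary 110) would allow logical CPUs 1 and 2 only.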

Please test whether the default settings are honored and whether the options behave as expected.
In particular, check whether the E-Cores on Intel are correctly detected and excluded.

Compare the prompt processing and eval speeds against the master branch and report the results.

Thanks