Performance degradation with P40 on larger models #6814

Closed
@samr7

I have a machine with a lot of old parts in it, including 8 P40s and 2 Xeon E5-2667v2 CPUs.

I build llama.cpp using:
cmake -DLLAMA_AVX2=off -DLLAMA_F16C=off -DLLAMA_CUBLAS=on -DLLAMA_CUDA_FORCE_MMQ=on

Using a llama2-70b-Q8_0 model, I see good results with release b1842 and earlier. Starting with b1843 (January 12, the build that includes #4766), throughput drops by ~62%:

bin/main -m ../text-generation-webui/models/Synthia-70b-v1.2.Q8_0.gguf -ngl 99 -p "Why is the sky blue?" -n 128

b1691: 10.76 t/s
b1767: 9.75 t/s
b1808: 9.76 t/s
b1832: 9.77 t/s
b1842: 9.76 t/s
b1843: 3.73 t/s
b2400: 3.83 t/s
b2709: 3.84 t/s
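
To make the regression point explicit, here is a minimal Python sketch that computes the build-to-build change from the numbers above and picks out the steepest drop. The helper name `largest_drop` is only illustrative; nothing here comes from llama.cpp itself.

```python
# Tokens/s per release build, copied from the list above (70b Q8_0, 8 GPUs).
results = {
    "b1691": 10.76,
    "b1767": 9.75,
    "b1808": 9.76,
    "b1832": 9.77,
    "b1842": 9.76,
    "b1843": 3.73,
    "b2400": 3.83,
    "b2709": 3.84,
}


def largest_drop(results):
    """Return (prev_build, build, fractional_change) for the steepest drop
    between consecutive builds, in the order the dict lists them."""
    builds = list(results)
    steps = [
        (prev, cur, (results[cur] - results[prev]) / results[prev])
        for prev, cur in zip(builds, builds[1:])
    ]
    return min(steps, key=lambda step: step[2])


prev, cur, change = largest_drop(results)
print(f"{prev} -> {cur}: {change:+.1%}")  # b1842 -> b1843: -61.8%
```

The only large step is between b1842 and b1843; every other consecutive pair differs by less than 10%.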

Repeating the test with other models, I find the discrepancy is much smaller with smaller models, to the point that the 8B model is considerably faster on the latest release:

| Model | b1842 | b1843 | b2709 |
| --- | --- | --- | --- |
| Synthia-70b-v1.2.Q8_0 | 9.76 t/s | 3.73 t/s | 3.84 t/s |
| phind-codellama-34b-v2.Q8_0 | 16.99 t/s | 7.54 t/s | 7.78 t/s |
| llama-2-13b-Q8_0 | 21.10 t/s | 17.67 t/s | 18.63 t/s |
| Meta-Llama-3-8B-Instruct.Q8_0 | 25.66 t/s | 33.27 t/s | 31.83 t/s |
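
The same comparison per model, b1842 against b2709, can be sketched as below. This is just arithmetic on the figures in the table above; `relative_change` is a hypothetical helper name.

```python
# (b1842 t/s, b2709 t/s) per model, copied from the table above.
rows = {
    "Synthia-70b-v1.2.Q8_0": (9.76, 3.84),
    "phind-codellama-34b-v2.Q8_0": (16.99, 7.78),
    "llama-2-13b-Q8_0": (21.10, 18.63),
    "Meta-Llama-3-8B-Instruct.Q8_0": (25.66, 31.83),
}


def relative_change(old, new):
    """Fractional change in throughput going from `old` to `new`."""
    return (new - old) / old


changes = {model: relative_change(old, new) for model, (old, new) in rows.items()}
for model, change in changes.items():
    print(f"{model:32s} {change:+.1%}")
```

This works out to roughly -61% for the 70b model, -54% for the 34b, -12% for the 13b, and +24% for the 8B.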

Using fewer GPUs for this test (with the 70b model) makes b1842 a bit slower, but otherwise doesn't seem to change the result much:

| GPUs | b1842 | b1843 | b2709 |
| --- | --- | --- | --- |
| 8 | 9.76 t/s | 3.73 t/s | 3.84 t/s |
| 4 | 9.61 t/s | 3.77 t/s | 3.89 t/s |
| 3 | 8.32 t/s | 3.77 t/s | 3.91 t/s |

Setting the CPU thread count explicitly (with the 70b model) gives a marginal improvement for each build, but the results barely vary with the thread count and the large gap between builds remains:

| Threads | b1842 | b2709 |
| --- | --- | --- |
| -t 1 | 10.05 t/s | 3.90 t/s |
| -t 4 | 10.06 t/s | 3.90 t/s |
| -t 8 | 10.09 t/s | 3.90 t/s |

The system is similar in topology to a Supermicro SYS-4028GR-TR2. The GPUs are all attached via PCIe 3.0 x16 to PLX switches and have relatively good CPU and P2P bandwidth over PCIe: 11-13 GB/s between any pair.

Any ideas?
