Closed
Description
I am running c37b347, compiled with make LLAMA_HIPBLAS=1 AMDGPU_TARGETS=gfx1030 main, on Arch Linux. My GPU is a 6750XT, and I am using ROCm 5.7.
I run into IO_PAGE_FAULT errors from amdgpu when trying to run Mixtral at high context sizes with the -nkvo option.
I can reproduce the issue with: ./main -m ~/KoboldCpp/models/mixtral-instruct-8x7b-q4k-small.gguf -c 32768 -ngl 2 -nkvo -p "The quick brown fox jumps over " -n 128
system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0
The quick brown fox jumps over [end of text]
And I see lots of errors like these in my journal:
Jan 16 18:11:16 Silmeria kernel: amd_iommu_report_page_fault: 821417 callbacks suppressed
Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0x0 flags=0x0000]
Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0x700 flags=0x0000]
Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xe00 flags=0x0000]
Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0x500 flags=0x0000]
Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xc00 flags=0x0000]
Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0x300 flags=0x0000]
Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xa00 flags=0x0000]
Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf00 flags=0x0000]
Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0x400 flags=0x0000]
Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0x500 flags=0x0000]
Jan 16 18:11:16 Silmeria kernel: amd_iommu_restart_log: 1430 callbacks suppressed
Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting
Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting
Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting
Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting
Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting
Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting
Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting
Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting
Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting
Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting
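For anyone comparing their own journal against mine, here is a rough sketch that tallies IO_PAGE_FAULT events per domain and address. It is not part of llama.cpp or any AMD tooling: the function name tally_faults and the regular expression are my own, and the pattern assumes the field layout seen in the log lines above, which may differ on other kernels.

```python
import re
from collections import Counter

# Assumed layout, taken only from the AMD-Vi lines quoted in this report.
FAULT_RE = re.compile(
    r"IO_PAGE_FAULT domain=(0x[0-9a-fA-F]+) "
    r"address=(0x[0-9a-fA-F]+) flags=(0x[0-9a-fA-F]+)"
)

def tally_faults(lines):
    """Count IO_PAGE_FAULT events per (domain, address) pair.

    Lines that do not match the pattern (for example the
    "IOMMU Event log restarting" messages) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            domain, address, _flags = match.groups()
            counts[(domain, int(address, 16))] += 1
    return counts

# Sample input copied from the journal excerpt above.
sample = [
    "Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0x0 flags=0x0000]",
    "Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0x500 flags=0x0000]",
    "Jan 16 18:11:16 Silmeria kernel: amdgpu 0000:0a:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0x500 flags=0x0000]",
    "Jan 16 18:11:16 Silmeria kernel: AMD-Vi: IOMMU Event log restarting",
]

counts = tally_faults(sample)
for (domain, address), n in sorted(counts.items()):
    print(f"domain={domain} address={address:#x} count={n}")
```

To run it on a real journal, replace sample with the lines of your kernel log (for example, the output of journalctl -k read from standard input). In the excerpt above, every faulting address is below 0x1000.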
Without -nkvo, the model operates normally: ./main -m ~/KoboldCpp/models/mixtral-instruct-8x7b-q4k-small.gguf -c 32768 -ngl 2 -p "The quick brown fox jumps over " -n 128
system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp
generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0
The quick brown fox jumps over lazy dog.
It’s a phrase we learned to type in elementary school, and one I’ve never forgotten because it includes every letter of the alphabet – A to Z!
But there are other phrases that include all the
I think this is a llama.cpp bug. Can anyone else reproduce?