Quantization of Q4_K and Q5_K fail with "illegal hardware instruction" #3279

Closed
@thilomichael

When I try to quantize any model (e.g., llama-2-7b-chat) with the command

$ ./quantize models/llama-2-7b-chat-f16.gguf models/test.gguf Q5_K

I receive the error "illegal hardware instruction":

[   1/ 291]                    token_embd.weight - [ 4096, 32000,     1,     1], type =    f16, quantizing to q5_K .. [1]    92878 illegal hardware instruction  ./quantize models/llama-2-7b-chat-f16.gguf  Q5_K

I'm using a MacBook Pro M1 with 32 GB of RAM. I tried compiling llama.cpp both with and without Metal support, but I get this error with either configuration. I have already checked that my quantize binary is built for the correct architecture:

$ file quantize
quantize: Mach-O 64-bit executable arm64

To find out which instruction is not supported, I ran the binary under lldb and got the following output:

* thread #2, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011a40)
    frame #0: 0x000000010006de40 quantize`quantize_row_q5_K_reference + 924
quantize`quantize_row_q5_K_reference:
->  0x10006de40 <+924>: .long  0x1e011a40                ; unknown opcode
    0x10006de44 <+928>: fcsel  s2, s18, s2, gt
    0x10006de48 <+932>: fcmp   s20, s1
    0x10006de4c <+936>: fcsel  s1, s20, s1, mi
  thread #3, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011a40)
    frame #0: 0x000000010006de40 quantize`quantize_row_q5_K_reference + 924
quantize`quantize_row_q5_K_reference:
->  0x10006de40 <+924>: .long  0x1e011a40                ; unknown opcode
    0x10006de44 <+928>: fcsel  s2, s18, s2, gt
    0x10006de48 <+932>: fcmp   s20, s1
    0x10006de4c <+936>: fcsel  s1, s20, s1, mi
Target 0: (quantize) stopped.
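
As a rough aid for reading the trace above, the trapping word 0x1e011a40 can be split into its top-level A64 encoding field with plain shell arithmetic. This is only a sketch: the variable names are made up for illustration, and it assumes the usual A64 layout in which bits 28:25 (op0) select the encoding group and instruction words are stored little-endian. It narrows down the encoding class but does not tell which instruction the compiler intended to emit.

```shell
#!/bin/sh
# Trapping instruction word, copied from the lldb output above.
word=$(( 0x1e011a40 ))

# Bits 28:25 (op0) select the top-level A64 encoding group.
# A value matching x111 is the scalar floating-point / Advanced SIMD group.
op0=$(( (word >> 25) & 0xf ))

# The same word as it sits in memory (little-endian byte order).
bytes=$(printf '%02x %02x %02x %02x' \
  $(( word & 0xff )) \
  $(( (word >> 8) & 0xff )) \
  $(( (word >> 16) & 0xff )) \
  $(( (word >> 24) & 0xff )))

echo "op0=$op0 bytes=$bytes"
```

Here op0 comes out as 15 (binary 1111), so the word lies in the floating-point/SIMD encoding space, which is at least consistent with the fcsel and fcmp instructions that follow it in the disassembly.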

Does anyone else have this problem? I've already tried redownloading llama and converting the weights again (through convert.py), but I always run into the same error. It occurs for both Q4_K (which should be the same as Q4_K_M) and Q5_K. Interestingly, Q6_K, Q8_0, Q4_0, Q4_1, etc. all run fine.
