Description
When I try to quantize any model (e.g., llama-2-7b-chat) with the command
$ ./quantize models/llama-2-7b-chat-f16.gguf models/test.gguf Q5_K
I receive an "illegal hardware instruction" error:
[ 1/ 291] token_embd.weight - [ 4096, 32000, 1, 1], type = f16, quantizing to q5_K .. [1] 92878 illegal hardware instruction ./quantize models/llama-2-7b-chat-f16.gguf Q5_K
I'm using a MacBook Pro M1 with 32 GB of RAM. I tried building llama.cpp both with and without Metal support, but I get the error in either configuration. I already checked that the quantize binary is built for the correct architecture:
$ file quantize
quantize: Mach-O 64-bit executable arm64
To find out which instruction is unsupported, I ran the binary under lldb and got the following output:
* thread #2, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011a40)
frame #0: 0x000000010006de40 quantize`quantize_row_q5_K_reference + 924
quantize`quantize_row_q5_K_reference:
-> 0x10006de40 <+924>: .long 0x1e011a40 ; unknown opcode
0x10006de44 <+928>: fcsel s2, s18, s2, gt
0x10006de48 <+932>: fcmp s20, s1
0x10006de4c <+936>: fcsel s1, s20, s1, mi
thread #3, stop reason = EXC_BAD_INSTRUCTION (code=1, subcode=0x1e011a40)
frame #0: 0x000000010006de40 quantize`quantize_row_q5_K_reference + 924
quantize`quantize_row_q5_K_reference:
-> 0x10006de40 <+924>: .long 0x1e011a40 ; unknown opcode
0x10006de44 <+928>: fcsel s2, s18, s2, gt
0x10006de48 <+932>: fcmp s20, s1
0x10006de4c <+936>: fcsel s1, s20, s1, mi
Target 0: (quantize) stopped.
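For what it's worth, the faulting word can be sanity-checked against the A64 top-level encoding table (a rough sketch, not a real disassembler; field names follow the Arm ARM, and the constant is the ".long" value from the lldb output above):

```python
# Rough field extraction for the faulting instruction word reported by lldb.
# This only checks the top-level A64 encoding group; it is not a decoder.
word = 0x1E011A40  # the ".long" / subcode value from the lldb output

# In the A64 top-level encoding table, op0 = bits [28:25]; values matching
# x111 select the scalar floating-point / Advanced SIMD group -- the same
# group as the fcsel/fcmp instructions surrounding the fault address.
op0 = (word >> 25) & 0xF
print(f"op0 (bits 28-25) = {op0:04b}")  # prints "op0 (bits 28-25) = 1111"
```

So the word at least lives in the floating-point encoding space, which fits the surrounding disassembly, even though lldb cannot name it.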
Does anyone else have this problem? I've already tried re-downloading the model and re-converting the weights with convert.py, but I always run into the same crash. I get the error for both Q4_K (which should be the same as Q4_K_M) and Q5_K. Interestingly, Q6_K, Q8_0, Q4_0, Q4_1, etc. all run fine.