Description
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
I'm attempting to run llama.cpp (latest master) with TheBloke's Falcon 180B Q5/Q6 quantized GGUF models, but it errors out with "invalid character".
I'm unable to find any reports of this issue online.
Another system of mine exhibits the same problem, and a buddy's system does as well.
llama.cpp functions normally on other models, such as Llama2, WizardLM, etc.
The downloaded GGUF file works with "text-generation-webui", so the file itself is functional, and others in the community have verified it as a good copy.
Current Behavior
$ ./main -t 8 -m ../falcon-180b-chat.Q5_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "USER: Write a story about llamas. ASSISTANT:"
# ( OR any number of parameters, just -m <model> is enough )
...
< Many Tensors >
...
llama_model_loader: - tensor 640: blk.79.attn_norm.weight f32 [ 14848, 1, 1, 1 ]
llama_model_loader: - tensor 641: blk.79.ffn_down.weight q6_K [ 59392, 14848, 1, 1 ]
llama_model_loader: - tensor 642: output_norm.bias f32 [ 14848, 1, 1, 1 ]
llama_model_loader: - tensor 643: output_norm.weight f32 [ 14848, 1, 1, 1 ]
llama_model_loader: - kv 0: general.architecture str
llama_model_loader: - kv 1: general.name str
llama_model_loader: - kv 2: falcon.context_length u32
llama_model_loader: - kv 3: falcon.tensor_data_layout str
llama_model_loader: - kv 4: falcon.embedding_length u32
llama_model_loader: - kv 5: falcon.feed_forward_length u32
llama_model_loader: - kv 6: falcon.block_count u32
llama_model_loader: - kv 7: falcon.attention.head_count u32
llama_model_loader: - kv 8: falcon.attention.head_count_kv u32
llama_model_loader: - kv 9: falcon.attention.layer_norm_epsilon f32
llama_model_loader: - kv 10: general.file_type u32
llama_model_loader: - kv 11: tokenizer.ggml.model str
llama_model_loader: - kv 12: tokenizer.ggml.tokens arr
llama_model_loader: - kv 13: tokenizer.ggml.scores arr
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr
llama_model_loader: - kv 15: tokenizer.ggml.merges arr
llama_model_loader: - kv 16: tokenizer.ggml.eos_token_id u32
llama_model_loader: - kv 17: general.quantization_version u32
llama_model_loader: - type f32: 322 tensors
llama_model_loader: - type q8_0: 1 tensors
llama_model_loader: - type q5_K: 201 tensors
llama_model_loader: - type q6_K: 120 tensors
error loading model: invalid character
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '../falcon-180b-chat.Q5_K_M.gguf'
main: error: unable to load model
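Since the file passed checks in other tools, a quick header inspection can help rule out plain download corruption before suspecting the loader. The sketch below is a hypothetical diagnostic helper, not part of llama.cpp; it only assumes that GGUF files begin with the 4-byte magic b"GGUF" followed by a little-endian uint32 format version.

```python
import struct

def gguf_header(path):
    """Read the GGUF magic and format version from the start of a file.

    A valid GGUF file starts with the 4-byte magic b"GGUF" followed by a
    little-endian uint32 version; anything else suggests corruption or a
    non-GGUF file.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        (version,) = struct.unpack("<I", f.read(4))
    return magic, version

# Example (path from the failing run above):
# magic, version = gguf_header("../falcon-180b-chat.Q5_K_M.gguf")
# A healthy file should report magic b"GGUF" and a small version number.
```

In my case the header looks sane, which is why I suspect the error happens later, while parsing the tokenizer metadata rather than the file framing.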
Happy to provide the full output, but everything ahead of the loader error showed standard model shapes/sizes.
Environment and Context
Dell R740xd, 640GB RAM, Intel Xeon Silver 4112 (Skylake) CPUs @ 2.60GHz, Ubuntu 20.04 (Focal)
$ git log | head -1
commit 019ba1dcd0c7775a5ac0f7442634a330eb0173cc
$ shasum -a 256 ../falcon-180b-chat.Q5_K_M.gguf
e49e65f34b807d7cdae33d91ce8bd7610f87cd534a2d17ef965c6cf6b03bf3d8 ../falcon-180b-chat.Q5_K_M.gguf
Please let me know if this is already known (I can't seem to find it), and/or if I can help reproduce it somehow. Thanks!