
[Falcon] Attempting to run Falcon-180B Q5/6 gives "invalid character" #3484

Closed
@zgiles

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

I'm attempting to run llama.cpp, latest master, with TheBloke's Falcon 180B Q5/Q6 quantized GGUF models, but it errors out with "invalid character".
I'm unable to find any issues about this online anywhere.
Another system of mine exhibits the same problem, as does a buddy's system.
llama.cpp functions normally on other models, such as Llama2, WizardLM, etc.

The downloaded GGUF file works with "text-generation-webui" so it is functioning, and verified as a good copy by others in the community.
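For anyone triaging: before suspecting the loader, it's worth confirming the file really is a well-formed GGUF container. A quick sanity check (the filename here is just a placeholder, not my actual path) is to read the 4-byte magic and the little-endian u32 version from the start of the file:

```python
import struct

def check_gguf_header(path):
    """Return (is_gguf, version) from the first 8 bytes of a GGUF file."""
    with open(path, "rb") as f:
        magic = f.read(4)                            # should be b"GGUF"
        (version,) = struct.unpack("<I", f.read(4))  # little-endian uint32
    return magic == b"GGUF", version

# Demo against a tiny synthetic header, not a real model file:
with open("fake.gguf", "wb") as f:
    f.write(b"GGUF" + struct.pack("<I", 2))

print(check_gguf_header("fake.gguf"))  # (True, 2)
```

If the magic or version were wrong, llama.cpp would fail much earlier than it does here, which is why I don't think the file itself is truncated.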

Current Behavior

$ ./main -t 8 -m ../falcon-180b-chat.Q5_K_M.gguf --color -c 4096 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "USER: Write a story about llamas. ASSISTANT:"
# ( OR any number of parameters, just -m <model> is enough )
...
< Many Tensors >
...
llama_model_loader: - tensor  640:          blk.79.attn_norm.weight f32      [ 14848,     1,     1,     1 ]
llama_model_loader: - tensor  641:           blk.79.ffn_down.weight q6_K     [ 59392, 14848,     1,     1 ]
llama_model_loader: - tensor  642:                 output_norm.bias f32      [ 14848,     1,     1,     1 ]                                                                                                                                   
llama_model_loader: - tensor  643:               output_norm.weight f32      [ 14848,     1,     1,     1 ]                                                                                                                                   
llama_model_loader: - kv   0:                       general.architecture str                                                                                                                                                                  
llama_model_loader: - kv   1:                               general.name str                               
llama_model_loader: - kv   2:                      falcon.context_length u32                                                                                                                                                                  
llama_model_loader: - kv   3:                  falcon.tensor_data_layout str                                           
llama_model_loader: - kv   4:                    falcon.embedding_length u32                                           
llama_model_loader: - kv   5:                 falcon.feed_forward_length u32                               
llama_model_loader: - kv   6:                         falcon.block_count u32     
llama_model_loader: - kv   7:                falcon.attention.head_count u32     
llama_model_loader: - kv   8:             falcon.attention.head_count_kv u32     
llama_model_loader: - kv   9:        falcon.attention.layer_norm_epsilon f32     
llama_model_loader: - kv  10:                          general.file_type u32     
llama_model_loader: - kv  11:                       tokenizer.ggml.model str     
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr     
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr     
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr     
llama_model_loader: - kv  15:                      tokenizer.ggml.merges arr     
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32     
llama_model_loader: - kv  17:               general.quantization_version u32     
llama_model_loader: - type  f32:  322 tensors
llama_model_loader: - type q8_0:    1 tensors
llama_model_loader: - type q5_K:  201 tensors
llama_model_loader: - type q6_K:  120 tensors
error loading model: invalid character
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '../falcon-180b-chat.Q5_K_M.gguf'
main: error: unable to load model

Happy to provide longer output, but it was pretty standard model shapes/sizes ahead of the loader and error.
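My unverified guess at the failure mode: since all tensors and KV pairs are listed before the error, the loader seems to die while processing the tokenizer vocab, i.e. it hits a token or merge string it considers malformed (e.g. not valid UTF-8) and throws "invalid character". The shape of that failure is the same as this minimal Python sketch (purely illustrative, not llama.cpp's actual code path):

```python
def decode_token(raw: bytes) -> str:
    """Strict decode, mirroring a loader that rejects malformed vocab entries."""
    return raw.decode("utf-8")  # raises UnicodeDecodeError on bad input

# 0xFF can never appear in well-formed UTF-8, so this entry is rejected:
bad_token = b"\xff\xfe"
try:
    decode_token(bad_token)
except UnicodeDecodeError as exc:
    print("error loading model: invalid character  #", exc.reason)
```

If that's right, the interesting question is why text-generation-webui's loader tolerates the same vocab bytes while llama.cpp's does not.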

Environment and Context

Dell R740xd, 640 GB RAM, Intel Xeon Silver 4112 (Skylake) CPU @ 2.60 GHz, Ubuntu 20.04 (Focal).

$ git log | head -1
commit 019ba1dcd0c7775a5ac0f7442634a330eb0173cc
$ shasum -a 256 ../falcon-180b-chat.Q5_K_M.gguf 
e49e65f34b807d7cdae33d91ce8bd7610f87cd534a2d17ef965c6cf6b03bf3d8  ../falcon-180b-chat.Q5_K_M.gguf
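In case it helps anyone compare against the published checksums: the same digest can be computed in Python, streaming in chunks so a 100+ GB file is never loaded into memory at once (the demo below hashes a tiny known input rather than the real model):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

with open("demo.bin", "wb") as f:
    f.write(b"abc")
print(sha256_of("demo.bin"))
# ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```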

Please let me know if this is already known (I can't seem to find it), and/or if I can help reproduce it somehow. Thanks!
