# Prerequisites
Please answer the following questions for yourself before submitting an issue.

- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [x] I carefully followed the README.md.
- [x] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [x] I reviewed the Discussions, and have a new bug or useful enhancement to share.
# Feature Description
Databricks just released two new models, DBRX Base and DBRX Instruct. They use their own architecture:
```json
{
  "architectures": [
    "DbrxForCausalLM"
  ],
  "attn_config": {
    "clip_qkv": 8,
    "kv_n_heads": 8,
    "model_type": "",
    "rope_theta": 500000
  },
  "auto_map": {
    "AutoConfig": "configuration_dbrx.DbrxConfig",
    "AutoModelForCausalLM": "modeling_dbrx.DbrxForCausalLM"
  },
  "d_model": 6144,
  "emb_pdrop": 0.0,
  "ffn_config": {
    "ffn_hidden_size": 10752,
    "model_type": "",
    "moe_jitter_eps": 0,
    "moe_loss_weight": 0.05,
    "moe_num_experts": 16,
    "moe_top_k": 4
  },
  "initializer_range": 0.02,
  "max_seq_len": 32768,
  "model_type": "dbrx",
  "n_heads": 48,
  "n_layers": 40,
  "output_router_logits": false,
  "resid_pdrop": 0.0,
  "router_aux_loss_coef": 0.05,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 100352
}
```
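
To put those numbers in perspective: with `moe_top_k = 4` of `moe_num_experts = 16`, only a quarter of the expert weights run per token. A back-of-the-envelope parameter count from the config (my own estimate, assuming a gated three-matrix MLP per expert and a fused GQA QKV projection; not an official breakdown) lands close to the 132B-total / 36B-active figures Databricks advertises:

```python
# Rough parameter estimate from config.json above (sanity check only).
d_model, n_layers, vocab = 6144, 40, 100352
n_heads, kv_n_heads = 48, 8
ffn_hidden, n_experts, top_k = 10752, 16, 4

head_dim = d_model // n_heads                           # 128
attn = d_model * (d_model + 2 * kv_n_heads * head_dim)  # fused QKV (GQA)
attn += d_model * d_model                               # output projection
expert = 3 * d_model * ffn_hidden                       # w1, v1, w2 (assumed gated MLP)

emb = 2 * vocab * d_model                               # tie_word_embeddings is false
total = n_layers * (attn + n_experts * expert) + emb
active = n_layers * (attn + top_k * expert) + emb
print(f"total ~{total / 1e9:.0f}B, active ~{active / 1e9:.0f}B")  # ~132B / ~36B
```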
# Motivation
These models outperform predecessors such as Llama-2 and Mixtral (even though they are larger); the community could really benefit from them and from the fine-tuned models that will follow.
https://huggingface.co/databricks/dbrx-instruct
# Possible Implementation
Both existing conversion scripts currently fail on the DBRX checkpoints:
```
python llama.cpp/convert-hf-to-gguf.py
Traceback (most recent call last):
  File "/llama.cpp/convert-hf-to-gguf.py", line 2099, in <module>
    main()
  File "/llama.cpp/convert-hf-to-gguf.py", line 2079, in main
    model_class = Model.from_model_architecture(hparams["architectures"][0])
  File "/llama.cpp/convert-hf-to-gguf.py", line 215, in from_model_architecture
    raise NotImplementedError(f'Architecture {arch!r} not supported!') from None
NotImplementedError: Architecture 'DbrxForCausalLM' not supported!
```
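
The first failure is straightforward: the architecture simply isn't registered in the converter. Support would start with a new `Model` subclass in `convert-hf-to-gguf.py`; here is a minimal sketch, assuming the `Model.register` mechanism the traceback above goes through. `MODEL_ARCH.DBRX`, the tensor mappings, and the C++ graph would all still need to be added, so everything DBRX-specific below is hypothetical:

```python
# Hypothetical sketch -- none of this exists yet for DBRX.
@Model.register("DbrxForCausalLM")
class DbrxModel(Model):
    model_arch = gguf.MODEL_ARCH.DBRX  # new enum value, would need to be added to gguf-py

    def set_gguf_parameters(self):
        hp = self.hparams
        self.gguf_writer.add_context_length(hp["max_seq_len"])
        self.gguf_writer.add_embedding_length(hp["d_model"])
        self.gguf_writer.add_block_count(hp["n_layers"])
        self.gguf_writer.add_head_count(hp["n_heads"])
        self.gguf_writer.add_head_count_kv(hp["attn_config"]["kv_n_heads"])
        self.gguf_writer.add_rope_freq_base(hp["attn_config"]["rope_theta"])
        self.gguf_writer.add_clamp_kqv(hp["attn_config"]["clip_qkv"])
        self.gguf_writer.add_feed_forward_length(hp["ffn_config"]["ffn_hidden_size"])
        self.gguf_writer.add_expert_count(hp["ffn_config"]["moe_num_experts"])
        self.gguf_writer.add_expert_used_count(hp["ffn_config"]["moe_top_k"])
```

The harder part is the tensor mapping, since the checkpoint layout seems to differ from Mixtral's, as the second failure shows.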
```
python llama.cpp/convert.py
Traceback (most recent call last):
  File "/llama.cpp/convert.py", line 1486, in <module>
    main()
  File "/llama.cpp/convert.py", line 1422, in main
    model_plus = load_some_model(args.model)
  File "/llama.cpp/convert.py", line 1291, in load_some_model
    model_plus = merge_multifile_models(models_plus)
  File "/llama.cpp/convert.py", line 747, in merge_multifile_models
    model = merge_sharded([mp.model for mp in models_plus])
  File "/llama.cpp/convert.py", line 726, in merge_sharded
    return {name: convert(name) for name in names}
  File "/llama.cpp/convert.py", line 726, in <dictcomp>
    return {name: convert(name) for name in names}
  File "/llama.cpp/convert.py", line 701, in convert
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
  File "/llama.cpp/convert.py", line 701, in <listcomp>
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
KeyError: 'transformer.blocks.0.ffn.experts.mlp.w1'
```
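
The `KeyError` is the more interesting failure: `transformer.blocks.0.ffn.experts.mlp.w1` suggests that DBRX stores all 16 experts of a layer fused into a single tensor, rather than as separate per-expert modules the way Mixtral does, so the converter never sees the names it expects. If that reading is right, the converter would need to split (or directly re-emit) the fused tensor; a sketch under the assumption of an `(n_experts * ffn_hidden_size, d_model)` layout, which I have not verified against `modeling_dbrx.py`:

```python
import torch

# ASSUMPTION: each layer's experts are concatenated along dim 0 of one tensor.
# Shapes come from config.json; expert order and transpose convention must be
# checked against modeling_dbrx.py before trusting this.
n_experts, ffn_hidden, d_model = 16, 10752, 6144
w1 = torch.empty(n_experts * ffn_hidden, d_model)  # stand-in for the checkpoint tensor

experts = w1.view(n_experts, ffn_hidden, d_model)  # split into per-expert weights
for i, w in enumerate(experts):
    # Tensor naming here is illustrative only, not an agreed GGUF convention.
    print(f"blk.0.ffn_gate.{i}.weight", tuple(w.shape))  # (10752, 6144)
```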
DBRX is a mixture-of-experts (MoE) model in which each FFN layer is divided into 16 experts, of which only 4 are active for any given token. According to Databricks, it builds on MegaBlocks: https://github.com/databricks/megablocks
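
For reference, top-4-of-16 routing is the standard MoE gating pattern; a toy sketch of the token-level expert selection (illustrative only, not DBRX's actual router code):

```python
import torch
import torch.nn.functional as F

# Toy top-4-of-16 router for a single token (illustrative only).
d_model, n_experts, top_k = 6144, 16, 4
router = torch.nn.Linear(d_model, n_experts, bias=False)
x = torch.randn(1, d_model)

logits = router(x)                           # (1, 16) router scores
weights, chosen = torch.topk(logits, top_k)  # indices of the 4 best experts
weights = F.softmax(weights, dim=-1)         # renormalize over the chosen 4
# The token's FFN output is the weighted sum of the 4 chosen experts' outputs;
# the other 12 experts are never evaluated for this token.
print(chosen.tolist(), weights.tolist())
```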