Add support for DBRX models: dbrx-base and dbrx-instruct #6344

Closed
@maziyarpanahi

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Databricks has just released two new models, DBRX (base and instruct), which use their own architecture:

{
  "architectures": [
    "DbrxForCausalLM"
  ],
  "attn_config": {
    "clip_qkv": 8,
    "kv_n_heads": 8,
    "model_type": "",
    "rope_theta": 500000
  },
  "auto_map": {
    "AutoConfig": "configuration_dbrx.DbrxConfig",
    "AutoModelForCausalLM": "modeling_dbrx.DbrxForCausalLM"
  },
  "d_model": 6144,
  "emb_pdrop": 0.0,
  "ffn_config": {
    "ffn_hidden_size": 10752,
    "model_type": "",
    "moe_jitter_eps": 0,
    "moe_loss_weight": 0.05,
    "moe_num_experts": 16,
    "moe_top_k": 4
  },
  "initializer_range": 0.02,
  "max_seq_len": 32768,
  "model_type": "dbrx",
  "n_heads": 48,
  "n_layers": 40,
  "output_router_logits": false,
  "resid_pdrop": 0.0,
  "router_aux_loss_coef": 0.05,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "vocab_size": 100352
}
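For a quick sanity check, the key dimensions implied by this config can be derived in a few lines of plain Python (the numbers are taken from the JSON above; config.json is assumed to be the file shown):

import json

# Hyperparameters from the DBRX config shown above.
with open("config.json") as f:
    hparams = json.load(f)

d_model  = hparams["d_model"]                        # 6144
n_heads  = hparams["n_heads"]                        # 48 query heads
kv_heads = hparams["attn_config"]["kv_n_heads"]      # 8 KV heads -> grouped-query attention
head_dim = d_model // n_heads                        # 128
experts  = hparams["ffn_config"]["moe_num_experts"]  # 16
top_k    = hparams["ffn_config"]["moe_top_k"]        # 4

print(f"head_dim={head_dim}, GQA ratio={n_heads // kv_heads}:1")      # 128, 6:1
print(f"MoE routing: {top_k} of {experts} experts active per token")  # 4 of 16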

Motivation

These models outperform predecessors such as Llama-2 and Mixtral (even though they are larger); the community could really benefit from these two models and from the fine-tuned variants that will follow.

https://huggingface.co/databricks/dbrx-instruct

Possible Implementation

If you have an idea as to how it can be implemented, please write a detailed description. Feel free to give links to external sources or share visuals that might be helpful to understand the details better.

python llama.cpp/convert-hf-to-gguf.py

Traceback (most recent call last):
  File "/llama.cpp/convert-hf-to-gguf.py", line 2099, in <module>
    main()
  File "/llama.cpp/convert-hf-to-gguf.py", line 2079, in main
    model_class = Model.from_model_architecture(hparams["architectures"][0])
  File "/llama.cpp/convert-hf-to-gguf.py", line 215, in from_model_architecture
    raise NotImplementedError(f'Architecture {arch!r} not supported!') from None
NotImplementedError: Architecture 'DbrxForCausalLM' not supported!
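convert-hf-to-gguf.py resolves the model class through a registry keyed on the architectures field, so supporting DBRX would start with registering a new class. A rough sketch of what that could look like, modeled on how other architectures in that file are registered (the gguf writer calls and the MODEL_ARCH.DBRX enum value are assumptions, not existing code):

import gguf

@Model.register("DbrxForCausalLM")      # hypothetical new entry in convert-hf-to-gguf.py
class DbrxModel(Model):
    model_arch = gguf.MODEL_ARCH.DBRX   # would need a new enum value in gguf-py

    def set_gguf_parameters(self):
        # Map the DBRX config keys shown above onto GGUF metadata.
        attn = self.hparams["attn_config"]
        ffn = self.hparams["ffn_config"]
        self.gguf_writer.add_context_length(self.hparams["max_seq_len"])
        self.gguf_writer.add_embedding_length(self.hparams["d_model"])
        self.gguf_writer.add_block_count(self.hparams["n_layers"])
        self.gguf_writer.add_feed_forward_length(ffn["ffn_hidden_size"])
        self.gguf_writer.add_head_count(self.hparams["n_heads"])
        self.gguf_writer.add_head_count_kv(attn["kv_n_heads"])
        self.gguf_writer.add_rope_freq_base(attn["rope_theta"])
        self.gguf_writer.add_clamp_kqv(attn["clip_qkv"])
        self.gguf_writer.add_expert_count(ffn["moe_num_experts"])
        self.gguf_writer.add_expert_used_count(ffn["moe_top_k"])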

python llama.cpp/convert.py

  File "/llama.cpp/convert.py", line 1486, in <module>
    main()
  File "/llama.cpp/convert.py", line 1422, in main
    model_plus = load_some_model(args.model)
  File "/llama.cpp/convert.py", line 1291, in load_some_model
    model_plus = merge_multifile_models(models_plus)
  File "/llama.cpp/convert.py", line 747, in merge_multifile_models
    model = merge_sharded([mp.model for mp in models_plus])
  File "/llama.cpp/convert.py", line 726, in merge_sharded
    return {name: convert(name) for name in names}
  File "/llama.cpp/convert.py", line 726, in <dictcomp>
    return {name: convert(name) for name in names}
  File "/llama.cpp/convert.py", line 701, in convert
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
  File "/llama.cpp/convert.py", line 701, in <listcomp>
    lazy_tensors: list[LazyTensor] = [model[name] for model in models]
KeyError: 'transformer.blocks.0.ffn.experts.mlp.w1'
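The KeyError suggests that convert.py's LLaMA-oriented tensor map has no entry for DBRX's expert weights: each layer apparently stores all 16 experts fused in a single transformer.blocks.N.ffn.experts.mlp.w1 tensor rather than one tensor per expert. A hedged sketch of the kind of split a converter would need, assuming the experts are concatenated along the first dimension (the layout is a guess from the config above, not verified against the checkpoint):

import numpy as np

def split_fused_experts(w1: np.ndarray, n_expert: int, ffn_hidden: int, d_model: int):
    """Split a fused (n_expert * ffn_hidden, d_model) expert tensor into
    one (ffn_hidden, d_model) tensor per expert."""
    assert w1.shape == (n_expert * ffn_hidden, d_model)
    return [w1[i * ffn_hidden:(i + 1) * ffn_hidden, :] for i in range(n_expert)]

# e.g. for DBRX: per_expert = split_fused_experts(w1, 16, 10752, 6144)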

DBRX is a mixture-of-experts model in which each FFN is divided into 16 experts, only 4 of which are activated at any given time. It builds on MegaBlocks:
https://github.com/databricks/megablocks
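For context, this kind of top-k routing is simple to express. A minimal PyTorch sketch of 4-of-16 routing (purely illustrative; not DBRX's actual implementation, which uses fused MegaBlocks kernels):

import torch

def moe_forward(x, router, experts, top_k=4):
    """x: (tokens, d_model); router: Linear(d_model, n_experts);
    experts: list of per-expert FFN modules."""
    probs = router(x).softmax(dim=-1)                      # (tokens, n_experts)
    weights, idx = torch.topk(probs, top_k, dim=-1)        # keep the 4 best experts
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the top-k
    out = torch.zeros_like(x)
    for t in range(x.size(0)):                             # naive per-token dispatch
        for k in range(top_k):
            out[t] += weights[t, k] * experts[int(idx[t, k])](x[t])
    return out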

Labels: enhancement (New feature or request), model (Model specific)