## Summary
Phi-4-mini Instruct (3.8B) is a newly released version of the popular Phi-4 model developed by Microsoft.

## Instructions

Phi-4-mini uses the same example code as Llama, while the checkpoint, model params, and tokenizer are different. Please see the [Llama README page](../llama/README.md) for details.

All commands for exporting and running Llama on various backends should also be applicable to Phi-4-mini, by swapping the following args:
```
--model phi_4_mini
--params examples/models/phi-4-mini/config.json
--checkpoint <path-to-meta-checkpoint>
```

### Generate the Checkpoint
The original checkpoint can be obtained from HuggingFace:
```
huggingface-cli download microsoft/Phi-4-mini-instruct
```

We then convert it to Meta's checkpoint format:
```
python examples/models/phi-4-mini/convert_weights.py <path-to-checkpoint-dir> <output-path>
```

### Example export and run
Here is a basic example of exporting and running Phi-4-mini; please refer to the [Llama README page](../llama/README.md) for more advanced usage.

Export to XNNPACK, no quantization:
```
# No quantization
# Set this path to point to the converted checkpoint
PHI_CHECKPOINT=path/to/checkpoint.pth

python -m examples.models.llama.export_llama \
  --model phi_4_mini \
  --checkpoint "${PHI_CHECKPOINT:?}" \
  --params examples/models/phi-4-mini/config.json \
  -kv \
  --use_sdpa_with_kv_cache \
  -d fp32 \
  -X \
  --metadata '{"get_bos_id":151643, "get_eos_ids":[151643]}' \
  --output_name="phi-4-mini.pte" \
  --verbose
```
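A note on the `:?` parameter expansion used for the checkpoint flag: it is plain POSIX shell behavior that makes a missing path fail fast instead of silently passing an empty string to the exporter. A minimal sketch:

```shell
# ${VAR:?} expands to VAR's value, but aborts with an error when VAR is
# unset or empty, so a missing checkpoint path stops the command early.
PHI_CHECKPOINT=path/to/checkpoint.pth
echo "${PHI_CHECKPOINT:?}"

# In a subshell, an unset variable triggers the error path:
unset PHI_CHECKPOINT
( echo "${PHI_CHECKPOINT:?}" ) 2>/dev/null || echo "checkpoint path not set"
```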

Run using the executor runner:
```
# Currently a work in progress, just need to enable the HuggingFace JSON tokenizer in C++.
# In the meantime, you can run with the example Python runner via pybindings:

python -m examples.models.llama.runner.native \
  --model phi_4_mini \
  --pte <path-to-pte> \
  -kv \
  --tokenizer <path-to-tokenizer>/tokenizer.json \
  --tokenizer_config <path-to-tokenizer>/tokenizer_config.json \
  --prompt "What is in a california roll?" \
  --params examples/models/phi-4-mini/config.json \
  --max_len 64 \
  --temperature 0
```
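The Python runner needs both tokenizer files from the Hugging Face download. A small preflight sketch can fail early instead of mid-run; the directory below is a hypothetical stand-in, populated with empty files only so the check is runnable as-is — point it at your actual download:

```shell
# Hypothetical stand-in directory; replace with wherever huggingface-cli
# placed tokenizer.json and tokenizer_config.json.
TOKENIZER_DIR="$(mktemp -d)"
touch "$TOKENIZER_DIR/tokenizer.json" "$TOKENIZER_DIR/tokenizer_config.json"  # stand-in files

# Verify both tokenizer files exist before invoking the runner.
for f in tokenizer.json tokenizer_config.json; do
  if [ -f "$TOKENIZER_DIR/$f" ]; then
    echo "found: $f"
  else
    echo "missing: $f" >&2
    exit 1
  fi
done
```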