
[model] add llama4 #7611

Merged
merged 1 commit into main from hiyouga/llama4 on Apr 6, 2025

Conversation

hiyouga (Owner) commented on Apr 6, 2025

What does this PR do?

It adds support for the Llama 4 model family (e.g., meta-llama/Llama-4-Scout-17B-16E-Instruct), including the llama4 chat template.

Install

# build on this commit
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics,deepspeed]"
pip install "transformers>=4.51.1"

These tests were run with the LoRA method on interleaved text & multimodal data setups, using 8 × L20 48GB GPUs.


Test recipes

### model
model_name_or_path: meta-llama/Llama-4-Scout-17B-16E-Instruct
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json  # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]

### dataset
dataset: mllm_demo,identity,alpaca_en_demo
template: llama4
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/llama4-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500
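A minimal launch sketch for the recipe above (the file name llama4_lora_sft.yaml and the FORCE_TORCHRUN variable are assumptions based on the common LLaMA-Factory workflow, not something specified in this PR):

# save the recipe above as llama4_lora_sft.yaml, then launch on all 8 GPUs
# FORCE_TORCHRUN=1 makes the CLI launch via torchrun, which the DeepSpeed ZeRO-3 setup expects
FORCE_TORCHRUN=1 llamafactory-cli train llama4_lora_sft.yaml

After training, the LoRA adapter is written to output_dir (saves/llama4-8b/lora/sft) and can be loaded for inference with the same llama4 template.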


@hiyouga hiyouga merged commit 6c200fd into main Apr 6, 2025
12 checks passed
@hiyouga hiyouga added the solved ("This problem has been already solved") label Apr 6, 2025
@hiyouga hiyouga deleted the hiyouga/llama4 branch April 6, 2025 05:42
@DachengLi1 DachengLi1 mentioned this pull request Apr 6, 2025
@hiyouga hiyouga mentioned this pull request Apr 7, 2025