
[model] add llama4 #7611

Merged
merged 1 commit into main from hiyouga/llama4 on Apr 6, 2025

Conversation

hiyouga (Owner) commented on Apr 6, 2025

What does this PR do?

It adds support for the Llama 4 model family (e.g., meta-llama/Llama-4-Scout-17B-16E-Instruct), including the llama4 chat template.

Install

# build on this commit
git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch,metrics,deepspeed]"
pip install "transformers>=4.51.1"

These tests were run with the LoRA method on interleaved text & multimodal data setups, using 8 × L20 48GB GPUs.


Test recipes

### model
model_name_or_path: meta-llama/Llama-4-Scout-17B-16E-Instruct
trust_remote_code: true

### method
stage: sft
do_train: true
finetuning_type: lora
lora_rank: 8
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json  # choices: [ds_z0_config.json, ds_z2_config.json, ds_z3_config.json]

### dataset
dataset: mllm_demo,identity,alpaca_en_demo
template: llama4
cutoff_len: 2048
max_samples: 1000
overwrite_cache: true
preprocessing_num_workers: 16
dataloader_num_workers: 4

### output
output_dir: saves/llama4-8b/lora/sft
logging_steps: 10
save_steps: 500
plot_loss: true
overwrite_output_dir: true
save_only_model: false

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-4
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000
resume_from_checkpoint: null

### eval
# eval_dataset: alpaca_en_demo
# val_size: 0.1
# per_device_eval_batch_size: 1
# eval_strategy: steps
# eval_steps: 500
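A minimal launch sketch for the recipe above (the file name llama4_lora_sft.yaml and the FORCE_TORCHRUN variable are assumptions based on the common LLaMA-Factory workflow, not something specified in this PR):

# save the recipe above as llama4_lora_sft.yaml, then launch on all 8 GPUs
# FORCE_TORCHRUN=1 makes the CLI launch via torchrun, which the DeepSpeed ZeRO-3 setup expects
FORCE_TORCHRUN=1 llamafactory-cli train llama4_lora_sft.yaml

After training, the LoRA adapter is written to output_dir (saves/llama4-8b/lora/sft) and can be loaded for inference with the same llama4 template.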


@hiyouga hiyouga merged commit 6c200fd into main Apr 6, 2025
12 checks passed
@hiyouga hiyouga added the solved ("This problem has been already solved") label Apr 6, 2025
@hiyouga hiyouga deleted the hiyouga/llama4 branch April 6, 2025 05:42
@DachengLi1 DachengLi1 mentioned this pull request Apr 6, 2025
@hiyouga hiyouga mentioned this pull request Apr 7, 2025