Commit e343443

add: if entry in the dreambooth training docs. (#3472)
1 parent 8d646f2 commit e343443

File tree

1 file changed: +64 -0 lines changed

docs/source/en/training/dreambooth.mdx

Lines changed: 64 additions & 0 deletions
@@ -496,3 +496,67 @@ image.save("dog-bucket.png")
```

You may also run inference from any of the [saved training checkpoints](#inference-from-a-saved-checkpoint).

## IF

You can use the LoRA and full DreamBooth scripts to also train the text-to-image [IF model](https://huggingface.co/DeepFloyd/IF-I-XL-v1.0). A few alternative CLI flags are needed due to the model size, the expected input resolution, and the text encoder conventions.
### LoRA Dreambooth

This training configuration requires ~28 GB of VRAM.

```sh
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_dog_lora"

# IF-specific flags:
#   --resolution=64: the input resolution of the IF unet is 64x64
#   --pre_compute_text_embeddings: precompute the text embeddings so that T5
#     doesn't have to be kept in memory
#   --tokenizer_max_length=77: IF expects an override of the max token length
#   --text_encoder_use_attention_mask: IF expects an attention mask for the
#     text embeddings
accelerate launch train_dreambooth_lora.py \
  --report_to wandb \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a sks dog" \
  --resolution=64 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --scale_lr \
  --max_train_steps=1200 \
  --validation_prompt="a sks dog" \
  --validation_epochs=25 \
  --checkpointing_steps=100 \
  --pre_compute_text_embeddings \
  --tokenizer_max_length=77 \
  --text_encoder_use_attention_mask
```

### Full Dreambooth

Due to the size of the optimizer states, we recommend training the full XL IF model with 8-bit Adam. With 8-bit Adam and the rest of the following config, the model can be trained in ~48 GB of VRAM.

For full DreamBooth, IF requires very low learning rates; with higher learning rates, model quality will degrade.

```sh
export MODEL_NAME="DeepFloyd/IF-I-XL-v1.0"

export INSTANCE_DIR="dog"
export OUTPUT_DIR="dreambooth_if"

# IF-specific flags:
#   --resolution=64: the input resolution of the IF unet is 64x64
#   --text_encoder_use_attention_mask: IF expects an attention mask for the
#     text embeddings
#   --tokenizer_max_length 77: IF expects an override of the max token length
#   --pre_compute_text_embeddings: precompute the text embeddings so that T5
#     doesn't have to be kept in memory
#   --use_8bit_adam: 8-bit Adam keeps the optimizer states small
#   --skip_save_text_encoder: do not save the full T5 text encoder with the model
accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=64 \
  --train_batch_size=4 \
  --gradient_accumulation_steps=1 \
  --learning_rate=1e-7 \
  --max_train_steps=150 \
  --validation_prompt "a photo of sks dog" \
  --validation_steps 25 \
  --text_encoder_use_attention_mask \
  --tokenizer_max_length 77 \
  --pre_compute_text_embeddings \
  --use_8bit_adam \
  --set_grads_to_none \
  --skip_save_text_encoder
```