examples/dreambooth/README_flux.md
The `train_dreambooth_flux.py` script shows how to implement the training procedure.
>
> Flux can be quite expensive to run on consumer hardware devices and as a result finetuning it comes with high memory requirements -
> a LoRA with a rank of 16 (w/ all components trained) can exceed 40GB of VRAM for training.
> For more tips & guidance on training on a resource-constrained device and general good practices please check out these great guides and trainers for FLUX:
> - [`@bghira`'s guide](https://github.com/bghira/SimpleTuner/blob/main/documentation/quickstart/FLUX.md)
> [!NOTE]
> If you want to train using long prompts with the T5 text encoder, you can use `--max_sequence_length` to set the token limit. The default is 77, but it can be increased to as high as 512. Note that this will use more resources and may slow down the training in some cases.
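For example, if your prompts exceed the default token limit, a minimal sketch of the extra flag (added to the training command shown earlier in this README) would be:

```bash
--max_sequence_length=512
```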
## LoRA + DreamBooth
[LoRA](https://huggingface.co/docs/peft/conceptual_guides/adapter#low-rank-adaptation-lora) is a popular parameter-efficient fine-tuning technique that allows you to achieve full-finetuning-like performance with only a fraction of learnable parameters.
Note also that we use the PEFT library as the backend for LoRA training, so make sure to have `peft>=0.6.0` installed in your environment.
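If you need to install or upgrade it, a quick sketch (assuming a standard pip environment):

```bash
pip install -U "peft>=0.6.0"
```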
### Prodigy Optimizer
Prodigy is an adaptive optimizer that dynamically adjusts the learning rate of the learned parameters based on past gradients, allowing for more efficient convergence.
By using Prodigy we can "eliminate" the need for manual learning rate tuning. Read more [here](https://huggingface.co/blog/sdxl_lora_advanced_script#adaptive-optimizers).
To use Prodigy, specify:
```bash
--optimizer="prodigy"
```
> [!TIP]
> When using Prodigy it's generally good practice to set `--learning_rate=1.0`.
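Putting the two together, the optimizer-related part of the training command could look like this sketch (only these flags are shown; the rest of the command stays unchanged):

```bash
--optimizer="prodigy" \
--learning_rate=1.0
```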
As mentioned, Flux DreamBooth LoRA training is very memory intensive. Here are some options (some still experimental) for more memory-efficient training.
### Image Resolution
An easy way to mitigate some of the memory requirements is through `--resolution`. `--resolution` refers to the resolution of the input images; all the images in the train/validation dataset are resized to it.
Note that by default, images are resized to a resolution of 512, which is worth keeping in mind if you're accustomed to training on higher resolutions.
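As a sketch, the flag looks like this; lowering the value below the default trades detail for memory:

```bash
--resolution=512   # lower values reduce memory usage at the cost of detail
```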
### Gradient Checkpointing and Accumulation
* `--gradient_accumulation_steps` refers to the number of update steps to accumulate before performing a backward/update pass.
By passing a value > 1 you can reduce the number of backward/update passes and hence also the memory requirements.
* With `--gradient_checkpointing` we can save memory by not storing all intermediate activations during the forward pass.
Instead, only a subset of these activations (the checkpoints) is stored and the rest is recomputed as needed during the backward pass. Note that this comes at the expense of a slower backward pass. Both flags are combined in the sketch below.
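Assuming the script follows the usual diffusers flag names, a minimal sketch combining both options:

```bash
--gradient_accumulation_steps=4 \
--gradient_checkpointing
```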
### 8-bit-Adam Optimizer
When training with `AdamW` (doesn't apply to Prodigy) you can pass `--use_8bit_adam` to reduce the memory requirements of training.
Make sure to install `bitsandbytes` if you want to do so.
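A minimal sketch of the install step (assuming a standard pip environment), after which you just add `--use_8bit_adam` to the training command:

```bash
pip install bitsandbytes
```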
### Latent Caching
When training without validation runs, we can pre-encode the training images with the VAE, and then delete it to free up some memory.
To enable latent caching, first use the version in [this PR](https://github.com/huggingface/diffusers/blob/1b195933d04e4c8281a2634128c0d2d380893f73/examples/dreambooth/train_dreambooth_lora_flux.py), and then pass `--cache_latents`.
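Once you're on that version, enabling it is just one extra flag (a sketch; remember this only helps when validation runs are disabled):

```bash
--cache_latents
```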
## Other notes
Thanks to `bghira` and `ostris` for their help with reviewing & insight sharing ♥️