[docs] Simplify loading guide #2694

Merged

merged 6 commits into from Apr 4, 2023

2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -93,6 +93,8 @@
    title: Low-Rank Adaptation of Large Language Models (LoRA)
  title: Training
- sections:
  - local: conceptual/pipeline_explained
    title: Pipelines explained
  - local: conceptual/philosophy
    title: Philosophy
  - local: conceptual/contribution
197 changes: 197 additions & 0 deletions docs/source/en/conceptual/pipeline_explained.mdx
@@ -0,0 +1,197 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Pipelines explained

Having an easy and accessible way to use a diffusion system for inference is essential to using 🧨 Diffusers. Diffusion systems often consist of multiple components like parameterized models, tokenizers, and schedulers that interact in complex ways. That is why we designed the [`DiffusionPipeline`] to wrap the complexity of the entire diffusion system into an easy-to-use API, while remaining flexible enough to be adapted for other use cases.

This guide provides a high-level explanation of what a pipeline is, what *variants* are, and how a pipeline and all its components are loaded.

## Pipeline

Pipelines like [`StableDiffusionPipeline`] and [`StableDiffusionImg2ImgPipeline`] consist of multiple components: parameterized models (`unet`, `vae`, `text_encoder`), tokenizers, and schedulers. When you call a pipeline for inference, these components interact with each other to generate an output.

For instance, you can load a pipeline locally to remain anonymous and build self-contained applications, or customize which components are loaded in a pipeline. 🧨 Diffusers makes it easy to swap out compatible models and schedulers in a pipeline, so you can explore the trade-offs between different schedulers and models.

```python
from diffusers import DiffusionPipeline, EulerDiscreteScheduler

repo_id = "runwayml/stable-diffusion-v1-5"

scheduler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)
```

[`SchedulerMixin.from_pretrained`] loads the scheduler configuration file from a subfolder in the Stable Diffusion pipeline repository, and then the scheduler instance is passed to the `scheduler` argument in [`DiffusionPipeline.from_pretrained`]. This works because the [`StableDiffusionPipeline`] defines its scheduler with the `scheduler` attribute. You can't use a different keyword like `sampler` because it isn't defined in `StableDiffusionPipeline.__init__`.
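
You can also swap the scheduler after the pipeline is loaded. As a minimal sketch of the same idea (reusing the `stable_diffusion` pipeline from above), the `from_config` method builds a new scheduler from the existing scheduler's configuration:

```python
from diffusers import DPMSolverMultistepScheduler

# Create a replacement scheduler from the current scheduler's configuration
# and assign it to the pipeline's `scheduler` attribute.
stable_diffusion.scheduler = DPMSolverMultistepScheduler.from_config(stable_diffusion.scheduler.config)
```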

### Checkpoint variants

In addition to the original pipeline checkpoints stored in a repository, there may also be *checkpoint variants*. A variant typically stores the checkpoint weights in a lower-precision, lower-storage data type like `fp16`, or as non-exponential mean averaged (non-EMA) weights so you can resume finetuning from a checkpoint. Variants are advantageous in specific scenarios - half-precision checkpoints only require half the bandwidth and storage - but they're so similar to the original checkpoint that it isn't worth creating a separate checkpoint for them. Variants have **exactly** the same serialization format and model structure as the original checkpoint, and their weights have the same tensor shapes.

This means checkpoints stored in other serialization formats, such as [Safetensors](./using-diffusers/using_safetensors), are not considered checkpoint variants because their weights are identical to the original checkpoint - only the storage format differs. It may also be tempting to consider different model structures as variants, such as [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) and [`stable-diffusion-2`](https://huggingface.co/stabilityai/stable-diffusion-2). However, these checkpoints aren't considered variants because `stable-diffusion-v1-5` uses a different `CLIPTextModel` than `stable-diffusion-2`.

<Tip>

💡 When checkpoints have identical model structures but were trained on different datasets or with a different training setup, they should be stored in separate repositories instead of as variants (for example, [`stable-diffusion-v1-4`] and [`stable-diffusion-v1-5`]).

</Tip>

Keep in mind that you can't use a variant stored in a lower floating point type like `fp16` to continue training or for inference on a CPU, and you shouldn't use non-EMA variants for inference - they're meant for resuming finetuning.
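
To make this concrete, here is a sketch of loading the half-precision variant, assuming the repository ships `fp16` weights and your installed version of 🧨 Diffusers supports the `variant` argument in [`DiffusionPipeline.from_pretrained`]:

```python
import torch
from diffusers import DiffusionPipeline

repo_id = "runwayml/stable-diffusion-v1-5"

# `variant` selects the fp16 weight files; `torch_dtype` loads them in half precision.
pipeline = DiffusionPipeline.from_pretrained(repo_id, variant="fp16", torch_dtype=torch.float16)
```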

## How pipeline loading works

As a class method, [`DiffusionPipeline.from_pretrained`] is responsible for two things:

- Download the latest version of the folder structure required for inference and cache it. If the latest folder structure is available in the local cache, [`DiffusionPipeline.from_pretrained`] reuses the cache and won't redownload the files.
- Load the cached weights into the correct pipeline [class](./api/pipelines/overview#diffusers-summary) - retrieved from the `model_index.json` file - and return an instance of it.

A pipeline's underlying folder structure corresponds directly with its class instance. For example, the [`StableDiffusionPipeline`] corresponds to the folder structure in [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5).

```python
from diffusers import DiffusionPipeline

repo_id = "runwayml/stable-diffusion-v1-5"
pipeline = DiffusionPipeline.from_pretrained(repo_id)
print(pipeline)
```

You'll see `pipeline` is an instance of [`StableDiffusionPipeline`], which consists of seven components:

- `"feature_extractor"`: a [`~transformers.CLIPFeatureExtractor`] from 🤗 Transformers.
- `"safety_checker"`: a [component](https://github.com/huggingface/diffusers/blob/e55687e1e15407f60f32242027b7bb8170e58266/src/diffusers/pipelines/stable_diffusion/safety_checker.py#L32) for screening against harmful content.
- `"scheduler"`: an instance of [`PNDMScheduler`].
- `"text_encoder"`: a [`~transformers.CLIPTextModel`] from 🤗 Transformers.
- `"tokenizer"`: a [`~transformers.CLIPTokenizer`] from 🤗 Transformers.
- `"unet"`: an instance of [`UNet2DConditionModel`].
- `"vae"` an instance of [`AutoencoderKL`].

```json
StableDiffusionPipeline {
  "feature_extractor": [
    "transformers",
    "CLIPFeatureExtractor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
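
Because each component maps directly to a pipeline attribute, you can reuse the loaded components to build a different pipeline without downloading or loading the weights again. A sketch, assuming the components are compatible with [`StableDiffusionImg2ImgPipeline`]:

```python
from diffusers import StableDiffusionImg2ImgPipeline

# `pipeline.components` is a dict of the already-loaded components,
# so a compatible pipeline can be constructed from it directly.
img2img = StableDiffusionImg2ImgPipeline(**pipeline.components)
```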

Compare the components of the pipeline instance to the [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) folder structure, and you'll see there is a separate folder for each component in the repository:

```
.
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── safety_checker
│   ├── config.json
│   └── pytorch_model.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   └── pytorch_model.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   └── diffusion_pytorch_model.bin
└── vae
    ├── config.json
    └── diffusion_pytorch_model.bin
```
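
This folder-based layout is also what makes local loading possible: download the repository once and point [`DiffusionPipeline.from_pretrained`] at the resulting path. A sketch, assuming you have the `huggingface_hub` library installed:

```python
from huggingface_hub import snapshot_download
from diffusers import DiffusionPipeline

# Download the full folder structure to the local cache and return its path,
# then load the pipeline entirely from disk.
local_dir = snapshot_download("runwayml/stable-diffusion-v1-5")
pipeline = DiffusionPipeline.from_pretrained(local_dir)
```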

You can access each of the components of the pipeline as an attribute to view its configuration:

```py
pipeline.tokenizer
CLIPTokenizer(
    name_or_path="/root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/39593d5650112b4cc580433f6b0435385882d819/tokenizer",
    vocab_size=49408,
    model_max_length=77,
    is_fast=False,
    padding_side="right",
    truncation_side="right",
    special_tokens={
        "bos_token": AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "eos_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "unk_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "pad_token": "<|endoftext|>",
    },
)
```
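
The same pattern works for any component. For example, a sketch that inspects the scheduler's configuration and lists the scheduler classes that can be built from it (assuming your version of 🧨 Diffusers exposes the `compatibles` property):

```py
# View the scheduler's configuration, then list the scheduler classes
# that accept the same configuration.
print(pipeline.scheduler.config)
print(pipeline.scheduler.compatibles)
```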

Every pipeline expects a `model_index.json` file that tells the [`DiffusionPipeline`]:

- which pipeline class to load from `_class_name`
- which version of 🧨 Diffusers was used to create the model in `_diffusers_version`
- what components from which library are stored in the subfolders (`name` corresponds to the component and subfolder name, `library` corresponds to the name of the library to load the class from, and `class` corresponds to the class name)

```json
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "feature_extractor": [
    "transformers",
    "CLIPFeatureExtractor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
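
If you'd like to inspect this file programmatically, here is a sketch using the `huggingface_hub` library (an assumption - any way of fetching the file works):

```python
import json

from huggingface_hub import hf_hub_download

# Download only model_index.json from the repository and parse it.
index_path = hf_hub_download("runwayml/stable-diffusion-v1-5", "model_index.json")
with open(index_path) as f:
    model_index = json.load(f)

print(model_index["_class_name"])  # StableDiffusionPipeline
```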