[docs] Simplify loading guide #2694

Merged

merged 6 commits into from Apr 4, 2023

2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -93,6 +93,8 @@
    title: Low-Rank Adaptation of Large Language Models (LoRA)
  title: Training
- sections:
  - local: conceptual/pipeline_explained
    title: Pipelines explained
  - local: conceptual/philosophy
    title: Philosophy
  - local: conceptual/contribution
197 changes: 197 additions & 0 deletions docs/source/en/conceptual/pipeline_explained.mdx
@@ -0,0 +1,197 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Pipelines explained

Having an easy and accessible way to use a diffusion system for inference is essential to using 🧨 Diffusers. Diffusion systems often consist of multiple components like parameterized models, tokenizers, and schedulers that interact in complex ways. That is why we designed the [`DiffusionPipeline`] to wrap the complexity of the entire diffusion system into an easy-to-use API, while remaining flexible enough to be adapted for other use cases.

This guide provides a high-level explanation of what a pipeline is, what *variants* are, and how a pipeline and all its components are loaded.

## Pipeline

Pipelines like [`StableDiffusionPipeline`] and [`StableDiffusionImg2ImgPipeline`] consist of multiple components: parameterized models (`unet`, `vae`, `text_encoder`), tokenizers, and schedulers. When you call a pipeline for inference, these components interact with each other to generate an output.

For instance, you can load a pipeline locally to remain anonymous and build self-contained applications, or customize which components are loaded in a pipeline. 🧨 Diffusers makes it easy to swap out compatible models and schedulers in a pipeline, so you can explore the trade-offs between different schedulers and models.

```python
from diffusers import DiffusionPipeline, EulerDiscreteScheduler

repo_id = "runwayml/stable-diffusion-v1-5"

scheduler = EulerDiscreteScheduler.from_pretrained(repo_id, subfolder="scheduler")
stable_diffusion = DiffusionPipeline.from_pretrained(repo_id, scheduler=scheduler)
```

[`SchedulerMixin.from_pretrained`] loads the scheduler configuration file from a subfolder in the Stable Diffusion pipeline repository, and then the scheduler instance is passed to the `scheduler` argument in [`DiffusionPipeline.from_pretrained`]. This works because the [`StableDiffusionPipeline`] defines its scheduler with the `scheduler` attribute. You can't use a different keyword like `sampler` because it isn't defined in `StableDiffusionPipeline.__init__`.
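
You can also swap the scheduler after the pipeline is loaded. As a minimal sketch of the same idea (reusing the `stable_diffusion` pipeline from above), the `from_config` method builds a new scheduler from the existing scheduler's configuration:

```python
from diffusers import DPMSolverMultistepScheduler

# Create a replacement scheduler from the current scheduler's configuration
# and assign it to the pipeline's `scheduler` attribute.
stable_diffusion.scheduler = DPMSolverMultistepScheduler.from_config(stable_diffusion.scheduler.config)
```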

### Checkpoint variants

In addition to the original pipeline checkpoints stored in a repository, there may also be *checkpoint variants*. A variant typically stores the checkpoint weights in a lower-precision, lower-storage data type like `fp16`, or as non-exponential mean averaged (non-EMA) weights so you can resume finetuning from a checkpoint. Variants are advantageous in specific scenarios - half-precision checkpoints only require half the bandwidth and storage - but they're so similar to the original checkpoint that it isn't worth creating a separate checkpoint for them. Variants have **exactly** the same serialization format and model structure as the original checkpoint, and their weights have the same tensor shapes.

This means checkpoints stored in other serialization formats, such as [Safetensors](./using-diffusers/using_safetensors), are not considered checkpoint variants because their weights are identical to the original checkpoint - only the storage format differs. It may also be tempting to consider different model structures as variants, such as [`stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) and [`stable-diffusion-2`](https://huggingface.co/stabilityai/stable-diffusion-2). However, these checkpoints aren't considered variants because `stable-diffusion-v1-5` uses a different `CLIPTextModel` than `stable-diffusion-2`.

<Tip>

💡 When checkpoints have identical model structures but were trained on different datasets or with a different training setup, they should be stored in separate repositories instead of as variants (for example, [`stable-diffusion-v1-4`] and [`stable-diffusion-v1-5`]).

</Tip>

Keep in mind that you can't use a variant stored in a lower floating point type like `fp16` to continue training or for inference on a CPU, and you shouldn't use non-EMA variants for inference - they're meant for resuming finetuning.
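
To make this concrete, here is a sketch of loading the half-precision variant, assuming the repository ships `fp16` weights and your installed version of 🧨 Diffusers supports the `variant` argument in [`DiffusionPipeline.from_pretrained`]:

```python
import torch
from diffusers import DiffusionPipeline

repo_id = "runwayml/stable-diffusion-v1-5"

# `variant` selects the fp16 weight files; `torch_dtype` loads them in half precision.
pipeline = DiffusionPipeline.from_pretrained(repo_id, variant="fp16", torch_dtype=torch.float16)
```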

## How pipeline loading works

As a class method, [`DiffusionPipeline.from_pretrained`] is responsible for two things:

- Download the latest version of the folder structure required for inference and cache it. If the latest folder structure is available in the local cache, [`DiffusionPipeline.from_pretrained`] reuses the cache and won't redownload the files.
- Load the cached weights into the correct pipeline [class](./api/pipelines/overview#diffusers-summary) - retrieved from the `model_index.json` file - and return an instance of it.

A pipeline's underlying folder structure corresponds directly with its class instance. For example, the [`StableDiffusionPipeline`] corresponds to the folder structure in [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5).

```python
from diffusers import DiffusionPipeline

repo_id = "runwayml/stable-diffusion-v1-5"
pipeline = DiffusionPipeline.from_pretrained(repo_id)
print(pipeline)
```

You'll see `pipeline` is an instance of [`StableDiffusionPipeline`], which consists of seven components:

- `"feature_extractor"`: a [`~transformers.CLIPFeatureExtractor`] from 🤗 Transformers.
- `"safety_checker"`: a [component](https://github.com/huggingface/diffusers/blob/e55687e1e15407f60f32242027b7bb8170e58266/src/diffusers/pipelines/stable_diffusion/safety_checker.py#L32) for screening against harmful content.
- `"scheduler"`: an instance of [`PNDMScheduler`].
- `"text_encoder"`: a [`~transformers.CLIPTextModel`] from 🤗 Transformers.
- `"tokenizer"`: a [`~transformers.CLIPTokenizer`] from 🤗 Transformers.
- `"unet"`: an instance of [`UNet2DConditionModel`].
- `"vae"` an instance of [`AutoencoderKL`].

```json
StableDiffusionPipeline {
  "feature_extractor": [
    "transformers",
    "CLIPFeatureExtractor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
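
Because each component maps directly to a pipeline attribute, you can reuse the loaded components to build a different pipeline without downloading or loading the weights again. A sketch, assuming the components are compatible with [`StableDiffusionImg2ImgPipeline`]:

```python
from diffusers import StableDiffusionImg2ImgPipeline

# `pipeline.components` is a dict of the already-loaded components,
# so a compatible pipeline can be constructed from it directly.
img2img = StableDiffusionImg2ImgPipeline(**pipeline.components)
```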

Compare the components of the pipeline instance to the [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) folder structure, and you'll see there is a separate folder for each component in the repository:

```
.
├── feature_extractor
│   └── preprocessor_config.json
├── model_index.json
├── safety_checker
│   ├── config.json
│   └── pytorch_model.bin
├── scheduler
│   └── scheduler_config.json
├── text_encoder
│   ├── config.json
│   └── pytorch_model.bin
├── tokenizer
│   ├── merges.txt
│   ├── special_tokens_map.json
│   ├── tokenizer_config.json
│   └── vocab.json
├── unet
│   ├── config.json
│   └── diffusion_pytorch_model.bin
└── vae
    ├── config.json
    └── diffusion_pytorch_model.bin
```
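
This folder-based layout is also what makes local loading possible: download the repository once and point [`DiffusionPipeline.from_pretrained`] at the resulting path. A sketch, assuming you have the `huggingface_hub` library installed:

```python
from huggingface_hub import snapshot_download
from diffusers import DiffusionPipeline

# Download the full folder structure to the local cache and return its path,
# then load the pipeline entirely from disk.
local_dir = snapshot_download("runwayml/stable-diffusion-v1-5")
pipeline = DiffusionPipeline.from_pretrained(local_dir)
```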

You can access each of the components of the pipeline as an attribute to view its configuration:

```py
pipeline.tokenizer
CLIPTokenizer(
    name_or_path="/root/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/39593d5650112b4cc580433f6b0435385882d819/tokenizer",
    vocab_size=49408,
    model_max_length=77,
    is_fast=False,
    padding_side="right",
    truncation_side="right",
    special_tokens={
        "bos_token": AddedToken("<|startoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "eos_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "unk_token": AddedToken("<|endoftext|>", rstrip=False, lstrip=False, single_word=False, normalized=True),
        "pad_token": "<|endoftext|>",
    },
)
```
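
The same pattern works for any component. For example, a sketch that inspects the scheduler's configuration and lists the scheduler classes that can be built from it (assuming your version of 🧨 Diffusers exposes the `compatibles` property):

```py
# View the scheduler's configuration, then list the scheduler classes
# that accept the same configuration.
print(pipeline.scheduler.config)
print(pipeline.scheduler.compatibles)
```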

Every pipeline expects a `model_index.json` file that tells the [`DiffusionPipeline`]:

- which pipeline class to load from `_class_name`
- which version of 🧨 Diffusers was used to create the model in `_diffusers_version`
- what components from which library are stored in the subfolders (`name` corresponds to the component and subfolder name, `library` corresponds to the name of the library to load the class from, and `class` corresponds to the class name)

```json
{
  "_class_name": "StableDiffusionPipeline",
  "_diffusers_version": "0.6.0",
  "feature_extractor": [
    "transformers",
    "CLIPFeatureExtractor"
  ],
  "safety_checker": [
    "stable_diffusion",
    "StableDiffusionSafetyChecker"
  ],
  "scheduler": [
    "diffusers",
    "PNDMScheduler"
  ],
  "text_encoder": [
    "transformers",
    "CLIPTextModel"
  ],
  "tokenizer": [
    "transformers",
    "CLIPTokenizer"
  ],
  "unet": [
    "diffusers",
    "UNet2DConditionModel"
  ],
  "vae": [
    "diffusers",
    "AutoencoderKL"
  ]
}
```
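
If you'd like to inspect this file programmatically, here is a sketch using the `huggingface_hub` library (an assumption - any way of fetching the file works):

```python
import json

from huggingface_hub import hf_hub_download

# Download only model_index.json from the repository and parse it.
index_path = hf_hub_download("runwayml/stable-diffusion-v1-5", "model_index.json")
with open(index_path) as f:
    model_index = json.load(f)

print(model_index["_class_name"])  # StableDiffusionPipeline
```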