[docs] Textual inversion inference #3473

Merged (3 commits) on May 19, 2023
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -44,6 +44,8 @@
    title: Text-guided image-inpainting
  - local: using-diffusers/depth2img
    title: Text-guided depth-to-image
  - local: using-diffusers/textual_inversion_inference
    title: Textual inversion
  - local: using-diffusers/reusing_seeds
    title: Improve image quality with deterministic generation
  - local: using-diffusers/reproducibility
80 changes: 80 additions & 0 deletions docs/source/en/using-diffusers/textual_inversion_inference.mdx
@@ -0,0 +1,80 @@
# Textual inversion

[[open-in-colab]]

The [`StableDiffusionPipeline`] supports textual inversion, a technique that enables a model like Stable Diffusion to learn a new concept from just a few sample images. This gives you more control over the generated images and lets you tailor the model towards specific concepts. You can get started quickly with a collection of community-created concepts in the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer).

This guide will show you how to run inference with textual inversion using a pre-learned concept from the Stable Diffusion Conceptualizer. If you're interested in teaching a model new concepts with textual inversion, take a look at the [Textual Inversion](./training/text_inversion) training guide.

Log in to your Hugging Face account:

```py
from huggingface_hub import notebook_login

notebook_login()
```
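
If you're working in a plain Python script instead of a notebook, the `login` helper from `huggingface_hub` does the same job. A minimal sketch:

```py
from huggingface_hub import login

# Prompts for a token from https://huggingface.co/settings/tokens
login()
```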

Import the necessary libraries and create a helper function to visualize the generated images:

```py
import torch
from PIL import Image

from diffusers import StableDiffusionPipeline


def image_grid(imgs, rows, cols):
    assert len(imgs) == rows * cols

    # Create a blank canvas large enough to hold every image
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    # Paste each image into its cell, filling the grid row by row
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
```

Pick a Stable Diffusion checkpoint and a pre-learned concept from the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer):

```py
pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
repo_id_embeds = "sd-concepts-library/cat-toy"
```

Now you can load a pipeline and pass the pre-learned concept to it:

```py
pipeline = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path, torch_dtype=torch.float16).to("cuda")

pipeline.load_textual_inversion(repo_id_embeds)
```
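
`load_textual_inversion` can also load an embedding from a local file, and the `token` argument lets you choose the placeholder token yourself. A small sketch, where both the file path and the token are hypothetical:

```py
# Hypothetical local embedding and token; replace with your own
pipeline.load_textual_inversion("./my_concept.bin", token="<my-concept>")
```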

Create a prompt with the pre-learned concept by using the special placeholder token `<cat-toy>`, and choose the number of samples and rows of images you'd like to generate:

```py
prompt = "a grafitti in a favela wall with a <cat-toy> on it"

num_samples = 2
num_rows = 2
```

Then run the pipeline (feel free to adjust parameters like `num_inference_steps` and `guidance_scale` to see how they affect image quality), and visualize the generated images with the helper function you created at the beginning:

```py
all_images = []
for _ in range(num_rows):
    # Generate `num_samples` images per row of the final grid
    images = pipeline(prompt, num_images_per_prompt=num_samples, num_inference_steps=50, guidance_scale=7.5).images
    all_images.extend(images)

grid = image_grid(all_images, num_rows, num_samples)
grid
```

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/textual_inversion_inference.png">
</div>
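
If you want to keep the result, you can save the grid to disk with PIL's `save` method (the file name below is just an example):

```py
grid.save("cat_toy_grid.png")
```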