Commit 00c76f6

stevhliu and sayakpaul authored
[docs] Textual inversion inference (#3473)

* add textual inversion inference to docs
* add to toctree

Co-authored-by: Sayak Paul <[email protected]>
1 parent e343443 commit 00c76f6

2 files changed, +82 -0 lines changed

docs/source/en/_toctree.yml

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@
   title: Text-guided image-inpainting
 - local: using-diffusers/depth2img
   title: Text-guided depth-to-image
+- local: using-diffusers/textual_inversion_inference
+  title: Textual inversion
 - local: using-diffusers/reusing_seeds
   title: Improve image quality with deterministic generation
 - local: using-diffusers/reproducibility
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
# Textual inversion

[[open-in-colab]]

The [`StableDiffusionPipeline`] supports textual inversion, a technique that enables a model like Stable Diffusion to learn a new concept from just a few sample images. This gives you more control over the generated images and allows you to tailor the model towards specific concepts. You can get started quickly with a collection of community-created concepts in the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer).

This guide shows you how to run inference with textual inversion using a pre-learned concept from the Stable Diffusion Conceptualizer. If you're interested in teaching a model new concepts with textual inversion, take a look at the [Textual Inversion](./training/text_inversion) training guide.

Log in to your Hugging Face account:

```py
from huggingface_hub import notebook_login

notebook_login()
```

Import the necessary libraries, and create a helper function to visualize the generated images:

```py
import torch
from PIL import Image

from diffusers import StableDiffusionPipeline


def image_grid(imgs, rows, cols):
    # Expect exactly enough images to fill a rows x cols grid.
    assert len(imgs) == rows * cols

    # All images are assumed to share the size of the first one.
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    # Paste images left to right, top to bottom.
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
```

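To see how `image_grid` positions each image, the index arithmetic can be isolated in a small pure-Python sketch. The `paste_offsets` helper below is hypothetical (it is not part of the guide or of Pillow); it only reproduces the `box` computation from the helper's loop, assuming every image has the same width `w` and height `h`.

```python
def paste_offsets(n_images, cols, w, h):
    """Return the top-left (x, y) offset used for each image index.

    Hypothetical helper: mirrors the `box=` expression in `image_grid`.
    """
    offsets = []
    for i in range(n_images):
        col = i % cols   # position within the row
        row = i // cols  # which row the image falls in
        offsets.append((col * w, row * h))
    return offsets


# Four 512x512 images arranged in a 2x2 grid.
offsets = paste_offsets(4, cols=2, w=512, h=512)
print(offsets)  # [(0, 0), (512, 0), (0, 512), (512, 512)]
```

Images fill the grid row by row, so the order of the input list determines where each image ends up.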
Pick a Stable Diffusion checkpoint and a pre-learned concept from the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer):

```py
pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
repo_id_embeds = "sd-concepts-library/cat-toy"
```

Now you can load a pipeline, and pass the pre-learned concept to it:

```py
pipeline = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path, torch_dtype=torch.float16).to("cuda")

pipeline.load_textual_inversion(repo_id_embeds)
```

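Conceptually, a textual inversion concept is a learned embedding vector bound to a new placeholder token, so loading it amounts to extending the vocabulary and the embedding table. The toy sketch below illustrates that idea with plain Python lists. The names `vocab`, `embeddings`, and `add_concept` are hypothetical, and this is not how `load_textual_inversion` is implemented internally.

```python
def add_concept(vocab, embeddings, token, vector):
    """Register a placeholder token and its learned embedding (toy model)."""
    if token in vocab:
        raise ValueError(f"Token {token!r} already exists in the vocabulary.")
    if embeddings and len(vector) != len(embeddings[0]):
        raise ValueError("Embedding dimension does not match the table.")
    vocab[token] = len(embeddings)  # new token id = next free row
    embeddings.append(list(vector))
    return vocab[token]


# Toy vocabulary with 3-dimensional embeddings (real models use far larger ones).
vocab = {"a": 0, "cat": 1}
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

token_id = add_concept(vocab, embeddings, "<cat-toy>", [0.7, 0.8, 0.9])
print(token_id)              # 2
print(embeddings[token_id])  # [0.7, 0.8, 0.9]
```

Once the token is registered, any prompt containing `<cat-toy>` maps to the new embedding row, which is why the placeholder has to appear verbatim in the prompt.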
Create a prompt with the pre-learned concept by using the special placeholder token `<cat-toy>`, and choose the number of samples per row and the number of rows of images you'd like to generate:

```py
prompt = "a graffiti in a favela wall with a <cat-toy> on it"

num_samples = 2
num_rows = 2
```

Then run the pipeline (feel free to adjust parameters like `num_inference_steps` and `guidance_scale` to see how they affect image quality), collect the generated images, and visualize them with the helper function you created at the beginning:

```py
all_images = []
for _ in range(num_rows):
    # Each call generates one row of `num_samples` images.
    images = pipeline(prompt, num_images_per_prompt=num_samples, num_inference_steps=50, guidance_scale=7.5).images
    all_images.extend(images)

# `image_grid` expects (imgs, rows, cols).
grid = image_grid(all_images, num_rows, num_samples)
grid
```

<div class="flex justify-center">
    <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/textual_inversion_inference.png">
</div>
