# Textual inversion

[[open-in-colab]]

The [`StableDiffusionPipeline`] supports textual inversion, a technique that enables a model like Stable Diffusion to learn a new concept from just a few sample images. This gives you more control over the generated images and allows you to tailor the model towards specific concepts. You can get started quickly with a collection of community-created concepts in the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer).

This guide will show you how to run inference with textual inversion using a pre-learned concept from the Stable Diffusion Conceptualizer. If you're interested in teaching a model new concepts with textual inversion, take a look at the [Textual Inversion](./training/text_inversion) training guide.

Log in to your Hugging Face account:

```py
from huggingface_hub import notebook_login

notebook_login()
```
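
If you're working in a plain Python script instead of a notebook, you can log in with `login` from `huggingface_hub` instead (or run `huggingface-cli login` in a terminal):

```py
from huggingface_hub import login

# Prompts for a token if one isn't passed in or already cached
login()
```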

Import the necessary libraries, and create a helper function to visualize the generated images:

```py
import torch
from PIL import Image

from diffusers import StableDiffusionPipeline


def image_grid(imgs, rows, cols):
    # Paste the generated images into a single rows x cols grid
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
```

Pick a Stable Diffusion checkpoint and a pre-learned concept from the [Stable Diffusion Conceptualizer](https://huggingface.co/spaces/sd-concepts-library/stable-diffusion-conceptualizer):

```py
pretrained_model_name_or_path = "runwayml/stable-diffusion-v1-5"
repo_id_embeds = "sd-concepts-library/cat-toy"
```
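
Concept repositories in the [sd-concepts-library](https://huggingface.co/sd-concepts-library) typically ship the learned embedding as a `learned_embeds.bin` file along with a `token_identifier.txt` naming the placeholder token. If you'd like to peek at a concept repository before loading it, one way is to list its files with `huggingface_hub` (a quick sketch, not required for the rest of the guide):

```py
from huggingface_hub import list_repo_files

# Show which files the concept repository contains
print(list_repo_files(repo_id_embeds))
```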

Now you can load a pipeline, and pass the pre-learned concept to it:

```py
pipeline = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path, torch_dtype=torch.float16).to("cuda")

pipeline.load_textual_inversion(repo_id_embeds)
```
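
`load_textual_inversion` can also load an embedding from a local file and register it under a token of your choice. For example, if you trained your own concept and saved it to a hypothetical `./learned_embeds.bin`, loading it could look like this (the file path and token below are placeholders):

```py
# Load a locally trained embedding and register it under a custom placeholder token
pipeline.load_textual_inversion("./learned_embeds.bin", token="<my-concept>")
```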

Create a prompt with the pre-learned concept by using the special placeholder token `<cat-toy>`, and choose the number of samples and rows of images you'd like to generate:

```py
prompt = "graffiti on a favela wall with a <cat-toy> on it"

num_samples = 2
num_rows = 2
```

Then run the pipeline (feel free to adjust parameters like `num_inference_steps` and `guidance_scale` to see how they affect image quality), collect the generated images, and visualize them with the helper function you created at the beginning:

```py
all_images = []
for _ in range(num_rows):
    images = pipeline(prompt, num_images_per_prompt=num_samples, num_inference_steps=50, guidance_scale=7.5).images
    all_images.extend(images)

grid = image_grid(all_images, num_rows, num_samples)
grid
```
| 77 | + |
| 78 | +<div class="flex justify-center"> |
| 79 | + <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/textual_inversion_inference.png"> |
| 80 | +</div> |
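
Outputs will differ from run to run. If you want reproducible results, you can pass a seeded `torch.Generator` to the pipeline, and you can write any grid to disk with PIL's `save` (the seed and filename below are arbitrary examples):

```py
# Seed a generator for reproducible results, then save the grid as a PNG
generator = torch.Generator("cuda").manual_seed(0)
images = pipeline(prompt, num_images_per_prompt=num_samples, generator=generator).images

image_grid(images, 1, num_samples).save("cat_toy_grid.png")
```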