[SD-XL] Add inpainting #4098


Merged
merged 8 commits from add_inpaint_sd_xl into main on Jul 14, 2023

Conversation

@patrickvonplaten (Contributor) commented Jul 14, 2023

SD-XL inpainting

This PR solves #4080 and is ready for review.

Inpainting works well for both the vanilla case and the "Ensemble of Expert Denoisers" case.

You can try the following to see for yourself:

Vanilla inpainting:

import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")  # white pixels are repainted, black pixels are kept

prompt = "A red cat sitting on a bench"
# strength < 1.0 keeps some of the original image content in the masked area
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=50, strength=0.80).images[0]
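The call returns a standard PIL image, so you can write it straight to disk to inspect the result (the filename here is just an example, not from the original snippet):

image.save("inpainting_result.png")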

Ensemble of Expert Denoisers

which should give slightly better quality:

import torch
from diffusers import StableDiffusionXLInpaintPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

refiner = StableDiffusionXLInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9",
    text_encoder_2=pipe.text_encoder_2,
    vae=pipe.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda")

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")

prompt = "A red cat sitting on a bench"
num_inference_steps = 75
high_noise_frac = 0.7

image = pipe(
    prompt=prompt,
    image=init_image,
    mask_image=mask_image,
    num_inference_steps=num_inference_steps,
    strength=0.80,
    denoising_end=high_noise_frac,  # base model handles the high-noise part of the schedule
    output_type="latent",  # hand latents to the refiner instead of decoding to pixels
).images
image = refiner(
    prompt=prompt,
    image=image,
    mask_image=mask_image,
    num_inference_steps=num_inference_steps,
    denoising_start=high_noise_frac,  # refiner picks up exactly where the base stopped
).images[0]
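With this split, high_noise_frac marks the handoff point: the base model denoises only the high-noise portion of the schedule (up to denoising_end=0.7) and returns latents, and the refiner finishes the remaining low-noise portion from exactly that point (denoising_start=0.7). That handoff is what makes this an "ensemble of expert denoisers" rather than two independent passes.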

@HuggingFaceDocBuilderDev commented Jul 14, 2023

The documentation is not available anymore as the PR was closed or merged.

@gkorepanov

Hi, is there already an SD-XL checkpoint whose UNet has 9 input channels? It seems no dedicated inpainting model was released for SD-XL, but without one the inpainting results are meaningless (I mean there is little to no semantic match between the inpainted regions and the existing regions in the generated images).
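(For context, here is a sketch of how to check a checkpoint's UNet input channels yourself, reusing the checkpoint name from the examples above. A dedicated inpainting UNet takes 9 input channels: 4 noisy latents + 4 masked-image latents + 1 mask, while the base SD-XL UNet takes 4.)

from diffusers import UNet2DConditionModel

# Inspect the input channel count without loading the whole pipeline
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", subfolder="unet"
)
print(unet.config.in_channels)  # 4 for the base model; a dedicated inpainting UNet reports 9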

@AmericanPresidentJimmyCarter (Contributor)

Is this the same method as InpaintLegacy in SD? There are now three inpainting methods for LDMs: the "InpaintLegacy" method, the RunwayML model with extra input channels, and PSLD. I think we should maintain consistent naming.

@AmericanPresidentJimmyCarter (Contributor)

And yes, I agree with @gkorepanov, the "InpaintLegacy" method is more or less useless.

@patrickvonplaten (Contributor Author)

StableDiffusionInpaintPipelineLegacy is deprecated and will be removed. Everything you were able to do with StableDiffusionInpaintPipelineLegacy you can now do with StableDiffusionInpaintPipeline.

In that sense there will only be one "true" StableDiffusionXLInpaintPipeline.
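(As a hedged sketch of that replacement path: the non-legacy pipeline also loads dedicated 9-channel checkpoints, such as the RunwayML SD 1.5 inpainting model used here purely for illustration.)

import torch
from diffusers import StableDiffusionInpaintPipeline

# StableDiffusionInpaintPipeline covers both the legacy use case and
# dedicated 9-channel inpainting UNets.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
)
pipe.to("cuda")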

@adhikjoshi

As an inpainting checkpoint isn't there, does it affect quality in general?

@patrickvonplaten (Contributor Author)

> As an inpainting checkpoint isn't there, does it affect quality in general?

It works pretty well for me for now; I recommend making sure to pass strength=0.7 or strength=0.8.

I think the checkpoint will, however, have problems when you want to replace the masked area with something very different from what was there before.
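(As a small illustration of that recommendation, reusing pipe, prompt, init_image, and mask_image from the vanilla example above:)

# Lower strength preserves more of the original masked content;
# 0.7-0.8 is the range recommended above.
image = pipe(
    prompt=prompt, image=init_image, mask_image=mask_image, num_inference_steps=50, strength=0.7
).images[0]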

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from diffusers.utils import load_image

pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)

Is it really intended to use the refiner model for general img2img? I've been trying to understand this (I've also seen it here, for example), but I think I am missing something. My understanding is that the refiner model is intended as a kind of de-noising and/or fidelity-increasing step, and it isn't good at generating the baseline content of the image. If that's correct, it feels like it'd perform poorly for img2img with lower strength values.

@patrickvonplaten (Contributor Author)

You can use both! The refiner might be better suited for images that already look like the prompt, which is the case here. We should maybe improve the docs after the official release.
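(For comparison, a sketch of the alternative the thread discusses: loading the base checkpoint, same model names as above, into the same img2img pipeline, which may suit edits that diverge further from the input image.)

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline

# Base model as the img2img backbone instead of the refiner
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-0.9", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")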


Gotcha - thanks for clarifying!

@@ -981,8 +981,6 @@ def __call__(
             generator,
             do_classifier_free_guidance,
         )
-        init_image = init_image.to(device=device, dtype=masked_image_latents.dtype)
@patrickvonplaten (Contributor Author)

This is actually never used and was a copy-paste bug, I think.

@patrickvonplaten merged commit b024ebb into main Jul 14, 2023
orpatashnik pushed a commit to orpatashnik/diffusers that referenced this pull request Aug 1, 2023

* Add more

* more

* up

* Get ensemble of expert denoisers working

* Fix code

* add tests

* up
@kashif deleted the add_inpaint_sd_xl branch September 11, 2023 19:07
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023 (same commits as above)
AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024 (same commits as above)