Description
Describe the bug
When using the LMS scheduler with the SDXL Img2Img pipeline, a lot of noise is left over in the image, especially when strength is close to 0. In other words, when the total number of performed steps is low (e.g. num_inference_steps=50 and strength=0.1), the resulting images are unusably noisy.
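To make "low number of performed steps" concrete, here is a minimal sketch of the step arithmetic. It assumes the pipeline trims the schedule the way the diffusers img2img pipelines generally do (run only the last int(num_inference_steps * strength) steps). The helper name img2img_step_window is hypothetical and not part of diffusers.

```python
def img2img_step_window(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (t_start, steps_run) for an img2img call.

    Assumption: the schedule is trimmed to its last
    int(num_inference_steps * strength) steps, capped at the full schedule.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    return t_start, num_inference_steps - t_start


# Settings from this report: 50 scheduled steps, strength 0.1.
t_start, steps_run = img2img_step_window(50, 0.1)
print(f"skip the first {t_start} steps, run {steps_run}")
```

Under that assumption, the settings in this report skip the first 45 scheduled steps and run only the last 5, which is the regime where the leftover noise shows up.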
Reproduction
Here's some code that first does a prompt-to-image generation and then an image-to-image pass on that result with strength=0.1. The image-to-image result looks like an intermediate latent, while the prompt-to-image result looks completely fine. This is reproducible with any input image; I just used a prompt-to-image generation because it was easier to share here.
from typing import cast

import torch
from diffusers import (
    LMSDiscreteScheduler,
    StableDiffusionXLImg2ImgPipeline,
    StableDiffusionXLPipeline,
)
from IPython.display import display  # built in inside a notebook; imported so the script also runs elsewhere
# Text-to-image pipeline
sdxl_model = cast(StableDiffusionXLPipeline, StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    revision="76d28af79639c28a79fa5c6c6468febd3490a37e",
)).to('cuda')
# Image-to-image pipeline loaded from the same checkpoint and revision
sdxl_img2img_model = cast(StableDiffusionXLImg2ImgPipeline, StableDiffusionXLImg2ImgPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    revision="76d28af79639c28a79fa5c6c6468febd3490a37e",
)).to('cuda')
common_config = {'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear'}
scheduler = LMSDiscreteScheduler(**common_config)
sdxl_model.scheduler = scheduler
sdxl_img2img_model.scheduler = scheduler
sdxl_model.watermark = None
generator = torch.Generator(device='cuda')
generator.manual_seed(12345)
params = {
    'prompt': ['evening sunset scenery blue sky nature, glass bottle with a galaxy in it'],
    # 'negative_prompt': ['text, watermark'],  # duplicate key; the empty value below is the one that takes effect
    "negative_prompt": [''],
    "num_inference_steps": 50,
    "guidance_scale": 7,
    "width": 1024,
    "height": 1024
}
sdxl_res = sdxl_model(**params, generator=generator, output_type='pil')
sdxl_img = sdxl_res.images[0]
display(sdxl_img)
img2img_params = {
    'prompt': ['evening sunset scenery blue sky nature, glass bottle with a galaxy in it'],
    # 'negative_prompt': ['text, watermark'],  # duplicate key; the empty value below is the one that takes effect
    "negative_prompt": [''],
    "num_inference_steps": 50,
    "guidance_scale": 7,
    "image": sdxl_img,
    "strength": 0.1
}
sdxl_img2img_res = sdxl_img2img_model(**img2img_params, generator=generator, output_type='pil')
display(sdxl_img2img_res.images[0])
Logs
No response
System Info
- diffusers version: 0.21.4
- Platform: Linux-5.4.0-163-generic-x86_64-with-glibc2.31
- Python version: 3.11.5
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Huggingface_hub version: 0.17.1
- Transformers version: 4.34.0
- Accelerate version: 0.22.0
- xFormers version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no