
LMS scheduler leaves a lot of noise leftover in the result image when used with SDXL Img2Img Pipeline #5630

Closed
@nhnt11

Description

Describe the bug

When using the LMS scheduler with the SDXL Img2Img pipeline, there is a lot of leftover noise in the result image, especially when strength is close to 0. In other words, when the total number of denoising steps actually performed is low (e.g. num_inference_steps=50 and strength=0.1), the resulting images are unusably noisy.
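For reference, this is roughly how the img2img pipeline turns strength into the number of denoising steps that actually run (a minimal sketch modeled on the pipeline's get_timesteps logic; the function name and stand-in timestep list below are illustrative, not the pipeline's actual code):

def effective_timesteps(scheduler_timesteps, num_inference_steps, strength):
    # Number of denoising steps that will actually be performed
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Index of the first timestep to run; all earlier (noisier) steps are skipped
    t_start = max(num_inference_steps - init_timestep, 0)
    return scheduler_timesteps[t_start:], num_inference_steps - t_start

timesteps = list(range(999, -1, -20))  # stand-in for scheduler.timesteps (50 values)
_, n_steps = effective_timesteps(timesteps, num_inference_steps=50, strength=0.1)
print(n_steps)  # 5 -- only a handful of LMS steps are left to remove the added noise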

Reproduction

Here's some code that first does a prompt-to-image generation, and then an image-to-image pass on that result with strength=0.1. The image-to-image result looks like an intermediate latent, while the prompt-to-image result looks completely fine. This is reproducible with any input image; I just used a p2i generation because it was easier to share here.

import torch
from typing import cast

from diffusers import LMSDiscreteScheduler, StableDiffusionXLImg2ImgPipeline, StableDiffusionXLPipeline

sdxl_model = cast(StableDiffusionXLPipeline, StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    revision="76d28af79639c28a79fa5c6c6468febd3490a37e",
)).to('cuda')
sdxl_img2img_model = cast(StableDiffusionXLImg2ImgPipeline, StableDiffusionXLImg2ImgPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    revision="76d28af79639c28a79fa5c6c6468febd3490a37e",
)).to('cuda')

# Swap in the same LMS scheduler for both pipelines
common_config = {'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear'}
scheduler = LMSDiscreteScheduler(**common_config)
sdxl_model.scheduler = scheduler
sdxl_img2img_model.scheduler = scheduler

sdxl_model.watermark = None
generator = torch.Generator(device='cuda')
generator.manual_seed(12345)

params = {
    'prompt': ['evening sunset scenery blue sky nature, glass bottle with a galaxy in it'],
    'negative_prompt': ['text, watermark'],
    "num_inference_steps": 50,
    "guidance_scale": 7,
    "width": 1024,
    "height": 1024
}
# Prompt-to-image pass; this result looks completely fine
sdxl_res = sdxl_model(**params, generator=generator, output_type='pil')
sdxl_img = sdxl_res.images[0]
display(sdxl_img)

img2img_params = {
    'prompt': ['evening sunset scenery blue sky nature, glass bottle with a galaxy in it'],
    'negative_prompt': ['text, watermark'],
    "num_inference_steps": 50,
    "guidance_scale": 7,
    "image": sdxl_img,
    "strength": 0.1
}

# Image-to-image pass at low strength; this result is unusably noisy
sdxl_img2img_res = sdxl_img2img_model(**img2img_params, generator=generator, output_type='pil')

display(sdxl_img2img_res.images[0])

Image-to-Image Result: (attached screenshot of the noisy output)
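As a sanity check (not something tested in the report above), the LMS scheduler can also be built from the pipeline's shipped scheduler config instead of the hand-written common_config, which keeps any config fields the dict above omits. A sketch of that variant:

from diffusers import LMSDiscreteScheduler

# Variant of the scheduler swap in the repro: derive the LMS config from the
# pipeline's shipped scheduler so fields not listed in common_config are kept.
# (Sanity-check sketch only; the repro above constructs LMSDiscreteScheduler
# directly from common_config.)
sdxl_model.scheduler = LMSDiscreteScheduler.from_config(sdxl_model.scheduler.config)
sdxl_img2img_model.scheduler = LMSDiscreteScheduler.from_config(
    sdxl_img2img_model.scheduler.config
)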

Logs

No response

System Info

  • diffusers version: 0.21.4
  • Platform: Linux-5.4.0-163-generic-x86_64-with-glibc2.31
  • Python version: 3.11.5
  • PyTorch version (GPU?): 2.1.0+cu121 (True)
  • Huggingface_hub version: 0.17.1
  • Transformers version: 4.34.0
  • Accelerate version: 0.22.0
  • xFormers version: not installed
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: no

Who can help?

@yiyixuxu @patrickvonplaten

Labels

bug (Something isn't working), scheduler, stale (Issues that haven't received updates)
