Adaptive Projected Guidance #9626


Closed · 3 commits

Conversation

@hlky (Contributor) commented Oct 9, 2024

What does this PR do?

This PR implements APG (Adaptive Projected Guidance) from Algorithm 1 in Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models.

Algorithm 1 is slightly modified to fold `project` into `normalized_guidance`; this simply reduces the number of methods that need to be copied between pipelines.

APG is added to StableDiffusionPipeline and StableDiffusionXLPipeline. The following parameters are introduced:

adaptive_projected_guidance (`bool`, *optional*):
    Use adaptive projected guidance from [Eliminating Oversaturation and Artifacts of High Guidance Scales
    in Diffusion Models](https://arxiv.org/pdf/2410.02416)
adaptive_projected_guidance_momentum (`float`, *optional*, defaults to `-0.5`):
    Momentum to use with adaptive projected guidance. Use `None` to disable momentum.
adaptive_projected_guidance_rescale_factor (`float`, *optional*, defaults to `15.0`):
    Rescale factor to use with adaptive projected guidance.

Default values are taken from the Stable Diffusion XL row of Table 10. The existing `eta` parameter is reused rather than adding a new one; per its docstring, `eta` is only used by `DDIMScheduler`, so this should be fine.
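To make the combined method concrete, here is a NumPy sketch of Algorithm 1 with `project` folded into `normalized_guidance`, wired to the three new parameters (momentum, rescale factor, and `eta` for the parallel component). This is an illustrative reconstruction, not the PR's actual diffusers code; shapes assume a `(batch, channels, height, width)` latent.

```python
import numpy as np

class MomentumBuffer:
    """Momentum-weighted running average of the guidance update
    (the `adaptive_projected_guidance_momentum` parameter, default -0.5)."""
    def __init__(self, momentum: float):
        self.momentum = momentum
        self.running_average = 0.0

    def update(self, value: np.ndarray) -> None:
        self.running_average = value + self.momentum * self.running_average

def normalized_guidance(pred_cond, pred_uncond, guidance_scale,
                        momentum_buffer=None, eta=1.0, norm_threshold=0.0):
    """Sketch of APG with `project` folded in. `norm_threshold` plays the role
    of `adaptive_projected_guidance_rescale_factor`."""
    diff = pred_cond - pred_uncond
    if momentum_buffer is not None:
        momentum_buffer.update(diff)
        diff = momentum_buffer.running_average
    if norm_threshold > 0:
        # rescale: clip the per-sample update norm to the rescale factor
        diff_norm = np.linalg.norm(diff.reshape(diff.shape[0], -1), axis=-1)
        scale = np.minimum(1.0, norm_threshold / diff_norm)
        diff = diff * scale.reshape(-1, 1, 1, 1)
    # folded-in `project`: split diff into components parallel and
    # orthogonal to the conditional prediction
    unit = pred_cond / np.linalg.norm(
        pred_cond.reshape(pred_cond.shape[0], -1), axis=-1
    ).reshape(-1, 1, 1, 1)
    parallel = np.sum(diff * unit, axis=(-1, -2, -3), keepdims=True) * unit
    orthogonal = diff - parallel
    update = orthogonal + eta * parallel
    return pred_cond + (guidance_scale - 1) * update
```

With `eta=1.0`, no momentum, and no rescaling, this reduces exactly to standard CFG, which is a useful sanity check.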

Fixes #9585

Example usage:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    variant="fp16",
    torch_dtype=torch.float16,
)
pipe.safety_checker = None
pipe.enable_model_cpu_offload()
pipe.enable_vae_tiling()
prompt = "A 4k dslr photo of a raccoon wearing an astronaut helmet, photorealistic."

# Baseline: plain CFG at a high guidance scale
generator = torch.Generator().manual_seed(694208027600)
image = pipe(prompt, guidance_scale=15, generator=generator).images[0]
image

# APG with the paper's SDXL defaults (Table 10)
generator = torch.Generator().manual_seed(694208027600)
image = pipe(
    prompt,
    guidance_scale=15,
    adaptive_projected_guidance=True,
    adaptive_projected_guidance_momentum=-0.5,
    adaptive_projected_guidance_rescale_factor=15.0,
    generator=generator,
).images[0]
image
```

CFG: [output image]

APG: [output image]

There's certainly some improvement; however, further testing would be beneficial to confirm the findings in the paper.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

cc @yiyixuxu @asomoza

@Msadat97 commented Oct 9, 2024

Thanks for your interest in our work! Please note that we always convert the output of the model to the denoised predictions (pred_x0) and compute the guidance there. We found that APG performs better when applied to the denoised predictions. We also have a discussion on this step in Section 5.2 (Figure 12).

Ideally, APG should be implemented like this:

```python
x0_pred_text = get_x0_from_noise(noise_pred_text, latents, t)
x0_pred_uncond = get_x0_from_noise(noise_pred_uncond, latents, t)
x0_guided = normalized_guidance(...)
noise_pred = get_noise_from_x0(x0_guided, latents, t)
```

(A better solution would be having a flag that allows the users to choose whether APG should be applied to the model output or the denoised prediction)
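For one common parameterization, the two conversion helpers in the pseudocode above can be sketched as follows. This assumes an epsilon-prediction model with a DDPM/DDIM-style `alphas_cumprod` schedule; the function names mirror the pseudocode but are hypothetical, not existing diffusers methods.

```python
import numpy as np

def get_x0_from_noise(noise_pred, latents, alpha_bar_t):
    # x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, solved for x0
    return (latents - np.sqrt(1.0 - alpha_bar_t) * noise_pred) / np.sqrt(alpha_bar_t)

def get_noise_from_x0(x0, latents, alpha_bar_t):
    # the inverse map: recover eps from x_t and the (guided) x0
    return (latents - np.sqrt(alpha_bar_t) * x0) / np.sqrt(1.0 - alpha_bar_t)
```

The two maps are exact inverses, so applying guidance in x0-space and converting back leaves an unguided prediction unchanged.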

@yiyixuxu (Collaborator) commented Oct 9, 2024

hi @hlky

Thanks for the PR! I'm a bit reluctant to add this to SD and SDXL, as these pipelines are already getting bloated and can become overwhelming for newcomers, especially given that this is not the only CFG alternative and won't be the last one.

@apolinario has suggested ideas to make guidance a separate "component" that you can swap out just like schedulers - I'm happy to explore that now! I will draft a PR soon, and we can work together and experiment with different ideas from there! This may also fit better in an experimental project that we are working on to make a composable pipeline that targets the community and company users and allows them to mix and match different features without writing much code. So, we will see!
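The "swappable component" idea could look something like the following minimal sketch. This is purely illustrative of the shape of such an interface, assuming nothing about the eventual diffusers design; `Guider` and `CFGGuider` are hypothetical names.

```python
from typing import Protocol
import numpy as np

class Guider(Protocol):
    """Illustrative only: a guidance strategy swappable like a scheduler."""
    def __call__(self, pred_cond: np.ndarray, pred_uncond: np.ndarray,
                 guidance_scale: float) -> np.ndarray: ...

class CFGGuider:
    # plain classifier-free guidance as one interchangeable strategy;
    # an APG guider would implement the same signature
    def __call__(self, pred_cond, pred_uncond, guidance_scale):
        return pred_uncond + guidance_scale * (pred_cond - pred_uncond)
```

A pipeline could then accept any `Guider` at call time instead of hard-coding each CFG alternative as new parameters.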

@hlky (Contributor, Author) commented Oct 9, 2024

@Msadat97 Thanks! I missed that section. Applying the guidance to the denoised predictions does indeed produce much better results:
[APG result image]

@yiyixuxu After fixing this locally to use denoised predictions, I'd have to agree it needs a little more than what's present here; specifically, we'd need some changes to the schedulers to allow easy conversion between noised and denoised predictions. Making guidance a separate component sounds like a great idea; I hope to see that soon, and I'd be happy to work with you on it and on any scheduler changes.

@xziayro commented Oct 11, 2024


@hlky Can you share/commit the change that led to this improvement? Thanks a lot.

@hlky force-pushed the adaptive-projected-guidance branch from a528e6c to c7e62c4 on October 11, 2024 09:31
@hlky (Contributor, Author) commented Oct 11, 2024

Certainly, I've pushed those changes. However, please note this will only work with some schedulers, like Euler. While it does run with 2nd-order schedulers like Heun and DPM2, it's not 100% correct, as the sigma used is wrong for the 2nd-order step, and it won't work at all for schedulers like DDIM. The issue linked above aims to add methods to handle this for each scheduler.
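For the Euler-style case described above, the noised/denoised conversion can be sketched roughly like this, assuming a sigma-parameterized scheduler with an epsilon-prediction model (`x_t = x0 + sigma * eps`). The helper names are hypothetical; for 2nd-order schedulers the sigma at the intermediate step differs, which is why this shortcut is not exact there.

```python
import numpy as np

def to_denoised(sample, noise_pred, sigma):
    # x0 = x_t - sigma * eps for a sigma-parameterized Euler-style scheduler
    return sample - sigma * noise_pred

def to_noise_pred(sample, x0, sigma):
    # inverse: recover the model-output (epsilon) form after guiding x0
    return (sample - x0) / sigma
```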


@yiyixuxu (Collaborator) commented Nov 6, 2024

Hi @hlky, just so you know, I made a Guider class in #9672, and this is the use case I have in mind to try next.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions bot added the stale label on Nov 30, 2024
@a-r-r-o-w added the wip and consider-for-modular-diffusers labels and removed the stale label on Nov 30, 2024
@hlky hlky mentioned this pull request Dec 10, 2024
@hlky hlky closed this Apr 15, 2025
@hlky hlky deleted the adaptive-projected-guidance branch April 15, 2025 12:30