fix compatibility issue between PAG and IP-adapter #8379
```diff
@@ -1172,6 +1172,10 @@ def __call__(
             self.do_classifier_free_guidance,
         )

+        # expand the image embeddings if we are using perturbed-attention guidance
+        for i in range(len(image_embeds)):
+            image_embeds[i] = image_embeds[i].repeat(prompt_embeds.shape[0] // latents.shape[0], 1, 1)
```
This throws an error with the PLUS versions of IP-Adapters, where each `image_embeds` entry is a 4D tensor.
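As the review notes, `.repeat(n, 1, 1)` hardcodes three repeat factors and therefore assumes a 3D tensor; the PLUS IP-Adapters produce 4D image embeddings, so the call fails. A rank-agnostic sketch of the expansion (NumPy's `tile` standing in for `torch.Tensor.repeat`; the helper name is hypothetical):

```python
import numpy as np

def expand_embeds(embeds: np.ndarray, multiplier: int) -> np.ndarray:
    """Repeat an embedding along the batch axis, whatever its rank.

    Builds a repeat spec such as (multiplier, 1, 1) for 3D inputs or
    (multiplier, 1, 1, 1) for 4D inputs, mirroring torch's
    `tensor.repeat(multiplier, *([1] * (tensor.ndim - 1)))`.
    """
    reps = (multiplier,) + (1,) * (embeds.ndim - 1)
    return np.tile(embeds, reps)

# 3D embeds (standard IP-Adapter): batch of 1 -> batch of 3
std = np.zeros((1, 4, 8))
assert expand_embeds(std, 3).shape == (3, 4, 8)

# 4D embeds (PLUS IP-Adapter): the same call works instead of erroring
plus = np.zeros((1, 2, 4, 8))
assert expand_embeds(plus, 3).shape == (3, 2, 4, 8)
```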
Thank you for finding the error! I found the cause of the error and, thanks to this, came up with a more elegant design. I will upload the revised code with the results soon!
Hi @asomoza. Thank you for your thorough review and awesome showcases! Since you detected errors when using IP-Adapter Plus, I dug into the code and found the cause of the problem: the IP-adapter image embedding was not properly copied, unlike the latents. I have attached the example code and the results. In the grid image, the IP-adapter scale increases to the right, and the PAG scale increases downward.

Example code and results

PAG only

PAG with CFG

It would be fantastic to see PAG easily usable in Diffusers in the near future. If there's anything I can assist with, please let me know. Thank you!
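The expansion factor used in the patch, `prompt_embeds.shape[0] // latents.shape[0]`, is simply the number of guidance branches stacked into the model batch: 2 for CFG alone or PAG alone, 3 for PAG combined with CFG. A minimal sketch of that arithmetic (function name hypothetical):

```python
def batch_multiplier(do_cfg: bool, do_pag: bool) -> int:
    """Number of copies of each latent in the model batch:
    the conditional branch, plus an unconditional branch for CFG,
    plus a perturbed-attention branch for PAG."""
    return 1 + int(do_cfg) + int(do_pag)

assert batch_multiplier(do_cfg=False, do_pag=False) == 1
assert batch_multiplier(do_cfg=True,  do_pag=False) == 2  # uncond + cond
assert batch_multiplier(do_cfg=False, do_pag=True)  == 2  # cond + perturb
assert batch_multiplier(do_cfg=True,  do_pag=True)  == 3  # uncond + cond + perturb
```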
What does this PR do?
I fixed the IP-adapter compatibility issue of the proposed PAG Mixin.
First, I found that `load_ip_adapter` overwrites the loaded `PAGIdentitySelfAttnProcessor2_0` with `AttnProcessor` or `AttnProcessor2_0` (see `uent.py`). So, I changed the code to keep the original processor if it is not a cross-attention processor.

Second, I also found that even if I use PAG only (not using CFG), the image embeddings of the IP-adapter are only applied to one of `noise_pred_uncond` or `noise_pred_perturb` (I can't remember the exact variable 😭). I checked this by changing `_apply_perturbed_attention_guidance` in `pag_utils.py` line 133 to use only `noise_pred_uncond` or `noise_pred_perturb`. If I do not use classifier-free guidance, the results should be the same, but the final results are completely different: one has the image condition applied, and the other does not. So, I copied `image_embeds` in `pipeline_stable_diffusion_xl.py`, similar to how latents are copied when using CFG (`latent_model_input = torch.cat([latents] * (prompt_embeds.shape[0] // latents.shape[0]))`). I'm not sure this is the right approach because I couldn't identify the exact location where `image_embeds` are only applied to single latents.

Finally, I changed `do_perturbed_attention_guidance` of `PAGMixin` to work consistently even when `pag_scale` is 0. This is because if we use `enable_pag(...)`, the attention processor is changed to `PAGIdentitySelfAttnProcessor2_0` even though `pag_scale` is 0. This causes errors when a single latent passes through `PAGIdentitySelfAttnProcessor2_0`, which expects copied and concatenated latents.

Example code and results
I attached the example code and the results. In the grid image, the IP adapter scale increases to the right, and the PAG scale increases downward.
IP-adapter + PAG (without CFG)
IP-adapter + PAG (with CFG)
In my opinion, using PAG reduces artifacts and improves the overall composition both with and without CFG. It is very encouraging to see that PAG works well with the IP-adapter.
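The latent-style duplication described in the PR body can be sketched as follows (NumPy standing in for torch, function name hypothetical): just as `latent_model_input = torch.cat([latents] * k)`, each image embedding is concatenated `k` times along the batch axis so that every guidance branch sees the image condition.

```python
import numpy as np

def duplicate_for_guidance(latents, prompt_embeds, image_embeds_list):
    """Mirror the latent duplication for IP-adapter image embeddings.

    k is the number of guidance branches, inferred from the prompt batch
    exactly as in `prompt_embeds.shape[0] // latents.shape[0]`.
    """
    k = prompt_embeds.shape[0] // latents.shape[0]
    latent_model_input = np.concatenate([latents] * k)  # torch.cat analogue
    expanded = [np.concatenate([e] * k) for e in image_embeds_list]
    return latent_model_input, expanded

latents = np.zeros((1, 4, 8, 8))
prompt_embeds = np.zeros((3, 77, 16))   # PAG + CFG -> 3 guidance branches
image_embeds = [np.zeros((1, 4, 16))]   # one IP-adapter embedding

lmi, embeds = duplicate_for_guidance(latents, prompt_embeds, image_embeds)
assert lmi.shape == (3, 4, 8, 8)        # latents copied 3x, like with CFG
assert embeds[0].shape == (3, 4, 16)    # image condition copied to match
```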