Add VisualCloze #11377
@lzyhha thanks for your contribution. Could you please add some code snippets and results to the thread?
Cc: @asomoza as well for testing if possible.
Hello, here are some test codes and their results: Model Card.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi, really nice and thank you for your work. Currently diffusers doesn't have
Okay, I will make the necessary modifications. Additionally, I noticed that the call method is not functioning properly in the documentation. Could you please help check the cause?
Co-authored-by: Álvaro Somoza <[email protected]>
Hello, we have removed einops from the code while ensuring the correctness of the results. @asomoza
Thanks for working on this. I just added a few minor comments.
I am unsure about self.denoise(). On one hand I see its value, but since it deviates from our usual pipeline implementations, I will defer the decision to the other reviewers.
Hello, we have made changes to the code based on your suggestions. @sayakpaul
Thank you! I left some further comments and I will let the other reviewers comment here.
@@ -89,6 +89,7 @@ The table below lists all the pipelines currently available in 🤗 Diffusers an
| [UniDiffuser](unidiffuser) | text2image, image2text, image variation, text variation, unconditional image generation, unconditional audio generation |
| [Value-guided planning](value_guided_sampling) | value guided sampling |
| [Wuerstchen](wuerstchen) | text2image |
| [VisualCloze](visualcloze) | text2image, image2image, subject driven generation, inpainting, style transfer, image restoration, image editing, [depth,normal,edge,pose]2image, [depth,normal,edge,pose]-estimation, virtual try-on, image relighting |
cc: @stevhliu do you think it's alright?
Co-authored-by: Sayak Paul <[email protected]>
Thanks for the awesome PR, and congrats on the release of your work! Just some minor changes are needed before we can proceed to merge.
# Generate the target image latents by denoising the initial noise
# using the provided prompts and guidance scale
cloze_latents = self.denoise(
@yiyixuxu This is quite different from our usual pipeline design, but there is benefit to having it here to reduce duplicated code. Could you review this part as well?
I think we should follow Kandinsky/Stable Cascade here and have two connected pipelines, since there are two denoising loops:
https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/stable_cascade
https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/kandinsky2_2
@a-r-r-o-w Hello, may I ask if we should modify the code to use two concatenated pipelines before merging? I noticed that the difference compared to Kandinsky/Stable Cascade is that we reuse the same network across the two denoising loops. If needed, I can make the modification.
@lzyhha Having two pipelines here will indeed be beneficial and help us in maintaining the implementation.
If I understand correctly, the first pipeline can be the "generation" pipeline and the second pipeline can be the "upsampling" pipeline. The second pipeline can take in the outputs of the first pipeline, similar to how it's done in Stable Cascade.
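To make the proposed split concrete, here is a minimal, framework-free sketch of two chained stages, each owning a single denoising loop, with the second stage consuming the output of the first. Every name in it (`GenerationStage`, `UpsamplingStage`, `CombinedPipeline`, `halve`) is hypothetical and chosen for illustration; it is not the diffusers API or the code in this PR, and the nearest-neighbour resize is only a placeholder for whatever upsampling the real pipeline performs.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a denoising network: (latents, step) -> latents.
Denoiser = Callable[[List[float], int], List[float]]


@dataclass
class GenerationStage:
    """First stage: one denoising loop at the base resolution."""
    denoiser: Denoiser
    num_steps: int = 4

    def __call__(self, noise: List[float]) -> List[float]:
        latents = list(noise)
        for step in range(self.num_steps):
            latents = self.denoiser(latents, step)
        return latents


@dataclass
class UpsamplingStage:
    """Second stage: upsample the first-stage output, then run its own loop."""
    denoiser: Denoiser
    scale: int = 2
    num_steps: int = 2

    def __call__(self, low_res: List[float]) -> List[float]:
        # Nearest-neighbour repeat as a placeholder for the real resize.
        latents = [v for v in low_res for _ in range(self.scale)]
        for step in range(self.num_steps):
            latents = self.denoiser(latents, step)
        return latents


@dataclass
class CombinedPipeline:
    """Thin wrapper that connects the two stages."""
    generation: GenerationStage
    upsampling: UpsamplingStage

    def __call__(self, noise: List[float]) -> List[float]:
        return self.upsampling(self.generation(noise))


def halve(latents: List[float], step: int) -> List[float]:
    """Toy denoiser: halve every value on each step."""
    return [v * 0.5 for v in latents]


pipe = CombinedPipeline(GenerationStage(halve), UpsamplingStage(halve))
result = pipe([16.0, 32.0])
print(result)  # [0.25, 0.25, 0.5, 0.5]
```

Keeping each loop in its own class means either stage can be tested or swapped on its own, which is the maintenance benefit raised in the review.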
@a-r-r-o-w OK, I'll make the revision in a few days.
@a-r-r-o-w Hello, I have completed the corresponding modifications and conducted testing. For the second stage, I directly used the FluxFillPipeline as the VisualClozeUpsamplingPipeline. We can move forward from here.
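The thread notes that, unlike Kandinsky/Stable Cascade, the same network is reused across both denoising loops. One way two separate pipeline objects can still avoid duplicating that network is to build the second from the first one's components. The sketch below illustrates the idea with toy classes only: `ToyTransformer` and `ToyPipeline` are hypothetical, and the `components` property here merely imitates the general pattern of exposing a pipeline's modules as a dictionary. It does not show how `FluxFillPipeline` or the VisualCloze pipelines are actually wired together.

```python
class ToyTransformer:
    """Hypothetical stand-in for the shared denoising network."""

    def __init__(self):
        self.calls = 0  # counts forward passes across all pipelines

    def __call__(self, x):
        self.calls += 1
        return x - 1


class ToyPipeline:
    """Hypothetical pipeline with a single denoising loop."""

    def __init__(self, transformer, num_steps):
        self.transformer = transformer
        self.num_steps = num_steps

    @property
    def components(self):
        # Expose the modules so another pipeline can be built from them.
        return {"transformer": self.transformer}

    def __call__(self, x):
        for _ in range(self.num_steps):
            x = self.transformer(x)
        return x


generation = ToyPipeline(ToyTransformer(), num_steps=3)
# The second stage receives the very same network object; no copy is made.
upsampling = ToyPipeline(**generation.components, num_steps=2)

out = upsampling(generation(10))
print(out)                                           # 5
print(upsampling.transformer is generation.transformer)  # True
```

Because both stages hold a reference to one object, the weights live in memory once even though the denoising loops are split across two classes.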
@bot /style
Thank you, the PR looks good to merge! I'll run the example scripts to verify and try to merge by tomorrow
@a-r-r-o-w Hello, I have upgraded Ruff in my environment to version 0.9.10 and resolved the errors from the previously failed workflows.
What does this PR do?
Add VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning, an in-context learning based universal image generation framework, along with corresponding tests and documentation.
Here are some test codes and their results: Model Card.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.