Flux Redux #9988

Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
black-forest-labs/FLUX.1-Redux-dev only supports image prompts, right?

@wsxwd yes
    pooled_prompt_embeds,
    _,
) = self.encode_prompt(
    prompt=[""] * batch_size,
Even though flux-dev-redux is an image-variation pipeline and does not take text input, the original implementation still passes an empty string through the text encoders to create the prompt embeds, which are then concatenated with the image embeds to form the model inputs.
However, the model itself does not seem to be controlled by text prompts at all: if you swap the empty prompt for a different one, the generation does not appear to be driven by the prompt itself, even though the output may differ slightly.
So for now we only do that if the text encoders are explicitly added to the redux pipeline; otherwise, we create zero prompt embeds. @asomoza, could you test it a bit and let me know if it affects the generation?
If the prompt embeds generated from an empty string turn out to be better, we can add prompt/prompt_embeds arguments to the redux pipeline; that way users can save the prompt embeds and pass them in, so they don't have to load the text encoders.
I added the scripts for both use cases in the PR description.
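The fallback described above can be sketched roughly as follows. This is a minimal illustration under assumed shapes, not the actual pipeline code: the `get_prompt_embeds` helper, the `encode_prompt` callback, and the hard-coded dimensions are all assumptions for the sake of the example.

```python
import torch


def get_prompt_embeds(batch_size, encode_prompt=None):
    """Return (prompt_embeds, pooled_prompt_embeds) for the redux pipeline.

    Hypothetical sketch: if text encoders are available, encode an empty
    string per sample (as in the original implementation); otherwise fall
    back to zero embeddings of the expected shape.
    """
    # Assumed Flux dims: T5 seq len 512 / hidden dim 4096, CLIP pooled dim 768.
    seq_len, dim, pooled_dim = 512, 4096, 768
    if encode_prompt is not None:
        # Text encoders were explicitly added: encode empty strings.
        return encode_prompt([""] * batch_size)
    # No text encoders loaded: zero embeds stand in for the empty prompt.
    prompt_embeds = torch.zeros(batch_size, seq_len, dim)
    pooled_prompt_embeds = torch.zeros(batch_size, pooled_dim)
    return prompt_embeds, pooled_prompt_embeds
```

Either way, the resulting prompt embeds get concatenated with the Redux image embeds before being fed to the transformer.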
Repeat the prompt a lot and it starts to impact things. The token count of the image space is just very large, so its importance heavily outweighs the smaller text input.
Can you share an example?
One was not provided by the user who reported this.
The changes look great and seem to be working (I haven't tried matching numerically against the original, but I believe you've covered it)! We can handle tests, docs, and the example usage docstring in the combined PR. Thanks!
logger = logging.get_logger(__name__)  # pylint: disable=invalid-name


EXAMPLE_DOC_STRING = """
The example needs updating with the correct pipeline usage.
Nice 👍🏽
So does the redux support text prompt input now?

Yes, it works with prompts. You need an attention bias to upweight the importance of the prompt tokens vs. the image tokens.
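One way such an upweighting could work is to add a constant bias to the attention logits of the text-token keys before the softmax, so the few prompt tokens are not drowned out by the much larger number of image tokens. The sketch below is only an illustration of that idea under assumed tensor shapes, not the actual Flux attention code; `biased_attention` and its parameters are hypothetical names.

```python
import torch


def biased_attention(q, k, v, num_text_tokens, text_bias=1.0):
    """Scaled dot-product attention with extra weight on text tokens.

    Hypothetical sketch: `text_bias > 0` raises the logits for the first
    `num_text_tokens` keys (assumed to be the prompt tokens), shifting
    attention mass toward the prompt relative to the image tokens.
    """
    # Standard scaled dot-product attention logits.
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    # Additive bias over the key axis: nonzero only for text tokens.
    bias = torch.zeros(k.shape[-2])
    bias[:num_text_tokens] = text_bias
    probs = torch.softmax(scores + bias, dim=-1)
    return probs @ v
```

With `text_bias=0.0` this reduces to plain attention; increasing it makes the output track the prompt more strongly, which matches the observation above that the prompt only starts to matter once its effective weight is boosted.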
part of #9985
This PR adds Flux Redux
TO-DO
I tested empty prompt vs. zero prompt embeds, and I think the results are similar; in that case, we can recommend running redux without text_encoders. Here are the results: left is with text_encoders, and right is without.
cc @asomoza here, can you test a little and let me know what you think?
- run with t5 (same as in the original impl, `prompt=""`)
- run without t5 (use zero prompt embeds)