
Flux Redux #9988


Closed
wants to merge 8 commits into from

Conversation

yiyixuxu
Collaborator

@yiyixuxu yiyixuxu commented Nov 22, 2024

part of #9985

This PR adds Flux Redux

TO-DO

  • test whether using all-zero prompt embeds makes a difference in the output; if not, we do not need T5 when Redux is used
  • docs and tests
  • test Flux Redux with all other pipelines, including the control and fill models that were just released (this will be in a different PR)

I tested an empty prompt vs. zero prompt embeds, and the results look similar. In that case, we can recommend running Redux without the text encoders. Here are the results: the left is with text_encoders and the right is without.

cc @asomoza here, can you test a little bit and let me know what you think

To use with T5 (same as in the original implementation, prompt=""):

# test 1
import torch
from PIL import Image

from diffusers import FluxPriorReduxPipeline, FluxPipeline

device = "cuda"
dtype = torch.bfloat16

repo_redux = "YiYiXu/yiyi-redux"
repo_base = "black-forest-labs/FLUX.1-dev"

pipe = FluxPipeline.from_pretrained(repo_base, torch_dtype=torch.bfloat16)
pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(
    repo_redux, 
    text_encoder=pipe.text_encoder,
    tokenizer=pipe.tokenizer,
    text_encoder_2=pipe.text_encoder_2,
    tokenizer_2=pipe.tokenizer_2,
    torch_dtype=dtype
)
pipe_prior_redux.to(device)

img_path = "/raid/yiyi/flux-new/assets/robot.webp"
image = Image.open(img_path).convert("RGB")

pipe_prior_output = pipe_prior_redux(image)


pipe.enable_model_cpu_offload() #save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power
# load precomputed initial latents (local file); fixing the starting noise keeps the two tests comparable
latents = torch.load("/raid/yiyi/flux-new/redux_latents.pt")
print(latents.shape)
image = pipe(
    guidance_scale=2.5,
    height=768,
    width=1360,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
    latents=latents,
    **pipe_prior_output,
).images[0]
image.save("yiyi_test_5_out.png")

# Clean up memory
del pipe
del pipe_prior_redux
import gc
gc.collect()
torch.cuda.empty_cache()

To run without T5 (use zero prompt embeds):

# test 2 (zero prompt embeds)

import torch
from PIL import Image
from diffusers import FluxPriorReduxPipeline, FluxPipeline

device = "cuda"
dtype = torch.bfloat16


repo_redux = "YiYiXu/yiyi-redux"
repo_base = "black-forest-labs/FLUX.1-dev"

pipe_prior_redux = FluxPriorReduxPipeline.from_pretrained(repo_redux, torch_dtype=dtype)
pipe_prior_redux.to(device)

img_path = "/raid/yiyi/flux-new/assets/robot.webp"
image = Image.open(img_path).convert("RGB")
pipe_prior_output = pipe_prior_redux(image)

pipe = FluxPipeline.from_pretrained(
    repo_base, 
    text_encoder=None,
    tokenizer=None,
    text_encoder_2=None,
    tokenizer_2=None,
    torch_dtype=torch.bfloat16
)
pipe.to(device)  # or use pipe.enable_model_cpu_offload() instead to save some VRAM
# load the same precomputed initial latents (local file) used in test 1 so the outputs can be compared
latents = torch.load("/raid/yiyi/flux-new/redux_latents.pt")
print(latents.shape)
image = pipe(
    guidance_scale=2.5,
    height=768,
    width=1360,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
    latents=latents,
    **pipe_prior_output,
).images[0]
image.save("yiyi_test_5_out_2.png")
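
As an optional sanity check of the "results are similar" claim (not part of the PR), here is a minimal sketch that compares the two saved outputs pixel-wise; it assumes both scripts above have been run so that yiyi_test_5_out.png and yiyi_test_5_out_2.png exist and have the same size:

# illustrative sanity check: pixel-wise difference between the two outputs
import numpy as np
from PIL import Image

with_te = np.asarray(Image.open("yiyi_test_5_out.png"), dtype=np.float32)
without_te = np.asarray(Image.open("yiyi_test_5_out_2.png"), dtype=np.float32)
print("mean absolute pixel difference:", np.abs(with_te - without_te).mean())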

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@wsxwd

wsxwd commented Nov 22, 2024

black-forest-labs/FLUX.1-Redux-dev only supports image prompts, right?

@yiyixuxu
Collaborator Author

@wsxwd yes

pooled_prompt_embeds,
_,
) = self.encode_prompt(
prompt=[""] * batch_size,
Collaborator Author

@yiyixuxu yiyixuxu Nov 22, 2024


Even though flux-dev-redux is an image variation pipeline and does not take text input, the original implementation still puts an empty string through the text encoders to create the prompt embeds, then concatenates them with the image embeds to create the model inputs.

However, the model itself does not seem to be controlled by text prompts at all, i.e., if you change the empty prompt to a different prompt, the generation does not seem to be affected by the prompt itself, even though the output might be slightly different.

So here, for now, we do that only if the text encoders are explicitly added to the Redux pipeline; otherwise, we create zero prompt embeds. @asomoza, could you test it a bit and let me know if it affects the generation?
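
For reference, the zero-prompt-embeds fallback amounts to something like the following sketch; the shapes are assumptions (T5-XXL hidden size 4096 for prompt_embeds, CLIP-L pooled size 768 for pooled_prompt_embeds), not the pipeline's actual code:

# illustrative sketch of the zero prompt embeds fallback (assumed shapes)
import torch

batch_size, max_sequence_length = 1, 512
prompt_embeds = torch.zeros(batch_size, max_sequence_length, 4096, dtype=torch.bfloat16)  # T5 stream
pooled_prompt_embeds = torch.zeros(batch_size, 768, dtype=torch.bfloat16)  # CLIP pooled stream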

If it turns out that the prompt embeds generated from an empty string are better, we can add a prompt/prompt_embeds argument to the Redux pipeline; that way users can save the prompt embeds and pass them as inputs, so they don't have to load the text encoders (a rough sketch of that idea follows below).

I added the scripts for both use cases in the PR description.
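
As a rough sketch of the save-and-reuse idea (not part of this PR; the file name empty_prompt_embeds.pt is just an example), the empty-prompt embeds could be computed once with FluxPipeline.encode_prompt and stored. Reusing them would then depend on the proposed prompt_embeds argument being added to the Redux pipeline:

# illustrative sketch: precompute and save the empty-prompt embeds once,
# so later Redux runs would not need to load the text encoders
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to("cuda")
prompt_embeds, pooled_prompt_embeds, _ = pipe.encode_prompt(
    prompt="", prompt_2=None, max_sequence_length=512
)
# example file name; these could later be passed to the Redux pipeline
# if a prompt_embeds argument is added as discussed above
torch.save(
    {"prompt_embeds": prompt_embeds.cpu(), "pooled_prompt_embeds": pooled_prompt_embeds.cpu()},
    "empty_prompt_embeds.pt",
)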

Contributor

@bghira bghira Nov 23, 2024


Repeat the prompt a lot and it starts to impact things. The token count of the image space is just very large, so its importance very much outweighs the smaller text input.

Collaborator Author


Can you share an example?

Contributor


One was not provided by the user who reported this.

Member

@a-r-r-o-w a-r-r-o-w left a comment


The changes look great and seem to be working (I haven't tried matching numerically to the original, but I believe you've covered it)! We can handle tests, docs, and the example usage docstring in the combined PR. Thanks!


logger = logging.get_logger(__name__) # pylint: disable=invalid-name

EXAMPLE_DOC_STRING = """
Member


Example needs update with correct pipeline usage

Collaborator

@DN6 DN6 left a comment


Nice 👍🏽

@asomoza
Member

asomoza commented Nov 22, 2024

With more complex images the difference is also minimal and the image quality is the same, so with the added benefit of using less VRAM, running without the T5 is the way to go if people want to use this.

[Image comparison grid: source / with TE / without TE results for cat_square, capyrabbit, and ip_image]

The different variations and image sizes work flawlessly too.

Base automatically changed from flux-new to main November 23, 2024 11:41
@yiyixuxu yiyixuxu closed this Nov 24, 2024
@yiyixuxu yiyixuxu deleted the flux-redux branch November 24, 2024 21:18
@lhjlhj11

So does Redux support text prompt input now?

@bghira
Contributor

bghira commented Jan 10, 2025

Yes, it works with prompts. You need attention bias to upweight the importance of the prompt vs. image tokens.
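
The attention-bias approach mentioned above is not shown in this thread. As a crude, editor-added stand-in (an assumption, not this PR's API and not attention bias), one could damp the image tokens the Redux prior appends after the text tokens before handing the embeds to the base pipeline. The sketch continues from the "test 1" script in the PR description (before its cleanup step) and assumes the prior concatenates 512 T5 text tokens first, followed by the image tokens:

# editor sketch: crude rebalancing by damping the image tokens from the Redux prior
# (not the attention-bias approach; the 512 split and 0.8 factor are assumptions)
prompt_embeds = pipe_prior_output.prompt_embeds.clone()
pooled_prompt_embeds = pipe_prior_output.pooled_prompt_embeds

text_len = 512  # assumed length of the T5 text portion (max_sequence_length)
prompt_embeds[:, text_len:] = prompt_embeds[:, text_len:] * 0.8  # hypothetical damping factor

image = pipe(
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    guidance_scale=2.5,
    height=768,
    width=1360,
    num_inference_steps=50,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("yiyi_test_rebalanced_out.png")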
