The Modular Diffusers #9672
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Very cool!
hi this is very interesting! I'm making a Python pipeline flow visual scripting tool that can auto-convert functions to visual nodes for fast and modular UI block demos. It is itself a pip package: https://pypi.org/project/nozyio/ I wanted to integrate diffusers with my flow nodes UI project but found it's not very modular. This PR may change that! Looking forward to seeing how this evolves. github: https://github.com/oozzy77/nozyio happy to connect!
@oozzy77 thanks! Do you want to join a Slack channel with me? If you want to experiment building something with this PR, I'm eager to hear your feedback and iterate based on that.
Hi, super willing to join a Slack channel with you! What's the workspace/channel I should join? Or you can invite me ***@***.***
@oozzy77 I sent an invite!
@a-r-r-o-w latents is one example
there is also, in your case, upscaling. I think it should be
open to suggestions/discussions
If it is an upscaler that takes latents as input, I think it is most convenient to use it on its own (like in a UI, it would be its own node/pipeline). Maybe make a map like this so it can be used to create different presets?

AUTO_UPSCALE_BLOCKS = OrderedDict([
    ("text_encoder", StableDiffusionXLTextEncoderStep),
    ("ip_adapter", StableDiffusionXLAutoIPAdapterStep),
    ("image_encoder", StableDiffusionXLAutoVaeEncoderStep),
    ("before_denoise", StableDiffusionXLAutoBeforeDenoiseStep),
    ("upscale", AutoUpscaleStep),
    ("denoise", StableDiffusionXLAutoDenoiseStep),
    ("decode", StableDiffusionXLAutoDecodeStep)
])

Make a preset for an end-to-end pipeline:

class SDXLAutoUpscaleBlocks(SequentialPipelineBlocks):
    block_classes = list(AUTO_UPSCALE_BLOCKS.values())
    block_names = list(AUTO_UPSCALE_BLOCKS.keys())

auto_pipe_upscaled = ModularPipeline.from_block(SDXLAutoUpscaleBlocks())

Or just the upscaler node used stand-alone:

upscaler_block = AUTO_UPSCALE_BLOCKS["upscale"]()
upscaler_node = ModularPipeline.from_block(upscaler_block)
Did a pass on the examples and the info shared instead of looking through the code too much (following @a-r-r-o-w's philosophy). Some comments first.
What if the user combines the inputs that are supported? How do we infer for such situations? For example, what if I provide a
This is very convenient! However, I wonder if the user could restrict the level of info they want to see. I got a bit lost after the args started appearing. Maybe something to consider in the later iterations. Misc:
Now, I tried to use the SDXL refiner:

import torch
from diffusers import ModularPipeline, StableDiffusionXLAutoPipeline
from diffusers.pipelines.components_manager import ComponentsManager
# Load models
components = ComponentsManager()
components.add_from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
# Create pipeline
pipe = ModularPipeline.from_block(StableDiffusionXLAutoPipeline())
pipe.update_states(**components.components)
pipe.to("cuda")
# Run inference
prompt = "A majestic lion jumping from a big stone at night"
height = 1024
width = 1024
output = pipe(prompt=prompt, height=height, width=width, num_inference_steps=30)
images = output.intermediates.get("images").images
latents = output.intermediates.get("latents")
print(f"{latents.shape=}")
images[0].save("output_modular.png")
# Clear things
del components, pipe
torch.cuda.empty_cache()
# Load refiner
components = ComponentsManager()
components.add_from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16)
# Create pipeline
pipe = ModularPipeline.from_block(StableDiffusionXLAutoPipeline())
pipe.update_states(**components.components)
pipe.to("cuda")
pipe.register_to_config(requires_aesthetics_score=False)
# Refine outputs.
output = pipe(prompt=prompt, image_latents=latents, num_inference_steps=30)
images = output.intermediates.get("images").images
images[0].save("output_refiner_modular.png")

It leads to:

ValueError: Model expects an added time embedding vector of length 2560, but a vector of 2816 was created. Please make sure to disable `requires_aesthetics_score` with `pipe.register_to_config(requires_aesthetics_score=False)` to make sure `target_size` (1024, 1024) is correctly used by the model.

Questions:
@sayakpaul this is really good feedback, thank you! For the refiner, you have to do

refiner_pipeline.update_states(**components.get(["text_encoder_2", "tokenizer_2", "vae", "scheduler"]), unet=components.get("refiner_unet"), force_zeros_for_empty_prompt=True, requires_aesthetics_score=True)

It is a bit verbose, as you can see, and that's the case in general for how we load the
open to better API, but probably not components because we also update config with it
open to suggestions on how to do better here, currently each
These are pretty important! We don't have to wait to improve in later iterations. Let's make it better now if it's possible.

Maybe we don't have to print out the docstring (the args etc.); we can direct the user to use
@sayakpaul

# Loading Models
components = ComponentsManager()
components.add_from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
# load just the refiner UNet (reuse the text encoders that are already in components)
+ refiner_unet = UNet2DConditionModel.from_pretrained(
+ "stabilityai/stable-diffusion-xl-refiner-1.0",
+ subfolder="unet",
+ torch_dtype=torch.float16
+ )
+ components.add("refiner_unet", refiner_unet)
# this makes sure all models stay on CPU until a forward pass is invoked; they may be moved back to CPU when more GPU memory is needed
+ components.enable_auto_cpu_offload()
# I think we don't need to do this:
# 1. the pipe's states are managed by `components`; if we want to delete everything, deleting the components in the components manager is enough
# 2. GPU memory is already managed by `components`, i.e. if we need more memory to run the refiner pipeline,
#    the other unet from the base repo will be offloaded to CPU.
#    We can also add methods to unload/delete models if more explicit control is needed, but overall I think we don't need to
#    delete a model unless we are certain we do not need it anymore
# 3. in this particular use case, we still need the text encoders, so I don't recommend deleting them and reloading them again here
- # Clear components and free CUDA memory before loading refiner
- del components, pipe
- torch.cuda.empty_cache()
-
- # Load complete refiner pipeline
- components = ComponentsManager()
- components.add_from_pretrained(
- "stabilityai/stable-diffusion-xl-refiner-1.0",
- torch_dtype=torch.float16
- )
# Refiner Pipeline Setup
refiner_pipeline = ModularPipeline.from_block(StableDiffusionXLAutoPipeline())
refiner_pipeline.update_states(
**components.get(["text_encoder_2", "tokenizer_2", "vae", "scheduler"]),
+ unet=components.get("refiner_unet"), # Using explicitly loaded UNet
- unet=components.get("unet"), # Using UNet from complete pipeline
force_zeros_for_empty_prompt=True,
requires_aesthetics_score=True
)

Click to expand the code:

import torch
from diffusers import ModularPipeline, StableDiffusionXLAutoPipeline, UNet2DConditionModel
from diffusers.pipelines.components_manager import ComponentsManager
# Load models
components = ComponentsManager()
components.add_from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
refiner_unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0", subfolder="unet", torch_dtype=torch.float16)
components.add("refiner_unet", refiner_unet)
components.enable_auto_cpu_offload()
# Create pipeline
pipe = ModularPipeline.from_block(StableDiffusionXLAutoPipeline())
pipe.update_states(**components.components)
pipe.to("cuda")
# Run inference
prompt = "A majestic lion jumping from a big stone at night"
height = 1024
width = 1024
output = pipe(prompt=prompt, height=height, width=width, num_inference_steps=30)
images = output.intermediates.get("images").images
latents = output.intermediates.get("latents")
print(f"{latents.shape=}")
images[0].save("output_modular.png")
# Create pipeline
refiner_pipeline = ModularPipeline.from_block(StableDiffusionXLAutoPipeline())
refiner_pipeline.update_states(
**components.get(["text_encoder_2", "tokenizer_2", "vae", "scheduler"]),
unet=components.get("refiner_unet"),
force_zeros_for_empty_prompt=True,
requires_aesthetics_score=True
)
refiner_pipeline.to("cuda")
# Refine outputs.
output = refiner_pipeline(prompt=prompt, image_latents=latents, num_inference_steps=30)
images = output.intermediates.get("images").images
images[0].save("output_refiner_modular.png")

Can you help me:
@sayakpaul
could be just
happy to explore this too, if you can share a POC that'd be great!
I think this is a valid assumption except for the situations where we don't have enough CPU RAM (48GBs might be low).
I think we could cover the refiner use case (and alike) under the theme of "reusing components between workflows". We could make it clear that to make the most out of reusing, it's recommended to first load all the components needed for the workflows users want to try out and keep them on CPU. Users will always have the option to load any ad-hoc component they may have forgotten in the beginning. If we can make this clear in the docs with examples, I think that should be enough. WDYT?
Yeah
Sure, happy to do that. I will branch off of this PR and try to open a PR. Would that work?
I finished testing and doing a PoC with the callbacks so I can update the step progress inside a UI. So I'm discussing here a question about the implementation, since we now have the

So I did this for the PoC to match the current implementation:

if data.callback_on_step_end is not None:
    callback_kwargs = {}
    for k in data.callback_on_step_end_tensor_inputs:
        callback_kwargs[k] = getattr(data, k)
    callback_outputs = data.callback_on_step_end(self, i, t, callback_kwargs)

    data.latents = callback_outputs.pop("latents", data.latents)
    data.prompt_embeds = callback_outputs.pop("prompt_embeds", data.prompt_embeds)
    data.added_cond_kwargs["text_embeds"] = callback_outputs.pop("text_embeds", data.added_cond_kwargs["text_embeds"])
    data.added_cond_kwargs["time_ids"] = callback_outputs.pop("time_ids", data.added_cond_kwargs["time_ids"])

but it could be something like this, which is better to me:

if data.callback_on_step_end is not None:
    data.callback_on_step_end(self, i, t, data)

what are your thoughts on this @yiyixuxu?
@asomoza

if data.callback_on_step_end is not None:
    data.callback_on_step_end(self, i, t, data)
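For illustration, a user-side callback under that proposed interface might look like this (a sketch only; it assumes the block passes its BlockState `data` straight through, as in the snippet above):

def on_step_end(pipeline, step_index, timestep, data):
    # report progress to a UI; tensors on `data` (e.g. data.latents) can also be
    # modified in place and will be picked up by the rest of the denoise loop
    print(f"step {step_index}, t={timestep}, latents shape: {tuple(data.latents.shape)}")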
@sayakpaul
It's looking really nice. Obviously there are a lot of intricacies here that I might not have picked up, so in my initial pass I just tried to focus on parts that felt a little unclear to me. I tried to break it down by the major components in Modular Diffusers.

Components Manager

My understanding here is that Components Manager is responsible for loading all models, schedulers, etc. into the Modular Pipeline and performing memory management for the loaded components. Where it felt a bit unintuitive was trying to determine which model repos can be used with

For example, this snippet will load all the components of the base SDXL pipeline into Components Manager:

# Load models
components = ComponentsManager()
components.add_from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

But if I want to load a ControlNet model via a model repo, I cannot. I have to create the object and add it to Components Manager via the

components.add_from_pretrained("xinsir/controlnet-union-sdxl-1.0", torch_dtype=torch.float16)

Since I'm familiar with the library, I realise that this is following our existing pipeline loading logic. But I think it might make sense to support adding individual model components through

PipelineBlock

My understanding here is that a

The

Let's say I want to add a PipelineBlock that has a model associated with the step. In the example below I want to create a block that automatically extracts a depth map from an image so that I can use it with a ControlNet. Can I add the depth model to the

class DepthBlock(PipelineBlock):
@property
def inputs(self) -> List[InputParam]:
control_image = InputParam(
name="control_image",
required=True,
)
return control_image
def __init__(self) -> None:
super().__init__()
# If I load a model in a pipeline block, is it possible to move it to the components manager?
self.depth_preprocessor = DepthPreprocessor.from_pretrained("depth-anything/Depth-Anything-V2-Large-hf")
def __call__(self, pipeline, state: PipelineState) -> PipelineState:
data = self.get_block_state(state)
control_image = data.control_image
depth_image = self.depth_preprocessor(control_image)
data.control_image = depth_image
self.add_block_state(data, state)
return pipeline, state

When initializing

class StableDiffusionXLDecodeLatentsStep(PipelineBlock):
    expected_components = ["vae"]
    model_name = "stable-diffusion-xl"

And then in the

def __init__(self):
    super().__init__()
    self.components["vae"] = None
    self.auxiliaries["image_processor"] = VaeImageProcessor(vae_scale_factor=8)

I found it a bit confusing as to why we are setting

Are the class attributes at the top of the block needed? As far as I can tell from skimming the code, we operate on block instances everywhere? Can we define PipelineBlocks in such a way? IMO it's a bit more Pythonic and makes the blocks feel a bit more like mini-pipelines. You can also add a type enforcement check on the components too. LMK if I'm missing something here.

class StableDiffusionXLTextEncoderStep(PipelineBlock):
def __init__(
self,
text_encoder=None,
text_encoder_2=None,
tokenizer=None,
tokenizer_2=None,
force_zeros_for_empty_prompt=True,
):
super().__init__()
# this would set expected_configs
self.register_to_config(force_zeros_for_empty_prompt=force_zeros_for_empty_prompt)
# this would set expected_components
self.register_component(
text_encoder=text_encoder,
text_encoder_2=text_encoder_2,
tokenizer=tokenizer,
tokenizer_2=tokenizer_2
)

Another thing I wasn't quite able to figure out is the exact scope of

Here let's say we are encoding a prompt. In the example:

(
data.prompt_embeds,
data.negative_prompt_embeds,
data.pooled_prompt_embeds,
data.negative_pooled_prompt_embeds,
) = pipeline.encode_prompt(
data.prompt,
data.prompt_2,
data.device,
1,
data.do_classifier_free_guidance,
data.negative_prompt,
data.negative_prompt_2,
prompt_embeds=None,
negative_prompt_embeds=None,
pooled_prompt_embeds=None,
negative_pooled_prompt_embeds=None,
lora_scale=data.text_encoder_lora_scale,
clip_skip=data.clip_skip,
)

The

Can I do something like my_modular_pipe.pipeline_block['text_encoder_step'].encode_prompt()?

I think Modular actually supports this workflow already. Is it also considered bad practice to set components as attributes in the blocks and use them that way? Something like:

@torch.no_grad()
def __call__(self, pipeline, state: PipelineState) -> PipelineState:
# Get inputs and intermediates
data = self.get_block_state(state)
self.check_inputs(pipeline, data)
prompt_embeds = self.text_encoder(data.prompt)

Regarding Auxiliaries, is there a strong reason to not have these objects just be considered components as well?

Auto Workflow

I am a little apprehensive about introducing Auto workflows in V1. IMO it's better to let users get

Modular Pipeline, Block State, Pipeline State

I like these a lot and I'm pretty much aligned on how they work. One small nit that is unrelated to the actual functionality (just putting it out here for consideration):

@torch.no_grad()
def __call__(self, pipeline, state: PipelineState) -> PipelineState:
# Get inputs and intermediates
data = self.get_block_state(state)

Obviously the work here is very extensive and I'm still playing around with it. LMK if I've misunderstood some concepts or if I should open PRs to try and clarify any of these points.
Thanks! This is super nice feedback! I'll address all of it, but I want to focus on PipelineBlock first because I think that's where most of the confusion comes from, and it indicates to me that this is where most work needs to be done to improve it! I just had enough time to think about these 2 aspects you mentioned: (1) the design choice of making pipeline blocks stateless and (2) the class attribute vs.

1. Stateless Design Choice

Yes, in the current design, PipelineBlocks (

I like to think there are two stages in Modular Diffusers: a composition stage and a runtime stage.
# Define the depth block you were working on
class DepthBlock(PipelineBlock):
...
# another one for canny images
class CannyBlock(PipelineBlock):
...
# Combine these two into one with conditional logic
class AutoControlInputBlock(AutoPipelineBlocks):
block_classes = [DepthBlock, CannyBlock]
block_names = ["depth", "canny"]
block_trigger_inputs = ["depth_image", "canny_image"]
# combine in sequential orders
class CompleteControlNetPipeline(SequentialPipelineBlocks):
block_classes = [AutoControlInputBlock, PrepareLatentBlock, DenoiseBlock, DecodeBlock]
block_names = ["control_input", "prepare", "denoise", "decode"] you can keep composing for as long as you want, but once you're done and you want to use it now, we enter the "Runtime Stage" and that's when the pipeline blocks become stateful
# Create Modularpipeline with the block you just made
controlnet_node = ModularPipeline.from_block(CompleteControlNetPipeline())
# Load models and components
controlnet_node.update_states(**components.components)
# Run inference
image = controlnet_node(control_image=my_image, prompt="a cat", output="images")

I made pipeline blocks stateless since model loading isn't needed during composition - it's only required at runtime. The design you proposed here would make pipeline blocks stateful. That means each pipeline block would need to manage model components itself, and you would have to load models into each pipeline block and then compose them somehow. It is a possible alternative design, but I think it would need a different system to support it and it is more complex.

class StableDiffusionXLTextEncoderStep(PipelineBlock):
def __init__(
self,
text_encoder=None,
text_encoder_2=None,
tokenizer=None,
tokenizer_2=None,
force_zeros_for_empty_prompt=True,
):
super().__init__()
# this would set expected_configs
self.register_to_config(force_zeros_for_empty_prompt=force_zeros_for_empty_prompt)
# this would set expected_components
self.register_component(
text_encoder=text_encoder,
text_encoder_2=text_encoder_2,
tokenizer=tokenizer,
tokenizer_2=tokenizer_2
)

2. Component Initialization and Class Attributes

About your comment on Component Initialization here:
I totally agree that it is very confusing that we have both

The class attributes
I like to think these class attributes

However, I don't think we need both the class attribute

I think it might be better to remove the

class DepthBlock(PipelineBlock):
expected_components = [
ComponentSpec(
name="depth_processor",
class_name=["depth_anything", "DepthPreprocessor"],
default_repo="depth-anything/Depth-Anything-V2-Large-hf"
)
]
@property
def inputs(self) -> List[InputParam]:
return [InputParam(
name="control_image",
required=True,
)]
def __call__(self, pipeline, state: PipelineState) -> PipelineState:
data = self.get_block_state(state)
depth_image = pipeline.depth_processor(data.control_image)
data.control_image = depth_image
self.add_block_state(data, state)
return pipeline, state

This way, we would also be able to support the use case you described here:
Currently, indeed, you would always have to add the models to

What do you think?
@DN6

Scope of Pipeline Block Methods

Regarding your questions about PipelineBlock scope and global pipeline methods, here:
Yes, you can define methods at the pipeline block level. Currently, we have two places where methods can live:

Components as attributes in blocks

Regarding this question:

Yes, with the current design, it would be bad practice, since pipeline blocks are stateless and all the model components should be managed at the global pipeline level and passed to each pipeline block at run time through the

If you think a stateful pipeline block design is more intuitive, I'd be happy to explore that with you too :) A few things to keep in mind if we want to explore the alternative stateful design:
Now for Auto Workflow: I agree it is probably not that important for our current diffusers users, but I consider it crucial for the UI use case. Since one of our goals is to eliminate the barrier between us and the UI community/professionals, I think it makes sense for us to release with it. Let me explain a bit!

Auto Workflow fits really well with how workflows are developed. Alvaro's guides (for example, https://huggingface.co/blog/OzzyGT/outpainting-differential-diffusion) give a pretty good sense of the process. It is usually iterative: the user does not necessarily know exactly what's needed in the beginning, so they start with something basic and gradually add/remove features and modify parts of the workflow until they get satisfactory results. Without auto workflows, they'd have to rebuild their workflow each time they want to try something different, which is not a very nice experience. With Auto Workflows (a node built with an auto workflow), they can pretty much stick to the same node and just change the input nodes as they need.

There is also the number of nodes. Comfy currently faces the challenge that there are too many nodes, and it's a bit overwhelming for users. Without auto workflows, we'd have the same issue; with auto workflows, we currently have about 5 nodes (prompt_encode, image_encode, decode, denoise, ip-adapter), so it is very manageable.

I think maybe we can have different guides targeting different user groups and only talk about auto workflows in the ones targeting UI users/professionals.
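As a rough illustration of the "same node, different inputs" idea (a sketch only: it assumes a ControlNet model has already been added to `components` and that `control_image` is a prepared conditioning image, reusing the auto pipeline from the examples above):

from diffusers import ModularPipeline, StableDiffusionXLAutoPipeline

auto_node = ModularPipeline.from_block(StableDiffusionXLAutoPipeline())
auto_node.update_states(**components.components)  # `components` as set up in the examples above

# first iteration: plain text-to-image, only a prompt
out = auto_node(prompt="a cat", num_inference_steps=30)

# later iteration: plug in a control image - the same node now takes the controlnet path
out = auto_node(prompt="a cat", control_image=control_image, num_inference_steps=30)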
I'm better understanding some of the things here after working with it for a bit. I'll try to provide some general thoughts and introduce some ideas I had when you were initially starting the modular diffusers development:

- No strong opinions on whether the PipelineBlocks should be stateful or not. We could ideally support both cases, similar to what's done in the Diffusers Hooks.

- Each PipelineBlock IMO should only contain a minimal implementation, i.e. bare-minimum single functionality, and not handle too many overlapping cases. For example, pipeline.encode_prompt and similar methods that operate on both unconditional and conditional branches should probably just support one option, prompt. The pipeline block can then invoke this method twice - once for positive, once for negative.

- If methods like encode_prompt could have a functional equivalent that can be invoked from outside a pipeline/pipeline-blocks, I think it would be super helpful for re-using in trainers instead of rolling our own minified implementation.

- We should consider batching vs. non-batching inference. Currently, with existing pipelines, we always batch negative and positive prompt embeds. This increases the memory required from intermediate activation states by 2x. For a low-VRAM mode, this might be an important consideration. (It's not very important. We can always add a BatchedInferenceHook or something to the model::forward to split the args/kwargs along the batch dimension.)

- Currently, the invocation mode is eager. Something like:

  I_AM_AT_BLOCK_X -> DO_I_HAVE_THE_INPUTS_I_REQUIRE? ---> YES ---> PERFORM_COMPUTATION_AND_PROCEED_TO_NEXT_BLOCK
                                                     |--> NO ----> RAISE_ERROR

  If we're somewhere deep inside the execution stage and then error out (maybe due to a missing input), all computation done till now is lost for a silly error. This is very frustrating (I've personally faced it multiple times during model integrations). IMO we have an opportunity to improve this (perhaps some time in the near future, if not for now). Since we already know that each block requires a set of inputs and outputs, regardless of what the other blocks do, we can topologically traverse the graph of blocks in reverse to determine if the inputs/outputs mapping is correct. If not, we can error out early and let the user know. If yes, we can proceed with computation. Note that this won't help identify issues in cases where we simply forgot to pass an input to a model or something, but it'll be helpful in block-development cases -- we're simply doing a static analysis to make sure that the invocation graph makes sense on a high level from the pipeline one creates. (See the sketch after this list.)

- Regarding _execution_device and dtype on the pipeline, I think we should remove them and instead infer device/dtype from the module that is going to do the processing next. For example, if my text encoder is in float16 but the transformer is in bfloat16, dtype on the pipeline will return float16. So the prompt embeds will be in a different dtype, leading to an error on the transformer unless we explicitly write some logic to handle this in the pipeline. Writing it per model block is prone to errors and can introduce lossy conversions, so it might be nice to keep the pipeline as a simple container holding modules, remove any notion of module state from it, and handle these device/dtype changes more centrally (like pipeline.prepare_inputs_for_model(model, inputs)). (Just my thoughts and not really at issue here.)

- Have we thought about how a pipeline created by a user can be shared via an exported file, say on the Hub, for ease of distribution?
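A rough sketch of that static check, just to make the idea concrete (the `inputs`/`intermediates_outputs` attribute names here are assumptions, not necessarily the actual API):

def validate_block_graph(blocks, user_inputs):
    # walk the blocks in execution order: everything a block requires must either be
    # provided by the user or produced as an intermediate by an earlier block
    available = set(user_inputs)
    for name, block in blocks.items():
        missing = {p.name for p in block.inputs if p.required} - available
        if missing:
            raise ValueError(f"Block '{name}' is missing required inputs: {sorted(missing)}")
        available |= {out.name for out in block.intermediates_outputs}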
(review thread on the CFGGuider class in the diff)
TLDR: let's try to separate algorithms from the modeling/pipeline implementations as much as possible. If we can decouple CFG nicely, I believe we would have a lot more composability and options for testing. Let's try to write these in a manner that works with existing pipelines too if we invoke __call__ with a guider object.
I like this design. For some time now, I've wanted to add support for different guidance techniques (STG, Perturbed Attention guidance, Energy-based CFG, Skip layer guidance, etc.) to all existing models/pipelines where applicable. As I'm working on something similar, I'll share some thoughts.
These techniques are independent of the model/pipeline, so it makes sense to me that we should not tie that logic too strongly to the pipelines. At the moment, our pipelines only accept parameters like guidance_scale, guidance_rescale, true_cfg_scale, and similar. This is not really scalable if we want composability while supporting the latest research techniques. So this design of being able to initialize "guiders" is super cool, since we can parameterize them however we want and since it's decoupled from the pipeline's __call__ and the model's forward itself.
To provide some more details of what I've been trying, this is some pseudo-code:
import inspect
from typing import Any, List, Optional, Union

import torch

from diffusers.hooks import HookRegistry, PerturbedAttentionGuidanceHook
class GuidanceMixin:
def register_modules(self, denoiser: torch.nn.Module, ...) -> None:
...
def unregister_modules(self, denoiser: torch.nn.Module, ...) -> None:
...
def prepare_inputs(self, **kwargs) -> Any:
parameters = inspect.signature(self._prepare_inputs).parameters
ignored_kwargs = {k for k in kwargs.keys() if k not in parameters}
input_kwargs = {k: v for k, v in kwargs.items() if k in parameters}
return self._prepare_inputs(**input_kwargs)
def __call__(self, **kwargs) -> Any:
parameters = inspect.signature(self.forward).parameters
ignored_kwargs = {k for k in kwargs.keys() if k not in parameters}
input_kwargs = {k: v for k, v in kwargs.items() if k in parameters}
return self.forward(**input_kwargs)
def _prepare_inputs(self, **kwargs) -> Any:
raise NotImplementedError
class ClassifierFreeGuidance(GuidanceMixin):
def __init__(self, scale: float) -> None:
self.scale = scale
def _prepare_inputs(self, latents: torch.Tensor, prompt_embeds: torch.Tensor, negative_prompt_embeds: Optional[torch.Tensor] = None, generator: Optional[torch.Generator] = None) -> torch.Tensor:
if self.scale > 1.0:
latents = torch.cat([latents, torch.zeros_like(latents).normal_(generator=generator)])
prompt_embeds = torch.cat([negative_prompt_embeds, prompt_embeds])
return {"latents": latents, "prompt_embeds": prompt_embeds}
def forward(self, x_uncond: torch.Tensor, x_cond: torch.Tensor) -> torch.Tensor:
return x_uncond + self.scale * (x_cond - x_uncond)
class PerturbedAttentionGuidance(GuidanceMixin):
def __init__(self, scale: float, cfg_scale: float, layers: Union[str, List[str]]) -> None:
self.scale = scale
self.cfg_scale = cfg_scale
self.layers = [layers] if isinstance(layers, str) else layers
def register_modules(self, denoiser: torch.nn.Module, ...) -> None:
for name, submodule in denoiser.named_modules():
if any(regex_match(name, layer_name) for layer_name in self.layers):
registry = HookRegistry.check_if_exists_or_initialize(submodule)
hook = PerturbedAttentionGuidanceHook()
registry.register_hook(hook)
def prepare_inputs(self, latents: torch.Tensor, prompt_embeds: torch.Tensor, negative_prompt_embeds: Optional[torch.Tensor] = None, generator: Optional[torch.Generator] = None) -> torch.Tensor:
num_additional_latents = (self.scale > 1.0) + (self.cfg_scale > 1.0)
if num_additional_latents > 0:
additional_latents = [torch.zeros_like(latents).normal_(generator=generator) for _ in range(num_additional_latents)]
latents = torch.cat([latents, *additional_latents])
... # Similarly handle prompt embeddings
return ...
def forward(self, x_uncond: torch.Tensor, x_cond: torch.Tensor) -> torch.Tensor:
...
from diffusers import FluxPipeline
from diffusers.guidance import ClassifierFreeGuidance, PerturbedAttentionGuidance
pipe = FluxPipeline.from_pretrained(...)
pipe.to("cuda")
cfg = ClassifierFreeGuidance(scale=7.0)
pag = PerturbedAttentionGuidance(scale=5.0, layers=["transformer_blocks\.(20|24)"])
cfg_output = pipe(..., guidance=cfg)
pag_output = pipe(..., guidance=pag)
In the existing pipelines, we will invoke the prepare_inputs and __call__ methods in a non-backwards-breaking manner. For the new modular diffusers, we can customize as required. As the guidance objects are lightweight to create, one can modify them on-the-fly, which would be super useful for UI cases and experimentation.
A pet peeve I have is needing to write additional attention processors for a method like PAG. Per-model processors are hard to maintain for all kinds of techniques available, with all kinds of permutations possible. This introduces limitations. Since we know that most modeling implementations use our Attention class, or at least follow similar naming conventions, one way of making this technique generally applicable is utilizing some sort of pre/post-forward hook that can perform the attention-branch shortcut required in PAG. This would be a single addition to address all models at once, because we follow certain strict naming conventions for layers.
As guiders can be stateful (for example, disabling guidance after a certain number of steps should remove the unconditional latent/prompt embeddings, or the guidance scale could be adaptive to the amount of low-frequency/high-frequency noise in the latent), I really like that we can do reset_guider. IMO, we should mark this as stateful/un-stateful using a flag like _is_stateful = True (similar to
@a-r-r-o-w feel free to take over the guider and refactor it :)
thanks @a-r-r-o-w! insightful as always:)
Not sure I understand what this means: "Each PipelineBlock IMO should only contain minimal implementation"; but based on the example you provided I think we are aligned. @hlky is working on a refactor of some of the pipeline methods to do just what you described. We are also considering making them class methods so they can be invoked outside of pipelines/pipeline blocks. Please take a look there and share your thoughts! #10726

The hook approach sounds good, or a special optimized denoising block. Feel free to explore it, and it can be part of the offloading strategy we offer on the components manager, e.g. if the user does not have enough memory, we automatically run non-batched inference.
Actually, we are already doing that. When we combine a few pipeline blocks in sequential order, we loop through the blocks to find out the overall

Basically, say we have 3 blocks we want to combine in sequential order.

Each block has

We look through the blocks,

Once we have this I think we can add a

Happy to work a bit more on this with you! I think it is a very important feature. I can start to add some test cases for the things that are already covered - and you can help to see if we missed any use cases. What do you think?
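A small sketch of that resolution loop as I understand it (the attribute names are illustrative, not the actual ones): an input required by a later block only becomes a user-facing input of the combined block if no earlier block produces it as an intermediate output.

def resolve_combined_inputs(blocks):
    produced, combined_inputs = set(), []
    for block in blocks:
        for inp in block.inputs:
            # not produced by an earlier block -> the user must provide it
            if inp.name not in produced and inp.name not in combined_inputs:
                combined_inputs.append(inp.name)
        produced |= {out.name for out in block.intermediates_outputs}
    return combined_inputs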
Agree, feel free to help refactor later!

I think we should share via the Hub but haven't explored that yet - feel free to take a stab at it!
@yiyixuxu
I think this is fine.
I do prefer that. One case I can think of is if I try to replace a step in a Pipeline, e.g. the encode prompt step, and then I try
I'm cool with having the components be managed at the global level. I agree it would get complicated if the components are attached to blocks. I think what I was trying to convey is something like this:

class MyPipelineBlock:
def __init__(self, vae):
self.register_component(vae=vae)
vae = AutoencoderKL.from_pretrained("..")
pipe = ModularPipeline.from_block(MyPipelineBlock(vae=vae))
# these would point to the same object
pipe.vae == pipe.blocks("vae_step").vae

But the ComponentSpec solution also works for this case 👍🏽 And the other points, such as not being able to use different VAEs at different steps, make sense.
I think nothing in the current design prevents stateful blocks though. I think we need a bit more clarity on how to create/manage them correctly, e.g. in the DepthBlock solution you proposed:

class DepthBlock(PipelineBlock):
expected_components = [
ComponentSpec(
name="depth_processor",
class_name=["depth_anything", "DepthPreprocessor"],
default_repo="depth-anything/Depth-Anything-V2-Large-hf"
)
]
@property
def inputs(self) -> List[InputParam]:
return [InputParam(
name="control_image",
required=True,
)]
def __call__(self, pipeline, state: PipelineState) -> PipelineState:
data = self.get_block_state(state)
depth_image = pipeline.depth_processor(data.control_image)
data.control_image = depth_image
self.add_block_state(data, state)
return pipeline, state

If we're creating the

Another thought I had was: suppose a user has created a custom Pipeline Block with a model component (config and weights) and custom code, and is hosting both on the Hub. Would we allow something like
Let me try to move all the global pipeline methods to the block level first - If I'm able to do that, I think maybe we won't need global pipeline methods at all, so things would be easier
I was thinking something similar to model_index, so we would re-use the same approach (potentially code) to handle it. A different library should not be a problem (like we do for text encoders from transformers); and if it's defined in the same file, maybe we can do something similar to what we do for the diffusers modules that we cannot import from the top level, like here. I haven't really thought it through, though; if you have good suggestions, let me know!!
I'm not sure how custom code would work for now (like how we share the code on the Hub and load it), so let me know if you have good ideas! But yes, I think we should add a loading method to pipeline blocks! We should also allow attaching a components manager to the pipeline blocks so that things loaded from a pipeline block are registered to the components manager. Adding a loading method on pipeline blocks will also support the use case you described earlier in #9672 (comment).
Instead of this (we could still support this in the future after we have the AutoModel class):

components.add_from_pretrained("xinsir/controlnet-union-sdxl-1.0", torch_dtype=torch.float16)

we could already do something like this:

control_block.add_from_pretrained(repo, components_manager=components)

because we will add class info for
@yiyixuxu Let me try to think a bit and put something together for this.
Hello everyone, I was just browsing around here. It seems like things are going really well! I have a personal project where I'm building a WebUI for diffusion (currently only supporting XL), and I was implementing everything you've already documented in the diffusers documentation, such as Text-to-Image, Image-to-Image, Inpainting, IP-Adapter, and ControlNet. I also implemented third-party flows such as InstantID, for example. In my project I ended up building a single pipeline that unifies all these functionalities, but that gives me a headache to maintain later on. I also needed to build a class to manage this pipeline, to handle situations where I need to pre-process the input before ingesting it into the pipeline (for example, when I'm in the interface and I change the prompt generation mechanism to LWP or Compel), as well as situations where I need to unload models and load others, remove optimizations and reapply them, etc. I believe this project will make the pipeline building process much easier. Congratulations on your work! In my free time I will try to simulate situations that I have already implemented in this new model to see how easy or difficult it is to use and understand the limitations.
Hi, I would like to share some changes I made to my local version in order to reduce the verbosity a bit. I took the use of the refiner as an example. This is just a proposal; I have provisionally implemented everything in a subclass of ComponentsManager called ExpandedComponentsManager.

Direct access to intermediates' attributes

First, I added a __getattr__ method to PipelineState so that it is possible to access the intermediates' attributes directly by name.

def __getattr__(self, name):
    if name in self.intermediates:
        value = self.intermediates[name]
        return value
    if name in self.inputs:
        value = self.inputs[name]
        return value
    raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{name}'")

So instead of using this:

images = output.get_intermediate("images").images
latents = output.get_intermediate("latents")

if you prefer, you can use this:

images = output.images
latents = output.latents

Expanding the add_from_pretrained factory method of ComponentsManager

I expanded the add_from_pretrained factory method to load individual modules directly, as well as to allow third-party modules to be loaded into the Components Manager. I created an Enum to fix the component types and make them more accessible; any new component needs to be added to this list.

class ComponentType(Enum):
"""
Component Type List
"""
CONTROLNET = "controlnet"
CONTROLNET_GUIDER = "controlnet_guider"
CONTROLNET_PREPROCESSOR = "controlnet_preprocessor"
FEATURE_EXTRACTOR = "feature_extractor"
GUIDER = "guider"
IMAGE_ENCODER = "image_encoder"
SCHEDULER = "scheduler"
TEXT_ENCODER = "text_encoder"
TEXT_ENCODER_2 = "text_encoder_2"
TOKENIZER = "tokenizer"
TOKENIZER_2 = "tokenizer_2"
UNET = "unet"
VAE = "vae" Usage e.g: manager = ExpandedComponentsManager()
manager.add_from_pretrained("SG161222/RealVisXL_V5.0", torch_dtype=torch.float16)
manager.add_from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0",
class_name="UNet2DConditionModel",
prefix="refiner",
component_type=ComponentType.UNET,
subfolder="unet",
variant="fp16",
torch_dtype=torch.float16)
manager.add_from_pretrained("lllyasviel/Annotators", class_name=LineartAnimeDetector, component_type=ComponentType.CONTROLNET_PREPROCESSOR) # example of third-party component Factory method for the pipelineI created a factory method for the pipeline to centralize, to be responsible for creating and injecting components into the new pipeline automatically. If there is a need to specify components, as in the case of the refiner, they must be passed in the method's components attribute. The refiner's unet can be passed directly with the prefix, the method will already handle cases with a prefix and assign them to the appropriate component type. base_pipeline = manager.create_pipeline(StableDiffusionXLAutoPipeline)
refiner_pipeline = manager.create_pipeline(StableDiffusionXLAutoPipeline,
components=["text_encoder_2", "tokenizer_2", "vae", "scheduler", "refiner_unet"],
force_zeros_for_empty_prompt=True,
requires_aesthetics_score=True)

Full reproduction

import torch
from enum import Enum
from typing import Any, Dict, List, Optional, Type, Union
from diffusers import ModularPipeline, StableDiffusionXLAutoPipeline
import diffusers
from diffusers.pipelines.components_manager import ComponentsManager
from diffusers.utils import logging
from controlnet_aux import LineartAnimeDetector
logger = logging.get_logger(__name__) # pylint: disable=invalid-name
class ComponentType(Enum):
"""
Component Type List
"""
CONTROLNET = "controlnet"
CONTROLNET_GUIDER = "controlnet_guider"
CONTROLNET_PREPROCESSOR = "controlnet_preprocessor"
FEATURE_EXTRACTOR = "feature_extractor"
GUIDER = "guider"
IMAGE_ENCODER = "image_encoder"
SCHEDULER = "scheduler"
TEXT_ENCODER = "text_encoder"
TEXT_ENCODER_2 = "text_encoder_2"
TOKENIZER = "tokenizer"
TOKENIZER_2 = "tokenizer_2"
UNET = "unet"
VAE = "vae"
class ExpandedComponentsManager(ComponentsManager):
"""Manages pipeline components and scoped states."""
def __init__(self):
super().__init__()
def create_pipeline(
self,
pipeline_block,
components: Optional[List[str]] = None,
**default_kwargs
) -> ModularPipeline:
"""
Creates a pre-configured ModularPipeline from a pipeline block.
Args:
pipeline_block: The pipeline block class (e.g., StableDiffusionXLAutoPipeline) or instance.
components (Optional[List[str]]): List of component names to load. If a name includes a prefix
(e.g., "refiner_unet"), the prefix is stripped, and the base name (e.g., "unet") is matched
against expected_components. If no prefix is present (e.g., "text_encoder"), the full name
is used as-is. Overrides expected_components if provided.
**default_kwargs: Additional components or configuration overrides (e.g., force_zeros_for_empty_prompt).
Returns:
ModularPipeline: The configured pipeline instance.
Notes:
- If a component name like "refiner_unet" is provided, it is loaded into the "unet" slot of
expected_components.
- Names without a prefix (e.g., "text_encoder") are treated as direct matches for expected_components.
- If "unet" is not explicitly provided and exists in the manager, it is used as a fallback.
"""
# Instantiate the pipeline block if it's a class
if isinstance(pipeline_block, type):
pipe = ModularPipeline.from_block(pipeline_block())
else:
pipe = ModularPipeline.from_block(pipeline_block)
# Initialize components dictionary
components_dict: Dict[str, Any] = {}
# Determine which components to load
if components is not None:
# Process explicitly specified components
for comp_name in components:
if "_" in comp_name:
# Split into potential prefix and base name
prefix, base_name = comp_name.rsplit("_", 1)
# Check if base_name matches an expected component
if base_name in pipe.expected_components:
if comp_name in self.components:
components_dict[base_name] = self.get([comp_name])[comp_name]
else:
logger.warning(f"Component '{comp_name}' not found in ComponentsManager.")
else:
# Treat as a full name if base_name isn't an expected component
if comp_name in self.components:
components_dict[comp_name] = self.get([comp_name])[comp_name]
else:
logger.warning(f"Component '{comp_name}' not found in ComponentsManager.")
else:
# No underscore, use the name directly
if comp_name in self.components:
components_dict[comp_name] = self.get([comp_name])[comp_name]
else:
logger.warning(f"Component '{comp_name}' not found in ComponentsManager.")
else:
# Default to expected_components from the pipeline block
components_dict = self.get([key for key in pipe.expected_components if key in self.components])
# Apply any additional kwargs (overrides or extras)
components_dict.update(default_kwargs)
pipe.update_states(**components_dict)
return pipe
def add_from_pretrained(
self,
pretrained_model_name_or_path: str,
class_name: Union[str, Type] = "DiffusionPipeline",
component_type: Optional[ComponentType] = None,
prefix: Optional[str] = None,
**kwargs
) -> None:
"""
Load components from a pretrained model and add them to the ComponentsManager.
Args:
pretrained_model_name_or_path (str): The path to a pretrained model directory or a model identifier
from the Hugging Face model hub (e.g., "SG161222/RealVisXL_V5.0").
class_name (str or Type, optional): The name of the pipeline class (e.g., "DiffusionPipeline") or
the class itself to instantiate the model. Defaults to "DiffusionPipeline". If a string, it is
resolved from the `diffusers` module; if a class, it is used directly.
component_type (ComponentType, optional): Type of the component to determine its name if the loaded
module has no `components` attribute. Takes precedence over `component_name`.
prefix (str, optional): Prefix to prepend to all component names loaded from this model. If provided,
components will be named as "{prefix}_{component_name}". Must be a non-empty string.
**kwargs: Additional keyword arguments passed to the `from_pretrained` method of the pipeline class.
Returns:
None
Raises:
ValueError: If `class_name` is a string not found in `diffusers`, or if `prefix` is invalid.
AttributeError: If the loaded module does not support `from_pretrained`.
"""
# Resolve the pipeline class
if isinstance(class_name, str):
try:
class_module = getattr(diffusers, class_name)
except AttributeError:
raise ValueError(f"Class '{class_name}' not found in the 'diffusers' module.")
elif isinstance(class_name, type):
class_module = class_name
else:
raise ValueError(f"'class_name' must be a string or a class, got {type(class_name)}.")
# Load the pretrained model
module = class_module.from_pretrained(pretrained_model_name_or_path, **kwargs)
# Validate prefix
if prefix is not None:
if not isinstance(prefix, str) or not prefix.strip():
raise ValueError("'prefix' must be a non-empty string.")
# Helper function to generate component name with optional prefix
def get_component_name(base_name: str) -> str:
return f"{prefix}_{base_name}" if prefix else base_name
# Module has no components attribute (single component)
if not hasattr(module, "components"):
name = component_type.value if component_type and component_type.value else None
if not name:
raise ValueError("'component_type' must be provided when it is a single component.")
component_name = get_component_name(name)
if component_name not in self.components:
self.add(component_name, module)
else:
logger.warning(
f"Component '{component_name}' already exists in ComponentsManager and will not be added. "
f"To overwrite, remove it first with remove('{component_name}'), or use a different prefix."
)
return
# Module has components (e.g., a pipeline with multiple parts)
for name, component in module.components.items():
if component is None:
continue
component_name = get_component_name(name)
if component_name not in self.components:
self.add(component_name, component)
else:
logger.warning(
f"Component '{component_name}' already exists in ComponentsManager and will not be added. "
f"To overwrite, remove it first with remove('{component_name}'), "
f"or use a different prefix (e.g., prefix='{prefix}_2' if prefix='{prefix}')."
)
# The process starts here
manager = ExpandedComponentsManager()
manager.add_from_pretrained("SG161222/RealVisXL_V5.0", torch_dtype=torch.float16)
manager.add_from_pretrained("stabilityai/stable-diffusion-xl-refiner-1.0",
class_name="UNet2DConditionModel",
prefix="refiner",
component_type=ComponentType.UNET,
subfolder="unet",
variant="fp16",
torch_dtype=torch.float16)
#manager.add_from_pretrained("lllyasviel/Annotators", class_name=LineartAnimeDetector, component_type=ComponentType.CONTROLNET_PREPROCESSOR) # example of third-party component
manager.enable_auto_cpu_offload()
# Create pipelines
base_pipeline = manager.create_pipeline(StableDiffusionXLAutoPipeline)
refiner_pipeline = manager.create_pipeline(StableDiffusionXLAutoPipeline,
components=["text_encoder_2", "tokenizer_2", "vae", "scheduler", "refiner_unet"],
force_zeros_for_empty_prompt=True,
requires_aesthetics_score=True)
prompt = "A majestic lion jumping from a big stone at night"
height = 1024
width = 1024
# First Step - Base Pipeline
output = base_pipeline(prompt=prompt, height=height, width=width, num_inference_steps=30)
output.images[0][0].save("step1_output.png")
image_latents = output.latents
# Second Step - Refiner
output = refiner_pipeline(prompt=prompt, image_latents=image_latents, num_inference_steps=30)
output.images[0][0].save("step2_output.png") Other ideas I was testing:State scope: Shared, Session, Unique.
But I don't know if the applicability of this would be useful. In fact, I was trying to automatically inject the outputs of the base pipeline as input to the refiner pipeline. In this case, image_latents. But for this, it would be necessary to have a fixed list of expected inputs and expected outputs, and they would need to have the same name. It couldn't be "latents" for output and "image_latents" for input. I think this could break the idea of AutoPipeline since we don't want to restrict the possible inputs and outputs. But that was just a delusion of mine, I can't remember now any other situation besides the refiner that could pass these inputs automatically. |
Getting Started with Modular Diffusers
With Modular Diffusers, we introduce a unified pipeline system that simplifies how you work with diffusion models. Instead of creating separate pipelines for each task, Modular Diffusers lets you:
Write Only What's New: You won't need to rewrite the entire pipeline from scratch. You can create pipeline blocks just for your new workflow's unique aspects and reuse existing blocks for existing functionalities.
Assemble Like LEGO®: You can mix and match blocks in flexible ways. This allows you to write dedicated blocks for specific workflows, and then assemble different blocks into a pipeline that can be used more conveniently for multiple workflows. Here we will walk you through how to use a pipeline like this that we built with Modular Diffusers! In later sections, we will also go over how to assemble and build new pipelines!
Quick Start with StableDiffusionXLAutoPipeline
Auto Workflow Selection
The pipeline automatically adapts to your inputs:
- prompt: text-to-image
- image input: image-to-image
- image and mask_image: inpainting
- control_image: controlnet
Auto Documentation

We care a great deal about documentation here at Diffusers, and Modular Diffusers carries this mission forward. All our pipeline blocks come with complete docstrings that automatically compose as you build your pipelines. This means you can:
- inspect your pipeline
- see an example of the output
- use get_execution_blocks to see which blocks will run for your inputs/workflow; for example, if you want to run a text-to-image controlnet workflow, you can do this (see the sketch below)
- see the docstring relevant to your inputs/workflow
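For example (the exact call signature here is an assumption and may differ in the final API):

# which blocks would a text-to-image + controlnet run execute?
print(auto_pipe.get_execution_blocks("prompt", "control_image"))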
Advanced Workflows
Once you've created the auto pipeline, you can use it for different features as long as you add the required components and pass the required inputs.
Here is an example you can run for a more complex workflow using controlnet/IP-Adapter/Lora/PAG
check out more usage examples here
test1: complete testing script for `StableDiffusionXLAutoPipeline`
Modular Setup
StableDiffusionXLAutoPipeline is a very convenient preset; just like LEGO sets, you can break it down, reassemble, and rearrange the pipeline blocks however you want. A more modular setup would look like this:

With this setup, you precompute embeddings and reuse them across different denoise backends or with different inference parameters such as guidance_scale and num_inference_steps, or use different schedulers. You can modify your workflow by simply adding/removing/swapping blocks without recomputing the entire pipeline over and over again.

Check out the full example script here.
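As a rough sketch of what such a split can look like (block and intermediate names here follow the ones used elsewhere in this thread, but are assumptions rather than the final API):

# a dedicated text-encoder node: run it once, keep the embeddings around
text_node = ModularPipeline.from_block(StableDiffusionXLTextEncoderStep())
text_node.update_states(**components.get(["text_encoder", "text_encoder_2", "tokenizer", "tokenizer_2"]))
text_state = text_node(prompt="A majestic lion jumping from a big stone at night")
prompt_embeds = text_state.intermediates.get("prompt_embeds")

# the cached embeddings can then be fed to separate denoise/decode nodes with
# different guidance_scale / num_inference_steps / schedulers, without re-encoding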
test2: modular setup
This is the full testing script I used for more configuration, including inpainting/refiner/union controlnet/APGtest3: modular setup with IPAdapter
Developer Guide: Building with Modular Diffusers
Core Components Overview
The Modular Diffusers architecture consists of four main components:
ModularPipeline
The main interface for creating and running modular pipelines. Unlike traditional pipelines, you don't write it from scratch - it builds itself from pipeline blocks! Example usage:
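A usage sketch condensed from the examples earlier in this thread:

import torch
from diffusers import ModularPipeline, StableDiffusionXLAutoPipeline
from diffusers.pipelines.components_manager import ComponentsManager

components = ComponentsManager()
components.add_from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)

pipe = ModularPipeline.from_block(StableDiffusionXLAutoPipeline())  # built from a block preset
pipe.update_states(**components.components)                         # attach models and configs
pipe.to("cuda")

output = pipe(prompt="A majestic lion jumping from a big stone at night", num_inference_steps=30)
image = output.intermediates.get("images").images[0]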
PipelineBlock
The fundamental building block, similar to a mellon/comfy node. Each block:
- __call__(pipeline, state) -> (pipeline, state)
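For illustration, a bare-bones custom block following the DepthBlock pattern shown earlier in this thread (PipelineBlock and InputParam are the classes from this PR; the grayscale conversion is just a stand-in for real preprocessing):

class GrayscaleBlock(PipelineBlock):
    @property
    def inputs(self):
        return [InputParam(name="image", required=True)]

    def __call__(self, pipeline, state):
        data = self.get_block_state(state)    # the block's view of the pipeline state
        data.image = data.image.convert("L")  # do the actual work of this step
        self.add_block_state(data, state)     # write the results back
        return pipeline, state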
MultiPipelineBlocks
Combines multiple blocks into a bigger one! These combined blocks behave just like single blocks - with their own inputs, outputs, and components, but they are able to handle more complex workflows!
We have two types of MultiPipelineBlocks available, you can use them to combine individual blocks into ready-to-use sets (Like LEGO® presets!)
SequentialPipelineBlocks
AutoPipelineBlocks
AutoPipelineBlocks makes the complex if... else... logic in your code disappear! With this, you can write blocks for specific use cases to keep your code paths clean, and use AutoPipelineBlocks to combine blocks into convenient presets that provide a better user experience :) The ControlNetDenoiseStep step will be dispatched when "control_image" is passed by the user; otherwise, it will run the default DenoiseStep.
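A sketch of what such a preset looks like, mirroring the AutoControlInputBlock example earlier in this thread (how the fallback/default step is declared is an assumption here):

class AutoDenoiseStep(AutoPipelineBlocks):
    block_classes = [ControlNetDenoiseStep, DenoiseStep]
    block_names = ["controlnet_denoise", "denoise"]
    # run the controlnet step only when the user passes control_image,
    # otherwise fall back to the plain denoise step
    block_trigger_inputs = ["control_image", None]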
PipelineState and BlockStates

PipelineState and BlockStates manage dataflow between and inside blocks; they make debugging really easy! Feel free to print them out at any time to get an overview of all the shapes/types/values of your pipeline/block states.

Differential Diffusion Example
Here we'll show you a new way to build with Modular Diffusers. Let's look at implementing a Differential Diffusion pipeline (https://differential-diffusion.github.io/) as an example. It is, in a sense, an image-to-image workflow, so we can start with the preset of pipeline blocks we used to build our current img2img pipeline (IMAGE2IMAGE_BLOCKS) and see how we can build this new pipeline with them!

It seems like we can reuse the "text_encoder", "ip_adapter", "image_encoder", "input", "prepare_add_cond" and "decode" steps from the img2img workflow out of the box. The "set_timesteps" step in Differential Diffusion is the same as the one we use for text-to-image (i.e. it does not take a strength parameter), so we just use StableDiffusionXLSetTimestepsStep. It uses a different denoising method, so we will need to write a new "denoise" step, and the "prepare_latents" step is also a little bit different, so we will write a new one too.

Here are the changes needed to create the Differential Diffusion version of these blocks:
- StableDiffusionXLImg2ImgPrepareLatentsStep:
- StableDiffusionXLDenoiseStep step: we remove the inpaint-related logic and add diff-diff specific logic

That's all there is to it! Once you've made these 2 diff-diff blocks, you can create a preset (a pre-assembled set of blocks), build your pipeline from it, and use it:
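A hedged sketch of the assembly step (the SDXLDiffDiff* names are placeholders for the two new blocks, not the actual class names in this PR):

# start from the img2img preset and swap in the two new blocks
DIFFDIFF_BLOCKS = IMAGE2IMAGE_BLOCKS.copy()
DIFFDIFF_BLOCKS["set_timesteps"] = StableDiffusionXLSetTimestepsStep   # text-to-image timesteps, no strength
DIFFDIFF_BLOCKS["prepare_latents"] = SDXLDiffDiffPrepareLatentsStep    # new block (placeholder name)
DIFFDIFF_BLOCKS["denoise"] = SDXLDiffDiffDenoiseStep                   # new block (placeholder name)

class SDXLDiffDiffPipelineBlocks(SequentialPipelineBlocks):
    block_classes = list(DIFFDIFF_BLOCKS.values())
    block_names = list(DIFFDIFF_BLOCKS.keys())

dd_pipe = ModularPipeline.from_block(SDXLDiffDiffPipelineBlocks())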
Complete Example: Implementing Differential Diffusion Pipeline
Diffusers as seen in nodes
coming up soon....
Next Steps