How to generate two images simultaneously? #11609

adha9990 · 2025-05-25T20:04:35Z

adha9990
May 25, 2025

Hi, I’m a student researching few-shot defect image generation using Stable Diffusion. I’m trying to generate both defect images and corresponding defect masks simultaneously.

In the DualAnoDiff project, the team achieved this functionality by modifying the older version of train_dreambooth_lora.py and LoRAAttnProcessor2_0 in the Diffusers library. Specifically, they adjusted the LoRAAttnProcessor2_0 code to allow the simultaneous generation of two outputs: a defective image and a defect mask.

Problem

However, in the latest version of the Diffusers library, LoRAAttnProcessor2_0 has been simplified to just a pass implementation, I understand that Diffusers now use PEFT to manage LoRA,but I am unsure of how to adapt or extend the LoRAAttnProcessor2_0 in the new version to achieve the same dual output functionality.

Request

Could you provide guidance on how to modify the new LoRAAttnProcessor2_0 in the Diffusers library so that it can generate both the defective image and the defect mask? Alternatively, if there are any recommended examples or modifications to achieve this, I would greatly appreciate your input.

A code example would be incredibly helpful.

Thank you for your help!

Here is the error report from my test: I referred to pipeline_stable_diffusion.py doubled the dimensions, but an error occurred during prediction in self.unet. I know the error is due to incorrect dimensions, but I'm not sure how to modify it to generate two images.

nohup: ignoring input
05/25/2025 19:36:57 - INFO - __main__ - Distributed environment: DistributedType.MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: bf16

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'prediction_type', 'sample_max_value', 'dynamic_thresholding_ratio', 'clip_sample_range', 'variance_type', 'rescale_betas_zero_snr', 'timestep_spacing', 'thresholding'} was not found in config. Values will be initialized to default values.
05/25/2025 19:36:57 - INFO - __main__ - Distributed environment: DistributedType.MULTI_GPU  Backend: nccl
Num processes: 2
Process index: 1
Local process index: 1
Device: cuda:1

Mixed precision type: bf16

{'force_upcast', 'mid_block_add_attention', 'scaling_factor', 'use_quant_conv', 'latents_std', 'latents_mean', 'use_post_quant_conv', 'shift_factor'} was not found in config. Values will be initialized to default values.
All model checkpoint weights were used when initializing AutoencoderKL.

All the weights of AutoencoderKL were initialized from the model checkpoint at ./models/stable-diffusion-2-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use AutoencoderKL for predictions without further training.
{'time_embedding_act_fn', 'class_embeddings_concat', 'cross_attention_norm', 'resnet_out_scale_factor', 'projection_class_embeddings_input_dim', 'addition_embed_type_num_heads', 'conv_out_kernel', 'addition_embed_type', 'dropout', 'only_cross_attention', 'time_embedding_dim', 'num_class_embeds', 'num_attention_heads', 'transformer_layers_per_block', 'timestep_post_act', 'resnet_time_scale_shift', 'attention_type', 'time_embedding_type', 'conv_in_kernel', 'encoder_hid_dim', 'addition_time_embed_dim', 'upcast_attention', 'reverse_transformer_layers_per_block', 'time_cond_proj_dim', 'class_embed_type', 'mid_block_only_cross_attention', 'resnet_skip_time_act', 'mid_block_type', 'encoder_hid_dim_type'} was not found in config. Values will be initialized to default values.
All model checkpoint weights were used when initializing UNet2DConditionModel.

All the weights of UNet2DConditionModel were initialized from the model checkpoint at ./models/stable-diffusion-2-base.
If your task is similar to the task the model of the checkpoint was trained on, you can already use UNet2DConditionModel for predictions without further training.
05/25/2025 19:36:59 - INFO - __main__ - ***** Running training *****
05/25/2025 19:36:59 - INFO - __main__ -   Num examples = 18
05/25/2025 19:36:59 - INFO - __main__ -   Num batches each epoch = 9
05/25/2025 19:36:59 - INFO - __main__ -   Num Epochs = 112
05/25/2025 19:36:59 - INFO - __main__ -   Instantaneous batch size per device = 1
05/25/2025 19:36:59 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 2
05/25/2025 19:36:59 - INFO - __main__ -   Gradient Accumulation steps = 1
05/25/2025 19:36:59 - INFO - __main__ -   Total optimization steps = 1000

Steps:   0%|          | 0/1000 [00:00<?, ?it/s]
Steps:   0%|          | 1/1000 [00:00<11:15,  1.48it/s]
Steps:   0%|          | 1/1000 [00:00<11:15,  1.48it/s, loss=0.011, lr=2e-5]
Steps:   0%|          | 2/1000 [00:00<06:57,  2.39it/s, loss=0.011, lr=2e-5]
Steps:   0%|          | 2/1000 [00:00<06:57,  2.39it/s, loss=0.136, lr=2e-5]
Steps:   0%|          | 3/1000 [00:01<05:32,  3.00it/s, loss=0.136, lr=2e-5]
Steps:   0%|          | 3/1000 [00:01<05:32,  3.00it/s, loss=0.113, lr=2e-5]
Steps:   0%|          | 4/1000 [00:01<04:52,  3.40it/s, loss=0.113, lr=2e-5]
Steps:   0%|          | 4/1000 [00:01<04:52,  3.40it/s, loss=0.0096, lr=2e-5]
Steps:   0%|          | 5/1000 [00:01<04:30,  3.68it/s, loss=0.0096, lr=2e-5]
Steps:   0%|          | 5/1000 [00:01<04:30,  3.68it/s, loss=0.097, lr=2e-5] 
Steps:   1%|          | 6/1000 [00:01<04:15,  3.88it/s, loss=0.097, lr=2e-5]
Steps:   1%|          | 6/1000 [00:01<04:15,  3.88it/s, loss=0.0461, lr=2e-5]
Steps:   1%|          | 7/1000 [00:02<04:06,  4.03it/s, loss=0.0461, lr=2e-5]
Steps:   1%|          | 7/1000 [00:02<04:06,  4.03it/s, loss=0.083, lr=2e-5] 
Steps:   1%|          | 8/1000 [00:02<04:00,  4.12it/s, loss=0.083, lr=2e-5]
Steps:   1%|          | 8/1000 [00:02<04:00,  4.12it/s, loss=0.104, lr=2e-5]
Steps:   1%|          | 9/1000 [00:02<03:47,  4.36it/s, loss=0.104, lr=2e-5]
Steps:   1%|          | 9/1000 [00:02<03:47,  4.36it/s, loss=0.0234, lr=2e-5]{'image_encoder'} was not found in config. Values will be initialized to default values.


Loading pipeline components...:   0%|          | 0/6 [00:00<?, ?it/s]�[A{'timestep_spacing', 'prediction_type'} was not found in config. Values will be initialized to default values.
Loaded scheduler as PNDMScheduler from `scheduler` subfolder of ./models/stable-diffusion-2-base.
Instantiating AutoencoderKL model under default dtype torch.bfloat16.
{'force_upcast', 'mid_block_add_attention', 'scaling_factor', 'use_quant_conv', 'latents_std', 'latents_mean', 'use_post_quant_conv', 'shift_factor'} was not found in config. Values will be initialized to default values.
All model checkpoint weights were used when initializing AutoencoderKL.

All the weights of AutoencoderKL were initialized from the model checkpoint at ./models/stable-diffusion-2-base/vae.
If your task is similar to the task the model of the checkpoint was trained on, you can already use AutoencoderKL for predictions without further training.
Loaded vae as AutoencoderKL from `vae` subfolder of ./models/stable-diffusion-2-base.
Loaded feature_extractor as CLIPImageProcessor from `feature_extractor` subfolder of ./models/stable-diffusion-2-base.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of ./models/stable-diffusion-2-base.

Loading pipeline components...: 100%|██████████| 6/6 [00:00<00:00, 69.75it/s]
05/25/2025 19:37:01 - INFO - __main__ - Running validation... 
Generating 4 images.
prompt: a vfx with sks
prompt_fg: sks

{'use_karras_sigmas', 'euler_at_final', 'solver_order', 'solver_type', 'use_beta_sigmas', 'thresholding', 'use_exponential_sigmas', 'use_flow_sigmas', 'flow_shift', 'use_lu_lambdas', 'lambda_min_clipped', 'prediction_type', 'sample_max_value', 'variance_type', 'rescale_betas_zero_snr', 'timestep_spacing', 'lower_order_final', 'algorithm_type', 'dynamic_thresholding_ratio', 'final_sigmas_type'} was not found in config. Values will be initialized to default values.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/train_dreambooth_lora.py", line 1573, in <module>
[rank0]:     main(args)
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/train_dreambooth_lora.py", line 1499, in main
[rank0]:     images = log_validation(
[rank0]:              ^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/train_dreambooth_lora.py", line 164, in log_validation
[rank0]:     image = pipeline(**pipeline_args, generator=generator).images
[rank0]:             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 1029, in __call__
[rank0]:     noise_pred = self.unet(
[rank0]:                  ^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 818, in forward
[rank0]:     return model_forward(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/accelerate/utils/operations.py", line 806, in __call__
[rank0]:     return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]:                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
[rank0]:     return func(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/diffusers/models/unets/unet_2d_condition.py", line 1214, in forward
[rank0]:     sample, res_samples = downsample_block(
[rank0]:                           ^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/diffusers/models/unets/unet_2d_blocks.py", line 1270, in forward
[rank0]:     hidden_states = attn(
[rank0]:                     ^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/diffusers/models/transformers/transformer_2d.py", line 427, in forward
[rank0]:     hidden_states = block(
[rank0]:                     ^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/wtlee4070s/bluestar/diffusion-model-test/diffusers/models/attention.py", line 558, in forward
[rank0]:     hidden_states = attn_output + hidden_states
[rank0]:                     ~~~~~~~~~~~~^~~~~~~~~~~~~~~
[rank0]: RuntimeError: The size of tensor a (8192) must match the size of tensor b (4096) at non-singleton dimension 1

Steps:   1%|          | 9/1000 [00:02<05:00,  3.30it/s, loss=0.0234, lr=2e-5]
[rank0]:[W525 19:37:02.018928834 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
W0525 19:37:03.081000 115018 site-packages/torch/distributed/elastic/multiprocessing/api.py:900] Sending process 115072 closing signal SIGTERM
E0525 19:37:03.145000 115018 site-packages/torch/distributed/elastic/multiprocessing/api.py:874] failed (exitcode: 1) local_rank: 0 (pid: 115071) of binary: /home/wtlee4070s/bluestar/diffusion-model-test/.env/bin/python3.12
Traceback (most recent call last):
  File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 50, in main
    args.func(args)
  File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1189, in launch_command
    multi_gpu_launcher(args)
  File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/accelerate/commands/launch.py", line 815, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/distributed/run.py", line 883, in run
    elastic_launch(
  File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 139, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/wtlee4070s/bluestar/diffusion-model-test/.env/lib/python3.12/site-packages/torch/distributed/launcher/api.py", line 270, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train_dreambooth_lora.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2025-05-25_19:37:03
  host      : wtlee4070s
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 115071)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

karanlvm · 2025-06-04T06:53:49Z

karanlvm
Jun 4, 2025

Hey @adha9990 ,

I think in the latest version of diffusers, LoRAAttnProcessor2_0 has been reduced to just a stub:

class LoRAAttnProcessor2_0:
    def __init__(self):
        pass

All the LoRA logic is now handled through PEFT (peft library), so modifying this class no longer affects attention behavior or output shape. That’s why adding mask generation here won’t work anymore.

If you would still like to output both a defective image and a mask, I think you could subclass UNet2DConditionModel and add a small mask_head CNN to the forward pass:

from diffusers.models.unet_2d_condition import UNet2DConditionModel
import torch.nn as nn

class UNet2DConditionWithMask(UNet2DConditionModel):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.mask_head = nn.Sequential(
            nn.Conv2d(self.config.block_out_channels[-1], 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),
            nn.Sigmoid()
        )

    def forward(self, *args, **kwargs):
        result = super().forward(*args, **kwargs)
        x = result["sample"] if isinstance(result, dict) else result
        mask = self.mask_head(x)
        return {"sample": x, "mask": mask}

And you could use it like this -

unet = UNet2DConditionWithMask.from_pretrained(
    "./models/stable-diffusion-2-base/unet", torch_dtype=torch.bfloat16
)

pipeline = StableDiffusionPipeline.from_pretrained(
    "./models/stable-diffusion-2-base",
    unet=unet,
    torch_dtype=torch.bfloat16
)

and outputs through -

out = pipeline(prompt="a vfx with sks")
image = out.images[0]
mask = out["mask"][0].detach().cpu().numpy()

Let me know if this works!

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

How to generate two images simultaneously? #11609

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

How to generate two images simultaneously? #11609

Uh oh!

adha9990 May 25, 2025

Problem

Request

Replies: 1 comment

Uh oh!

karanlvm Jun 4, 2025

adha9990
May 25, 2025

karanlvm
Jun 4, 2025