
Fix compatibility with pipeline when loading model with device_map on single gpu #10390


Merged: 2 commits merged into main from remove-force-hook-from-model-init on Jan 8, 2025

Conversation


@SunMarc SunMarc commented Dec 26, 2024

What does this PR do?

This PR fixes a device issue with the pipeline when we load a diffusers model separately with `device_map` in the single-GPU case. We can't move the whole pipeline to a device because the diffusers model has hooks on it (since we set `force_hooks=True`), and the following check raises an error:

        def module_is_sequentially_offloaded(module):
            if not is_accelerate_available() or is_accelerate_version("<", "0.14.0"):
                return False

            return hasattr(module, "_hf_hook") and (
                isinstance(module._hf_hook, accelerate.hooks.AlignDevicesHook)
                or hasattr(module._hf_hook, "hooks")
                and isinstance(module._hf_hook.hooks[0], accelerate.hooks.AlignDevicesHook)
            )

        # .to("cuda") would raise an error if the pipeline is sequentially offloaded, so we raise our own to make it clearer
        pipeline_is_sequentially_offloaded = any(
            module_is_sequentially_offloaded(module) for _, module in self.components.items()
        )

The model shouldn't need to have hooks in the single-GPU case.

Two issues that need to be solved in a follow-up PR:

  • The `module_is_sequentially_offloaded` check was initially there for sequentially offloaded models, but we shouldn't allow moving a model that has an `AlignDevicesHook` in general. So maybe we can rename the function and change the error message?
  • Right now this only works in the single-GPU case; to fix the multi-GPU case, maybe we can just raise a warning that we won't move this specific module instead of an error? I'm also fine with suggesting `reset_device_map()` if the goal is to put all models on the same device.

To reproduce:

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

model_id = "black-forest-labs/Flux.1-Dev"
dtype = torch.bfloat16

transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    torch_dtype=dtype,
    device_map="auto",
)
if hasattr(transformer, "hf_device_map"):
    print(transformer.hf_device_map)

pipe = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16).to("cuda")
prompt = "A cat holding a sign that says hello world"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.0, generator=torch.Generator().manual_seed(42)).images[0]
image.save("test_3_out.png")

@@ -937,7 +935,6 @@ def from_pretrained(cls, pretrained_model_name_or_path: Optional[Union[str, os.P
 offload_folder=offload_folder,
 offload_state_dict=offload_state_dict,
 dtype=torch_dtype,
-force_hooks=force_hook,
Member Author

These are not needed because we actually remove all hooks and dispatch the model again in the pipeline logic.

Member

If the model were to be used independently of the pipeline, would removing this be sensible?

Member Author

Only if the user used `device_map` with only one GPU. Should be fine, to be honest. The user actually expects to be able to move the model without any issues if the model is dispatched on only one GPU.

Comment on lines 414 to 420

is_pipeline_device_mapped = self.hf_device_map is not None and len(self.hf_device_map) > 1
if is_pipeline_device_mapped:
    raise ValueError(
        "It seems like you have activated a device mapping strategy on the pipeline which doesn't allow explicit device placement using `to()`. You can call `reset_device_map()` first and then call `to()`."
    )

Member Author

Moved up because otherwise it would trigger the `pipeline_is_sequentially_offloaded` error first when passing `device_map` to the pipeline.


@sayakpaul sayakpaul left a comment (Member)

Thanks, Marc!


@sayakpaul
Member

sayakpaul commented Dec 26, 2024

@SunMarc regarding

Right now this only works in the single-GPU case; to fix the multi-GPU case, maybe we can just raise a warning that we won't move this specific module instead of an error? I'm also fine with suggesting `reset_device_map()` if the goal is to put all models on the same device.

This is the fix that I am suggesting:

diff --git a/src/diffusers/pipelines/pipeline_utils.py b/src/diffusers/pipelines/pipeline_utils.py
index a504184ea..393dc83c8 100644
--- a/src/diffusers/pipelines/pipeline_utils.py
+++ b/src/diffusers/pipelines/pipeline_utils.py
@@ -388,6 +388,7 @@ class DiffusionPipeline(ConfigMixin, PushToHubMixin):
 
         device = device or device_arg
         pipeline_has_bnb = any(any((_check_bnb_status(module))) for _, module in self.components.items())
+        is_any_pipeline_component_device_mapped = any(getattr(module, "hf_device_map", None) is not None for _, module in self.components.items()) 
 
         # throw warning if pipeline is in "offloaded"-mode but user tries to manually set to GPU.
         def module_is_sequentially_offloaded(module):
@@ -411,7 +412,7 @@ class DiffusionPipeline(ConfigMixin, PushToHubMixin):
             module_is_sequentially_offloaded(module) for _, module in self.components.items()
         )
         if device and torch.device(device).type == "cuda":
-            if pipeline_is_sequentially_offloaded and not pipeline_has_bnb:
+            if not is_any_pipeline_component_device_mapped and pipeline_is_sequentially_offloaded and not pipeline_has_bnb:
                 raise ValueError(
                     "It seems like you have activated sequential model offloading by calling `enable_sequential_cpu_offload`, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline `.to('cpu')` or consider removing the move altogether if you use sequential offloading."
                 )
@@ -429,7 +430,7 @@ class DiffusionPipeline(ConfigMixin, PushToHubMixin):
 
         # Display a warning in this case (the operation succeeds but the benefits are lost)
         pipeline_is_offloaded = any(module_is_offloaded(module) for _, module in self.components.items())
-        if pipeline_is_offloaded and device and torch.device(device).type == "cuda":
+        if not is_any_pipeline_component_device_mapped and pipeline_is_offloaded and device and torch.device(device).type == "cuda":
             logger.warning(
                 f"It seems like you have activated model offloading by calling `enable_model_cpu_offload`, but are now manually moving the pipeline to GPU. It is strongly recommended against doing so as memory gains from offloading are likely to be lost. Offloading automatically takes care of moving the individual components {', '.join(self.components.keys())} to GPU when needed. To make sure offloading works as expected, you should consider moving the pipeline back to CPU: `pipeline.to('cpu')` or removing the move altogether if you use offloading."
             )
@@ -454,10 +455,11 @@ class DiffusionPipeline(ConfigMixin, PushToHubMixin):
 
             # This can happen for `transformer` models. CPU placement was added in
             # https://github.com/huggingface/transformers/pull/33122. So, we guard this accordingly.
-            if is_loaded_in_4bit_bnb and device is not None and is_transformers_version(">", "4.44.0"):
-                module.to(device=device)
-            elif not is_loaded_in_4bit_bnb and not is_loaded_in_8bit_bnb:
-                module.to(device, dtype)
+            if getattr(module, "hf_device_map", None) is None or (len(module.hf_device_map) == 1 and module.hf_device_map != {'': 'cpu'}):
+                if is_loaded_in_4bit_bnb and device is not None and is_transformers_version(">", "4.44.0"):
+                    module.to(device=device)
+                elif not is_loaded_in_4bit_bnb and not is_loaded_in_8bit_bnb:
+                    module.to(device, dtype)
 
             if (
                 module.dtype == torch.float16

LMK WDYT. I think the if/else is messy and not ideal, but this was to show a solution.

@SunMarc
Member Author

SunMarc commented Dec 27, 2024

This is roughly the solution I was thinking of.
What could be better is to add more conditions to `pipeline_is_sequentially_offloaded` so that we don't have to add the `is_any_pipeline_component_device_mapped` condition everywhere, and to add a warning when we don't move the model because the device_map length is greater than 1, here:
if getattr(module, "hf_device_map", None) is None or (len(module.hf_device_map) == 1 and module.hf_device_map != {'': 'cpu'}):

@sayakpaul
Member

sayakpaul commented Dec 27, 2024

What could be better is to add more conditions to `pipeline_is_sequentially_offloaded` so that we don't have to add the `is_any_pipeline_component_device_mapped` condition everywhere, and to add a warning when we don't move the model because the device_map length is greater than 1, here:
if getattr(module, "hf_device_map", None) is None or (len(module.hf_device_map) == 1 and module.hf_device_map != {'': 'cpu'}):

That sounds good to me! It sounds like it would cover different kinds of situations more easily. Do you want to tackle it in this PR?

@yiyixuxu yiyixuxu left a comment (Collaborator)

Thanks @SunMarc!
The changes look good to me, feel free to merge.
For follow-up PRs, I think:

  1. For this use case, I agree we should raise a warning and skip moving the module so that it works; additionally, for this particular case, when the passed model component already has a device map, we should update the pipeline's `hf_device_map` with this component's device map, no? (A rough sketch of this follows after the snippets below.)
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

model_id = "black-forest-labs/Flux.1-Dev"
dtype = torch.bfloat16

transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    torch_dtype=dtype,
    device_map="auto",
)
if hasattr(transformer, "hf_device_map"):
    print(transformer.hf_device_map)

pipe = FluxPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=torch.bfloat16).to("cuda")
prompt = "A cat holding a sign that says hello world"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.0, generator=torch.Generator().manual_seed(42)).images[0]
image.save("test_3_out.png")
  2. Can we look into making this work as well? When we pass `device_map="auto"` to the pipeline, along with components that are already loaded, we should dispatch those components to the expected device so that the whole thing works:
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel

model_id = "black-forest-labs/Flux.1-Dev"
dtype = torch.bfloat16

transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    torch_dtype=dtype,
)

pipe = FluxPipeline.from_pretrained(model_id, transformer=transformer, device_map="balanced", torch_dtype=torch.bfloat16)
print(pipe.hf_device_map)
for name, component in pipe.components.items():
    print(f"{name}:")
    if hasattr(component, "hf_device_map"):
        print(component.hf_device_map)

prompt = "A cat holding a sign that says hello world"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.0, generator=torch.Generator().manual_seed(42)).images[0]
image.save("test_out.png")

@SunMarc SunMarc merged commit e2deb82 into main Jan 8, 2025
15 checks passed
@SunMarc
Member Author

SunMarc commented Jan 8, 2025

For this use case, I agree we should raise a warning and skip moving the module so that it works; additionally, for this particular case, when the passed model component already has a device map, we should update the pipeline's `hf_device_map` with this component's device map, no?

For this particular case, I'm not sure what is currently being done, cc @sayakpaul, but yeah, that would make sense!

  2. Can we look into making this work as well? When we pass `device_map="auto"` to the pipeline, along with components that are already loaded, we should dispatch those components to the expected device so that the whole thing works.

Thanks for the snippet! I'll look into that.

@sayakpaul sayakpaul deleted the remove-force-hook-from-model-init branch January 8, 2025 10:49
@sayakpaul
Member

For this particular case, I'm not sure what is currently being done, cc @sayakpaul, but yeah, that would make sense!

Agreed. It has to stem from here:

final_device_map = _get_final_device_map(

Let's tackle this in a follow-up?

For "auto" pipeline-level device-map, note that currently that is not supported as it was hard to define what that would mean in the context of a pipeline (when we were working on adding device-map support to pipelines, if you remember).
