Attn added kv processor torch 2.0 block #3023
Conversation
The documentation is not available anymore as the PR was closed or merged.
Force-pushed from 26db440 to 40ec593
Force-pushed from 40ec593 to a38867e
if out_dim == 3:
    if attention_mask.shape[0] < batch_size * head_size:
        attention_mask = attention_mask.repeat_interleave(head_size, dim=0)
elif out_dim == 4:
    attention_mask = attention_mask.unsqueeze(1)
    attention_mask = attention_mask.repeat_interleave(head_size, dim=1)
@patrickvonplaten when using the torch built-in attention and putting heads in the second dim, we need to make the attention mask also put the heads in the second dim. I'm not sure what the equivalent of the attention_mask.shape[0] < batch_size * head_size check is. If we assume the input attention mask always has the same batch size as the inputs, we don't have to do the check, and I think this works. My understanding is that's what the original code was doing anyway, since it just repeats by the head size regardless.
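For illustration, a minimal standalone sketch (not the PR's helper itself) of how the two layouts differ; the shapes are hypothetical:

import torch

# hypothetical shapes for illustration only
batch_size, head_size, query_len, key_len = 2, 8, 4, 6
attention_mask = torch.zeros(batch_size, query_len, key_len)

# out_dim == 3: classic layout, heads folded into the batch dimension
mask_3d = attention_mask.repeat_interleave(head_size, dim=0)
assert mask_3d.shape == (batch_size * head_size, query_len, key_len)

# out_dim == 4: torch 2.0 SDPA layout, heads as the second dimension
mask_4d = attention_mask.unsqueeze(1).repeat_interleave(head_size, dim=1)
assert mask_4d.shape == (batch_size, head_size, query_len, key_len)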
Force-pushed from a38867e to a480bc0
else:
    raise ValueError(
        f"unknown cross_attention_norm: {cross_attention_norm}. Should be None, 'layer_norm' or 'group_norm'"
    )
Nice!
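For context, a rough sketch of the branching this error message implies; the function name, the num_groups default, and the eps value are illustrative assumptions, not the PR's code:

import torch.nn as nn

def build_norm_cross(cross_attention_norm, cross_attention_dim, norm_num_groups=32):
    # sketch only: pick the cross-attention norm from the three accepted values
    if cross_attention_norm is None:
        return None
    elif cross_attention_norm == "layer_norm":
        return nn.LayerNorm(cross_attention_dim)
    elif cross_attention_norm == "group_norm":
        return nn.GroupNorm(
            num_channels=cross_attention_dim, num_groups=norm_num_groups, eps=1e-5, affine=True
        )
    else:
        raise ValueError(
            f"unknown cross_attention_norm: {cross_attention_norm}. Should be None, 'layer_norm' or 'group_norm'"
        )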
Force-pushed from a480bc0 to fb485ad
Very nice refactor!
value = encoder_hidden_states_value_proj

# the output of sdp = (batch, num_heads, seq_len, head_dim)
# TODO: add support for attn.scale when we move to Torch 2.1
Out of curiosity, which scale is this one?
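For context on the TODO, a small sketch assuming torch >= 2.1, where F.scaled_dot_product_attention gains an explicit scale argument that attn.scale could be forwarded to; the tensors and the custom scale value below are placeholders:

import torch
import torch.nn.functional as F

query = torch.randn(2, 8, 4, 64)   # (batch, num_heads, seq_len, head_dim)
key = torch.randn(2, 8, 6, 64)
value = torch.randn(2, 8, 6, 64)

# torch 2.0: the softmax scale is fixed to 1 / sqrt(head_dim) internally
out_default = F.scaled_dot_product_attention(query, key, value)

# torch >= 2.1: a custom scale can be passed explicitly
custom_scale = 1 / 8  # placeholder standing in for attn.scale
out_scaled = F.scaled_dot_product_attention(query, key, value, scale=custom_scale)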
elif isinstance(self.norm_cross, nn.GroupNorm):
    # Group norm norms along the channels dimension and expects
    # input to be in the shape of (N, C, *). In this case, we want
    # to norm along the hidden dimension, so we need to move
    # (batch_size, sequence_length, hidden_size) ->
    # (batch_size, hidden_size, sequence_length)
    encoder_hidden_states = encoder_hidden_states.transpose(1, 2)
    encoder_hidden_states = self.norm_cross(encoder_hidden_states)
    encoder_hidden_states = encoder_hidden_states.transpose(1, 2)
👌
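A standalone sketch of the transpose trick above, with hypothetical shapes and a num_groups chosen only for the example:

import torch
import torch.nn as nn

batch_size, sequence_length, hidden_size = 2, 77, 768
encoder_hidden_states = torch.randn(batch_size, sequence_length, hidden_size)

# GroupNorm normalizes over the channel dimension (N, C, *), so move the
# hidden dimension into the channel slot, normalize, then move it back
norm_cross = nn.GroupNorm(num_groups=32, num_channels=hidden_size)
out = norm_cross(encoder_hidden_states.transpose(1, 2)).transpose(1, 2)
assert out.shape == (batch_size, sequence_length, hidden_size)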
if not attn.only_cross_attention:
    key = attn.to_k(hidden_states)
    value = attn.to_v(hidden_states)
    key = attn.head_to_batch_dim(key)
    value = attn.head_to_batch_dim(value)
    key = torch.cat([encoder_hidden_states_key_proj, key], dim=1)
    value = torch.cat([encoder_hidden_states_value_proj, value], dim=1)
else:
    key = encoder_hidden_states_key_proj
    value = encoder_hidden_states_value_proj
Clean 🌿
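To make the concatenation concrete, a small sketch with hypothetical shapes; the projections are omitted and plain tensors stand in for the outputs of head_to_batch_dim:

import torch

batch_heads, head_dim = 2 * 8, 64
self_attn_len, added_kv_len = 4096, 77  # e.g. image tokens vs. text tokens

# after head_to_batch_dim, keys/values are (batch * heads, seq_len, head_dim)
key = torch.randn(batch_heads, self_attn_len, head_dim)
encoder_hidden_states_key_proj = torch.randn(batch_heads, added_kv_len, head_dim)

# the added (cross-attention) keys are prepended along the sequence dimension,
# so each query attends to both the added tokens and the self-attention tokens
key = torch.cat([encoder_hidden_states_key_proj, key], dim=1)
assert key.shape == (batch_heads, added_kv_len + self_attn_len, head_dim)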
only_cross_attention=only_cross_attention,
cross_attention_norm=cross_attention_norm,
processor=processor,
👌
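As a usage note, a sketch of enabling the new processor on a model that uses added KV projections; the UnCLIP pipeline, the example checkpoint, and the import path are assumptions for illustration, not something this PR prescribes:

import torch
from diffusers import UnCLIPPipeline
from diffusers.models.attention_processor import AttnAddedKVProcessor2_0  # import path assumed

# the UnCLIP decoder uses added KV projections, so it is a natural target;
# set_attn_processor swaps the processor on every attention module at once
pipe = UnCLIPPipeline.from_pretrained("kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16)
pipe.decoder.set_attn_processor(AttnAddedKVProcessor2_0())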
# Check is relaxed because there is not a torch 2.0 sliced attention added kv processor
expected_max_diff = 1e-2

self._test_attention_slicing_forward_pass(
    test_max_difference=test_max_difference, expected_max_diff=expected_max_diff
)
Should this be conditioned on the torch version being used?
Sorry, should have noted: the intention is to follow up and add the 2.0 sliced attention added KV processor.
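A possible version gate, sketched with the same SDPA presence check diffusers uses to detect torch 2.0; the 1e-3 fallback is a hypothetical tighter default, not taken from the PR:

import torch.nn.functional as F

# relax the tolerance only when the torch 2.0 added-KV processor is in use;
# there is no torch 2.0 sliced-attention added-KV processor yet, so the sliced
# and unsliced paths can differ slightly on torch >= 2.0
if hasattr(F, "scaled_dot_product_attention"):
    expected_max_diff = 1e-2
else:
    expected_max_diff = 1e-3  # hypothetical tighter default for older torch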
Force-pushed from fb485ad to ac3786a
Force-pushed from ac3786a to a65dd58
add AttnAddedKVProcessor2_0 block
Rebased on top of #3021 and #3011.
200a8c7 and onward are the main commits.