Describe the bug
Edited to reflect the actual issue
In the latest development version, PR #3365 introduced confusion: it leads users to believe that the PT 2.0 variants of the attention processors are fully supported and that xformers is no longer needed. This results in higher VRAM usage in certain situations (e.g. using LoRA or Custom Diffusion without xformers).
If the environment has both PyTorch 2 and xformers installed, calling pipe.enable_xformers_memory_efficient_attention() will
- raise a warning
- default to PyTorch's native efficient flash attention (a manual workaround is sketched below)
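A minimal sketch of such a workaround, assuming the existing set_attn_processor API and the XFormersAttnProcessor class (note that this replaces any LoRA/Custom Diffusion processors, so it only covers the plain pipeline case):

import torch
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import XFormersAttnProcessor

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Explicitly install the xformers processor on all UNet attention blocks,
# instead of relying on enable_xformers_memory_efficient_attention(),
# which currently falls back to the PT 2.0 processor.
pipe.unet.set_attn_processor(XFormersAttnProcessor())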
diffusers currently supports the following PT 2.0 variants of attention processors:
- AttnProcessor => AttnProcessor2_0
- AttnAddedKVProcessor => AttnAddedKVProcessor2_0
The following are not supported (see the inspection sketch after this list):
- SlicedAttnProcessor
- SlicedAttnAddedKVProcessor
- LoRAAttnProcessor
- CustomDiffusionAttnProcessor
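For reference, a quick way to check which processor classes a pipeline actually ends up with is to inspect the UNet's attn_processors property; this is only an illustrative sketch, not part of the original report:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")
pipe.unet.load_attn_procs("pytorch_lora_weights.bin")
pipe.enable_xformers_memory_efficient_attention()

# Print the attention processor classes currently in use, to see whether
# xformers, PT 2.0, or plain processors were applied.
print({proc.__class__.__name__ for proc in pipe.unet.attn_processors.values()})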
It would be great if users could still use xformers when calling the pipe.enable_xformers_memory_efficient_attention() function.
Reproduction
import torch

from diffusers import (
    DPMSolverMultistepScheduler,
    StableDiffusionPipeline,
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Load LoRA attention processors, then request xformers attention.
pipe.unet.load_attn_procs("pytorch_lora_weights.bin")
pipe.enable_xformers_memory_efficient_attention()

prompt = "a photo of a dog"
image = pipe(prompt=prompt, cross_attention_kwargs={"scale": 1.0}).images
Logs
"You have specified using flash attention using xFormers but you have PyTorch 2.0 already installed. "
"We will default to PyTorch's native efficient flash attention implementation provided by PyTorch 2.0.
System Info
- diffusers version: 0.17.0.dev0
- Platform: Windows-10-10.0.19045-SP0
- Python version: 3.10.11
- PyTorch version (GPU?): 2.0.0+cu118 (True)
- Huggingface_hub version: 0.13.4
- Transformers version: 4.28.1
- Accelerate version: 0.18.0
- xFormers version: 0.0.19
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No