Description
Describe the bug
Following the SanaPAGPipeline implementation in #9982, I cannot get decent output in more than 1% of runs at best.
- most runs produce what appears to be an image with a lot of residual noise, which the dc-ae decoder then turns into a sketch-like image with many circular artifacts (see first example image below)
- some runs produce black-and-white output. adding "rich colors" to the prompt makes foreground objects colored, but the background remains black-and-white (see second example image below)
- rarely (very rarely) I get decent output
what did I try?
- loading both fp32 and fp16 variants of the model
- loading from separate bf16 repo
- executing in fp16, fp32 and bf16
- enabling/disabling complex_human_instruction (chi) and changing steps, pag scale, etc.
Reproduction
import torch
import diffusers
# repo_id = 'Efficient-Large-Model/Sana_1600M_1024px_diffusers'
repo_id = 'Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers'
cache_dir = '/mnt/models/Diffusers'
prompt = 'photo of a cute red robot on the surface of moon with planet earth in the background'
negative = ''
dtype = torch.bfloat16
device = torch.device('cuda')
kwargs = {
    # 'variant': 'fp16',
    'torch_dtype': dtype,
}
pipe = diffusers.SanaPAGPipeline.from_pretrained(repo_id, cache_dir=cache_dir, **kwargs).to(device, dtype)
result = pipe(
    prompt = prompt,
    negative_prompt = negative,
    # num_inference_steps = 20, # default
    # guidance_scale = 4.5, # default
    # pag_scale = 3.0, # default
    # pag_adaptive_scale = 0.0, # default
    # height = 1024, # default
    # width = 1024, # default
    # clean_caption = True, # default
    # use_resolution_binning = True, # default
    # complex_human_instruction = '...', # default
)
image = result.images[0]
image.save('/tmp/sana.png')
attached are both typical examples of bad output:
Logs
there are several additional issues:
- error when using UniPC, DEIS or SA schedulers
│ /home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/schedulers/scheduling_unipc_multistep.py:396 in set_timesteps │
│ │
│ 395 │ │ │
│ ❱ 396 │ │ self.sigmas = torch.from_numpy(sigmas) │
│ 397 │ │ self.timesteps = torch.from_numpy(timesteps).to(device=device, dtype=torch.int64) │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: At least one stride in the given numpy array is negative, and tensors with negative strides are not currently supported. (You can probably work around this by making a copy of your array with array.copy().)
note: I can confirm that the flow-matching args are set correctly
note: the DPMSolverMultistepScheduler scheduler works fine, either when left as default or when manually instantiated
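For reference, the ValueError above can be reproduced outside diffusers. The sketch below is only an illustration: it assumes the negative stride comes from a reversed NumPy view (for example `[::-1]`), which may or may not be what happens inside set_timesteps, and the `sigmas` values are made up rather than taken from the real schedule.

```python
import numpy as np
import torch

# Hypothetical stand-in for a scheduler's sigma schedule (illustrative values only).
sigmas = np.linspace(0.0, 1.0, 5, dtype=np.float32)

# Reversing with [::-1] returns a view with a negative stride, not a new array.
flipped = sigmas[::-1]
has_negative_stride = flipped.strides[0] < 0

# torch.from_numpy rejects arrays with negative strides.
try:
    torch.from_numpy(flipped)
    raised = False
except ValueError:
    raised = True

# Copying materializes the reversed data contiguously, so the conversion succeeds.
sigmas_tensor = torch.from_numpy(flipped.copy())
print(has_negative_stride, raised, sigmas_tensor)
```

This matches the workaround suggested in the error message itself (array.copy()); whether that is the right fix for the scheduler is a separate question.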
- error when using non-zero pag_adaptive_scale
│ /home/vlado/dev/sdnext/venv/lib/python3.12/site-packages/diffusers/pipelines/pag/pag_utils.py:95 in _get_pag_scale │
│ │
│ 94 │ │ │ signal_scale = self.pag_scale - self.pag_adaptive_scale * (1000 - t) │
│ ❱ 95 │ │ │ if signal_scale < 0: │
│ 96 │ │ │ │ signal_scale = 0 │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
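The RuntimeError above can also be reproduced in isolation. This is a minimal sketch, assuming `t` reaches _get_pag_scale as a tensor with more than one element (an assumption, not verified against the pipeline); the scale and timestep values are illustrative. The `torch.clamp` line is one possible element-wise alternative, not necessarily how the library should fix it.

```python
import torch

pag_scale = 3.0
pag_adaptive_scale = 0.01  # any non-zero value reaches the comparison below

# Assumption: the timestep is a multi-element tensor rather than a scalar.
t = torch.tensor([900.0, 500.0])

signal_scale = pag_scale - pag_adaptive_scale * (1000 - t)

# A multi-element tensor has no single truth value, so `if` raises.
try:
    if signal_scale < 0:
        signal_scale = 0
    raised = False
except RuntimeError:
    raised = True

# Element-wise alternative: clamp negative entries to zero instead of branching.
clamped = torch.clamp(pag_scale - pag_adaptive_scale * (1000 - t), min=0)
print(raised, clamped)
```

With a scalar `t` (a Python number or a one-element tensor) the original comparison works, which would explain why the same code path does not fail in pipelines that pass a scalar timestep.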
System Info
diffusers==0.32.dev commit=5fb3a985173efaae7ff381b9040c386751d643da
Who can help?
@yiyixuxu @sayakpaul @DN6 @asomoza
@lawrence-cj and @a-r-r-o-w as primary contributors to the PR
@hlky for scheduler issues