Description
With the new improvements to `quantization_config`, the memory requirements of models such as SD3.5 and FLUX.1 are much lower.
However, users currently have to load each model component they want quantized manually and then assemble the pipeline themselves.
For example:
```python
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline
from transformers import T5EncoderModel

quantization_config = BitsAndBytesConfig(...)
transformer = SD3Transformer2DModel.from_pretrained(
    repo_id, subfolder="transformer", quantization_config=quantization_config
)
text_encoder = T5EncoderModel.from_pretrained(
    repo_id, subfolder="text_encoder_3", quantization_config=quantization_config
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    repo_id, transformer=transformer, text_encoder=text_encoder
)
```
The ask is for the pipeline loader itself to accept `quantization_config` and automatically apply it to all applicable modules when it is present.
That would allow much simpler usage, without the user needing to know the exact internal components of each model:
```python
from diffusers import BitsAndBytesConfig, StableDiffusion3Pipeline

quantization_config = BitsAndBytesConfig(...)
pipe = StableDiffusion3Pipeline.from_pretrained(repo_id, quantization_config=quantization_config)
```
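One way the loader could decide where the config applies is by inspecting each component class's `from_pretrained` signature and forwarding `quantization_config` only where it is accepted. Below is a minimal sketch of that dispatch logic; the `FakeTransformer`/`FakeScheduler` classes and the `load_components` helper are stand-ins invented for illustration, not the actual diffusers internals.

```python
import inspect

class FakeTransformer:
    """Stand-in for a quantizable component (accepts quantization_config)."""
    @classmethod
    def from_pretrained(cls, repo_id, subfolder=None, quantization_config=None):
        obj = cls()
        obj.quantization_config = quantization_config
        return obj

class FakeScheduler:
    """Stand-in for a component that cannot be quantized."""
    @classmethod
    def from_pretrained(cls, repo_id, subfolder=None):
        return cls()

def load_components(repo_id, components, quantization_config=None):
    """Load each (name, class, subfolder) entry, forwarding the shared
    quantization_config only to classes whose from_pretrained accepts it."""
    loaded = {}
    for name, cls, subfolder in components:
        kwargs = {"subfolder": subfolder}
        params = inspect.signature(cls.from_pretrained).parameters
        if quantization_config is not None and "quantization_config" in params:
            kwargs["quantization_config"] = quantization_config
        loaded[name] = cls.from_pretrained(repo_id, **kwargs)
    return loaded
```

With this approach a single config passed at the pipeline level reaches the transformer and text encoders but is silently skipped for schedulers, tokenizers, and other non-quantizable modules, so no per-model knowledge is needed from the user.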
This is a generic ask that should work for virtually all models, although the primary use case is the most popular models such as SD3.5 and FLUX.1.