Description
With the new improvements to `quantization_config`, the memory requirements of models such as SD3.5 and FLUX.1 are much lower.
However, users currently have to load each model component they want quantized manually and then assemble the pipeline themselves.
For example:
```python
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline
from transformers import T5EncoderModel

quantization_config = BitsAndBytesConfig(...)
transformer = SD3Transformer2DModel.from_pretrained(
    repo_id, subfolder="transformer", quantization_config=quantization_config
)
text_encoder = T5EncoderModel.from_pretrained(
    repo_id, subfolder="text_encoder_3", quantization_config=quantization_config
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    repo_id, transformer=transformer, text_encoder=text_encoder
)
```
The ask is for the pipeline loader itself to accept `quantization_config` and automatically apply it to all applicable modules when it is present.
That would allow much simpler usage, without the user needing to know the exact internal components of each model:
```python
from diffusers import BitsAndBytesConfig, StableDiffusion3Pipeline

quantization_config = BitsAndBytesConfig(...)
pipe = StableDiffusion3Pipeline.from_pretrained(repo_id, quantization_config=quantization_config)
```
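One way the loader could decide where the config applies is by inspecting each component class's `from_pretrained` signature and forwarding `quantization_config` only where it is accepted. Below is a minimal sketch of that dispatch logic; the `FakeTransformer`/`FakeScheduler` classes and the `load_components` helper are stand-ins invented for illustration, not the actual diffusers internals.

```python
import inspect

class FakeTransformer:
    """Stand-in for a quantizable component (accepts quantization_config)."""
    @classmethod
    def from_pretrained(cls, repo_id, subfolder=None, quantization_config=None):
        obj = cls()
        obj.quantization_config = quantization_config
        return obj

class FakeScheduler:
    """Stand-in for a component that cannot be quantized."""
    @classmethod
    def from_pretrained(cls, repo_id, subfolder=None):
        return cls()

def load_components(repo_id, components, quantization_config=None):
    """Load each (name, class, subfolder) entry, forwarding the shared
    quantization_config only to classes whose from_pretrained accepts it."""
    loaded = {}
    for name, cls, subfolder in components:
        kwargs = {"subfolder": subfolder}
        params = inspect.signature(cls.from_pretrained).parameters
        if quantization_config is not None and "quantization_config" in params:
            kwargs["quantization_config"] = quantization_config
        loaded[name] = cls.from_pretrained(repo_id, **kwargs)
    return loaded
```

With this approach a single config passed at the pipeline level reaches the transformer and text encoders but is silently skipped for schedulers, tokenizers, and other non-quantizable modules, so no per-model knowledge is needed from the user.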
This is a generic ask that should work for virtually all models, although the primary use case is the most popular models such as SD3.5 and FLUX.1.