[Quantization] bring quantization to diffusers core #9174

Closed
@sayakpaul

Description

Now that we have a working PoC (#9165) of NF4 quantization through bitsandbytes, and another through optimum.quanto, it's time to bring quantization more formally into diffusers 🎸

In this issue, I want to devise a rough plan of attack for the integration. We will start with bitsandbytes and gradually expand the list of supported quantizers based on community interest. This integration will also allow us to do LoRA fine-tuning of large models like Flux through peft (guide).

Three PRs are expected:

  • Introduce a base quantization config class, like the one we have in transformers.
  • Introduce bitsandbytes-related utilities to handle the pre- and post-processing of layers when injecting bitsandbytes layers (example is here); a sketch of this pattern follows the list.
  • Introduce a bitsandbytes config (example) and a quantization loader mixin, aka QuantizationLoaderMixin. This loader will enable passing a quantization config to from_pretrained() of a ModelMixin and will tackle how to modify and prepare the model for the provided quantization config. It will also allow us to serialize the model according to the quantization config. See the usage sketch after this list.
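
For the second PR, here is a minimal sketch of what the layer-injection utility could look like, following the pattern transformers uses in its bitsandbytes integration. The function name and recursion strategy are illustrative assumptions; only `bnb.nn.Linear4bit` and its arguments come from the actual bitsandbytes API.

```python
# A minimal sketch, not the actual diffusers utility.
import torch
import torch.nn as nn
import bitsandbytes as bnb

def replace_linear_with_bnb_4bit(model: nn.Module, compute_dtype=torch.bfloat16) -> nn.Module:
    """Recursively swap nn.Linear modules for NF4-quantized bnb.nn.Linear4bit."""
    for name, module in model.named_children():
        if isinstance(module, nn.Linear):
            quantized = bnb.nn.Linear4bit(
                module.in_features,
                module.out_features,
                bias=module.bias is not None,
                compute_dtype=compute_dtype,
                quant_type="nf4",
            )
            # A real implementation would also load and quantize the original
            # weights here (typically at checkpoint-loading time); omitted to
            # keep the sketch short.
            setattr(model, name, quantized)
        else:
            replace_linear_with_bnb_4bit(module, compute_dtype)
    return model
```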
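And for the first and third PRs combined, a hypothetical sketch of the user-facing API once the config class and QuantizationLoaderMixin land. `BitsAndBytesConfig`, its arguments, and the `quantization_config` kwarg are assumptions mirroring the transformers API; `FluxTransformer2DModel` stands in for any ModelMixin subclass.

```python
# Hypothetical end-to-end usage; names mirror the transformers API and are
# assumptions until the PRs above are merged.
import torch
from diffusers import BitsAndBytesConfig, FluxTransformer2DModel

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
)
```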

@DN6 @SunMarc sounds good?
