kakaobrain unCLIP #1428

Merged: 43 commits merged into huggingface:main from kakaobrain_unclip on Dec 18, 2022

Conversation

@williamberman (Contributor) commented Nov 25, 2022

Scoping
  • Scope prior Transformer
  • Scope decoder Unet
  • Scope super resolution 64->256 Unet
  • Scope super resolution 256->1024 Unet
  • Scope scheduler
  • Scope pipeline
Converting and running the model

From the diffusers root directory:

Download weights:

$ wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/efdf6206d8ed593961593dc029a8affa/decoder-ckpt-step%3D01000000-of-01000000.ckpt
$ wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/4226b831ae0279020d134281f3c31590/improved-sr-ckpt-step%3D1.2M.ckpt
$ wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/85626483eaca9f581e2a78d31ff905ca/prior-ckpt-step%3D01000000-of-01000000.ckpt
$ wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/0b62380a75e56f073e2844ab5199153d/ViT-L-14_stats.th

Convert the model:

$ python scripts/convert_kakao_brain_unclip_to_diffusers.py \
      --decoder_checkpoint_path ./decoder-ckpt-step\=01000000-of-01000000.ckpt \
      --super_res_unet_checkpoint_path ./improved-sr-ckpt-step\=1.2M.ckpt \
      --prior_checkpoint_path ./prior-ckpt-step\=01000000-of-01000000.ckpt \
      --clip_stat_path ./ViT-L-14_stats.th \
      --dump_path ./unclip_dump

Run the model:

import os

# Must be set before CUDA is initialized so that cuBLAS can run
# deterministically under torch.use_deterministic_algorithms(True).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

from diffusers import UnCLIPPipeline
import torch
import random
import numpy as np

def set_seed(seed: int):
    # Seed every RNG used during sampling so outputs are reproducible.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(0)

# Disable TF32 matmuls and force deterministic kernels so the outputs can be
# compared against the original implementation.
torch.backends.cuda.matmul.allow_tf32 = False
torch.use_deterministic_algorithms(True)

pipe = UnCLIPPipeline.from_pretrained("./unclip_dump")
pipe = pipe.to("cuda")

image = pipe(["horse"]).images[0]
image.save("./horse.png")

[generated image: horse]

Original: [out_large_orig]

TODO
  • e2e verification - only decoder left, I believe I incorrectly added the mask in the attention block!
  • add information about masking added to the existing CrossAttention block (a rough sketch of additive masking follows this list)
  • if necessary, add docs on discrepancies between sample coefficients used in scheduler
  • code docs
  • markdown docs
  • pipeline docs
  • Either keep additive_time_embeddings or replace them with class embeddings
  • Add docs for additive_time_embeddings if we keep them
  • Replace AttentionBlock with CrossAttention
  • tests
  • add mask to alternative attention mechanisms in CrossAttention
  • Add licenses to pipeline_unclip.py, text_proj.py, and scheduling_unclip.py
  • document and/or rename prd_embedding
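
For context on the masking items above, here is a minimal sketch of how an additive attention mask is typically folded into the attention scores before the softmax. It is an illustration only, not the CrossAttention code in this PR, and masked_cross_attention is a made-up helper name.

import torch

def masked_cross_attention(query, key, value, attention_mask=None):
    # query: (batch, q_len, dim); key/value: (batch, kv_len, dim)
    scale = query.shape[-1] ** -0.5
    scores = torch.bmm(query, key.transpose(1, 2)) * scale  # (batch, q_len, kv_len)
    if attention_mask is not None:
        # additive mask: 0 where a token may be attended, a large negative
        # value (e.g. -1e4) where it is padding / masked out
        scores = scores + attention_mask
    probs = scores.softmax(dim=-1)
    return torch.bmm(probs, value)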
scheduler/pipeline
  • note that this model runs a separate diffusion process for the prior, the decoder, and the super res unet. The super res unet also uses a separate unet as the "last step unet" (a usage sketch follows below).
  • TODO fill in more info here :)
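
As merged, each of those diffusion processes gets its own inference-step count in the pipeline call. A hedged usage sketch follows; the argument names reflect my understanding of the merged pipeline and should be treated as illustrative rather than definitive.

from diffusers import UnCLIPPipeline

pipe = UnCLIPPipeline.from_pretrained("./unclip_dump").to("cuda")
image = pipe(
    "horse",
    prior_num_inference_steps=25,     # diffusion over the CLIP image embedding (prior)
    decoder_num_inference_steps=25,   # text-conditional 64x64 decoder unet
    super_res_num_inference_steps=7,  # 64->256 super res; final step runs the "last step unet"
).images[0]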
prior transformer
  • new transformer class based on our existing 2D transformer. This transformer maps over CLIP embeddings and so won't have the 2D components. There are a few additional parameters around the textual embeddings and projecting the output back to the CLIP embedding dimension (a toy sketch follows this list).
  • Write script porting weights
  • Verify against original implementation
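
To make the shape of this concrete, here is a toy sketch of the idea only (the class and argument names are made up, not the PriorTransformer API): a 1D token sequence is built from a time embedding, the text embedding, the noised CLIP image embedding, and a learned query token; a plain transformer runs over it; and the last token is projected back to the CLIP embedding dimension.

import torch
import torch.nn as nn

class ToyPriorTransformer(nn.Module):
    def __init__(self, clip_dim=768, inner_dim=1024, heads=16, layers=4):
        super().__init__()
        self.proj_in = nn.Linear(clip_dim, inner_dim)
        self.time_proj = nn.Linear(1, inner_dim)
        self.learned_query = nn.Parameter(torch.zeros(1, 1, inner_dim))
        layer = nn.TransformerEncoderLayer(inner_dim, heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, layers)
        self.proj_out = nn.Linear(inner_dim, clip_dim)  # back to the CLIP embedding dim

    def forward(self, noisy_image_embed, text_embed, timestep):
        b = noisy_image_embed.shape[0]
        tokens = self.proj_in(torch.stack([text_embed, noisy_image_embed], dim=1))
        t_emb = self.time_proj(timestep.float().view(b, 1)).unsqueeze(1)
        query = self.learned_query.expand(b, -1, -1)
        seq = torch.cat([t_emb, tokens, query], dim=1)  # 1D sequence, no 2D/conv components
        out = self.transformer(seq)
        return self.proj_out(out[:, -1])  # prediction read off the learned query token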
Decoder Unet
  • Pass an additional flag to the down/up blocks indicating if the down/up sample should be a resnet. Currently, the {down, up}samples are {Down,Up}Sample2D's. We want to be able to use a resnet which wraps the sampling instead.
  • Pass a flag to ResnetBlock2D to use the time embedding projection to scale and shift the normed hidden states instead of just adding them together (see the scale/shift sketch after this list). It looks like this flag already existed but wasn't implemented yet.
  • UnCLIPEmbeddingUtils Unet conditioning + additional conditioning embeddings added with time embeddings ->
  • Port the attention block and split the combined conv block weights. This is ported but is giving small discrepancies (on the order of 1e-3 - 1e-4). It looks like these discrepancies propagate to larger discrepancies when the whole unet is run.
  • Write script porting weights
  • Verify against original implementation
  • Make new {Down,Mid,Up} block types. The new configuration ends up making the existing blocks too hacky, so we'll add new block definitions instead.
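
For the scale/shift flag mentioned above, this is a toy sketch of the conditioning pattern (not ResnetBlock2D itself): the projected time embedding is split into a per-channel scale and shift that modulate the normed hidden states, rather than being added to them.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleShiftResBlockSketch(nn.Module):
    def __init__(self, channels, time_dim):
        super().__init__()
        self.norm = nn.GroupNorm(32, channels)  # channels assumed divisible by 32
        self.time_emb_proj = nn.Linear(time_dim, 2 * channels)  # -> scale and shift
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, hidden_states, time_emb):
        emb = self.time_emb_proj(F.silu(time_emb))[:, :, None, None]
        scale, shift = emb.chunk(2, dim=1)
        # modulate the normed activations instead of just adding the embedding
        h = self.norm(hidden_states) * (1 + scale) + shift
        return hidden_states + self.conv(F.silu(h))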
super resolution 64->256 Unet
  • Unconditional Unet.
  • Latents are upsampled (TODO how) before being input
  • The super resolution unet looks like it actually wraps two separate unets and has a modified sampling function - https://github.com/kakaobrain/karlo/blob/e105e7643c4e9f30b1b17c7e4354d8474455dcb3/karlo/modules/diffusion/gaussian_diffusion.py#L596 see the model_aux argument (a rough sampling sketch follows this list)
  • Does not contain any attention mechanism (including self attention)
  • New block types for modified resnet up/down sample. Similar to the decoder unet.
  • Modify porting code from the decoder unet. The unet has basically the same structure as the decoder except there's no cross or self attention mechanism. Will re-use methods from the decoder unet.
  • Verify against original implementation
  • Port and verify "last step unet"
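
A rough sketch of the sampling pattern described above, based on my reading of the linked gaussian_diffusion.py. The models and scheduler are placeholders that follow the usual diffusers conventions, the interpolation mode is a stand-in (the exact upsampling is still a TODO above), and conditioning via channel concatenation is an assumption.

import torch
import torch.nn.functional as F

def super_res_sample_sketch(unet, last_step_unet, scheduler, low_res_image, num_steps=7):
    # upsample the 64x64 decoder output to 256x256 before conditioning
    upsampled = F.interpolate(low_res_image, scale_factor=4, mode="bicubic")
    sample = torch.randn_like(upsampled)

    scheduler.set_timesteps(num_steps)
    for i, t in enumerate(scheduler.timesteps):
        # the final denoising step is handled by the separate "last step" unet (model_aux)
        model = last_step_unet if i == len(scheduler.timesteps) - 1 else unet
        noise_pred = model(torch.cat([sample, upsampled], dim=1), t).sample
        sample = scheduler.step(noise_pred, t, sample).prev_sample
    return sample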
super resolution 256->1024 Unet

not released with this model!

@HuggingFaceDocBuilderDev commented Nov 25, 2022

The documentation is not available anymore as the PR was closed or merged.

@williamberman force-pushed the kakaobrain_unclip branch 5 times, most recently from 31e8c58 to 7250888 on November 27, 2022 23:58
@patrickvonplaten (Contributor) left a comment:

Very nice progress!

@williamberman force-pushed the kakaobrain_unclip branch 18 times, most recently from 08590e2 to 95484f6 on December 5, 2022 01:45
@williamberman merged commit 2dcf64b into huggingface:main on Dec 18, 2022
sliard pushed a commit to sliard/diffusers that referenced this pull request Dec 21, 2022
* [wip] attention block updates

* [wip] unCLIP unet decoder and super res

* [wip] unCLIP prior transformer

* [wip] scheduler changes

* [wip] text proj utility class

* [wip] UnCLIPPipeline

* [wip] kakaobrain unCLIP convert script

* [unCLIP pipeline] fixes re: @patrickvonplaten

remove callbacks

move denoising loops into call function

* UNCLIPScheduler re: @patrickvonplaten

Revert changes to DDPMScheduler. Make UNCLIPScheduler, a modified
DDPM scheduler with changes to support karlo

* mask -> attention_mask re: @patrickvonplaten

* [DDPMScheduler] remove leftover change

* [docs] PriorTransformer

* [docs] UNet2DConditionModel and UNet2DModel

* [nit] UNCLIPScheduler -> UnCLIPScheduler

matches existing unclip naming better

* [docs] SchedulingUnCLIP

* [docs] UnCLIPTextProjModel

* refactor

* finish licenses

* rename all to attention_mask and prep in models

* more renaming

* don't expose unused configs

* final renaming fixes

* remove x attn mask when not necessary

* configure kakao script to use new class embedding config

* fix copies

* [tests] UnCLIPScheduler

* finish x attn

* finish

* remove more

* rename condition blocks

* clean more

* Apply suggestions from code review

* up

* fix

* [tests] UnCLIPPipelineFastTests

* remove unused imports

* [tests] UnCLIPPipelineIntegrationTests

* correct

* make style

Co-authored-by: Patrick von Platen <[email protected]>
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023 (with the same squashed commit message as above)