kakaobrain unCLIP #1428

Merged: 43 commits merged into huggingface:main from kakaobrain_unclip on Dec 18, 2022

Conversation

@williamberman (Contributor) commented Nov 25, 2022

Scoping
  • Scope prior Transformer
  • Scope decoder Unet
  • Scope super resolution 64->256 Unet
  • Scope super resolution 256->1024 Unet
  • Scope scheduler
  • Scope pipeline
Converting and running the model

From the diffusers root directory:

Download weights:

$ wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/efdf6206d8ed593961593dc029a8affa/decoder-ckpt-step%3D01000000-of-01000000.ckpt
$ wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/4226b831ae0279020d134281f3c31590/improved-sr-ckpt-step%3D1.2M.ckpt
$ wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/85626483eaca9f581e2a78d31ff905ca/prior-ckpt-step%3D01000000-of-01000000.ckpt
$ wget https://arena.kakaocdn.net/brainrepo/models/karlo-public/v1.0.0.alpha/0b62380a75e56f073e2844ab5199153d/ViT-L-14_stats.th

Convert the model:

$ python scripts/convert_kakao_brain_unclip_to_diffusers.py \
      --decoder_checkpoint_path ./decoder-ckpt-step\=01000000-of-01000000.ckpt \
      --super_res_unet_checkpoint_path ./improved-sr-ckpt-step\=1.2M.ckpt \
      --prior_checkpoint_path ./prior-ckpt-step\=01000000-of-01000000.ckpt \
      --clip_stat_path ./ViT-L-14_stats.th \
      --dump_path ./unclip_dump

Run the model:

import os

# Must be set before CUDA is initialized so that cuBLAS can run
# deterministically under torch.use_deterministic_algorithms(True).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

from diffusers import UnCLIPPipeline
import torch
import random
import numpy as np

def set_seed(seed: int):
    # Seed every RNG used during sampling so outputs are reproducible.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(0)

# Disable TF32 matmuls and force deterministic kernels so the outputs can be
# compared against the original implementation.
torch.backends.cuda.matmul.allow_tf32 = False
torch.use_deterministic_algorithms(True)

pipe = UnCLIPPipeline.from_pretrained("./unclip_dump")
pipe = pipe.to("cuda")

image = pipe(["horse"]).images[0]
image.save("./horse.png")

[generated image: horse]

Original: [out_large_orig]

TODO
  • e2e verification - only decoder left, I believe I incorrectly added the mask in the attention block!
  • add information about masking added to the existing CrossAttention block (a rough sketch of additive masking follows this list)
  • if necessary, add docs on discrepancies between sample coefficients used in scheduler
  • code docs
  • markdown docs
  • pipeline docs
  • Either keep additive_time_embeddings or replace them with class embeddings
  • Add docs for additive_time_embeddings if we keep them
  • Replace AttentionBlock with CrossAttention
  • tests
  • add mask to alternative attention mechanisms in CrossAttention
  • Add licenses to pipeline_unclip.py, text_proj.py, and scheduling_unclip.py
  • document and/or rename prd_embedding
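
For context on the masking items above, here is a minimal sketch of how an additive attention mask is typically folded into the attention scores before the softmax. It is an illustration only, not the CrossAttention code in this PR, and masked_cross_attention is a made-up helper name.

import torch

def masked_cross_attention(query, key, value, attention_mask=None):
    # query: (batch, q_len, dim); key/value: (batch, kv_len, dim)
    scale = query.shape[-1] ** -0.5
    scores = torch.bmm(query, key.transpose(1, 2)) * scale  # (batch, q_len, kv_len)
    if attention_mask is not None:
        # additive mask: 0 where a token may be attended, a large negative
        # value (e.g. -1e4) where it is padding / masked out
        scores = scores + attention_mask
    probs = scores.softmax(dim=-1)
    return torch.bmm(probs, value)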
scheduler/pipeline
  • note that this model runs a separate diffusion process for the prior, the decoder, and the super res unet. The super res unet also uses a separate unet as the "last step unet" (a usage sketch follows below).
  • TODO fill in more info here :)
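
As merged, each of those diffusion processes gets its own inference-step count in the pipeline call. A hedged usage sketch follows; the argument names reflect my understanding of the merged pipeline and should be treated as illustrative rather than definitive.

from diffusers import UnCLIPPipeline

pipe = UnCLIPPipeline.from_pretrained("./unclip_dump").to("cuda")
image = pipe(
    "horse",
    prior_num_inference_steps=25,     # diffusion over the CLIP image embedding (prior)
    decoder_num_inference_steps=25,   # text-conditional 64x64 decoder unet
    super_res_num_inference_steps=7,  # 64->256 super res; final step runs the "last step unet"
).images[0]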
prior transformer
  • new transformer class based on our existing 2D transformer. This transformer maps over CLIP embeddings and so won't have the 2D components. There are a few additional parameters around the textual embeddings and projecting the output back to the CLIP embedding dimension (a toy sketch follows this list).
  • Write script porting weights
  • Verify against original implementation
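
To make the shape of this concrete, here is a toy sketch of the idea only (the class and argument names are made up, not the PriorTransformer API): a 1D token sequence is built from a time embedding, the text embedding, the noised CLIP image embedding, and a learned query token; a plain transformer runs over it; and the last token is projected back to the CLIP embedding dimension.

import torch
import torch.nn as nn

class ToyPriorTransformer(nn.Module):
    def __init__(self, clip_dim=768, inner_dim=1024, heads=16, layers=4):
        super().__init__()
        self.proj_in = nn.Linear(clip_dim, inner_dim)
        self.time_proj = nn.Linear(1, inner_dim)
        self.learned_query = nn.Parameter(torch.zeros(1, 1, inner_dim))
        layer = nn.TransformerEncoderLayer(inner_dim, heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, layers)
        self.proj_out = nn.Linear(inner_dim, clip_dim)  # back to the CLIP embedding dim

    def forward(self, noisy_image_embed, text_embed, timestep):
        b = noisy_image_embed.shape[0]
        tokens = self.proj_in(torch.stack([text_embed, noisy_image_embed], dim=1))
        t_emb = self.time_proj(timestep.float().view(b, 1)).unsqueeze(1)
        query = self.learned_query.expand(b, -1, -1)
        seq = torch.cat([t_emb, tokens, query], dim=1)  # 1D sequence, no 2D/conv components
        out = self.transformer(seq)
        return self.proj_out(out[:, -1])  # prediction read off the learned query token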
Decoder Unet
  • Pass an additional flag to the down/up blocks indicating if the down/up sample should be a resnet. Currently, the {down, up}samples are {Down,Up}Sample2D's. We want to be able to use a resnet which wraps the sampling instead.
  • Pass a flag to ResnetBlock2D to use the time embedding projection to scale and shift the normed hidden states instead of just adding them together (see the scale/shift sketch after this list). It looks like this flag already existed but wasn't implemented yet.
  • UnCLIPEmbeddingUtils Unet conditioning + additional conditioning embeddings added with time embeddings ->
  • Port the attention block and split the combined conv block weights. This is ported but is giving small discrepancies (on the order of 1e-3 - 1e-4). It looks like these discrepancies propagate to larger discrepancies when the whole unet is run.
  • Write script porting weights
  • Verify against original implementation
  • Make new {Down,Mid,Up} block types. The new configuration ends up making the existing blocks too hacky, so we'll add new block definitions instead.
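
For the scale/shift flag mentioned above, this is a toy sketch of the conditioning pattern (not ResnetBlock2D itself): the projected time embedding is split into a per-channel scale and shift that modulate the normed hidden states, rather than being added to them.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleShiftResBlockSketch(nn.Module):
    def __init__(self, channels, time_dim):
        super().__init__()
        self.norm = nn.GroupNorm(32, channels)  # channels assumed divisible by 32
        self.time_emb_proj = nn.Linear(time_dim, 2 * channels)  # -> scale and shift
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, hidden_states, time_emb):
        emb = self.time_emb_proj(F.silu(time_emb))[:, :, None, None]
        scale, shift = emb.chunk(2, dim=1)
        # modulate the normed activations instead of just adding the embedding
        h = self.norm(hidden_states) * (1 + scale) + shift
        return hidden_states + self.conv(F.silu(h))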
super resolution 64->256 Unet
  • Unconditional Unet.
  • Latents are upsampled (TODO how) before being input
  • The super resolution unet looks like it actually wraps two separate unets and has a modified sampling function - https://github.com/kakaobrain/karlo/blob/e105e7643c4e9f30b1b17c7e4354d8474455dcb3/karlo/modules/diffusion/gaussian_diffusion.py#L596 see the model_aux argument (a rough sampling sketch follows this list)
  • Does not contain any attention mechanism (including self attention)
  • New block types for modified resnet up/down sample. Similar to the decoder unet.
  • Modify porting code from the decoder unet. The unet has basically the same structure as the decoder except there's no cross or self attention mechanism. Will re-use methods from the decoder unet.
  • Verify against original implementation
  • Port and verify "last step unet"
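
A rough sketch of the sampling pattern described above, based on my reading of the linked gaussian_diffusion.py. The models and scheduler are placeholders that follow the usual diffusers conventions, the interpolation mode is a stand-in (the exact upsampling is still a TODO above), and conditioning via channel concatenation is an assumption.

import torch
import torch.nn.functional as F

def super_res_sample_sketch(unet, last_step_unet, scheduler, low_res_image, num_steps=7):
    # upsample the 64x64 decoder output to 256x256 before conditioning
    upsampled = F.interpolate(low_res_image, scale_factor=4, mode="bicubic")
    sample = torch.randn_like(upsampled)

    scheduler.set_timesteps(num_steps)
    for i, t in enumerate(scheduler.timesteps):
        # the final denoising step is handled by the separate "last step" unet (model_aux)
        model = last_step_unet if i == len(scheduler.timesteps) - 1 else unet
        noise_pred = model(torch.cat([sample, upsampled], dim=1), t).sample
        sample = scheduler.step(noise_pred, t, sample).prev_sample
    return sample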
super resolution 256->1024 Unet

not released with this model!

@HuggingFaceDocBuilderDev commented Nov 25, 2022

The documentation is not available anymore as the PR was closed or merged.

@williamberman force-pushed the kakaobrain_unclip branch 5 times, most recently from 31e8c58 to 7250888 on November 27, 2022 23:58
@patrickvonplaten (Contributor) left a comment:

Very nice progress!

@williamberman force-pushed the kakaobrain_unclip branch 18 times, most recently from 08590e2 to 95484f6 on December 5, 2022 01:45
@williamberman merged commit 2dcf64b into huggingface:main on Dec 18, 2022
sliard pushed a commit to sliard/diffusers that referenced this pull request Dec 21, 2022
* [wip] attention block updates

* [wip] unCLIP unet decoder and super res

* [wip] unCLIP prior transformer

* [wip] scheduler changes

* [wip] text proj utility class

* [wip] UnCLIPPipeline

* [wip] kakaobrain unCLIP convert script

* [unCLIP pipeline] fixes re: @patrickvonplaten

remove callbacks

move denoising loops into call function

* UNCLIPScheduler re: @patrickvonplaten

Revert changes to DDPMScheduler. Make UNCLIPScheduler, a modified
DDPM scheduler with changes to support karlo

* mask -> attention_mask re: @patrickvonplaten

* [DDPMScheduler] remove leftover change

* [docs] PriorTransformer

* [docs] UNet2DConditionModel and UNet2DModel

* [nit] UNCLIPScheduler -> UnCLIPScheduler

matches existing unclip naming better

* [docs] SchedulingUnCLIP

* [docs] UnCLIPTextProjModel

* refactor

* finish licenses

* rename all to attention_mask and prep in models

* more renaming

* don't expose unused configs

* final renaming fixes

* remove x attn mask when not necessary

* configure kakao script to use new class embedding config

* fix copies

* [tests] UnCLIPScheduler

* finish x attn

* finish

* remove more

* rename condition blocks

* clean more

* Apply suggestions from code review

* up

* fix

* [tests] UnCLIPPipelineFastTests

* remove unused imports

* [tests] UnCLIPPipelineIntegrationTests

* correct

* make style

Co-authored-by: Patrick von Platen <[email protected]>
yoonseokjin pushed a commit to yoonseokjin/diffusers that referenced this pull request Dec 25, 2023 (with the same squashed commit message as above)