Commit b38450d

Add STG to community pipelines (#10960)

* Support STG for video pipelines
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* Update pipeline_stg_cogvideox.py
* Update pipeline_stg_hunyuan_video.py
* Update pipeline_stg_ltx.py
* Update pipeline_stg_ltx_image2video.py
* Update pipeline_stg_mochi.py
* Update pipeline_stg_hunyuan_video.py
* Update pipeline_stg_ltx.py
* Update pipeline_stg_ltx_image2video.py
* Update pipeline_stg_mochi.py
* update
* remove rescaling
* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

1 parent 1357931 commit b38450d

6 files changed (+4434 lines, -0 lines)

examples/community/README.md (+50 lines)

@@ -10,6 +10,7 @@ Please also check out our [Community Scripts](https://github.com/huggingface/dif

| Example | Description | Code Example | Colab | Author |
|:---|:---|:---|:---|---:|
|Spatiotemporal Skip Guidance (STG)|[Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling](https://arxiv.org/abs/2411.18664) (CVPR 2025) enhances video diffusion models by generating a weaker model through layer skipping and using it as guidance, improving fidelity in models like HunyuanVideo, LTXVideo, and Mochi.|[Spatiotemporal Skip Guidance](#spatiotemporal-skip-guidance)|-|[Junha Hyung](https://junhahyung.github.io/), [Kinam Kim](https://kinam0252.github.io/)|
|Adaptive Mask Inpainting|Adaptive Mask Inpainting algorithm from [Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models](https://github.com/snuvclab/coma) (ECCV '24, Oral) provides a way to insert a human into a scene image without altering the background by inpainting with an adaptive mask.|[Adaptive Mask Inpainting](#adaptive-mask-inpainting)|-|[Hyeonwoo Kim](https://sshowbiz.xyz),[Sookwan Han](https://jellyheadandrew.github.io)|
|Flux with CFG|[Flux with CFG](https://github.com/ToTheBeginning/PuLID/blob/main/docs/pulid_for_flux.md) provides an implementation of using CFG in [Flux](https://blackforestlabs.ai/announcing-black-forest-labs/).|[Flux with CFG](#flux-with-cfg)|[Notebook](https://github.com/huggingface/notebooks/blob/main/diffusers/flux_with_cfg.ipynb)|[Linoy Tsaban](https://github.com/linoytsaban), [Apolinário](https://github.com/apolinario), and [Sayak Paul](https://github.com/sayakpaul)|
|Differential Diffusion|[Differential Diffusion](https://github.com/exx8/differential-diffusion) modifies an image according to a text prompt, and according to a map that specifies the amount of change in each region.|[Differential Diffusion](#differential-diffusion)|[![Hugging Face Space](https://img.shields.io/badge/🤗%20Hugging%20Face-Space-yellow)](https://huggingface.co/spaces/exx8/differential-diffusion) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/exx8/differential-diffusion/blob/main/examples/SD2.ipynb)|[Eran Levin](https://github.com/exx8) and [Ohad Fried](https://www.ohadf.com/)|

@@ -93,6 +94,55 @@ pipe = DiffusionPipeline.from_pretrained("stable-diffusion-v1-5/stable-diffusion

## Example usages

### Spatiotemporal Skip Guidance

**Junha Hyung\*, Kinam Kim\*, Susung Hong, Min-Jung Kim, Jaegul Choo**

**KAIST AI, University of Washington**

[*Spatiotemporal Skip Guidance (STG) for Enhanced Video Diffusion Sampling*](https://arxiv.org/abs/2411.18664) (CVPR 2025) is a simple training-free sampling guidance method for enhancing transformer-based video diffusion models. STG employs an implicit weak model via self-perturbation, avoiding the need for external models or additional training. By selectively skipping spatiotemporal layers, STG produces an aligned, degraded version of the original model to boost sample quality without compromising diversity or dynamic degree.
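
The guidance step can be sketched as follows. This is a minimal, framework-agnostic sketch that assumes the STG pipelines combine three noise predictions (unconditional, text-conditioned, and text-conditioned from the layer-skipped weak model) as classifier-free guidance plus an extra term that pushes away from the weak model. The function name `stg_noise_pred` and the `guidance_scale` value are illustrative assumptions, not the pipelines' API. With `stg_scale = 0.0` the extra term vanishes and the update reduces to plain CFG, matching the `# 0.0 for CFG` comment in the usage example.

```python
def stg_noise_pred(noise_uncond, noise_text, noise_perturb, guidance_scale, stg_scale):
    """Combine three denoiser outputs into one guided prediction (hypothetical helper).

    noise_uncond:  prediction without the text condition
    noise_text:    prediction with the text condition from the full model
    noise_perturb: prediction with the text condition from the weak model,
                   i.e. the same network with selected layers skipped
    Works with floats, NumPy arrays, or torch tensors of matching shape.
    """
    # Classifier-free guidance: move away from the unconditional prediction.
    cfg_term = guidance_scale * (noise_text - noise_uncond)
    # STG: move away from the self-perturbed (layer-skipped) prediction.
    stg_term = stg_scale * (noise_text - noise_perturb)
    return noise_uncond + cfg_term + stg_term


# Toy scalars standing in for per-element noise predictions.
print(stg_noise_pred(0.0, 1.0, 0.5, guidance_scale=4.5, stg_scale=1.0))  # 5.0
print(stg_noise_pred(0.0, 1.0, 0.5, guidance_scale=4.5, stg_scale=0.0))  # 4.5 (plain CFG)
```

Writing the update as separate CFG and STG terms makes it explicit that STG adds guidance on top of CFG rather than replacing it.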

The following example video shows STG applied to Mochi.

https://github.com/user-attachments/assets/148adb59-da61-4c50-9dfa-425dcb5c23b3

More examples and information can be found in the [GitHub repository](https://github.com/junhahyung/STGuidance) and on the [project website](https://junhahyung.github.io/STGuidance/).

#### Usage example

```python
import torch
from diffusers.utils import export_to_video

# pipeline_stg_mochi.py (from examples/community) must be importable, e.g. in the working directory
from pipeline_stg_mochi import MochiSTGPipeline

# Load the pipeline
pipe = MochiSTGPipeline.from_pretrained("genmo/mochi-1-preview", variant="bf16", torch_dtype=torch.bfloat16)

# Move the pipeline to the GPU
pipe = pipe.to("cuda")

#--------Option--------#
prompt = "A close-up of a beautiful woman's face with colored powder exploding around her, creating an abstract splash of vibrant hues, realistic style."
stg_applied_layers_idx = [34]  # transformer layer(s) skipped to build the weak model
stg_scale = 1.0  # 0.0 for CFG
#----------------------#

# Generate video frames
frames = pipe(
    prompt,
    height=480,
    width=480,
    num_frames=81,
    stg_applied_layers_idx=stg_applied_layers_idx,
    stg_scale=stg_scale,
    generator=torch.Generator().manual_seed(42),
).frames[0]

export_to_video(frames, "output.mp4", fps=30)
```
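
`stg_applied_layers_idx` selects which transformer layer(s) are skipped to form the weak model. The toy sketch below illustrates the idea under a simplifying assumption: the model is a stack of residual blocks, and "skipping" a block means its input passes through unchanged. `run_blocks` and the toy blocks are hypothetical and do not reflect how the community pipelines patch the Mochi transformer internally.

```python
from typing import Callable, Iterable, List

Block = Callable[[float], float]


def run_blocks(x: float, blocks: List[Block], skip_idx: Iterable[int] = ()) -> float:
    """Apply residual blocks in order, bypassing those whose index is in skip_idx."""
    skip = set(skip_idx)
    for i, block in enumerate(blocks):
        if i in skip:
            continue  # skipped layer: the hidden state passes through unchanged
        x = x + block(x)  # residual connection: x + f(x)
    return x


# Three toy "layers"; real blocks are attention/MLP layers over video tokens.
blocks = [lambda h: 1.0, lambda h: 0.5 * h, lambda h: -2.0]

full = run_blocks(1.0, blocks)                # full model
weak = run_blocks(1.0, blocks, skip_idx=[1])  # same weights, layer 1 skipped
print(full, weak)  # 1.0 0.0
```

Because the weak model reuses the full model's weights, it stays aligned with the original model while being degraded, so no external weak model or extra training is needed.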

### Adaptive Mask Inpainting

**Hyeonwoo Kim\*, Sookwan Han\*, Patrick Kwon, Hanbyul Joo**
