Description
Model/Pipeline/Scheduler description
The authors propose a novel inference technique built on a pretrained diffusion model for text-conditional video generation. Their approach, called FIFO-Diffusion, is conceptually capable of generating infinitely long videos without any additional training. This is achieved by iteratively performing diagonal denoising, which concurrently processes a series of consecutive frames with increasing noise levels in a queue: at each denoising step, the method dequeues a fully denoised frame at the head while enqueuing a new frame of pure noise at the tail.
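As a rough illustration of the queue mechanics, here is a minimal sketch in Python. `denoise_step` is a hypothetical callable standing in for one denoising pass of the pretrained model, and the pure-noise queue initialization simplifies the paper's setup, which warm-starts the queue from a short clip generated by the base model:

```python
from collections import deque

import torch


def fifo_diagonal_denoising(denoise_step, timesteps, latent_shape, total_frames):
    """Sketch of diagonal denoising (illustrative, not the authors' code).

    `timesteps` holds one noise level per queue position, sorted from
    nearly clean at the head to pure noise at the tail; `denoise_step`
    advances every frame in the stack one noise level.
    """
    num_frames = len(timesteps)
    # Simplification: the paper warm-starts the queue from a clip generated
    # by the base model; here we just fill it with pure noise.
    queue = deque(torch.randn(latent_shape) for _ in range(num_frames))

    frames = []
    while len(frames) < total_frames:
        latents = torch.stack(tuple(queue))         # (num_frames, ...) stack
        latents = denoise_step(latents, timesteps)  # one step per frame

        queue = deque(latents.unbind(0))
        frames.append(queue.popleft())              # head is fully denoised
        queue.append(torch.randn(latent_shape))     # fresh noise at the tail
    return frames
```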
However, diagonal denoising is a double-edged sword: frames near the tail can exploit cleaner frames ahead of them via forward reference, but the strategy introduces a discrepancy between training and inference. To address this, the authors propose latent partitioning, which reduces the training-inference gap, and lookahead denoising, which preserves the benefit of forward referencing.
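A compact way to picture the two fixes, reusing the hypothetical `denoise_step` from the sketch above: partition the long diagonal queue into blocks of the model's native frame count `f`, so each model call sees a narrower band of noise levels, then slide the block window by `f // 2` and keep only the noisier back half of each result, so every frame (apart from the head block) is denoised while referencing cleaner frames ahead of it:

```python
def partitioned_lookahead_step(denoise_step, latents, timesteps, f):
    """Sketch of latent partitioning + lookahead denoising (illustrative).

    Assumes `f` is even and the queue length is a multiple of f // 2,
    so the stride-(f // 2) windows tile the queue exactly.
    """
    half = f // 2
    out = latents.clone()
    for start in range(0, latents.shape[0] - f + 1, half):
        window = denoise_step(latents[start:start + f],
                              timesteps[start:start + f])
        if start == 0:
            # Head of the queue has no cleaner frames to look ahead to;
            # keep the whole first window.
            out[:f] = window
        else:
            # Keep only the noisier back half, which was denoised while
            # attending to the cleaner front half of the window.
            out[start + half:start + f] = window[half:]
    return out
```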
The authors demonstrate promising results with existing pretrained text-to-video generation models such as VideoCrafter, Open-Sora Plan, and zeroscope.
Open source status
- The model implementation is available.
- The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
Project Page: https://jjihwan.github.io/projects/FIFO-Diffusion
Code: https://github.com/jjihwan/FIFO-Diffusion_public
Arxiv: https://arxiv.org/abs/2405.11473
Contact: @jjihwan