A Diffusion Transformer model for 3D video-like data was introduced in "HunyuanVideo: A Systematic Framework For Large Video Generative Models" by Tencent.
The model can be loaded with the following code snippet.
import torch  # needed for torch.bfloat16 below
from diffusers import HunyuanVideoTransformer3DModel
transformer = HunyuanVideoTransformer3DModel.from_pretrained("hunyuanvideo-community/HunyuanVideo", subfolder="transformer", torch_dtype=torch.bfloat16)
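The snippet above passes `torch_dtype=torch.bfloat16`, which halves the memory needed for the weights compared with float32. The following is a rough back-of-the-envelope sketch of that difference, not a measurement. The helper `weight_memory_gib` and the parameter count of about 13 billion are assumptions made for illustration; check the actual checkpoint for the exact number.

```python
# Rough estimate of weight memory per dtype (illustrative sketch only).
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory for the weights alone, in GiB (hypothetical helper)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Assumption: roughly 13 billion parameters; verify against the real checkpoint.
ASSUMED_NUM_PARAMS = 13_000_000_000

for dtype in ("float32", "bfloat16"):
    gib = weight_memory_gib(ASSUMED_NUM_PARAMS, dtype)
    print(f"{dtype}: about {gib:.1f} GiB for weights")
```

This counts only the stored weights. Activations, the text encoders, and the VAE used by a full pipeline add to the total.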
[[autodoc]] HunyuanVideoTransformer3DModel
[[autodoc]] models.modeling_outputs.Transformer2DModelOutput