[Docs] refactor text-to-video zero (huggingface#3049)

sayakpaul · web-flow · commit 148b7cfce6bc · 2023-04-12T14:15:26.000+01:00
* fix: norm group test for UNet3D.

* refactor text-to-video zero docs.
diff --git a/pipelines/text_to_video_synthesis/pipeline_text_to_video_zero.py b/pipelines/text_to_video_synthesis/pipeline_text_to_video_zero.py
@@ -374,9 +374,8 @@ def __call__(
                 Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image
                 generation. Can be used to tweak the same generation with different prompts. If not provided, a latents
                 tensor will be generated by sampling using the supplied random `generator`.
-            output_type (`str`, *optional*, defaults to `"pil"`):
-                The output format of the generate image. Choose between
-                [PIL](https://pillow.readthedocs.io/en/stable/): `PIL.Image.Image` or `np.array`.
+            output_type (`str`, *optional*, defaults to `"numpy"`):
+                The output format of the generated image. Choose between `"latent"` and `"numpy"`.
             return_dict (`bool`, *optional*, defaults to `True`):
                 Whether or not to return a [`~pipelines.stable_diffusion.StableDiffusionPipelineOutput`] instead of a
                 plain tuple.
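
Note on the `output_type` change: with the documented default now `"numpy"` rather than `"pil"`, callers that want 8-bit frames (for example, to write a video file) need to convert the array themselves. The following is a minimal sketch, not code from this commit. It assumes the pipeline returns a float array of shape `(frames, height, width, channels)` with values in `[0, 1]`; the helper name `frames_to_uint8` and the stand-in array `fake_frames` are hypothetical and not part of diffusers.

```python
import numpy as np


def frames_to_uint8(frames: np.ndarray) -> np.ndarray:
    """Convert float frames (assumed to lie in [0, 1]) to uint8 frames in [0, 255].

    Hypothetical helper for illustration; not part of the diffusers API.
    """
    frames = np.asarray(frames, dtype=np.float32)
    # Clip defensively in case values drift slightly outside [0, 1].
    frames = np.clip(frames, 0.0, 1.0)
    return (frames * 255).round().astype(np.uint8)


# Stand-in for the pipeline's "numpy" output: 2 frames of 4x4 RGB.
fake_frames = np.linspace(0.0, 1.0, 2 * 4 * 4 * 3, dtype=np.float32).reshape(2, 4, 4, 3)
video = frames_to_uint8(fake_frames)
print(video.shape, video.dtype)  # (2, 4, 4, 3) uint8
```

The resulting `uint8` array can then be handed to whichever image or video writer the caller already uses. Requesting `"latent"` instead skips decoding, so this conversion would not apply to that output.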