As the title suggests, we would like a single pipeline that supports all three techniques: AnimateDiff, SparseCtrl, and ControlNet. Currently, there are standalone pipelines for AnimateDiff + SparseCtrl and for AnimateDiff + ControlNet. A combination of the two might be interesting to see!
Right now, video prediction is hard to control because the new frames depend heavily on the prompt; if we could also condition on images, we would have finer control. This pipeline would enable apps like Blender to generate new frames based on past reference frames and a depth buffer.
Looking at the code, this seems doable, but before I try I would like input and suggestions from more experienced people on this possible approach (@a-r-r-o-w or @DN6 :) ):
- make pipeline_animatediff_sparsectrl.py and pipeline_animatediff_controlnet.py as similar as possible, so that a diff shows as much common code as possible
- refactor the blocks of code that differ into functions
- have these functions work together in a new single pipeline (a sketch follows below)
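
To make the third step concrete, here is a minimal structural sketch of what the refactor could look like. It deliberately uses plain torch tensors instead of real diffusers modules, and every name in it (`sparsectrl_residuals`, `controlnet_residuals`, `combine_residuals`, `denoise_step`) is hypothetical rather than an existing diffusers API. The idea: each conditioning path becomes a function that returns ControlNet-style residuals, and the combined pipeline merges them before a single UNet call.

```python
# Structural sketch only: hypothetical names, dummy tensors, no real
# diffusers modules. Illustrates the "refactor into functions, then
# compose" approach, not an actual implementation.
from typing import Callable, List, Tuple

import torch

# Each conditioning branch yields (down-block residuals, mid-block residual),
# mirroring the (down_block_res_samples, mid_block_res_sample) outputs that
# diffusers ControlNet models return.
Residuals = Tuple[List[torch.Tensor], torch.Tensor]


def sparsectrl_residuals(latents: torch.Tensor, t: int) -> Residuals:
    """Stand-in for the SparseCtrl branch (sparse reference-frame conditioning)."""
    down = [torch.zeros_like(latents) for _ in range(3)]
    return down, torch.zeros_like(latents)


def controlnet_residuals(latents: torch.Tensor, t: int) -> Residuals:
    """Stand-in for the ControlNet branch (e.g. per-frame depth maps)."""
    down = [torch.zeros_like(latents) for _ in range(3)]
    return down, torch.zeros_like(latents)


def combine_residuals(a: Residuals, b: Residuals) -> Residuals:
    """Sum both branches, the same merge strategy MultiControlNetModel uses."""
    down = [x + y for x, y in zip(a[0], b[0])]
    return down, a[1] + b[1]


def denoise_step(
    unet: Callable[[torch.Tensor, int, Residuals], torch.Tensor],
    latents: torch.Tensor,
    t: int,
) -> torch.Tensor:
    """One denoising step: run both branches, merge, feed the UNet once."""
    merged = combine_residuals(
        sparsectrl_residuals(latents, t),
        controlnet_residuals(latents, t),
    )
    return unet(latents, t, merged)


if __name__ == "__main__":
    # Dummy UNet that just subtracts the mid-block residual.
    dummy_unet = lambda lat, t, res: lat - res[1]
    latents = torch.randn(1, 4, 16, 64, 64)  # (batch, channels, frames, h, w)
    out = denoise_step(dummy_unet, latents, t=0)
    print(out.shape)  # torch.Size([1, 4, 16, 64, 64])
```

Summing residuals mirrors how MultiControlNetModel merges multiple ControlNets in diffusers; if one branch should dominate, per-branch conditioning scales would be the natural knob to add.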
Does this make sense?