Description
Currently in inpainting pipelines (e.g. StableDiffusionInpaintingPipeline), masked areas are expected to be 100% noise. During the diffusion sampling process the masked areas are then generated with respect to the visible image. The problem with this process is that it assumes you have no estimate of the underlying masked area, such as a rough sketch. This leaves you little control over the output beyond the text prompt; in particular, structure and colour are difficult to control explicitly.
I propose a parameter for inpainting pipelines that allows you to skip to a specific point in the sampling process. The parameter would be called skip_time_percentage, e.g. if 0.75 is passed then 75% of the sampling steps are skipped. The image, including the masked regions, is initialised by applying forward diffusion up to that point in diffusion time. This means the input is a noisy version of the initial image rather than pure noise in the masked regions.
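To make the proposal concrete, here is a minimal NumPy sketch of the idea (not the actual diffusers API; the function name, the toy linear beta schedule, and the `denoise_step` callable are all illustrative assumptions). The whole init image, masked region included, is forward-diffused to an intermediate timestep, and only the remaining sampling steps are run, with known pixels re-imposed at the matching noise level each step:

```python
import numpy as np

def inpaint_with_skip(x0, mask, denoise_step, num_steps=50,
                      skip_time_percentage=0.75, seed=0):
    """Sketch of the proposed feature.

    x0: init image containing a rough estimate (e.g. a sketch) in the
        masked region. mask == 1 marks pixels to regenerate.
    denoise_step: placeholder for one reverse-diffusion step of a model.
    Returns the result and the number of steps skipped.
    """
    rng = np.random.default_rng(seed)
    # toy linear beta schedule, purely illustrative
    betas = np.linspace(1e-4, 0.02, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)

    def q_sample(x, t):
        # DDPM forward diffusion to timestep t
        eps = rng.standard_normal(x.shape)
        return np.sqrt(alpha_bars[t]) * x + np.sqrt(1.0 - alpha_bars[t]) * eps

    skipped = int(skip_time_percentage * num_steps)
    t_start = num_steps - 1 - skipped
    # initialise EVERYTHING (masked area included) from the noised init
    # image, instead of pure noise in the masked region
    x = q_sample(x0, t_start)
    for t in range(t_start, -1, -1):
        x = denoise_step(x, t)
        # re-impose known (unmasked) pixels at the current noise level
        known = q_sample(x0, t) if t > 0 else x0
        x = mask * x + (1.0 - mask) * known
    return x, skipped
```

With `skip_time_percentage=0.75` and 50 steps, 37 steps are skipped and only 13 reverse steps run, which is also where the speed-up comes from.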
I have used this before to great success, and of course it also speeds up sampling by skipping a large portion of the steps.
This feature has been part of the Imagen PyTorch implementation for some time: https://github.com/lucidrains/imagen-pytorch
Is this something that people would be interested in? I can write up the code and submit a PR if so.