2020.09.21_DiffWave.md

DiffWave

Basic Information
  • Title: "DiffWave: A Versatile Diffusion Model for Audio Synthesis"
  • Authors:
    • 01 Zhifeng Kong
    • 02 Wei Ping
    • 03 Jiaji Huang
    • 04 Kexin Zhao
    • 05 Bryan Catanzaro
  • Links:
  • Files:

Abstract

In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional and unconditional waveform generation. The model is non-autoregressive, and converts the white noise signal into structured waveform through a Markov chain with a constant number of steps at synthesis. It is efficiently trained by optimizing a variant of variational bound on the data likelihood. DiffWave produces high-fidelity audios in different waveform generation tasks, including neural vocoding conditioned on mel spectrogram, class-conditional generation, and unconditional generation. We demonstrate that DiffWave matches a strong WaveNet vocoder in terms of speech quality (MOS: 4.44 versus 4.43), while synthesizing orders of magnitude faster. In particular, it significantly outperforms autoregressive and GAN-based waveform models in the challenging unconditional generation task in terms of audio quality and sample diversity from various automatic and human evaluations.
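To make the "white noise to structured waveform through a Markov chain with a constant number of steps" idea concrete, here is a minimal sketch of a generic DDPM-style reverse sampling loop. It is not the authors' implementation: the linear noise schedule, its endpoint values, and the names `make_schedule`, `zero_eps_model`, and `reverse_sample` are all assumptions made for illustration, and `zero_eps_model` is a hypothetical placeholder for a trained noise-prediction network.

```python
import math
import random


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule; the endpoint values are illustrative assumptions."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    betas = [beta_start + i * step for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    # Cumulative products of alphas, one per step.
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def zero_eps_model(x, t):
    """Hypothetical stand-in for a trained noise-prediction network."""
    return [0.0] * len(x)


def reverse_sample(eps_model, length, num_steps=50, seed=0):
    """Run a fixed-length reverse Markov chain starting from white noise."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    # Start from a white-noise signal of the requested length.
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for t in range(num_steps - 1, -1, -1):
        eps = eps_model(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Mean of the reverse transition given the predicted noise.
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            # Add fresh Gaussian noise on every step except the last.
            sigma = math.sqrt(
                (1.0 - alpha_bars[t - 1]) / (1.0 - alpha_bars[t]) * betas[t]
            )
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean
    return x


waveform = reverse_sample(zero_eps_model, length=8, num_steps=10, seed=0)
```

The loop always runs exactly `num_steps` iterations and updates every sample of the signal in the same step, so the cost does not grow sample by sample as it does in an autoregressive model. With a real network in place of `zero_eps_model`, conditioning inputs such as a mel spectrogram would be passed to the model as extra arguments.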

1·Introduction

2·Related Works

3·Methodology

4·Experiments

5·Results

6·Conclusions