Models/Vocoder/2024.08.14_PeriodWave.md: 4 additions & 4 deletions
@@ -76,7 +76,7 @@ Although GAN-based models can generate the high-fidelity waveform signal fast, G
Recently, the multi-band diffusion (MBD) model \citep{roman2023from} sheds light on the effectiveness of the diffusion model for high-resolution waveform modeling.
Although previous diffusion-based waveform models ([DiffWave](2020.09.21_DiffWave.md); [WaveGrad](2020.09.02_WaveGrad.md)) existed, they could not model the high-frequency information, so the generated waveform only contains low-frequency information.
Additionally, they still require many iterative steps to generate high-fidelity waveform signals.
-To reduce this issue, [PriorGrad](2021.06.11_PriorGrad.md) introduced a data-driven prior and [FastDiff](../Diffusion/2022.04.21_FastDiff.md) adopted an efficient structure and noise schedule predictor.
+To reduce this issue, [PriorGrad](2021.06.11_PriorGrad.md) introduced a data-driven prior and [FastDiff](2022.04.21_FastDiff.md) adopted an efficient structure and noise schedule predictor.
However, they do not model the high-frequency information, so these models only generate the low-frequency information well.

Above all, there is no generator architecture to reflect the natural periodic features of high-resolution waveform signals.
@@ -106,8 +106,8 @@ To reduce this issue, we adopt the DWT for more accurate frequency-wise vector f

[WaveNet](2016.09.12_WaveNet.md) has successfully paved the way for high-quality neural waveform generation tasks.
However, these auto-regressive (AR) models suffer from a slow inference speed.
-To address this limitation, teacher-student distillation-based inverse AR flow methods ([Parallel WaveNet](../Vocoder/2017.11.28_Parallel_WaveNet.md_WaveNet.md); [ClariNet](../E2E/2018.07.19_ClariNet.md)) have been investigated for parallel waveform generation.
-Flow-based models ([FloWaveNet](../Vocoder/2018.11.06_FloWaveNet.mdoWaveNet.md); 2018.10.31_WaveGlow.mdWaveGlow.md); 2020.06.11_NanoFlow.mdNanoFlow.md)) have also been utilized, which can be trained by simply maximizing the likelihood of the data using invertible transformation.
+To address this limitation, teacher-student distillation-based inverse AR flow methods ([Parallel WaveNet](../Vocoder/2017.11.28_Parallel_WaveNet.md); [ClariNet](../E2E/2018.07.19_ClariNet.md)) have been investigated for parallel waveform generation.
+Flow-based models ([FloWaveNet](../Vocoder/2018.11.06_FloWaveNet.md); [WaveGlow](../Vocoder/2018.10.31_WaveGlow.md); [NanoFlow](../Vocoder/2020.06.11_NanoFlow.md)) have also been utilized, which can be trained by simply maximizing the likelihood of the data using invertible transformation.
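As a minimal sketch of the likelihood training mentioned in the line above, assuming an invertible mapping $f_\theta$ from a waveform $x$ to a latent $z$ with a standard Gaussian base distribution (the symbols $f_\theta$, $x$, $z$ are illustrative and not taken from the cited papers):

$$
\log p_\theta(x) = \log p_Z\big(f_\theta(x)\big) + \log \left| \det \frac{\partial f_\theta(x)}{\partial x} \right|, \qquad p_Z = \mathcal{N}(0, I),
$$

so training maximizes $\log p_\theta(x)$ directly over the data, and parallel sampling follows as $x = f_\theta^{-1}(z)$ with $z \sim \mathcal{N}(0, I)$.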
[DiffWave](2020.09.21_DiffWave.md) and [WaveGrad](2020.09.02_WaveGrad.md) introduced Mel-conditional diffusion-based neural vocoders that can estimate the gradients of the data density.
[PriorGrad](2021.06.11_PriorGrad.md) improves the efficiency of the conditional diffusion model by adopting a data-dependent prior distribution instead of a standard Gaussian distribution.
-[FastDiff](../Diffusion/2022.04.21_FastDiff.md) proposed a fast conditional diffusion model by adopting an efficient generator structure and noise schedule predictor.
+[FastDiff](2022.04.21_FastDiff.md) proposed a fast conditional diffusion model by adopting an efficient generator structure and noise schedule predictor.
Multi-band Diffusion \citep{roman2023from} incorporated multi-band waveform modeling into diffusion models; this band-wise modeling significantly improved performance because previous diffusion methods could not model high-frequency information and only generated the low-frequency representations well.
This model also focused on raw waveform generation from the discrete tokens of a neural codec model for various audio generation applications, including speech, music, and environmental sound.
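For the diffusion-based vocoders above, "estimating the gradients of the data density" is commonly realized as a denoising (noise-prediction) objective; a minimal sketch, assuming a DDPM-style parameterization with noise schedule $\bar{\alpha}_t$, clean waveform $x_0$, Mel conditioning $c$, and noise predictor $\epsilon_\theta$ (notation is illustrative, not copied from the cited papers):

$$
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \quad \epsilon \sim \mathcal{N}(0, I), \qquad
\mathcal{L} = \mathbb{E}_{x_0, \epsilon, t}\left[ \left\| \epsilon - \epsilon_\theta(x_t, t, c) \right\|_2^2 \right],
$$

where the learned predictor relates to the score via $\nabla_{x_t} \log p(x_t \mid c) \approx -\epsilon_\theta(x_t, t, c) / \sqrt{1-\bar{\alpha}_t}$; PriorGrad replaces the standard Gaussian $\mathcal{N}(0, I)$ with a data-dependent prior $\mathcal{N}(0, \Sigma_c)$ whose variances are derived from frame-level Mel-spectrogram energy.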