WaveNeXt 2: ConvNeXt-Based Fast Neural Vocoders With Residual Denoising and Sub-Modeling for GAN and Diffusion Models




Demo samples

English female
Ground truth
WaveNeXt 1 | HiFi-GAN
WaveFit (2 iterations) | GAN-WaveNeXt 2 (2 iterations) (proposed)
WaveFit (3 iterations) | GAN-WaveNeXt 2 (3 iterations) (proposed)
WaveFit (4 iterations) | GAN-WaveNeXt 2 (4 iterations) (proposed)
WaveFit (5 iterations) | GAN-WaveNeXt 2 (5 iterations) (proposed)
FastDiff w/o sub-modeling | Diff-WaveNeXt 2 w/o sub-modeling
FastDiff w/ sub-modeling | Diff-WaveNeXt 2 w/ sub-modeling (proposed)

English male
Ground truth
WaveNeXt 1 | HiFi-GAN
WaveFit (2 iterations) | GAN-WaveNeXt 2 (2 iterations) (proposed)
WaveFit (3 iterations) | GAN-WaveNeXt 2 (3 iterations) (proposed)
WaveFit (4 iterations) | GAN-WaveNeXt 2 (4 iterations) (proposed)
WaveFit (5 iterations) | GAN-WaveNeXt 2 (5 iterations) (proposed)
FastDiff (4 steps) w/o sub-modeling | Diff-WaveNeXt 2 (4 steps) w/o sub-modeling
FastDiff (4 steps) w/ sub-modeling | Diff-WaveNeXt 2 (4 steps) w/ sub-modeling (proposed)