| Ground truth |
| WaveNeXt 1 | HiFi-GAN |
| WaveFit (2 iterations) | GAN-WaveNeXt 2 (2 iterations) (proposed) |
| WaveFit (3 iterations) | GAN-WaveNeXt 2 (3 iterations) (proposed) |
| WaveFit (4 iterations) | GAN-WaveNeXt 2 (4 iterations) (proposed) |
| WaveFit (5 iterations) | GAN-WaveNeXt 2 (5 iterations) (proposed) |
| FastDiff w/o sub-modeling | Diff-WaveNeXt 2 w/o sub-modeling |
| FastDiff w/ sub-modeling | Diff-WaveNeXt 2 w/ sub-modeling (proposed) |
| Ground truth |
| WaveNeXt 1 | HiFi-GAN |
| WaveFit (2 iterations) | GAN-WaveNeXt 2 (2 iterations) (proposed) |
| WaveFit (3 iterations) | GAN-WaveNeXt 2 (3 iterations) (proposed) |
| WaveFit (4 iterations) | GAN-WaveNeXt 2 (4 iterations) (proposed) |
| WaveFit (5 iterations) | GAN-WaveNeXt 2 (5 iterations) (proposed) |
| FastDiff (4 steps) w/o sub-modeling | Diff-WaveNeXt 2 (4 steps) w/o sub-modeling |
| FastDiff (4 steps) w/ sub-modeling | Diff-WaveNeXt 2 (4 steps) w/ sub-modeling (proposed) |