A fast TTS architecture with conditional flow matching
This project is maintained by shivammehta25
We propose 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis.
See below for audio examples, or read our ICASSP 2024 paper for more details. Code is available in our GitHub repository, along with pre-trained models.
You can also try 🍵 Matcha-TTS in your browser on HuggingFace 🤗 spaces.
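As a rough illustration of why conditional flow matching can reduce the number of ODE solver steps, the sketch below builds the straight-line conditional path used in optimal-transport flow matching and integrates its conditional vector field with fixed-step Euler. It is a toy under stated assumptions, not code from the Matcha-TTS repository: the function names, the value of `SIGMA_MIN`, the 4-dimensional target vector, and the step counts are all made up for the example, and the closed-form field stands in for what would be a trained neural network.

```python
import random

SIGMA_MIN = 1e-4  # assumed small noise floor; illustrative value only


def cfm_training_pair(x0, x1, t, sigma_min=SIGMA_MIN):
    """Return (x_t, u_t): a point on the straight conditional path
    from noise x0 towards data x1, and the regression target velocity."""
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, u_t


def make_conditional_field(x1, sigma_min=SIGMA_MIN):
    """Closed-form conditional vector field for a known target x1.
    (Hypothetical stand-in for a learned network.)"""
    def field(x, t):
        denom = 1.0 - (1.0 - sigma_min) * t
        return [(b - (1.0 - sigma_min) * a) / denom for a, b in zip(x, x1)]
    return field


def euler_solve(vector_field, x0, n_steps):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with Euler steps."""
    x = list(x0)
    h = 1.0 / n_steps
    for i in range(n_steps):
        v = vector_field(x, i * h)
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(4)]  # noise sample
x1 = [0.5, -1.0, 2.0, 0.0]                    # toy "data" point
field = make_conditional_field(x1)

# Endpoint of the exact path at t=1 is sigma_min * x0 + x1.
expected = [SIGMA_MIN * a + b for a, b in zip(x0, x1)]
results = {n: euler_solve(field, x0, n) for n in (2, 4, 10)}
```

Because the conditional path is a straight line with constant velocity, Euler integration reaches the same endpoint in this toy setting whether it takes 2, 4, or 10 steps. A learned vector field is only approximately straight, so in practice the number of steps still trades speed against quality.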
Transcription (Sentence 1): It had established periodic regular review of the status of four hundred individuals;
| System | Condition | Sentence 1 | Sentence 2 | Sentence 3 | Sentence 4 | Sentence 5 | Sentence 6 |
|---|---|---|---|---|---|---|---|
| Vocoded speech | VOC | | | | | | |
| Matcha-TTS | MAT-10 | | | | | | |
| Matcha-TTS | MAT-4 | | | | | | |
| Matcha-TTS | MAT-2 | | | | | | |
| Grad-TTS | GRAD-10 | | | | | | |
| Grad-TTS | GRAD-4 | | | | | | |
| Grad-TTS+CFM | GCFM-4 | | | | | | |
| FastSpeech 2 | FS2 | | | | | | |
| VITS | VITS | | | | | | |
| System | Sentence 1 | Sentence 2 | Sentence 3 |
|---|---|---|---|
| Matcha-TTS | | | |
| Grad-TTS | | | |
| Grad-TTS + CFM | | | |
@inproceedings{mehta2024matcha,
title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
booktitle={Proc. ICASSP},
year={2024}
}