Matcha-TTS

A fast TTS architecture with conditional flow matching

This project is maintained by shivammehta25

Shivam Mehta, Ruibo Tu, Jonas Beskow, Éva Székely, and Gustav Eje Henter

We propose 🍵 Matcha-TTS, a new approach to non-autoregressive neural TTS that uses conditional flow matching (similar to rectified flows) to speed up ODE-based speech synthesis.

See below for audio examples, or read our ICASSP 2024 paper for more details. Code is available in our GitHub repository, along with pre-trained models.

You can also try 🍵 Matcha-TTS in your browser on HuggingFace 🤗 spaces.
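As a rough illustration of what conditional flow matching trains on, the sketch below builds one training pair on a straight path from a noise sample to a data sample. It assumes the optimal-transport conditional path that is commonly used with flow matching; the function names, the `sigma_min` value, and the list-based vectors are illustrative choices and are not taken from the Matcha-TTS code, so consult the paper and repository for the actual formulation.

```python
def ot_cfm_sample(x0, x1, t, sigma_min=1e-4):
    """Return (x_t, u_t) on a straight conditional path from noise x0 to data x1.

    Assumed path:   x_t = (1 - (1 - sigma_min) * t) * x0 + t * x1
    Assumed target: u_t = x1 - (1 - sigma_min) * x0   (the time derivative of x_t)
    """
    scale = 1.0 - (1.0 - sigma_min) * t
    x_t = [scale * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, u_t


def cfm_loss(predicted_u, target_u):
    """Mean squared error between a predicted and a target vector field."""
    return sum((p - q) ** 2 for p, q in zip(predicted_u, target_u)) / len(target_u)


if __name__ == "__main__":
    noise = [0.5, -1.0]   # hypothetical noise sample
    data = [2.0, 3.0]     # hypothetical data sample (e.g. two mel-spectrogram bins)
    x_t, u_t = ot_cfm_sample(noise, data, t=0.3)
    # A trained network would predict u_t from (x_t, t); here we score a zero guess.
    print(cfm_loss([0.0, 0.0], u_t))
```

Because the target `u_t` does not depend on `t`, each conditional path is a straight line. That is the property that makes such models amenable to synthesis with few ODE solver steps.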

Stimuli from the listening test

Click the buttons in the table to load and play the different stimuli.

Transcription of Sentence 1:

It had established periodic regular review of the status of four hundred individuals;

System           Condition   Stimuli
Vocoded speech   VOC         Sentences 1-6
Matcha-TTS       MAT-10      Sentences 1-6
Matcha-TTS       MAT-4       Sentences 1-6
Matcha-TTS       MAT-2       Sentences 1-6
Grad-TTS         GRAD-10     Sentences 1-6
Grad-TTS         GRAD-4      Sentences 1-6
Grad-TTS+CFM     GCFM-4      Sentences 1-6
FastSpeech 2     FS2         Sentences 1-6
VITS             VITS        Sentences 1-6

Effect of the number of ODE solver steps

System Sentence 1 Sentence 2 Sentence 3
Matcha-TTS
Grad-TTS
Grad-TTS + CFM
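To give a feel for why the number of solver steps matters, here is a toy sketch of a fixed-step Euler ODE solver applied to two one-dimensional vector fields. It is not the Matcha-TTS sampler: the choice of Euler integration, both toy fields, and all function names are assumptions made for illustration. The step counts 2, 4 and 10 are borrowed from the condition names above, on the assumption that those suffixes denote solver steps.

```python
import math


def euler_solve(vector_field, x0, n_steps):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = x0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        x = x + dt * vector_field(x, t)
    return x


def straight_field(target):
    """Toy field whose exact trajectories are straight lines ending at `target`."""
    return lambda x, t: (target - x) / (1.0 - t)


def curved_field(x, t):
    """Toy field with exponentially decaying trajectories: dx/dt = -x."""
    return -x


if __name__ == "__main__":
    x0 = 1.0
    for n_steps in (2, 4, 10):
        straight_err = abs(euler_solve(straight_field(3.0), x0, n_steps) - 3.0)
        curved_err = abs(euler_solve(curved_field, x0, n_steps) - x0 * math.exp(-1.0))
        print(f"steps={n_steps:2d}  straight error={straight_err:.2e}  "
              f"curved error={curved_err:.2e}")
```

In this toy setting the straight-line field is integrated essentially exactly even with two steps, while the error on the curved field shrinks only as the step count grows. Real synthesis quality depends on the learned vector field, so the listening-test stimuli remain the reference for how each system behaves at a given step count.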

Citation information

@inproceedings{mehta2024matcha,
  title={Matcha-{TTS}: A fast {TTS} architecture with conditional flow matching},
  author={Mehta, Shivam and Tu, Ruibo and Beskow, Jonas and Sz{\'e}kely, {\'E}va and Henter, Gustav Eje},
  booktitle={Proc. ICASSP},
  year={2024}
}
