Unified speech and gesture synthesis using flow matching
This project is maintained by shivammehta25
We introduce a new method, Match-TTSG, for diffusion-like joint synthesis of speech and 3D gestures from text. Our main improvements are:
Compared to the previous state of the art, our new method:
To our knowledge, this is the first method synthesising 3D motion using flow matching or rectified flows.
Please check out the examples below and read our arXiv preprint for more details. Code and pre-trained models will be made available in a few weeks.
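The "solver steps" reported for the stimuli below refer to how many steps a numerical ODE solver takes when sampling. As a rough illustration only, the sketch below integrates a flow from t = 0 to t = 1 with a fixed-step Euler solver. The names `euler_sample` and `toy_field` are hypothetical, and the toy vector field (simple exponential decay) merely stands in for a learned network; this is not the Match-TTSG implementation, and the page does not say which solver the authors use.

```python
import math


def euler_sample(vector_field, x0, n_steps):
    """Integrate dx/dt = vector_field(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        v = vector_field(x, t)
        # One Euler update: move along the current velocity for a time dt.
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_field(x, t):
    """Hypothetical stand-in for a learned vector field: dx/dt = -x."""
    return [-xi for xi in x]


# The exact solution of dx/dt = -x at t=1, starting from x(0)=1, is exp(-1).
exact = math.exp(-1.0)
err_50 = abs(euler_sample(toy_field, [1.0], 50)[0] - exact)
err_500 = abs(euler_sample(toy_field, [1.0], 500)[0] - exact)
print(f"error with 50 steps:  {err_50:.5f}")
print(f"error with 500 steps: {err_500:.5f}")
```

In this toy setting, more steps give a smaller discretisation error at proportionally higher cost, which is the trade-off behind comparing 50-step and 500-step conditions.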
Transcription:
I mean it it's not that I'm against it it's just that I just don't have the time and I just sometimes I'm not bothered and that sort of stuff.
Text prompt # | NAT | DIFF | MA | MA | SM | SM |
---|---|---|---|---|---|---|
Solver steps | - | 50 | 50 | 500 | 50 | 500 |
1 | ||||||
2 | ||||||
3 | ||||||
4 |
I mean it it's not that I'm against it it's just that I just don't have the time and I just sometimes I'm not bothered and that sort of stuff.
Text prompt # | NAT | DIFF | MA | MA | SM | SM |
---|---|---|---|---|---|---|
Solver steps | - | 500 | 50 | 500 | 50 | 500 |
1 | ||||||
2 | ||||||
3 | ||||||
4 |
Matched | Mismatched |
---|---|
Yeah and then obviously there, there's certain choirs that come down to the church. There's a woman called, I can't remember her name. But she has an incredible voice. Like an amazing voice.
Text prompt # | NAT | DIFF | MA | SM | |
---|---|---|---|---|---|
Solver steps | - | 50 & 500 | 50 | 500 | 500 |
1 | |||||
2 | |||||
3 | |||||
4 |
Transcription:
The sun slowly rises. Casting a golden hue upon the tranquil landscape. Birds chirp melodiously welcoming the dawn. Nature awakens with a gentle breeze rustling through the leaves creating a harmonious symphony of life. This mesmerizing moment is a reminder that nature's beauty is eternal, an ever-repeating masterpiece that never fails to captivate our senses. As the sun continues its ascent, the world beneath its warm embrace stirs to life. The meandering river, once shrouded in darkness, now glistens like liquid gold, reflecting the radiant morning sky. Each ripple seems to dance to its own rhythm, adding to the symphony of nature's awakening.
Text prompt # | MA-50 RTF | SM-50 RTF |
---|---|---|
1 | 0.0221 | 0.1311 |
2 | 0.0213 | 0.1287 |
3 | 0.0242 | 0.1304 |
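For readers unfamiliar with RTF (real-time factor), it is commonly defined as synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. The sketch below uses that common definition; the page does not state how its timings were measured. It reads the two numeric columns of the table above as RTF for MA-50 and SM-50 respectively, which is an assumption about the table layout, and the function name `real_time_factor` is hypothetical.

```python
from statistics import mean


def real_time_factor(synthesis_seconds, audio_seconds):
    """Common definition of RTF: time spent synthesising per second of output audio."""
    return synthesis_seconds / audio_seconds


# RTF values copied from the table, assuming the first numeric column is MA-50
# and the second is SM-50.
rtf_ma_50 = [0.0221, 0.0213, 0.0242]
rtf_sm_50 = [0.1311, 0.1287, 0.1304]

mean_ma = mean(rtf_ma_50)
mean_sm = mean(rtf_sm_50)
ratio = mean_sm / mean_ma  # how many times larger the SM-50 RTF is on average

print(f"mean RTF MA-50: {mean_ma:.4f}")
print(f"mean RTF SM-50: {mean_sm:.4f}")
print(f"SM-50 / MA-50:  {ratio:.2f}")
```

Under this reading of the table, both conditions run faster than real time on all three prompts, with MA-50 having the lower RTF on each.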