MARS5 is an open-source TTS model that replicates performances from just 2-3 seconds of reference audio in 140+ languages, even in extremely tough prosodic scenarios such as sports commentary, movies, anime and more. Join our Discord at discord.com/invite/ZzsKTAKM today!
MARS5-TTS is one of the most impressive AI voice models I've tested. Its AR-NAR (autoregressive + non-autoregressive) pipeline really helps produce natural prosody, and being able to steer delivery with punctuation and capitalization is a great touch. Deep cloning replicates voices convincingly, though it takes a bit longer to run.
What stands out is how well it handles expressive speech, even in tricky cases like sports commentary or anime-style voices. Still, inference stability could be improved, and long-form synthesis would be a great addition. I'm looking forward to seeing how CAMB.AI refines it further.