MARS5-TTS is one of the most impressive AI voice models I’ve tested. The AR-NAR pipeline really helps generate natural prosody, and being able to steer the delivery with punctuation and capitalization is a great touch. Deep cloning delivers solid voice replication, though it takes a bit longer than shallow cloning.
What stands out is how well it handles expressive speech, even in tricky cases like sports commentary or anime-style voices. Still, inference stability could be better, and support for long-form synthesis would be a great addition. Looking forward to seeing how CAMB.AI refines it further.