
Octave TTS — Describe any AI voice and prompt its emotional delivery
by Ben Lang
The first LLM for text-to-speech. While other TTS just “reads” words, Octave grasps their meaning. Create any AI voice with a descriptive prompt, guide its emotional delivery (angrier! more sarcasm!), and bring your stories to life with human-like expression.
Hey Product Hunt! I’m Alan Cowen, CEO and Chief Scientist at Hume AI.
We're launching Octave, the first of a new generation of text-to-speech models. Traditional TTS models focus on the mechanical process of turning letters into sounds. Octave isn't a traditional TTS model, but a voice-enabled LLM, trained on 1000x more language. As a result, it understands the cognitive and emotional aspects of human speech. It reads your script like a human actor, delivering realistic emotions, sarcasm, pace, word emphasis, and more.
And unlike any other TTS system, it can take explicit instructions to generate any voice you describe and modify its emotional tone and speaking style.
Octave is made possible by Hume's research. We're leading the space in voice-enabled LLMs, and we run large-scale psychology studies to help fine-tune our models to generate the right voices at the right time, drawing on a decade of research at the intersection of emotion science and AI.
We’re launching both a platform for creators and an API for developers. We're also launching the Expressive TTS Arena (arena.hume.ai)—a new public benchmark for evaluating emotion-rich, long-form speech generation with instructions.
Ready to try it?
Try Octave: hume.ai
Join our Discord: https://link.hume.ai/discord
Follow our updates: x.com/hume_ai
I’ll be here all day to answer questions and discuss how this technology evolved from our emotion research. Thank you for checking out Octave!
@achume Excited to see how this transforms content creation and AI interactions!
@masump Thank you, Masum!
Lancepilot
Can you describe how Octave's ability to understand the cognitive and emotional aspects of human speech improves its text-to-speech output?
@odeth_negapatan1 When humans speak, they’re actually using a lot of intelligence to predict word emphases, emotional intonations, pacing, and other speaking styles. That’s what separates human speech from past TTS models. Octave brings that intelligence to text to speech for the first time. It implicitly predicts when things are novel, funny, sarcastic, resentful, etc., and adapts its voice accordingly to deliver that text just as a human would.
🥰 Oh this is so cool! When I use 11labs or OpenAI's voice synths, I usually have to record many takes and then remix snippets to get the right tonality and feel. 11labs. Please buy this company 🙏
@sentry_co This is such a common frustration – and exactly the pain point we wanted to solve with Octave ;)
@sentry_co Thanks for the support! ✨
I asked the CEO of 11labs about this problem the other day in the PH forums in an 11labs AMA. He forgot to reply 😅. So I guess they also know about this pain-point.
Spiritory
What sets Octave apart from traditional TTS models, and how does its training on 1000x more language enhance its performance? Congratulations!
@andy_wong4 Hey Andy! Octave has the intelligence of a cutting-edge language model, whereas traditional TTS models are trained on a lot less data and don’t understand the contextual relationship between words and vocal sounds. This makes Octave better at predicting how a sentence should sound, as if a skilled actor were reading it. Try giving it really expressive text with no prompt and see what voice it generates, or give it acting instructions like “speak in an angry tone” or “whisper” (with appropriate text)!
Shram
This is next-level for text-to-speech! Traditional TTS often feels robotic because it lacks understanding of the emotional weight behind words. But the way Octave treats speech like a human actor, adjusting tone, pacing, and even sarcasm, makes it sound incredibly natural.
Congrats on the launch!
Best wishes and sending lots of wins to the team :) @achume
@whatshivamdo Thank you Shivam!
@whatshivamdo Thanks! We are excited to see the fun voices you create!
Home Assistant
I hacked together a quick prototype to get this added to Home Assistant. Works great with Home Assistant Voice 😎
@balloob That's awesome! If you tweet this out, please tag us so we can check it out!! https://x.com/hume_ai
@balloob That's awesome to hear Paulus! Feel free to share what you're building on our Discord: https://link.hume.ai/discord
@balloob let’s goooooo 🚀
Now is the time to use a more human-like product and see how it replies to my questions on all facets of life, from professional to personal.
@ajay27324 Thanks Ajay! Enabling AI to engage in richer, more human-like speech, communication, and understanding of our expressions is one of our main goals. Excited to hear how you use Octave!
@ajay27324 Awesome! Let us know what you think. Thanks Ajay :)
Congrats on the launch, Alan and the Hume AI team! Octave’s emotional TTS is a game-changer for content creation. Excited to see where this goes! 🎉
@mikita_aliaksandrovich Hey Mikita! Thanks so much. What kinds of new content do you foresee Octave enabling?
@mikita_aliaksandrovich Thank you!
yoooo this is really sick!! i think this is going to have a big impact on independent storytellers and videographers
@catapultingcupcakes Thanks, Elle! We are also really excited to see how this enables new workflows and possibilities for storytellers.
@catapultingcupcakes Thank you! We appreciate you taking the time to check us out.
Unreal how good this is!
@rashish_tandon Thanks, Rashish!!
So exciting to see this launch! Such an epic achievement from the whole Hume team 💚
@alice_baird Thanks Alice!!
I trained a TTS system myself last year and I am 100% amazed by how well Octave sounds! :)
@jpc Thanks Jakub! Means a lot coming from you.
Congratulations on launching Octave TTS! It's impressive how it goes beyond standard text-to-speech by understanding context and emotion.
How do you ensure the emotional delivery aligns with user intent, and what measures are in place for continuous improvement in voice accuracy?
@shea2 Thanks for the kind words, Shea! Contextually sensitive emotional delivery and expressiveness is ensured by how much we weight feedback, evaluations, and ratings from real humans in our data collection and training process. All of our products are built on methods and approaches from emotion science, in particular the science of how we express ourselves. We plan to continue improving the accuracy and nuance of Octave with further iteration on this approach!
Wowwww I am a big fan!!
@yansong_pang right back at ya, yansong
Congratulations on the launch, @achume!
@koshima_satija Thank you, Koshima! We appreciate you checking us out.
Cabana Health
Looks like an awesome product!
@tliu64 Thank you Tom!
Octave is something new in the world of voiceovers! The intonation in voiceovers has always been a challenge, and I think with the intonation feature, it will be much more interesting)
@irina_buzun Thanks for the support! It’s only going to get better from here.