Temporal
Develop failure-proof applications
Chris Messina
Temporal Cloud — Develop failure-proof applications
Avoid the complexity and operational overhead of building your own stack to manage failures, network outages, flaky endpoints, long-running processes and more, to ensure your workflows never fail.
Replies
Maxim Fateev
👋 Greetings, Product Hunt community! I’m @maximfateev, the Co-Founder and CTO of Temporal.
Temporal lets developers build failure-proof applications in the programming language of their choice. API timeouts, network outages, and server crashes are abstracted away so your code stays laser-focused on its intended function. Retries, state management, long-running processes, and more are handled for you automatically.
Today, I’m excited to launch Temporal Cloud, the managed service version of our popular open source framework. We already run millions of Temporal workflows for high reliability and high scalability workloads with customers like OpenAI, Nvidia, Cloudflare, and Netflix, and now we’re ready to open our self-serve offering to the public.
Among others, we’ve seen Temporal used for all sorts of interesting use cases, including:
💰 Payment Processing
🤖 AI / ML Orchestration
🛒 Order Management & Bookings
👩‍🦰 Lifecycle Management
💻 Infrastructure Management and CI/CD
I launched the first version of this platform as Tech Lead of Amazon Simple Queue Service (SQS) in 2004. Following that, I joined with Samar Abbas at Amazon in 2009 to create Amazon Simple Workflow Service (SWF). Samar then applied those learnings at Microsoft, creating the Durable Task Framework at Azure, and then at Uber, we reunited once more to create Cadence, which we open sourced as Temporal. That means that Temporal benefits from over 20 years of learnings from AWS, Microsoft, and Uber!
My mission is to free developers from writing plumbing logic. Engineering teams should focus on writing code that drives business value, not code that covers their asses. Worse, over time, these mundane projects slow developer momentum and inhibit risk taking. If you want to unlock your “10x developers”, you need Temporal to do that grunt work.
I’m so confident in the transformation your team will experience with Temporal that, for a limited time, we’re offering $1,000 in credits to try Temporal Cloud.
Join over 350K active Temporal developers in building failure-proof applications. We’d love to hear any of your feedback on what we’ve built, as well as any unique use cases you might have. Leave a comment here, and then introduce yourself in our 14K+ developer-strong community on Slack: https://t.mp/slack
Clair Byrd
@maximfateev what were some of the “early inspirations” for solving this problem? The comment about “not writing code to cover their asses” 😂 is providing a hint but would love to hear more
Maxim Fateev
@theclairbyrd It was a bit of a journey. Back at Amazon, I was effectively the tech lead for their Pub/Sub infrastructure. Originally, we built queues on top of Oracle, and then designed our own distributed storage engine, which I believe is still used by Amazon SQS today. Since all of Amazon’s backend at that time ran on Pub/Sub, I ended up talking to every team about their pain points. It was clear that we needed orchestration; Pub/Sub is not the right way to integrate large backend systems. Around that time, Amazon Web Services launched, and through working with distributed systems programming, the need for more “meta” abstractions such as Workflows and Activities became clear, along with the importance of a Workflow History to track each step and allow reliable resumption after a failure. I talked about some of Temporal’s history recently in one of Temporal’s “Spooky Stories” sessions in October.
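To make the Workflow History idea above concrete, here is a minimal sketch in plain Python of recording each completed step and replaying that record after a crash. It is an illustration only, not Temporal's implementation or API: `History`, `run_workflow`, `demo`, and the step names are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a workflow history: an append-only list of
# (step name, result) pairs. This is not Temporal's actual data model.
History = List[Tuple[str, object]]
Step = Tuple[str, Callable[[], object]]


def run_workflow(steps: List[Step], history: History) -> List[object]:
    """Run steps in order, reusing recorded results instead of re-executing."""
    results: List[object] = []
    for index, (name, step) in enumerate(steps):
        if index < len(history):
            recorded_name, recorded_result = history[index]
            if recorded_name != name:
                raise ValueError("history does not match the workflow code")
            results.append(recorded_result)  # replayed, not re-executed
            continue
        result = step()                 # may raise, simulating a crash
        history.append((name, result))  # recorded only after success
        results.append(result)
    return results


def demo() -> Tuple[List[object], dict]:
    """Crash once in the second step, then resume from the same history."""
    calls = {"reserve": 0, "charge": 0}

    def reserve() -> str:
        calls["reserve"] += 1
        return "reserved"

    def charge() -> str:
        calls["charge"] += 1
        if calls["charge"] == 1:
            raise RuntimeError("simulated crash")
        return "charged"

    steps: List[Step] = [("reserve", reserve), ("charge", charge)]
    history: History = []
    try:
        run_workflow(steps, history)  # first run fails in "charge"
    except RuntimeError:
        pass
    return run_workflow(steps, history), calls  # second run resumes
```

In `demo()`, the second run finds "reserve" already in the history, so that step executes only once even though the workflow function ran twice.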
Chris Messina
Hunter
This is amazing developer infra I had never heard of ... whose legacy spans Amazon, Microsoft, and Uber devops teams. If you want to focus on writing code that provides business value rather than mere plumbing logic, Temporal, and now Temporal Cloud, is worth checking out. Imagine being able to outsource all that plumbing code that handles API misses and retries, service unavailability, and the other annoying quirks of software development. How much more flow state could you obtain? That's the entire purpose of Temporal. Check it out!
robholland
How can we use Temporal to integrate with legacy systems that aren't viable to modernise?
Maxim Fateev
@robholland Temporal provides a powerful way to integrate with legacy systems that may not be viable to modernize fully. By wrapping calls to the legacy system within a single Temporal Workflow, you gain reliable retries and state management right from the start, which can make legacy interactions more resilient. If you have some flexibility to modify the legacy code over time, you can gradually split it into distinct Activities within the Temporal Workflow. This modular approach allows more efficient retries, as only the affected part of the workflow is retried if a particular segment of the legacy system fails. For cases where the legacy system cannot be adjusted at all, our new Nexus platform provides a reliable interface layer. Nexus can create a consistent and resilient interaction point, whether you’re connecting to other Temporal clusters or legacy/third-party systems, simplifying integrations across disparate technologies.
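The difference between wrapping a whole legacy job in one retried unit and splitting it into separately retried steps can be sketched in plain Python. This is a toy model under stated assumptions, not Temporal SDK code: `LegacySystem`, `with_retries`, `run_monolithic`, and `run_split` are hypothetical names, and the legacy "export" step is made to fail exactly once.

```python
from typing import Callable, List


def with_retries(fn: Callable[[], None], max_attempts: int = 3) -> None:
    """Call fn until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            fn()
            return
        except ConnectionError:
            if attempt == max_attempts:
                raise


class LegacySystem:
    """Hypothetical legacy system whose export step fails once."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self._export_failures_left = 1

    def load(self) -> None:
        self.calls.append("load")

    def export(self) -> None:
        self.calls.append("export")
        if self._export_failures_left > 0:
            self._export_failures_left -= 1
            raise ConnectionError("legacy endpoint timed out")


def run_monolithic(legacy: LegacySystem) -> None:
    # One coarse unit: a failure in export repeats load as well.
    def whole_job() -> None:
        legacy.load()
        legacy.export()

    with_retries(whole_job)


def run_split(legacy: LegacySystem) -> None:
    # Two finer units: only the failing step is repeated.
    with_retries(legacy.load)
    with_retries(legacy.export)
```

With the coarse wrapper the legacy system sees load, export, load, export; with the split version it sees load, export, export, which is the "only the affected part is retried" effect described above.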
André J
Intriguing product promise. What are some awesome projects out there in the wild that have been built with this? Like a top 3?
Taylor Khan
@sentry_co We have a huge list of use cases on our website: https://temporal.io/in-use These are use cases where companies have announced their use of Temporal publicly; there are also many interesting cases where the use of Temporal is not public knowledge. A lot of companies participate in our community Slack, so that's a great place to dig into how engineers and companies are using Temporal privately. My personal "top 3" of the use cases that are public knowledge:
- Every Snap Story runs on Temporal
- Turo powers the reservations of >1.5 billion cars via Temporal
- Qualtrics is using Temporal to simplify cross-cloud integration of microservices
If you want to see even more, we have tons of content on our YouTube channel, where you can find past talks from our Replay conference (we are currently publishing the presentations given at Replay 2024): https://www.youtube.com/@Temporalio
Micha Cassola
Sounds really really cool. What do you use for high availability? I will take a look now definitely! 🤩
Taylor Khan
@michacassola This is a huge topic. We use a combination of cellular architecture, writes that are guaranteed to occur across availability zones, a ton of automation to handle graceful failover, and custom database(s). Users can even enable cross-region failover (which we call a Multi-Region Namespace, aka MRN) to withstand regional outages. There's a lot more to say. For more detail I'd start with this talk by Sergey Bykov.
Maxim Fateev
@michacassola Temporal is architected to have no single point of failure, assuming it uses a highly available datastore. The open source Temporal project integrates with MySQL, PostgreSQL, SQLite, and Cassandra. Temporal Cloud uses a custom datastore that provides much higher performance, scalability, and availability than any of these databases. When extreme availability (99.99%+) is required, Temporal offers a multi-region replication feature. It asynchronously replicates data across multiple completely independent clusters, which can even use different datastore types.
Micha Cassola
@maximfateev Very nice indeed! Can I self-host Temporal on random Servers?
Maxim Fateev
@michacassola Yes, the Temporal service itself is just a Go binary that you can run practically anywhere. It requires a DB for persistence.
Germán Merlo
Heheh amazing Maxim! That part about failure-proof impressed me. How can you guarantee that? After more than 15 years in this industry, I can't imagine how someone can do it. I'm not saying you cannot, just honestly asking!
Ikenna Paschal
Been using Temporal for years now. Can't do without it anymore. Congrats on the launch!
Katya Prusakova
Super interesting tool! I will definitely use it next time I need to code something event driven 🔥
Giacomo
Congrats! I love temporal
Clair Byrd
@venier that’s awesome to hear—are you using it for stuff right now?
Samuel Bissegger
Congratulations on your launch, @maximfateev and Team! Temporal Cloud seems really useful to improve stability and consistency! I wish you all the best!
Jason De Jesuz
Interesting to see this here now. I integrated temporal into our product close to 2 years ago. And it’s been a great tool for us to monitor critical workflows around core features, building out background tasks, and ensuring that jobs get completed even when they face the fallacies. Great product, unfortunately it’s a bit too expensive for bootstrappers, which is disappointing, but great product for startups with runway and bigger orgs. 🫶🎉
Maxim Fateev
@jasondejesuz Hi Jason, thank you for your support, and it’s great to hear how Temporal has been critical to building out your products. At Temporal, we love startups! That’s why we’re delighted to offer our Cloud Startup Program. All startups with less than $30 million in funding can receive $2,400 in credits for free to start building awesome, failure-proof applications. With this, you also get access to our team for onboarding, design reviews, and technical assistance.
Jason De Jesuz
@maximfateev It’s definitely been a lifesaver more than once. That’s awesome! Thanks for sharing this. 🫶🙏
Jai from Worksaga
Congratulations @mike_partin1 and team on the launch! This product simplifies managing failures and network issues greatly.
Huzaifa Shoukat
Congrats on launching Temporal Cloud! 🎉 How do you handle complex error handling and retries in your workflows?
Angela Zhou
@ihuzaifashoukat Great question! Since this requires more of a thorough answer than I can provide just in a comment, check out our free course on error handling and retries in Temporal: https://learn.temporal.io/course...
Maxim Fateev
@ihuzaifashoukat Temporal relies on the concept of “durable execution”. Your workflow is just code in the programming language of your choice, so that language's native techniques can be used for error handling. In languages that support exceptions, try-catch-finally is usually used. Go relies on the explicit passing of errors. Failures are propagated across process boundaries seamlessly, so you can get a stack trace that includes failures from multiple processes written in multiple languages. This is really cool given that all communication between these processes is fully asynchronous. See the newly released course on error handling for more details: https://learn.temporal.io/course... Retries of activities that make external API calls are done automatically according to the specified exponential retry policy. Note that the duration of retries (as well as the duration of an activity execution) is practically unlimited unless a limit is explicitly specified.
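As a rough illustration of what an exponential retry policy computes, here is a small stdlib-only Python sketch. The `RetryPolicy` dataclass, its field names, and its default values are assumptions chosen for this example, not a statement of Temporal's API or defaults; `delay_before_retry` is likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPolicy:
    # Field names and defaults are illustrative assumptions.
    initial_interval: float = 1.0           # seconds before the first retry
    backoff_coefficient: float = 2.0        # multiplier applied per retry
    maximum_interval: float = 100.0         # cap on any single wait
    maximum_attempts: Optional[int] = None  # None means no attempt limit


def delay_before_retry(policy: RetryPolicy, failed_attempts: int) -> Optional[float]:
    """Seconds to wait after `failed_attempts` failures, or None to give up."""
    if policy.maximum_attempts is not None and failed_attempts >= policy.maximum_attempts:
        return None
    delay = policy.initial_interval * policy.backoff_coefficient ** (failed_attempts - 1)
    return min(delay, policy.maximum_interval)
```

With these assumed defaults, the waits grow as 1, 2, 4, 8, 16, 32, 64 seconds and then stay at the 100-second cap. Because `maximum_attempts` is `None`, retries continue indefinitely, which mirrors the "practically unlimited unless a limit is explicitly specified" behavior described above.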
David Cardenas Codriansky
Big congrats on the launch, @maximfateev ! Temporal Cloud sounds like a game-changer for simplifying complex workflows—love the focus on reliability and handling failures seamlessly. How does it manage scaling for really high-traffic systems or processes with a lot of dependencies? Excited to see how it keeps everything running smoothly even in the trickiest scenarios!
Taylor Khan
@davidcardenas There are three parts of the scaling equation when using Temporal: the Temporal server, the Temporal workers, and dependencies.
1) Regarding the server: the Cloud offering takes care of that for you. We've got incredibly high-volume use cases, including social media giants and big name payment processors. There is some capacity planning involved at this scale: we work with customers to help figure out how much is needed, so it's mostly painless.
2) Regarding workers: workers run on customer infrastructure, but they are part of what determines max throughput and concurrency control. Cloud customers can get tuning sessions to help adjust for the specifics of their use case(s). We have lots of autoscaling goodies in the works, so this will be child's play for most cases in the near future.
3) Regarding dependencies: we can't scale dependencies; that's the user's responsibility. That said, we do provide the hooks and tools on the worker side to protect limited resources (such as GPUs or rate-limited APIs) from getting too much traffic. These tools are getting better every month.
You may want to check out this talk regarding our scaling capability on the server/data side: https://temporal.io/resources/on...
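The third point above, protecting a limited resource from too much traffic on the worker side, can be sketched generically with a semaphore. This is plain Python threading, not Temporal's worker tuning options: `LimitedResource`, `run_tasks`, and the 10 ms sleep standing in for a GPU job or API call are all hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


class LimitedResource:
    """Allows at most `max_concurrent` callers inside `call` at once."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.Semaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest concurrency actually observed

    def call(self, task_id: int) -> int:
        with self._slots:  # blocks while all slots are taken
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            time.sleep(0.01)  # stand-in for the GPU job or API request
            with self._lock:
                self._active -= 1
        return task_id


def run_tasks(resource: LimitedResource, task_count: int, worker_threads: int) -> List[int]:
    """Push tasks through a thread pool that is larger than the resource limit."""
    with ThreadPoolExecutor(max_workers=worker_threads) as pool:
        return list(pool.map(resource.call, range(task_count)))
```

For example, `run_tasks(LimitedResource(2), 10, 8)` completes all ten tasks with eight threads, while the resource's `peak` never exceeds 2.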
Maxim Fateev
@davidcardenas Temporal was designed from the ground up to scale practically indefinitely. With our Cloud Datastore, we ran tests on a single namespace at up to a million updates per second. We didn’t hit any real system limits and could go higher. I gave a talk at Facebook's @Scale conference that gives an overview of Temporal's internal architecture: Designing a Workflow Engine From First Principles.
Stu Kendall
Love the demo video linked above. Technical and descriptive. I wish all product launches had something like this. Nice job to the Temporal team.
Tomas Jasovsky
Congrats on the launch, always so exciting to go out with new product releases. Curious, can you say more about how AI companies are using Temporal Cloud to improve AI in production?
Yimin Chen
@hellodusko AI products rely heavily on standard engineering processes that are often unrelated to AI itself, such as improving the reliability and efficiency of training large language models (LLMs) and optimizing distributed compute resources. Temporal is an ideal technology to accelerate these engineering challenges. It automatically handles retries, making transient errors a non-issue, and provides excellent visibility into any long-lasting failures during training or any other processes. This visibility simplifies troubleshooting, boosts the engineering team’s productivity, and enables faster iteration.
Maxim Fateev
@hellodusko Thanks for the support! AI companies choose Temporal Cloud for its reliability in managing complex workflows, especially when it comes to expensive, resource-intensive tasks like GPU processing. With over 90 companies in the AI space using Temporal Cloud, they’ve found it essential for ensuring pipelines run to completion without costly reprocessing due to transient failures. One key advantage is Temporal’s task queue system, which routes only GPU-dependent tasks to specific nodes, conserving GPU resources for when they’re truly needed. For example, a common ML workflow might spin up a dedicated GPU node at the start of a pipeline, direct all necessary tasks to it, and then automatically shut down the node when processing is complete. This efficient use of Temporal Cloud not only prevents idle GPU time but also reliably shuts down resources once the workflow finishes, helping teams manage costs effectively.
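The routing idea described above, where only GPU-dependent tasks land on the queue that GPU nodes poll, can be modeled in a few lines of Python. This is a conceptual sketch rather than Temporal's task queue API: `PIPELINE`, `route`, `drain`, and the queue names "cpu" and "gpu" are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

# Hypothetical ML pipeline: each task names the queue it must run on.
PIPELINE: List[Tuple[str, str]] = [
    ("download_data", "cpu"),
    ("train_model", "gpu"),
    ("evaluate_model", "gpu"),
    ("publish_report", "cpu"),
]


def route(pipeline: List[Tuple[str, str]]) -> Dict[str, Deque[str]]:
    """Place each task on its named queue, preserving submission order."""
    queues: Dict[str, Deque[str]] = defaultdict(deque)
    for task, queue_name in pipeline:
        queues[queue_name].append(task)
    return queues


def drain(queues: Dict[str, Deque[str]], queue_name: str) -> List[str]:
    """Simulate a worker that polls exactly one queue until it is empty."""
    handled: List[str] = []
    while queues[queue_name]:
        handled.append(queues[queue_name].popleft())
    return handled
```

A worker draining the "gpu" queue sees only `train_model` and `evaluate_model`, so the GPU node never spends time on the download or reporting steps and can be shut down as soon as its queue is empty.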
Chris Kielkopf
Durable execution of Temporal is one of the best tools I've seen for developer productivity in a long time!