Cleanlab
Data you can trust
Trustworthy Language Model (TLM) — Add trust to any LLM at scale.
TLM solves the biggest problem in productionizing GenAI: reliability and hallucinations. It delivers more accurate outputs than GPT-4, along with trustworthiness scores, enabling reliable LLM applications like text generation, data enrichment, and RAG at scale.
Replies
Anish Athalye
Hi Product Hunt! We’re super excited to share our new solution for LLM reliability with you.

LLMs show great promise as a key component of new applications like AI-powered customer service chatbots, coding assistants, data transformation, structured data extraction, and more. However, while nearly every enterprise is experimenting with LLMs, only a small fraction have successfully deployed them in production because of a key issue with today’s LLMs: their unreliability and tendency to produce “hallucinations”, or bogus outputs. These hallucinations are a show-stopper for many applications, and early adopters have been bitten by them. Air Canada’s rogue AI chatbot promised customers refunds against airline policies, and a court ruled that the airline must honor the promise (https://thehill.com/business/447...). A lawyer used ChatGPT to help prepare for a court case and now has to answer for its bogus citations (https://www.nytimes.com/2023/05/...).

We built the Trustworthy Language Model (TLM) to close this gap. TLM builds on top of existing LLMs, improving their accuracy and providing a trustworthiness score for each output, which enables production AI applications. Through extensive benchmarking, we’ve shown that TLM gives higher accuracy than existing LLMs like GPT-4 and that its trustworthiness scores are well-calibrated. Learn more about how and why we built TLM in our blog post: https://cleanlab.ai/blog/trustwo...

We’re excited to see what the community builds with TLM! Happy to answer any questions you have about TLM, LLM reliability, or data curation / data-centric AI more broadly.

—Anish, on behalf of the Cleanlab Team
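The pattern described above — attaching a trustworthiness score to each LLM output so an application can decide when a response is safe to use — can be sketched roughly like this. This is an illustrative example only: `call_llm`, `score_response`, and the 0.8 threshold are stand-in assumptions, not Cleanlab's actual API; a real deployment would get both the response and the score from TLM.

```python
# Illustrative sketch of score-gated LLM outputs (NOT Cleanlab's API).
# `call_llm` and `score_response` are stubs standing in for an LLM and
# for TLM's trustworthiness scoring, respectively.

TRUST_THRESHOLD = 0.8  # assumed cutoff; tuned per application in practice


def call_llm(prompt: str) -> str:
    """Stub LLM: returns a canned answer for demonstration."""
    return f"Answer to: {prompt}"


def score_response(prompt: str, response: str) -> float:
    """Stub trustworthiness score in [0, 1]; TLM would supply this."""
    return 0.95 if "capital of France" in prompt else 0.4


def answer_or_escalate(prompt: str) -> dict:
    """Return the LLM answer only when its trust score clears the bar;
    otherwise withhold it and flag the query for human review."""
    response = call_llm(prompt)
    score = score_response(prompt, response)
    if score >= TRUST_THRESHOLD:
        return {"response": response, "score": score, "escalated": False}
    return {"response": None, "score": score, "escalated": True}
```

With the stub scorer above, a well-supported question is answered directly, while a low-scoring one is escalated instead of being served to the user — which is the point of the Air Canada example: the bogus answer should never have reached the customer.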
Avkash Kakdiya
I came across Trustworthy Language Model (TLM) on Product Hunt and wanted to extend my congratulations on its launch.
Albert
Congratulations on the launch of Trustworthy Language Model (TLM), Anish, Cris, and Emily! It's exciting to see a solution tackling the reliability challenges of LLMs head-on. The issue of 'hallucinations' in AI outputs is indeed a critical one, and TLM seems like a game-changer in this space. I'm curious to know more about the benchmarking process for TLM. How did you ensure that the trustworthiness scores are well-calibrated, and what measures were taken to achieve higher accuracy compared to existing LLMs like GPT-4? Looking forward to exploring the potential of TLM in enabling more reliable AI applications.
Emily Barry
@mashy Thanks Albert! We ran benchmarks with 5 Q&A datasets across different domains (world knowledge, school exams, math, medical diagnosis, …) and measured the ability of TLM trustworthiness scores to detect bad LLM responses with high precision/recall, as well as the accuracy of LLM vs TLM responses. You can find more details/results in our research blog, especially if you go through the Appendix: https://cleanlab.ai/blog/trustwo...
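The evaluation described above — checking how well trustworthiness scores detect bad LLM responses — comes down to computing precision and recall at a score threshold. A minimal sketch, with synthetic scores and labels (not the benchmark's data) and an assumed 0.5 threshold:

```python
# Precision/recall of a trustworthiness score at flagging bad responses.
# The scores, labels, and threshold below are synthetic examples.


def flagging_precision_recall(scores, is_bad, threshold):
    """Treat score < threshold as 'flagged bad'; compare against
    ground-truth labels of which responses were actually wrong."""
    flagged = [s < threshold for s in scores]
    tp = sum(f and b for f, b in zip(flagged, is_bad))
    fp = sum(f and not b for f, b in zip(flagged, is_bad))
    fn = sum((not f) and b for f, b in zip(flagged, is_bad))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


scores = [0.95, 0.20, 0.85, 0.30, 0.90, 0.15]      # trust score per response
is_bad = [False, True, False, True, False, False]  # was the response wrong?
p, r = flagging_precision_recall(scores, is_bad, threshold=0.5)
```

Here both bad responses score below 0.5 (recall 1.0), but one good response is also flagged (precision 2/3); sweeping the threshold traces out the precision/recall trade-off that the benchmark measures.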