Cleanlab
Data you can trust
Anish Athalye
cleanlab — Automatically find errors in ML datasets
cleanlab is an open-source framework for machine learning and analytics with messy, real-world data. Based on research from MIT, cleanlab identifies errors in datasets, trains reliable models on noisy data, and more, each with just a few lines of code.
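To make "a few lines of code" concrete, here is a minimal, stdlib-only Python sketch of the confident-learning idea behind cleanlab's label-error detection. It is a simplified illustration, not cleanlab's implementation or API. The function name `find_label_issues_sketch` and the example data are hypothetical. The sketch assumes you already have out-of-sample predicted probabilities (`pred_probs`) from any classifier. It computes a per-class confidence threshold and flags examples whose most likely "confident" class disagrees with the given label. In the real package, the corresponding entry point is `cleanlab.filter.find_label_issues(labels, pred_probs)`; see the cleanlab documentation for its actual options.

```python
from typing import List, Sequence


def find_label_issues_sketch(
    labels: Sequence[int], pred_probs: Sequence[Sequence[float]]
) -> List[int]:
    """Return indices of examples whose given label looks wrong.

    Hypothetical, simplified confident-learning-style sketch; not cleanlab's API.
    """
    num_classes = len(pred_probs[0])

    # Per-class threshold: average predicted probability of class k
    # over the examples whose *given* label is k.
    thresholds = []
    for k in range(num_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        # Assumption: a class with no labeled examples is never treated as confident.
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else float("inf"))

    issues = []
    for i, (probs, given) in enumerate(zip(pred_probs, labels)):
        # Classes whose probability meets that class's threshold.
        confident = [k for k in range(num_classes) if probs[k] >= thresholds[k]]
        if not confident:
            continue  # model is not confident about any class; skip this example
        # Estimated true label: the most probable class among the confident ones.
        guessed = max(confident, key=lambda k: probs[k])
        if guessed != given:
            issues.append(i)
    return issues


# Toy example: examples 2 and 5 carry labels the model strongly disagrees with.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.3, 0.7],
    [0.85, 0.15],
]
print(find_label_issues_sketch(labels, pred_probs))  # → [2, 5]
```

Averaging each class's self-confidence gives a threshold that adapts to how confident the model is per class, which is the core intuition of confident learning. The real library adds counting, calibration, and ranking on top of this.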
Replies
Anish Athalye
We’re excited to launch cleanlab 2.0, our open-source Python package for addressing data-quality issues in machine learning. It automates tasks like finding label errors in datasets. See our launch blog post: https://cleanlab.ai/blog/cleanla...

cleanlab started out as a grad student research project at MIT and was eventually open-sourced. As we saw data scientists finding the tool useful for real-world applications, and as our own research applied it at scale to find issues in academic datasets (https://labelerrors.com/), we realized this was an important real-world problem. We decided to invest more time and energy in building a useful and usable framework for solving data-quality challenges.

We’d love to hear any ideas or feedback from the community, especially from those who face data-quality challenges in their work. The package's authors, all of whom have backgrounds in ML research, are also happy to answer any questions about cleanlab or data-centric AI in this thread.