Scale Document is Scale AI's platform for secure document processing. We use ML to automatically pre-label documents while our human-in-the-loop technology ensures impeccable accuracy with guaranteed quality SLAs.
Hey everyone—CEO and founder of Scale AI here!
Many startups and other businesses rely on digitizing documents (PDFs, images, Word docs, etc.) to properly serve their customers, from transcribing invoices and receipts to handling complex documents like W-2s or pay stubs.
Scale Document combines Scale's leading human-in-the-loop platform with our own ML to make the process of document processing significantly more efficient and cost-effective, while also ensuring incredibly high accuracy.
We've worked with innovative companies like DoorDash and SAP on this, and we're very excited to provide infrastructure for every single company.
Happy to answer any questions and get feedback from folks. If you have any private questions, please send us an email at hello@scale.com.
Crazy impressive stuff. We tried to implement something similar a few years back in a defined space, and failed miserably, so huge congrats to the team on pulling this off! 🙌
This is wonderful, congratulations to the team! How does the product deal with PII/PHI from a HIPAA perspective? Any possibility of opening this up for healthcare use cases?
Hey @eriktorenberg — happy to provide background. From the beginning we’ve had a thesis that accelerating the advancement of AI required expanding beyond our initial AV market.
As we talked with a variety of customers, we quickly realized that the document processing market was very fragmented w/ many pain points, primarily quality, cost, and elasticity. Off-the-shelf OCR engines don’t offer the high quality desired, and often require complex biz rules to be added on top (for example, linking an address to a specific name mentioned). If further quality verification is needed, customers often recruit their own people to verify the results of an OCR model, which requires massive eng and ops investment. Even if customers are able to set this up, they’re often vulnerable to peaks and valleys in data needs, which results in unfulfilled demand or sunk cost.
Building off of our work in AVs, an industry that demands very high data quality, it was natural to apply the complex ML + human review workflows to this problem. Using similar techniques applied to image, video, and LIDAR labeling, we can mitigate bias that comes from ML results while reducing unnecessary costly review (for example, by skipping human review on an annotation that had high confidence from the ML model). Combined w/ our massive WFH workforce, we’re able to offer customers a seamless experience w/o having to make the difficult trade-off b/t quality, cost, and elasticity.
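To make the "skip human review on high-confidence annotations" idea concrete, here is a minimal sketch in Python. It is not Scale's implementation: the `FieldPrediction` type, the `route_fields` function, and the 0.95 threshold are all hypothetical, chosen only to illustrate routing by model confidence.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FieldPrediction:
    """One ML pre-label for a document field (hypothetical structure)."""
    name: str
    value: str
    confidence: float  # model confidence in [0, 1]


def route_fields(
    predictions: List[FieldPrediction], threshold: float = 0.95
) -> Tuple[List[FieldPrediction], List[FieldPrediction]]:
    """Split predictions into auto-accepted and needs-human-review lists.

    The threshold is an assumed value; in practice it would be tuned
    against the accuracy target for each field type.
    """
    auto_accepted: List[FieldPrediction] = []
    needs_review: List[FieldPrediction] = []
    for pred in predictions:
        if pred.confidence >= threshold:
            auto_accepted.append(pred)
        else:
            needs_review.append(pred)
    return auto_accepted, needs_review


# Example: two invoice fields, one confident and one not.
fields = [
    FieldPrediction("invoice_number", "INV-1001", 0.99),
    FieldPrediction("total_amount", "1,284.50", 0.71),
]
accepted, review = route_fields(fields)
```

With these inputs, only `total_amount` would go to a human reviewer, which is where the cost savings would come from.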
Looks great. Years ago I worked at an electronic health records company that had a process like this for digitizing lab reports, prescriptions, etc., but it wasn't great. Could be a good use case for you guys.
We're investing in ML-augmented document processing at a time when companies like DoorDash and SAP, and industries like fintech, insurance, and logistics, need it the most.
Excited to be building this alongside the talented team at Scale! We welcome your feedback.
This is a product especially needed during the challenging times of COVID. As the need for document processing increases in industries like logistics and finance, and the available human workforce is not enough to meet it, ML augmentation is essential.
Continue to love what Scale is doing. Another area that benefits tremendously from the combination of humans and AI to deliver high quality, low cost, ease-of-use, and elasticity. Scale solves problems like this so well that they are an enabling infrastructure layer for other companies to build novel solutions on top of to delight their customers.
Hi @alexandrw, congrats on Scale Document! How would you compare using an automated solution like Instabase with Scale's solution? And would you say there is a unique value proposition in using a "human-in-the-loop" technology to solve document processing?
Hey @chaitanyadesai, great question -- as mentioned in another reply, off-the-shelf OCR engines don’t offer the high quality desired, and often require complex biz rules to be added on top (for example, linking an address to a specific name mentioned). If further quality verification is needed, you can recruit your own people to verify the results of an OCR model, but it requires massive eng and ops investment. Even if you're able to set this up, you'll be vulnerable to peaks and valleys in data needs, which results in unfulfilled demand or sunk cost.
Building off of our work in AVs, an industry that demands very high data quality, it was natural to apply the complex ML + human review workflows to this problem. Using similar techniques applied to image, video, and LIDAR labeling, we can mitigate bias that comes from ML results while reducing unnecessary costly review (for example, by skipping human review on an annotation that had high confidence from the ML model). Combined w/ our massive WFH workforce, we’re able to offer customers a seamless experience w/o having to make the difficult trade-off b/t quality, cost, and elasticity.
I'm the ML researcher on Document working on a suite of NLP solutions to accelerate and improve document processing. Excited to be building this out with an amazing team at Scale!
This is fantastic. Congrats to the founders on such an impressive launch. Healthcare in particular could really use this product. As an AI founder/strategist in healthcare myself, I see many exciting applications for this!
Scale Document is a natural extension of Scale's industry-proven, high-performing ML/AI data labeling workflows and pipelines. This is an exciting product launch.
Great work here from the Scale AI, Inc. team! I am looking forward to seeing the positive impact this product can have! Please send any questions to hello@scale.com if you ever need clarification about this product.
I worked in healthcare data and electronic medical records for many, many years, and I've seen this tried and fail. This will transform the very archaic healthcare industry and many more industries! Kudos to Scale!
It's a significant leap forward for the whole industry - data-intensive financial and delivery companies will find it extremely useful. Actually, I believe, pretty much anyone who has had to transcribe or annotate a document can find value in it.