Davit Buniatyan

Deep Lake - AI Knowledge Agent — Deep Research on Your Multi-Modal Data

Need to find answers to hard questions across multiple sources, including your private data? Use our Knowledge Agent, powered by AI search, to scan up to billions of rows of any data - images, PDFs, text, tables, and more - and get a well-researched answer.

Davit Buniatyan

Hi Product Hunt!


I'm Davit Buniatyan, CEO of Activeloop (YC S18). We're introducing Deep Lake AI Knowledge Agent, which conducts Deep Research on your data, no matter its modality, location, or size. Deep Lake supports multi-modal retrieval from the ground up. It uses vision-language models for data ingestion and retrieval, so you can connect any data (PDFs, images, videos, structured data, etc.) stored anywhere to AI.


Deep Lake can search your data from S3, Dropbox, and GCP. Over time, it learns from your queries, tailoring the results to your work!

Here are some example use cases:


1. Financial analysis

Connect earnings call transcripts, data from the Bloomberg terminal, and PowerPoint decks from earnings or annual reports to analyze thousands of companies!


2. Scientific research

Connect patient EHR data, medical research (both public and internal), and lab results to discover new drugs (this is what a few Fortune 500 and leading biotech companies are doing with us already).


3. Legal work, like patent search and generation, and contract review

Cross-reference diagrams, scans of documents, invoices, etc. and ask questions over those! A few companies use us to search across millions of patents worldwide and generate novel, defensible patents! Examples here and here.


You may ask, what is our superpower?


There's a long answer and a short answer (I'll let Sasun, our Director of Engineering, handle the more technical part). But in short, I've spent a good chunk of my time at Princeton researching how to store complex data and connect it to AI.

TL;DR - we're multi-modal (i.e., handle all kinds of data, not just text or vectors) and highly accurate!


This is hard to achieve, but the key is storing data in an AI-native way - representing unstructured data in a more columnar form. At Activeloop, we figured out how to connect any data from your storage and extract as much information from it as possible without complicated OCR pipelines. We then store it in an AI-native format and retrieve it from storage highly efficiently (more on that from Sasun), querying across multiple datasets at once.
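To make the "represent unstructured data in a columnar way" idea concrete, here's a minimal Python sketch - the field names are illustrative assumptions, not Deep Lake's actual format. Each modality lives in its own column, so a query scans only the column it needs instead of deserializing whole documents.

```python
# Hypothetical columnar layout for mixed-modality data.
# Each field is a parallel column; raw bytes (scans, images)
# sit in their own column and are never touched by a text query.
dataset = {
    "doc_id":    [1, 2, 3],
    "text":      ["q3 earnings call", "annual report", "lab protocol"],
    "embedding": [[0.1, 0.2], [0.3, 0.1], [0.9, 0.8]],
    "page_scan": [b"...png...", b"...png...", None],
}

# A keyword query scans only the "text" column:
hits = [i for i, t in enumerate(dataset["text"]) if "report" in t]
print([dataset["doc_id"][i] for i in hits])  # → [2]
```

A real AI-native format adds chunked, compressed columns on object storage, but the access pattern is the same: touch only the columns a query needs.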


With model reasoning improving (our Agent can work with any model, closed-source or otherwise), we've unlocked the missing piece: we provide best-in-class AI search, while smart models generate answers to the complex questions we find the right evidence for!


I'm very happy to share our hard work with you - please ask me anything about the product, our journey through YC to here, and more.

Vahan
Maker

@david_buniatyan very exciting!

Davit Buniatyan

@vahan25 indeed!

Levon Ohanyan

@david_buniatyan This is absolutely brilliant!


Azat Manukyan

@david_buniatyan this is fantastic, really excited to see it's happening

Ashot Arzumanyan
How “battle-tested” is the product?
Davit Buniatyan

@ashotarzumanyan it's in production with Fortune 500 companies and organizations conducting frontier drug discovery research! However, we're always optimizing query logic, reasoning, and retrieval capabilities, so let us know if there's any feature you'd like us to implement!

Ashot Arzumanyan
🔌 Plugged in
@david_buniatyan do you have ongoing use cases in fintech?
Davit Buniatyan

@ashotarzumanyan good question! We've actually seen a surprisingly large uptick in interest from fintech-related use cases.


While classic RAG/GenAI solutions focus on earnings calls, SEC filings, etc., and more naive question answering, we pick up on insights those approaches miss. For instance, if an annual report contains figures as images, those figures sometimes carry more data (e.g., YoY growth comparisons) than the surrounding text. This helps companies create more accurate, in-depth research reports.


VCs are interested, too!

Mikayel Harut
@ashotarzumanyan we have one big case study coming up very soon on that front. Deep research is definitely something that is required for fundamental analysis, outlook, or even basic report generation tasks.
Mikayel Harut

Hey folks,


Mikayel here! I am the voice (and the face) on the launch video and Activeloop's Head of Marketing and Growth.


I’d like to answer a few common questions that I’ve heard from talking to 200+ early adopters in person. It should give you context on how you can get the most out of Deep Lake AI Knowledge Agent.

How is it different from other search/RAG/Deep Research tools?

Fun fact #1 Deep Lake has its roots in neuroscience.
Fun fact #2 Deep Lake creators trained one of the earliest ‘large language models’ in Silicon Valley post YC S18.

Why do I mention this? Just to sprinkle some trivia, mainly, but also to showcase that our team knows their sh*t! We've spent the last 7 years building for this moment - rethinking how to organize unstructured data stored in different places and connect it to AI.


That's why Deep Lake, compared to others:

  1. Is truly multi-modal (i.e., detects more information to feed into AI)

  2. Works on private data (vs. OpenAI, for instance, which currently doesn't).

  3. Works on data at any scale, and across clouds (or locally)!

  4. Has a bring-your-own-model feature (to be released soon) that lets users choose which reasoning LLM (e.g., open-source or closed-source) to use.

I've summarized more differences here.


Are you releasing an API for Deep Research?
Yes! As a matter of fact, whoever comments under this will get early access from yours truly.

Can I share links to my conversations?

Yes, you can -> e.g. https://chat.activeloop.ai/mikayel/conversations/67ba1696a0a3d652b5b5ebe8 (you need to be logged in to search).
____
I am thrilled to finally go live on Product Hunt after almost six years of building (well, this product capability took less time to build, but it has all culminated in this moment). Data infrastructure for AI is really freaking tough to build. Kudos to our insanely talented engineers for developing a rocket engine to rival giants like OpenAI (while the rocket ship is still flying into outer space towards singularity).


As one of the only non-technical folks on the team, I’d be happy to answer any questions below or in our Slack community (slack.activeloop.ai). Thanks for having us!

Davit Buniatyan

@mikayel_harut great post!

Sasun Hambardzumyan

Hi Product Hunt!


I'm Sasun, Activeloop's (YC S18) Director of Engineering. I've previously co-founded Pixomatic, one of the early successful photo-editing apps. Naturally, one of the things that excites me is how to visualize (and query) unstructured data, like images.


Except back in the day, there was no SQL for images.


Then I met @david_buniatyan, who started Activeloop with that mission: store complex data - images, videos, text, etc. - in a more organized way, and make it easily connectible to AI (for training, and for asking questions!).

This comes with a number of exciting technical challenges.

1. Unstructured data is, well, unstructured. It's hard to search across such data (imagine asking for all the images that contain bicycles larger than 200x350 pixels and two people in them).

Retrieval systems before Deep Lake weren't built for that.
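To make that concrete, here's a hedged Python sketch: assuming a vision model has already extracted structured detections (labels and bounding boxes) at ingestion time - the schema here is hypothetical, not Deep Lake's actual one - the "bicycles larger than 200x350 pixels with two people" query becomes an ordinary filter.

```python
# Hypothetical per-image metadata produced by a vision model at ingest.
images = [
    {"id": 1, "detections": [{"label": "bicycle", "w": 250, "h": 400},
                             {"label": "person"}, {"label": "person"}]},
    {"id": 2, "detections": [{"label": "bicycle", "w": 120, "h": 180},
                             {"label": "person"}]},
]

def matches(img):
    # "bicycles larger than 200x350 pixels, and two people in them"
    big_bike = any(d["label"] == "bicycle"
                   and d.get("w", 0) > 200 and d.get("h", 0) > 350
                   for d in img["detections"])
    people = sum(d["label"] == "person" for d in img["detections"])
    return big_bike and people >= 2

print([img["id"] for img in images if matches(img)])  # → [1]
```

The hard part isn't the filter itself; it's producing and indexing that structured metadata for billions of unstructured rows in the first place.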

2. Vector search is inaccurate.
Achieving accuracy in AI-generated insights is challenging, especially in sectors like legal and healthcare where accuracy is paramount. The issue magnifies with scale - for instance, when searching through the world's entire scientific research corpus. And most of that data lives in object-storage data lakes (AWS S3, GCP, etc.) to begin with.

3. Limited memory
Bolting a vector index onto a traditional database architecture does not provide the scalability AI workloads require. As your dataset grows, memory and compute requirements scale linearly; past roughly 100M rows, maintaining the index in memory becomes prohibitively expensive.


My team and I focused on building this as Deep Lake's 'unfair advantage', since we're geared towards analytical cases where users need to ask questions across complex, large-scale data. As a result, we're up to 10x more efficient than in-memory approaches.

4. AI agents can fail spectacularly.

Not claiming we've totally solved this issue, but if there's even a 1% probability of failing or responding inaccurately at each step, a complex multi-step system suffers a compounding 'butterfly' effect: with every additional step, the probability of failure grows (at 1% per step, a 50-step pipeline succeeds only about 60% of the time).

So increasing retrieval accuracy is important - in critical verticals (autonomous driving, life sciences, healthcare, finance) it can be a matter of life and death, or of incalculable losses.


More on this in detail (with benchmarks here).


Feel free to ask me any technical questions on Deep Lake's capabilities, I'd be happy to answer.


Thanks for having us.

Davit Buniatyan

@khustup thanks for being a part of our journey and shipping an amazing product!

Mikayel Harut
@khustup so insightful! What was the hardest feature you had to ship in your experience at Activeloop?
Sasun Hambardzumyan

@mikayel_harut I'd say the most challenging part is indexing large-scale data on object storage while keeping the balance between latency and scale.

Mikayel Harut
@khustup that is a good point. thanks for all your hard work balancing that line!
Manish Choudhary

This is exciting! I faced similar knowledge base challenges during my time at an omnichannel e-commerce company. Would love to try it out :)

Davit Buniatyan

@manish_choudhary19 thanks a lot! curious to hear which modalities were you querying across or what was the exact use case? keep us posted how you like it.

Sasun Hambardzumyan

@manish_choudhary19 Great! Would love to get your feedback.

Manish Choudhary

I had an on-the-ground ops/sales team that used to call for every small query they had. It would have helped if they had a solution where they could simply go to a platform and ask queries in their normal language on top of internal documentation.

Mikayel Harut
@manish_choudhary19 this should totally address their use case right out of the box!
Hambardzum Kaghketsyan
💡 Bright idea
Am I right in thinking I can access it directly via APIs with no additional setup? I’m exploring use cases at the intersection of quantitative and fundamental analysis, applying swing trading strategies. Curious if you’ve seen anyone using Deep Lake for similar strategies.
Davit Buniatyan

@hambardzum_kaghketsyan1 we're going to release the Deep Research API soon. Will definitely send you early access, it's coming very very soon.

Davit Buniatyan

For now, you can use Deep Lake without the 'agentic' Deep Research capability, but with the multi-modal search via this tutorial.

Mikayel Harut
@hambardzum_kaghketsyan1 just put you on our waitlist!
Hayk Tepanyan

@david_buniatyan congrats on the Launch!


This one really stands out in today's world of emerging AI agents.

How long did it take to build and release Deep Lake AI Knowledge Agent?

Davit Buniatyan

@tehayk thank you so much for your support! One could say we've been building towards this moment since the inception of Activeloop almost 7 years ago. A core enabler for the AI Knowledge Agent was the introduction of Deep Lake v4, which radically increased cost-efficiency by offloading the index to the data lake as well. That let us use models like ColPali for ingestion and querying (i.e., we use vision-language models to understand the data and how it's linked together more deeply, without needing OCR pipelines).


After the launch of v4 in Oct, it was all hands on deck for this!

Renat Gabitov

Deep knowledge retrieval is the future. Kudos to the builders of this tool.

Emanuele

@renat_gabitov Thank you so much! We're excited for what’s ahead!

Mikayel Harut
@renat_gabitov thanks for the support!
Arman Zakaryan

The traditional stack to chat with your own multimodal data is indeed becoming too large to maintain.

Deep Lake is a game changer - it takes away all the complexity, while maintaining similar quality.

Congrats on the launch 🚀

Davit Buniatyan

@armzak thank you so much! Agreed.

Traun Leyden
Launching soon!

Looks like a very powerful tool! Does it support searching over Google Drive as well?

Emanuele

@tleyden Thanks for the question! Not yet, but we're planning to add other connectors soon.

Davit Buniatyan

@tleyden for now, you can integrate Dropbox as well as your favorite cloud provider (AWS, GCP, Azure).

Sargis Karapetyan

Behind every great product there is a great team.
Congrats Davit and the team with the launch.

Davit Buniatyan

@sargis_karapetyan2 thanks a lot, Sargis!

Narek Galstyan

This is exciting! Compelling demo!

I am curious how effective Deep Lake's integrated knowledge retrieval approach is for avoiding hallucinations and finding relevant articles not found by other tools in the same space?

Davit Buniatyan

@ngalstyan4 good question!

I wouldn't say it's possible to completely avoid hallucinations. They happen for two reasons: the model gets the wrong context, or it gets the right context but still produces a wrong answer. In the latter case, we can't do much. But we focus on making the former case obsolete!

How we do this:

  1. Query planning and gathering context from various datasets.

  2. Querying flexibility (choose to do hybrid, vector, keyword search, etc.)

  3. Multi-modality (on ingestion, gaining more depth of insight into what the data is about - what's contained in figures, for instance), which helps pass more important context to the model.

We also learn over time which queries you consider correct, which further improves the search experience and increases retrieval accuracy. No other vendor handles this - or #3 - as well as we do!
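As an illustration of point 2 above, hybrid search is often implemented by fusing the rankings from vector and keyword search. The sketch below uses reciprocal rank fusion (RRF), a standard technique - an assumption for illustration, not necessarily Deep Lake's exact implementation:

```python
# Reciprocal rank fusion: merge several ranked result lists into one.
# Each document scores 1/(k + rank) per list it appears in; k=60 is
# the commonly used damping constant.
def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits  = ["doc_a", "doc_c", "doc_b"]   # nearest by embedding
keyword_hits = ["doc_b", "doc_a", "doc_d"]   # keyword/BM25 order
print(rrf([vector_hits, keyword_hits]))  # → ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

RRF is robust because it uses only ranks, so the incomparable score scales of cosine similarity and keyword matching never need calibration.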

Henrikh Kantuni

The accuracy is phenomenal. Kudos to Davit and the Activeloop team for turning fiction into reality!

Mikayel Harut

@kantuni thank you so much! Feel free to share some examples of the questions you've asked. :) thanks for the support

Kay Kwak
Launching soon!

OCR-free retrieval of documents, images, and videos? This truly feels like the next era of AI-driven data utilization! Huge congratulations on your launch! 🎉

Davit Buniatyan

@kay_arkain thank you so much, Kay! You're absolutely right.

Mikayel Harut
@kay_arkain thank you!
Gerasim Hovhannisyan

Your data is your ultimate competitive advantage! Leveraging it effectively isn’t just an option anymore - it’s the key to staying ahead. Exciting to see solutions like Activeloop Agent unlocking its full potential, driving smarter decisions, and creating real impact!


How does it handle data quality and relevance when dealing with diverse sources ?

Emanuele

@gerasimh Thank you for your message! The agent is built on multimodal retrieval, which surfaces the most relevant information in response to the user's query.

Through a process of data analysis and aggregation, it can provide surprisingly accurate answers - all made possible by the performance and flexibility of our database, Deep Lake.

Davit Buniatyan

@gerasimh one more point in addition to Emanuele's - we learn from user queries over time to suggest more relevant information! Also, one surprisingly good way of increasing response quality is vision-language models: OCR pipelines, while performing well, are slightly clunky. Having an end-to-end neural search helps capture the full context of the data across modalities, increasing response quality.

Muhammad Waseem Panhwar

@david_buniatyan Congratulations on the Launch.

Do you guys provide any API for deep research of your tool?

Mikayel Harut
@waseem_panhwer yes, Muhammad, you can request a waitlist spot for the API. It is coming very soon (I will put you on the waitlist!)
Ashot Ayvazyan

Wow, this is exciting! At Cloudchipr, we store a vast amount of data in object storage with diverse structures - from CSVs to time series and key-value data. This is a game-changer for us in generating various general statistics and empowering customers to "talk with their data" without being restricted by data type.

Emanuele

@ashot_ayvazyan1 Exactly! You definitely need to try our tool and let us know what you think. Your feedback would be incredibly valuable!!

Davit Buniatyan

@ashot_ayvazyan1 thank you so much for sharing the use case - you can actually copy the fine-grained access controls from your cloud provider to Activeloop, making it possible to restrict certain users to asking questions over specific data only.

Mikita Aliaksandrovich
Launching soon!

Congrats on the launch of Deep Lake AI Knowledge Agent! The ability to perform deep research across multiple data types and sources is impressive!

Davit Buniatyan

@mikita_aliaksandrovich thanks a lot, Mikita. How would you use it?

Mikita Aliaksandrovich
Launching soon!

@david_buniatyan You're welcome! I'd use Deep Lake AI Knowledge Agent for tasks like analyzing large datasets across various formats, such as research papers, financial reports, and customer feedback!

Artem Harutyunyan
💎 Pixel perfection
Congrats on the launch! What integrations does it support?
Davit Buniatyan

@artem_harutyunyan out of the box, the AI Knowledge Agent integrates with your favorite cloud storage providers (Azure, GCP, AWS), Dropbox (with more storage integrations underway). We also integrate with OpenAI, AWS Bedrock, and other model providers (coming soon)!