The best data analysis tools to use in 2024

Barry McCardel and the Product Hunt community surface the top free and paid data analysis tools that everyone is using in 2024.

Barry McCardel

Co-founder and CEO of Hex.

Note from the Product Hunt editorial team

Our product category landscape posts are written by active builders who are experts in their fields. We recognize that the most knowledgeable people will rarely be impartial, but we work hard to make sure these articles are even-handed, and any prior interests are called out.

By now, I’ll assume everyone has heard the trope, “data is the new oil”. As someone who actually spent time in the oil and gas industry, I can tell you confidently that this is a terrible analogy. Oil is extremely dangerous, difficult to transport, and you have to burn it to get value. Data is invisible, floats around over wires, and you push buttons on your computer to do math with it.

But this flawed aphorism does capture the excitement and possibility around the value that can be unlocked from data – to better understand the world, automate processes, and develop predictions. There are whole businesses and industries that rely on being able to wield data, from finance to advertising to, well, energy.

It can be hard to talk about the “data space” because it’s so big. It’s not one thing – there’s actually a ton of different needs, use cases, and tools. And it’s universal. Almost everyone has something in their day-to-day that could be better informed or driven by data.

Welcome to the data stack

A common way to refer to the collection of data tools at any organization is a “data stack”, and – as the name implies – they loosely align to “layers” that can integrate and build on each other.

The big deal in the last few years has been a pattern referred to as the “modern data stack”, which was mostly a marketing buzzword signifying tools that assume a cloud data warehouse as the center of the universe.

To explain this, we’re going to work our way up the data stack, starting from the bottom. I’m going to give a high-level description of each “layer”, mentioning leading tools in the areas they’re best known for. I’m sure some of you will read this and write me like “you over-simplified this” or “you forgot my tool”, and yes, I certainly did, but this is just an overview!

Let’s begin.

Storage, querying, and compute

The foundational, core infrastructure of the data stack, where data is actually stored, accessed, and processed. These tend to be the biggest players, because basically all the workloads happening anywhere else in the data stack run through them.

Data warehouses and lakes

The cornerstone of the modern data stack is a data warehouse. These are special types of databases designed to ingest lots of data and make it easy to run analytic workloads on it. Data is stored in a columnar format, meaning it’s easy to answer questions like “how many units did we sell last year?”, which requires adding up values in one column across lots of rows.

This pattern is very different from databases like Postgres or MySQL, which are transactional databases, optimized for rapid read-write operations on small numbers of rows. These are useful for lots of things, but terrible for analytics – the question above might take 100x longer or more to calculate (and potentially cause performance issues for other operations). So, you probably want to sync data to a warehouse if you’re doing serious analytics.
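
To make the difference concrete, here’s a minimal sketch of the warehouse-style query pattern, using DuckDB as a stand-in columnar engine – the table and values are made up for illustration:

```python
# A sketch of the scan-and-aggregate pattern warehouses are built for,
# using DuckDB as a stand-in columnar engine. Table and values are
# hypothetical -- in a real warehouse this runs over billions of rows.
import duckdb

con = duckdb.connect()
con.sql("""
    CREATE TABLE orders AS
    SELECT * FROM (VALUES
        (1, DATE '2023-03-01', 12),
        (2, DATE '2023-07-15', 40),
        (3, DATE '2024-01-02', 7)
    ) AS t(order_id, sold_at, units)
""")

# "How many units did we sell last year?" -- adds up one column over
# many rows, which columnar storage makes cheap.
print(con.sql("""
    SELECT SUM(units) AS units_sold
    FROM orders
    WHERE sold_at BETWEEN DATE '2023-01-01' AND DATE '2023-12-31'
""").fetchone())
```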

Data warehouses aren’t necessarily new (shoutout to Teradata!), but having them run in the cloud is. Because the cloud is vast and mighty, data warehouses can now scale basically infinitely, allowing customers to store and query tons of data without worrying about hardware or whatever.

Snowflake, BigQuery (from GCP), and Redshift (from AWS) are the Three Horsemen of the cloud warehouse. Snowflake is the biggest “pure play” warehouse, while BigQuery and Redshift are mostly bolt-ons as part of a bigger ecosystem in their respective cloud offerings. Most people wind up choosing one based on their broader cloud relationship, but you’ll be fine with any of them to start.

You can also use a cloud data lake, which is basically the same idea, but you’re storing the data on commodity blob storage like S3, and querying it with a separate engine. Some folks choose to use this because they want more modularity and flexibility than a cloud warehouse, although historically it’s come with tradeoffs on performance. Databricks, Starburst, and Dremio are popular solutions here.
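
As a rough sketch of the lake pattern, here’s what querying Parquet files in place might look like using DuckDB’s httpfs extension – the bucket path is hypothetical, and a real setup would also need S3 credentials configured:

```python
# The "lake" pattern: data lives as Parquet files on blob storage, and
# a separate engine queries it where it sits -- no load step required.
# The bucket path is hypothetical.
import duckdb

con = duckdb.connect()
con.sql("INSTALL httpfs")
con.sql("LOAD httpfs")

con.sql("""
    SELECT region, SUM(units) AS units_sold
    FROM read_parquet('s3://my-data-lake/orders/*.parquet')
    GROUP BY region
""").show()
```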

Data processing engines

All the larger vendors at this layer also offer data processing engines, allowing you to work with data in Python or other languages. Examples are Spark (from Databricks), Snowpark (from Snowflake), and Ray (from Anyscale). These mostly come into play for folks doing high-scale machine learning workflows, where you need to do highly-parallel computation over large sets of data.
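
For a flavor of what that looks like, here’s a PySpark sketch – the paths and columns are hypothetical, but the shape (read, group, aggregate in parallel, write back) is the typical pattern:

```python
# Sketch of the processing-engine pattern with PySpark: a scan plus
# aggregation that Spark splits across a cluster's workers in parallel.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-prep").getOrCreate()

events = spark.read.parquet("s3://my-data-lake/events/")

# Per-user daily event counts -- a typical feature-prep step before
# ML training, viable at scale because the work is partitioned.
daily_counts = (
    events
    .groupBy(F.to_date("event_ts").alias("day"), "user_id")
    .agg(F.count("*").alias("n_events"))
)

daily_counts.write.mode("overwrite").parquet("s3://my-data-lake/features/daily_counts/")
```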

Movement and transformation

Ok – you have a warehouse or lake, but how do you get data into it and make it useful? Enter the wonderful world of Extraction, Transformation, and Loading (ETL, although pedants sometimes invert it to ELT).

Data extraction

You most likely have your source data in a few different systems, like your own application database (as noted above), or a mix of other SaaS tools like Stripe, Salesforce, or Yo.

In order to sync that data into your warehouse, you’ll want to use a tool like Fivetran or Airbyte. These tools make it easy to configure and manage data pipelines, which is famously hard to do yourself because you’ll be stuck keeping up with the vagaries of various APIs and maintaining your own infrastructure.
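
To see why, here’s a sketch of the hand-rolled version these tools replace – the endpoint and payload shape are entirely hypothetical, but the pagination and rate-limit handling is exactly the kind of thing you’d be maintaining forever:

```python
# What Fivetran/Airbyte abstract away: a hand-rolled extraction loop.
# The endpoint, auth scheme, and response shape are all hypothetical.
import time
import requests

def extract_invoices(api_key: str) -> list[dict]:
    rows, url = [], "https://api.example-saas.com/v1/invoices"  # hypothetical
    while url:
        resp = requests.get(
            url, headers={"Authorization": f"Bearer {api_key}"}, timeout=30
        )
        if resp.status_code == 429:  # rate limited: back off and retry
            time.sleep(int(resp.headers.get("Retry-After", 5)))
            continue
        resp.raise_for_status()
        payload = resp.json()
        rows.extend(payload["data"])
        url = payload.get("next_page")  # cursor pagination, if offered
    return rows
```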

Orchestration

Ok – now you have the data from your apps landing in the warehouse, but it’s in “raw” form, and likely needs to be cleaned and integrated before you can get value out of it. For example, if you want to see how sales volume breaks down by sales region, you’ll need to join that data from Stripe and Salesforce.

This is where tools like dbt, Dagster, and Airflow are super useful. dbt lets you define transformations, while orchestrators like Dagster and Airflow schedule and run them – turning your data into analysis-ready tables that other tools can easily query.
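
As an illustration, here’s roughly what that Stripe-plus-Salesforce transformation might look like – shown as SQL run via DuckDB, with all table and column names invented; in dbt, the SELECT would live in its own model file:

```python
# A sketch of the kind of transformation a dbt model encodes: join raw
# Stripe charges to Salesforce accounts to get revenue by sales region.
# Assumes the raw tables have already landed; all names are hypothetical.
import duckdb

con = duckdb.connect("warehouse.db")
con.sql("""
    CREATE OR REPLACE TABLE analytics.sales_by_region AS
    SELECT
        sf.region,
        SUM(st.amount) / 100.0 AS revenue_usd  -- Stripe amounts are in cents
    FROM raw.stripe_charges AS st
    JOIN raw.salesforce_accounts AS sf
      ON st.customer_id = sf.stripe_customer_id
    GROUP BY sf.region
""")
```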

Activation (née “reverse ETL”)

Ok, you got your data in the warehouse all transformed nice and neat – but now you want to pipe it back somewhere else to make it useful. For example, you might want to show total revenue per customer in their Salesforce entry, so your sales team can easily find it.

There are tools for that, too! Census and Hightouch are both great solutions for this, and make it easy to send your data back out of the warehouse to another tool. These used to be called “Reverse ETL” but everyone agreed that was a bad name, so “Data Activation” it is!

Could you do this yourself? Sure. But you’d have to write your own scripts to map values from your warehouse back to SaaS tool APIs, manage schedules, and debug failures, which – trust me – is not fun.
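
For the curious, a minimal sketch of that DIY version – the CRM endpoint and field names are hypothetical:

```python
# The do-it-yourself version of "activation": read a metric from the
# warehouse and push it back into a SaaS tool. Endpoint and fields are
# hypothetical; real tools add scheduling, diffing, and retries.
import duckdb
import requests

con = duckdb.connect("warehouse.db")
rows = con.sql(
    "SELECT account_id, total_revenue FROM analytics.revenue_per_customer"
).fetchall()

for account_id, total_revenue in rows:
    resp = requests.patch(
        f"https://api.example-crm.com/v1/accounts/{account_id}",  # hypothetical
        json={"custom_fields": {"total_revenue": total_revenue}},
        headers={"Authorization": "Bearer YOUR_TOKEN"},
        timeout=30,
    )
    resp.raise_for_status()  # in practice: log, retry, and alert on failures
```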

Metadata and quality

Ok – you got your data in your warehouse, you have some pipelines set up, everything is great… but it can get messy, fast. Cloud data warehouses are really scalable, so people wind up jamming a ton of data in and building lots of tables via dbt. So, discovering the right data – and then knowing whether it’s actually trustworthy, and how to use it – can be tough.

Catalogs

As organizations grow, they wind up with a ton of tables, and sorting through them is tough. Data catalogs (or “Metadata Management Platforms”, if you’re feeling fancy) are built to help with this. Atlan, Acryl, Metaphor, and SelectStar are all examples of products in this space. They all allow data teams to organize and govern their data, so other teams and tools know what’s what.

Data observability

Another very common issue is quality. It’s unfortunately common for an upstream system to change something, for someone to enter a value incorrectly, or for a gremlin to crawl into a pipeline and gnaw through one of the queries, creating a cascade of downstream issues and incorrect data.

Data observability tools like BigEye, DataFold, Great Expectations, Metaplane, and Monte Carlo let you catch and fix issues like this quickly.
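
Under the hood, these tools automate checks you could write by hand – here’s a toy version, with a hypothetical table and made-up thresholds:

```python
# Hand-rolled versions of the checks observability tools automate:
# volume and null-rate expectations on a table. Table name and
# thresholds are hypothetical.
import duckdb

con = duckdb.connect("warehouse.db")

row_count, null_rate = con.sql("""
    SELECT
        COUNT(*)                                           AS row_count,
        AVG(CASE WHEN units IS NULL THEN 1.0 ELSE 0.0 END) AS null_rate
    FROM analytics.orders
""").fetchone()

assert row_count > 0, "orders table is empty -- upstream load may have failed"
assert null_rate < 0.01, f"units null rate {null_rate:.1%} exceeds 1% threshold"
# Real tools add anomaly detection, lineage-aware alerting, and history.
```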

Analytics, reporting, and data science

Ok – now to the really sexy stuff – making pretty charts! There’s a lot going on at this layer of the stack, but there are a few basic genres of tools we can focus on.

Dashboarding tools

When most people think about data analysis, they have some form of a dashboard in their head. For example, if you’re an executive and want to see how many units were sold last week, you probably want your data team to build you a dashboard.

There have been many generations of solutions here, with Tableau still being the 800 lb. gorilla (which – fun fact – is apparently heavier than any gorilla has ever weighed!), and PowerBI and Looker also being popular solutions.

Many data folks have a love/hate relationship with dashboards. They’re useful for reporting, but some teams can become dashboard factories, stuck doing pretty surface-level work.

Exploratory analytics and data science tools

As it turns out, 80-90% of data work doesn’t fit neatly in dashboards, and that’s where data teams turn to a different set of more flexible tools. As an example, if you wanted to do a deep dive on why you sold what you sold last week, you’d probably be using some combination of a SQL editor, Python notebook, and spreadsheets.

Notebooks, in particular, are a popular format for doing exploratory and data science work, because they break logic up into smaller chunks that can be easily iterated on. Jupyter is a popular open-source variant of this, and there are commercial offerings like Saturn, Colab, and Deepnote that focus on hosted versions.
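
If you haven’t used one, here’s the flavor – each chunk below would be its own notebook cell, run and re-run independently (the file and column names are made up):

```python
# Notebook-style exploration: each "cell" is a small, iterable chunk.
# File and column names are hypothetical.
import pandas as pd

# Cell 1: load last week's orders
orders = pd.read_csv("orders.csv", parse_dates=["sold_at"])

# Cell 2: quick shape-of-the-data checks
print(orders.describe())
print(orders["region"].value_counts())

# Cell 3: a first cut at "why did we sell what we sold?"
weekly = (
    orders
    .groupby([pd.Grouper(key="sold_at", freq="W"), "region"])["units"]
    .sum()
    .unstack("region")
)
print(weekly.tail())
```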

And here comes the absolutely shameless plug: this is what my company Hex does! I won’t launch into a whole sales pitch, but it’s an integrated workspace for analytics and data science that lets you more flexibly and easily get to answers, whether you’re writing code, using no-code, or natural language – and it’s built to be collaborative, so the whole team can work together and keep things organized. You should use it.

Product analytics-specific products

The products above are horizontal and can be used for almost any analysis type. But there’s also a class of products specifically focused on product analytics. These products can track user events and typically have specialized visualizations and workflows focused on things like product paths, click streams, and funnels.

Mixpanel and Amplitude are two big players here, with PostHog and Motif also doing some interesting stuff. You can also use something like Segment to pipe user behavior data into your data warehouse, and analyze it with other tools (like Hex!).
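
If you do pipe raw events into your warehouse, a basic funnel is simple enough to compute yourself – here’s a toy version that ignores event ordering and time windows, which the dedicated tools handle for you:

```python
# A minimal funnel over raw event data: how many users completed each
# step? Event names and data are made up; real product analytics tools
# also account for ordering, time windows, and sessionization.
import pandas as pd

events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "event":   ["signup", "create_project", "invite_teammate",
                "signup", "create_project", "signup"],
})

steps = ["signup", "create_project", "invite_teammate"]
users = set(events.loc[events["event"] == steps[0], "user_id"])
for step in steps:
    users &= set(events.loc[events["event"] == step, "user_id"])
    print(f"{step}: {len(users)} users")  # 3 -> 2 -> 1 through the funnel
```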

Experimentation tools

These are the nerdier, more stats-oriented cousins of product analytics. Tech companies like Airbnb and Netflix have elaborate experimentation platforms, and products like Eppo, Statsig, and LaunchDarkly make it easy for you to incorporate these techniques in your own work, too.

These are especially useful and relevant for AI, where experimenting with models and prompts is the name of the game!
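
At their statistical core, most of these platforms are running tests like the following – a two-proportion z-test over made-up conversion counts:

```python
# The statistical core of an experimentation platform: is the
# treatment's conversion rate significantly different from control's?
# Counts here are invented for illustration.
from statsmodels.stats.proportion import proportions_ztest

conversions = [412, 475]    # control, treatment
exposures   = [10000, 10000]

stat, p_value = proportions_ztest(conversions, exposures)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
# Platforms like Eppo and Statsig layer assignment, targeting, and
# guardrail metrics on top of tests like this.
```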

Machine Learning

Speaking of AI, we’re at our last stop on the great Data Stack Tour… although some wouldn’t necessarily consider ML part of a “data stack”, as it sits outside the kind of business analytics workflows most of the tools above focus on.

In any case, machine learning models rely on data for their training, and many of the same tools – like orchestration and notebooks – can be useful here, too. So we’ll talk a bit about them. This is – necessarily – a very condensed overview; you could split this space up into much finer-grained categories!

Training

There are lots of places you can train a model now, including products built into cloud data offerings like Vertex (GCP), SageMaker (AWS), and Databricks, and independents like W&B and Together.

They’re all basically wrappers around the compute primitives, and which you choose will likely have a lot to do with your existing cloud relationships, where you’re storing your data, and your favorite color.

Inference

Ok – your model is trained, now you want to make some predictions. Hosting and running it is the land of inference platforms, like ModelBit, BaseTen, and Replicate. They all make it easy to put an open-source, fine-tuned, or custom-built model behind an API, with additional tools for model workflow, management, and monitoring.
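
At its simplest, “a model behind an API” looks something like this FastAPI sketch – the model here is a trivial stand-in, and the platforms add scaling, GPUs, versioning, and monitoring around this same shape:

```python
# A minimal version of what inference platforms host for you: a model
# behind an HTTP endpoint. The "model" is a trivial stand-in.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictRequest) -> dict:
    # Stand-in scoring logic: replace with a real loaded model artifact.
    score = sum(req.features) / max(len(req.features), 1)
    return {"score": score}

# Run locally with: uvicorn main:app  (assuming this file is main.py)
```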

AI Evaluation and Observability

Ok – you have your model up, you’re running inference, you’re feeling great. But you’re likely going to want to iterate on your prompts, debug user issues, and check logs. That’s where a wide menagerie of tools has popped up to help you with “evals” and observability, including products like LangSmith, Weights & Biases, Braintrust, Autoblocks, Log10, and a bunch of others.
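
A bare-bones eval harness is just a loop over test cases – here’s a sketch where call_model is a hypothetical stand-in for whatever inference client you actually use:

```python
# A bare-bones eval harness: run prompts through a model and score the
# outputs against expectations. call_model is a hypothetical stand-in.
cases = [
    {"prompt": "What is 2 + 2?",               "expected": "4"},
    {"prompt": "Capital of France, one word.", "expected": "Paris"},
]

def call_model(prompt: str) -> str:
    # Swap in your real inference client here; this dummy always says "4".
    return "4"

def run_evals() -> float:
    passed = 0
    for case in cases:
        output = call_model(case["prompt"])
        ok = case["expected"].lower() in output.lower()  # crude string match
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}: {case['prompt']!r} -> {output!r}")
    return passed / len(cases)

print(f"score: {run_evals():.0%}")
# Eval/observability products version these cases, track scores over
# time, and tie failures back to traces and logs.
```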

Wrapping up our tour

Wow, that was a lot! But in many ways we just scratched the surface – there’s a ton of little sub-categories up and down the stack, with lots of great projects with interesting ideas.

This can be overwhelming if you’re just getting started building your data stack! But honestly, it’s easy to get lost obsessing over tools. The great thing about the modern data stack is that it’s modular; most products speak SQL, and it’s easy to swap them out over time. So pick some that make sense, get started, and see where it takes you.

 

Most Loved Products

Rely.io (119 reviews)
Rely.io's Internal Developer Portal enables engineering teams to consolidate and unify their engineering stack and gain automated visibility into their software ecosystem. Rely then provides a custom AI assistant trained on the data available in their software catalog, so they can automate tasks throughout the software delivery life cycle (SDLC) to 10x their engineering productivity.

Zerve AI (50 reviews)
Zerve’s Data Science Development Environment gives data science and ML teams a unified space to explore, collaborate, build, and deploy data science & AI projects.

Mage (68 reviews)
Open-source data pipeline tool for transforming and integrating data. The modern replacement for Airflow:
- Integrate and synchronize data from 3rd-party sources
- Build real-time and batch pipelines to transform data using Python, SQL, and R
- Run, monitor, and orchestrate thousands of pipelines without losing sleep

SingleStore Kai™
SingleStore Kai enables up to 100x faster analytics on JSON data within existing MongoDB applications. The easy-to-use API for MongoDB enables developers to use familiar MongoDB commands to achieve real-time analytics for their applications.

TradeInt (40 reviews)
Access 5+ billion shipment records and 400+ million company profiles. Analyze global shipping data, find import-export opportunities, and optimize supply chains. Search buyers and suppliers worldwide. Stay competitive with data-driven insights.

PROCESIO (78 reviews)
PROCESIO is a no-code, low-code, and full-code platform that lets you integrate different tools, automate workflows, and process data super fast. Not your everyday automation tool, but much more – it’s an advanced technology for complex automation use cases.

Consensus (30 reviews)
Ever wonder what the research actually says? Just ask a question and Consensus will instantly read millions of research papers and deliver you answers. From nutrition, to exercise, to economic policy, Consensus makes you an expert on the research in seconds.

Spiky (27 reviews)
Surpass revenue peaks via analytics-empowered meetings!

June (151 reviews)
June is product analytics for B2B SaaS. Get auto-generated reports focused on how companies use your product, not individual users.

TelemetryDeck (25 reviews)
TelemetryDeck is a service that helps app and web developers improve their product by supplying immediate, accurate analytics data while users use your app. And the best part: it’s all anonymized, so your users’ data stays private!