Data Diff is an open-source package that can be run in a CLI or wrapped into any data orchestrator such as Airflow, Dagster, etc. Compare datasets quickly (seconds/minutes) at a large (millions/billions of rows) scale across different databases.
We're excited to announce the launch of new open-source product – data-diff that makes comparing datasets across databases fast at any scale. data-diff automates data quality checks for data replication and migration. In modern data platforms, data is constantly moving between systems, and at the modern data volume and complexity, systems go out of sync all the time. Until now, there has not been any tooling to ensure that when the data is correctly copied.
data-diff