Data Lineage Definition
Data Lineage is a record of where a piece of data originated, how it has been transformed, and where it has moved over time, from its source system through every process that touched it, to its current state.
What does data lineage track?
For a product record, lineage might show that a product title came from a supplier spreadsheet, was edited manually by a content team, then pushed to three sales channels. It captures:
- Origin — which system, file, or supplier the data came from
- Transformations — any cleaning, reformatting, or applied along the way
- Movement — which systems received or consumed the data and when
- Ownership — who made changes and at what point in the process
Why does it matter?
When a product description is wrong on a marketplace, data lineage tells you exactly where the error was introduced: the supplier feed, the import mapping, or a manual edit. Without it, tracing a data problem means checking every system by hand.
It also supports compliance: regulations like the EU's Digital Product Passport increasingly require businesses to demonstrate where product data comes from and that it is accurate.
How is it different from an audit trail?
An audit trail records who changed what and when inside a single system. Data lineage is broader: it follows data across systems, from origin to destination, and includes automated transformations that no individual person triggered. The two are complementary: audit trails feed into a complete lineage picture.
Who uses data lineage?
- Data and IT teams use it to debug integration errors and map system dependencies
- Compliance and legal teams use it to demonstrate data provenance to regulators
- PIM and MDM administrators use it to trace where inaccurate product data entered the pipeline and fix it at the source.