Analytics engineering attempts to bridge the gap between data engineering and data analytics by introducing good software engineering practices.
ETL vs ELT (extract transform load vs extract load transform)
Kimball’s Dimensional Modeling
Contain descriptive attributes that give context to the facts
Examples include Customer, Product, Time, Location
Typically denormalized for simpler queries and better performance
-- normalized design
-- Product table
ProductID, ProductName, CategoryID
-- Category table
CategoryID, CategoryName, DepartmentID
-- Department table
DepartmentID, DepartmentName
~~~~
-- denormalized design
-- Product Dimension
ProductID,
ProductName,
CategoryName, -- Pulled from Category table
DepartmentName, -- Pulled from Department table
Brand,
Size,
Color,
... other attributes
Much smaller than fact tables but wider (more columns)
Allows users to work with SQL to deploy analytics code following software engineering best practices like modularity, DRY, portability, CI/CD, and documentation.
How does it work?