Topics

  1. OLAP vs OLTP
  2. what is a data warehouse
  3. BigQuery
    1. cost
    2. partitions and clustering
    3. best practices
    4. internals
    5. ML in BQ

OLTP vs OLAP

OLTP - online transaction processing; OLAP - online analytical processing

image.png

image.png

Data Warehouses - what is it?

It is an OLAP solution used for reporting and analysis. It contains 3 types of data: Meta data, Raw data and Summary data.

Partitions

when we create a dataset, we have columns. Assuming most of our queries are based on a specific column (e.g. creation_date), partitioning can improve performance by “partitioning” the set by the column of interest (e.g. splitting into multiple datasets based on specific dates like march 1, march 2, march3….).

How to partition

# create a partition from a table (external_yellow_tripdata)

CREATE OR REPLACE TABLE taxi-rides-ny.nytaxi.yellow_tripdata_partitioned
PARTITION BY
	DATE(tpep_pickup_datetime) AS
SELECT * FROM taxi-rides-ny.nytaxi.external_yellow_tripdata;

image.png

note - the icons are diff between partitioned and non-partitioned tables