The storage and modeling layer underneath every dashboard, every ML pipeline, every report. PostgreSQL is the default; ClickHouse, DuckDB, BigQuery, and Snowflake fill specific shapes.
Five databases that cover different points on the OLTP-to-OLAP spectrum. PostgreSQL is live and is the default for new builds; the rest land as content ships.
The default relational database. ACID, extensions (PostGIS, pgvector, TimescaleDB), JSON, full-text search. The database Frappe, Superset, and most 2nth builds run on.
Column-oriented OLAP for sub-second analytics on billions of rows. The engine of choice when PostgreSQL with materialized views isn't fast enough for the query shape.
In-process analytical SQL. The SQLite of OLAP: no server, just a library. Runs inside Python scripts, dbt, notebooks, and Cloudflare Workers (WASM build).
Google's serverless warehouse. Pay-per-query, petabyte-scale, zero infrastructure to manage. The warehouse of choice when the team wants SQL and nothing else.
Multi-cloud warehouse with separation of storage and compute. Scale compute independently of storage; pay for what you run. The enterprise multi-cloud answer.
Every database in this sub-hub sits somewhere on the OLTP-to-OLAP spectrum. Understanding where each one lives makes the choice obvious for most workloads.
OLTP (Online Transaction Processing) means many small, fast reads and writes: an order being created, a user logging in, a row being updated. Postgres is native OLTP: row-oriented storage, ACID transactions, B-tree indexes, connection pooling. It's built for the operational workload.
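The canonical OLTP unit of work is one small ACID transaction. A minimal sketch, using Python's built-in sqlite3 as a runnable stand-in for PostgreSQL (the table and function names are illustrative, not from any real schema):

```python
import sqlite3

# sqlite3 stands in for PostgreSQL so the sketch runs anywhere;
# the transactional shape (small write, commit-or-rollback) is the point.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, total REAL)")

def place_order(conn, user_id, total):
    """One small, fast write: the typical OLTP unit of work."""
    try:
        with conn:  # commits on success, rolls back on exception
            cur = conn.execute(
                "INSERT INTO orders (user_id, total) VALUES (?, ?)",
                (user_id, total),
            )
            return cur.lastrowid
    except sqlite3.Error:
        return None

order_id = place_order(conn, user_id=42, total=19.99)
```

In Postgres the same shape runs over a pooled connection with `BEGIN`/`COMMIT`; the workload is thousands of these per second, each touching a handful of rows.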
OLAP (Online Analytical Processing) means fewer but larger queries: "sum revenue by region for the last 90 days across 500 million rows." Column-oriented storage (ClickHouse, BigQuery, Snowflake) is dramatically faster for this shape: it reads only the columns the query touches, compresses tightly, and parallelises across many cores.
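The row-versus-column difference can be shown with a toy in-memory model (the data and field names are invented for illustration): a row store must walk every field of every row, while a column store reads exactly one contiguous column.

```python
# Row layout: one dict per row, as an OLTP engine stores records.
rows = [
    {"region": "EU", "revenue": 100.0, "sku": "A", "qty": 2},
    {"region": "US", "revenue": 250.0, "sku": "B", "qty": 1},
    {"region": "EU", "revenue": 50.0,  "sku": "C", "qty": 5},
]

# Column layout of the same data: one list per column, as an OLAP engine stores it.
columns = {
    "region":  [r["region"] for r in rows],
    "revenue": [r["revenue"] for r in rows],
}

def total_revenue_row_store(rows):
    # Touches whole rows even though only one field matters to the query.
    return sum(r["revenue"] for r in rows)

def total_revenue_column_store(columns):
    # Touches exactly one contiguous column -- less I/O, better compression.
    return sum(columns["revenue"])

assert total_revenue_row_store(rows) == total_revenue_column_store(columns) == 400.0
```

Same answer, but the columnar scan reads a fraction of the bytes; at 500 million rows that fraction is the whole performance story.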
The interesting territory is the middle. PostgreSQL covers it surprisingly well with materialized views, partial indexes, and TimescaleDB hypertables. Most teams never need to leave Postgres. The others become relevant when specific shapes emerge: sub-second OLAP at massive scale (ClickHouse), zero-infra serverless (BigQuery), in-process analysis without a server (DuckDB), multi-cloud storage/compute separation (Snowflake).
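The materialized-view pattern that keeps many teams in the middle of the spectrum can be sketched as follows, again with sqlite3 standing in for PostgreSQL (which has `CREATE MATERIALIZED VIEW` and `REFRESH MATERIALIZED VIEW` built in; the schema here is invented):

```python
import sqlite3

# Raw events accumulate in an operational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [("EU", 100.0), ("US", 250.0), ("EU", 50.0)])

def refresh_revenue_by_region(conn):
    """Rebuild a precomputed aggregate, like REFRESH MATERIALIZED VIEW."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS revenue_by_region")
        conn.execute("""
            CREATE TABLE revenue_by_region AS
            SELECT region, SUM(revenue) AS total
            FROM events
            GROUP BY region
        """)

refresh_revenue_by_region(conn)
# Dashboards now query the small summary table instead of scanning raw events.
result = dict(conn.execute("SELECT region, total FROM revenue_by_region"))
```

The trade is freshness for speed: the aggregate is only as current as its last refresh, which is fine for most dashboards and removes the need for a separate OLAP engine at moderate scale.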
# The spectrum

```
OLTP ──────────────── Mixed ──────────────── OLAP
  │                     │                      │
PostgreSQL         PostgreSQL +           ClickHouse
                   extensions             BigQuery
                   (pgvector,             Snowflake
                   TimescaleDB,           DuckDB
                   mat. views)
```
The practical guide: start on PostgreSQL. Add materialized views for dashboard queries. If queries are still slow at real data volumes, look at ClickHouse. If you need zero infrastructure, look at BigQuery. If you need in-process analysis in a script or notebook, reach for DuckDB.
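That escalation path fits in a few lines. A minimal illustrative chooser (the flags and their precedence are assumptions for the sketch, not hard rules):

```python
def pick_engine(in_process: bool, zero_infra: bool, postgres_too_slow: bool) -> str:
    """Map workload traits to a starting-point engine, per the guide above."""
    if in_process:
        return "DuckDB"       # analysis inside a script or notebook, no server
    if zero_infra:
        return "BigQuery"     # serverless, pay-per-query
    if postgres_too_slow:
        return "ClickHouse"   # sub-second OLAP after mat. views ran out
    return "PostgreSQL"       # the default: start here
```

The point of the sketch is the ordering: every path begins at PostgreSQL and only leaves it when a specific shape forces the move.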
| Database | Best for | Not for | Infrastructure |
|---|---|---|---|
| PostgreSQL | General purpose: OLTP + light OLAP with extensions | Sub-second queries on 100B+ rows | Self-hosted / Cloud SQL / RDS / Supabase |
| ClickHouse | High-cardinality analytics on huge event streams | Transactional writes, small datasets | Self-hosted / ClickHouse Cloud |
| DuckDB | In-process analysis, local files (Parquet, CSV), notebooks | Concurrent multi-user writes | Embedded (no server) |
| BigQuery | Petabyte-scale serverless SQL, GCP ecosystem | Low-latency transactional queries, cost-sensitive small data | Fully managed (Google Cloud) |
| Snowflake | Multi-cloud enterprise warehouse, independent scaling | Cost-sensitive workloads, simple OLTP | Fully managed (AWS / Azure / GCP) |
Warehousing is the substrate that analytics, ML pipelines, and agent memory all sit on. It connects upward to the data hub, laterally to analytics tools, and outward to edge infrastructure.