data/warehousing · PostgreSQL · Skill Leaf

The database that does everything.

PostgreSQL 18: the relational database underneath Frappe, Superset, half the 2nth stack, and most of the internet. ACID-compliant, extensible, open-source, and the rare database that handles OLTP and analytical workloads in the same engine.


Not just storage. A programmable data platform.

PostgreSQL is an object-relational database that's been in active development since 1986. The "object-relational" part matters: it supports user-defined types, functions, operators, and aggregates. But the real story in 2026 isn't the core engine; it's the extension ecosystem that sits on top of it.

Extensions turn PostgreSQL into a vector database (pgvector), a time-series database (TimescaleDB), a geospatial database (PostGIS), a graph database (Apache AGE). The Frappe Framework runs on it. Apache Superset stores its metadata in it. Cloudflare Hyperdrive proxies connections to it from the edge. It is the substrate that the rest of the data stack builds on.

The core commitments: ACID transactions, row-level security, logical replication, MVCC concurrency (readers don't block writers), full-text search, JSONB document storage with indexing, and window functions for analytics. All built in, no add-ons required.

Why this is the 2nth default

PostgreSQL is the default because it handles the first three years of most builds without requiring a database migration. Start on Postgres, add materialized views for dashboard queries, add pgvector for RAG, add PostGIS for geospatial. By the time you outgrow it, if you ever do, you'll know exactly which specialized database you need and why.

It's also the database that Frappe's ERPNext, HRMS, and CRM run on (v15+), which means a single Postgres instance can serve both your ERP and your analytics metadata store. One operational target, not two.

The capabilities that make it more than a row store.

PostgreSQL ships with a set of features that remove entire categories of external tooling from the architecture. Each one is worth knowing before you reach for a specialist service.

Feature · What it does
JSONB · Document storage with full GIN indexing, for NoSQL flexibility inside SQL structure. Supports path queries, containment checks, and array operations.
CTEs (WITH) · Common Table Expressions for readable complex logic. Break a multi-step query into named intermediate results. Makes recursive queries tractable.
Window functions · Analytics without leaving the database. Running totals, rank, lag/lead, percentile, computed over a defined partition and order without collapsing rows.
Materialized views · Pre-computed query results stored as a table, refreshable on demand or concurrently. The first tool to reach for when dashboard queries are slow.
Full-text search · tsvector/tsquery with GIN indexes. Fast enough to skip Elasticsearch for most document search use cases.
LISTEN/NOTIFY · Pub/sub built into the database. Notify channels on row changes; listeners receive events in real time. No message queue required for simple event fan-out.
Logical replication · Stream changes from a publication to one or more subscribers. Source for CDC pipelines, read replicas, and cross-database migrations.
Row-level security · Policies on tables that control which rows a session can read or write, evaluated per user based on session variables. POPIA data-subject access control without application logic.
-- JSONB: document storage with index
CREATE INDEX idx_orders_meta ON orders USING gin (metadata jsonb_path_ops);

-- Containment query (uses the index)
SELECT * FROM orders WHERE metadata @> '{"region": "ZA"}';

-- Path query
SELECT metadata ->> 'region' AS region, count(*) FROM orders GROUP BY 1;
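The CTE row in the feature table has no example above; a minimal sketch against the same orders table used in the other examples, splitting a rollup into named steps:

```sql
-- CTE: name each intermediate step of a multi-part query
WITH completed AS (
  SELECT date_trunc('day', created_at) AS day, total
  FROM orders
  WHERE status = 'completed'
),
daily AS (
  SELECT day, sum(total) AS revenue
  FROM completed
  GROUP BY day
)
SELECT day, revenue
FROM daily
ORDER BY day;
```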
-- Materialized view for a dashboard, refreshable without blocking reads
CREATE MATERIALIZED VIEW daily_sales AS
SELECT
  date_trunc('day', created_at) AS day,
  sum(total)                     AS revenue,
  count(*)                       AS orders
FROM orders
WHERE status = 'completed'
GROUP BY 1;

-- CONCURRENTLY requires a unique index on the view
CREATE UNIQUE INDEX idx_daily_sales_day ON daily_sales (day);

-- Refresh without blocking reads
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
-- LISTEN/NOTIFY: real-time events without a message queue
LISTEN order_updates;
NOTIFY order_updates, '{"order_id": 42, "status": "shipped"}';
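Logical replication from the same feature list can be sketched as a publication/subscription pair; the host and role names here are illustrative:

```sql
-- Logical replication: publish selected tables from the source database
CREATE PUBLICATION orders_pub FOR TABLE orders;

-- On the subscriber (the table must already exist with a matching schema)
CREATE SUBSCRIPTION orders_sub
  CONNECTION 'host=source-host dbname=app user=replicator'
  PUBLICATION orders_pub;
```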

-- Window function: running total by day
SELECT
  day,
  revenue,
  sum(revenue) OVER (ORDER BY day) AS running_total
FROM daily_sales
ORDER BY day;
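Full-text search, sketched on a hypothetical articles table (generated tsvector columns need PostgreSQL 12 or later):

```sql
-- Full-text search: generated tsvector column + GIN index
CREATE TABLE articles (
  id     bigserial PRIMARY KEY,
  body   text,
  search tsvector GENERATED ALWAYS AS
         (to_tsvector('english', coalesce(body, ''))) STORED
);
CREATE INDEX idx_articles_search ON articles USING gin (search);

-- Query with web-style search syntax; uses the GIN index
SELECT id
FROM articles
WHERE search @@ websearch_to_tsquery('english', 'invoice overdue');
```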

The killer feature is what it becomes.

PostgreSQL's extension system lets the database grow into specialized roles without replacing the core. The extensions most relevant to the 2nth stack cover vectors, geospatial, time-series, scheduling, and graph queries.

Extension · What it adds · Where it fits
pgvector · Vector similarity search (cosine, dot product, L2 distance). Store embeddings, run nearest-neighbour queries. · RAG pipelines, semantic search, recommendation
PostGIS · Geospatial geometry and geography types, spatial indexes, ST_Distance, ST_Within, routing. · Field service, logistics, mapping, delivery zones
TimescaleDB · Hypertables that auto-partition time-series data, continuous aggregates, compression. · IoT, metrics, event streams, financial tick data
pg_cron · Cron-scheduled SQL jobs inside the database. No external scheduler needed for routine maintenance or aggregation tasks. · Nightly rollups, cleanup jobs, data expiry
pg_stat_statements · Query performance monitoring: total time, calls, mean time, rows per query. Essential for tuning. · Performance investigation, index planning
Apache AGE · Graph queries using the Cypher query language on top of relational data. No separate graph database required. · Knowledge graphs, fraud detection, org charts
-- Install pgvector (once, per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with a vector column (1536-dim for text-embedding-3-small)
CREATE TABLE documents (
  id        bigserial  primary key,
  content   text,
  embedding vector(1536)
);

-- HNSW index for fast approximate nearest-neighbour search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Semantic search: find the 5 most similar documents
SELECT id, content, embedding <=> '[0.1, 0.2, ...]' AS distance
FROM documents
ORDER BY distance
LIMIT 5;
-- TimescaleDB: hypertable for time-series data
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE sensor_readings (
  time        timestamptz  NOT NULL,
  sensor_id   int,
  temperature numeric(5,2)
);

-- Convert to hypertable (auto-partitions by time)
SELECT create_hypertable('sensor_readings', 'time');
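pg_cron from the extension table pairs naturally with the materialized view defined earlier; a sketch that schedules a nightly refresh (the job name and schedule are illustrative):

```sql
-- pg_cron: refresh the dashboard view every night at 02:00
CREATE EXTENSION IF NOT EXISTS pg_cron;

SELECT cron.schedule(
  'refresh-daily-sales',
  '0 2 * * *',
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales$$
);
```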

Where to run it and how to connect.

PostgreSQL runs everywhere. The choice between self-hosted and managed turns on operational overhead vs. control, and on where the data needs to live for compliance.

Option · Region · Good for · Tradeoff
Cloud SQL (Google) · africa-south1 (JHB) · SA data residency, POPIA, GCP stack · Vendor lock-in, cost at scale
Amazon RDS · af-south-1 (Cape Town) · AWS stack, multi-AZ HA · Cape Town region only for SA
Supabase · Closest AWS region · Rapid dev, built-in auth + storage + REST · No SA region; EU/US data only
Self-hosted · Your infra · Full control, POPIA on-prem compliance · You own backups, HA, patching
# Connection string pattern
postgresql://user:password@host:5432/dbname?sslmode=require

# With PgBouncer (connection pooling) in front
postgresql://user:password@pgbouncer-host:6432/dbname?sslmode=require

# Cloudflare Hyperdrive connection string (from a Worker)
# env.HYPERDRIVE.connectionString returns a pooled string automatically

Connection pooling is not optional in production

PostgreSQL's default max_connections is 100. Each connection holds memory (roughly 5-10 MB). A web application with 20 pods, each opening 10 connections, hits the limit before it gets interesting. Use PgBouncer for self-hosted deployments; it pools connections in transaction mode and makes the limit effectively irrelevant. Cloudflare Hyperdrive does the same thing for Workers: it maintains a warm pool at the edge so each Worker invocation doesn't pay the TCP + TLS handshake cost.
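A minimal PgBouncer setup in transaction mode might look like the following; every value here is illustrative, not a tuned recommendation:

```ini
; pgbouncer.ini (illustrative values)
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction        ; release server conn at transaction end
default_pool_size = 20         ; server connections per user/database pair
max_client_conn = 1000         ; client side can far exceed max_connections
```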

# Backup strategy

# Small databases (dev, staging): logical backup
pg_dump -Fc -h host -U user dbname > backup_$(date +%Y%m%d).dump
pg_restore -d newdb backup_20260416.dump

# Production: physical backup + WAL archiving
# pg_basebackup takes a binary snapshot
pg_basebackup -h host -U replicator -Ft -z -Xs -P -D /backups/base

# WAL archiving in postgresql.conf
archive_mode = on
archive_command = 'gsutil cp %p gs://my-bucket/wal/%f'

# Point-in-time recovery: restore base + replay WAL to a specific timestamp
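That last step can be sketched as postgresql.conf additions (PostgreSQL 12+ recovery style; the bucket path and timestamp are illustrative):

```
# postgresql.conf additions for point-in-time recovery
restore_command      = 'gsutil cp gs://my-bucket/wal/%f %p'
recovery_target_time = '2026-04-16 03:00:00+02'
# Then create an empty recovery.signal file in the restored data
# directory and start the server; WAL replays up to the target time.
```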

Running PostgreSQL in a South African context.

South Africa has specific infrastructure constraints and regulatory requirements that shape how PostgreSQL gets deployed and configured here.

Cloud SQL in africa-south1. Google Cloud SQL has a PostgreSQL offering in the Johannesburg region (africa-south1). This is the right default for POPIA-sensitive builds where data residency in South Africa is a requirement. Latency from most SA cities to Johannesburg is low enough that application-level connection pooling (PgBouncer or Hyperdrive) makes the round-trip negligible.

POPIA and row-level security. The Protection of Personal Information Act requires that data subjects can access and request deletion of their personal data. PostgreSQL's row-level security (RLS) makes this straightforward: define a policy that restricts row access based on a session variable set to the current data subject's ID. An agent or API can query as a specific data subject and automatically get only their rows โ€” no application-level filtering required.

Money: never use float. For ZAR amounts, use NUMERIC(12,2) or store as integer cents. Floating-point arithmetic introduces rounding errors that compound across aggregations. A sum of 10,000 float-stored transactions may differ from the correct value by a few cents, which is a financial reporting error, not just a rounding quirk.

Load shedding resilience. During load shedding, database connections are dropped when the server restarts. The defence: connection pooling (PgBouncer reconnects automatically), application-level retry logic with exponential backoff, and WAL archiving to cloud storage so point-in-time recovery is available if a hard restart corrupts a checkpoint. Do not rely on scheduled pg_dump jobs alone; WAL archiving is the safety net.

-- Row-level security for POPIA data-subject access control
ALTER TABLE customer_data ENABLE ROW LEVEL SECURITY;

-- Policy: each session can only see rows where customer_id matches the session variable
CREATE POLICY customer_isolation ON customer_data
  USING (customer_id = current_setting('app.current_customer_id')::bigint);

-- Set before each query in the application layer
SET app.current_customer_id = '42';
SELECT * FROM customer_data; -- returns only customer 42's rows

-- ZAR money: always NUMERIC, never FLOAT
CREATE TABLE invoices (
  id         bigserial primary key,
  amount_zar numeric(12,2) NOT NULL,  -- correct
  amount_bad float                    -- never do this
);

Things that only bite once you ship.

Six failure modes worth internalising before a PostgreSQL instance goes in front of production traffic.

Never use float for money

Use NUMERIC(12,2) or integer cents. Floating-point arithmetic accumulates rounding errors that show up in financial reports. This catches every new engineer exactly once, usually in production.

Connection limits hit fast

Default max_connections=100. A 20-pod app with 5 connections each is already at the limit. Use PgBouncer in transaction mode for self-hosted, or Cloudflare Hyperdrive for Workers. Raising max_connections without pooling just moves the pain.

VACUUM doesn't run itself well enough

Autovacuum is on by default but tuned conservatively. High-write tables accumulate dead tuple bloat. Monitor with pg_stat_user_tables, check n_dead_tup, and tune autovacuum thresholds for hot tables. Table bloat hurts query performance and index scans.
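A quick way to spot bloat candidates, using the pg_stat_user_tables view mentioned above:

```sql
-- Which tables carry the most dead tuples (autovacuum candidates)
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```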

SELECT * in production

Always list the columns you actually need, especially when JSONB columns are present. SELECT * on a table with a 10 MB JSONB column deserialises and transmits all of it; inside a loop, that's a disaster.

Missing indexes on foreign keys

PostgreSQL does not automatically create indexes on foreign key columns. Every unindexed FK column becomes a sequential scan on the referenced table during JOINs and cascade deletes. Check pg_constraint after schema changes and index FKs manually.
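A rough catalog check for the single-column case (multi-column foreign keys need a fuller comparison of index columns):

```sql
-- Single-column foreign keys with no index starting on that column
SELECT c.conrelid::regclass AS table_name,
       c.conname            AS fk_name
FROM pg_constraint c
WHERE c.contype = 'f'
  AND array_length(c.conkey, 1) = 1
  AND NOT EXISTS (
    SELECT 1
    FROM pg_index i
    WHERE i.indrelid = c.conrelid
      AND i.indkey[0] = c.conkey[1]  -- indkey is 0-based, conkey 1-based
  );
```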

Locale matters for sort order

The database locale controls collation โ€” how strings are ordered and compared. Set LC_COLLATE=en_ZA.UTF-8 at database creation for correct South African sort order. Changing locale after the fact requires a full dump-and-restore.
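Setting the locale at creation time might look like this; the database name is illustrative, and the en_ZA.UTF-8 locale must exist on the host OS:

```sql
-- Locale is fixed at creation; a non-default locale needs template0
CREATE DATABASE app_za
  WITH TEMPLATE   = template0
       ENCODING   = 'UTF8'
       LC_COLLATE = 'en_ZA.UTF-8'
       LC_CTYPE   = 'en_ZA.UTF-8';
```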

Use it for, skip it for.

PostgreSQL is the right default for most builds. The list below is where it earns that position; workloads that fall outside it are the signal to reach for a specialist database.

Use PostgreSQL when

  • You need ACID transactions โ€” financial records, inventory, order management.
  • You want extensions to grow into specialized roles: pgvector for RAG, PostGIS for geospatial, TimescaleDB for time-series.
  • The team knows SQL โ€” Postgres rewards SQL fluency with enormous capability.
  • Regulatory compliance (POPIA) requires self-hosted data residency in South Africa.
  • You're running Frappe/ERPNext โ€” it's the native database for v15+.
  • You want one database that serves both operational (OLTP) and light analytical (OLAP) workloads.

How this leaf compounds in the tree.

PostgreSQL is not one tool in the stack; it is the substrate that multiple other skill nodes depend on. Understanding it once unlocks the operational model for a large part of the 2nth build surface.

The compounding logic: every node that talks to a PostgreSQL instance shares the same connection model, the same auth patterns, the same query language, and the same extension surface. An agent that knows how to operate a Postgres instance knows how to read Superset's metadata, back up a Frappe site's database, and query an ERPNext report, because they're all the same database.

The extension story matters for agents too. pgvector turns a Postgres instance into a vector store, with no separate Pinecone or Weaviate to provision. PostGIS turns it into a geospatial engine. An agent operating against a Postgres instance with these extensions can do work that used to require three separate services.

tech/frappe/framework
Frappe Framework
Frappe v15 runs on PostgreSQL natively. Every Frappe doctype is a Postgres table. The Frappe leaf assumes this one as the backing store.
data/analytics/superset
Apache Superset
Superset stores its own metadata (dashboards, charts, datasets, users) in PostgreSQL. It also queries Postgres directly as a datasource โ€” the same instance can play both roles.
tech/cloudflare/hyperdrive
Cloudflare Hyperdrive
Hyperdrive pools and caches Postgres connections from Cloudflare Workers. When D1 (SQLite-at-edge) isn't enough, Hyperdrive brings a real Postgres instance to the edge with pooling built in.
tech/cloudflare/d1
Cloudflare D1
D1 is SQLite at the edge, the right tool when you need a lightweight embedded database in a Worker. When you need the real thing (ACID, extensions, large datasets), the path is D1 → Hyperdrive → Postgres.
data/warehousing
Warehousing sub-hub
The parent sub-hub. PostgreSQL covers the OLTP-to-light-OLAP spectrum; the other nodes in the hub fill the shapes Postgres doesn't cover.
data/engineering
Data Engineering
dbt, Airbyte, and the transformation layer all sit on top of the warehouse. Postgres is the most common target for these tools in the 2nth stack.

Go deeper.

The official documentation is excellent, one of the best in the open-source world. The wiki has the operational patterns. pgvector and Supabase docs are the fastest path to the extension ecosystem.