PostgreSQL 18 – the relational database underneath Frappe, Superset, half the 2nth stack, and most of the internet. ACID-compliant, extensible, open-source, and the rare database that handles OLTP and analytical workloads in the same engine.
PostgreSQL is an object-relational database that's been in active development since 1986. The "object-relational" part matters: it supports user-defined types, functions, operators, and aggregates. But the real story in 2026 isn't the core engine – it's the extension ecosystem that sits on top of it.
Extensions turn PostgreSQL into a vector database (pgvector), a time-series database (TimescaleDB), a geospatial database (PostGIS), a graph database (Apache AGE). The Frappe Framework runs on it. Apache Superset stores its metadata in it. Cloudflare Hyperdrive proxies connections to it from the edge. It is the substrate that the rest of the data stack builds on.
The core commitments: ACID transactions, row-level security, logical replication, MVCC concurrency (readers don't block writers), full-text search, JSONB document storage with indexing, and window functions for analytics – all built in, no add-ons required.
PostgreSQL is the default because it handles the first three years of most builds without requiring a database migration. Start on Postgres, add materialized views for dashboard queries, add pgvector for RAG, add PostGIS for geospatial. By the time you outgrow it – if you ever do – you'll know exactly which specialized database you need and why.
It's also the database that Frappe's ERPNext, HRMS, and CRM run on (v15+), which means a single Postgres instance can serve both your ERP and your analytics metadata store. One operational target, not two.
PostgreSQL ships with a set of features that remove entire categories of external tooling from the architecture. Each one is worth knowing before you reach for a specialist service.
| Feature | What it does |
|---|---|
| JSONB | Document storage with full GIN indexing – when you need NoSQL flexibility inside SQL structure. Supports path queries, containment checks, and array operations. |
| CTEs (WITH) | Common Table Expressions for readable complex logic. Break a multi-step query into named intermediate results. Makes recursive queries tractable. |
| Window functions | Analytics without leaving the database. Running totals, rank, lag/lead, percentile – computed over a defined partition and order without collapsing rows. |
| Materialized views | Pre-computed query results stored as a table, refreshable on demand or concurrently. The first tool to reach for when dashboard queries are slow. |
| Full-text search | tsvector/tsquery with GIN indexes. Fast enough to skip Elasticsearch for most document search use cases. |
| LISTEN/NOTIFY | Pub/sub built into the database. Notify channels on row changes; listeners receive events in real time. No message queue required for simple event fan-out. |
| Logical replication | Stream changes from a publication to one or more subscribers. Source for CDC pipelines, read replicas, and cross-database migrations. |
| Row-level security | Policies on tables that control which rows a session can read or write, evaluated per-user based on session variables. POPIA data-subject access control without application logic. |
```sql
-- JSONB: document storage with index
CREATE INDEX idx_orders_meta ON orders USING gin (metadata jsonb_path_ops);

-- Containment query – uses the index
SELECT * FROM orders WHERE metadata @> '{"region": "ZA"}';

-- Path query
SELECT metadata ->> 'region' AS region, count(*) FROM orders GROUP BY 1;
```
```sql
-- Materialized view for a dashboard – refreshable without locking
CREATE MATERIALIZED VIEW daily_sales AS
SELECT date_trunc('day', created_at) AS day,
       sum(total) AS revenue,
       count(*)   AS orders
FROM orders
WHERE status = 'completed'
GROUP BY 1;

-- REFRESH ... CONCURRENTLY requires a unique index on the view
CREATE UNIQUE INDEX ON daily_sales (day);

-- Refresh without blocking reads
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
```
```sql
-- LISTEN/NOTIFY: real-time events without a message queue
LISTEN order_updates;
NOTIFY order_updates, '{"order_id": 42, "status": "shipped"}';

-- Window function: running total by day
SELECT day, revenue,
       sum(revenue) OVER (ORDER BY day) AS running_total
FROM daily_sales
ORDER BY day;
```
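The features table also lists CTEs and full-text search, which the examples above don't cover. A brief sketch of both – the `employees` table and the search terms are illustrative, and the generated `tsvector` column assumes a `documents(content text)` table:

```sql
-- Recursive CTE: walk an org chart from a root employee
WITH RECURSIVE reports AS (
  SELECT id, manager_id, name FROM employees WHERE id = 1
  UNION ALL
  SELECT e.id, e.manager_id, e.name
  FROM employees e
  JOIN reports r ON e.manager_id = r.id
)
SELECT * FROM reports;

-- Full-text search: generated tsvector column with a GIN index (PG 12+)
ALTER TABLE documents ADD COLUMN search tsvector
  GENERATED ALWAYS AS (to_tsvector('english', content)) STORED;
CREATE INDEX idx_documents_search ON documents USING gin (search);

SELECT id FROM documents
WHERE search @@ to_tsquery('english', 'invoice & overdue');
```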
PostgreSQL's extension system lets the database grow into specialized roles without replacing the core. The extensions most relevant to the 2nth stack cover vectors, geospatial, time-series, scheduling, and graph queries.
| Extension | What it adds | Where it fits |
|---|---|---|
| pgvector | Vector similarity search – cosine, dot product, L2 distance. Store embeddings, run nearest-neighbour queries. | RAG pipelines, semantic search, recommendation |
| PostGIS | Geospatial geometry and geography types, spatial indexes, ST_Distance, ST_Within; routing via the companion pgRouting extension. | Field service, logistics, mapping, delivery zones |
| TimescaleDB | Hypertables that auto-partition time-series data, continuous aggregates, compression. | IoT, metrics, event streams, financial tick data |
| pg_cron | Cron-scheduled SQL jobs inside the database. No external scheduler needed for routine maintenance or aggregation tasks. | Nightly rollups, cleanup jobs, data expiry |
| pg_stat_statements | Query performance monitoring – total time, calls, mean time, rows per query. Essential for tuning. | Performance investigation, index planning |
| Apache AGE | Graph queries in the Cypher query language on top of relational data. No separate graph database required. | Knowledge graphs, fraud detection, org charts |
```sql
-- Install pgvector (once, per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with a vector column (1536-dim for text-embedding-3-small)
CREATE TABLE documents (
  id bigserial PRIMARY KEY,
  content text,
  embedding vector(1536)
);

-- HNSW index for fast approximate nearest-neighbour search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Semantic search: find the 5 most similar documents
SELECT id, content, embedding <=> '[0.1, 0.2, ...]' AS distance
FROM documents
ORDER BY distance
LIMIT 5;
```
```sql
-- TimescaleDB: hypertable for time-series data
CREATE EXTENSION IF NOT EXISTS timescaledb;

CREATE TABLE sensor_readings (
  time        timestamptz NOT NULL,
  sensor_id   int,
  temperature numeric(5,2)
);

-- Convert to hypertable (auto-partitions by time)
SELECT create_hypertable('sensor_readings', 'time');
```
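The table above also names pg_cron and pg_stat_statements. A minimal sketch of each – the job name, schedule, and view are illustrative, and the `total_exec_time` column name assumes PostgreSQL 13 or later:

```sql
-- pg_cron: nightly rollup at 02:00 inside the database
CREATE EXTENSION IF NOT EXISTS pg_cron;
SELECT cron.schedule('nightly-sales-rollup', '0 2 * * *',
  $$REFRESH MATERIALIZED VIEW daily_sales$$);

-- pg_stat_statements: the ten most expensive queries by total time
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```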
PostgreSQL runs everywhere. The choice between self-hosted and managed turns on operational overhead vs. control, and on where the data needs to live for compliance.
| Option | Region | Good for | Tradeoff |
|---|---|---|---|
| Cloud SQL (Google) | africa-south1 (JHB) | SA data residency, POPIA, GCP stack | Vendor lock-in, cost at scale |
| Amazon RDS | af-south-1 (Cape Town) | AWS stack, multi-AZ HA | Cape Town region only for SA |
| Supabase | Closest AWS region | Rapid dev, built-in auth + storage + REST | No SA region; EU/US data only |
| Self-hosted | Your infra | Full control, POPIA on-prem compliance | You own backups, HA, patching |
```shell
# Connection string pattern
postgresql://user:password@host:5432/dbname?sslmode=require

# With PgBouncer (connection pooling) in front
postgresql://user:password@pgbouncer-host:6432/dbname?sslmode=require

# Cloudflare Hyperdrive connection string (from a Worker)
# env.HYPERDRIVE.connectionString returns a pooled string automatically
```
PostgreSQL's default max_connections is 100. Each connection holds memory (roughly 5-10 MB). A web application with 20 pods, each opening 10 connections, hits the limit before it gets interesting. Use PgBouncer for self-hosted deployments – it pools connections in transaction mode and makes the limit effectively irrelevant. Cloudflare Hyperdrive does the same thing for Workers: it maintains a warm pool at the edge so each Worker invocation doesn't pay the TCP + TLS handshake cost.
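A minimal pgbouncer.ini sketch for transaction-mode pooling – hostnames, pool sizes, and the auth file path are placeholders to adapt:

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 1000
```

Transaction mode hands a server connection to a client only for the duration of a transaction, which is why 1000 client connections can share 20 real ones – but it breaks session-level features like prepared statements held across transactions, so check the application driver's settings.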
```shell
# Backup strategy

# Small databases (dev, staging): logical backup
pg_dump -Fc -h host -U user dbname > backup_$(date +%Y%m%d).dump
pg_restore -d newdb backup_20260416.dump

# Production: physical backup + WAL archiving
# pg_basebackup takes a binary snapshot
pg_basebackup -h host -U replicator -Ft -z -Xs -P -D /backups/base

# WAL archiving in postgresql.conf
archive_mode = on
archive_command = 'gsutil cp %p gs://my-bucket/wal/%f'

# Point-in-time recovery: restore base + replay WAL to a specific timestamp
```
South Africa has specific infrastructure constraints and regulatory requirements that shape how PostgreSQL gets deployed and configured here.
Cloud SQL in africa-south1. Google Cloud SQL has a PostgreSQL offering in the Johannesburg region (africa-south1). This is the right default for POPIA-sensitive builds where data residency in South Africa is a requirement. Latency from most SA cities to Johannesburg is low enough that application-level connection pooling (PgBouncer or Hyperdrive) makes the round-trip negligible.
POPIA and row-level security. The Protection of Personal Information Act requires that data subjects can access and request deletion of their personal data. PostgreSQL's row-level security (RLS) makes this straightforward: define a policy that restricts row access based on a session variable set to the current data subject's ID. An agent or API can query as a specific data subject and automatically get only their rows – no application-level filtering required.
Money: never use float. For ZAR amounts, use NUMERIC(12,2) or store as integer cents. Floating-point arithmetic introduces rounding errors that compound across aggregations. A sum of 10,000 float-stored transactions may differ from the correct value by a few cents – which is a financial reporting error, not just a rounding quirk.
Load shedding resilience. During load shedding, database connections are dropped when the server restarts. The defence: connection pooling (PgBouncer reconnects automatically), application-level retry logic with exponential backoff, and WAL archiving to cloud storage so point-in-time recovery is available if a hard restart corrupts a checkpoint. Do not rely on pg_dump scheduled jobs alone โ WAL archiving is the safety net.
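The retry-with-exponential-backoff pattern can be sketched in a few lines of Python. This is an illustrative skeleton, not a specific driver's API – real code would catch the transient exception class of its client library (e.g. psycopg's `OperationalError`) instead of the simulated `ConnectionError` here:

```python
import random
import time

def with_retry(fn, retries=5, base_delay=0.5, max_delay=30.0,
               transient=(ConnectionError,)):
    """Call fn(); on a transient failure, back off exponentially with jitter and retry."""
    for attempt in range(retries):
        try:
            return fn()
        except transient:
            if attempt == retries - 1:
                raise  # out of retries: surface the failure
            # Full jitter: sleep a random amount up to the capped exponential backoff
            delay = min(max_delay, base_delay * (2 ** attempt))
            time.sleep(random.uniform(0, delay))

# Simulated flaky connection: fails twice (server restarting), then succeeds
attempts = {"n": 0}

def query():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("server restarting")
    return "ok"

print(with_retry(query, base_delay=0.01))  # prints "ok" after two retries
```

The jitter matters under load shedding: when power returns, every client reconnects at once, and randomised delays stop them from hammering the database in lockstep.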
```sql
-- Row-level security for POPIA data-subject access control
ALTER TABLE customer_data ENABLE ROW LEVEL SECURITY;

-- Policy: each session sees only rows where customer_id matches the session variable
CREATE POLICY customer_isolation ON customer_data
  USING (customer_id = current_setting('app.current_customer_id')::bigint);

-- Set before each query in the application layer
SET app.current_customer_id = '42';
SELECT * FROM customer_data;  -- returns only customer 42's rows

-- ZAR money: always NUMERIC, never FLOAT
CREATE TABLE invoices (
  id bigserial PRIMARY KEY,
  amount_zar numeric(12,2) NOT NULL,  -- correct
  amount_bad float                    -- never do this
);
```
Six failure modes worth internalising before a PostgreSQL instance goes in front of production traffic.
Float for money. Use NUMERIC(12,2) or integer cents. Floating-point arithmetic accumulates rounding errors that show up in financial reports. This catches every new engineer exactly once – usually in production.
Connection exhaustion. Default max_connections is 100; a 20-pod app with 5 connections each is already at the limit. Use PgBouncer in transaction mode for self-hosted, or Cloudflare Hyperdrive for Workers. Raising max_connections without pooling just moves the pain.
Autovacuum bloat. Autovacuum is on by default but tuned conservatively, so high-write tables accumulate dead tuple bloat. Monitor with pg_stat_user_tables, check n_dead_tup, and tune autovacuum thresholds for hot tables. Table bloat hurts query performance and index scans.
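A sketch of the monitoring query and a per-table tuning override – the `orders` table and the threshold values are illustrative; tune against observed write volume:

```sql
-- Dead-tuple bloat check: the ten most bloated tables
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Tighten autovacuum for a hot table (defaults trigger at ~20% dead tuples)
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor  = 0.02,
  autovacuum_analyze_scale_factor = 0.01
);
```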
SELECT * in production. Always list columns explicitly, especially when a table carries large JSONB columns: SELECT * on a table with a 10 MB JSONB column deserialises and transmits all of it, and in a loop that is a disaster. Specify only the columns you actually need.
Unindexed foreign keys. PostgreSQL does not automatically create indexes on foreign key columns. Every unindexed FK column becomes a sequential scan on the referencing table during JOINs and cascade deletes. Check pg_constraint after schema changes and index FKs manually.
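A starting point for the audit – list every FK constraint, then eyeball the referencing columns against existing indexes (the `orders.customer_id` index at the end is an illustrative example):

```sql
-- Every foreign-key constraint in the database, with its definition
SELECT conrelid::regclass AS referencing_table,
       conname,
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE contype = 'f'
ORDER BY conrelid::regclass::text;

-- Then index each FK column that JOINs or cascades touch, e.g.:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
```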
Locale and collation. The database locale controls collation – how strings are ordered and compared. Set LC_COLLATE=en_ZA.UTF-8 at database creation for correct South African sort order. Changing the locale after the fact requires a full dump-and-restore.
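Setting it at creation time looks like this – the database name is a placeholder, and TEMPLATE template0 is required because template1's collation may not match:

```sql
-- South African collation, fixed at creation time
CREATE DATABASE appdb
  TEMPLATE   template0
  ENCODING   'UTF8'
  LC_COLLATE 'en_ZA.UTF-8'
  LC_CTYPE   'en_ZA.UTF-8';
```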
PostgreSQL is the right default for most builds. Here is the honest set of cases where a specialist database is actually the better choice.
PostgreSQL is not one tool in the stack – it is the substrate that multiple other skill nodes depend on. Understanding it once unlocks the operational model for a large part of the 2nth build surface.
The compounding logic: every node that talks to a PostgreSQL instance shares the same connection model, the same auth patterns, the same query language, and the same extension surface. An agent that knows how to operate a Postgres instance knows how to read Superset's metadata, back up a Frappe site's database, and query an ERPNext report โ because they're all the same database.
The extension story matters for agents too. pgvector turns a Postgres instance into a vector store – no separate Pinecone or Weaviate to provision. PostGIS turns it into a geospatial engine. An agent operating against a Postgres instance with these extensions can do work that used to require three separate services.
The official documentation is excellent – one of the best in the open-source world. The wiki has the operational patterns. pgvector and Supabase docs are the fastest path to the extension ecosystem.