A vector database with metadata filtering, built for RAG pipelines on Cloudflare. Store embeddings from Workers AI, query by similarity, and retrieve context for LLM generation — all without leaving the platform.
Vectorize is Cloudflare's managed vector database. You store vector embeddings (from Workers AI or any embedding model) with metadata, then query by similarity to find the most relevant documents for a given query. The typical flow: embed a query, search Vectorize, pass the top results to an LLM as context.
Each index has a fixed dimension (matching your embedding model) and supports metadata filtering — so you can search "similar documents in category X" or "nearest neighbours created after date Y". Results include the similarity score, metadata, and optionally the vector itself.
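The similarity score for a cosine-metric index is the cosine of the angle between the query vector and the stored vector. Vectorize computes this server-side; the sketch below is not part of its API, only an illustration of what the returned score means:

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// 1.0 = same direction (most similar), 0 = orthogonal (unrelated).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```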
For a fully Cloudflare-hosted RAG stack: Workers AI generates embeddings, Vectorize stores and searches them, R2 holds the source documents, and Workers AI generates the final response with retrieved context.
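The four-step stack above can be sketched as a single Worker-side function. `answerWithRAG`, the `R2_DOCS` binding, the `r2Key` metadata field, and the generation model name are assumptions for illustration, not fixed API names:

```javascript
// Sketch of the full RAG flow, assuming bindings AI (Workers AI),
// VECTORS (Vectorize), and R2_DOCS (R2) are configured in wrangler.toml.
// Call this helper from your Worker's fetch handler.
async function answerWithRAG(env, question) {
  // 1. Embed the user's question.
  const emb = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [question] });

  // 2. Find the most similar stored documents.
  const hits = await env.VECTORS.query(emb.data[0], { topK: 3, returnMetadata: true });

  // 3. Fetch source text from R2 (assumes each vector's metadata carries an R2 key).
  const contexts = await Promise.all(
    hits.matches.map(async (m) => {
      const obj = await env.R2_DOCS.get(m.metadata.r2Key);
      return obj ? await obj.text() : "";
    })
  );

  // 4. Generate the final answer with the retrieved context.
  const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: `Answer using this context:\n${contexts.join("\n---\n")}` },
      { role: "user", content: question },
    ],
  });
  return answer.response;
}
```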
Create a Vectorize index with the dimension matching your embedding model. Bind it in wrangler.toml.
```sh
# Create index (768 dims for bge-base-en)
npx wrangler vectorize create knowledge-base \
  --dimensions 768 --metric cosine
```

```toml
# wrangler.toml
[[vectorize]]
binding = "VECTORS"
index_name = "knowledge-base"
```
Embed text with Workers AI, upsert into Vectorize with metadata, then query by embedding similarity with optional metadata filters.
```js
// Embed and insert
const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: ["Workers run on V8 isolates"],
});

await env.VECTORS.upsert([{
  id: "doc-1",
  values: embeddings.data[0],
  metadata: { source: "workers", type: "explainer" },
}]);

// Query with metadata filter
const queryEmb = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: ["how do Workers handle state?"],
});

const results = await env.VECTORS.query(queryEmb.data[0], {
  topK: 5,
  filter: { type: "explainer" },
  returnMetadata: true,
});
```
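Upserts accept a limited number of vectors per call, so bulk ingestion needs batching. A minimal helper sketch — the 1000-vector limit used here is an assumption; check the current Vectorize limits for your plan:

```javascript
// Split a large vector array into fixed-size batches for sequential upserts.
// The default batchSize of 1000 is an assumed per-call limit, not a
// documented constant.
function toBatches(items, batchSize = 1000) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Usage inside a Worker:
// for (const batch of toBatches(allVectors)) {
//   await env.VECTORS.upsert(batch);
// }
```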
The free tier has strict vector-count limits. For large corpora (millions of vectors), you'll need a paid plan, or consider Pinecone or Weaviate as alternatives.
If you switch to an embedding model with a different dimension, you must create a new index and re-embed everything, since an index's dimension is fixed at creation. Pick your model carefully upfront.
Metadata filtering supports equality and basic comparisons. Complex queries (OR, nested conditions) may need to be handled in your Worker code post-query.
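For example, an OR across two metadata fields can be applied to the returned matches in the Worker. Over-fetch with a larger `topK` first, since post-filtering shrinks the result set — `filterOr` is a hypothetical helper, not a Vectorize API:

```javascript
// Post-query OR filter: keep matches whose metadata satisfies any predicate.
// Native Vectorize filters combine conditions with AND, so OR logic runs here.
function filterOr(matches, predicates) {
  return matches.filter((m) => predicates.some((p) => p(m.metadata ?? {})));
}

// Usage with query results (query with a larger topK, then trim):
// const results = await env.VECTORS.query(queryEmb.data[0], { topK: 20, returnMetadata: true });
// const kept = filterOr(results.matches, [
//   (md) => md.type === "explainer",
//   (md) => md.source === "workers",
// ]).slice(0, 5);
```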