A vector database with metadata filtering, built for RAG pipelines on Cloudflare. Store embeddings from Workers AI, query by similarity, and retrieve context for LLM generation — all without leaving the platform.
Vectorize is Cloudflare's managed vector database. You store vector embeddings (from Workers AI or any embedding model) with metadata, then query by similarity to find the most relevant documents for a given query. The typical flow: embed a query, search Vectorize, pass the top results to an LLM as context.
Each index has a fixed dimension (matching your embedding model) and supports metadata filtering — so you can search "similar documents in category X" or "nearest neighbours created after date Y". Results include the similarity score, metadata, and optionally the vector itself.
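The similarity score for a cosine-metric index is the cosine of the angle between the query vector and the stored vector. Vectorize computes this server-side; the sketch below is not part of its API, only an illustration of what the returned score means:

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// 1.0 = same direction (most similar), 0 = orthogonal (unrelated).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```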
For a fully Cloudflare-hosted RAG stack: Workers AI generates embeddings, Vectorize stores and searches them, R2 holds the source documents, and Workers AI generates the final response with retrieved context.
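The four-step stack above can be sketched as a single Worker-side function. `answerWithRAG`, the `R2_DOCS` binding, the `r2Key` metadata field, and the generation model name are assumptions for illustration, not fixed API names:

```javascript
// Sketch of the full RAG flow, assuming bindings AI (Workers AI),
// VECTORS (Vectorize), and R2_DOCS (R2) are configured in wrangler.toml.
// Call this helper from your Worker's fetch handler.
async function answerWithRAG(env, question) {
  // 1. Embed the user's question.
  const emb = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [question] });

  // 2. Find the most similar stored documents.
  const hits = await env.VECTORS.query(emb.data[0], { topK: 3, returnMetadata: true });

  // 3. Fetch source text from R2 (assumes each vector's metadata carries an R2 key).
  const contexts = await Promise.all(
    hits.matches.map(async (m) => {
      const obj = await env.R2_DOCS.get(m.metadata.r2Key);
      return obj ? await obj.text() : "";
    })
  );

  // 4. Generate the final answer with the retrieved context.
  const answer = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      { role: "system", content: `Answer using this context:\n${contexts.join("\n---\n")}` },
      { role: "user", content: question },
    ],
  });
  return answer.response;
}
```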
Create a Vectorize index with the dimension matching your embedding model. Bind it in wrangler.toml.
```sh
# Create index (768 dims for bge-base-en)
npx wrangler vectorize create knowledge-base \
  --dimensions 768 --metric cosine
```

```toml
# wrangler.toml
[[vectorize]]
binding = "VECTORS"
index_name = "knowledge-base"
```
Embed text with Workers AI, upsert into Vectorize with metadata, then query by embedding similarity with optional metadata filters.
```js
// Embed and insert
const embeddings = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: ["Workers run on V8 isolates"],
});

await env.VECTORS.upsert([{
  id: "doc-1",
  values: embeddings.data[0],
  metadata: { source: "workers", type: "explainer" },
}]);

// Query with metadata filter
const queryEmb = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
  text: ["how do Workers handle state?"],
});

const results = await env.VECTORS.query(queryEmb.data[0], {
  topK: 5,
  filter: { type: "explainer" },
  returnMetadata: true,
});
```
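Upserts accept a limited number of vectors per call, so bulk ingestion needs batching. A minimal helper sketch — the 1000-vector limit used here is an assumption; check the current Vectorize limits for your plan:

```javascript
// Split a large vector array into fixed-size batches for sequential upserts.
// The default batchSize of 1000 is an assumed per-call limit, not a
// documented constant.
function toBatches(items, batchSize = 1000) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Usage inside a Worker:
// for (const batch of toBatches(allVectors)) {
//   await env.VECTORS.upsert(batch);
// }
```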
The free tier has strict vector-count limits. For large corpora (millions of vectors), you'll need a paid plan, or consider Pinecone or Weaviate as alternatives.
If you switch to an embedding model with a different dimension, you must create a new index and re-embed everything, since an index's dimension is fixed at creation. Pick your model carefully upfront.
Metadata filtering supports equality and basic comparisons. Complex queries (OR, nested conditions) may need to be handled in your Worker code post-query.
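For example, an OR across two metadata fields can be applied to the returned matches in the Worker. Over-fetch with a larger `topK` first, since post-filtering shrinks the result set — `filterOr` is a hypothetical helper, not a Vectorize API:

```javascript
// Post-query OR filter: keep matches whose metadata satisfies any predicate.
// Native Vectorize filters combine conditions with AND, so OR logic runs here.
function filterOr(matches, predicates) {
  return matches.filter((m) => predicates.some((p) => p(m.metadata ?? {})));
}

// Usage with query results (query with a larger topK, then trim):
// const results = await env.VECTORS.query(queryEmb.data[0], { topK: 20, returnMetadata: true });
// const kept = filterOr(results.matches, [
//   (md) => md.type === "explainer",
//   (md) => md.source === "workers",
// ]).slice(0, 5);
```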