
Best Boilerplates with RAG (Retrieval-Augmented Generation) Built-In 2026

StarterPick Team
Tags: rag · ai · pgvector · nextjs · vector-database · boilerplate · 2026

RAG Is the Architecture Behind Every Useful AI Product

Retrieval-Augmented Generation (RAG) is the technique that makes AI products actually useful: instead of relying on an LLM's training data, you retrieve relevant context from your own data sources and inject it into the prompt.

A document Q&A chatbot? RAG. A customer support bot grounded in your docs? RAG. A knowledge base with semantic search? RAG.

In 2026, RAG has moved from research technique to production standard. The boilerplates that include it out of the box — or make it easy to add — give you a meaningful head start.

TL;DR

Best boilerplates for RAG in 2026:

  1. Vercel AI SDK + pgvector — The most common stack. Supabase or Neon provides pgvector. Vercel AI SDK handles embedding and retrieval.
  2. OpenSaaS + RAG pattern — Add RAG to OpenSaaS's Wasp foundation. The most complete free base.
  3. Makerkit + AI Plugin — Enterprise-grade SaaS boilerplate with AI plugin including RAG patterns.
  4. LangChain.js starter templates — More complex orchestration for multi-step RAG pipelines.
  5. Custom: Next.js + Supabase + pgvector — Roll your own with well-documented patterns.

What RAG Requires

A production RAG system has four components:

| Component | Purpose | Common tools |
| --- | --- | --- |
| Embedding model | Convert text to vectors | OpenAI text-embedding-3, Voyage AI, Cohere |
| Vector store | Store and search vectors | pgvector, Pinecone, Weaviate, Qdrant |
| Retrieval | Find relevant chunks | Cosine similarity, hybrid search |
| Generation | LLM answers using retrieved context | OpenAI, Anthropic, Gemini |

The simplest stack: OpenAI embeddings + pgvector (in Supabase/Neon) + Vercel AI SDK for generation.

Stack Options

pgvector (PostgreSQL)

The simplest approach: add the pgvector extension to your existing PostgreSQL database. Available in Supabase and Neon with zero additional infrastructure.

-- Enable pgvector in Supabase/Neon:
CREATE EXTENSION IF NOT EXISTS vector;

-- Store document chunks with embeddings:
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  metadata JSONB,
  embedding VECTOR(1536)  -- OpenAI text-embedding-3-small dimension
);

-- ANN index so similarity search stays fast as the table grows (pgvector >= 0.5):
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);

-- Semantic search function:
CREATE OR REPLACE FUNCTION match_documents(
  query_embedding VECTOR(1536),
  match_count INT DEFAULT 5
)
RETURNS TABLE(id BIGINT, content TEXT, metadata JSONB, similarity FLOAT)
LANGUAGE SQL STABLE AS $$
  SELECT documents.id, documents.content, documents.metadata,
    1 - (documents.embedding <=> query_embedding) AS similarity
  FROM documents
  WHERE 1 - (documents.embedding <=> query_embedding) > 0.5
  ORDER BY documents.embedding <=> query_embedding
  LIMIT match_count;
$$;
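The <=> operator above is pgvector's cosine-distance operator, so 1 - (embedding <=> query_embedding) is cosine similarity. As a standalone sketch of what that score means (plain TypeScript, not tied to any library):

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
// This mirrors the score match_documents returns: 1 means identical direction,
// 0 means orthogonal (unrelated) vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1 and orthogonal vectors score 0, which is why the 0.5 threshold in match_documents acts as a floor that filters out weak matches.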

Dedicated Vector Databases

For large-scale RAG with millions of vectors:

| Database | Free tier | Best for |
| --- | --- | --- |
| Pinecone | Yes (Starter) | Simplest API, fully managed |
| Weaviate | Yes (self-hosted) | Hybrid search, multi-modal |
| Qdrant | Yes (cloud) | Performance, self-hosting |
| pgvector | Yes (via Supabase/Neon) | Simplest infra (same DB as app data) |

The RAG Implementation Pattern

Step 1: Ingest Documents

// lib/ingest.ts
import { openai } from '@ai-sdk/openai';
import { embed } from 'ai';
import { supabase } from '@/lib/supabase';

// Split a document into fixed-size chunks. Note: chunkSize and overlap are
// measured in characters here; token-based splitting is more precise but
// requires a tokenizer.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

export async function ingestDocument(text: string, metadata: object) {
  const chunks = chunkText(text);

  for (const chunk of chunks) {
    const { embedding } = await embed({
      model: openai.embedding('text-embedding-3-small'),
      value: chunk,
    });

    const { error } = await supabase.from('documents').insert({
      content: chunk,
      metadata,
      embedding,
    });
    if (error) throw error;
  }
}
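To see how the chunking parameters interact, here is the same character-based splitter exercised on its own; the 1200-character input is purely illustrative:

```typescript
// Same character-based splitter as in lib/ingest.ts, reproduced standalone.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// With the defaults, a 1200-character document yields 3 chunks covering
// [0, 500), [450, 950), [900, 1200) - each consecutive pair shares 50
// characters, so a sentence cut at a boundary still appears whole somewhere.
const chunks = chunkText('a'.repeat(1200));
```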

Step 2: Retrieve Relevant Chunks

// lib/retrieve.ts
import { embed } from 'ai';
import { openai } from '@ai-sdk/openai';
import { supabase } from '@/lib/supabase';

export async function retrieveContext(query: string, topK = 5) {
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query,
  });

  const { data: documents, error } = await supabase.rpc('match_documents', {
    query_embedding: embedding,
    match_count: topK,
  });
  if (error) throw error;

  return documents?.map((d: { content: string }) => d.content).join('\n\n') ?? '';
}

Step 3: Generate with Context

// app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { retrieveContext } from '@/lib/retrieve';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const userQuery = messages[messages.length - 1].content;

  const context = await retrieveContext(userQuery);

  const result = await streamText({
    model: openai('gpt-4o'),
    system: `You are a helpful assistant. Use the following context to answer the user's question:

${context}

If the context doesn't contain relevant information, say so.`,
    messages,
  });

  return result.toDataStreamResponse();
}

Boilerplate Evaluations

Vercel AI SDK + pgvector

The Vercel AI SDK's embed function handles embedding generation. Supabase provides pgvector. Together they form the simplest RAG stack for Next.js:

# Enable pgvector in Supabase:
# Dashboard → SQL Editor → Run: CREATE EXTENSION vector;

# Install deps:
npm install ai @ai-sdk/openai @supabase/supabase-js

No dedicated boilerplate exists for this — but the Supabase RAG quickstart and Vercel AI SDK docs together provide a complete guide.

OpenSaaS + RAG

OpenSaaS provides the SaaS foundation (auth, billing, admin). Add pgvector via Supabase (which OpenSaaS supports) for the RAG layer.

The combination gives you a complete AI SaaS with RAG capabilities without paying for a commercial boilerplate.

Makerkit AI Plugin

Makerkit's paid plugin marketplace includes an AI template with document Q&A patterns. If you are already using Makerkit ($299), the AI plugin extends it with:

  • Document upload and processing
  • Embedding generation
  • Semantic search over uploaded documents
  • Chat interface with document context

LangChain.js Starters

For complex RAG pipelines — multiple sources, re-ranking, query transformation — LangChain.js provides orchestration:

import { ChatOpenAI, OpenAIEmbeddings } from '@langchain/openai';
import { SupabaseVectorStore } from '@langchain/community/vectorstores/supabase';
import { createClient } from '@supabase/supabase-js';

const supabaseClient = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

const embeddings = new OpenAIEmbeddings();
const vectorStore = await SupabaseVectorStore.fromExistingIndex(embeddings, {
  client: supabaseClient,
  tableName: 'documents',
  queryName: 'match_documents',
});

const retriever = vectorStore.asRetriever({ k: 5 });

LangChain adds complexity but enables advanced RAG patterns like:

  • Query transformation (HyDE, multi-query)
  • Re-ranking (Cohere rerank)
  • Multi-document summarization
  • Hybrid search (dense + sparse)

Recommended Stacks by Use Case

| Use case | Stack |
| --- | --- |
| Document Q&A | Next.js + Supabase pgvector + Vercel AI SDK |
| Knowledge base | Next.js + Supabase pgvector + Postgres full-text search (hybrid) |
| Multi-source RAG | LangChain.js + Pinecone |
| Product search | pgvector with hybrid search (vector + full-text) |
| Customer support bot | OpenSaaS + pgvector |

Performance Considerations

  • Chunk size matters. 500-1000 tokens per chunk is typical. Smaller chunks improve precision; larger chunks improve recall.
  • Overlap prevents gaps. 50-100 token overlap between chunks ensures sentences at boundaries are captured.
  • Hybrid search beats pure vector search. Combining pgvector similarity with PostgreSQL full-text search improves results significantly.
  • Reranking improves quality. After retrieval, a reranker (e.g., Cohere Rerank or ColBERT) reorders results so the most relevant chunks reach the LLM's context first.
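One common way to merge the vector and full-text result lists mentioned above is Reciprocal Rank Fusion (RRF). A minimal sketch, assuming each retriever returns an ordered list of document ids:

```typescript
// Reciprocal Rank Fusion: each result list contributes 1 / (k + rank) per
// document; summing across lists favors documents that rank well in both
// retrievers. k = 60 is the conventional smoothing constant.
function reciprocalRankFusion(resultLists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of resultLists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Fusing a vector ranking ['a', 'b', 'c'] with a full-text ranking ['b', 'd', 'a'] puts 'b' and 'a' at the top, since both appear in both lists.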

Methodology

Based on publicly available information from Vercel AI SDK documentation, Supabase RAG guides, LangChain.js documentation, and community resources as of March 2026.


Building a RAG application? StarterPick helps you find the right SaaS foundation to build on top of.
