

Best AI/LLM App Boilerplates in 2026

AI/LLM SaaS boilerplates compared in 2026 — Vercel AI Chatbot, Open SaaS AI, and LangChain starters ranked by features, DX, and production readiness.

StarterPick Team

AI Products Are the Default New SaaS

In 2026, building a SaaS without AI features feels like launching without mobile support in 2015. The infrastructure matured: OpenAI, Anthropic, and Google offer reliable APIs; Vercel AI SDK standardized streaming; vector databases (Pinecone, pgvector) are commodity. The question isn't whether to add AI — it's how to build the infrastructure correctly from day one.

Quick Comparison

Starter              Price   LLM Providers        RAG   Streaming   Auth       Billing
AI SaaS Starter      $199    OpenAI + Anthropic   ✓     ✓                      Stripe
Vercel AI Chatbot    Free    Multi-provider             ✓           NextAuth
Open SaaS AI         Free    OpenAI                     ✓           Full       Stripe
LangChain template   Free    Multi-provider

The Starters

AI SaaS Starter — Best Complete AI SaaS

Price: $199 (one-time) | Creator: Various vendors

Purpose-built AI SaaS boilerplates include OpenAI/Anthropic integration, streaming responses, conversation history, usage metering (tokens per user), credit system, vector database for RAG, and Stripe billing tied to AI usage.

Key AI features to look for:

  • Multi-model support — Switch between GPT-4, Claude, Gemini without refactoring
  • Streaming — Character-by-character output, not wait-then-dump
  • Token metering — Track and limit per-user API usage
  • RAG pipeline — Retrieve-augment-generate with user's documents
  • Conversation history — Persistent threads per user
  • Rate limiting — Prevent API cost blowouts

Vercel AI Chatbot — Best Free Chat UI

Price: Free | Creator: Vercel

The reference implementation for AI chat in Next.js. Multi-provider (OpenAI, Anthropic, Google, Mistral), streaming via Vercel AI SDK, conversation history in Vercel KV, and NextAuth authentication. No billing — but the cleanest AI chat UI pattern.

import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: openai('gpt-4o'),
    messages,
    system: 'You are a helpful assistant.',
  });

  return result.toDataStreamResponse();  // Streams to client
}

Choose if: You need a clean AI chat starting point without billing.

Vercel AI SDK Patterns

The standard toolkit for AI apps in Next.js:

// Multi-step AI with tools
import { streamText, generateObject, tool } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

const result = streamText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  tools: {
    searchWeb: tool({
      description: 'Search the web for current information',
      parameters: z.object({ query: z.string() }),
      execute: async ({ query }) => {
        return await webSearch(query);  // your own search implementation
      },
    }),
  },
  messages,
});

// Structured output generation
const { object } = await generateObject({
  model: openai('gpt-4o'),
  schema: z.object({
    title: z.string(),
    summary: z.string(),
    tags: z.array(z.string()),
  }),
  prompt: 'Summarize this article: ' + article,
});

AI SaaS Billing Patterns

Two standard billing models for AI products:

Credit System

// User buys credits; each AI call deducts credits
type TokenUsage = { promptTokens: number; completionTokens: number };

const COSTS: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 0.000005, output: 0.000015 },            // USD per token
  'claude-3-5-sonnet': { input: 0.000003, output: 0.000015 },
};

async function chargeForAI(userId: string, model: string, usage: TokenUsage) {
  const cost = COSTS[model].input * usage.promptTokens +
               COSTS[model].output * usage.completionTokens;
  const credits = Math.ceil(cost * 1000);  // $0.001 = 1 credit
  await deductCredits(userId, credits);    // your own persistence layer
}

Subscription with Limits

// Monthly subscription includes N tokens; overages billed
const PLANS = {
  starter: { tokensPerMonth: 1_000_000, price: 19 },
  pro: { tokensPerMonth: 10_000_000, price: 49 },
};
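A minimal sketch of enforcing the included allowance before each AI call. The `PLANS` table mirrors the one above; `checkUsage` and its overage math are hypothetical names, not part of any starter:

```typescript
// Enforce a plan's monthly token allowance before calling the LLM.
const PLANS = {
  starter: { tokensPerMonth: 1_000_000, price: 19 },
  pro: { tokensPerMonth: 10_000_000, price: 49 },
} as const;

type Plan = keyof typeof PLANS;

function checkUsage(plan: Plan, tokensUsedThisMonth: number, requestEstimate: number) {
  const limit = PLANS[plan].tokensPerMonth;
  const projected = tokensUsedThisMonth + requestEstimate;
  if (projected <= limit) {
    return { allowed: true, overageTokens: 0 };
  }
  // Over the included allowance: either block the request or bill the overage.
  return { allowed: false, overageTokens: projected - limit };
}
```

Whether you block or bill the overage is a product decision; either way the check has to run before the API call, not after.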

RAG Architecture for AI SaaS

User uploads document
  → Chunk into 512-token segments
  → Generate embeddings (text-embedding-3-small)
  → Store in pgvector/Pinecone

User asks question
  → Embed question
  → Find top-5 similar chunks (cosine similarity)
  → Inject chunks into LLM prompt
  → Stream response

Good AI boilerplates set this pipeline up. Most SaaS starters don't include it — you'll add it later.
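The retrieval half of the pipeline above can be sketched in a few pure functions. Embedding calls are stubbed out, and the chunk size uses the rough ~4 characters/token heuristic, so treat this as an illustration rather than a production chunker:

```typescript
// Split a document into ~maxTokens-sized chunks (crude char-based estimate).
function chunkText(text: string, maxTokens = 512): string[] {
  const maxChars = maxTokens * 4;
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks against the question embedding, keep the top k.
function topK(query: number[], chunks: { text: string; embedding: number[] }[], k = 5) {
  return [...chunks]
    .sort((x, y) => cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k)
    .map((c) => c.text);
}
```

In production the similarity ranking runs inside pgvector or Pinecone, not in application code; the logic is the same.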

Choosing LLM Providers in 2026

The provider landscape has stabilized: OpenAI (GPT-4o, o3), Anthropic (Claude 3.5/3.7), and Google (Gemini 1.5/2.0) are the three tier-1 providers for production applications. The Vercel AI SDK abstracts all three behind a common interface, making provider selection an operational decision rather than an architectural one.

OpenAI remains the default for new projects due to the widest model availability and most mature ecosystem of fine-tuning, assistants, and batch APIs. GPT-4o's multimodal capability (vision, audio input) makes it the default for applications with non-text inputs.

Anthropic Claude is preferred for long-context tasks (200k token context window), instruction-following fidelity, and use cases where response quality matters more than speed — document analysis, code review, complex reasoning chains. (Note that Artifacts — Claude's rendered UI components — is a feature of the Claude.ai app, not the Anthropic API, so don't plan product features around it.)

Google Gemini is cost-competitive at high volume and has the longest context window (1M tokens for Gemini 1.5 Pro). The best choice for processing long documents or large codebases in a single API call.

Provider switching strategy: Using the Vercel AI SDK, switching providers requires changing one import line and one model string. Build your boilerplate to abstract the provider at the configuration layer — a MODEL_PROVIDER=anthropic environment variable that drives which SDK adapter loads. This prevents vendor lock-in and allows you to route different features to different providers based on cost/quality trade-offs.
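The configuration-layer abstraction described above might look like the following sketch. The adapter and model names are illustrative; in a real app the resolved entry would feed the matching `@ai-sdk/*` factory:

```typescript
// Map a MODEL_PROVIDER env var to a provider adapter and default model.
const PROVIDERS = {
  openai: { adapter: '@ai-sdk/openai', model: 'gpt-4o' },
  anthropic: { adapter: '@ai-sdk/anthropic', model: 'claude-3-5-sonnet-20241022' },
  google: { adapter: '@ai-sdk/google', model: 'gemini-1.5-pro' },
} as const;

type Provider = keyof typeof PROVIDERS;

function resolveModel(env: Record<string, string | undefined>) {
  const name = (env.MODEL_PROVIDER ?? 'openai') as Provider;
  if (!(name in PROVIDERS)) throw new Error(`Unknown MODEL_PROVIDER: ${name}`);
  return PROVIDERS[name];
}
```

Routing different features to different providers is then a matter of passing a different env map (or per-feature key) into the resolver.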

Streaming Architecture

The Vercel AI SDK's streaming primitives are the foundation for any AI chat product. Two patterns cover most use cases:

Text streaming (chat): streamText streams tokens to the client as they are generated. The client receives a ReadableStream and updates the UI incrementally. React's useChat hook handles this automatically with optimistic updates and error recovery.

Object streaming (structured output): streamObject generates JSON that conforms to a Zod schema, streaming the object as it's generated. Use this for AI features that produce structured data — summaries, classifications, form field suggestions, or any feature where you need reliable JSON output rather than prose.

For production chat interfaces, the full AI SDK stack is: streamText on the server, useChat on the client, with streaming displayed via React's Suspense or incremental rendering. This delivers perceived performance significantly better than generating the full response and then displaying it.

Cost Management and Rate Limiting

AI API costs are the primary operational concern for AI SaaS products. Two common blowouts: a single user running an expensive loop (1,000 API calls in a session), and a single request that generates unexpectedly long output.

Per-user token budgets: Track tokens consumed per user per billing period. Redis with daily/monthly counters works well. Check the budget before each API call and reject with a user-friendly error when the limit is hit. Refund unused token budget at the end of each period if using the credit model.
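A minimal sketch of the budget check described above. Production would use Redis (`INCRBY` on a key like `tokens:{userId}:{YYYY-MM}` with an expiry); an in-memory Map stands in here, and all names are hypothetical:

```typescript
// Per-user monthly token budget, checked before every LLM call.
const usage = new Map<string, number>();

function budgetKey(userId: string, date: Date): string {
  return `tokens:${userId}:${date.getUTCFullYear()}-${date.getUTCMonth() + 1}`;
}

function tryConsumeTokens(userId: string, tokens: number, limit: number, now = new Date()): boolean {
  const key = budgetKey(userId, now);
  const used = usage.get(key) ?? 0;
  if (used + tokens > limit) return false;  // reject before paying for the call
  usage.set(key, used + tokens);
  return true;
}
```

Because output length isn't known up front, a common pattern is to reserve an estimate before the call and reconcile with the real usage numbers afterwards.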

Request timeout and token limits: Set maxTokens on every API call. A request that generates 10,000 tokens when you expected 500 will either fail (context window exceeded) or incur significant charges. Explicit maxTokens prevents this at the API level. For streaming, implement a server-side timeout that cancels the stream if token generation takes more than N seconds.

Prompt caching: Anthropic and OpenAI both support prompt caching — if the same system prompt is sent in multiple requests, the provider caches it and charges at a reduced rate for subsequent requests. For applications with a long, stable system prompt (RAG context, persona definitions, document prefixes), prompt caching reduces costs by 60-80% on the cached portion.
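A back-of-envelope calculator for the savings above. The multipliers are assumptions based on Anthropic's published pattern (cache reads at roughly 10% of the base input rate, cache writes at roughly 125%) — check current provider pricing before relying on them:

```typescript
// Compare daily cost of a long system prompt with and without prompt caching.
function cachedPromptCost(
  systemTokens: number,
  requestsPerDay: number,
  baseInputRate: number,           // USD per input token
  cacheReadMultiplier = 0.1,       // assumed: cache read ≈ 10% of base rate
  cacheWriteMultiplier = 1.25,     // assumed: cache write ≈ 125% of base rate
): { uncached: number; cached: number; savingsPct: number } {
  const uncached = systemTokens * requestsPerDay * baseInputRate;
  const cached =
    systemTokens * baseInputRate * cacheWriteMultiplier +                   // first request writes the cache
    systemTokens * (requestsPerDay - 1) * baseInputRate * cacheReadMultiplier;  // the rest read it
  return { uncached, cached, savingsPct: (1 - cached / uncached) * 100 };
}
```

For a 10k-token system prompt hit 100 times a day, the cached portion's cost drops by well over the 60-80% the section quotes, which is why long-RAG-context apps enable it first.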

Vercel AI SDK Tool Calling

Tool calling (function calling) is the pattern that transforms AI chat from a text generation feature into an agentic product. With the Vercel AI SDK's tools parameter, you define functions the model can invoke, and the SDK handles the model-decides-to-call-tool → execute-function → return-result cycle; the searchWeb tool in the Vercel AI SDK Patterns section above shows the shape.

The most common tool patterns for SaaS products: searchDocuments for RAG (retrieve relevant chunks when a user asks a question), getSubscriptionStatus (let the AI answer billing questions by calling your own API), and createCalendarEvent or similar business actions the user can invoke through natural language. The SDK normalizes tool calling across all supported providers — a tool definition written for GPT-4o works identically with Claude 3.5 Sonnet.

Multi-step agent flows (the model calls a tool, uses the result to call another tool) are handled by maxSteps in streamText. Setting maxSteps: 5 allows up to 5 tool invocations per user turn before the model must respond with text. This enables simple task completion workflows — "find the document, summarize it, draft a reply" — without building a separate agent framework.

When to Use a Paid vs Free AI Starter

The free starters (Vercel AI Chatbot, Open SaaS) cover the technical foundation well. The question is whether to build your business logic on top of them or start with a paid AI SaaS starter.

Use free starters when: Your AI features are additive to a traditional SaaS product (a dashboard with an AI assistant). The AI feature can be added to any existing boilerplate as a module — you don't need the entire codebase rebuilt around AI patterns.

Consider paid AI starters when: AI is the core product, not a feature. Products where token metering, conversation threading, document processing pipelines, and credit billing need to be deeply integrated with the user model from day one. The paid starters that target AI SaaS specifically (AI SaaS Starter Kit, HextaUI AI) pre-integrate these concerns; adding them later to a general-purpose boilerplate requires significant architectural work.

Key Takeaways

  • Vercel AI Chatbot is the best free starting point — multi-provider, streaming, conversation history, and NextAuth authentication
  • Open SaaS (from the Wasp team) is the best free full-SaaS starter with OpenAI + Stripe + auth already integrated
  • Use the Vercel AI SDK for all new AI features — it standardizes streaming, tool calling, and structured output across OpenAI, Anthropic, and Google
  • Implement per-user token budgets from day one; AI cost blowouts happen in the first week of production traffic
  • RAG architecture (embed → vector search → inject into prompt) is table stakes for AI products with user data — most generic boilerplates don't include it
  • Prompt caching reduces costs by 60-80% for applications with stable, long system prompts — enable it in both OpenAI and Anthropic clients
  • Tool calling is the feature that enables agentic AI products — use maxSteps in the Vercel AI SDK to support multi-step reasoning without a separate agent framework

How to Evaluate AI/LLM Boilerplates

AI boilerplates vary more than standard SaaS boilerplates because the AI integration requirements vary so much by product type. Before committing, verify these specifics:

RAG pipeline completeness. Many starters claim RAG support but implement only the vector search query, not the ingestion pipeline. A complete RAG implementation includes document upload handling, text extraction (PDF, DOCX, plain text), chunking strategy (fixed-size vs semantic), embedding generation, vector storage, and the retrieval query. Ask whether the starter includes all six steps or just the last two.

Streaming error recovery. LLM APIs return errors mid-stream (context window exceeded, content policy violation, provider outage). Test what happens when an error occurs after streaming has started. The correct behavior: stop the stream, update the UI with an error state, don't leave the user staring at a frozen loading indicator. Most demo-quality starters don't handle this case.

Token usage reporting accuracy. Vercel AI SDK provides usage.promptTokens and usage.completionTokens in the onFinish callback. Starters that estimate token usage before the request (based on input message length) will be inaccurate. Only post-completion token counts from the API response are reliable. Check where the starter tracks token usage in the request lifecycle.

Provider fallback pattern. The most resilient AI SaaS products implement provider fallback: if OpenAI is rate-limited or down, route the request to Anthropic. The Vercel AI SDK normalizes the provider interface, so fallback is a small wrapper that retries the call with a different provider's model — but most boilerplates don't pre-configure one. Verify whether the starter includes provider fallback or whether you'll need to add it.
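One hedged sketch of the fallback routing: try a list of provider calls in order and return the first success. In a real app each caller would wrap streamText with a different provider's model; here the callers are plain async functions:

```typescript
// Try each provider call in order; return the first that succeeds.
async function withFallback<T>(callers: Array<() => Promise<T>>): Promise<T> {
  let lastError: unknown;
  for (const call of callers) {
    try {
      return await call();
    } catch (err) {
      lastError = err;  // rate limit / outage: fall through to the next provider
    }
  }
  throw lastError;  // every provider failed
}
```

In practice you'd only fall through on retryable errors (429s, 5xx), not on content-policy rejections, which would fail identically everywhere.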

What AI LLM Boilerplates Have in Common

The AI boilerplate landscape, despite surface variety, converges on the same foundational decisions:

The Vercel AI SDK (ai package) is the universal LLM client. There is no compelling alternative for Next.js AI SaaS in 2026 — every serious AI boilerplate uses it. The SDK's provider abstraction, streaming primitives, structured output generation, and tool calling are the features that matter, and they work consistently across OpenAI, Anthropic, Google, Groq, and Mistral.

PostgreSQL with pgvector is the default vector database for new projects. Dedicated vector databases (Pinecone, Weaviate, Qdrant) were the standard two years ago. In 2026, pgvector's performance is sufficient for most production AI SaaS applications (hundreds of thousands of vectors), and using PostgreSQL for both relational data and vector search simplifies operations significantly.
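As a small concrete detail of working with pgvector: it accepts vector literals in `[v1,v2,...]` text form, so query code typically formats the embedding before parameterizing. The table and column names in the commented query are hypothetical:

```typescript
// Format a number[] embedding as a pgvector text literal.
function toPgvectorLiteral(embedding: number[]): string {
  return `[${embedding.join(',')}]`;
}

// Assumed usage with node-postgres and a hypothetical `chunks` table:
//   SELECT text FROM chunks
//   ORDER BY embedding <=> $1::vector   -- <=> is pgvector's cosine distance operator
//   LIMIT 5;
// await pool.query(sql, [toPgvectorLiteral(queryEmbedding)]);
```

Ordering by the `<=>` distance operator ascending is the pgvector equivalent of the top-k cosine similarity step in the RAG pipeline.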

The credit billing model is more common than flat subscription billing for AI SaaS. Credit packs purchased via Stripe one-time payments, with credits deducted per API call, align user cost with actual LLM API cost. Flat subscription billing requires careful token budget management to avoid the plan being consumed by heavy users who are effectively subsidized by light users.

Rate limiting is not optional. AI APIs are expensive. A user running a loop or a script against your API can generate hundreds of dollars in API costs in minutes. Redis-based rate limiting with per-user daily limits, combined with per-request token limits via maxTokens, is the minimum viable protection.

For purpose-built AI starters designed around these patterns from day one, see best AI SaaS boilerplates for shipping fast and top AI SaaS boilerplates with built-in AI. For the open-source options that include AI integration without a license fee, the free open-source SaaS boilerplates guide covers Open SaaS and other free starters.

Compare AI SaaS boilerplates in the StarterPick directory.

See our guide to open-source SaaS boilerplates — Open SaaS is the most feature-complete free AI SaaS starter.

Review Next.js SaaS boilerplates — most AI SaaS products are built on Next.js App Router.

The SaaS Boilerplate Matrix (Free PDF)

20+ SaaS starters compared: pricing, tech stack, auth, payments, and what you actually ship with. Updated monthly. Used by 150+ founders.
