Best Boilerplates for Building an AI Wrapper SaaS 2026
TL;DR
An AI wrapper SaaS has different requirements than a standard SaaS — you need streaming responses, token usage tracking, per-user rate limiting, and cost management on top of your own billing. Most general SaaS boilerplates don't ship with these. In 2026, the best options are: Shipixen (AI-native, ships with Vercel AI SDK pre-configured), T3 Stack + AI SDK (most flexible, build to your exact needs), and adapted versions of ShipFast/Makerkit (add AI layer yourself). Here's how to evaluate and set up each.
Key Takeaways
- AI wrapper requirements: streaming, token metering, rate limiting, model switching, cost passthrough
- Vercel AI SDK: the de facto standard for streaming in Next.js AI apps — use it in any boilerplate
- T3 Stack + AI SDK: most flexible foundation — build exactly what you need
- Shipixen: AI-native starter with streaming pre-built, good for rapid prototyping
- ShipFast + AI layer: large community, add the ai package on top — fastest path if you already own ShipFast
- Token metering: use Stripe Meters (usage-based billing) — no boilerplate ships this pre-built
What Makes AI Wrapper SaaS Different
A standard SaaS boilerplate gives you auth + billing + dashboard. An AI wrapper needs:
Standard SaaS:
Auth → Dashboard → Features → Billing (flat subscription)
AI Wrapper SaaS:
Auth → Dashboard → AI Features (streaming) → Usage Tracking
→ Rate Limiting (per-user)
→ Token Metering → Cost Management
→ Billing (usage-based OR credit-based)
→ Prompt Management / Versioning
The unique technical requirements:
| Requirement | Why It Matters | Typical Solution |
|---|---|---|
| Streaming responses | LLM responses are slow — stream for UX | Vercel AI SDK useChat / useCompletion |
| Token tracking | API costs scale with usage | Count tokens per request, store in DB |
| Rate limiting | Prevent abuse / cost overruns | Redis + sliding window (Upstash) |
| Model switching | GPT-4 vs Claude vs Gemini | Abstraction layer via AI SDK |
| Prompt management | Version prompts, A/B test | DB-stored prompts or separate config |
| Cost passthrough | Charge users for AI usage | Stripe Meters or credit system |
| Abort/cancel | Users stop mid-generation | AbortController in streaming handler |
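The abort/cancel row is worth a closer look, because under the hood it is plain AbortController. A minimal sketch of the signal mechanics (Node 18+ or any modern browser):

```typescript
// "Abort/cancel" from the requirements table: AbortController is the
// primitive behind the AI SDK's stop(). Minimal demonstration:
const controller = new AbortController();
let aborted = false;
controller.signal.addEventListener('abort', () => { aborted = true; });

// Calling abort() fires the listener synchronously:
controller.abort();

// In a route handler you would forward the request's signal (or your own
// controller's signal) to the model call so the upstream provider request
// is cancelled as well, instead of billing you for a response nobody reads.
```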
The Core: Vercel AI SDK
Regardless of which boilerplate you pick, the Vercel AI SDK (ai package) is the foundation for all AI interactions:
npm install ai @ai-sdk/openai @ai-sdk/anthropic
// app/api/chat/route.ts — streaming chat endpoint (works in any boilerplate):
import { openai } from '@ai-sdk/openai';
import { anthropic } from '@ai-sdk/anthropic';
import { streamText } from 'ai';
import { auth } from '@/lib/auth';
import { checkRateLimit } from '@/lib/rate-limit';
import { trackTokenUsage } from '@/lib/usage';
export async function POST(req: Request) {
const session = await auth();
if (!session) return new Response('Unauthorized', { status: 401 });
// Rate limit: 20 requests/hour for free, 200 for pro:
const { success, remaining } = await checkRateLimit(session.user.id, session.user.plan);
if (!success) return new Response('Rate limit exceeded', { status: 429 });
const { messages, model = 'gpt-4o-mini' } = await req.json();
// Model routing:
const modelProvider = model.startsWith('claude')
? anthropic(model)
: openai(model);
const result = await streamText({
model: modelProvider,
messages,
system: 'You are a helpful assistant.',
onFinish: async ({ usage }) => {
// Track token usage after generation completes:
await trackTokenUsage({
userId: session.user.id,
model,
inputTokens: usage.promptTokens,
outputTokens: usage.completionTokens,
totalTokens: usage.totalTokens,
});
},
});
return result.toDataStreamResponse();
}
// app/chat/page.tsx — streaming UI (any boilerplate):
'use client';
import { useChat } from 'ai/react'; // in AI SDK 4+, import from '@ai-sdk/react' instead
export default function ChatPage() {
const { messages, input, handleInputChange, handleSubmit, isLoading, stop } = useChat({
api: '/api/chat',
body: { model: 'gpt-4o-mini' },
onError: (err) => console.error('Chat error:', err),
});
return (
<div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
<div className="flex-1 overflow-y-auto space-y-4">
{messages.map((m) => (
<div key={m.id} className={m.role === 'user' ? 'text-right' : 'text-left'}>
<span className="inline-block p-3 rounded-lg bg-muted max-w-[80%]">
{m.content}
</span>
</div>
))}
</div>
<form onSubmit={handleSubmit} className="flex gap-2 mt-4">
<input
value={input}
onChange={handleInputChange}
placeholder="Type a message..."
className="flex-1 rounded border p-2"
/>
{isLoading
? <button type="button" onClick={stop}>Stop</button>
: <button type="submit">Send</button>
}
</form>
</div>
);
}
This pattern works with any boilerplate — add it to ShipFast, Makerkit, T3 Stack, etc.
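One thing the route above does not guard against: it forwards whatever message history the client sends, and unbounded history inflates input-token cost on every request. A small trimming helper keeps it bounded; the 4-characters-per-token estimate below is a rough heuristic, not a real tokenizer:

```typescript
type ChatMessage = { role: string; content: string };

// Rough heuristic: ~4 characters per token for English text. For accuracy,
// use a real tokenizer (e.g. tiktoken); this only sets a cost ceiling.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep the most recent messages that fit within the token budget:
export function trimHistory(messages: ChatMessage[], maxTokens: number): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let total = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const t = estimateTokens(messages[i].content);
    if (total + t > maxTokens) break;
    kept.unshift(messages[i]);
    total += t;
  }
  return kept;
}
```

Call `trimHistory(messages, budget)` in the route before handing `messages` to `streamText`.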
Option 1: T3 Stack — Most Flexible
Best for: developers who want full control and are comfortable assembling their own AI layer.
npm create t3-app@latest my-ai-saas
# Select: Next.js, TypeScript, Prisma, tRPC, Tailwind
npm install ai @ai-sdk/openai @ai-sdk/anthropic @upstash/ratelimit @upstash/redis
Database schema for AI usage tracking:
// prisma/schema.prisma additions:
model Conversation {
id String @id @default(cuid())
userId String
user User @relation(fields: [userId], references: [id])
title String?
messages Message[]
createdAt DateTime @default(now())
updatedAt DateTime @updatedAt
}
model Message {
id String @id @default(cuid())
conversationId String
conversation Conversation @relation(fields: [conversationId], references: [id])
role String // 'user' | 'assistant' | 'system'
content String @db.Text
model String? // 'gpt-4o-mini', 'claude-3-haiku', etc.
inputTokens Int @default(0)
outputTokens Int @default(0)
createdAt DateTime @default(now())
}
model UsageSummary {
id String @id @default(cuid())
userId String
user User @relation(fields: [userId], references: [id])
month String // '2026-03'
totalTokens Int @default(0)
totalCost Float @default(0) // in USD
updatedAt DateTime @updatedAt
@@unique([userId, month])
}
Rate limiting with Upstash Redis:
// lib/rate-limit.ts
import { Ratelimit } from '@upstash/ratelimit';
import { Redis } from '@upstash/redis';
const redis = new Redis({
url: process.env.UPSTASH_REDIS_REST_URL!,
token: process.env.UPSTASH_REDIS_REST_TOKEN!,
});
const limits = {
free: new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(20, '1 h'), // 20 requests/hour
}),
pro: new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(200, '1 h'), // 200/hour
}),
enterprise: new Ratelimit({
redis,
limiter: Ratelimit.slidingWindow(2000, '1 h'),
}),
};
export async function checkRateLimit(userId: string, plan: string) {
const limiter = limits[plan as keyof typeof limits] ?? limits.free;
return limiter.limit(userId);
}
Token cost tracking:
// lib/usage.ts
const TOKEN_COSTS = {
'gpt-4o': { input: 0.000005, output: 0.000015 }, // per token
'gpt-4o-mini': { input: 0.00000015, output: 0.0000006 },
'claude-3-5-sonnet': { input: 0.000003, output: 0.000015 },
'claude-3-haiku': { input: 0.00000025, output: 0.00000125 },
'gemini-1.5-pro': { input: 0.00000125, output: 0.000005 },
} as const;
export async function trackTokenUsage({
userId, model, inputTokens, outputTokens, totalTokens,
}: {
userId: string;
model: string;
inputTokens: number;
outputTokens: number;
totalTokens: number;
}) {
const costs = TOKEN_COSTS[model as keyof typeof TOKEN_COSTS];
const cost = costs
? inputTokens * costs.input + outputTokens * costs.output
: 0;
const month = new Date().toISOString().slice(0, 7); // '2026-03'
await db.usageSummary.upsert({
where: { userId_month: { userId, month } },
update: {
totalTokens: { increment: totalTokens },
totalCost: { increment: cost },
},
create: { userId, month, totalTokens, totalCost: cost },
});
}
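Since trackTokenUsage only records cost after the fact, you may also want to show users a projected cost before a request runs. The same rate table drives a pure estimator; the per-token prices below are the same illustrative figures as lib/usage.ts, so verify them against current provider pricing:

```typescript
// Pure cost estimator mirroring the TOKEN_COSTS lookup in lib/usage.ts.
// Prices are illustrative; check the providers' current pricing pages.
type Rate = { input: number; output: number };

const RATES: Record<string, Rate> = {
  'gpt-4o-mini': { input: 0.00000015, output: 0.0000006 },
  'claude-3-haiku': { input: 0.00000025, output: 0.00000125 },
};

export function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const rate = RATES[model];
  // Unknown models estimate to 0 here; log and extend RATES in production:
  if (!rate) return 0;
  return inputTokens * rate.input + outputTokens * rate.output;
}
```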
T3 Stack is right for AI SaaS if you need custom billing logic, multi-model support, or are building something that doesn't fit a template.
Option 2: ShipFast + AI Layer
Best for: those who already own ShipFast and want to add AI features fast.
ShipFast doesn't ship with AI pre-built, but adding the Vercel AI SDK on top takes ~2 hours:
# In your ShipFast project:
npm install ai @ai-sdk/openai
// Add to ShipFast's existing API structure:
// app/api/ai/chat/route.ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
import { getServerSession } from 'next-auth'; // or Supabase auth
import { authOptions } from '@/libs/next-auth';
export async function POST(req: Request) {
const session = await getServerSession(authOptions);
// Reuse ShipFast's existing auth check:
if (!session?.user) {
return new Response('Unauthorized', { status: 401 });
}
// Use ShipFast's plan detection:
const isPro = session.user.priceId === process.env.STRIPE_PRO_PRICE_ID;
if (!isPro) {
return new Response('Upgrade to Pro for AI features', { status: 403 });
}
const { messages } = await req.json();
const result = await streamText({
model: openai('gpt-4o-mini'),
messages,
});
return result.toDataStreamResponse();
}
ShipFast + AI path makes sense if:
- You already own ShipFast (no additional boilerplate cost)
- Your AI features are gated behind a paid plan (ShipFast's plan check is simple)
- You don't need per-token billing (flat-rate subscription is fine)
Option 3: Makerkit — Plugin-Based AI Integration
Makerkit's plugin system is well-suited for AI features:
// Makerkit plugin pattern for AI:
// packages/plugins/ai-assistant/src/api/chat.ts
import { createRouteHandlerClient } from '@supabase/auth-helpers-nextjs';
import { cookies } from 'next/headers';
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
export async function chatHandler(req: Request) {
const supabase = createRouteHandlerClient({ cookies });
const { data: { user } } = await supabase.auth.getUser();
if (!user) return new Response('Unauthorized', { status: 401 });
// Makerkit's org context:
const { organizationId, messages } = await req.json();
// Check org's AI quota:
const quota = await getOrganizationAIQuota(organizationId);
if (quota.used >= quota.limit) {
return new Response('AI quota exceeded for this organization', { status: 429 });
}
const result = await streamText({
model: openai('gpt-4o-mini'),
messages,
onFinish: async ({ usage }) => {
await incrementOrganizationAIUsage(organizationId, usage.totalTokens);
},
});
return result.toDataStreamResponse();
}
Makerkit shines for AI SaaS if:
- You're building B2B — organizations share an AI quota
- You want the AI feature as an add-on to a full SaaS (not AI as the core product)
- You'll use the team management and billing plugins alongside it
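Note that getOrganizationAIQuota and incrementOrganizationAIUsage in the handler above are not Makerkit built-ins; they are yours to implement against the organizations table. The core check-and-apply logic, sketched in memory with hypothetical names:

```typescript
// In-memory sketch of the logic behind the hypothetical
// getOrganizationAIQuota / incrementOrganizationAIUsage helpers.
// A real version reads and writes the organizations table.
type OrgQuota = { used: number; limit: number };

export function hasQuotaRemaining(q: OrgQuota): boolean {
  return q.used < q.limit;
}

export function applyUsage(q: OrgQuota, tokens: number): OrgQuota {
  // The in-flight request may overshoot the limit slightly; the next
  // request is then rejected by hasQuotaRemaining. That is acceptable
  // for soft quotas, but not for hard billing caps.
  return { ...q, used: q.used + tokens };
}
```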
Credit-Based vs Usage-Based Billing
Two billing models work for AI wrappers:
// Model 1: Credit-based (buy credits, spend on AI usage)
// Simpler UX — users buy packs, credits deduct per request
// Credit purchase:
const session = await stripe.checkout.sessions.create({
mode: 'payment', // One-time payment, not subscription
line_items: [{ price: 'price_1000_credits', quantity: 1 }],
// ...
});
// Credit deduction:
await db.user.update({
where: { id: userId },
data: { credits: { decrement: tokensUsed } },
});
// Guard before each AI call:
const user = await db.user.findUnique({ where: { id: userId } });
if (!user || user.credits <= 0) throw new Error('Insufficient credits');
// Model 2: Stripe Meters (usage-based subscription)
// More complex setup, but users pay for exactly what they use
// Record usage event:
await stripe.billing.meterEvents.create({
event_name: 'ai_tokens',
payload: {
stripe_customer_id: customerId,
value: String(tokensUsed),
},
});
// Create subscription with usage price:
await stripe.subscriptions.create({
customer: customerId,
items: [{ price: 'price_per_1k_tokens' }],
});
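A caveat on the credit-based snippet above: the findUnique guard and the decrement run as separate queries, so two concurrent requests can both pass the check and drive the balance negative. The fix is a single conditional update; with Prisma, an updateMany filtered on `credits: { gte: cost }`, treating an affected-row count of 0 as "insufficient". The semantics, as a pure function:

```typescript
// Compare-and-deduct semantics for credit billing. With Prisma, express this
// as one updateMany({ where: { id, credits: { gte: cost } },
// data: { credits: { decrement: cost } } }) and treat count === 0 as failure.
export function tryDeductCredits(
  balance: number,
  cost: number,
): { ok: boolean; balance: number } {
  if (cost < 0) throw new Error('cost must be non-negative');
  if (balance < cost) return { ok: false, balance };
  return { ok: true, balance: balance - cost };
}
```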
Which billing model to use:
| | Credit-Based | Usage-Based (Stripe Meters) |
|---|---|---|
| User experience | Predictable (buy credits, see balance) | Pay for what you use |
| Revenue predictability | Higher (bulk purchase) | Lower (variable) |
| Setup complexity | Lower | Higher |
| Best for | Consumer AI tools, indie hackers | B2B with high usage variance |
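If you go credit-based, you also need a pricing rule that maps raw provider cost to credits charged. A sketch; the credit value and margin below are illustrative assumptions, not figures from any boilerplate:

```typescript
// Illustrative pricing rule: 1 credit = $0.001 of provider cost, charged
// at a 2x margin so you are not reselling tokens at cost.
const CREDITS_PER_USD = 1000;
const MARGIN = 2;

export function costToCredits(costUsd: number): number {
  if (costUsd <= 0) return 0;
  const credits = Math.round(costUsd * MARGIN * CREDITS_PER_USD);
  // Any non-zero usage costs at least one credit:
  return Math.max(1, credits);
}
```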
Production Checklist for AI SaaS
Before launching an AI wrapper:
[ ] Streaming — responses stream, not wait for full generation
[ ] Error handling — API timeouts, rate limit errors, model failures
[ ] Token limits — enforce per-request max tokens (prevent cost bombs)
[ ] Rate limiting — per-user hourly/daily limits
[ ] Cost monitoring — alert when daily spend exceeds threshold
[ ] Prompt injection prevention — sanitize user input
[ ] PII handling — don't log PII in prompt logs
[ ] Fallback model — if GPT-4o fails, fall back to GPT-4o-mini
[ ] Abort/cancel — users can stop a generation
[ ] Content moderation — if user-facing, run through moderation API
// Content moderation (OpenAI). Note: this uses the official openai client,
// not the @ai-sdk/openai provider imported elsewhere:
import OpenAI from 'openai';
const openaiClient = new OpenAI();
const moderation = await openaiClient.moderations.create({
input: userMessage,
});
if (moderation.results[0].flagged) {
return new Response('Message flagged by content policy', { status: 400 });
}
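The fallback-model item from the checklist is easy to overlook. One simple approach is a preference chain that you walk on provider errors; the model names below are illustrative:

```typescript
// "Fallback model" from the checklist: pick the next model in a preference
// chain after a failure. Chain contents are an illustrative assumption.
const FALLBACK_CHAIN = ['gpt-4o', 'gpt-4o-mini', 'claude-3-haiku'];

export function nextFallback(failedModel: string): string | null {
  const i = FALLBACK_CHAIN.indexOf(failedModel);
  // Unknown model or end of chain: nothing left to try.
  if (i === -1 || i === FALLBACK_CHAIN.length - 1) return null;
  return FALLBACK_CHAIN[i + 1];
}
```

In the route, catch the provider error, call nextFallback, and retry streamText with the new model until the chain is exhausted.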
Find AI-ready SaaS boilerplates at StarterPick.