Quick Verdict
If you ship on Next.js and use the Vercel AI SDK, Vercel AI Gateway is the lowest-friction option in 2026 — single endpoint, no extra key, automatic fallback, and unified billing on your Vercel invoice. OpenRouter wins for breadth of models and pay-as-you-go credits, LiteLLM wins for self-hosted control, and Portkey wins when you need real governance, guardrails, and multi-team observability.
Pick one routing layer early. Plumbing a second one in after you've shipped costs more than getting the first one wrong.
Key Takeaways
- AI gateways are now standard in any SaaS that calls more than one model — they handle fallbacks, retries, caching, and per-customer cost attribution.
- All four normalize on an OpenAI-compatible API, so swapping is mostly a base URL change.
- Vercel AI Gateway has the tightest Next.js boilerplate fit; OpenRouter has the widest model catalog; LiteLLM is the OSS escape hatch; Portkey is the governance layer.
- Cost markup ranges from 0% (LiteLLM self-hosted, and Vercel AI Gateway, which bills tokens at the underlying provider price) to ~5% (OpenRouter and Portkey paid plans).
Decision Table
| You are... | Pick |
|---|---|
| Indie founder shipping a Next.js MVP on Vercel | Vercel AI Gateway |
| Building a multi-model product (Claude + GPT + Llama + Mistral) on a budget | OpenRouter |
| Cost-sensitive, want to self-host on your own infra | LiteLLM |
| Multi-team B2B with PII, audit logs, prompt versioning | Portkey |
| EU/regulated data, need self-hosted prompt store | LiteLLM + Langfuse |
What an AI Gateway Actually Buys You
Before comparing, it helps to be specific about what these layers add on top of calling provider SDKs directly:
- One endpoint, many models. A normalized OpenAI-compatible API across Anthropic, Google, OpenAI, Mistral, Cohere, Groq, Fireworks, and dozens of OSS providers.
- Fallbacks and retries. When OpenAI is degraded (which still happens), automatically route to Anthropic or Bedrock without redeploying.
- Caching. Hash-based response caching to slash repeat-prompt cost. Useful for product surfaces like classification, embeddings, and structured output.
- Per-tenant attribution. Tag requests with `customer_id` so your usage-based billing engine can split cost back to the right account.
- Guardrails and PII redaction. Strip emails, credit cards, and secrets before they hit a third-party model.
- Observability. Logs, traces, latency, and cost per route. The thing your boilerplate's analytics doesn't capture.
If your product is a single OpenAI call, you don't need this. As soon as you add a second model, retry policy, or per-customer billing, you do.
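To make per-tenant attribution concrete, here is a minimal sketch using the OpenAI-compatible `user` field, the lowest-common-denominator tag these gateways accept. Each layer also has a richer mechanism (virtual keys, metadata headers), and the env var names and model slug below are illustrative, not any specific gateway's defaults.

```ts
import OpenAI from 'openai';

// Hypothetical gateway client: base URL and key depend on the layer you pick.
const client = new OpenAI({
  baseURL: process.env.GATEWAY_BASE_URL,
  apiKey: process.env.GATEWAY_API_KEY,
});

export async function completeForTenant(
  customerId: string,
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[],
) {
  return client.chat.completions.create({
    model: 'openai/gpt-4o-mini', // illustrative slug
    messages,
    user: customerId, // shows up in gateway logs, so spend splits per account
  });
}
```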
Vercel AI Gateway
Pricing: Models billed at provider rates; Vercel takes no markup on tokens. Included in any Vercel plan.
Fit: SaaS boilerplates already on Next.js + Vercel — ShipFast, Makerkit, Supastarter, and most premium starters in our SaaS boilerplates ranking.
What you get:
- Single AI Gateway URL, no per-provider key — Vercel injects them.
- Native integration with the Vercel AI SDK (the `ai` package): `import { streamText } from 'ai'` works unchanged.
- Built-in fallback ordering across providers.
- Cost and latency observability per project on the Vercel dashboard.
- Token spend rolls into your Vercel bill — finance team friendly.
```ts
import { streamText } from 'ai';
import { gateway } from '@ai-sdk/gateway';

// Model slug is 'provider/model'. No provider API key in your env;
// Vercel injects credentials at the gateway.
const result = await streamText({
  model: gateway('anthropic/claude-sonnet-4-6'),
  messages,
});
```
Where it bites: No deep prompt management UI, no multi-team RBAC, and you're locked to Vercel's chosen providers. Limited self-host story. If you leave Vercel, you leave the gateway.
OpenRouter
Pricing: Pass-through provider price + ~5% routing fee on most models. No subscription. Free credits at signup; works as a personal account before you add a corporate card.
Fit: Multi-model AI products. Side projects. Anything that benefits from a long-tail model catalog (Llama 3, Mixtral, DeepSeek, Qwen) without managing N provider accounts.
What you get:
- 200+ models from one API key.
- Automatic provider failover — OpenRouter spreads requests across upstream hosts (Together, Fireworks, Lepton, Anyscale) for the same OSS model.
- Transparent per-request pricing in the dashboard.
- BYO key option to avoid the 5% markup on providers where you already have a contract.
```ts
import OpenAI from 'openai';

// Standard OpenAI SDK; only the base URL and key change.
const client = new OpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

const completion = await client.chat.completions.create({
  model: 'meta-llama/llama-3.3-70b-instruct',
  messages,
});
```
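OpenRouter also supports request-level fallback via a `models` array: the first entry is tried first, then the gateway walks down the list. The openai SDK's types don't include the field, so this sketch uses raw `fetch`; confirm the parameter against OpenRouter's current routing docs before relying on it.

```ts
// Hedged sketch of OpenRouter request-level model fallback.
const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    // Tried in order when the preceding model's providers error out.
    models: ['meta-llama/llama-3.3-70b-instruct', 'mistralai/mixtral-8x7b-instruct'],
    messages: [{ role: 'user', content: 'Classify this support ticket.' }],
  }),
});
const data = await res.json();
```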
Where it bites: Limited team features — no SSO on cheap tiers, weak audit log, no PII filtering. Caching is basic. Not the best pick for B2B with compliance asks.
LiteLLM
Pricing: Free, OSS (MIT). LiteLLM Cloud (managed) starts in the $99/mo range; most teams self-host the proxy.
Fit: Backend-heavy boilerplates (FastAPI, Django, Hono, Encore.ts) where you already operate Postgres and Redis. Teams with budget pressure. EU/regulated workloads that must stay in-cluster.
What you get:
- A Python proxy server speaking the OpenAI API.
- Routes to 100+ providers including local models (Ollama, vLLM, LM Studio).
- Spend tracking per virtual key (great for per-customer attribution).
- Pluggable Redis cache, retry/fallback policy, rate limits per key.
- Helm chart and Docker image for K8s deployment.
```bash
docker run -p 4000:4000 \
  -e OPENAI_API_KEY=sk-... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  ghcr.io/berriai/litellm:main-stable \
  --config /app/config.yaml
```
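The mounted `config.yaml` is where routing actually lives. A minimal sketch using LiteLLM's `model_list` format; the aliases and model choices here are illustrative:

```yaml
model_list:
  - model_name: gpt-4o                      # alias your app requests
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY    # read from the container env
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
# Redis cache, retry/fallback policy, and per-key rate limits are configured
# here as well; check LiteLLM's proxy docs for the current setting names.
```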
Where it bites: You operate it. Upgrades, scaling, observability dashboard — your problem. Pair with Langfuse or Helicone for traces.
Portkey
Pricing: Free up to 10k requests/mo. Paid plans add SSO, custom guardrails, prompt management, and dedicated support; team plans typically land in the low-three-figure monthly range.
Fit: B2B SaaS with multiple internal AI features, multiple teams, and a security/compliance reviewer who will ask hard questions about prompt logging.
What you get:
- Gateway + prompt registry + observability + guardrails — the most "platform" of the four.
- Versioned prompts with rollback, environment promotion (dev → staging → prod).
- Built-in PII redaction, content moderation, and topic guardrails.
- RBAC, virtual keys per team, SOC 2 reports.
- Caching with semantic similarity, not just exact-match hashing.
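Portkey's gateway is OpenAI-compatible as well, with auth carried on Portkey-specific headers. A minimal sketch: verify the URL and header names against Portkey's current docs, and treat the virtual key as a placeholder.

```ts
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.portkey.ai/v1',
  apiKey: 'placeholder', // real provider credentials live in the virtual key
  defaultHeaders: {
    'x-portkey-api-key': process.env.PORTKEY_API_KEY!,
    'x-portkey-virtual-key': process.env.PORTKEY_VIRTUAL_KEY!, // per-team, RBAC'd
  },
});

const completion = await client.chat.completions.create({
  model: 'gpt-4o-mini', // resolved by the virtual key's provider config
  messages: [{ role: 'user', content: 'Draft a renewal reminder email.' }],
});
```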
Where it bites: Highest implementation surface area; overkill for a 1-person SaaS. Lock-in to Portkey's prompt registry if you adopt it heavily.
Cost Reality Check
Routing fees only matter at volume. A rough back-of-the-envelope estimate for a SaaS doing $5k/mo in model spend:
| Layer | Markup | Monthly cost on $5k spend | Notes |
|---|---|---|---|
| Direct provider SDK | 0% | $5,000 | No fallback, no observability |
| Vercel AI Gateway | 0% on tokens | $5,000 | Plus Vercel plan you already pay |
| OpenRouter | ~5% | ~$5,250 | BYO key brings markup to near-zero on listed providers |
| LiteLLM (self-host) | 0% | $5,000 + ~$30 infra | Plus your engineering time |
| Portkey | ~5% paid plans | ~$5,250 + plan fee | Includes governance you'd otherwise build |
The five-percent markup is rarely the deciding factor. Engineering time to build fallbacks, dashboards, and guardrails costs more than the gateway fee for most teams.
What This Replaces in a Boilerplate
Most SaaS boilerplates that ship "AI features" wire one provider directly into a server action. That works for week one. By month three you typically need:
- Fallback when the provider is degraded.
- Per-customer cost attribution for usage-based billing — see how to add usage-based billing with Stripe meters.
- Cache to stop paying twice for the same prompt.
- Logs to debug a customer's bad output.
A gateway gives you all four in one dependency. The build-from-scratch alternative is a homegrown router on top of `fetch` plus a few hundred lines of glue code that you will keep tweaking forever.
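To put a shape on that glue code, here's the skeleton of the homegrown version: a sketch with illustrative types and endpoints, no streaming, no backoff, no cost logging, which is exactly what you would spend the next quarter adding.

```ts
// Minimal sketch of a homegrown fallback router across OpenAI-compatible hosts.
type Route = { baseURL: string; apiKey: string; model: string };

async function completeWithFallback(
  routes: Route[],
  messages: { role: string; content: string }[],
) {
  let lastError: unknown;
  for (const route of routes) {
    try {
      const res = await fetch(`${route.baseURL}/chat/completions`, {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${route.apiKey}`,
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({ model: route.model, messages }),
      });
      if (!res.ok) throw new Error(`HTTP ${res.status}`);
      return await res.json();
    } catch (err) {
      lastError = err; // degraded provider: fall through to the next route
    }
  }
  throw lastError;
}
```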
Choosing in 60 Seconds
- Already on Vercel, single team, want the lowest setup time → Vercel AI Gateway.
- Want the widest model catalog and don't care about deep governance → OpenRouter.
- Need it inside your VPC or want zero markup → LiteLLM.
- Multi-team, regulated, prompt versioning matters → Portkey.
For most paid SaaS boilerplate buyers in 2026, the order of consideration is: Vercel AI Gateway → OpenRouter → Portkey → LiteLLM. Start with the simplest layer your stack supports; graduate when concrete pain (cost attribution, guardrails, multi-team prompts) shows up.
FAQ
Can I use two gateways at once? Yes — some teams put OpenRouter behind LiteLLM as one upstream route. It works, but it's another moving piece. Don't do it before you need to.
Do these break streaming? No. All four pass server-sent events through correctly. Vercel AI SDK and OpenAI SDK clients work without code changes.
What about embedding models? All four route embeddings. LiteLLM and OpenRouter are the most permissive on OSS embedding providers.
Will my AI SaaS boilerplate work with these? Any boilerplate using the Vercel AI SDK or raw OpenAI SDK swaps over with a base URL change. Boilerplates wired directly to provider SDKs (Anthropic SDK, Google GenAI SDK) need a small adapter.
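The "small adapter" usually amounts to re-pointing one helper. A sketch, assuming your boilerplate exposes a single completion function (the helper name, gateway env vars, and model slug are illustrative):

```ts
import OpenAI from 'openai';

// Hypothetical: one OpenAI-compatible client aimed at whichever gateway you chose.
const gatewayClient = new OpenAI({
  baseURL: process.env.GATEWAY_BASE_URL,
  apiKey: process.env.GATEWAY_API_KEY,
});

// Keeps the signature your boilerplate's provider-specific helper exposed.
export async function complete(prompt: string): Promise<string> {
  const res = await gatewayClient.chat.completions.create({
    model: 'anthropic/claude-sonnet-4-6', // slug format varies by gateway
    messages: [{ role: 'user', content: prompt }],
  });
  return res.choices[0]?.message?.content ?? '';
}
```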
Compare more options in the best AI SaaS boilerplates ranking.
If you're still picking the underlying SaaS foundation, start with the best Next.js SaaS boilerplates of 2026.