What will your AI feature cost to run?
Before you ship a Claude- or GPT-powered feature, model the bill at scale. Enter your request volume, token sizes, and your model's per-million-token prices — the calculator shows the monthly and annual API cost, the cost per request, and how much prompt caching saves. Everything runs in your browser; nothing is sent anywhere.
Your usage
What it costs
This is the API line item only. A production feature also carries the cost of the edge/server proxy, retrieval or vector storage for grounding, logging and evals, and engineering time — but the token bill is the part that scales directly with usage, and the one this models.
I ship production AI features end-to-end — keys server-side, grounded, streamed, and cached.
If the number above is workable, the next question is building it right: a server-side proxy, retrieval grounding so it doesn't hallucinate, streaming, and prompt caching to keep that bill down. That's the AI Automations engagement.
Book a Free Consultation → · See the AI Automations service → · Read the production chat-widget guide →
How LLM API cost is calculated
LLM APIs bill per token — almost always quoted per million tokens — and they price the two halves of a call separately:
- Input tokens — everything you send: the system prompt, any retrieved grounding context, the conversation history, and the user's message. Cost is (requests × avg input tokens ÷ 1,000,000) × input price.
- Output tokens — what the model generates back. Output is typically priced several times higher than input, so response length matters more than people expect. Cost is (requests × avg output tokens ÷ 1,000,000) × output price.
Add the two to get the monthly total (counting requests per month), multiply by twelve, and you have the annual figure most teams never put on a slide before they ship.
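The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual source: the function name `monthly_cost` is hypothetical, `requests` is assumed to mean requests per month, and the volume, token counts, and prices are placeholder values rather than any provider's real rates.

```python
def monthly_cost(requests, avg_input_tokens, avg_output_tokens,
                 input_price_per_m, output_price_per_m):
    """Return (input_cost, output_cost, total) for one month.

    Prices are per million tokens; `requests` is assumed to be monthly volume.
    """
    input_cost = requests * avg_input_tokens / 1_000_000 * input_price_per_m
    output_cost = requests * avg_output_tokens / 1_000_000 * output_price_per_m
    return input_cost, output_cost, input_cost + output_cost


# Hypothetical example values; replace with your own volume and rates.
REQUESTS = 100_000
input_cost, output_cost, total = monthly_cost(
    requests=REQUESTS,
    avg_input_tokens=2_000,
    avg_output_tokens=500,
    input_price_per_m=3.00,
    output_price_per_m=15.00,
)
annual = total * 12
per_request = total / REQUESTS

print(f"monthly ${total:,.2f}, annual ${annual:,.2f}, per request ${per_request:.4f}")
# prints: monthly $1,350.00, annual $16,200.00, per request $0.0135
```

With these placeholder numbers the output side costs more (750 vs. 600) even though each request sends four times as many input tokens as it receives back.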
Why prompt caching is the biggest lever
Most production features send a large, near-identical prefix on every single request: a detailed system prompt plus the grounding context that keeps answers accurate. Prompt caching reuses that stable prefix instead of re-processing it each time, billing those cached input tokens at a steep discount. For a grounded assistant with a big fixed prompt, that can cut the input half of the bill substantially; how much depends on your cache hit rate and the cached-input price. Set both above to model it for your own feature.
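One way to model the caching discount is sketched below. It rests on assumptions that are not stated in the text: the cache hit rate is treated as the share of all input tokens billed at the cached price, and any surcharge a provider may apply when writing to the cache is ignored. The function name and all numbers are hypothetical.

```python
def cached_input_cost(requests, avg_input_tokens, input_price_per_m,
                      cached_price_per_m, cache_hit_rate):
    """Monthly input cost when a share of input tokens is billed at the cached rate.

    Assumption: cache_hit_rate is the fraction (0 to 1) of input tokens
    served from cache. Cache-write surcharges are not modelled.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    million_tokens = requests * avg_input_tokens / 1_000_000
    cached = million_tokens * cache_hit_rate * cached_price_per_m
    uncached = million_tokens * (1 - cache_hit_rate) * input_price_per_m
    return cached + uncached


# Hypothetical values: 100k requests/month, 2,000 input tokens each,
# $3.00 per million regular input tokens, $0.30 per million cached.
baseline = cached_input_cost(100_000, 2_000, 3.00, 0.30, cache_hit_rate=0.0)
with_cache = cached_input_cost(100_000, 2_000, 3.00, 0.30, cache_hit_rate=0.8)
savings = baseline - with_cache

print(f"input cost without cache: ${baseline:,.2f}")
print(f"input cost at 80% hits:   ${with_cache:,.2f}")
print(f"monthly saving:           ${savings:,.2f}")
```

Under these placeholder rates an 80% hit rate takes the input cost from about 600 to about 168 per month. Output tokens are unaffected, so the saving applies only to the input half of the bill.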
The other lever is output length: because output is the expensive side, tightening responses (and not letting the model ramble) often saves more than any input optimisation. The build details behind both — server-side keys, grounding, streaming, and caching — are in the production chat-widget guide.
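To see why output length matters, compare trimming the same number of tokens from each side of a call. The sketch below is illustrative only: the prices, request volume, and the helper `saving_from_trim` are hypothetical, and the size of the gap depends entirely on your model's actual output-to-input price ratio.

```python
INPUT_PRICE_PER_M = 3.00    # hypothetical price per million input tokens
OUTPUT_PRICE_PER_M = 15.00  # hypothetical price per million output tokens
REQUESTS = 100_000          # hypothetical monthly request volume


def saving_from_trim(tokens_trimmed, price_per_m, requests=REQUESTS):
    """Monthly saving from removing `tokens_trimmed` tokens from every request."""
    return requests * tokens_trimmed / 1_000_000 * price_per_m


# Trim 200 tokens per request from the prompt vs. from the response.
input_saving = saving_from_trim(200, INPUT_PRICE_PER_M)
output_saving = saving_from_trim(200, OUTPUT_PRICE_PER_M)

print(f"trim 200 input tokens:  ${input_saving:,.2f}/month")
print(f"trim 200 output tokens: ${output_saving:,.2f}/month")
```

With output priced at five times input in this example, the same 200-token trim saves five times as much on the response side (about 300 vs. 60 per month).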
Frequently asked questions
- How do you estimate the cost of an LLM API feature?
- Tokens are billed per million, with input (your prompt + context) and output (the reply) priced separately. Monthly cost = (monthly requests × avg input tokens ÷ 1M × input price) + (monthly requests × avg output tokens ÷ 1M × output price); prompt caching discounts the repeated input. This tool runs that from your numbers.
- What drives LLM API cost the most?
- Request volume and tokens per request. Output is priced several times higher than input, so long responses cost more than long prompts — and a large fixed input (big system prompt + grounding context on every call) is the silent driver that prompt caching exists to discount.
- How much does prompt caching save?
- It reuses a stable prefix (system prompt, instructions, grounding) at a large discount instead of re-billing it each request. For a feature with a big fixed prefix, the input portion of the bill can drop substantially. Set a cache hit rate and cached-input price above to model your own savings.
- Where do I find my model's per-token price?
- Providers like Anthropic, OpenAI, and Google publish per-million-token input/output prices on their pricing pages, with cached and batch tiers listed separately. The price fields here are editable example values — replace them with your model's actual rates.
- Does this send my data anywhere?
- No — every calculation runs in your browser. Nothing you type is sent to a server or stored.