Your usage

Your model's prices — example values, set yours

What it costs

Cost / request
Cost / 1k requests
Input cost / mo
Output cost / mo
Total / month
Caching saves / mo
Annual API cost

This is the API line item only. A production feature also carries the cost of the edge/server proxy, retrieval or vector storage for grounding, logging and evals, and engineering time. The token bill, though, is the part that scales directly with usage, and it is the part this calculator models.

From estimate to production

I ship production AI features end-to-end — keys server-side, grounded, streamed, and cached.

If the number above is workable, the next question is building it right: a server-side proxy, retrieval grounding so answers stay tied to your data and hallucinate less, streaming, and prompt caching to keep that bill down. That's the AI Automations engagement.

Book a Free Consultation →

See the AI Automations service → · Read the production chat-widget guide →

How LLM API cost is calculated

LLM APIs bill per token — almost always quoted per million tokens — and they price the two halves of a call separately:

  • Input tokens: everything you send, meaning the system prompt, any retrieved grounding context, the conversation history, and the user's message. Monthly cost is (monthly requests × avg input tokens ÷ 1,000,000) × input price.
  • Output tokens: what the model generates back. Output is typically priced several times higher than input per token, so response length matters more than people expect. Monthly cost is (monthly requests × avg output tokens ÷ 1,000,000) × output price.
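The two bullet formulas translate directly into code. This is a minimal sketch, assuming a hypothetical `Usage` record (the field and function names are illustrative, not this calculator's source) where `requests_per_month` is your monthly request volume and prices are in dollars per million tokens. It also derives the per-request and per-1k-request figures from the same inputs.

```python
from dataclasses import dataclass

# Prices are quoted per million tokens.
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass
class Usage:
    # Hypothetical field names for illustration; fill them from your own traffic and pricing page.
    requests_per_month: int
    avg_input_tokens: float
    avg_output_tokens: float
    input_price_per_m: float   # USD per 1M input tokens
    output_price_per_m: float  # USD per 1M output tokens


def input_cost_per_month(u: Usage) -> float:
    # (monthly requests × avg input tokens ÷ 1,000,000) × input price
    return u.requests_per_month * u.avg_input_tokens / TOKENS_PER_PRICE_UNIT * u.input_price_per_m


def output_cost_per_month(u: Usage) -> float:
    # (monthly requests × avg output tokens ÷ 1,000,000) × output price
    return u.requests_per_month * u.avg_output_tokens / TOKENS_PER_PRICE_UNIT * u.output_price_per_m


def cost_per_request(u: Usage) -> float:
    # One average request: its input tokens and its output tokens, each at its own rate.
    return (u.avg_input_tokens * u.input_price_per_m
            + u.avg_output_tokens * u.output_price_per_m) / TOKENS_PER_PRICE_UNIT


def total_per_month(u: Usage) -> float:
    return input_cost_per_month(u) + output_cost_per_month(u)


def annual_cost(u: Usage) -> float:
    return 12 * total_per_month(u)


# Example values only, not any provider's current rates.
usage = Usage(requests_per_month=100_000, avg_input_tokens=2_000,
              avg_output_tokens=500, input_price_per_m=3.0, output_price_per_m=15.0)
print(f"per request: ${cost_per_request(usage):.4f}")
print(f"per 1k req:  ${1000 * cost_per_request(usage):.2f}")
print(f"input / mo:  ${input_cost_per_month(usage):,.2f}")
print(f"output / mo: ${output_cost_per_month(usage):,.2f}")
print(f"total / mo:  ${total_per_month(usage):,.2f}")
print(f"annual:      ${annual_cost(usage):,.2f}")
```

Keeping input and output as separate functions mirrors how providers price them, so you can see which half dominates before deciding what to optimise.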

Add the two for the monthly total, then multiply by twelve for the annual figure most teams never put on a slide before they ship.
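Written out with symbols introduced here for illustration (R monthly requests, T_in and T_out average tokens per request, P_in and P_out prices in dollars per million tokens) and worked through with hypothetical example values rather than any provider's real rates:

```latex
\begin{aligned}
C_{\text{month}} &= \frac{R\,T_{\text{in}}}{10^{6}}\,P_{\text{in}} + \frac{R\,T_{\text{out}}}{10^{6}}\,P_{\text{out}},
  \qquad C_{\text{year}} = 12\,C_{\text{month}} \\[4pt]
\text{with } R &= 100{,}000,\; T_{\text{in}} = 2{,}000,\; T_{\text{out}} = 500,\; P_{\text{in}} = 3,\; P_{\text{out}} = 15: \\
C_{\text{month}} &= 200 \times 3 + 50 \times 15 = 600 + 750 = 1{,}350 \\
C_{\text{year}} &= 12 \times 1{,}350 = 16{,}200
\end{aligned}
```

In this example output is only a fifth of the tokens but more than half of the bill.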

Why prompt caching is the biggest lever

Most production features send a large, near-identical prefix on every single request: a detailed system prompt plus the grounding context that keeps answers accurate. Prompt caching reuses that stable prefix instead of re-processing it each time and bills the cached input tokens at a steep discount (some providers also charge extra to write or store the cache). For a grounded assistant with a big fixed prompt, that can cut the input half of the bill substantially. Set a cache hit rate and your cached-input price above to model it for your own feature.
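One simple way to model a cache hit rate and a cached-input price is to treat the hit rate as the share of input tokens billed at the cached rate. The sketch below assumes exactly that, with hypothetical function names. Real billing varies by provider (cache-write premiums and storage fees are ignored here), so treat it as an approximation.

```python
TOKENS_PER_PRICE_UNIT = 1_000_000


def input_cost_with_cache(requests_per_month: int, avg_input_tokens: float,
                          input_price_per_m: float, cached_price_per_m: float,
                          cache_hit_rate: float) -> float:
    """Monthly input cost when part of the input is served from a prompt cache.

    Simplified, hypothetical model:
    - cache_hit_rate is the share of input tokens billed at the cached rate (0..1);
    - every other input token is billed at the normal input rate;
    - cache-write premiums and storage fees are ignored.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    input_m = requests_per_month * avg_input_tokens / TOKENS_PER_PRICE_UNIT  # millions of input tokens
    cached_m = input_m * cache_hit_rate
    uncached_m = input_m - cached_m
    return uncached_m * input_price_per_m + cached_m * cached_price_per_m


def caching_savings(requests_per_month: int, avg_input_tokens: float,
                    input_price_per_m: float, cached_price_per_m: float,
                    cache_hit_rate: float) -> float:
    # Savings = input cost with no caching (hit rate 0) minus input cost at the given hit rate.
    baseline = input_cost_with_cache(requests_per_month, avg_input_tokens,
                                     input_price_per_m, cached_price_per_m, 0.0)
    with_cache = input_cost_with_cache(requests_per_month, avg_input_tokens,
                                       input_price_per_m, cached_price_per_m, cache_hit_rate)
    return baseline - with_cache


# Example values only: $3/M input, a hypothetical $0.30/M cached rate, 80% hit rate.
print(f"input / mo with cache: ${input_cost_with_cache(100_000, 2_000, 3.0, 0.30, 0.8):,.2f}")
print(f"caching saves / mo:    ${caching_savings(100_000, 2_000, 3.0, 0.30, 0.8):,.2f}")
```

With these example inputs, monthly input spend falls from $600 to about $168. The lower the cached rate relative to the normal rate, the more each point of hit rate is worth.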

The other lever is output length: because output is the expensive side, tightening responses (and not letting the model ramble) often saves more than any input optimisation. The build details behind both — server-side keys, grounding, streaming, and caching — are in the production chat-widget guide.
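Because output is billed at the higher rate, a given trim is worth more on the output side than the same trim on the input side. Here is a minimal sketch with a hypothetical helper and example prices in a 5:1 output-to-input ratio (check your own model's ratio):

```python
TOKENS_PER_PRICE_UNIT = 1_000_000


def monthly_saving_from_trim(requests_per_month: int, tokens_trimmed: float,
                             price_per_m: float) -> float:
    # Cutting the same number of tokens from every request saves
    # (monthly requests × tokens trimmed ÷ 1,000,000) × that side's price.
    return requests_per_month * tokens_trimmed / TOKENS_PER_PRICE_UNIT * price_per_m


# Hypothetical example: shave 200 tokens per request at 100k requests/month,
# with example prices of $3/M input and $15/M output.
output_saving = monthly_saving_from_trim(100_000, 200, 15.0)
input_saving = monthly_saving_from_trim(100_000, 200, 3.0)
print(f"trim 200 output tokens: ${output_saving:,.2f}/mo")
print(f"trim 200 input tokens:  ${input_saving:,.2f}/mo")
```

The absolute figures depend on your prices. The point is the ratio: at these example rates, every output token you avoid is worth five input tokens.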

Frequently asked questions

How do you estimate the cost of an LLM API feature?
Token prices are quoted per million, with input (your prompt + context) and output (the reply) priced separately. Monthly cost = (monthly requests × avg input tokens ÷ 1M × input price) + (monthly requests × avg output tokens ÷ 1M × output price); prompt caching discounts the repeated input. This tool runs that calculation on your numbers.
What drives LLM API cost the most?
Request volume and tokens per request. Output is priced several times higher than input per token, so long responses cost more than equally long prompts, and a large fixed input (big system prompt + grounding context on every call) is the silent driver that prompt caching exists to discount.
How much does prompt caching save?
It reuses a stable prefix (system prompt, instructions, grounding) at a large discount instead of re-billing it each request. For a feature with a big fixed prefix, the input portion of the bill can drop substantially. Set a cache hit rate and cached-input price above to model your own savings.
Where do I find my model's per-token price?
Providers like Anthropic, OpenAI, and Google publish per-million-token input/output prices on their pricing pages, with cached and batch tiers listed separately. The price fields here are editable example values — replace them with your model's actual rates.
Does this send my data anywhere?
No — every calculation runs in your browser. Nothing you type is sent to a server or stored.