1. What drives the cost: tokens
AI providers don't bill "per question". Everything is measured in tokens: word fragments that the model reads (your prompt, the documents you feed it) and produces (its response). The longer the exchange, the more tokens it consumes and the more it costs. Pasting in an 80-page PDF "costs" far more than asking a one-line question.
The second lever is your choice of model. A powerful model (think Opus) costs more per token than a fast one (think Haiku). That's the whole point of picking the right tool for the job — no need for a sledgehammer to hang a picture.
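To make the two levers concrete, here is a minimal sketch of the arithmetic. The model names, the per-million-token prices, and the token counts are all made-up placeholders rather than any provider's real rates; only the shape of the calculation matters.

```python
# Hypothetical prices in USD per million tokens. Real rates differ by
# provider and change often; these numbers are placeholders.
PRICES_PER_MILLION = {
    "fast-model": {"input": 1.0, "output": 5.0},
    "powerful-model": {"input": 15.0, "output": 75.0},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one exchange (tokens read plus tokens produced)."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Token counts below are rough guesses for illustration only.
short_question = estimate_cost("fast-model", input_tokens=30, output_tokens=300)
long_pdf = estimate_cost("fast-model", input_tokens=40_000, output_tokens=1_000)
same_pdf_big_model = estimate_cost("powerful-model", input_tokens=40_000, output_tokens=1_000)

print(f"one-line question, fast model: ${short_question:.5f}")
print(f"long PDF, fast model:          ${long_pdf:.5f}")
print(f"long PDF, powerful model:      ${same_pdf_big_model:.5f}")
```

With these placeholder rates, the one-line question comes to a fraction of a cent, the long PDF to a few cents, and the same PDF on the powerful model to 15 times that. Length and model choice multiply each other.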
2. The ways to pay
Four main pricing models, from simplest to most professional. Prices move fast: think in orders of magnitude, not fixed figures.
3. Subscription or API: which one?
This is the real practical question, and the answer depends on what you're doing:
You're chatting: subscription
- Fixed monthly flat rate
- Predictable bill, zero setup
- Ideal for everyday individual use
You're automating: API
- Pay per token consumed
- For connecting tools, running agents
- More powerful, but needs guardrails (spending caps)
4. Keeping the bill under control
The API can feel scary ("what if it spirals?"), but it's very easy to keep in check:
- Set spending caps. Most API platforms let you set a hard monthly limit. Do it from day one.
- Pick the right model. A fast model for volume and simple tasks; the powerful model only when the difficulty warrants it.
- Constrain your agents. Budget, frequency, stopping criteria — otherwise an agent can loop and burn tokens for nothing (I cover this in the AI Agent guide).
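The "constrain your agents" advice can be sketched as a loop with two hard stops: a token budget and a step limit. This is a minimal illustration, not any platform's API. `run_agent` and `step` are hypothetical names, and `step` stands in for whatever single action your agent performs, returning whether it is finished and how many tokens it used. Frequency (for example, once a day) would be handled by whatever schedules the run.

```python
from typing import Callable, Tuple


def run_agent(
    step: Callable[[], Tuple[bool, int]],
    max_tokens: int,
    max_steps: int,
) -> Tuple[str, int, int]:
    """Run step() until it finishes, the token budget is spent, or the step limit is hit.

    Returns (reason_for_stopping, steps_taken, tokens_used).
    """
    used = 0
    for n in range(1, max_steps + 1):
        done, tokens = step()  # hypothetical: one agent action
        used += tokens
        if done:  # stopping criterion met
            return ("done", n, used)
        if used >= max_tokens:  # budget spent: stop instead of looping on
            return ("budget exhausted", n, used)
    return ("step limit reached", max_steps, used)


# A deliberately broken agent that never finishes and uses 500 tokens per step.
def stuck_step() -> Tuple[bool, int]:
    return (False, 500)


budget_stop = run_agent(stuck_step, max_tokens=2_000, max_steps=10)
steps_stop = run_agent(stuck_step, max_tokens=10_000, max_steps=3)
print(budget_stop)
print(steps_stop)
```

Even an agent that never reaches its goal stops here after a bounded number of tokens, which is the point of setting the limits before the first run. The platform-level spending cap stays on as a second safety net.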
For context: my daily HiFi monitoring runs on the API and costs next to nothing per month — because it's scoped (limited sources, once a day, right-sized model). Cost only becomes a problem when it's left unconstrained.