The anatomy of a 3 cent answer
Summarise all the field operations logged on my farm this season. List the spraying and fertiliser applications per field with their dates, and tell me which field has had the most activity.
reconciliation: per round sum $0.032862 vs the backend estimator $0.0329, delta $0.000038
The waterfall, round by round
Output tokens are the biggest slice of this answer's cost, about 2.03 of the 3.29 cents. The cent arithmetic for each round follows.
Round 1
493 fresh input tokens x $3.00/M = 0.1479 cents
4,081 cache read tokens x $0.30/M = 0.1224 cents
0 cache write tokens x $3.75/M = 0.0000 cents
61 output tokens x $15.00/M = 0.0915 cents
tools called: get_work_logs, get_spray_records (2,963 characters of results returned to the model)
stop reason: tool_use
Round 2
1 fresh input token x $3.00/M = 0.0003 cents
4,081 cache read tokens x $0.30/M = 0.1224 cents
2,303 cache write tokens x $3.75/M = 0.8636 cents
1,292 output tokens x $15.00/M = 1.9380 cents
no tools called; the model wrote the final answer
stop reason: end_turn
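The per round arithmetic and the reconciliation line can be reproduced with a few lines of Python. This is a minimal sketch, not the backend estimator: the names RATES, ROUNDS and round_cost_dollars are illustrative, and the token counts and rates are copied from the figures above.

```python
from decimal import Decimal

# Per-million-token rates quoted on this page, in dollars.
RATES = {
    "fresh_input": Decimal("3.00"),
    "cache_read": Decimal("0.30"),
    "cache_write": Decimal("3.75"),
    "output": Decimal("15.00"),
}

# Token counts for the two rounds, copied from the waterfall.
ROUNDS = [
    {"fresh_input": 493, "cache_read": 4081, "cache_write": 0, "output": 61},
    {"fresh_input": 1, "cache_read": 4081, "cache_write": 2303, "output": 1292},
]


def round_cost_dollars(tokens):
    """Cost of one round in dollars: sum of tokens x rate, divided by one million."""
    return sum(Decimal(n) * RATES[kind] for kind, n in tokens.items()) / Decimal(1_000_000)


per_round = [round_cost_dollars(r) for r in ROUNDS]
total = sum(per_round).quantize(Decimal("0.000001"))  # six decimals, as on the page

# The estimator figure is the one reported on this page.
estimator = Decimal("0.0329")
delta = abs(estimator - total)

for i, cost in enumerate(per_round, start=1):
    print(f"round {i}: {cost * 100:.4f} cents")
print(f"per round sum ${total} vs estimator ${estimator}, delta ${delta}")
```

Decimal is used instead of float so that the six-decimal sum is exact and not subject to binary rounding.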
Inside the cached prefix
Both rounds re-read the same 4,081 token prefix from cache. Here it is counted block by block with the free count_tokens endpoint, using the same content the backend sends:
JSON schemas for the 12 tools a keyword router selected for this question, out of 45 registered.
The agronomy assistant rules: grounding requirements, tool discipline, answer format.
A per request snapshot of this farm pulled from the database, so answers start grounded.
The punchline: at $0.30/M a cached prefix token is 10x cheaper than fresh input at $3.00/M. In this run the prefix was served from cache in both rounds and never re-written, and cache reads covered 74.5% of all input tokens.
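Both figures in the punchline follow from the round data. The sketch below assumes that "all input tokens" means fresh input plus cache reads plus cache writes across both rounds, which is the denominator that reproduces the 74.5% stated above; the variable names are illustrative.

```python
# Input-side token counts per round, copied from the waterfall.
ROUNDS = [
    {"fresh_input": 493, "cache_read": 4081, "cache_write": 0},
    {"fresh_input": 1, "cache_read": 4081, "cache_write": 2303},
]

FRESH_RATE = 3.00       # dollars per million fresh input tokens
CACHE_READ_RATE = 0.30  # dollars per million cache read tokens

cache_reads = sum(r["cache_read"] for r in ROUNDS)
# Assumption: the denominator is every token on the input side, cached or not.
all_input = sum(r["fresh_input"] + r["cache_read"] + r["cache_write"] for r in ROUNDS)

share_percent = round(100 * cache_reads / all_input, 1)
price_ratio = round(FRESH_RATE / CACHE_READ_RATE)

print(f"{cache_reads} of {all_input} input tokens were cache reads ({share_percent}%)")
print(f"a fresh input token costs {price_ratio}x a cache read token")
```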
honesty note: this reconstruction counts 4,396 tokens vs 4,081 measured in the run, a delta of 315 tokens. The context is rebuilt post hoc from live data, so small drift is expected and shown, not hidden.
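One way to count a prefix block by block is to count growing prefixes and take differences, so that any fixed per request overhead is not counted once per block. The sketch below shows that idea only. It does not call the real count_tokens endpoint: the count_tokens argument is a hypothetical callable you would back with that endpoint, word_count_stub is a stand-in so the example runs offline, and the block labels and contents are placeholders.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Block = Tuple[str, str]  # (label, content)


def count_blocks(
    blocks: Sequence[Block],
    count_tokens: Callable[[List[str]], int],
) -> Dict[str, int]:
    """Per-block token counts, taken as differences between growing prefixes."""
    sizes: Dict[str, int] = {}
    previous = 0
    for i, (label, _content) in enumerate(blocks):
        prefix = [content for _label, content in blocks[: i + 1]]
        running = count_tokens(prefix)  # hypothetical: would hit the counting endpoint
        sizes[label] = running - previous
        previous = running
    return sizes


def word_count_stub(parts: List[str]) -> int:
    """Stand-in counter (whitespace words), NOT a real tokenizer."""
    return sum(len(p.split()) for p in parts)


# Placeholder contents; the real blocks are the three described above.
blocks = [
    ("tool schemas", "a b c"),
    ("assistant rules", "d e"),
    ("farm snapshot", "f g h i"),
]
sizes = count_blocks(blocks, word_count_stub)

# Drift between the post hoc reconstruction and the measured prefix (figures from this page).
RECONSTRUCTED, MEASURED = 4396, 4081
drift = RECONSTRUCTED - MEASURED
print(sizes, f"drift: {drift} tokens")
```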
What this answer would cost without each optimization
Without prompt caching: every cached token repriced at the fresh input rate, same rounds, same output.
Without tool routing: all 45 tool definitions in every round instead of the 12 the router selected, priced at warm cache read rates; a cold cache write would add more.
Without grounding: not computable from this run; an ungrounded model is a different product, so no number is invented.
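The first counterfactual above is the only one that can be recomputed from the round data alone. The sketch below reprices cached tokens at the fresh input rate while keeping rounds and output fixed. Whether "every cached token" includes cache writes as well as cache reads is my assumption, so it is a flag (reprice_writes); the function names are illustrative and this is not the page's own calculator.

```python
from decimal import Decimal

RATES = {
    "fresh_input": Decimal("3.00"),
    "cache_read": Decimal("0.30"),
    "cache_write": Decimal("3.75"),
    "output": Decimal("15.00"),
}

ROUNDS = [
    {"fresh_input": 493, "cache_read": 4081, "cache_write": 0, "output": 61},
    {"fresh_input": 1, "cache_read": 4081, "cache_write": 2303, "output": 1292},
]


def cost_dollars(rounds, rates):
    """Total cost in dollars for all rounds at the given per-million rates."""
    micro = sum(Decimal(n) * rates[kind] for r in rounds for kind, n in r.items())
    return micro / Decimal(1_000_000)


def rates_without_caching(rates, reprice_writes=True):
    """Copy of the rate table with cached tokens billed as fresh input."""
    repriced = dict(rates)
    repriced["cache_read"] = rates["fresh_input"]
    if reprice_writes:  # assumption: writes would also be plain fresh input
        repriced["cache_write"] = rates["fresh_input"]
    return repriced


actual = cost_dollars(ROUNDS, RATES)
no_cache = cost_dollars(ROUNDS, rates_without_caching(RATES))
reads_only = cost_dollars(ROUNDS, rates_without_caching(RATES, reprice_writes=False))

print(f"actual: {actual * 100:.4f} cents")
print(f"no caching (reads and writes repriced): {no_cache * 100:.4f} cents")
print(f"no caching (reads only repriced): {reads_only * 100:.4f} cents")
```

The tool routing counterfactual needs the token size of the 33 unselected tool definitions, which this page does not list, so it is left out of the sketch.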
What we deliberately did not optimize
Programmatic tool calling
It moves tool results out of the model context. For agronomy answers the model must see the raw data it cites, or grounding quality drops.
Routing easy questions to a smaller model
We A/B tested a smaller model on the vision path and it confidently misdiagnosed a disease. Answer quality is the product; we did not ship that risk.
Semantic caching of whole answers
Farm state changes daily. Replaying a cached answer about spray windows after the weather turns is how you lose a harvest, and a customer.
We stop optimizing where quality starts paying.
This is the teardown I do on client systems.
Per round token traces, cached prefix dissection, counterfactuals at published rates, and an honest list of what not to touch. If your AI feature has a bill you cannot explain, I will explain it.
Get this teardown for your system
Cache write pricing: $3.75/M for 5 minute TTL entries, $6.00/M for 1 hour TTL entries. Every cache write in this run was a 5 minute entry, so the flat $3.75/M the estimator uses is exact here: maximum understatement $0.00.
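The understatement bound can be checked directly: a flat $3.75/M estimator undercounts only the tokens written as 1 hour entries, by the difference between the two rates. A minimal sketch, with an illustrative function name and the rates quoted above:

```python
from decimal import Decimal

WRITE_RATE_5M = Decimal("3.75")  # dollars per million, 5 minute TTL entries
WRITE_RATE_1H = Decimal("6.00")  # dollars per million, 1 hour TTL entries


def max_understatement(one_hour_write_tokens):
    """Dollars a flat 5 minute rate would miss for tokens written as 1 hour entries."""
    return Decimal(one_hour_write_tokens) * (WRITE_RATE_1H - WRITE_RATE_5M) / Decimal(1_000_000)


# This run: all 2,303 cache write tokens were 5 minute entries, so none are 1 hour.
this_run = max_understatement(0)

# Hypothetical worst case for comparison: the same 2,303 tokens written as 1 hour entries.
worst_case = max_understatement(2303)

print(f"this run: ${this_run:.2f}, hypothetical worst case: ${worst_case}")
```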
This page dissects the instrumented run of 2026-06-12, which cost 3.286 cents. The showcase panel reports this same run for the question, rounded to 3.3 cents. Above is the full round by round dissection behind that figure.
