WyrumWyrum
back to the showcase
agrotus, instrumented live run

The anatomy of a 3 cent answer

Summarise all the field operations logged on my farm this season. List the spraying and fertiliser applications per field with their dates, and tell me which field has had the most activity.

0.0 centsexactly 3.286 centsmeasured
claude-sonnet-4-62026-06-12one instrumented live run on the production system, warm cache

reconciliation: per round sum $0.032862 vs the backend estimator $0.0329, delta $0.000038

The waterfall, round by round

fresh input $3.00/M cache read $0.30/M cache write $3.75/M output $15.00/M
Where the cents wentcomputed
fresh input 4.5%cache read 7.5%cache write 26.3%output 61.8%

Output tokens are the biggest slice of this answer. The bars below show tokens per round; tap a round for the cent arithmetic.

493 fresh input tokens x $3.00/M = 0.1479 cents

4,081 cache read tokens x $0.30/M = 0.1224 cents

0 cache write tokens x $3.75/M = 0.0000 cents

61 output tokens x $15.00/M = 0.0915 cents

tools called: get_work_logs, get_spray_records (2,963 characters of results returned to the model)

stop reason: tool_use

1 fresh input tokens x $3.00/M = 0.0003 cents

4,081 cache read tokens x $0.30/M = 0.1224 cents

2,303 cache write tokens x $3.75/M = 0.8636 cents

1,292 output tokens x $15.00/M = 1.9380 cents

no tools called; the model wrote the final answer

stop reason: end_turn

Inside the cached prefix

Both rounds re-read the same 4,081 token prefix from cache. Counted block by block with the free count_tokens endpoint, the same content the backend sends:

tool definitions0 tokensmeasured

JSON schemas for the 12 tools a keyword router selected for this question, out of 45 registered.

base system prompt0 tokensmeasured

The agronomy assistant rules: grounding requirements, tool discipline, answer format.

live farm context0 tokensmeasured

A per request snapshot of this farm pulled from the database, so answers start grounded.

The punchline: at $0.30/M a cached prefix token is 10x cheaper than fresh input at $3.00/M. In this run the prefix was served from cache in both rounds and never re-written, and cache reads covered 74.5% of all input tokens.

honesty note: this reconstruction counts 4,396 tokens vs 4,081 measured in the run, a delta of 315 tokens. The context is rebuilt post hoc from live data, so small drift is expected and shown, not hidden.

What this answer would cost without each optimization

without prompt caching5.3 cents, 1.62xcomputed

Every cached token repriced at the fresh input rate, same rounds, same output.

without keyword tool routing+0.277 centscomputed

all 45 tool definitions in every round instead of the 12 the router selected, priced at warm cache read rates; a cold cache write would add more

without database groundingno number inventedstated

not computable from this run; an ungrounded model is a different product, so no number is invented

What we deliberately did not optimize

  • Programmatic tool calling

    It moves tool results out of the model context. For agronomy answers the model must see the raw data it cites, or grounding quality drops.

  • Routing easy questions to a smaller model

    We A/B tested a smaller model on the vision path and it confidently misdiagnosed a disease. Answer quality is the product; we did not ship that risk.

  • Semantic caching of whole answers

    Farm state changes daily. Replaying a cached answer about spray windows after the weather turns is how you lose a harvest, and a customer.

We stop optimizing where quality starts paying.

This is the teardown I do on client systems.

Per round token traces, cached prefix dissection, counterfactuals at published rates, and an honest list of what not to touch. If your AI feature has a bill you cannot explain, I will explain it.

Get this teardown for your system

Cache write pricing: $3.75/M for 5 minute TTL entries, $6.00/M for 1 hour TTL entries. Every cache write in this run was a 5 minute entry, so the flat $3.75/M the estimator uses is exact here: maximum understatement $0.00.

This page dissects the instrumented run of 2026-06-12, which cost 3.286 cents. The showcase panel reports this same run for the question, rounded to 3.3 cents. Above is the full round by round dissection behind that figure.