AI Cost & Grounding Audit, the teardown

The audit, performed on a live system

Know where every cent of your AI bill goes, and which part you verifiably get back. Below is the exact audit deliverable, run on our own production system instead of a slide deck.

This teardown is the cost half of the audit, run live. The grounding review, where the same apparatus that catches a fabricated figure catches a fabricated citation, is demonstrated side by side in the Model Watch teardown.

What you get, day by day

Days 1 to 2

Read-only instrumentation on your inference path. No behavior changes, no code risk; we trace what already runs.

Days 3 to 5

Per-answer cost waterfalls on your real traffic: fresh input vs cache vs output, round by round, reconciled against your provider bill.

Days 6 to 8

Gap analysis ranked by dollars: caching, routing, output discipline, retries, model mix. Each gap carries its measured size and the technique that closes it.

Days 9 to 10

The written teardown, a savings plan ranked by impact and effort, and a walkthrough call. Your team can execute it with or without me.

Forward this section to your cofounder; it is the scope of work. Duration 1 to 2 weeks, fixed fee, pricing on request.

Proof, performed live: all 8 published questions

These are the 8 real questions from the showcase, re-run through the production system with per-round instrumentation on. Every figure below is read from those traces. Each question carries a replay of the recorded run: press play, watch the answer, get the token receipt.

agrotus, instrumented runs, 2026-06-12

3.4 cents mean per answer, this run set of 8 · measured

every question reconciled: per-round cents vs the backend estimator, max delta $0.000038 across all 8 · cache served 72.6% of input tokens in these runs

rates per million tokens: fresh input $3.00/M · cache read $0.30/M · cache write $3.75/M · output $15.00/M

round 1 · 0.549 cents · tools: get_agronomic_advice, get_agronomic_advice

511 fresh input = 0.1533c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 182 output = 0.2730c

round 2 · 3.514 cents · final answer, no tools

1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 5,724 cache write = 2.1465c · 830 output = 1.2450c

this run · 4.063 cents · measured

recorded instrumented run, 2026-06-12

round 1 · 0.567 cents · tools: get_agronomic_advice, get_agronomic_advice

527 fresh input = 0.1581c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 191 output = 0.2865c

round 2 · 0.632 cents · tools: get_agronomic_advice

1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 921 cache write = 0.3454c · 109 output = 0.1635c

round 3 · 0.430 cents · tools: get_agronomic_advice

1 fresh input = 0.0003c · 5,002 cache read = 0.1501c · 271 cache write = 0.1016c · 119 output = 0.1785c

round 4 · 3.263 cents · final answer, no tools

1 fresh input = 0.0003c · 5,273 cache read = 0.1582c · 4,995 cache write = 1.8731c · 821 output = 1.2315c

this run · 4.892 cents · measured

recorded instrumented run, 2026-06-12

round 1 · 0.584 cents · tools: get_agronomic_advice, get_agronomic_advice

545 fresh input = 0.1635c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 199 output = 0.2985c

round 2 · 0.634 cents · tools: get_agronomic_advice

1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 875 cache write = 0.3281c · 122 output = 0.1830c

round 3 · 0.379 cents · tools: get_agronomic_advice

1 fresh input = 0.0003c · 4,956 cache read = 0.1487c · 198 cache write = 0.0742c · 104 output = 0.1560c

round 4 · 3.132 cents · final answer, no tools

1 fresh input = 0.0003c · 5,154 cache read = 0.1546c · 4,731 cache write = 1.7741c · 802 output = 1.2030c

this run · 4.730 cents · measured

recorded instrumented run, 2026-06-12

round 1 · 0.433 cents · tools: get_agronomic_advice

419 fresh input = 0.1257c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 123 output = 0.1845c

round 2 · 1.840 cents · final answer, no tools

1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 1,051 cache write = 0.3941c · 882 output = 1.3230c

this run · 2.272 cents · measured

recorded instrumented run, 2026-06-12

round 1 · 0.428 cents · tools: get_agronomic_advice

399 fresh input = 0.1197c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 124 output = 0.1860c

round 2 · 1.301 cents · final answer, no tools

1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 1,054 cache write = 0.3953c · 522 output = 0.7830c

this run · 1.729 cents · measured

recorded instrumented run, 2026-06-12

round 1 · 0.461 cents · tools: get_agronomic_advice, get_agronomic_advice

499 fresh input = 0.1497c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 126 output = 0.1890c

round 2 · 4.136 cents · final answer, no tools

1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 4,947 cache write = 1.8551c · 1,439 output = 2.1585c

this run · 4.598 cents · measured

recorded instrumented run, 2026-06-12

round 1 · 0.434 cents · tools: calculate_spray_window, get_weather_forecast

505 fresh input = 0.1515c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 107 output = 0.1605c

round 2 · 1.552 cents · final answer, no tools

1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 1,315 cache write = 0.4931c · 624 output = 0.9360c

this run · 1.986 cents · measured

recorded instrumented run, 2026-06-12

round 1 · 0.362 cents · tools: get_work_logs, get_spray_records

493 fresh input = 0.1479c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 61 output = 0.0915c

round 2 · 2.924 cents · final answer, no tools

1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 2,303 cache write = 0.8636c · 1,292 output = 1.9380c

this run · 3.286 cents · measured

recorded instrumented run, 2026-06-12

Every figure here is read straight from the instrumented traces and reconciled against the backend estimator; these are the same runs the showcase panel reports. The single-answer dissection lives in the cost anatomy.
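For readers who want to check a receipt by hand, here is a minimal sketch of the arithmetic behind each round line, using the published rates and the token counts of the first recorded run. The function and variable names are illustrative assumptions, not the production estimator.

```python
# Published rates from this page, in dollars per million tokens.
RATES_USD_PER_M = {
    "fresh_input": 3.00,
    "cache_read": 0.30,
    "cache_write": 3.75,
    "output": 15.00,
}


def round_cost_cents(tokens):
    """Cost of one round in cents; `tokens` maps a rate name to a token count."""
    # Dollars per million tokens -> cents per token: rate * 100 / 1_000_000.
    return sum(count * RATES_USD_PER_M[kind] / 10_000 for kind, count in tokens.items())


# Token counts copied from the first recorded run on this page.
first_run = [
    {"fresh_input": 511, "cache_read": 4081, "cache_write": 0, "output": 182},
    {"fresh_input": 1, "cache_read": 4081, "cache_write": 5724, "output": 830},
]

per_round = [round_cost_cents(r) for r in first_run]
run_total = sum(per_round)

for number, cost in enumerate(per_round, start=1):
    print(f"round {number}: {cost:.3f} cents")
print(f"this run: {run_total:.3f} cents")
```

Run as written, this prints 0.549 and 3.514 cents for the two rounds and 4.063 cents for the run, matching the receipt. The same function applies to any other run on the page by swapping in its token counts.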

What we deliberately did not optimize

Programmatic tool calling
It moves tool results out of the model context. For grounded answers the model must see the raw data it cites, or quality drops.
Routing easy questions to a smaller model
We A/B tested a smaller model and it confidently misdiagnosed a disease. Answer quality is the product; we did not ship that risk.
Semantic caching of whole answers
State changes daily. Replaying a stale cached answer is how you lose a customer.

An audit that only finds savings is half an audit. Yours will also list the cuts NOT to make, with the quality evidence for each.

What this does for your company

Cost

You learn what each answer costs and which levers cut it, before paying for a sprint.

measured

Reliability

The same traces expose retries, grounding gaps and silent failures that your monitoring does not flag.

measured

Speed

Ten working days from access to a ranked plan; industry teardown engagements commonly run a multiple of that.

stated

The honest ROI math

Your monthly inference spend: $40K/mo

projected monthly savings

$12K to $36K

projected yearly savings

$144K to $432K

industry benchmark range, not a promise · documented technique ranges, 30 to 90 percent
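The projection above is plain percentage arithmetic on the stated spend. A small sketch, assuming the 30 to 90 percent benchmark range quoted on this page; `savings_range` is a hypothetical helper name, and you can substitute your own monthly spend.

```python
def savings_range(monthly_spend_usd, low_pct=30, high_pct=90):
    """Return ((monthly_low, monthly_high), (yearly_low, yearly_high)) in dollars."""
    # Integer arithmetic keeps the dollar figures exact.
    monthly = (
        monthly_spend_usd * low_pct // 100,
        monthly_spend_usd * high_pct // 100,
    )
    yearly = (monthly[0] * 12, monthly[1] * 12)
    return monthly, yearly


monthly, yearly = savings_range(40_000)
print(f"monthly: ${monthly[0]:,} to ${monthly[1]:,}")
print(f"yearly:  ${yearly[0]:,} to ${yearly[1]:,}")
```

For a $40K monthly spend this gives $12,000 to $36,000 per month and $144,000 to $432,000 per year, the same benchmark range shown above, not a promise.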

Existence proof from our own system: without prompt caching the 8 instrumented answers would cost 1.76x as much, so caching alone removes about 43 percent of the uncached bill. · measured
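The 1.76x figure can be re-derived from the receipts. The sketch below sums the token counts of the 8 recorded runs and reprices them under one assumption: with caching off, every cache-read and cache-write token would have been billed at the fresh input rate. That assumption is ours for illustration; it reproduces the published figures but is not confirmed as the backend's exact method. The 72.6% cache share likewise assumes that input tokens means fresh plus cache read plus cache write.

```python
# Published rates from this page, in dollars per million tokens.
FRESH, CACHE_READ, CACHE_WRITE, OUTPUT = 3.00, 0.30, 3.75, 15.00

# Per-run token totals (fresh input, cache read, cache write, output),
# summed from the round lines of the 8 recorded runs on this page.
RUNS = [
    (512, 8162, 5724, 1012),
    (530, 18437, 6187, 1240),
    (548, 18272, 5804, 1227),
    (420, 8162, 1051, 1005),
    (400, 8162, 1054, 646),
    (500, 8162, 4947, 1565),
    (506, 8162, 1315, 731),
    (494, 8162, 2303, 1353),
]


def cents(tokens, usd_per_million):
    """Convert a token count at a dollars-per-million rate into cents."""
    return tokens * usd_per_million / 10_000


fresh, read, write, output = (sum(column) for column in zip(*RUNS))
input_tokens = fresh + read + write

with_cache = (
    cents(fresh, FRESH)
    + cents(read, CACHE_READ)
    + cents(write, CACHE_WRITE)
    + cents(output, OUTPUT)
)
# Assumption: no caching means all input tokens are billed at the fresh rate.
without_cache = cents(input_tokens, FRESH) + cents(output, OUTPUT)

ratio = without_cache / with_cache
savings = 1 - with_cache / without_cache
cache_share = read / input_tokens

print(f"mean per answer: {with_cache / len(RUNS):.1f} cents")
print(f"uncached cost:   {ratio:.2f}x")
print(f"caching removes: {savings:.0%}")
print(f"cache share:     {cache_share:.1%}")
```

Under those assumptions the script reports a 3.4 cent mean, a 1.76x uncached cost, about 43 percent removed, and a 72.6% cache share, in line with the measured figures on this page.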

The floor is the guarantee, not a projection: full refund if the audit identifies less than 30 percent in verifiable savings.

At this spend, engagements like this typically pay for themselves within weeks. Ask for pricing.

Why Wyrum

We could not find another consultancy that publishes per-answer costs of its own production system, live and checkable. If you find one, send it; until then, the receipts are above. · measured
The money-back guarantee carries the risk, so your budget does not.
The person who built and instrumented the system above is the person who audits yours. No handoffs.

Projections on this page are labeled benchmark ranges; the only promises are the guarantees. · industry benchmark range, not a promise

The guarantee

Full refund if the audit identifies less than 30 percent in verifiable savings. You either get a ranked plan worth multiples of the fee, or you do not pay.

Ask for pricing

No prices on this site. You get a number and a scope in the first reply.