AI Cost & Grounding Audit, the teardown
The audit, performed on a live system
Know where every cent of your AI bill goes, and which part you can verifiably get back. Below is the exact audit deliverable, run on our own production system rather than presented as a slide deck.
This teardown is the cost half of the audit, run live. The grounding review, where the same apparatus that catches a fabricated figure catches a fabricated citation, is demonstrated side by side in the Model Watch teardown.
What you get, day by day
Read-only instrumentation on your inference path. No behavior changes, no code risk: we trace what already runs.
Per-answer cost waterfalls on your real traffic: fresh input vs cache reads and writes vs output, round by round, reconciled against your provider bill.
Gap analysis ranked by dollars: caching, routing, output discipline, retries, model mix. Each gap carries its measured size and the technique that closes it.
The written teardown, a savings plan ranked by impact and effort, and a walkthrough call. Your team can execute it with or without me.
Forward this section to your cofounder; it is the scope of work. Duration 1 to 2 weeks, fixed fee, pricing on request.
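A minimal sketch of what read-only instrumentation can look like, assuming a Python inference path where each model call returns a dict with a `usage` entry. The decorator `traced`, the `TRACE` list and the `answer` stand-in are hypothetical names for illustration; the usage field names (`input_tokens`, `cache_read_input_tokens`, `cache_creation_input_tokens`, `output_tokens`) follow Anthropic's Messages API and are an assumption about your provider. The wrapper only observes: it passes arguments through, returns the same result object and lets exceptions propagate unchanged.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory trace sink; a real deployment would write to an existing log store.
TRACE: list[dict[str, Any]] = []


def traced(round_label: str) -> Callable[[Callable[..., dict]], Callable[..., dict]]:
    """Record token usage of an inference call without altering its inputs or result."""

    def decorate(call: Callable[..., dict]) -> Callable[..., dict]:
        @functools.wraps(call)
        def wrapper(*args: Any, **kwargs: Any) -> dict:
            started = time.perf_counter()
            result = call(*args, **kwargs)  # runs exactly as before; exceptions propagate untouched
            usage = result.get("usage", {})
            TRACE.append({
                "round": round_label,
                "latency_s": time.perf_counter() - started,
                # Field names assume an Anthropic-style usage object; adapt for other providers.
                "fresh_input": usage.get("input_tokens", 0),
                "cache_read": usage.get("cache_read_input_tokens", 0),
                "cache_write": usage.get("cache_creation_input_tokens", 0),
                "output": usage.get("output_tokens", 0),
            })
            return result  # the same object the caller would get without tracing

        return wrapper

    return decorate


@traced("round 1")
def answer(question: str) -> dict:
    # Hypothetical stand-in for a provider SDK call; the token counts are
    # illustrative, copied from the first receipt row on this page.
    return {
        "text": f"answer to {question}",
        "usage": {"input_tokens": 511, "cache_read_input_tokens": 4081,
                  "cache_creation_input_tokens": 0, "output_tokens": 182},
    }
```

Calling `answer("...")` returns exactly what the undecorated function would, while one row per round lands in `TRACE`; those rows are the raw material for the cost waterfalls.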
Proof, performed live: all 8 published questions
These are the 8 real questions from the showcase, re-run through the production system with per-round instrumentation on. Every figure below is read from those traces. Each question carries a replay of the recorded run: press play, watch the answer, get the token receipt.
Every question reconciled: per-round cents vs the backend estimator, max delta $0.000038 across all 8 · cache served 72.6% of all input tokens (fresh, cache read and cache write) in these runs
Per-round token receipts (c = US cents):
Question 1, round 1: 511 fresh input = 0.1533c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 182 output = 0.2730c
Question 1, round 2: 1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 5,724 cache write = 2.1465c · 830 output = 1.2450c
Question 2, round 1: 527 fresh input = 0.1581c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 191 output = 0.2865c
Question 2, round 2: 1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 921 cache write = 0.3454c · 109 output = 0.1635c
Question 2, round 3: 1 fresh input = 0.0003c · 5,002 cache read = 0.1501c · 271 cache write = 0.1016c · 119 output = 0.1785c
Question 2, round 4: 1 fresh input = 0.0003c · 5,273 cache read = 0.1582c · 4,995 cache write = 1.8731c · 821 output = 1.2315c
Question 3, round 1: 545 fresh input = 0.1635c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 199 output = 0.2985c
Question 3, round 2: 1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 875 cache write = 0.3281c · 122 output = 0.1830c
Question 3, round 3: 1 fresh input = 0.0003c · 4,956 cache read = 0.1487c · 198 cache write = 0.0742c · 104 output = 0.1560c
Question 3, round 4: 1 fresh input = 0.0003c · 5,154 cache read = 0.1546c · 4,731 cache write = 1.7741c · 802 output = 1.2030c
Question 4, round 1: 419 fresh input = 0.1257c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 123 output = 0.1845c
Question 4, round 2: 1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 1,051 cache write = 0.3941c · 882 output = 1.3230c
Question 5, round 1: 399 fresh input = 0.1197c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 124 output = 0.1860c
Question 5, round 2: 1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 1,054 cache write = 0.3953c · 522 output = 0.7830c
Question 6, round 1: 499 fresh input = 0.1497c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 126 output = 0.1890c
Question 6, round 2: 1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 4,947 cache write = 1.8551c · 1,439 output = 2.1585c
Question 7, round 1: 505 fresh input = 0.1515c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 107 output = 0.1605c
Question 7, round 2: 1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 1,315 cache write = 0.4931c · 624 output = 0.9360c
Question 8, round 1: 493 fresh input = 0.1479c · 4,081 cache read = 0.1224c · 0 cache write = 0.0000c · 61 output = 0.0915c
Question 8, round 2: 1 fresh input = 0.0003c · 4,081 cache read = 0.1224c · 2,303 cache write = 0.8636c · 1,292 output = 1.9380c
Every figure here comes straight from the instrumented traces and is reconciled against the backend estimator; it is the same run the showcase panel reports. The single-answer dissection lives in the cost anatomy.
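Every cent figure in these receipts is a token count times a per-token price. The prices in the sketch below are back-derived from the rows themselves (511 fresh input tokens = 0.1533c gives 0.0003c per token; likewise 0.00003c per cache-read token, 0.000375c per cache-write token and 0.0015c per output token), so treat them as an assumption read off this page rather than a quoted provider price list. `Round`, `reconcile` and the dollar tolerance are hypothetical; because the backend estimator itself is not published here, the usage example reconciles against the displayed per-bucket cents instead.

```python
from dataclasses import dataclass

# Cents per token, back-derived from the receipt rows (assumption, not a quoted price list).
CENTS_PER_TOKEN = {
    "fresh_input": 0.0003,
    "cache_read": 0.00003,
    "cache_write": 0.000375,
    "output": 0.0015,
}


@dataclass
class Round:
    """Token counts for one model round of one answer (hypothetical type)."""
    fresh_input: int
    cache_read: int
    cache_write: int
    output: int

    def cost_cents(self) -> float:
        # Each bucket is billed as token count times its per-token price.
        return sum(getattr(self, bucket) * price for bucket, price in CENTS_PER_TOKEN.items())


def reconcile(rounds: list[Round], estimated_cents: list[float],
              tolerance_dollars: float = 0.0001) -> float:
    """Return the worst per-round |traced - estimated| gap in dollars; fail above tolerance."""
    if len(rounds) != len(estimated_cents):
        raise ValueError("need one estimate per round")
    worst = max(abs(r.cost_cents() - est) / 100 for r, est in zip(rounds, estimated_cents))
    if worst > tolerance_dollars:
        raise ValueError(f"reconciliation failed: worst delta ${worst:.6f}")
    return worst


# Usage: the first two receipt rows, checked against their displayed per-bucket cents.
q1 = [Round(511, 4081, 0, 182), Round(1, 4081, 5724, 830)]
displayed = [0.1533 + 0.1224 + 0.0000 + 0.2730, 0.0003 + 0.1224 + 2.1465 + 1.2450]
print(f"worst delta ${reconcile(q1, displayed):.7f}")
```

The only gap here is display rounding (4,081 cache-read tokens cost 0.12243c but print as 0.1224c), which is why a reconciliation check needs an explicit tolerance rather than exact equality.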
What we deliberately did not optimize
Programmatic tool calling
It moves tool results out of the model context. For grounded answers the model must see the raw data it cites, or quality drops.
Routing easy questions to a smaller model
We A/B tested a smaller model and it confidently misdiagnosed a disease. Answer quality is the product; we did not ship that risk.
Semantic caching of whole answers
State changes daily. Replaying a stale cached answer is how you lose a customer.
An audit that only finds savings is half an audit. Yours will also list the cuts NOT to make, with the quality evidence for each.
What this does for your company
You learn what each answer costs and which levers cut it, before paying for a sprint. (measured)
The same traces expose retries, grounding gaps and silent failures your monitoring does not bill for. (measured)
Ten working days from access to a ranked plan; industry teardown engagements commonly run a multiple of that. (stated)
The honest ROI math
Projected monthly savings: $12K to $36K
Projected yearly savings: $144K to $432K
Existence proof from our own system: without prompt caching, the 8 instrumented answers would cost 1.76x as much, so caching alone removes about 43 percent of the uncached bill (1 - 1/1.76 ≈ 0.43). (measured)
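Both aggregate figures, the 72.6 percent cache share and the 1.76x multiplier, can be recomputed from the 20 receipt rows. The sketch below copies the token counts, grouped into the 8 questions by where a fresh first round starts again, and uses the per-token prices implied by the rows. It assumes one specific counterfactual: with caching off, every input token (fresh, cache read and cache write) would be billed at the fresh-input rate. Under that assumption the arithmetic lands on 72.6 percent, 1.76x and about 43 percent; all names are hypothetical.

```python
# Token counts per round, copied from the receipt rows:
# (fresh_input, cache_read, cache_write, output)
ROUNDS = [
    # question 1
    (511, 4081, 0, 182), (1, 4081, 5724, 830),
    # question 2
    (527, 4081, 0, 191), (1, 4081, 921, 109), (1, 5002, 271, 119), (1, 5273, 4995, 821),
    # question 3
    (545, 4081, 0, 199), (1, 4081, 875, 122), (1, 4956, 198, 104), (1, 5154, 4731, 802),
    # question 4
    (419, 4081, 0, 123), (1, 4081, 1051, 882),
    # question 5
    (399, 4081, 0, 124), (1, 4081, 1054, 522),
    # question 6
    (499, 4081, 0, 126), (1, 4081, 4947, 1439),
    # question 7
    (505, 4081, 0, 107), (1, 4081, 1315, 624),
    # question 8
    (493, 4081, 0, 61), (1, 4081, 2303, 1292),
]

# Cents per token, back-derived from the rows (assumption, not a quoted price list).
FRESH, READ, WRITE, OUT = 0.0003, 0.00003, 0.000375, 0.0015


def cost_cents(fresh: int, read: int, write: int, out: int) -> float:
    """Total cost in cents for the given token counts per bucket."""
    return fresh * FRESH + read * READ + write * WRITE + out * OUT


# Column sums: total fresh, cache-read, cache-write and output tokens.
TOTALS = tuple(sum(col) for col in zip(*ROUNDS))
fresh, read, write, out = TOTALS

all_input = fresh + read + write
cache_share = read / all_input  # share of all input tokens served from cache
actual = cost_cents(fresh, read, write, out)
# Counterfactual (assumption): caching off, so every input token bills at the fresh rate.
uncached = cost_cents(all_input, 0, 0, out)
multiplier = uncached / actual
saved = 1 - actual / uncached  # equivalently 1 - 1/multiplier

print(f"cache share {cache_share:.1%}, multiplier {multiplier:.2f}x, saved {saved:.0%}")
```

Note the cache-read carry-over that makes the grouping unambiguous: within a question, each round's cache read equals the previous round's cache read plus its cache write (for example 4,081 + 921 = 5,002).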
The floor is the guarantee, not a projection: full refund if the audit identifies less than 30 percent in verifiable savings.
At this spend, engagements like this typically pay for themselves within weeks. Ask for pricing.
Why Wyrum
- We could not find another consultancy that publishes per-answer costs of its own production system, live and checkable. If you find one, send it; until then, the receipts are above. (measured)
- A money-back guarantee carries the risk, not your budget.
- The person who built and instrumented the system above is the person who audits yours. No handoffs.
Projections on this page are labeled "industry benchmark range, not a promise"; the only promises are the guarantees.
The guarantee
Full refund if the audit identifies less than 30 percent in verifiable savings. You either get a ranked plan worth multiples of the fee, or you do not pay.
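The refund condition reduces to a single comparison. A minimal sketch, assuming the 30 percent is measured as identified monthly savings over current monthly AI spend; the page does not name the base, so treat that as an assumption, and `refund_due` as a hypothetical name.

```python
def refund_due(monthly_ai_spend: float, identified_monthly_savings: float,
               floor: float = 0.30) -> bool:
    """Return True when the audit's verifiable savings fall below the floor.

    Assumption: savings are expressed as a share of current monthly AI spend.
    """
    if monthly_ai_spend <= 0:
        raise ValueError("monthly_ai_spend must be positive")
    return identified_monthly_savings / monthly_ai_spend < floor


# Usage: 10,000 found on 40,000 of spend is 25 percent, under the floor.
print(refund_due(40_000, 10_000))
```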
Ask for pricing
No prices on this site. You get a number and a scope in the first reply.
