Stop Apologizing for Your Token Bill

Let's kill a bad idea before it spreads: that rising token spend is a problem.

It isn't. It's the point.

Spending tokens means your AI is actually doing work. Real work. Coding, analyzing, running long workflows, finishing tasks that used to sit in someone's queue for a week. More tokens is more leverage. It's your team working smarter and faster. If your token graph is going up and to the right because people are getting more done, that's not a leak. That's the flywheel.

Nvidia CEO Jensen Huang put it bluntly on the All-In Podcast: "If that $500,000 engineer did not consume at least $250,000 worth of tokens, I'm going to be deeply alarmed." His point: an elite engineer who spends almost nothing on AI is like a chip designer who insists on paper and pencil instead of CAD tools. Read that again. The alarm isn't the spend. The alarm is the silence. Low token usage on expensive talent isn't thrift, it's a performance problem hiding in your budget. The "cut your AI bill" reflex starts to look like optimizing exactly the wrong number.

So why is everyone suddenly nervous?

Because Uber just lit their AI coding budget on fire and burned through 2026's allocation in four months. One company reportedly torched $500 million almost by accident. And quietly, every AI leader is doing the same math: we got hooked on consumption, and the providers know it. The fear isn't today's bill. It's the day prices climb and we're already addicted.

Fair fear. Wrong target.

What's actually torching your money

Your AI isn't learning from any of it. Every prompt, every correction, every "no, that's not what revenue means here" gets thrown away the second the session ends. So tomorrow it asks again. It rebuilds the same context from scratch. It burns tokens reconstructing knowledge it should have known cold by now.

That's the waste. Not the consumption. The amnesia.

Imagine an analyst who is brilliant on day one and shows up every morning with total memory loss. You'd re-explain the business, every day, and pay full price for it. That's most enterprise AI right now. Smart. Expensive. Forgetful.

The fix isn't spending less. It's spending once.

Capture the context, the definitions, the tribal knowledge, the corrections, and let the system actually keep them. (This is the whole argument behind shared context, and why it's the real moat.) Then every future prompt starts smarter instead of starting over. Your token yield, the useful output per token, goes up because you stopped paying to re-teach the same lesson on a loop.

Funny enough, Glean's own CEO, Arvind Jain, recently argued a version of this himself: that your token spend is an architecture problem, not a model problem, and that context quality is the real lever. He's right. He's just selling the wrong half of the solution.

Because this is why "enterprise search" was always a half-measure. Glean and friends got very good at finding your documents. But finding is not knowing.

Try it. Ask enterprise search "what was Q2 net revenue?" and watch what comes back: the board deck, last year's 10-Q, a finance Slack thread, and three spreadsheets named final_v2, final_FINAL, and final_actually_final. Four numbers, zero answers. It found the documents. Now you go figure out which one counts, whether it's net of refunds, and why EMEA books a quarter late. Again.

Ask a system that actually learns the same question, and it already knows. It knows your team excludes trials and internal accounts. It knows "fail" means a settlement break, not a failed login. It knows which of those four spreadsheets your CFO trusts, because someone corrected it once, and it kept that correction. Search makes you re-derive the truth every single time. Context remembers it.

That's the difference. Retrieval hands you links. It does not accumulate understanding.

Spend the tokens. Spend them proudly.

Just stop paying twice for context your AI should already own.

Mike Walsh

Stop Apologizing for Your Token Bill

So why is everyone suddenly nervous?

What's actually torching your money

The fix isn't spending less. It's spending once.

See PromptQL in action on your data.