14 Apr, 2026

3 MIN READ

Same Question. Different Intelligence.

There's a “tokenmaxxing” fever sweeping through AI teams right now: engineers flexing on leaderboards about how many tokens they can burn. Execs are starting to catch on that throwing more compute at a problem doesn't magically make the answers better.

But there's an opposite trap that's just as bad. Teams obsessing over per-token cost, always reaching for the cheapest model, thinking they're being smart about AI spend.

Both get it wrong.

What actually matters is: did you get the right answer, and what did it cost you to get there?

I ran an experiment using PromptQL that shows exactly what I mean. Same question. Two different models. The "expensive" one won, and it wasn't even close.

The Setup

I used PromptQL to ask the same question on my energy commodities dataset using both Claude Opus 4.6 and Haiku 4.5:

"What is the demand growth for ethylene, propylene, and butadiene in the Asia Pacific and Middle East regions, and how much of this growth is driven by the AI boom?"

It's a gnarly question. Petrochemical demand forecasting with a hypothesis about AI's impact on the supply chain. The kind of thing that would take an analyst hours to piece together.

Here's what happened.

Opus 4.6: The Experience

•  3 LLM calls to reach a complete answer

•  Picked the right polymers and tables immediately

•  Ran a historical disproof test (unprompted)

•  Delivered a formal per-polymer confidence matrix

•  Zero methodological corrections needed

Done. First try. Next problem.

Haiku 4.5: The Experience

•  16 LLM calls across 7 user turns

•  Mixed incompatible data sources from the start

•  Reported conflicting numbers: 7.55% one moment, 8.23% the next

•  I had to guide every analytical step

•  4 corrections before reaching the right answer

The Intelligence Gap
Let me break down what actually happened:

The difference wasn't just speed. Opus understood that the real question was about causality and not data retrieval. It asked better questions of the data, which meant it skipped all the dead ends that Haiku kept falling into.

But Wait. What About Cost?
Here's where it gets interesting.
Haiku 4.5 pricing: $1 / $5 per million tokens (input / output)
Opus 4.6 pricing: $5 / $25 per million tokens (input / output)

My Haiku session consumed:

•  196,174 input tokens

•  139,931 output tokens

•  Total: $0.90

My Opus session consumed:

•  68,790 input tokens

•  28,848 output tokens

•  Total: $1.07

Haiku used 3.4× more tokens and still cost 16% less in absolute terms.
But here's the kicker: Opus cost just $0.17 more and got it right first try.
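The arithmetic above is easy to verify yourself. Here's a minimal sketch that recomputes both session costs from the token counts and per-million-token prices quoted in this post (the `session_cost` helper is just for illustration):

```python
def session_cost(input_tokens, output_tokens, in_price, out_price):
    """Dollar cost of one session, given per-million-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Token counts from the two sessions above.
haiku = session_cost(196_174, 139_931, in_price=1, out_price=5)
opus = session_cost(68_790, 28_848, in_price=5, out_price=25)

print(f"Haiku: ${haiku:.2f}")  # ≈ $0.90
print(f"Opus:  ${opus:.2f}")   # ≈ $1.07
print(f"Token ratio: {(196_174 + 139_931) / (68_790 + 28_848):.1f}x")  # ≈ 3.4x
```

Run it and you get the same numbers: Haiku burns 3.4× the tokens, yet the gap in absolute cost is only about seventeen cents.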

The Real Math

We obsess over token price. But what about tokens used?

Every correction in the Haiku thread cost tokens twice: once to re-explain, once to re-execute. 16 calls vs 3. Hours of back-and-forth vs a clean answer.

When you factor in:

•  Human time spent on corrections

•  The cognitive load of steering a confused model

•  The risk of accepting a subtly wrong answer because you're tired of iterating

...that $0.17 starts looking like a bargain.

The Lesson

Don't optimize for token price. Optimize for tokens used.
Three things I learned from this:

1. Intelligence compounds. Opus didn't just answer faster; it reasoned better. Better reasoning = fewer dead ends = fewer tokens.

2. Corrections are expensive. Every time you have to re-explain something, you're paying twice.

3. Match model to task. This is the key insight.

But Haiku Has Its Place

Let me be clear: this isn't a "never use Haiku" post.
Haiku is excellent for:

•  Simple lookups and data retrieval

•  Classification tasks

•  Summarization

•  Entity extraction

•  High-volume, low-complexity operations

For these tasks, Haiku is not just cheaper; it's the right choice.
The problem isn't Haiku. The problem is using Haiku for everything.

How PromptQL Handles This

This is actually one of the things I love about building PromptQL.

PromptQL doesn't just use one model for everything. It intelligently routes different types of work to different models:

Complex analytical reasoning? That goes to the more capable model.

Wiki page selection? Classification? Summarization? Those go to a faster, lighter model.

This way, you get the best of both worlds: raw horsepower when you need it for complex reasoning, and speed + cost efficiency for the simpler tasks.

The system makes the decision for you. You don't have to think about which model to use. You just ask your question, and the right intelligence shows up.
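To make the routing idea concrete, here's a minimal sketch of task-based model selection. This is not PromptQL's actual code — the task categories and model names are hypothetical placeholders — but it captures the shape of the decision:

```python
# Illustrative sketch of task-based model routing. Model names and
# task categories are hypothetical, not PromptQL internals.
CAPABLE_MODEL = "capable-model"  # e.g. an Opus-class model
LIGHT_MODEL = "light-model"      # e.g. a Haiku-class model

# Low-complexity work that a lighter model handles well.
LIGHT_TASKS = {
    "classification",
    "summarization",
    "entity_extraction",
    "page_selection",
}

def route(task_type: str) -> str:
    """Send light tasks to the cheap model; default everything
    else (complex analytical reasoning) to the capable one."""
    return LIGHT_MODEL if task_type in LIGHT_TASKS else CAPABLE_MODEL

print(route("summarization"))         # light-model
print(route("analytical_reasoning"))  # capable-model
```

The design choice worth noting: the router defaults to the capable model, so unrecognized or ambiguous work pays a small cost premium rather than risking a wrong answer from the light model.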

The Question I'm Left With

For complex analytical queries, a more capable model costs slightly more but saves hours of back-and-forth.

So here's what I'm curious about:

Would you pay a slight premium for answers that just... work?

Srini Sankar

See PromptQL in action on your data.