Last week at GTC 2026, Jensen Huang shared a bold vision: treat token budgets like engineer compensation—allocate up to half of an engineer’s salary in AI spend to “amplify” their output 10x.
He reinforced this on the All-In Podcast, putting numbers behind it. If a $500K engineer spends only $5K a year on AI tokens, he said, “I will go ape.” His expectation? Closer to $250K in annual token consumption.
It’s a compelling idea—and it exposes a deeper tension in how AI systems are being built and monetized.
The Incentive Misalignment
I’ve built several LLM-based applications over the past year. One of them—designed to be highly efficient and almost entirely LLM-driven—is delivering strong ROI.
In multiple conversations with cloud provider sales teams, a consistent pattern emerged. The first question is almost always: “How many tokens are you using?”
When I explain that our systems are optimized for low token consumption, the energy in the conversation shifts. The focus quickly moves elsewhere.
This isn’t a one-off interaction—it reflects a broader pattern.
Not Just NVIDIA — A System-Wide Pattern
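It’s easy to frame this as NVIDIA’s perspective: tokens drive GPU demand. But the dynamic extends across the stack, including the cloud providers whose first question is about token volume. Builders, meanwhile, are under pressure to do the opposite: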
- Reduce waste
- Improve efficiency
- Maximize output per unit cost
This creates a fundamental tension:
Providers benefit from more tokens.
Builders benefit from fewer tokens (for the same outcome).
Where Real ROI Actually Comes From
In practice, the highest ROI systems I’ve seen don’t come from higher token spend—they come from better system design. Here are a few patterns that consistently drive results:
1. Curated Context > Broad Context
Sending more context does not equal better results. Carefully selecting high-quality, relevant sources consistently outperforms large, unfocused inputs.
Examples:
- Narrow RAG retrieval to top-k relevant documents
- Use curated knowledge bases instead of raw dumps
- Convert documents into structured representations before passing to LLMs
Result: Better accuracy, fewer tokens, higher ROI.
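As a rough illustration of narrowing retrieval to the top-k documents, here is a minimal sketch. It assumes a toy word-overlap score in place of a real embedding or search index, and the names `select_context`, `overlap_score`, and `STOPWORDS`, along with the sample documents, are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use better filtering).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "do", "i", "how", "on", "with"}

def tokenize(text: str) -> list[str]:
    """Lowercase word split with stopwords removed; a stand-in for a real tokenizer."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def overlap_score(query: str, doc: str) -> int:
    """Toy relevance: number of query-term occurrences that also appear in the doc."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(doc))
    return sum(min(q[t], d[t]) for t in q)

def select_context(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return at most k docs with non-zero relevance, best first."""
    scored = [(overlap_score(query, d), i, d) for i, d in enumerate(docs)]
    scored = [s for s in scored if s[0] > 0]
    # Sort by score descending, then by original position for stable ties.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [d for _, _, d in scored[:k]]

docs = [
    "Refund policy: refunds are issued within 14 days of purchase.",
    "Office hours are 9 to 5 on weekdays.",
    "To request a refund, open a ticket with your order number.",
    "Our mascot is a friendly otter.",
]
context = select_context("How do I request a refund?", docs, k=2)
```

Only the two refund-related documents reach the prompt here; the unrelated ones never cost a token.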
2. TOON: Token-Oriented Optimization Notion
Treat tokens as a design constraint, not a byproduct. This means:
- Designing prompts for precision, not verbosity
- Breaking workflows into smaller, targeted steps
- Avoiding monolithic “do everything” prompts
TOON is about maximizing value per token, not tokens per task.
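One way to make "tokens as a design constraint" concrete is to give each small workflow step its own prompt budget and fail fast when a prompt outgrows it. The sketch below is only an illustration: `estimate_tokens` uses a rough four-characters-per-token heuristic (an assumption, not a real tokenizer), and `Step` and `check_budgets` are hypothetical names.

```python
from dataclasses import dataclass

def estimate_tokens(text: str) -> int:
    """Rough estimate of ~4 characters per token (assumed heuristic, not exact)."""
    return max(1, len(text) // 4)

@dataclass
class Step:
    name: str
    prompt: str
    budget: int  # maximum estimated prompt tokens allowed for this step

def check_budgets(steps: list[Step]) -> dict[str, int]:
    """Return estimated tokens per step; raise ValueError if any step is over budget."""
    usage: dict[str, int] = {}
    for step in steps:
        used = estimate_tokens(step.prompt)
        if used > step.budget:
            raise ValueError(f"{step.name}: {used} tokens exceeds budget {step.budget}")
        usage[step.name] = used
    return usage

# Two small, targeted steps instead of one monolithic prompt.
steps = [
    Step("extract", "List the invoice number and total from the text below.", budget=50),
    Step("classify", "Label the invoice as PAID or UNPAID.", budget=50),
]
usage = check_budgets(steps)
```

In a real pipeline the heuristic would be swapped for the tokenizer that matches the model in use.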
3. Output Efficiency Is Underrated
Optimization doesn’t stop at input—output matters just as much. Generating responses in:
- Structured formats (JSON, schemas)
- Concise summaries
- Machine-friendly outputs
…reduces downstream processing and total token usage across pipelines. A tight 200-token structured output often beats a 2,000-token narrative.
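To illustrate why a compact structured reply is cheaper to consume downstream, here is a minimal sketch that parses and validates a short JSON response with the standard library. The schema (`category`, `priority`, `action`), the `TicketSummary` type, and the sample reply are assumptions made up for this example.

```python
import json
from dataclasses import dataclass

@dataclass
class TicketSummary:
    category: str
    priority: str
    action: str

ALLOWED_PRIORITIES = {"low", "medium", "high"}
EXPECTED_KEYS = {"category", "priority", "action"}

def parse_summary(raw: str) -> TicketSummary:
    """Parse a compact JSON reply; raise ValueError on anything unexpected."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        raise ValueError("reply does not match the expected schema")
    if data["priority"] not in ALLOWED_PRIORITIES:
        raise ValueError(f"invalid priority: {data['priority']!r}")
    return TicketSummary(**data)

# Hypothetical model reply: a few dozen tokens instead of a long narrative.
raw_reply = '{"category": "billing", "priority": "high", "action": "issue refund"}'
summary = parse_summary(raw_reply)
```

Because the reply is machine-readable, the next stage can branch on `summary.priority` directly instead of spending another LLM call to interpret prose.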
Efficiency Is the Real Multiplier
The emerging narrative suggests:
More tokens = more intelligence = more productivity
But in practice, what actually works is:
Better architecture = fewer tokens = higher ROI
The best systems:
- Use LLMs selectively, not everywhere
- Combine deterministic logic with LLM reasoning
- Eliminate unnecessary generation altogether
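A minimal sketch of combining deterministic logic with an LLM: try a cheap rule first and call the model only when the rule fails. The order-ID pattern, `extract_order_id`, and the `fake_llm` stub are hypothetical; a real system would pass in its own model client as the fallback.

```python
import re
from typing import Callable, Optional

# Assumed ID format for illustration: two capital letters, a dash, six digits.
ORDER_ID = re.compile(r"\b[A-Z]{2}-\d{6}\b")

def extract_order_id(
    text: str, llm_fallback: Callable[[str], Optional[str]]
) -> tuple[Optional[str], str]:
    """Return (order_id, source). Uses the regex first and the LLM only on a miss."""
    match = ORDER_ID.search(text)
    if match:
        return match.group(0), "regex"
    return llm_fallback(text), "llm"

# Stub that records calls so we can see how often the expensive path is used.
llm_calls: list[str] = []

def fake_llm(text: str) -> Optional[str]:
    llm_calls.append(text)
    return None

hit = extract_order_id("Order AB-123456 arrived damaged", fake_llm)
miss = extract_order_id("My package never showed up", fake_llm)
```

Here the first message is resolved without any model call, and only the second reaches the fallback.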
The Real Question: Spend More or Spend Smarter?
Jensen’s vision of AI amplification is directionally right—AI will absolutely 10x engineers. But the key question isn’t:
“How many tokens are you consuming?”
It’s:
“How much value are you generating per token?”
Because at scale:
- Increasing token usage scales cost linearly.
- Increasing efficiency widens the margin on every call, so the advantage compounds as volume grows.
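The arithmetic behind this can be sketched in a few lines. All numbers below are hypothetical (a made-up price of $10 per million tokens and one million calls per year); the 2,000 and 200 token figures simply reuse the narrative-versus-structured comparison from earlier.

```python
def annual_cost(tokens_per_call: int, calls_per_year: int, usd_per_million_tokens: float) -> float:
    """Token cost grows linearly with tokens per call and with call volume."""
    return tokens_per_call * calls_per_year / 1_000_000 * usd_per_million_tokens

def value_per_token(value_usd: float, tokens: int) -> float:
    """Value generated per token consumed, for the same outcome."""
    return value_usd / tokens

CALLS_PER_YEAR = 1_000_000   # hypothetical volume
PRICE = 10.0                 # hypothetical USD per million tokens

verbose_cost = annual_cost(2_000, CALLS_PER_YEAR, PRICE)
lean_cost = annual_cost(200, CALLS_PER_YEAR, PRICE)
savings = verbose_cost - lean_cost

# Same hypothetical $1 of value per call, delivered with 10x fewer tokens.
verbose_vpt = value_per_token(1.0, 2_000)
lean_vpt = value_per_token(1.0, 200)
```

Under these assumptions the lean design costs a tenth as much per year and delivers ten times the value per token for an identical outcome.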
Final Thought
We’re seeing two economies evolve:
- The compute economy, driven by token consumption.
- The application economy, driven by efficiency and outcomes.
And they are not perfectly aligned.
The builders who win won’t be the ones who spend the most on tokens. They’ll be the ones who figure out:
How to get 10x results with 1/10th the tokens.
That’s the real leverage—and it’s still undervalued.