From tokenmaxxing to token minimalism

With a great number of tokens comes great responsibility

Jun 24, 2026

👋 Hi, I’m Thomas. Welcome to a new edition of Beyond Runtime, where I dive into the messy, fascinating world of distributed systems, debugging, AI, and system design. All through the lens of a CTO with 20+ years in the backend trenches.

QUOTE OF THE WEEK:

“The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it.” — Alberto Brandolini

For a few strange quarters, parts of the tech industry decided that the best way to prove AI adoption was to count how many tokens your engineers burned.

Meta built a leaderboard ranking 85,000+ employees by token consumption, handing out titles like “Session Immortal” and “Token Legend.” Microsoft ran something similar. Salesforce set minimum monthly token spend targets and made everyone’s numbers visible to their teammates.

And Goodhart’s Law did what it always does. The moment token usage became the target, it stopped measuring anything useful. Engineers started asking AI questions already covered in documentation, prototyping features they had no intention of shipping, and defaulting to agents for tasks they could have done faster by hand. All to avoid being seen as insufficiently AI-native.

We’d been here before (see lines of code and story points). Every time the industry latches onto a proxy for productivity, it optimizes for the proxy and loses sight of the outcome.

The reckoning

Then the invoices showed up.

Uber burned through its entire 2026 Claude Code budget in four months. COO Andrew Macdonald described learning about the budget blowout as a “head-exploding moment.” After talking with senior engineering leaders, he came away unconvinced the spend was translating into outcomes: “That link is not there yet, […] maybe implicitly there is more that is getting shipped, but it’s very hard to draw a line between one of those stats and, ‘Okay, now we’re actually producing 25% more useful consumer features.’”

Uber isn’t alone. Duolingo reversed a policy that had tied employee performance reviews to AI usage, after staff raised concerns that the metric rewarded tool adoption rather than actual results. Microsoft revoked developer Claude Code licenses months after enabling them.

J.R. Storment of the FinOps Foundation described hearing from companies that they were three times over their entire 2026 token budget: “We started hearing existential crises, and the whole conversation shifted from tokenmaxxing and ‘go fast’ to ‘we need guardrails, how do we control this?’”

The data made it worse. A two-year study of 20,000 developers by Faros AI found that output was rising, but so were bugs and rewrites. Research by Jellyfish found that engineers who used the most tokens were about twice as productive as those who used AI less, but they spent ten times the tokens to get there.
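To put the Jellyfish figures in per-token terms, here is a back-of-the-envelope calculation. It assumes that "twice as productive" means twice the output, and it normalizes the lighter users to one unit of output per unit of tokens. Both are simplifying assumptions for illustration, not numbers from the research.

```python
# Normalized baseline for lighter AI users (assumed units, not real data).
baseline_output = 1.0
baseline_tokens = 1.0

# Heaviest token users, per the figures quoted above:
# about 2x the output for 10x the tokens.
heavy_output = 2.0 * baseline_output
heavy_tokens = 10.0 * baseline_tokens

baseline_efficiency = baseline_output / baseline_tokens
heavy_efficiency = heavy_output / heavy_tokens

# How much output each token buys, relative to the baseline group.
relative_efficiency = heavy_efficiency / baseline_efficiency
print(f"Output per token vs. baseline: {relative_efficiency:.0%}")
```

Under those assumptions, each token spent by the heaviest users buys roughly a fifth of the output it buys for everyone else.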

On X, Aiswarya Sankar put a practitioner frame on it: “Try justifying spending $100k on token spend when only $18k even makes it to a stable prod feature. In the rush to maximize AI token spend, companies are wasting over 44% on bug fixes.”

Her post quoted Ed Zitron (@edzitron): “Uber’s COO has said that it’s getting ‘harder to justify’ its AI costs because there was no way to show a link between AI spend and any meaningful increase in useful features. This is the first time I’ve seen a company say this directly.” https://t.co/xUhZvtpwah

What’s next

The industry is now scrambling to instrument, audit, and govern token spend. The Linux Foundation announced the Tokenomics Foundation, a new standards body modeled on FinOps for cloud. Startups are rushing out token observability tools. Established vendors like Datadog and New Relic are tacking on token-level monitoring.

Measurement and governance are reasonable, and probably useful. But I think the more interesting shift is cultural.

I came across a conversation recently between Boris Cherny (Head of Claude Code) and Cat Wu (Head of Product, Claude Code). They were reflecting on a year of development, and check out what they said about context engineering:

Boris: You know, people used to talk about prompt engineering, then context engineering. This was sort of matching where the model was at the time. Back in the days of Sonnet 3.5 you had to prompt engineer. Back in the days of Opus 4 you had to context engineer. But with the models of today you don’t do any of this. You give it the minimal possible system prompt, the minimal possible tools, and then you let the model figure it out. You just have to give the model somewhere to pull in the context. I think that’s the most important thing. How do you think about it?
Cat: I see things very similarly. I’m a context minimalist, so my general philosophy is: tell the model only what it needs to know, and let it figure out the rest. I think when you give models too much context it’s kind of like you’re micromanaging them. And sometimes the model knows a better way to get to the same outcome. I personally prefer to give the model the freedom to do that. And in general we’re also making our harness more lean, so that you have more room for your own prompt and so that it follows your prompts better.

What makes this exchange interesting to me, beyond the practical advice, is what it signals about the maturity curve of working with AI.

The early days of Claude Code required careful prompt engineering: you had to spell out exactly what you wanted and how. Then came context engineering: curating the right files, the right system prompts, the right framing. The people who built Claude Code are now telling you that the model no longer needs as much guidance. The instinct to add more context, more tools, more instructions is working against you.

This reminds me of how we learned to work with the cloud. In the early days, we lifted and shifted: took what worked on-premises and moved it wholesale. Then we realized the cloud had different economics and redesigned around autoscaling, managed services, and ephemeral infrastructure. The teams that got the most out of the cloud were the ones that understood how to use it differently.

We’re at a similar inflection point with AI tools, which is why I’m calling the next phase of AI tool management “token minimalism” (or, for those who prefer a more pragmatic framing, “token optimization”).

Yes, there will be governance dashboards and spend tracking. But, more importantly, token minimalism will also mean a genuine rethinking of how we work with these tools: what data we prepare for them, how we select which model does which job, and how much we trust them to reason rather than trying to pre-load the answer.

Token minimalism, in practice

Token minimalism starts with at least two things that most teams haven't addressed yet:

Better data, designed for agents from the start. Setting aside the deliberate token waste caused by tokenmaxxing, agents waste tokens when they work from wrong or incomplete inputs: sampled and aggregated observability data, siloed context that stops at system boundaries, etc.

An agent is only as good as the environment it operates in, and traditional observability simply wasn’t designed for agents. By structuring the data layer natively for machine consumption, you give the agent a curated, correlated, session-scoped package of exactly what it needs to understand your system and how to fix it. Precision beats volume.
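As a rough illustration of what a "curated, correlated, session-scoped package" could look like, here is a minimal sketch. Every name in it (`Event`, `SessionContext`, `build_session_context`, the field names, the sample services) is hypothetical and invented for this example; it is not any vendor's schema, just one way to show the idea of filtering raw telemetry down to a single session before an agent ever sees it.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One raw telemetry record (hypothetical shape)."""
    session_id: str
    service: str
    kind: str      # e.g. "request", "log", "error"
    message: str


@dataclass
class SessionContext:
    """Curated, correlated package handed to an agent for one session."""
    session_id: str
    services: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)
    timeline: list[str] = field(default_factory=list)


def build_session_context(events: list[Event], session_id: str) -> SessionContext:
    ctx = SessionContext(session_id=session_id)
    for event in events:
        if event.session_id != session_id:
            continue  # scope: drop everything outside this session
        if event.service not in ctx.services:
            ctx.services.append(event.service)  # correlate across services
        ctx.timeline.append(f"{event.service}: {event.kind}: {event.message}")
        if event.kind == "error":
            ctx.errors.append(f"{event.service}: {event.message}")
    return ctx


# Made-up sample data: two sessions interleaved across three services.
events = [
    Event("s1", "checkout-ui", "request", "POST /orders"),
    Event("s2", "search", "log", "cache miss"),
    Event("s1", "orders-api", "error", "payment provider timeout"),
]
ctx = build_session_context(events, "s1")
print(ctx.services)
print(ctx.errors)
```

The agent receives only the two events from session `s1`, already stitched across the frontend and backend, instead of paying tokens to read and discard the rest.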

Right model for the right task. You don’t need a frontier model to write a docstring or format a config file. Instead, match the model to the task: for example, local models for routine generation and frontier models for reasoning about complex system failures.
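Matching model to task can start as something as plain as a lookup. The sketch below is one possible shape, not a recommended implementation: the task categories, the `Tier` enum, and the `pick_tier` function are all hypothetical, and the choice to send unknown tasks to the frontier tier is an assumption that a cost-first team might reverse.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "local"
    FRONTIER = "frontier"


# Hypothetical task categories; a real team would define its own.
ROUTINE_TASKS = {"docstring", "config_format", "rename"}
COMPLEX_TASKS = {"incident_analysis", "cross_service_debugging"}


def pick_tier(task_type: str) -> Tier:
    """Route routine generation to a local model, complex reasoning to a frontier model."""
    if task_type in ROUTINE_TASKS:
        return Tier.LOCAL
    if task_type in COMPLEX_TASKS:
        return Tier.FRONTIER
    # Unknown tasks default to the frontier tier. This is an assumption
    # that trades cost for safety; the opposite default is also defensible.
    return Tier.FRONTIER


for task in ("docstring", "incident_analysis", "schema_migration"):
    print(task, "->", pick_tier(task).value)
```

Even a table this crude makes the routing decision explicit and reviewable, rather than leaving it to whichever model happens to be the default in someone's editor.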

Final thoughts

Tokenmaxxing was always going to end. Goodhart’s Law doesn’t fail.

What replaces it is the more interesting question. And I think the answer is practitioners who treat token spend the way good engineers treat any other resource: as something to use precisely.

💜 This newsletter is sponsored by Multiplayer.app, the debugging agent for developers.

Try it for free for 1 month

📚 Interesting Articles & Resources

AI-generated abandonware is hollowing out open source - Charles Humble

93% of codebases contain components with no development activity in the last two years. AI is making this worse through PR slop and because its workflows short-circuit the engagement loop that sustains open source: no documentation reads, no forum questions, no community traffic. When everyone can build, the scarce resource is maintainers.

Hardwood: A New Parser for Apache Parquet - Gunnar Morling

Gunnar Morling built Hardwood, a fast, dependency-free Parquet parser for Java, and used Claude Code extensively throughout. His take on AI is one of the most honest framings I’ve seen: AI is a tool and a genuine productivity booster, but you still need to understand the code, own it, and catch what the model papers over. Vibe-coding a parser is possible. Building one that’s correct, fast, and maintainable still takes a real engineer.
