From tokenmaxxing to token minimalism
With a great number of tokens comes great responsibility
Hi, I'm Thomas. Welcome to a new edition of Beyond Runtime, where I dive into the messy, fascinating world of distributed systems, debugging, AI, and system design. All through the lens of a CTO with 20+ years in the backend trenches.
QUOTE OF THE WEEK:
"The amount of energy needed to refute bullshit is an order of magnitude bigger than that needed to produce it." - Alberto Brandolini
For a few strange quarters, parts of the tech industry decided that the best way to prove AI adoption was to count how many tokens your engineers burned.
Meta built a leaderboard ranking 85,000+ employees by token consumption, handing out titles like "Session Immortal" and "Token Legend." Microsoft ran something similar. Salesforce set minimum monthly token spend targets and made everyone's numbers visible to their teammates.
And Goodhart's Law did what it always does. The moment token usage became the target, it stopped measuring anything useful. Engineers started asking AI questions already covered in documentation, prototyping features they had no intention of shipping, and defaulting to agents for tasks they could have done faster by hand. All to avoid being seen as insufficiently AI-native.
We'd been here before (see lines of code and story points). Every time the industry latches onto a proxy for productivity, it optimizes for the proxy and loses sight of the outcome.
The reckoning
Then the invoices showed up.
Uber burned through its entire 2026 Claude Code budget in four months. COO Andrew Macdonald described learning about the budget blowout as a "head-exploding moment." After talking with senior engineering leaders, he came away unconvinced the spend was translating into outcomes: "That link is not there yet, […] maybe implicitly there is more that is getting shipped, but it's very hard to draw a line between one of those stats and, 'Okay, now we're actually producing 25% more useful consumer features.'"
Uber isn't alone. Duolingo reversed a policy that had tied employee performance reviews to AI usage, after staff raised concerns that the metric rewarded tool adoption rather than actual results. Microsoft revoked developer Claude Code licenses months after enabling them.
J.R. Storment of the FinOps Foundation described hearing from companies that they were three times over their entire 2026 token budget: "We started hearing existential crises, and the whole conversation shifted from tokenmaxxing and 'go fast' to 'we need guardrails, how do we control this?'"
The data made it worse. A two-year study of 20,000 developers by Faros AI found that output was rising, but so were bugs and rewrites. Research by Jellyfish found that engineers who used the most tokens were about twice as productive as those who used AI less, but they spent ten times the tokens to get there.
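Taking the Jellyfish figures at face value, a quick back-of-the-envelope calculation (my own reading, not a number reported by the study) shows what that trade means per unit of output:

```latex
\frac{\text{tokens per unit of output (heaviest users)}}{\text{tokens per unit of output (lighter users)}}
  \approx \frac{10 / 2}{1 / 1} = 5
```

In other words, under those figures each unit of output from the heaviest users cost roughly five times as many tokens as it did for lighter users.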
On X, Aiswarya Sankar put a practitioner frame on it: "Try justifying spending $100k on token spend when only $18k even makes it to a stable prod feature. In the rush to maximize AI token spend, companies are wasting over 44% on bug fixes."
What's next
The industry is now scrambling to instrument, audit, and govern token spend. The Linux Foundation announced the Tokenomics Foundation, a new standards body modeled on FinOps for cloud. Startups are rushing out token observability tools. Established vendors like Datadog and New Relic are tacking on token-level monitoring.
Measurement and governance are very reasonable and probably useful. But I think the more interesting shift is cultural.
I came across a conversation recently between Boris Cherny (Head of Claude Code) and Cat Wu (Head of Product, Claude Code). They were reflecting on a year of development, and check out what they said about context engineering:
Boris: You know, people used to talk about prompt engineering, then context engineering. This was sort of matching where the model was at the time. Back in the days of Sonnet 3.5 you had to prompt engineer. Back in the days of Opus 4 you had to context engineer. But with the models of today you don't do any of this. You give it the minimal possible system prompt, the minimal possible tools, and then you let the model figure it out. You just have to give the model somewhere to pull in the context. I think that's the most important thing. How do you think about it?
Cat: I see things very similarly. I'm a context minimalist, so my general philosophy is: tell the model only what it needs to know, and let it figure out the rest. I think when you give models too much context it's kind of like you're micromanaging them. And sometimes the model knows a better way to get to the same outcome. I personally prefer to give the model the freedom to do that. And in general we're also making our harness more lean, so that you have more room for your own prompt and so that it follows your prompts better.
Beyond the practical advice, what makes this exchange interesting to me is what it signals about the maturity curve of working with AI.
The early days of Claude Code required careful prompt engineering: you had to spell out exactly what you wanted and how. Then came context engineering: curating the right files, the right system prompts, the right framing. Now the people who built Claude Code are telling you that the model doesn't need as much guidance. The instinct to add more context, more tools, and more instructions is working against you.
This reminds me of how we learned to work with the cloud. In the early days, we lifted and shifted: took what worked on-premise and moved it wholesale. Then we realized the cloud had different economics and redesigned around autoscaling, managed services, and ephemeral infrastructure. The teams that got the most out of the cloud were the ones that understood how to use it differently.
We're at a similar inflection point with AI tools, which is why I'm calling the next phase of AI tool management "token minimalism" (or, for those who prefer a more pragmatic framing, "token optimization").
Yes, there will be governance dashboards and spend tracking. But, more importantly, token minimalism will be a genuine rethinking of how we work with these tools: what data we prepare for them, how we select which model does which job, and how much we trust them to reason rather than trying to pre-load the answer.
Token minimalism, in practice
Token minimalism starts with at least two things that most teams haven't addressed yet:
Better data, designed for agents from the start. Setting aside the deliberate token waste caused by tokenmaxxing, agents waste tokens when they work from wrong or incomplete inputs: sampled and aggregated observability data, siloed context that stops at system boundaries, etc.
An agent is only as good as the environment it operates in, and traditional observability simply wasn't designed for agents. By structuring the data layer natively for machine consumption, you give the agent a curated, correlated, session-scoped package of exactly what it needs to understand your system and work out how to fix it. Precision beats volume.
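To make "curated, correlated, session-scoped" a bit more concrete, here is a minimal sketch in Python. It assumes a hypothetical event shape (`TelemetryEvent` with `session_id`, `service`, `timestamp`, `kind`, `message`) and a hypothetical `build_context_package` helper; neither is any vendor's API. The idea is to keep only one session's events, order them across service boundaries, cap how much each service contributes, and always keep errors, so the agent gets a small, relevant package instead of a firehose.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TelemetryEvent:
    # Hypothetical event shape for illustration; real telemetry schemas differ.
    session_id: str
    service: str
    timestamp: float  # seconds since epoch
    kind: str         # e.g. "log", "span", "error"
    message: str


def build_context_package(events, session_id, max_events_per_service=20):
    """Return a compact, session-scoped view of telemetry for an agent (hypothetical helper)."""
    # Keep only this session's events, in time order across all services.
    session_events = sorted(
        (e for e in events if e.session_id == session_id),
        key=lambda e: e.timestamp,
    )
    # Cap each service's contribution, but never drop an error.
    per_service = defaultdict(list)
    for event in session_events:
        bucket = per_service[event.service]
        if event.kind == "error" or len(bucket) < max_events_per_service:
            bucket.append(event)
    # Re-merge into one timeline so the agent sees the cross-service sequence.
    timeline = sorted(
        (e for bucket in per_service.values() for e in bucket),
        key=lambda e: e.timestamp,
    )
    return {
        "session_id": session_id,
        "services": sorted(per_service),
        "errors": [e.message for e in timeline if e.kind == "error"],
        "timeline": [f"{e.timestamp:.3f} {e.service} {e.kind}: {e.message}" for e in timeline],
    }


if __name__ == "__main__":
    events = [
        TelemetryEvent("s1", "api", 1.0, "span", "POST /checkout"),
        TelemetryEvent("s2", "api", 1.1, "span", "GET /health"),
        TelemetryEvent("s1", "payments", 1.2, "error", "card declined: timeout"),
    ]
    print(build_context_package(events, "s1"))
```

The per-service cap is one possible way to bound the package size; a real system might instead budget by estimated tokens or rank events by relevance.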
Right model for the right task. You don't need a frontier model to write a docstring or format a config file. Instead, match the model to the task: for example, local models for routine generation and frontier models for reasoning about complex system failures.
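A minimal routing sketch in Python, assuming two hypothetical tiers ("local" and "frontier") and hypothetical task categories; none of these names come from a real routing API. It sends routine tasks to the cheaper local tier when the context fits, and everything else to the frontier tier.

```python
from enum import Enum


class Tier(Enum):
    LOCAL = "local"        # hypothetical: small model running on developer hardware
    FRONTIER = "frontier"  # hypothetical: large hosted model, highest cost per token


# Hypothetical task categories; tune these to your own workload.
ROUTINE_TASKS = {"docstring", "format_config", "rename", "boilerplate"}
REASONING_TASKS = {"incident_analysis", "cross_service_debug", "architecture_review"}


def choose_tier(task_type: str, context_tokens: int, local_context_limit: int = 8_000) -> Tier:
    """Pick the cheapest tier that is plausibly good enough for the task."""
    if task_type in REASONING_TASKS:
        return Tier.FRONTIER
    if task_type in ROUTINE_TASKS and context_tokens <= local_context_limit:
        return Tier.LOCAL
    # Unknown tasks, or routine tasks too large for the local model's context,
    # fall back to the frontier tier rather than risk a wasted retry.
    return Tier.FRONTIER


if __name__ == "__main__":
    print(choose_tier("docstring", 500))          # routine and small: local
    print(choose_tier("incident_analysis", 500))  # reasoning-heavy: frontier
    print(choose_tier("docstring", 50_000))       # routine but too large: frontier
```

Defaulting unknown work to the frontier tier is a conservative choice; a team optimizing harder for spend might try the local tier first and escalate only when the output fails a check.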
Final thoughts
Tokenmaxxing was always going to end. Goodhart's Law doesn't fail.
What replaces it is the more interesting question. And I think the answer is practitioners who treat token spend the way good engineers treat any other resource: as something to use precisely.
This newsletter is sponsored by Multiplayer.app, the debugging agent for developers.
Interesting Articles & Resources
AI-generated abandonware is hollowing out open source - Charles Humble
93% of codebases contain components with no development activity in the last two years. AI is making this worse through PR slop and because its workflows short-circuit the engagement loop that sustains open source: no documentation reads, no forum questions, no community traffic. When everyone can build, the scarce resource is maintainers.
Hardwood: A New Parser for Apache Parquet - Gunnar Morling
Gunnar Morling built Hardwood, a fast, dependency-free Parquet parser for Java, and used Claude Code extensively throughout. His take on AI is one of the most honest framings I've seen: AI is a tool and a genuine productivity booster, but you still need to understand the code, own it, and catch what the model papers over. Vibe-coding a parser is possible. Building one that's correct, fast, and maintainable still takes a real engineer.




