You don’t need more dashboards
You need an observability framework that helps teams fix problems faster and build systems with confidence
👋 Hi, I’m Thomas. Welcome to a new edition of Beyond Runtime, where I dive into the messy, fascinating world of distributed systems, debugging, AI, and system design. All through the lens of a CTO with 20+ years in the backend trenches.
QUOTE OF THE WEEK:
“Observability will not _replace_ monitoring, it will _augment_ monitoring.” - Ben Sigelman
When we talk about “observability,” the conversation often drifts straight into tools: logs, metrics, dashboards, traces. But tools alone aren’t enough. Without a framework to unify them, you end up with scattered data and fragmented visibility.
The real power of observability shows up during incident management. Picture this: a critical service goes down, dashboards are flashing red, and your team scrambles to figure out what’s wrong. Without a common framework, engineers waste precious time piecing together clues: one person sifting through logs, another digging into metrics, someone else chasing traces. The result? Delays, frustration, and sometimes blind guesses.
An observability framework changes that. It consolidates telemetry into a consistent, unified view of your system. Instead of chasing data across tools, engineers can see the bigger picture at once, making problem resolution faster and more reliable.
But faster incident response is only part of the story. There are deeper, long-term benefits to establishing an observability framework.
Better fixes, fewer band-aids
Without observability, teams often fall back on quick but shallow fixes: restarting a service, scaling up resources, or applying a patch without really understanding the root cause. It works … until the same issue reappears days later.
With a framework in place, teams get detailed, correlated insight into how their systems actually behave. That visibility lets them identify the underlying causes of problems, not just the symptoms. They can also catch subtle performance degradations before they escalate, turning firefighting into proactive optimization.
The result: more permanent solutions, fewer recurring incidents, and higher-quality systems overall.
Empowering the whole team
One of the most overlooked benefits of an observability framework is how it changes team dynamics. Traditionally, system visibility has been the domain of operations or SREs. Developers often throw code over the wall and wait for feedback.
With an observability framework, that barrier breaks down. Everyone (developers, QA, architects) can see how code behaves in production. Developers can quickly diagnose issues in their own services, design more resilient features, and make better architectural decisions. In other words, observability democratizes knowledge. It empowers teams to take ownership of reliability, instead of siloing it.
Optimizing costs and resources
Observability also pays off in dollars and cents. With a clear view into how services behave, organizations can identify overprovisioned infrastructure, spot waste, and make smarter scaling decisions. You stop guessing about capacity and start making data-driven calls. That means better performance for users and better control of infrastructure spend.
Why OpenTelemetry matters
Of course, none of this works without a standard. That’s where OpenTelemetry comes in. It’s now the leading open-source framework for collecting logs, metrics, and traces in a consistent, vendor-neutral way.
With OpenTelemetry, you don’t have to worry about vendor lock-in or fragmented instrumentation. It offers automatic hooks for popular frameworks and libraries, cross-language consistency, and full support for the three pillars of telemetry. In short, it’s the foundation most modern observability frameworks are built on.
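To make that concrete, here is a minimal sketch of manual instrumentation with the OpenTelemetry Python SDK (the `opentelemetry-api` and `opentelemetry-sdk` packages); the service and span names are illustrative placeholders, not anything the project prescribes:

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Register a global tracer provider; spans go to the console here,
# but in production you would export to a collector or backend.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("checkout-service")  # illustrative service name

def process_order(order_id: str) -> None:
    # Each unit of work becomes a span; attributes carry business context.
    with tracer.start_as_current_span("process_order") as span:
        span.set_attribute("order.id", order_id)
        # ... business logic ...

process_order("order-123")
```

Automatic instrumentation (for example, via the `opentelemetry-instrument` wrapper) layers spans onto popular frameworks and HTTP clients without touching application code.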
How to get started
Rolling out observability isn’t a flip-the-switch project — it’s best approached in phases:
Phase 1: Service Assessment: Begin by identifying critical services that require immediate observability. Establish Service Level Objectives (SLOs) for these components and determine which metrics and traces will help monitor these objectives. This targeted approach ensures resources are focused on high-impact areas first.
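As a purely illustrative example of what “establish SLOs” means in practice: a 99.9% availability target over 30 days implies an error budget of roughly 43 minutes of downtime, or 0.1% of requests. A tiny sketch, with all numbers assumed rather than taken from any real service:

```python
# Hypothetical SLO for a critical service: 99.9% of requests succeed
# over a rolling 30-day window. Numbers are illustrative only.
SLO_TARGET = 0.999

def error_budget_remaining(total_requests: int, failed_requests: int) -> float:
    """Fraction of the error budget still unspent in the current window."""
    allowed_failures = total_requests * (1 - SLO_TARGET)
    if allowed_failures == 0:
        return 1.0
    return max(0.0, 1 - failed_requests / allowed_failures)

# 10M requests with 6,000 failures leaves 40% of the budget.
print(error_budget_remaining(10_000_000, 6_000))  # 0.4
```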
Phase 2: Instrumentation Deployment: Deploy OpenTelemetry instrumentation across selected services. This includes setting up automatic instrumentation where available and adding custom instrumentation for business-specific metrics. Ensure proper configuration of sampling rates and data filtering to manage costs and storage requirements.
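Here is a hedged sketch of what “custom instrumentation” and “sampling rates” can look like with the OpenTelemetry Python SDK; the meter name, the counter, and the 10% sampling ratio are assumptions for illustration:

```python
from opentelemetry import metrics, trace
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import TraceIdRatioBased

# Head-based sampling: keep roughly 10% of traces to control volume and cost.
trace.set_tracer_provider(TracerProvider(sampler=TraceIdRatioBased(0.1)))

# A business-specific metric that automatic instrumentation won't emit for you.
metrics.set_meter_provider(MeterProvider())
meter = metrics.get_meter("payments-service")  # illustrative name
orders_completed = meter.create_counter(
    "orders.completed",
    description="Successfully completed orders",
)

def complete_order(region: str) -> None:
    orders_completed.add(1, {"region": region})
```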
Phase 3: Data Pipeline Setup: Establish reliable data collection and storage pipelines. Configure the OpenTelemetry Collector to process and route telemetry data to appropriate backends. Implement data retention policies and ensure scalability of storage solutions to handle increasing data volumes.
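The Collector itself is configured in YAML (receivers, processors, exporters), but on the application side, pointing the SDK at a Collector over OTLP looks roughly like this; the endpoint and service name are assumptions:

```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Tag telemetry with a service name so the Collector can route it
# and backends can group it.
resource = Resource.create({"service.name": "checkout-service"})
provider = TracerProvider(resource=resource)
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
```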
Phase 4: Tool Integration: Connect observability data with analysis tools and dashboards. Create standardized visualizations for common use cases while enabling ad-hoc query capabilities. Set up alerting rules based on established SLOs and ensure proper notification channels are in place.
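And as one hedged example of an SLO-based alert rule: a burn-rate check of the kind popularized in Google’s SRE Workbook (the 14.4x threshold is that book’s example, not a value from this article):

```python
def should_page(error_rate: float, slo_target: float = 0.999,
                burn_rate_threshold: float = 14.4) -> bool:
    """Page when errors consume the budget far faster than the SLO allows."""
    budget = 1 - slo_target             # e.g. 0.1% of requests may fail
    burn_rate = error_rate / budget     # multiples of the budget being spent
    return burn_rate >= burn_rate_threshold

# A 2% error rate against a 99.9% SLO burns the budget 20x too fast: page.
print(should_page(0.02))  # True
```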
Final Thoughts
An observability framework is ultimately a way of working, not just a set of tools. To succeed, teams need more than dashboards: they need training, documented troubleshooting workflows, clear guidelines for adding instrumentation, and regular reviews to ensure the framework evolves with the system.
Done right, observability is a feedback loop that improves reliability, empowers developers, and gives organizations the confidence to scale.
💜 This newsletter is sponsored by Multiplayer.app.
Full stack session recording. End-to-end visibility in a single click.
I originally wrote about this topic in a longer deep-dive overview. Read the original for more information about:
Why implement an observability framework?
Key components of an observability framework
Why OpenTelemetry?
Implementing an observability framework
📚 Interesting Articles & Resources
AI Engineers and the Hot Vibe Code Summer - Kate Holterhoff
This article explores the rising role of the “AI Engineer,” not as hype, but as a practical new sub-discipline emerging at the intersection of traditional software development and machine learning. Holterhoff argues that AI Engineers focus less on building foundational models and more on integrating, orchestrating, and fine-tuning AI within real-world applications (sparked by increased startup demand and evolving tooling). Does that mean we all are (or will be) AI Engineers?
No, AI is not Making Engineers 10x as Productive - Colton Voege
Voege confronts the myth that AI instantly turns developers into “10× engineers.” He reflects on the anxiety this narrative creates and pushes back with nuance: AI tools might speed up certain coding tasks by 20–50%, but organizational complexity and real-world workflows mean those gains rarely add up to an order-of-magnitude improvement.