Concept

Latency Load

The additional work that delay creates. Not the work delayed. The new work the delay generates while the original work waits.

Definition

Latency Load is the additional work, complexity, coordination, recontextualization, and risk created by the passage of time between when work becomes actionable and when it is actually completed.

The original work is one cost. Latency Load is the second cost the system pays for holding that work in queue. In a low-latency system, Latency Load is a rounding error. In a high-latency system, it consumes capacity at a rate that approaches or exceeds the capacity available to do new work. At that point, the organization is no longer producing. It is governing the queue.

The Cause Doesn’t Matter

The cause of latency is irrelevant to the load it generates. Three categories cover virtually every real-world delay.

Chosen latency. Work parked in a queue because something else is more urgent. Backlogs. Prioritized lists. Deferred decisions.

Imposed latency. Work blocked by a dependency, an approval cycle, a funding decision, or a platform constraint.

Environmental latency. The world moves while the work waits. Markets shift. Codebases evolve. Stakeholders change roles. Requirements drift.

The load accumulates in all three cases. This is why Latency Load is a single operational target regardless of why a particular delay exists. Reduce the latency, and the load drops across all three causes at once.

The Components

Latency Load is not a single cost. It is a family of costs that delay generates.

Bookkeeping work. Maintaining the queue itself: prioritizing, refining, reporting, defending in stakeholder meetings.

Recontextualization work. Picking up a delayed task in an environment that has shifted underneath it. Re-reading specs, verifying assumptions, re-checking dependencies, re-scoping when scope has drifted.

Coordination work. Stakeholder updates, expediting calls, status reporting, escalation paths, the workarounds other teams build to route around the delay.

Decay work. Documentation, code, knowledge, skills, and tooling that go stale during the wait and require refresh work at execution time.

Defect-amplification work. The later a defect is discovered, the more downstream work has been built on the broken foundation, and all of it must be redone.

Switching work. The cognitive cost of reloading mental context every time a paused task resumes.

Recovery work. Triggered when long-deferred items go critical. Firefighting. Escalation management. Rush coordination.

Decision-replay work. Re-justifying that the task is still worth doing, re-prioritizing against newcomers, re-estimating because circumstances changed.

Compounding-queue work. Reinertsen’s loop. Queues amplify variability. Variability amplifies coordination cost. Coordination cost amplifies WIP. WIP amplifies queues.

The list is not exhaustive. Other components exist. These nine cover the most consequential mechanisms.

The Self-Reinforcing Loop

Latency Load is not a one-shot cost. It compounds.

Higher utilization produces more work-in-process. More WIP produces longer queues. Longer queues produce more delay. More delay produces more Latency Load. More Latency Load consumes capacity that would otherwise have gone to new work. Less usable capacity creates pressure to increase utilization. The loop closes.

Figure: The Latency Load loop. Higher utilization leads to more WIP, then longer queues, more delay, more Latency Load, less usable capacity, and pressure to increase utilization. Each step feeds the next, and the cycle closes back on itself.
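The loop can be sketched as a toy simulation. Everything in it is an assumption made for illustration, not a model from Reinertsen or any other cited source: the function name `simulate_loop`, the parameter `load_per_item` (capacity spent each period carrying one waiting item), and the linear link between queue size and Latency Load are all hypothetical.

```python
def simulate_loop(arrivals, capacity, load_per_item, periods):
    """Toy model of the Latency Load loop. All parameters are hypothetical.

    arrivals:      new work items entering the queue each period
    capacity:      total work units the system can spend each period
    load_per_item: capacity spent per period carrying each waiting item
                   (assumed linear, which is a simplification)
    """
    queue = 0.0
    history = []
    for _ in range(periods):
        queue += arrivals
        # Latency Load is paid first and cannot exceed total capacity.
        latency_load = min(capacity, load_per_item * queue)
        # Whatever capacity is left is available for actual work.
        usable = capacity - latency_load
        completed = min(queue, usable)
        queue -= completed
        history.append(
            {"queue": queue, "latency_load": latency_load, "usable": usable}
        )
    return history


# Same system, two arrival rates (80% and 95% of nominal capacity).
low = simulate_loop(arrivals=8, capacity=10, load_per_item=0.1, periods=50)
high = simulate_loop(arrivals=9.5, capacity=10, load_per_item=0.1, periods=50)

print("80% arrivals, final queue:", round(low[-1]["queue"], 1))
print("95% arrivals, final queue:", round(high[-1]["queue"], 1))
print("95% arrivals, final usable capacity:", round(high[-1]["usable"], 1))
```

With these assumed numbers, arrivals at 80% of capacity clear every period and the queue stays empty. At 95%, the carrying cost pushes total demand past capacity, the queue grows every period, and usable capacity eventually falls to zero. In this sketch the tipping point is arrivals × (1 + load_per_item) > capacity. A real system will not be this tidy, but the shape of the runaway is the point.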

This is the mechanism Don Reinertsen modeled in The Principles of Product Development Flow. Queues amplify variability. Variability amplifies coordination cost. Coordination cost amplifies WIP. WIP amplifies queues. Every step in the loop makes the next step worse. Organizations end up consuming their own capacity carrying the queue they created.
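The link between utilization and queue size can be made concrete with a textbook queueing formula. This is a sketch under a strong assumption: a single-server M/M/1 queue (random arrivals, exponentially distributed service times), where the expected number of items in the system is utilization / (1 - utilization). Real organizations are messier, and the function name below is illustrative only.

```python
def expected_items_in_system(utilization):
    """Expected items in an M/M/1 system at a given utilization (0 <= u < 1)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (1 - utilization)


for rho in (0.5, 0.8, 0.9, 0.95):
    items = expected_items_in_system(rho)
    print(f"utilization {rho:.0%}: about {items:.1f} items in system")
```

Under this model, moving from 50% to 95% utilization takes the expected number of items in the system from about 1 to about 19. The curve is nonlinear, which is why the last few points of utilization are the expensive ones.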

Intellectual Lineage

Latency Load is an umbrella over a phenomenon that multiple thinkers have named at specific corners.

Taiichi Ohno. Secondary waste. New labor generated by the existence of primary waste. Toyota Production System, 1950s onward.

W. Edwards Deming. System design determines roughly 95% of capability. When delays accumulate, the resulting waste is structural, not the result of individual underperformance.

John Seddon. Failure demand. Demand on a system generated by a failure to do something right the first time, or at the right time. Vanguard Method, 1987 onward; named in I Want You to Cheat (1992).

Eliyahu Goldratt. Theory of Constraints. Protective capacity. Queues at every handoff generate Ohno-style secondary work. Deliberate slack at non-bottlenecks prevents queue formation.

Don Reinertsen. Cost of Delay and the queueing mechanics of product development flow. The Principles of Product Development Flow (2009). Modeled the self-reinforcing loop explicitly.

None of these authors named the umbrella. Each was working a specific corner. Latency Load gives executives and practitioners a single operational name for seeing these effects as one pattern. All of this work exists because of delay. Reducing delay is the lever that reduces all of it at once.

Why It Matters for Practice

If Latency Load is a single operational target, the intervention is also a single target: reduce the time between actionable and complete.

The highest-leverage moves are the ones that compress that interval. WIP limits. Smaller batches. Fewer governance gates. Decision rights pushed down to the level with the context. Funding models that follow value increments rather than projects. Each of these moves attacks the loop at a different point. None of them require a new methodology to adopt.
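Little's Law gives a rough way to see why a WIP limit compresses the interval. For a stable system, average time in system equals average WIP divided by average throughput. The sketch below applies that relationship. The function name and the numbers (60 and 20 items, 5 items per week) are hypothetical, and the calculation assumes throughput stays constant when WIP drops.

```python
def average_lead_time(avg_wip, throughput_per_week):
    """Little's Law: average time in system = average WIP / average throughput."""
    if throughput_per_week <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / throughput_per_week


# Hypothetical team finishing 5 items per week, before and after a WIP limit.
before = average_lead_time(avg_wip=60, throughput_per_week=5)
after = average_lead_time(avg_wip=20, throughput_per_week=5)

print(f"before WIP limit: {before:.0f} weeks average lead time")  # 12 weeks
print(f"after WIP limit:  {after:.0f} weeks average lead time")   # 4 weeks
```

Every week removed from the average wait is a week in which no component of Latency Load accrues against those items.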

The trap is treating Latency Load components individually. Most organizations notice one or two of them (the recontextualization meetings, the firefighting drills, the rework after a long pause) and try to optimize those locally. That fails. The components are symptoms of the same underlying mechanism. Optimizing them locally without compressing the latency that generates them only moves the load somewhere else in the system.

The lever is latency. Everything else is downstream.

Related Concepts

Choked Flow. One of the Three Barriers in Applied Agility. Latency Load is the structural manifestation of Choked Flow at the work-item level.

Cost of Delay. Don Reinertsen’s economic measurement of delay. Latency Load is the operational consequence; Cost of Delay is the dollar-denominated quantification.

WIP Limits. The highest-leverage intervention for reducing Latency Load. WIP limits cut the queue length directly, which cuts every component of Latency Load downstream.

Value Acceleration Process. The framework intervention that operationalizes Latency Load reduction at portfolio scale. VAP gives the organization a repeatable rhythm for diagnosing where Latency Load is heaviest and removing it incrementally.

Value Increments. The portfolio-altitude slicing discipline. Funding work as value increments rather than year-shaped initiatives compresses the interval between intent and outcome, which compresses Latency Load directly. Increment slicing is one of the highest-leverage moves available against portfolio-altitude Latency Load.