Let’s say that you are the manager of a call center, and you are trying to plan shift schedules. On average your center receives about 100 calls an hour, and the average employee can service about 10 calls per hour. How many people should you schedule on each shift in order to keep wait times reasonable?
This sounds like the sort of math question my 11-year-old son gets for homework, and the answer seems self-evident - you need 10 people per shift, right?
This problem perplexed a Danish telephone engineer named Agner Erlang (although he was studying telephone operators and circuits). He observed that when this simple approach was applied to the problem, customers ended up waiting in extremely long queues.
His study of this problem was the genesis of what we call Queue Theory - the study of the flow of work through a system. The lessons and models that have been defined within Queue Theory have been used successfully to tackle problems in traffic management, manufacturing, computing and events management, as well as software development.
My favorite example of queue theory in action is an oft-repeated traffic experiment. Researchers set up a circle, and then set a number of cars driving in the circle endlessly, no cars enter or exit the circle during the test. In theory, the cars should be able to drive at a steady speed forever, right?
But in practice, when the researchers filled the circle closer to capacity, a curious thing happened - eventually, inevitably, a traffic jam formed. Small variances in the system (slight differences in speed) compound over time until the entire circle is jammed, even though the same number of cars had been driving fine in the exact same space just a few minutes earlier.
The basic notation of queue theory (known as Kendall's notation) is W / X / Y / Z where:
W is the distribution of work entering the system; it can be regular or variable
X is the distribution of effort consumed by the item within the system; it can be regular or variable
Y is the number of items that can be supported in parallel at a time
Z is the maximum number of items that can exist in the queue at a time
Two key principles that queue theory is based around are arrival rate - the rate at which new work is entering the system, and service rate - the rate at which work can be satisfied and closed out.
In software, we commonly describe the flow of work as M/M/1/∞, in which M indicates that the rate is variable rather than fixed¹. So this notation suggests that for an M/M/1/∞ queue we are describing a system in which:
Work arrives at a variable rate, even if the average flow is constant, the actual individual items do not arrive at predictable rates (e.g. you may get an average of 30 bug reports a month, but you do not get 1 report every day at 2pm)
The effort to resolve work is variable (some things are small, some are big); again, on average you might complete roughly the same amount of work, but individual items take unpredictable amounts of time to finish
A team (or dev) works on 1 item at a time
There is no limit to the amount of work that can exist in the backlog
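The four properties above can be sketched as a small simulation. This is a minimal, illustrative sketch (the function name and the rates are mine, not from any particular tool): exponential gaps between arrivals, exponential effort per item, one worker, unbounded queue.

```python
import random

def simulate_mm1(arrival_rate, service_rate, n_items, seed=0):
    """Simulate an M/M/1 queue and return the average time an item
    spends in the system (waiting in the queue + being worked on)."""
    rng = random.Random(seed)
    clock = 0.0           # arrival time of the current item
    worker_free_at = 0.0  # when the single worker finishes their current item
    total_time = 0.0
    for _ in range(n_items):
        clock += rng.expovariate(arrival_rate)   # M: variable arrivals
        start = max(clock, worker_free_at)       # wait if the worker is busy
        worker_free_at = start + rng.expovariate(service_rate)  # M: variable effort
        total_time += worker_free_at - clock     # arrival to completion
    return total_time / n_items

# e.g. 5 items arriving per week on average, 6 resolved per week on average
avg = simulate_mm1(arrival_rate=5.0, service_rate=6.0, n_items=200_000)
print(f"average time in system: {avg:.2f} weeks")  # close to 1 week
```

Note that the backlog is unbounded (the ∞): nothing in the loop ever rejects an arrival, it just waits its turn.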
Based on this model, we can apply Queue Theory to learn several critical insights about the flow of work, which form the foundations of many best practices in product development flow. Here are three key insights (with math!) that we can learn by applying Queue Theory to software development.
1. The exit flow must be >= the arrival flow
If the flow of new requests is greater than the team's ability to service those requests, then the team will collapse under an infinite backlog that can never be finished.
This feels a bit self evident, but there are interesting implications to software design. If a poorly scoped/built system generates a steady stream of bug reports from customers, and new bugs are being raised more quickly than old bugs are being fixed, then you end up in a system where the queue stretches to infinity and bug fixing work is likely to actually crowd out roadmap work.
Other streams can trigger this sort of response too. If you are so excited to accept new large customers that you don’t properly qualify them, then the resulting stream of enhancement requests from these high revenue customers trying to mold the product to their expectations is again likely to crowd out the strategic work that you want to be focused on.
Both of these situations can contribute to a general feeling that “the team just can’t move quickly anymore”.
2. High utilization increases queues exponentially
Because work originates from several sources, and the flow of new work is not entirely predictable, it is difficult to operate at high utilization. This brings us back to the question posed at the beginning of the article. If management anticipates an average of about 100 calls per hour, and the average worker can answer about 10 calls an hour, how many call center workers should you schedule per shift?
The naive assumption is that we should staff 10 workers, because combined they can handle about 100 calls per hour, equal to our flow. However, because the arrival of new calls is not predictable, queue theory shows us that staffing 10 workers will inevitably result in extremely long wait times for individual customers calling in. At first the team will mostly be able to handle the flow, but as small bursts of new calls arrive faster than they can be handled (or a few calls turn out to be more complicated than normal), a compounding effect kicks in that eventually leads to significant delays. It's the traffic circle experiment all over again.
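We can watch this happen in a small multi-server simulation. This is an illustrative sketch (the function and its defaults are mine): 10 agents exactly match the average inflow, so hold times balloon, while a couple of spare agents keep them short.

```python
import heapq
import random

def avg_hold_time(n_agents, calls_per_hour, calls_per_agent_hour,
                  n_calls=50_000, seed=1):
    """Average time (in hours) a caller spends on hold, with n_agents
    working in parallel and variable call/handling times (an M/M/c queue)."""
    rng = random.Random(seed)
    free_at = [0.0] * n_agents          # when each agent next becomes free
    clock, waited = 0.0, 0.0
    for _ in range(n_calls):
        clock += rng.expovariate(calls_per_hour)       # next call arrives
        soonest_free = heapq.heappop(free_at)          # first agent to free up
        start = max(clock, soonest_free)               # caller holds until then
        waited += start - clock
        handle = rng.expovariate(calls_per_agent_hour) # variable call length
        heapq.heappush(free_at, start + handle)
    return waited / n_calls

print(avg_hold_time(10, 100, 10))  # 10 agents: hold times balloon
print(avg_hold_time(12, 100, 10))  # 12 agents: hold times stay short
```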
The mathematical equation for modeling the impact of capacity on queue size (for an M/M/1 queue) is:

L = ρ / (1 - ρ)

where ρ is utilization (the arrival rate divided by the service rate) and L is the average number of items in the system.
We can graph this relationship to show how greater levels of utilization result in exponentially longer queues and delays in the system.
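The curve is easy to reproduce. For an M/M/1 queue the average number of items in the system at utilization ρ is ρ / (1 - ρ), so a few sample points show the hockey stick (a sketch, with utilization levels chosen for illustration):

```python
def avg_items_in_system(utilization):
    """Average number of items in an M/M/1 system at a given
    utilization (arrival rate / service rate, must be < 1)."""
    return utilization / (1 - utilization)

for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"{rho:.0%} utilization -> {avg_items_in_system(rho):6.1f} items queued or in progress")
```

Going from 50% to 80% utilization quadruples the queue; going from 90% to 99% multiplies it by eleven.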
The implication for software is fairly clear: if you are planning a high utilization rate (your plan is based on a pretty close approximation of what you think you can complete) then you will inevitably end up with a giant backlog of work - even if your plan fairly accurately accounts for “unexpected” work such as bugs and urgent requests.
For example, if you build capacity into your plan to handle 5 new bugs a week you will end up with a significant backlog of bugs even if the number of bugs that comes in is slightly below your estimate (say an average of 4.5 bugs per week).
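A quick back-of-the-envelope simulation illustrates this. Here weekly bug arrivals are a binomial stand-in (9 coin flips, mean 4.5 per week) for "slightly variable" inflow against a flat capacity of 5 fixes per week - the 4.5 and 5 are the illustrative numbers from the paragraph above, everything else is an assumption of the sketch:

```python
import random

def bug_backlog(weeks, fix_capacity=5, seed=42):
    """Track a bug backlog when weekly arrivals vary (mean 4.5 per week,
    via 9 coin flips) but fix capacity is flat at 5 per week."""
    rng = random.Random(seed)
    backlog, peak, total = 0, 0, 0
    for _ in range(weeks):
        new_bugs = sum(rng.random() < 0.5 for _ in range(9))  # mean 4.5/week
        backlog = max(0, backlog + new_bugs - fix_capacity)
        peak = max(peak, backlog)
        total += backlog
    return total / weeks, peak

avg, peak = bug_backlog(1_000)
print(f"average backlog: {avg:.1f} bugs, worst week: {peak} bugs")
```

Even though the average inflow (4.5) sits below capacity (5), the backlog never settles at zero - bursty weeks pile work up faster than the quiet weeks can drain it.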
And why is a long queue or backlog bad? Partly because it's hard to escape the trap of endless due dates that are increasingly difficult to hit, and partly because queues are inventory, and inventory is expensive.
3. As arrival rate approaches service rate, wait times increase exponentially
Queue theory breaks down the flow of work into arrivals (new items entering the system) and service (when an item is serviced it exits the system).
Again, naively it sounds like as long as we can service items more quickly than they come in, we can stay on top of things. This simple assumption is quickly proven false in real-world scenarios.
Queue theory has a good model to describe this. For an M/M/1 queue, the average time an item spends in the system is:

W = 1 / (μ - λ)

where λ is the average arrival rate and μ is the average service rate.
What this formula says is that if new work is coming into the queue at an average rate of 5 new items a week, and you are currently resolving them, on average, at a rate of 6 items per week, then a new item that enters the system will spend, on average, roughly a week in the system - and most of that time is spent waiting in the queue rather than being actively worked on.
In fact, the closer the rate of closing items is to the rate of new items, the longer the average wait for any individual item of work.
In this graph we can see that, on average, if you are closing out work 10 times faster than new work enters the queue, the average item only spends about 0.1 units of time (minutes, days, weeks) in the system. But if you are closing 10 items for every 9 that enter, the average item ends up spending about 9 units in the system.
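The graph's two endpoints fall out of the same wait-time formula, W = 1/(μ - λ), with the arrival rate normalized to 1 item per time unit (a sketch of the arithmetic, not the article's actual chart code):

```python
def avg_time_in_system(arrival_rate, service_rate):
    """Average time an item spends in an M/M/1 system (waiting + service)."""
    if service_rate <= arrival_rate:
        raise ValueError("queue is unstable: service must outpace arrivals")
    return 1 / (service_rate - arrival_rate)

# closing work 10x faster than it arrives:
print(avg_time_in_system(1, 10))       # ~0.11 time units
# closing 10 items for every 9 that arrive:
print(avg_time_in_system(1, 10 / 9))   # ~9 time units
```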
This relationship holds true whether you are shipping FIFO (first in first out) or whether you are constantly re-prioritizing (for example fixing bugs first).
This has tremendous implications for planning software. If you want to fix bugs, and they are coming in at a high rate, the impact on the strategic roadmap work is enormous. Or, similarly, if you are trying to focus on an ambitious delivery roadmap, customers are going to report bugs that may go unfixed for weeks or months.
If your queue is out of control, something is going to suffer.
Takeaways
Teams need slack to manage their queue of work
When slack starts to tighten, delays begin to add up exponentially
Slack is key, but beyond that, the concepts of Queue Theory are foundational to the emergence of many best practices in shipping software, from limiting WIP (work in progress) to working in small batches.
¹ More specifically, the M defines the distribution of the rate - the choice of M refers to Markovian, an exponential (memoryless) distribution in which events are independent and unrelated to each other. For example, a bug report by one customer has no impact in any way on a different customer reporting a different bug.