A Question of Handoffs & Queues
Recently, a manager who’s part of an agile transformation I’m helping with asked me about the impact of handoffs and waiting queues, and the relationship they have to the flow of value. He was looking for some hard numbers, not anecdotal stories or theory. I know I’ve seen some data on this somewhere, but I have no idea where it might be. But it got me looking for information related to this topic.
It turns out that there isn’t a lot of published information that actually shows the impact of queues and handoffs. Lots of theory, lots of anecdotes, and lots of things that make sense. But not a huge amount I could find on specific, quantifiable impacts.
Thanks to some help from a bunch of people (Troy Magennis, Kevin Sivic, Declan Whelan, Tyson Browning, and Glen B. Alleman; really hoping I didn’t miss anyone), I consolidated some information below, and thought others might find it helpful. I suspect there’s more information available, and if you have some, I’d love to learn about it. Feel free to contact me and let me know what I’ve missed, or what I’ve gotten wrong!
Since I know it’s possible you’re not going to read everything below, I’m going to start with my conclusion here:
There’s enough information from a wide variety of sources to confirm what many of us already know: waiting times, handoffs, and queues are all significant sources of delay in the delivery of value. Moreover, in spite of many attempts to avoid local optimization, that seems to be exactly the approach we’ve taken over the years. It seems that we often make our teams look good, even if it’s at the expense of the full delivery of value. Based on the material that follows, leaders and organizations look to improve components, not the whole. But that’s a topic for another blog post, another day.
Understanding the flow of value means looking at the sources of delay in our systems. More often than not, these delays aren’t caused by the time a team spends working on something, but by the time a piece of work sits waiting for the next person to pick it up for the next step in the process. Estimates of how long a piece of work will take are often correct; it’s the waiting time between steps, where handoffs occur or where work sits in a queue for the next step in a workflow, that accounts for the majority of the time it takes us to get anything done. The fact that we can’t see our work only compounds this problem.
And now, a small collection of excerpts from a pile of sources that helped me confirm that hypothesis.
One of the key things we need to understand when talking about sources of delay is how we measure the performance of a process. So let’s start with a quick definition of Process Cycle Efficiency:
PCE, also known as “Flow Efficiency” or “Value Add Ratio,” is a measurement of the amount of value-add time in any process, relative to lead time (the time between the initiation and completion of a production process). The higher the ratio, the more efficient your process. This metric quantifies waste throughout a system of delivery.
https://www.leadingagile.com/2017/09/process-cycle-efficiency-pce-metric/
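It’s calculated using the following formula, where cycle time is the total elapsed time from start to finish (the lead time in the definition above): Process Cycle Efficiency = Value-Added Time / Cycle Time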
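To make the formula concrete, here’s a minimal Python sketch of it. The function name and the example figures are hypothetical, for illustration only; the one thing that matters is that both times are measured in the same unit.

```python
def process_cycle_efficiency(value_added_time: float, cycle_time: float) -> float:
    """Return PCE as a fraction: value-added time divided by total elapsed (cycle) time.

    Both arguments must use the same unit (hours, days, ...).
    """
    if cycle_time <= 0:
        raise ValueError("cycle_time must be positive")
    if not 0 <= value_added_time <= cycle_time:
        raise ValueError("value_added_time must be between 0 and cycle_time")
    return value_added_time / cycle_time


# Hypothetical example: 2 days of hands-on work inside a 20-day lead time.
print(f"{process_cycle_efficiency(2, 20):.0%}")  # 10%
```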
Hand-offs result in work waiting in queues for the next person, team, or process. It’s not common to have one process complete at the exact moment the next process is ready to begin. Think about the development work being complete and then waiting for the testing process to begin. And once that’s done, it likely sits in a queue, waiting for the deployment window. Or a marketing team completes a campaign, but needs to wait for the legal team to review the small print.
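To see how those handoff waits add up, here’s a small Python sketch of a development, testing, and deployment sequence. The `Step` class and every number in it are hypothetical, chosen for illustration rather than taken from any of the sources; swap in figures from your own value stream.

```python
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    work_hours: float          # hands-on, value-adding time at this step
    queue_hours_before: float  # time the item waits in a queue before this step starts


# Hypothetical value stream: illustrative numbers, not measured data.
steps = [
    Step("Development", work_hours=24, queue_hours_before=40),
    Step("Testing", work_hours=8, queue_hours_before=80),
    Step("Deployment", work_hours=2, queue_hours_before=160),
]

work = sum(s.work_hours for s in steps)
waiting = sum(s.queue_hours_before for s in steps)
lead_time = work + waiting

print(f"Lead time: {lead_time} h, of which waiting: {waiting} h")
print(f"Share of time spent waiting: {waiting / lead_time:.0%}")
```

Even with generous hands-on times, most of the lead time in this made-up example is spent sitting between steps rather than being worked on.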
The Lean Enterprise Institute (LEI) publishes a fantastic reference book called “Lean Lexicon”. In it, they describe Value-Creating Time (VCT), which they define as:
The time of those work elements that actually transform the product in a way that the customer is willing to pay for.
A simple test as to whether a task and its time is value-creating is to ask if the customer would judge a product less valuable if this task could be left out without affecting the product.
Customers are not usually willing to pay for the time the work is sitting around, not being built!
Depending on the source, you’ll find that most companies operate somewhere between 5-10% PCE, meaning the “value add” time in their process accounts for roughly 5-10% of the total time it takes to produce a widget. Repetitive environments such as manufacturing, where identical products are being produced, often get up to 20%. Most people I talk with find this unbelievable… until they map out their own process!
I have found through personal experience (not books) that for discrete manufacturing processes a PCE over 5% is outstanding. However, for front office VSM’s the PCE may be much higher. I once mapped a front office process (current state) that had a PCE of more than 12%. I have also mapped some manufacturing processes where my PCE was less than 1%!
https://blog.gembaacademy.com/2007/03/15/mysterious-process-cycle-efficiency/
One case study by Black Swan Farming looked at the length of time it took to deliver a feature at Maersk Line. They found that the Value Add Time was 106 hours, yet delivering the feature took 38 weeks. And this isn’t uncommon:
https://blackswanfarming.com/cost-of-delay/
If we assume 40 hours of available work time per week, 38 weeks works out to 38 × 40 = 1,520 hours, and we can plug the numbers into the formula to calculate the PCE for this feature:
Process Cycle Efficiency = Value-Added Time / Cycle Time
PCE = 106 / 1,520
PCE = 6.97%
This isn’t a reflection on the people on any of the teams doing the work. This is a reflection of the environment and structure in which the people work. There are a lot of handoffs and waiting going on in that case.
It’s important to note that this data doesn’t tell us why. But, if we want to improve the flow of value, we’ve now got some information to help us get started, and we’ve got a baseline we can use to determine if our next target condition makes an impact on the flow of value, or not.
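Here’s the same Maersk Line arithmetic as a short Python sketch, using the case-study figures and the 40-hour-week assumption stated above. The calendar-time line at the end is my own addition, included only to show how sensitive the result is to that assumption.

```python
HOURS_PER_WEEK = 40       # assumption from the text: 40 available work hours per week
value_added_hours = 106   # value-add time for the feature (Black Swan Farming case study)
elapsed_weeks = 38        # end-to-end delivery time for the feature

cycle_time_hours = elapsed_weeks * HOURS_PER_WEEK  # 1,520 hours
pce = value_added_hours / cycle_time_hours

print(f"Cycle time: {cycle_time_hours:,} hours")  # Cycle time: 1,520 hours
print(f"PCE: {pce:.2%}")                          # PCE: 6.97%

# Sensitivity check: measured against calendar time (24 h x 7 days) instead.
calendar_pce = value_added_hours / (elapsed_weeks * 24 * 7)
print(f"PCE against calendar time: {calendar_pce:.2%}")  # PCE against calendar time: 1.66%
```

Whichever denominator you choose, use it consistently so that your baseline and any later measurements stay comparable.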
Queues are everywhere, and work is sitting, waiting in them. Some we see, many we don’t. Handoffs are one very explicit queue, where work is passed from one person to another, and then waits for that other person to be available to pick up the work:
All of our working life is filled with queues. Because we are so busy, we find it odd to realize that of a project’s total length, very little is actually work. It spends most of its time in a series of queues. The queues range from really big and obvious delays, (waiting to have a team assigned to the project); to minor and almost untraceable ones, (a request for information in an email sitting unanswered in a colleague’s inbox).
We tend to respond very well to visible queues. We can avoid them (big queue at the bank, I’ll come back later); we can get angry with them (complain to the manager and threaten to move your account elsewhere); and we can manage them (invest in automatic paying-in machines to try and reduce queues at the bank).
In manufacturing, where queues are large piles of inventory on the factory floor, and thus present on the balance sheet as unrealized assets, management will expend a lot of effort on reducing them.
But inventory queuing up in software development is invisible. Most of our work is made up of information: ideas, design and lines of code. The impact is just as severe for us as for the manufacturer, however. The longer part-completed work sits there gathering metaphorical dust, the greater the danger that the value could disappear altogether. The opportunity could pass, a customer might walk away, technology or the environment might change... The sunk cost would be irretrievably lost, the hours of time invested to date, wasted.
Because our queues are invisible, we find it easy to ignore them. If we double the number of requirements in a project, there is no warning bell that sounds. Developers in the department might look slightly more stressed, but there is no real way of knowing what the change has done to their work or to how quickly they will complete it. Now imagine we doubled the number of developers on the project. Everyone would notice that! Not only is there a sudden scuffling for desk-space, but managers would be in crisis meetings trying to find extra budget to pay their wages.
Managing these queues, and handoffs, can have significant impacts on the flow of value.
“Queues only exist in manufacturing, so queueing theory and queue management don’t apply to product development.”
This is a common misconception. As mentioned, queueing theory did not arise in manufacturing but in operations research to improve throughput in telecom systems with high variability. Furthermore, many development groups—especially those adopting lean or agile practices—have adopted queue management based on queueing theory insight for both product development and portfolio management . One study from MIT and Stanford researchers concluded:
“Business units that embraced this approach [queue management for portfolio and product management] reduced their average development times by 30% to 50%”
https://less.works/less/principles/queueing_theory#QueueManagementtoReduceCycleTime
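Queueing theory also hints at why pushing people or teams toward full utilization makes waits balloon. As a sketch, assume the simplest textbook model, an M/M/1 queue (a single server with random arrivals and random service times). That model is my choice for illustration, not something taken from the LeSS material; in it, the average time an item waits before work starts is ρ / (μ − λ), where λ is the arrival rate, μ the service rate, and ρ = λ / μ the utilization.

```python
def mm1_average_wait(arrival_rate: float, service_rate: float) -> float:
    """Average time an item waits in the queue (before work starts) in an M/M/1 queue.

    Uses the standard result Wq = rho / (service_rate - arrival_rate),
    where rho = arrival_rate / service_rate is the utilization.
    Only valid for a stable queue: 0 <= arrival_rate < service_rate.
    """
    if not 0 <= arrival_rate < service_rate:
        raise ValueError("need 0 <= arrival_rate < service_rate for a stable queue")
    utilization = arrival_rate / service_rate
    return utilization / (service_rate - arrival_rate)


# A hypothetical team that completes 1 item per day on average (service_rate = 1.0),
# loaded at increasing levels of utilization.
for utilization in (0.5, 0.8, 0.9, 0.95, 0.99):
    wait_days = mm1_average_wait(arrival_rate=utilization, service_rate=1.0)
    print(f"utilization {utilization:.0%}: average wait {wait_days:.1f} days")
```

In this model, going from 90% to 99% utilization takes the average wait from about 9 days to about 99 days. Real teams aren’t M/M/1 queues, but the shape of the curve, with waits exploding as utilization approaches 100%, is one reason maximizing resource utilization and maximizing flow pull in opposite directions.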
In the Agile community, a paper originally published in 2018 focused on local optimization: that is, optimization within a team, rather than across the full value stream.
Process Efficiency is derived from a standard metric used for decades in Lean Manufacturing - value-added work time divided by clock time. In practice, Process Efficiencies exceed 25% for processes that have been improved through the use of Lean methods whereas the average Scrum team Process Efficiency for completing a Product Backlog Item is on the order of 5-10% according to polls of participants in Scrum classes in the U.S. and Europe. Measuring Process Efficiency can significantly improve the performance of Scrum teams as it is directly correlated with increase in team performance.
Section 3 of this paper outlines the example used to highlight the impact of “resource utilization maximization” on the flow of value within a team. It goes on to show how to calculate this number, and how to improve it within a Scrum team. The problem is that their “improvement” creates a local optimization within the team at the expense of an accurate picture of the actual cycle time. By adding a step outside their process, getting work to meet a Definition of Ready, they have made their team’s numbers look better, but they have also created an incomplete and inaccurate picture of the true impact of handoffs and queues, and of actual improvements in the delivery of value from concept to cash:
A previous study at a CMMI Level 5 Scrum company showed that improving Process Efficiency doubled productivity. This was enabled with a checklist that determined whether a story was “Ready” to be brought into a sprint. Published research on this effect led to introduction of the concept of “Ready” in the Scrum Guide and publication of a pattern called “Definition of Ready” by ScrumPlop.org.
Let me be more explicit about what might be happening in this case study: work isn’t flowing through the team, so the team puts a DoR in place to get a whole pile of things done ahead of time, so that the work can flow. And that sounds good. It is, especially when we now look at the flow of value through the team: no blockers, and work is getting done. But what we’re missing is a full, mapped, and quantified view of all the work being done. The data for the team now looks good, but have we really made any difference to the flow of value? Maybe. But maybe not. I don’t know.
Often, we see this sort of local optimization: making the metrics for a team look good. That may mean measuring the wrong thing. I’m reminded of an environment where I was brought in to speed up development at a large Canadian telco. The business was complaining about the ~16 months it took to get anything to market. Development needed to be improved. The problem, and exposing it didn’t make us popular, was that ~14 of those months were spent getting the work ready for the development team. It took that long to meet the development team’s Definition of Ready. And when the development team looked at their data, they didn’t understand the problem, because they only started measuring once the work met the DoR. They weren’t looking at the ~14 months that had elapsed since the work was originally requested; they were only measuring their ~4 weeks of development.
It would be a bit like measuring how long it takes me to get a coffee from Starbucks. If they only measured the time to actually make my coffee (the development time), they’d get metrics showing that it takes 35 seconds, and that’s almost all value-add time. What they’d be missing is the 7 minutes I spent in line waiting to place my order, the 2 minutes it took to process and pay for my order, and the 4 minutes my cup sat waiting between the time I placed my order and the time they actually started making my coffee… The clock on my getting value started long before my order was ‘ready’ for the barista.
But that’s exactly what appears to be happening in this case study.
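To put numbers on the coffee analogy, here’s a short Python sketch using the durations from that story. Counting only the 35 seconds of making the coffee as value-add time is my simplifying assumption (the story describes it as “almost all” value-add time).

```python
# Durations (in seconds) from the coffee example.
stages = {
    "waiting in line": 7 * 60,
    "ordering and paying": 2 * 60,
    "cup waiting for the barista": 4 * 60,
    "making the coffee": 35,
}

barista_view = stages["making the coffee"]  # what a "development-only" metric sees
customer_view = sum(stages.values())        # what I actually experience end to end

print(f"Customer's total time: {customer_view} s ({customer_view / 60:.1f} min)")
print(f"PCE from the customer's view: {barista_view / customer_view:.1%}")
```

From the barista’s side of the counter, the process looks close to 100% efficient; from my side, it’s roughly 4%.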
Dave Nicolette, an exceptional Agile Coach, was engaged at a large bank to support an initiative to improve software delivery performance. Using Lego, teams tracked, every hour, whether they had made progress on their work items. A red brick indicated that no work was done on an item for an hour, while a green brick indicated that work was done on it. While a little intrusive, it was less intrusive than some of the other approaches to collecting this information, and it was an interesting experiment that produced some very interesting results. Using the data for the one team he provided in his write-up, he showed that on one specific day, that team’s Process Cycle Efficiency (PCE) was around 30%. That is, of the 50 hours available, value was added to the team’s work in a total of 16 hours. He also noted that focusing on the flow of, and progress on, the work (not on the people) prompted immediate changes in how individuals on the team behaved:
It’s commonplace that when something is made visible, people act on it. I was surprised to see the natural response when a team member reaches for a red brick. Others on the team immediately ask what the impediment is and how they can help.
This approach wasn’t without controversy, though:
A couple of people voiced the concern that we were asking individuals to keep track of how they spend their time. The organizational culture is such that management expects people to get their work done, and does not track when, how, or where they work. I had to clarify that this is about tracking time from the point of view of a User Story, and not from the point of view of any individual person. We want to expose time sinks so that we can help management change the organizational structure and formal procedures in ways that make life better for development teams. Once that was clear, people were not opposed to it.
And that point he makes is critical: The work is being tracked, not the workers.
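Tallying a brick board like that is simple enough to sketch in a few lines of Python. The board below is invented (the story names, the five-items-by-ten-hours layout, and the order of the bricks); only the overall total of 16 green bricks out of 50 matches the write-up.

```python
# Hypothetical brick board: one character per tracked hour for each work item.
# "G" = green brick (progress made that hour), "R" = red brick (no progress).
# Invented layout; only the overall 16-of-50 total matches the write-up.
board = {
    "Story A": "GGGGRRRRRR",
    "Story B": "GGRRRRGGRR",
    "Story C": "RRRRRRRRRR",
    "Story D": "GGGRRRRRGG",
    "Story E": "RRRGGGRRRR",
}

green = sum(hours.count("G") for hours in board.values())
total = sum(len(hours) for hours in board.values())
print(f"PCE: {green}/{total} = {green / total:.0%}")  # PCE: 16/50 = 32%

# Items with the most red bricks are where to ask "what's the impediment?"
for item, hours in sorted(board.items(), key=lambda kv: kv[1].count("R"), reverse=True):
    print(f"{item}: {hours.count('R')} hours without progress")
```

Keeping the board keyed by work item, not by person, mirrors the point above: the work is being tracked, not the workers.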
In March 1999, Don Reinertsen published an article in “Electronic Design” in which he explained:
All managers know that engineers can’t accurately forecast development schedules. But there are different explanations for this phenomenon. Some managers assume this flaw is due to lack of motivation. “If only we made it more painful to miss the schedule, they would take us seriously.” Other, more humanist managers suspect a lack of skills. “If only we trained them to estimate properly, we would get more accurate estimates.”
Both views are incorrect. Since most engineers dislike missing their schedules, motivation is rarely an issue. Engineers don’t submit silly estimates for the sheer pleasure of missing them. Furthermore, engineers rarely offer poor estimates because of an inability to provide high-confidence ones. High-confidence schedules can be generated easily by leaving large safety margins for uncertain tasks. No mysterious energy field is causing engineers to produce wacky estimates. Instead, there is a much simpler and more accurate explanation: Experienced managers don’t want conservative schedules, and experienced engineers know that.
Why don’t experienced managers like high-confidence schedules? Quite simply, they’re significantly longer than aggressive ones.
This brings a common planning fallacy to light. Engineers can often forecast, with surprisingly high accuracy, how much time and effort their part of the project will take. However, the sequential nature of the work, the handoffs, and the waiting for the next step in the process are almost never accounted for in planning; the focus is on the person, or team, and not on the work.
When people join Toyota, they learn “Eyes for Waste.” They learn to see things as waste that they had not considered, such as inventory —queues of stuff. Now, queues of physical things are easy for people to perceive, and to perceive as a problem… My goodness, there’s a gigantic pile of Stuff queuing up over there! Making any money from the pile? Are there defects in there? Does it need to be combined with other stuff before we can ship it? Do we need—and will we make money with—each and every item in the pile?
Invisible queues—In traditional development there are also all kinds of queues, but because they are invisible they are not seen as queues or keenly felt as problems. If you are a business person who has invested ten million euros to create a gigantic pile of partially done Stuff sitting on the floor, not making any money, you walk by it and see it and you feel the pain and urgency to get it moving. And you think about no longer making big piles of partially done stuff. But product development people do not really see and feel the pain of their queues.
Yet, they are there. Queues of WIP—information, documents, and bits on a disk. Invisible queues. Product development people need a lesson in “Eyes for Queues” so that they can start to perceive what is going on, and develop a sense of urgency about reducing queue sizes.
https://less.works/less/principles/queueing_theory#HiddenQueuesEyesforQueues
There are lots of little nuggets of goodness in that excerpt, and even more in the source in its entirety. However, one of the elements I think is critical to call out is the culture of the environment. While it’s outside the scope of this post, I’m reminded of the NUMMI plant, where, with the same managers and the same unionized employees, Toyota had a very different experience in the production, profitability, and performance of the plant because of the culture its senior leaders created. Have a look at the NUMMI case study, and at John Shook’s writing on it, if you’re interested in learning more.
In his book “Rethinking Agile”, Klaus Leopold writes:
The entire organization talked about the most exciting questions, such as “Can the Product Owner take part in a Retrospective?” or “Is the Scrum Master also allowed to do work operatively in the team?” … I’ll have a perplexed look on my face and think to myself: “Huh?!? What in the world does that have to do with the goals you want to achieve?”
This organization … confused the means for the purpose. At the very beginning, it was going about improving their Time-to-Market – now, however, everyone was talking about stupid rules that were written down years ago in some kind of agile framework.
As soon as management decides that Daily standups, Retros, cross-functional teams, etc., are the requirements for achieving their goal, implementing these agile working methods becomes the goal itself. The focus – for the teams as well as for management – is placed on whether or not all the methods’ rules are being followed correctly.
He elaborates on this in the section “Cause #2: Dealing with dependencies between teams and products”. While he doesn’t provide specific numbers, dependencies were clearly an identified issue in the delivery of value:
I went looking for patterns and asked questions like: “Which teams do you often have to wait for?” and “With which team do you have the most interactions where you regularly are waiting on something?” My goal was to make these interactions visible in dependency graphs, and so I worked through from team to team consolidating the fractals.
The management and teams were mostly shocked at first. Their idea was to eliminate as many dependencies as possible. That’s why teams were newly arranged as cross-functional teams where each one was only working on one product. The teams shouldn’t even have to wait on other teams. And yet there were a number of dependencies visible that naturally increased the cycle times. Because what goes out of one system must first be prioritized in the next system. For example, if the work lands at a Scrum team, it must wait at least one Sprint until it gets processed.
https://www.amazon.ca/Rethinking-Agile-Nothing-Business-Agility/dp/3903205397/
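The Sprint-boundary effect Leopold describes can be sketched with a deliberately simple model in Python. Everything here is an assumption of mine, not from the book: fixed-length Sprints, new work pulled in only at Sprint Planning, work delivered at the end of the Sprint it was pulled into, and hands-on time ignored entirely. The function names and numbers are hypothetical.

```python
def days_until_done(arrival_day: int, sprint_length: int, sprint_offset: int = 0) -> int:
    """Days from arrival until an item is done at a team running fixed-length Sprints.

    Hypothetical model: Sprints start on days sprint_offset, sprint_offset + sprint_length, ...;
    new work is only pulled in at Sprint Planning; pulled-in work is delivered at Sprint end.
    """
    days_into_sprint = (arrival_day - sprint_offset) % sprint_length
    wait_for_planning = (sprint_length - days_into_sprint) % sprint_length
    return wait_for_planning + sprint_length


def chain_lead_time(arrival_day: int, sprint_length: int, offsets: list[int]) -> int:
    """Total days for an item that must pass through several teams, one after another."""
    day = arrival_day
    for offset in offsets:
        day += days_until_done(day, sprint_length, offset)
    return day - arrival_day


# Two-week (14-day) Sprints; an item arrives on day 3 and must pass through three teams.
aligned = chain_lead_time(arrival_day=3, sprint_length=14, offsets=[0, 0, 0])
staggered = chain_lead_time(arrival_day=3, sprint_length=14, offsets=[0, 7, 0])
print(f"Aligned Sprints:   {aligned} days")    # Aligned Sprints:   53 days
print(f"Staggered Sprints: {staggered} days")  # Staggered Sprints: 67 days
```

Even if each team only needed a day of hands-on work, this model would still report 53 to 67 days of lead time, almost all of it spent waiting at Sprint boundaries.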
So what do we do with all this information?
Well, if you’re still reading, congratulations on getting to this point. There’s a reason I put the conclusion up at the top.
There’s a great quote from Russell Ackoff, which will likely be the subject of a future blog post related to this topic:
A system is never the sum of its parts. It’s the product of their interaction.
If we can start to track our PCE and make these queues and handoffs visible to identify where the delays are happening, we’ll be miles ahead of where we are today. If you do, I hope you’ll consider sharing your findings so others (starting with me) can learn from you.
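If your tracking tool records when a work item changes state, you can get a rough first PCE baseline without introducing any new process. Here’s a minimal Python sketch; the state names, the timestamps, and the choice of which states count as “active” are all hypothetical. Calendar time spent in an “active” state includes nights and interruptions, so it overstates value-added time, and the result is best read as an upper bound.

```python
from datetime import datetime

# Hypothetical state history for one work item, as exported from a tracking tool.
history = [
    ("2024-03-01 09:00", "Requested"),
    ("2024-03-11 09:00", "In Analysis"),
    ("2024-03-12 09:00", "Ready for Dev"),
    ("2024-03-20 09:00", "In Development"),
    ("2024-03-23 09:00", "Ready for Test"),
    ("2024-03-29 09:00", "In Test"),
    ("2024-03-30 09:00", "Done"),
]
ACTIVE_STATES = {"In Analysis", "In Development", "In Test"}  # assumption


def time_in_states(history):
    """Sum the hours spent in each state, using consecutive transition timestamps."""
    parsed = [(datetime.strptime(ts, "%Y-%m-%d %H:%M"), state) for ts, state in history]
    totals = {}
    for (start, state), (end, _) in zip(parsed, parsed[1:]):
        hours = (end - start).total_seconds() / 3600
        totals[state] = totals.get(state, 0.0) + hours
    return totals


totals = time_in_states(history)
active = sum(hours for state, hours in totals.items() if state in ACTIVE_STATES)
lead_time = sum(totals.values())
print(f"Flow efficiency (upper bound): {active / lead_time:.1%}")
for state, hours in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state:15} {hours:6.0f} h")
```

Sorting the states by time spent makes the biggest queues, the “Ready for…” states in this made-up history, jump straight to the top of the list.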