Managing Risk in Agile Projects (Part II): Managing Risk in Scrum Projects

In this second post of our agile risk management series, we walk through a specific instance of the agile risk management process that is suited to operational risk management in Scrum projects that use Kanban boards. We describe this approach because we have found it useful in such agile environments, though we refer you to Agile Risk Management for further details where appropriate. We will take you through the details that we have found to work best in agile projects, so let’s start with the big picture.

[Figure: Scrum/Kanban model]

Consider a project that you have recently been involved with and try to identify what you consider to have been risks that you encountered. Perhaps you recall that the customer was not really that clear about what they actually wanted, that implementing a specific feature was technically very difficult or that the time schedule was tight. When we start to think about risks, we tend to worry mostly about what might go wrong and usually end up expressing risks in terms of their effects on the project.

The most succinct definition of project risk is “uncertainty that matters” (Hillson). In the project context, this refers to any uncertainty that has a positive or negative impact on one or more project tasks or objectives. Let me put this another way using an analogy: whilst the outcome of a horse race may well be uncertain, it only becomes a risk when you place a bet. As soon as you do this, the objective of making money is affected by the uncertainty, which can result in you having more (positive risk) or less (negative risk) money after the race.

Identifying risks is harder than you might imagine. The biggest problem is conflating uncertainties and effects. Let’s take an example: suppose that a website is going to be migrated from a physical to a virtual server. Take a moment to think about what the risks might be. Did the thought of the website not being available just pop into your head? It might surprise you to learn that this is not a risk but rather an effect of the migration being unsuccessful. To understand this you must ask the question: why might the website no longer be available? If you hear from the team that they are not sure how to configure the DNS or not certain if the virtual server has the same configuration as the physical server, then you are on the right track since you are encountering statements that express uncertainty.

[Figure: Scrum/Kanban model, risk identification]

Of the numerous techniques for identifying risks, the one that we have found to work best for agile teams is precisely this what/why approach. During a risk workshop (which usually forms part of iteration planning), ask the team members to brainstorm and write down “what” might happen. Once this list is complete and has been reviewed (e.g., duplicates have been eliminated), it is time to ask why each “what” might occur. A common approach here is to write down each “what” as the title on a separate blank page. Then pass the pages around in a round robin, inviting everyone to contribute their whys. In the example above, the “what” page entitled “website not available” might have the following whys: “DNS might not be configured correctly” or “virtual server configuration might differ from that of the physical server”. One final note: be careful not to frame the “what” question negatively (i.e., what might go wrong), as everyone in the team needs to be open to the possibility that there might be opportunities ripe for exploitation in the project.

Once identified, risks should be logged and analyzed further to assess their likelihood and impact (together these are known as risk exposure). In each case, T-shirt sizing (i.e., small, medium and large) generally suffices. Risk assessment is difficult at the best of times, so it is worth asking why we even bother. There are two reasons: first, since not all risks are equal, we need a simple means of prioritizing them; second, we need to understand exposure in order to determine how to treat each risk. Risk assessment also lets us assign scores to risk bands, which will later become important in risk monitoring. It is important, though, to understand the limitations of risk assessment techniques such as asking people (e.g., hidden agendas, confirmation bias), using past data (e.g., might not be indicative of the future) or probability models (e.g., may be based on unrealistic assumptions), so any assessment that is made should be challenged.
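To make the idea of scoring exposure concrete, here is a minimal sketch in Python. The numeric values attached to each T-shirt size, the rule of multiplying likelihood by impact, and the band thresholds are all assumptions chosen for illustration; your team should agree on its own scale.

```python
from dataclasses import dataclass

# Assumed numeric scores for T-shirt sizes (illustrative only).
SIZE_SCORE = {"S": 1, "M": 2, "L": 3}


@dataclass
class Risk:
    why: str          # the uncertainty, as captured in the what/why workshop
    likelihood: str   # "S", "M" or "L"
    impact: str       # "S", "M" or "L"

    @property
    def exposure(self) -> int:
        # Assumption: exposure is the product of likelihood and impact scores.
        return SIZE_SCORE[self.likelihood] * SIZE_SCORE[self.impact]


def band(score: int) -> str:
    """Map an exposure score to a risk band (thresholds are assumptions)."""
    if score <= 2:
        return "low"
    if score <= 4:
        return "medium"
    return "high"


risks = [
    Risk("Virtual server configuration might differ from the physical server", "S", "L"),
    Risk("DNS might not be configured correctly", "M", "L"),
]

# Highest exposure first, so the team discusses the most pressing risks first.
prioritized = sorted(risks, key=lambda r: r.exposure, reverse=True)

for r in prioritized:
    print(f"{band(r.exposure):>6}  {r.exposure}  {r.why}")
```

The scores produced here are only as good as the sizes the team assigned, so treat the ordering as a starting point for discussion rather than a verdict.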

[Figure: Scrum/Kanban model, risk treatment]

The classical risk response strategies of risk management (i.e., avoid, reduce/exploit, share/transfer and accept with contingency) all apply in the agile context, though they translate into two possible responses. The first is to determine a task-based response (e.g., exploit a risk, remove a task, share an activity), each of which belongs on the Kanban board and should be appropriately colour-coded. The second is to select a suitable agile technique and tag all the tasks (including risk activities) to which it applies. For example, to tackle the uncertainty of GUI design, pair programming might be advocated as a risk mitigation measure for all GUI-related tasks.

The benefit of risk tasking and risk tagging is that it gives everyone an immediate impression of the level and distribution of risk in a project. Simply looking at the risk-modified Kanban board allows you to question whether the level of reward really warrants the risks (in the case of high risk stories) or whether the absence of risk is realistic (in the case of apparently low risk stories).
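If your Kanban board lives in a tool rather than on a wall, risk tasking and risk tagging can be sketched as follows. This is an illustrative model only: the `Task` fields, the column names and the helper functions are hypothetical and do not correspond to any particular board product.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    column: str                      # e.g. "To Do", "In Progress", "Done"
    is_risk_task: bool = False       # a task-based risk response (risk tasking)
    risk_tags: set = field(default_factory=set)  # agile techniques (risk tagging)


def tag_tasks(tasks, predicate, technique):
    """Tag every task matching the predicate with a mitigating technique."""
    for task in tasks:
        if predicate(task):
            task.risk_tags.add(technique)


def risk_distribution(tasks):
    """Count risk tasks and risk-tagged tasks per board column."""
    counts = Counter()
    for task in tasks:
        if task.is_risk_task or task.risk_tags:
            counts[task.column] += 1
    return dict(counts)


board = [
    Task("Design login form GUI", "To Do"),
    Task("Build settings page GUI", "In Progress"),
    Task("Write DNS configuration checklist", "To Do", is_risk_task=True),
    Task("Set up build server", "Done"),
]

# Pair programming as the mitigation for all GUI-related tasks.
tag_tasks(board, lambda t: "GUI" in t.title, "pair programming")

print(risk_distribution(board))
```

A summary like this is no substitute for looking at the board itself, but it shows how the distribution of risk across columns falls out of the tasking and tagging once they are recorded.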

[Figure: Scrum/Kanban model, risk monitoring]

Risk monitoring employs a risk burndown chart to track the reduction in the risk scores assigned during risk assessment. Generally speaking, a drop on the burndown is due to a risk-related task being completed, a risk-mitigating agile technique being applied to a task, a task being removed, or a risk expiring so that it is no longer relevant. It is not possible, however, to eliminate risk completely, and thus a certain amount of residual risk will always be present on the burndown in each iteration.
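The data behind a risk burndown can be sketched in a few lines of Python. The risk identifiers, the scores and the sequence of changes below are invented for illustration, and the function name `risk_burndown` is our own; the point is only to show how the total falls as risks are treated or expire while a residual remains.

```python
def risk_burndown(initial_scores, changes_per_iteration):
    """Return the total risk score at the start and after each iteration.

    initial_scores: dict mapping a risk id to its assessed score.
    changes_per_iteration: list of dicts, one per iteration, mapping a risk id
        to its new score (0 for a risk that has expired or been removed).
    """
    scores = dict(initial_scores)
    totals = [sum(scores.values())]
    for changes in changes_per_iteration:
        scores.update(changes)
        totals.append(sum(scores.values()))
    return totals


# Hypothetical scores assigned during risk assessment.
initial = {"dns": 6, "server-config": 3, "gui-design": 4}

changes = [
    {"dns": 2},            # risk-related task completed, some residual remains
    {"gui-design": 2},     # pair programming applied to GUI tasks
    {"server-config": 0},  # risk expired once the migration was done
]

totals = risk_burndown(initial, changes)
print(totals)
```

Plotting `totals` against the iteration number gives the burndown chart; in this example the line flattens out above zero, which is the residual risk described above.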