Life means dealing with bad things that may or may not happen. We call them risks. We assess, evaluate, mitigate, accept, and sometimes blithely ignore them. Building complex and original software is inherently risky, and the Agile way of working does not fix that. That’s why we need to be true to the value of courage. I’ll start my argument with a refresher on the topic and some practical examples.
The dictionary defines risk as the likelihood of bad things happening. Equally important is the nature and extent of the damage when the risk becomes a reality. The exact balance between probability and consequences is the bread and butter of actuaries at insurance companies. Their models tell them how likely it is that an average dwelling goes up in flames, so they can put a price on the collective risk of their millions of customers. Several houses are certain to burn down each year, and their owners need to be reimbursed. It’s a predictable risk.
This won’t do in aviation or civil engineering. Every lethal incident prompts an investigation to raise the standard and reduce risk, up to the point where we are left with black swans only: events so rare and unpredictable you can’t possibly prepare for them. The airline crashes that still happen are largely the result of these unknown unknowns.
The Scala World Hiking Trip
Here’s a more mundane example. At Scala World 2019 I joined the attendees for the traditional pre-conference hike in the English Lake District. I had visited the area before and arrived with waterproof gear and sturdy boots, knowing the terrain and the unpredictable British weather, which even in September can be brutal.
We set off in the sunshine and of course, it had to rain for much of the walk. Several walkers had not read, or had not heeded, the instruction email, or even checked the weather forecast. Arriving woefully unprepared in cotton jeans and t-shirts, they got thoroughly soaked and a little miserable. But there was safety in numbers. Spare ponchos were shared, and nobody would have been in mortal danger if they had sprained an ankle while clambering over slippery boulders in their inadequate footwear.
Four Ways of Doing Nothing
The literature distinguishes five ways to manage risks. Funnily enough, only one involves facing the risk head-on. The other four strategies are variations of doing nothing. And the hiking trip covers all five.
- Mitigate – The proactive way to tackle a risk is to make it less likely to happen and/or make the consequences less unpleasant when it does. You can’t stop the clouds from raining, but you don’t need to get soaked and cold: sturdy boots, Merino undergarments, a head torch, a Garmin GPS device, emergency rations. You keep mitigating until you’re at ease (or broke). Then you pick an option from the remaining four.
- Accept – After careful consideration, you decide the risk is acceptable. You’re prepared for what might happen. Yes, it will be cold and wet, but you’re all Gore-Tex-ed up and with experienced ramblers.
- Cancel – You’ve done what you can to mitigate but decide the risks are still not acceptable. You call the whole thing off, so you’re no longer exposed to the risks.
- Transfer – What if you break a leg and need to be airlifted to the hospital? That will cost a fortune. You take out premium daredevil insurance to cover this unlikely event.
- Ignore – Not much of a strategy at all, this one. You don’t know and don’t seem to care. You think the Lakes are mere picture-postcard cuteness, where you couldn’t possibly die of hypothermia when injured on a solitary winter hike, unable to call for help. You didn’t check the weather forecast, went out in jeans and trainers, with your phone at 20% charge.
Now let’s see how these five strategies apply to software risks:
- Mitigate – We are good boy/girl scouts. We write clean, well-tested, and peer-reviewed code, and stay in close contact with the business to make sure expectations are aligned.
- Accept – If we feel that our code is of sufficient quality and we have a good system for dealing with bugs in production, we accept the risks.
- Cancel – If we’re working on a self-driving car, an A minus is still not good enough. While the risks may be more about liability and legislation than about the autonomous driving skills themselves, they are still too great to accept.
- Transfer – Remember when each IT outfit had its own server room and surly sysadmin? We’ve transferred the troubles of maintaining bare metal (and much more) to one-stop-shop cloud providers.
- Ignore – Last but not least, there’s the cowboy mentality of throwing your code at the wall and seeing if it sticks, which was never the idea behind continuous delivery. Still, one person’s enterprising spirit is another one’s cavalier ignorance.
The Proof Was Always in the Pudding
Not every bug costs lives, but some do. And while all bugs are expensive, so is not shipping, or gold-plating to perfection. The world of software is too vast for a single risk management strategy.
Yet despite the variety, software risks come in two predictable categories: the wrong thing done right, or the right thing done wrong. We may deliver a product that does a terrific job of what the customer doesn’t need. Alternatively, we may understand the customer perfectly, but botch up the implementation. Both are bad, and combinations of the two are common.
This is a disheartening prospect, but building original software means finding novel solutions, which means dealing with the unknown. Things will go wrong. We can mitigate the heck out of our code with more and better design documents, testing and peer reviews, but eventually, it needs to prove itself in the real world.
In civil engineering or the Artemis space project, big up-front designs make sense. You can’t make a bridge two feet wider once it’s built. When going back to the drawing board is crushingly expensive, thorough risk minimization is called for. We used to make software in the same fashion, and it didn’t work. The Waterfall approach tried to mitigate the risk of building the wrong thing before any working code was shared with the user. Well, good luck with that. No putative interpretation of what the user wants ever proved capable of predicting exactly what they needed. The proof has always been in the pudding.
The Agile Compromise
We cannot eliminate the risk of building the wrong thing with better design and more focus groups. We can only reduce it through shorter and more frequent iterations of a working product that is incomplete and maybe not that great yet. That’s the Agile compromise.
Shorter cycles decrease the risk of building the wrong thing but increase the risk of degrading the process. Accruing technical debt is one such risk: not a necessary consequence, just a standard price to pay for quicker deliveries. No pundit with stories about the constant commitment to quality will convince me otherwise. If you want greater speed, accept more risks.
The Agile compromise towards risk-taking also recognizes that software, as a creative discipline, by nature exposes us to black swans: the risks we didn’t know we would ever run into. No engineering approach provides full reassurance against them, nor can testing and validation ever give you complete peace of mind. It’s a little bit scary, but if you pride yourself on an Agile mindset, you must embrace it.
Software is complex rather than complicated. Its many moving parts behave in unpredictable ways when unleashed on the world. Risks are a natural part of that complexity. Recognizing and embracing them with courage is the only way forward in a team that bears joint responsibility for fixing the consequences and doesn’t point fingers.