Control. Controls. You hear those words a lot, in our line of work. Often you hear them in the phrases ‘more controls’ and ‘lack of control’. You rarely hear people talk about ‘too many controls’ or ‘excessive controls’. And I once had ‘Head of Controls’ as a job title. It is a word that people associate with strength. They are wrong. Frequent use of the word ‘control’ should be considered a sign of weakness. The number of controls implemented in a business should be treated as an indicator of how likely a business will fail to manage its risks. Let me explain why.
But before I do, let me briefly explain why I feel compelled to cover this topic. Friend and fellow blogger Moinak Banerjee recently asked us all a very pertinent question: “are you really prepared?” As he put it:
You have your risk registers; you evaluate the risks; you add more risks and associated controls; you assess the IMPACTS and LIKELIHOODS of these risks; you test the controls for their effectiveness; you report and follow up and reassess; BUT… if disaster strikes, ARE YOU PREPARED?
It is a great question. The way he puts the question reveals a profound insight, and not one that many people feel comfortable with. For all the nicely-presented heatmaps, long and detailed risk registers, convoluted schemas to ‘measure’ risk and all the reams and reams of (digital, weightless, ethereal) paperwork that people push around, how far have they actually prepared their business to deal with the most severe risks? In telcos, the sad answer to that question is: ‘not much’. And if your aspiration is to deal with minor risks, but not the major risks, what does that say about you, and your ambition? For obvious reasons, there is no point employing someone to deal with minor risks, if nobody is managing the major risks.
Moinak was inspired by this article from Jim DeLoach, a managing director with specialist risk consultants Protiviti. DeLoach focused on a key area for risk management, the so-called ‘high impact, low probability’ risks. And when I write ‘high impact’ and ‘low probability’, I do not mean risks where the impact is 5 on a scale from 1 to 5, and the probability is 1 on a scale from 1 to 5. People who think in terms of 1 to 5 scales have already adopted a corrupted worldview. They are doomed to fail, even before they start to do their work. A one-in-a-thousand-years possibility of the death of hundreds of people is not like a one-in-five-years possibility of a 10% fall in revenues. But thanks to the endemic overuse of simplistic 1-to-5 scales, they tend to be lumped together. Tough luck for you, if you happen to be one of the people who died – perhaps we should ask the grieving family if they felt there was a fair balance struck between human safety and corporate profits. So even during a process meant to generate clear priorities, many risk managers have destroyed any hope of preparing their business for the worst kinds of crisis.
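To make that flattening concrete, here is a small sketch (hypothetical thresholds and numbers of my own, not taken from any real risk register) of how a 1-to-5 likelihood scale throws away exactly the information that matters for rare, severe risks:

```python
def likelihood_bin(annual_probability):
    # A typical (hypothetical) 1-to-5 likelihood scale: every risk rarer
    # than once in 20 years falls into the bottom bin, regardless of
    # whether it is a 1-in-50-year or a 1-in-1000-year event.
    thresholds = [(1 / 2, 5), (1 / 5, 4), (1 / 10, 3), (1 / 20, 2)]
    for cutoff, score in thresholds:
        if annual_probability >= cutoff:
            return score
    return 1

# A 1-in-1000-year catastrophe and a 1-in-50-year inconvenience both
# score 1: the scale cannot tell a twenty-fold difference in rarity apart.
print(likelihood_bin(1 / 1000), likelihood_bin(1 / 50))  # → 1 1
```

Once both risks sit in the same cell of the matrix, any prioritization built on top of it is blind to the difference between them.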
At this point, many telecoms risk managers switch off from an analysis like this, because they conclude that telcos never face the kind of severe risks that I am talking about. Happy days for them! They should be fired immediately, on the grounds that telcos are not very risky, so they are not needed. But before they are fired, I would very much like to build a base station above their bedroom, and use it to constantly beam data into the phones I would have strapped to both their ears. Though unethical, this would be a good way to test if mobile phone use increases the chance of cancer. It is easy to be complacent about safety when you are not the one feeling at risk. Also, I use the example of safety because it is obvious. There are many other severe risks that telcos face but may not like to deal with. For example, just consider the reputation damage being caused to a lot of American businesses right now, because they are believed to help the US government to spy on innocents. Even if the accusation is false, and the risk is not susceptible to objective measurement, the potential dangers are enormous, though the small-minded might adopt a strategy of ignoring the risk, in the hope it will go away.
In the real world, with no need for speculation or theorizing, there is lots of data to show that human beings suffer from significant and predictable cognitive deficits when it comes to the high impact, low probability risks. Put simply, people deal with very low probability risks the same way that they deal with zero probability risks: they ignore them, preferring to think about something else. This is a mistake. And having ignored the very low probability risks, they try to keep themselves busy and useful, by dealing with much higher probability risks with much lower impact. This behaviour compounds the original mistake. Whilst people may be deluded that lots of useful risk management has gone on, the organization is left suffering from crucial blind spots in its risk awareness, located in the very areas that will hurt most if things go wrong. Instead of doing something about those blind spots, the risk manager has become part of the problem, because he wants to please his boss, showing he is making rapid progress with trivial risks instead of struggling with the really big risks. So the risk manager becomes corrupted too. His mission transforms, and he encourages the unhelpful belief that no action is needed to prepare for the worst risks, and he reinforces the legitimacy of assessments that wrongly conclude that the probability of a risk is too low to be worth bothering with.
But when I say that even very clever people suffer from this cognitive deficit, I inevitably find that I face a lot of resistance, not least from clever people. I say there is relevant data. Clever people may initially scoff, until you point out the data. So let me remind you of how very very clever people – people as clever as you, dear reader – have made terrible miscalculations about high impact, low probability risks. The Basel II Accord (written by extraordinarily clever, highly educated and very well-paid people) failed to prevent a global meltdown of the financial sector… even though that was exactly the sort of thing it was meant to prevent (and hence why we now have Basel III). The exceedingly clever people who work for NASA continued to believe that the probability of a catastrophic failure of a Space Shuttle was 1 in 100,000… even though a disaster occurred on the 25th Shuttle launch. It took Nobel Laureate Richard Feynman (a very very very clever man) to point out what should have been painfully obvious – that NASA’s maths was absurd:
…if the probability of failure was as low as 1 in 100,000 it would take an inordinate number of tests to determine it (you would get nothing but a string of perfect flights from which no precise figure, other than that the probability is likely less than the number of such flights in the string so far). But, if the real probability is not so small, flights would show troubles, near failures, and possible actual failures with a reasonable number of trials. And standard statistical methods could give a reasonable estimate. In fact, previous NASA experience had shown, on occasion, just such difficulties, near accidents, and accidents, all giving warning that the probability of flight failure was not so very small. The inconsistency of the argument not to determine reliability through historical experience, as the range safety officer did, is that NASA also appeals to history, beginning “Historically this high degree of mission success…”
…the management of NASA exaggerates the reliability of its product, to the point of fantasy.
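Feynman’s statistical argument is easy to verify with a back-of-the-envelope calculation. The sketch below (my arithmetic, not NASA’s or Feynman’s exact numbers) asks: given only a string of n perfect flights, what is the largest per-flight failure probability still consistent with that record at 95% confidence? The answer comes from solving (1 − p)^n = 0.05 for p:

```python
def max_failure_prob(successes, confidence=0.95):
    # Largest per-flight failure probability p consistent, at the given
    # confidence, with observing `successes` consecutive perfect flights:
    # solve (1 - p) ** successes = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / successes)

# 24 perfect flights before the 25th launch: the flight record alone could
# not rule out a failure probability of roughly 1 in 9, let alone support
# a claim of 1 in 100,000.
print(round(max_failure_prob(24), 3))  # → 0.117
```

In other words, a short string of successes is consistent with a failure rate thousands of times higher than the one NASA claimed, which is exactly Feynman’s point.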
The same human cognitive deficits explain why the Fukushima nuclear reactor disaster was vulnerable to a wave with a calculated probability of occurring only once every 10,000 years… and hence why skilled engineers failed to implement simple safety measures that would have protected the reactor from even that kind of wave. And as even the stupidest cinema-goer knows, very clever people thought the Titanic was ‘unsinkable’. So I put it to you that many generations of very clever people have honestly believed they have conquered risk, only to be surprised to discover they were no cleverer than the last bunch of people who mistakenly believed they conquered risk.
Thankfully, human safety experts have learned a lot about managing the kinds of high impact, low probability risks that most concern most people (on the safe assumption that most of us want to keep on living for as long as possible). We can learn a lot from how they approach risk. However, hardly anyone working outside of safety management tries to learn from them. In particular, whilst risk is a two-dimensional plot of probability and impact, we should examine why the best safety experts rarely want to talk about probability. Instead of wasting time calculating if, based on current probabilities, you might be accidentally killed in the year 3365CE, or in the year 3366CE, they just focus on making sure that you could never be accidentally killed. As in Feynman’s example, if you want to keep certain eventualities to a very low probability, you end up recognizing that you will soon reach a point where the real probability is lower than any probability you can reliably calculate. The best safety managers do not fret about this epistemological horizon. They keep looking for ways to reduce risk, even as they stride beyond the point where they can usefully calculate probabilities. They pursue the impossible goal of zero probability like the Man of La Mancha pursued the impossible dream. Be very thankful that safety professionals have learned those lessons, and spare a thought for their hard work, next time you book a ticket on an airplane, or are inconvenienced by an evacuation drill.
But even the best experts are not perfect, and that includes highly-trained engineers and safety experts. Human beings do not know everything. Some things cannot be predicted. This is an irreducible problem for risk management. Hence, serious risks should be avoided wherever imaginable, not just wherever calculable. Sadly, nobody predicted that the first passenger jet aircraft would catastrophically fail due to metal fatigue. That was because they only discovered that there was such a thing as metal fatigue as a result of investigating why the first passenger jets kept exploding in mid-air. However, it was known at the time that circular windows were easier to fit tightly, and better at bearing stresses, than square windows – that experience came from the much older craft of shipbuilding. So there is an irony that the mitigation of the problem of metal fatigue on those jet planes involved replacing failure-prone square windows with safer, circular windows.
And if my arguments about the first passenger jets do not convince, consider what is currently happening to Boeing’s business, and reputation. Plane after plane has been grounded, after a string of fires. Why? Because of smaller and more powerful batteries used to power an ever-increasing number of cabin gadgets. The business logic is straightforward: people want things to play with in the cabin, so you give them what they want. But then, people do not want their plane to catch on fire, either. The risk maths should also be straightforward, but even for a business with the resources and experience of Boeing, mistakes can be made.
And how does telecom risk management compare to the kinds of risk management conducted by banks, or airplane manufacturers, or by NASA? They compare in the same way that 2001: A Space Odyssey compared how different people use technology. In one scene, an apeman wields a bone, using it to club an animal to death. In delight, he throws it into the air. Cut to a picture of an orbiting space station…
Some of you (especially those of you who believe in the power of the TM Forum’s standards) may be unsure which of these two scenarios – bashing things with bones, or cruising amongst the stars – is the closer approximation to the current state of telecoms risk management. Allow me to observe that none of us live on a spaceship (though some people act like they come from outer space).
So where are people going wrong? Why are our businesses so poorly prepared? DeLoach sums up the root of our problem, when he writes:
Fires cannot be fought with a committee
How do telco risk managers try to manage everything? With a committee. They need committees, because they feel too weak to argue without the support of a committee. Every kind of risk is hence mangled through the same broken formula:
- Ask a committee about risks the company faces
- Write down what the committee says about the risks the company faces
- Ask a committee about what the company should do about the risks it faces
- Write down what the committee says about what the company should do about the risks it faces
- Do the things on the list of things to do, unless they are politically inconvenient, in which case repeat the previous steps until the risk or the action has been so deprioritized that doing nothing is no longer considered a cause of embarrassment
You can substitute some alternative words for ‘committee’, but they will be names for collections of people with arbitrary prejudices and limited interest in managing the organization’s risks. Imagine if NASA built spaceships according to the whims of a committee with no knowledge or training in risk management. How would you feel about flying on a new Boeing jet, if they worked like that? The reason you would fear for your life is that such general-purpose and poorly-motivated committees never make tough decisions, especially in cases where one committee member will fail to earn their bonus as a consequence. So when you need someone to stand up and really make a tough decision, or to make a genuinely expert call on the degree of risk that is faced, one of three things will happen:
- The CEO will step up, and make the call
- The Board will step up, and make the call
- Everybody pretends they did not hear the question, until the risk manager goes away and comes up with a fudged non-answer
The problem with this scenario should be plain. Unless the organization’s leading risk manager has the balls and brains to drive the risk agenda, and receives some thanks and recognition from the Board (or possibly the CEO, though CEOs always suffer conflicting pressures), and has the courage to override the unscientific, biased and error-prone nonsense that inexpert committees will come out with… then he is buggered. Risk management does not occur by lucky chance. It requires a diamond-hard methodology based on concrete data, and the iron will of a man who will force his business to follow that methodology, no matter where it leads.
Bias is endemic in human decision-making and it leads to flawed calculation of risk. Committees cannot be relied upon to eliminate bias. On the contrary, many committees thrive on bias, as everyone defends their own personal interests and none have their goals aligned to a genuine reduction of risk. So instead of mitigating bias in decision-making, many risk managers are in danger of becoming biased too, by promoting and managing a weak process which favours compromise over the proper assessment and evaluation of risk.
So far, so long, and I have hardly mentioned controls. I can think of something else that hardly mentions controls. ISO31000 is a quite good ERM standard in a world of depressingly mediocre risk management. ISO31000 hardly uses the word ‘control’. (Trust me, I have read it, unlike some of the people who like to give advice on how to manage risk!)
Why is the word ‘control’ rarely used by the ISO31000 standard? To my mind, the answer is straightforward. The standard correctly describes all the possible ways to respond to a risk. One option is to do nothing about the risk. Another is to stop doing the things that create the risk in the first place. You can share the risk, by taking out insurance or other means. You can change the probability of the risk, or you can change the magnitude of the impact, when it does occur. Where does the word ‘control’ fit, in this expansive schema of possible risk responses? At best, a ‘control’ is a term for ongoing activities that may reduce the probability or impact of specific risks. Hence, controls are a strictly limited subset of just one kind of response to risk. There are many other things that the organization might do, to change the probability or impact of a risk, even though they cannot be called controls. And there are other possible responses to risk which do not involve changing probability or impact. So why is it that people talk about ‘controls’, as if controls are the only possible response to risk?
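The taxonomy of responses described above can be sketched as a simple enumeration (the labels and wording are my paraphrase, not ISO31000’s exact text), which makes plain how small a slice of the options ‘controls’ actually occupy:

```python
from enum import Enum

class RiskResponse(Enum):
    # Paraphrasing the options described in the text; labels are mine.
    ACCEPT = "retain the risk and do nothing"
    AVOID = "stop the activity that creates the risk"
    SHARE = "transfer some of the risk, e.g. via insurance"
    CHANGE_PROBABILITY = "make the risk less likely to occur"
    CHANGE_IMPACT = "reduce the harm if the risk does occur"

# At best, a 'control' is one recurring kind of activity within the last
# two options: a strictly limited subset of one kind of response.
controls_fit_within = {RiskResponse.CHANGE_PROBABILITY, RiskResponse.CHANGE_IMPACT}
print(f"controls touch {len(controls_fit_within)} of {len(RiskResponse)} response types")
```

A vocabulary built entirely around controls silently discards the rest of the menu.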
At this point, I must apologize to Moinak, because I will illustrate my point by quoting from his post. I hope he can see that my intentions are good. There is a lot of bad, corrupting advice about how to do risk management. People dredge up nonsense that I first heard 20 years ago, when I started working as a risk consultant. They repeat it like they are reading from a religious text that can never be questioned, but without saying which text they are quoting (this is always shrouded in mystery). They repeat the parts they like (‘controls, controls, and more controls!!!’), and ignore the parts they do not like (which is why many risk managers never get advice on how to pursue the other kinds of risk response). And they completely ignore all the work that has taken place in the last 20 years, often questioning and undermining the nonsense repeated by people like them. For example, a series of thoughtful papers have blown apart the mythological absurdity of attempting to calculate an ‘inherent’ risk, and yet there are many who insist on talking as if nobody has ever criticized the concept. And I told some of them to stop doing it (grrr… TMF… my blood pressure is rising…) but they never listen to me. And I can hardly blame them for not listening to me, because listening to me might lead to
lower sales of cVidya software. I mean, it might lead to… yeah, it is unclear why they refuse to listen to me. Apart from the risk of lower sales. Or the risk that people need to learn more, and improve, at their job. (But why do people treat learning as a downside risk? Do we tell our children that they should behave like they know everything, and do not need to learn anything new? If children need to keep learning, then why must adults stop learning?)
I find that my critics (and there are many of them!) never really bother to answer me. Mostly they pretend not to hear me, even when I shout. And because I shout, that becomes an excuse to ignore me even more. Harrumph. But I digress. So, to return to the point, I blame this festering environment of bad advice and hostility to learning, when I notice that Moinak writes:
…you add more risks and associated controls; you assess the IMPACTS and LIKELIHOODS of these risks; you test the controls for their effectiveness…
Moinak mentions controls, but he does not mention the other ways to respond to risk. Just like the 1-to-5 risk scale, this is another example of how the process of managing risk can be corrupted, before it is even begun. The practitioner is doomed by their assumptions, even before they begin their real work. But I ask for Moinak’s forgiveness in pointing this out. As he said himself, he is new to the field, and I believe he is only repeating what has become ‘true’ by repetition – the myth that risks are only countered by controls – just like it was ‘true’ that NASA had a 1-in-100,000 failure rate, because they had repeated it so often.
To return to the question, let me theorize why people talk so much about controls, and so little about all the other ways to respond to risk. When somebody talks about controls, it is because they lack control. Controls are desired by those who lack power, and feel their lack of power. Controls are a substitute for real influence over decision-making. Real influence might manifest itself in lots of ways. For example, a project may be stopped because the reputation risks were correctly assessed to outweigh the financial benefits. Or a new commercial relationship may be entered into, because the risk manager brought in an outside firm who are more adept at handling specific kinds of risks that lie outside of his organization’s core expertise. Or management may sit down, and plan for how they would deal with various types of crisis. None of these risk responses can be called ‘controls’. They are valid risk responses – but many risk managers are nowhere near to having the right level of influence to make them happen. Instead, they only aspire to do something minor, internal, and after the important decisions have been taken, because they lack the influence to do anything else. Meanwhile, preparing for the really big risks – especially those which are high impact and low probability – usually requires making some important decisions today, instead of putting them off indefinitely.
In fact, so many decisions have such significant risk dimensions, and businesses can be so incompetent at making them, that the risk management decision may be taken out of the business’ hands. Consider all the demands for more regulation of banks – a theme which has become very popular in some countries. In such cases, a business that failed to adopt an adequate risk culture may end up being forced to adopt a compliance culture. Did they comply with rules to protect the health of employees? Did they safely dispose of fuel? Are the accounts compliant with IFRS123? What about SOX? And so on… it never even occurs to people that governments are encouraged to implement rules when businesses fail to manage their own risks. So one cost of poor risk management is the compliance cost associated with inefficient and excessive legal and regulatory burdens.
An irony here is that even governments can sometimes hint that they understand their limits. Gently, gradually, governments are pushing the agenda that all businesses need to have better risk management. They avoid being too specific about how to do risk management, because the governments do not really know how to do good risk management (consider the mistake made by the US government, when allowing Ed Snowden to gain access to its data). But they know, vaguely, that businesses need a push to take risk more seriously, in order to stem the torrent of business failures and scandals. But because the push is broad and general, and does not involve a lot of excessive prescription, how much of it filters down to the average numbskull, who merrily uses the words ‘risk management’ without knowing what they mean? Well, none of it filters down to that level. So some of them end up innovating new ways for telcos to manage risk, just like an innovative battery manufacturer can help Boeing to reduce the ‘risk’ of bored passengers, or an O-ring manufacturer can eliminate the risk of critical failure at low temperatures by simply not testing if their product will fail at low temperatures. This is the kind of ‘help’ that businesses, governments, and the entire human race are better off without.
The typical telecoms risk manager finds themselves slotted into this overwhelming architecture, which is driven by governments, board members and CEOs. They also need to accept the wisdom of many risk specialists who will evaluate specific risks more confidently than any generalist could – whilst having the skill to separate the real specialists from the charlatans. Faced with all these impressive people, many risk managers are underprepared and undertrained. But instead of focusing on training and strengthening the person, we wrongly focus on developing the tools around him – and that leads to an overoptimistic belief that controls are the solution to every problem. It is natural for an untrained and ill-loved risk manager to grasp for any tool he might wield in battle. Faced with such overwhelming odds, and having so few allies, the average risk manager can feel like a dwarf waging an endless war against an encircling army of Goliaths. Important decision-makers do not consult the risk manager. They do not even remember his name. He cannot stop risky decisions being made. He only finds out afterwards. He is not encouraged to think of clever ways to share risk. He might be allowed to take out common forms of insurance, but nobody trusts him to negotiate anything more sophisticated. Of all the ways there are to change probability and impact, he is only allowed to suggest the most basic and reactive types of response, that burden workers who are well down the organizational hierarchy, and so less likely to push back. And he is too scared to ever suggest the correct response is to do nothing, even when that would be the right thing to do. If he did that, it might become obvious that he adds no significant value. Hence, he would rather implement a control where the cost outweighs the benefits, than be seen to do nothing.
This is why controls have become synonymous with risk management. The psychologist Abraham Maslow said:
If you only have a hammer, you tend to see every problem as a nail.
In telecoms, many risk managers only have one hammer, and they use it to hammer every risk they come near to. Nobody showed them any tools other than the hammer, and nobody wants them to use any tools that might lead them to influence real and important decisions. So the poor risk manager settles on hammering everything that comes into reach, by trying to turn every risk into an opportunity for more controls. And over time, the other managers allow less to come into the risk manager’s reach, and keep things secret from him, for fear that their budgets will be hammered too. And the poor risk manager shrugs as he realizes that everybody else considers him to be a bureaucratic deadweight, who should be avoided. And because he only knows about hammers, and only talks about hammers, he never gets the money to buy a copy of the world’s leading risk management standard, which makes it very clear that there are many tools he can learn about, and not just hammers. In such circumstances, I can sympathize with why risk managers accept a fate of deploying more and more controls. If they lack support to do anything else, it is tempting to retreat to the safe territory of implementing controls, which are treated by most senior managements as an annoying additional burden that will never stop them from making any of the decisions that really matter to them.
To reverse Maslow’s quote, not every problem is a nail. Some demand a different response. The risk manager also has a duty, as well as a challenge. It is his duty to deal with risks, even if there is no control that is adequate to the task. Those pesky high impact low probability risks tend to provide good examples of why controls cannot address every risk. Training staff to evacuate a building may be considered a ‘control’, though it is not a very natural use of the term. Using fire retardant materials in the construction of a building is clearly not a control, but it will lower risks, and may save lives. A backup cooling system run from an independent energy source might have prevented the worst effects of Fukushima, but if you decide to get all your electricity from burning gas or solar cells, then you have no need for this additional control. There is no need to adopt unrealistic expectations for a space vehicle, and hence put lives at risk, if the organization is willing to lose some of its budget to a rival organization that would have implemented unmanned launch vehicles. And maybe some of those passengers on the Titanic would have been better off staying at home. As the risk management options open up, so the need for controls diminishes, because risk will have been reduced long before the opportunity for designing and implementing controls arises.
To answer Moinak’s original question: of course we are not prepared. To be prepared, we must think about risks when we are making decisions, not afterwards. Trying to use controls to hammer all risks into manageable shapes is another corruption of the purpose of risk management. If we think we can achieve our goals by solely using controls, there will always be many risks that we will not adequately prepare for. And painful as it is, the only way to prepare businesses to handle the full spectrum of risks is for risk managers to take a lead in assessing all the risks, and proposing all the options for how to respond. As impossible as it might seem, as difficult and painful as it might prove to be, every good, honest and moral risk manager must be like the Man of La Mancha, and reach for the unreachable star. And if we keep doing that for long enough, we might yet evolve from apemen to astronauts…
One of the problems is that RA specifically is seen as a control function – not a risk managing function. There are various risks which we are not trained to identify, not trained to deal with and certainly not mandated to deal with. Let me explain – I come to work as an RA analyst with an IT/programming background and slowly acquire a high-level understanding of telecoms in order to perform periodic dictated activities (or controls). This, however, is actually the infant stage of RA and risk management – correct me if I am wrong.
In the grand scheme of things risk is everywhere (for telcos as well as others). The major risks with low probability should be taken off the table with a comprehensive plan to evaluate and deal with the risk. However, there are also industry risks – for example, Google is a bigger threat to telcos at the moment than anything else, with the danger of reducing them all to simple mobile ISPs. Many are trying to evolve to provide “digital services” as well….
Specifically, RA is not seen as part of risk management because of the process required to get to the final data/conclusions. It is one thing to process the information and create a report with valuable information, and quite another to analyse it. And usually most people – even top management – do not seem to get this.
I completely understand your rant, however I do not see things changing, since RA, risk and information reporting is still convoluted for many even within the industry. The priorities are selling and market share.
This risk management blog of yours is some kind of record length for talkRA: 8,000 words! And yet there are many fine insights in the piece, so I need to reread this in more detail when I have some time.
I love the idea of testing the safety of wireless signals by strapping a couple of cell phones to the head. I think it was Nassim Taleb who told a similar risk management story in ancient Rome. Whenever a Roman engineer designed a stone bridge, he and his family were forced to have their home under that very bridge.
Both examples point to at least a partial answer to the RM problem: accountability, having skin in the game, eating your own dog food, and paying the consequences of failure. And yet we live in a world where “brilliant” top managers are given “golden parachutes” no matter how much global financial havoc they cause.
Paperwork drills and dashboard alerts are merely sensory extensions of an experienced expert’s brain. And I don’t think we should arbitrarily limit the number of controls a person has at his disposal. In fact, people of the greatest experience and wisdom are entitled to as many controls as they want. The trouble comes when the controls exceed the domain knowledge and capacity of the people using them.
In ancient Japan, they revered the huge shimenawa (straw rope) as a sacred thing and draped them at the entrance to shrines. Likewise, a skilled business expert is known for her ability to weave the many threads of people, process, and technology together.
Still, you rightly point out the key danger: professional development in a domain must not take a backseat to glittering — but essentially dumb — technical gadgets.