What’s Wrong with the Numbers? A Questioning Look at Probabilistic Risk Assessment

by Jack Crawford

[Editor Note: This Opinion piece was originally published in Volume 37, Issue 3 of the Journal of System Safety in 4Q 2001. The article has been reformatted, but the text is unchanged.]

Probabilistic Risk Assessment (PRA), or Probabilistic Safety Assessment as it is called in the nuclear power industry, has been developed over the last 30 years as a discipline heavily influenced by the mathematical theory of probability. But while its mathematical methods are endlessly extended and refined in industry literature, how confident can we be that the output numbers mean what they claim to mean — i.e., probabilities of future events? I believe that the time has come to think about this basic issue.

I recently initiated a study of the foundations of PRA. In this article, I identify some key questions that need to be asked about PRA’s credibility, and even come up with a few (albeit provisional) answers. The factors that prompted this study are:

  • The incredible magnitude of many probability numbers
  • The often overly optimistic assumption that an assessment encompasses all credible failures
  • The observation of gross discrepancies between predictions and outcomes
  • The difficulty in finding examples of accidents caused by genuinely random component failures
  • The narrow focus of PRA on measurable events, especially failure rates, and the fact that it can ignore accidents which are not caused by failures

During 17 years of involvement in risk and safety assessment in the weapon systems field in the U.K. and in Australia, I have been bombarded with numerical probabilities. Many of them have seemed incredible, or at best have ventured into the unknowable, with some of the powers of ten ascending into the high teens and even the twenties. The record in my experience was a probability of premature functioning of a mine fuzing system predicted to be 1 in 10^44.

In another example, the design authority (DA) for a weapon system decided to include in it an electromechanical device which had an excellent record in another application. After pages of calculations to assess the effects of stresses in its new application, the DA predicted that its probability of mechanical failure would be 9.116 in 10^9 operating hours. The operating cycle time of the device was only 40 seconds at a likely rate of fewer than 10 cycles per battlefield day, so the predicted failure rate should have seen us through many times more use than the system would ever get in service. But in a system test that included four of the devices, we had two mechanical failures before they had accumulated one hour of operation. The failures happened in two different modes, neither of which had been considered in the analysis.
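
A back-of-the-envelope check shows just how incompatible that prediction was with what we saw in the test. The short Python sketch below assumes the prediction can be read as a constant failure rate of 9.116 failures per 10^9 operating hours and that failures arrive as a Poisson process; the one hour of accumulated exposure comes from the anecdote, and the rest is illustrative.

```python
# Assumption: the prediction is read as a constant failure rate of
# 9.116 failures per 10^9 operating hours, with failures arriving as a Poisson process.
lam = 9.116e-9            # predicted failures per operating hour
exposure_hours = 1.0      # roughly one hour accumulated across the four devices

mu = lam * exposure_hours  # expected number of failures under the prediction, ~9.1e-09

# Probability of seeing two or more failures if the prediction were right.
# The exact Poisson form 1 - exp(-mu)*(1 + mu) underflows in double precision
# for mu this small, so use the leading term of its series: mu**2 / 2.
p_two_or_more = mu ** 2 / 2

print(f"expected failures: {mu:.1e}")            # ~9.1e-09
print(f"P(>= 2 failures): {p_two_or_more:.1e}")  # ~4.2e-17
```

On those assumptions, the two failures we observed were an event the prediction rated at roughly 4 in 10^17, which is another way of saying that it was the prediction, not the hardware, that failed.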

I have found it more difficult to find examples of accidents caused by what textbooks and safety standards describe as “random” failures. Three years ago, a dozen of us attended a meeting in the U.K. Ministry of Defence at which the contribution of random hardware failures to accidents was questioned. Between us, we could think of only one example of an accident caused by a combination of genuinely random events. Six years ago, the U.K. Health & Safety Executive published a booklet called Out of Control [Ref. 1], which contained 34 examples of control system failures. In the summary of causes at the end of the booklet, not one system failure is attributed to random hardware failure. If that kind of failure were indeed a major cause of accidents, we could surely expect it to turn up somewhere in 34 examples.

Readers may remember the disastrous first flight of the European Space Agency’s Ariane 5 rocket in June 1996, when it exploded 40 seconds after launch. According to Aviation Week [Ref. 2], the pre-launch estimate of the probability of a successful mission was 98.5%. The reality, as the report of the Board of Inquiry [Ref. 3] showed, was that the design ensured that the rocket would crash after 40 seconds. The real probability of success was zero, so the estimated probability was optimistic by a factor of infinity. This example illustrates:

  • There was a gross discrepancy between prediction and outcome
  • There was nothing random about any of the causes
  • The accident was not caused by component failures; the inquiry did not report that any component of the rocket system failed to behave as it was designed to behave throughout the short flight
  • The analysis did not consider the real causes of the accident, which in this case were errors of management.

After observing these and other examples, it seemed reasonable to look into the methodology of PRA. In the course of a few quick checks, my pocket calculator failed to find anything wrong with the mathematics of any of the assessments that were readily at hand, so my next step was to investigate the basis on which the mathematical structures were built.

For several years, I have been searching for a test of the theory that says we can draw probabilistic data on failure rates from past experience, and then synthesize a selection of the data in order to predict the failure rate of a new system. Safety and reliability literature does not help much because it generally goes no deeper than the mathematics upon which the theory is built.

My search has involved talking to many people in the U.K., including the Civil Aviation Authority, the Health & Safety Executive, and several leading engineering companies, as well as academic and engineering institutions. The only entity that attempted to test the theory was AEA Technology. They gave me a study [Ref. 4] that compared predicted and observed reliability figures for equipment used in nuclear power plants. It concluded that the correlation was reasonably good. That was useful, but the study seemed to have two shortcomings. One was that it looked at failure rates at the reliability level, rather than at the safety level, which (in the military field at least) are much harder to predict. The other was that it had been done as an afterthought, so it was not the properly designed and controlled experiment for which I had been looking.

Based on this data, I find myself driven toward the conclusion that the scientific method may never have been applied to this particular theory — i.e., PRA. The apparent lack of science in this field threatens to become the most disappointing finding of this study. I hope to be proven wrong in this conclusion.

The Main Questions

Having observed that PRA might be questionable, it became necessary to decide what questions should be asked. There are four key questions: one practical, one theoretical, one philosophical and one contingency question that depends on the answers to the other three.

Question 1: To what extent does PRA encompass the main causes of accidents?

This is the key practical question. First, it is inevitable that any potential causes, modes and effects of failure which have not been foreseen will escape the attention of PRA. One of the effects of the ever-increasing complexity of systems is that we must expect to find some failure modes that we have failed to anticipate. We can and should do more thinking to reduce the number of missed tricks. But, when we have done our best, we still have no way of knowing whether we have thought of everything, as the example of the electromechanical device illustrated.

Second, PRA tends to lead us into a mindset which assumes that systems fail only if their critical components fail. It does not lead us to think enough about the class of accidents in which everything functions as designed.

Here are some examples:

  • Turner [Ref. 5] describes a collision on an unmanned railway level crossing. The drivers of the train and the road vehicle did nothing wrong, and there was no equipment failure.
  • Kletz, quoted by Leveson [Ref. 6], describes an accidental release from a computer-controlled chemical reactor. No human operator was involved. The automatic control system, in triggering the release, functioned as designed.
  • From my own experience, an antitank mine design was proposed, which in certain conditions would have killed soldiers laying the mines according to the correct drill.

A third gap in the coverage of PRA is caused by invalid, or invalidated, assumptions. The assumptions made in a safety assessment are not always made explicit and may later be forgotten. When an important assumption is invalidated by changed circumstances, and nobody knows that it was made or that anything depended on it, an accident will be waiting to happen as soon as certain conditions prevail. One of the findings of the subsequent inquiry is likely to be that in those conditions, the probability of the accident was one.

A major source of uncertainty is the way people respond to their perceptions of risk. For example, Adams [Ref. 7] produces evidence that the compulsory use of seat belts has not improved road safety. He shows how the reduced risk to people in vehicles has been balanced, through small changes in drivers’ behavior, by an increased risk to those who are not in vehicles. He also provides an example of such “risk compensation” being enshrined in law: in Germany, coaches fitted with seat belts are allowed to travel faster than those without. In civil aviation, there has been concern about the frequency of near misses between aircraft queuing to land at busy airports. Yet the U.K. National Air Traffic Services, observing that aircraft have become better at station-keeping, have decided to reduce the vertical interval between aircraft “stacked” while awaiting clearance to land. Even NATO is not immune. The announcement of a forthcoming workshop on insensitive munitions [Ref. 8] specified objectives which included both “reduction in collateral damage in the event of an accidental initiation” and “reduction in safety zone for storage and transportation.” The organizers seemed unaware that the latter benefit can be gained only at the expense of the former. In these ways, potentially effective measures to improve safety, for which quantified claims are commonly made, may in practice be consumed in return for some other benefit such as improved performance.

In many fields, the fact that an accident had not happened for a long time would be seen as indicating a low, and probably diminishing, risk. As the time since the last accident increases, that view will be reinforced by conventional statistical methods indicating that the probability of an accident is being reduced because the mean time between failures is increasing. The reality may be quite different. Many of us will have come across examples of accident-free periods leading to complacency and greatly increased risk. In the civil engineering field, Petroski [Ref. 9] identifies the “design climate” as a critical factor in catastrophic failures of bridges. His argument, based on examples, is that a period of successful use of a novel design can lead a designer to become overconfident and consequently to underdesign a new structure in the interest of economy or beauty. The bridge is then liable to fail if it is subjected to extreme conditions. In situations such as these, where risks change inversely as people’s perceptions of risk change, our attempts to pin down numerical probabilities of accidents are likely to be about as successful as trying to capture a will-o’-the-wisp.
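
To make the point concrete, here is a minimal sketch of the kind of conventional calculation alluded to above. It uses the standard zero-failure upper confidence bound on a constant (exponential) failure rate; the exposure figures and the function name are purely illustrative.

```python
from math import log

def rate_upper_bound(exposure_hours: float, confidence: float = 0.95) -> float:
    """Upper confidence bound on a constant failure rate when zero failures
    have been observed in `exposure_hours` of operation
    (exponential model: lambda_upper = -ln(1 - confidence) / exposure)."""
    return -log(1.0 - confidence) / exposure_hours

# The longer the accident-free period, the lower the bound, and so the lower
# the apparent risk, whatever is actually happening inside the organization.
for years in (1, 5, 10, 20):
    hours = years * 8760.0
    print(f"{years:2d} accident-free years -> lambda <= {rate_upper_bound(hours):.1e} per hour")
```

The arithmetic is unimpeachable as far as it goes; the trouble, as the examples above suggest, lies in what it leaves out.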

Of all the sources of risk which PRA overlooks, management must be the most prolific. Many apparent technical failures have their roots in management weaknesses. Leveson [Ref. 10] points out that “unmeasurable factors (such as … management errors) are ignored even though they may have greater influence on safety than those that are measurable.” As she was writing those words, the European Space Agency was committing the management errors that led to the Ariane Flight 501 debacle, while using measurable data to predict a high probability of success.

An important aspect of risk management is the quality of the culture in an organization. For example, the Piper Alpha inquiry found that “Senior management … adopted a superficial response when issues of safety were raised,” and the judge in the Herald of Free Enterprise case criticized the “disease of sloppiness” which had spread down from the top of the Townsend Thoresen company. In each case, the company’s safety culture had contributed much to the disaster.

All of those sources of risk are soft, or unmeasurable, factors. They affect the frequency and scale of accidents, but PRA does not encompass them. It focuses, rather, on the measurable causes, modes and effects of failure. With so limited a view of the scene, PRA must be expected to deliver optimistic results, contrary to what we normally aim to do in risk assessments. In terms of the “As Low As Reasonably Practicable” (ALARP) principle, the consequence is that PRA can neither demonstrate that a risk is as low as reasonably practicable, nor that it is tolerable.

Question 2: Can statistical inference take us forward from the past to the future?

This question addresses the theoretical basis of PRA, for which the apparent absence of any proper justification or test was noted above. The clearest argument I have found is one developed by Deming [Ref. 11], in which he explores the limits of statistical inference. He argues that the historical results that provide input data for predictions depend on the sets of conditions in which they were produced, and that those exact conditions are unrepeatable. Furthermore, as Feynman [Ref. 12] reminds us, we cannot assume that all of the conditions that contributed to a result were recorded or even noticed. In other words, the historical record is not a reliable guide to the future. Worse still, it can be hard to tell whether it is even a reliable guide to the past.

In statistical terms, Deming concludes that there is no mathematical method from which to extrapolate past results to future conditions, and consequently, no objective way of assigning a numerical probability that a prediction will be right or wrong. Prediction therefore means applying judgment and knowledge of the subject to the available data, rather than just manipulating numbers.

A further problem is that most statistical methods assume that component failures will be independent. In reality, dependent failures contribute to many accidents. The “fudge factors” sometimes introduced to allow for dependencies, such as cut-offs and beta factors, do at least move the numbers in the right direction. On the other hand, they are arbitrary and are no substitute for an understanding of the dependencies within a system and their potential consequences.
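
As an illustration of the kind of adjustment in question, here is a minimal sketch of the widely used beta-factor treatment of a one-out-of-two redundant pair; the channel failure probability and the value of beta are invented for the example.

```python
def one_out_of_two_pfd(p_channel: float, beta: float) -> float:
    """Approximate probability of failure on demand for a 1-out-of-2 pair
    under the simple beta-factor model: a fraction `beta` of each channel's
    failures is common cause (defeating both channels at once), and only
    the remainder is treated as independent."""
    common_cause = beta * p_channel
    independent = ((1.0 - beta) * p_channel) ** 2
    return common_cause + independent

p = 1e-3  # illustrative per-demand failure probability of one channel
print(f"assuming full independence: {p ** 2:.1e}")                      # 1.0e-06
print(f"with beta = 0.1:            {one_out_of_two_pfd(p, 0.1):.1e}")  # ~1.0e-04
```

The factor of a hundred between the two answers comes entirely from the assumed value of beta, which is exactly the arbitrariness complained of above.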

As an aid to predicting the behavior of systems, Deming [Ref. 13] advocates the concept of stability developed by Shewhart. “Stability” in this context means that the functions of the system display a stable range of variation. He argues that stability is a prerequisite for predictable behavior, and that in a manmade system, it is not a natural state — it has to be achieved and maintained. Systems are constantly threatened by destabilizing influences, so their stability must be monitored and, whenever necessary, restored. Hence, a system will remain stable and predictable only by virtue of people’s vigilance, knowledge and effort. It is not a question of probability.
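
What a basic stability check looks like in practice can be sketched briefly. The version below assumes Shewhart’s individuals and moving-range chart; the monthly failure counts are invented for illustration.

```python
from statistics import mean

def stability_check(values):
    """Shewhart individuals chart: estimate sigma from the average moving
    range (d2 = 1.128 for subgroups of two) and flag any point that falls
    outside the three-sigma control limits."""
    centre = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    sigma_hat = mean(moving_ranges) / 1.128
    lcl, ucl = centre - 3 * sigma_hat, centre + 3 * sigma_hat
    out_of_control = [(i, v) for i, v in enumerate(values) if not lcl <= v <= ucl]
    return lcl, ucl, out_of_control

# Illustrative monthly failure counts for a fielded component
counts = [4, 6, 5, 3, 7, 5, 4, 6, 5, 14, 5, 4]
print(stability_check(counts))  # the 14 falls outside the limits: not a stable system
```

Data drawn from a system that fails such a check offer no stable basis for prediction, which is precisely Deming’s point.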

Without stability, there is no basis for prediction, but I have yet to find a safety or reliability database which assures us that its estimates of component failure rates were derived from stable systems by stable methods of measurement. Some may have been so derived, but even then, when we take those types of components and build them into a new system, we leave the stability behind because we have changed the operating environment. A new state of stability will have to be achieved and maintained, and new data generated for monitoring and predicting behavior.

Collectively, those arguments seem to falsify the theory that we can rely on historical frequency data to take us across the boundary between the past and the future. To that conclusion many would reply that our contracts and our regulators nevertheless insist that we deliver predictions in the form of numerical probabilities. What, then, should we do? Many years ago, Tukey [Ref. 14] offered some relevant advice: “It is far easier to put out a figure than to accompany it with a wise and reasoned account of its liability to systematic and fluctuating errors. Yet if the figure is to serve as the basis of an important decision, the accompanying account may be more important than the figure itself.” That seems to indicate a reasonable way to go.

Question 3: How much force does the mathematical theory of probability add to a probability statement?

This is the key philosophical question. In looking for an answer, I have used ideas put forward by Toulmin [Ref. 15]. When we make a prediction, especially a safety prediction, we want as much precision as we can manage. Toulmin distinguishes between precision in the sense of definiteness and precision in the sense of exactness. So for example, if we judge that an event is extremely unlikely to happen, we are relying on definiteness. But if we estimate a probability that the event will happen twice in a thousand rocket launches, we are relying on exactness. This leads to further questions, such as how much do we gain when we are able to add exactness to definiteness? And what should we do if we find that we have one, but not the other? Those sorts of questions may seem ethereal to some people, but the study indicates that they actually matter when it comes to making decisions such as whether a system is safe enough to be accepted for service.

PRA uses mathematical probability in an attempt to deliver precise predictions. But Toulmin, from a logician’s standpoint, argues that “Little is altered by the introduction of mathematics into the discussion of the probability of future events” and that “The development of the mathematical theory of probability accordingly leaves the force of our probability-statements unchanged; its value is that it greatly refines the standards to be appealed to.”

If we accept the arguments of Deming and Shewhart, the refinement is spurious in the context of PRA. (Deming [Ref. 11] points to areas in which numerical probability does provide a valid guide to action, but they do not relate to PRA.) The spurious refinement of the numbers is starkly illustrated by the examples given earlier, in each of which, when the definiteness of the prediction proved to be a delusion, its exactness was exposed as ridiculous.

A relevant, if irreverent, statement of philosophy comes from Feynman [Ref. 16], who preferred engineering judgment to what he regarded as meaningless numerical probabilities: “If a guy tells me the probability of failure is 1 in 10^5, I know he’s full of crap.”

Question 4: If the numbers generated by PRA do not represent probabilities of future events, are they still useful? If so, for what?

This is the contingency question, and it clearly needs to be answered. My view is that the numbers are still useful. For one thing, factors that are measurable do contribute to risk, and PRA has been successful in helping us see how to reduce risks from those causes (it may even have contributed to the scarcity of accidents from “random” causes). For another, its inherent optimism tells us, when it indicates a risk that is too high, that improvements are definitely needed. Also, I found, when working as a safety regulator in the weapon systems field, that I could learn much from the numbers by digging for answers to the questions they raise.

Conclusion

The study remains incomplete, partly because of the difficulty in finding a justification for PRA. If anyone can find or construct one, it would be very welcome. Meanwhile, the provisional conclusions to be drawn are:

a) The numbers delivered by PRA do not represent the probabilities of future events because:

  • The PRA methodology, by focusing on measurable factors, ignores some of the most significant sources of risk.
  • The theory that it is justifiable to extrapolate historical data, in order to assign a numerical probability to a future event, is false.

b) If PRA is used on its own to support an ALARP or any other safety case, it is likely to be misleading. To be complete and credible, the case should provide:

  • Qualitative data and arguments on the issues not covered by PRA
  • A reasoned account of the liability to error in each quantified prediction

c) Quantitative probability statements have no more force than qualitative probability statements. At best, they may be more refined, but only if the numbers can be shown to be credible.

d) Our quest for reliable predictions would be better served by paying more attention to the stability of the systems from which we draw data, and to the stability of those whose behavior we need to predict.

Should PRA be scrapped? My answer is no, for the reasons given in the answer to Question 4. It remains an invaluable tool for focusing our minds on issues related to measurable factors. We do not need to believe that the numbers are probabilities in order to use them for purposes such as comparison of design options, sensitivity checks and the improvement of designs. It is only the “P” of PRA that ought to be abandoned if nobody can justify it.
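
As a small illustration of that comparative use, consider the sensitivity check sketched below on an invented two-branch fault tree. The structure, the probabilities and the rare-event approximation are all assumptions made for the example; it is the ratios, not the absolute numbers, that carry the information.

```python
def top_event_probability(p_a: float, p_b: float, p_c: float) -> float:
    """Illustrative fault tree: the top event occurs if A fails, or if both
    B and C fail (rare-event approximation for the OR gate)."""
    return p_a + p_b * p_c

baseline = {"p_a": 1e-5, "p_b": 1e-2, "p_c": 1e-2}
base = top_event_probability(**baseline)

# Degrade each basic event by a factor of ten in turn and see how far the
# top event moves: roughly 1.8x for A, roughly 9.2x for B or C.
for name in baseline:
    perturbed = dict(baseline, **{name: baseline[name] * 10})
    ratio = top_event_probability(**perturbed) / base
    print(f"{name} x10 -> top event x{ratio:.1f}")
```

Nothing in that use requires us to believe that the absolute numbers are probabilities of future events; they simply tell us where the design is most worth improving.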

By now, it is clear that there is a Question 5 to be answered: “What would be a better way, and what place should (P)RA have in it?” The investigation continues.


References

  1. Health & Safety Executive. Out of Control. HSE Books, Sudbury, Suffolk, U.K., 1995.
  2. Aviation Week & Space Technology. p. 33, July 29, 1996.
  3. Report by the Inquiry Board. Ariane 5 Flight 501 Failure, Paris, July 19, 1996.
  4. Snaith, E. R. “The Correlation between the Predicted and the Observed Reliabilities of Components, Equipment and Systems.” U.K. Atomic Energy Authority National Centre of Systems Reliability, Culcheth, U.K., 1981.
  5. Turner, Barry A. Man-Made Disasters, Wykeham Publications, London, 1978.
  6. Leveson, Nancy G. Safeware, p. 165. Addison-Wesley Publishing Company, Reading, Massachusetts, 1995.
  7. Adams, John. Risk, Chapter 7. UCL Press, London, 1995.
  8. NIMIC Newsletter, NATO Insensitive Munitions Information Center, Brussels, 1st Quarter 2000.
  9. Petroski, Henry. Design Paradigms — Case Histories of Error and Judgment in Engineering, Cambridge University Press, 1994.
  10. Leveson, Nancy G. Safeware, p. 59, Addison-Wesley Publishing Company, Reading, Massachusetts, 1995.
  11. Deming, W. Edwards. “On Probability as a Basis for Action,” The American Statistician, Vol. 29, No. 4, pp. 146-152, 1975.
  12. Feynman, Richard P. The Meaning of It All, Addison-Wesley Longman Inc., 1998.
  13. Deming, W. Edwards. The New Economics for Industry, Government, Education, Massachusetts Institute of Technology, 1993.
  14. Tukey, John W. The American Statistician, Vol. 3, p. 9, 1949.
  15. Toulmin, S.E. The Uses of Argument, paperback edition, Chapter 2, Cambridge University Press, 1993.
  16. Feynman, Richard P. What Do You Care What Other People Think?, paperback edition, p. 216, HarperCollins, London, 1993.

Acknowledgments

The author acknowledges, with thanks, the constructive comments provided by Professors David Kerridge and Henry Neave, and by Felix Redmill, editor of Safety Systems, in which an earlier version of this article was published.

About the Author

Colonel Jack Crawford spent most of his working life in the British Army, having been commissioned into the Corps of Royal Engineers in 1949. He has served, among other places, in Korea, Norway, Germany, the Pacific and Australia. He became seriously interested in risk and safety assessment during his appointment to the Ordnance Board of the U.K. Ministry of Defence in 1978. After serving on the Ordnance Board, he became a member of its counterpart, the Australian Ordnance Council, during an interesting period of major Royal Australian Navy and Royal Australian Air Force re-equipment programs. Since leaving the Army, he has continued to work in the safety field, mostly for the Ministry of Defence, and is currently working on improvements in the methods used for safety assessment.
