Sunday, September 11, 2005

Byzantine failures in safety-critical systems

Peter Ladkin has a sobering post in RISKS DIGEST with very scary implications:
In this era of fly-by-wire, I am fond of saying that, as far as I know, there has never been a commercial aircraft accident caused by anomalies in flight control software. And it has been 17 years (the first A320 was introduced into service in 1988).

It is thus well to remember that designing and writing critical software-based systems for such applications is not a routine task that we now know how to perform. In fact, there are plenty of anomalies that crop up that the public doesn't hear about. Here is one that made it out, and a pointer to another...

There are various conclusions one can draw:

* The kinds of numbers used in Fault Tree Analysis for random hardware failures in software-based systems give no good indication of the rate of systematic failures (due to design or to errors in software) which can be expected.

* Fault-handling models are crucial parts of the architecture and their assumptions are critical. (This is made clear by the incidents discussed...)

* That there have been no accidents does not mean that there are no occurrences of substantial problems with potentially catastrophic consequences with software-based critical avionics.
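
To make the first conclusion concrete, here is a minimal sketch of the arithmetic. It is my own illustration, not something from Ladkin's post: it assumes a triplex system that fails when at least two of three channels fail, and every number in it (`P_RANDOM`, `Q_SYSTEMATIC`) is a hypothetical value chosen for illustration, not data from any aircraft or incident. The point is only that a fault tree built from independent random hardware failures can report a tiny number while a single shared design or software fault, which defeats all channels at once, dominates the real total.

```python
def two_of_three_failure(p: float) -> float:
    """Probability that at least two of three independent channels fail."""
    return 3 * p**2 * (1 - p) + p**3


def system_failure(p_random: float, q_systematic: float) -> float:
    """Combine independent random failures with one common-mode fault.

    The common-mode fault (hypothetical probability q_systematic) hits all
    three channels at once, so redundancy gives no protection against it.
    """
    p_redundant = two_of_three_failure(p_random)
    # The system survives only if neither failure path occurs.
    return 1 - (1 - p_redundant) * (1 - q_systematic)


# Illustrative values only; assumptions, not measured rates.
P_RANDOM = 1e-4      # per-channel random hardware failure probability
Q_SYSTEMATIC = 1e-6  # probability of a shared design or software fault

fta_only = two_of_three_failure(P_RANDOM)
combined = system_failure(P_RANDOM, Q_SYSTEMATIC)

print(f"random hardware failures only: {fta_only:.3e}")
print(f"including the common-mode fault: {combined:.3e}")
print(f"ratio: {combined / fta_only:.1f}x")
```

With these made-up inputs the systematic term swamps the redundant-hardware term, and the catch is that nobody knows the real value of `Q_SYSTEMATIC` in advance. That is why the hardware-only figure says so little about the failures that actually show up.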
