Last Updated: April 04, 2002



ELIMINATION OF RISK IN SYSTEMS

Practical Principles for
Eliminating and Reducing Risk
in Complex Systems

A new book by James Bradley, Ph.D.

This excerpt is just the first six introductory pages
of Chapter 8, which has 26 pages.


CHAPTER 8


Risk Elimination using Monitoring Procedures



ONCE MORE we remind the reader that the throughput capacity of a system in a risky environment is the sum of (a) the basic component KR due to risk-free deployment of the system resources R, (b) the gain component G due to reward for running risk when the hazards risked do not occur, and (c) the loss or risk component L that is the average loss due to hazards actually occurring on occasion. The mean or expected throughput capacity I is expressed concisely by the risk equation formulations:

              I = KR + G - L = KR + (c - 1)L
or:           I = R[K + (c - 1)r(E)] = R[K + br(E)]
                = RK + Rcr(E) - Rr(E)

where I is the mean throughput capacity, K is a constant, R is the resources employed, r(E) is the risk per unit R due to the environment E in which the system operates (so that L = Rr(E)), b = c - 1, and c is a constant called the risk efficiency coefficient, which is G/L, the amount of hazard-free gain per unit of average loss due to hazards occurring.
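As a quick numerical check of the risk equation, here is a minimal Python sketch. The function name and the sample values of K, R, c, and r(E) are hypothetical, chosen only for illustration; they are not taken from the book.

```python
def expected_throughput(K, R, c, r_E):
    """Mean throughput capacity I = KR + G - L, with L = R*r(E) and G = c*L."""
    L = R * r_E      # mean loss due to hazards actually occurring
    G = c * L        # gain from running risk, since c = G/L
    return K * R + G - L


# Hypothetical values, for illustration only.
K, R, c, r_E = 2.0, 100.0, 1.5, 0.25
I = expected_throughput(K, R, c, r_E)
print(I)                          # 212.5
print(R * (K + (c - 1) * r_E))    # 212.5, the factored form gives the same result
```

With these assumed values, KR = 200, L = 25, and G = 37.5, so I = 200 + 37.5 - 25 = 212.5. When c = 1 the gain and the loss cancel, and I falls back to the basic component KR.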

In the previous two chapters, we showed how it is possible to improve the throughput capacity even further, by reducing, or eliminating, the mean losses L, or Rr(E), either by adding preventive resources, or by carrying out a precautionary procedure.

There is a remaining important method of achieving the same result, which we look at in this chapter. It is elimination of risk, and thus elimination of mean losses L, by applying a system-supported monitoring procedure as part of an overall procedure for coping with the risky environment.

Concepts of risk elimination using system-supported monitoring procedures

Sometimes it is not known in advance where or when the risk of a hazard will be present, that is, we do not know the regions or periods in which the hazard could occur, as opposed to those regions or periods where it cannot occur.

Preventive resources cannot be used, for we do not know where to apply them. An example could be a 1,000-mile stretch of railroad track in a valley where a landslide is capable of occurring anywhere along the track. Similarly, precautionary procedures cannot be used, for likewise we do not know when to apply them. The classic naval example is a long sea lane, where a torpedo attack could occur anywhere, in one stretch one day, and in another stretch another day, and so on.

In all such cases, it is simply known that there will be some periods or regions, variable from one time period to another, where the hazard can occur, that is, where the system is exposed to risk, and that there will be other time periods and regions where the hazard cannot occur and there is no risk.

It is the function of a real time risk monitoring and detection procedure, which is a component of a risk-environment coping procedure, to detect the periods or regions where the risk is present, and so generate an alert that triggers a response procedure, also part of the risk-environment coping procedure, to take immediate action to eliminate the risk.

As an example of this principle in action, consider an important satellite in an orbit, where there is a chance of a piece of orbital debris being encountered, too large for any protective shield (preventive resources P) to be any good, but too small to be tracked from Earth, which would enable its orbit to be known. Suppose, to eliminate this risk, the satellite is equipped with a monitoring radar apparatus that can detect such a piece of debris at least five minutes before a collision. Detecting the debris is clearly not enough, if it is on a highly probable (but not dead certain) collision course. Action is also needed. In this case the action could be operation of propellant jets to shift the satellite to a slightly different orbit, and so eliminate the chance of a collision, that is, eliminate the risk.

Clearly, we have two components, the risk detection component, and the resulting action component. In some cases, there may be very little doubt that the hazard will occur, if action is not taken, as with the satellite example above. But this near certainty is usually not the case. In practice, with the satellite example, the detector might detect only the high risk of a collision within five minutes, but not the certainty of it.
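The two components can be sketched as a simple loop. This is only an illustrative sketch: the function names, the threshold, and the sample readings are hypothetical, and a real detector would be far more elaborate.

```python
def coping_procedure(readings, risk_detected, respond):
    """Scan readings in order; on each alert, invoke the response component.

    Returns the indices of the readings that raised an alert.
    """
    alerts = []
    for t, reading in enumerate(readings):
        if risk_detected(reading):      # risk detection component
            respond(t, reading)         # resulting action component
            alerts.append(t)
    return alerts


# Hypothetical example: a reading above 0.7 signals a high risk of collision.
actions = []
alerts = coping_procedure(
    readings=[0.1, 0.2, 0.9, 0.3, 0.8],
    risk_detected=lambda x: x > 0.7,
    respond=lambda t, x: actions.append(("shift orbit", t)),
)
print(alerts)    # [2, 4]
print(actions)   # [('shift orbit', 2), ('shift orbit', 4)]
```

Note that the response fires on every alert, whether or not a collision would really have followed. The detector reports risk, not certainty.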

The following example should further clarify this last point, which is crucial for understanding the monitoring approach to risk elimination. Suppose we have a wartime situation, where there is a surface ship that might be attacked by a submarine, and the ship is using sonar to detect any submarine in its vicinity. We can assume that, when a submarine is detected, the ship has the means to eliminate the threat. But the ship will likely never be able to detect the submarine for sure, for the sonar may be responding to a whale, or a submerged wreck, or a school of fish, or a clump of seaweed, or driftwood, and so on. Thus the detection system in practice will be able to detect only the risk of a hazard, not the certainty of it.
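One standard way to put a number on "the risk, not the certainty" is Bayes' rule, which gives the probability that a hazard is really present once an alert has been raised. The sketch below is an illustration only; the function name and the probabilities are hypothetical, and the calculation is not taken from the book.

```python
def hazard_probability_given_alert(p_hazard, p_alert_if_hazard, p_alert_if_no_hazard):
    """Bayes' rule: P(hazard | alert)."""
    true_alerts = p_alert_if_hazard * p_hazard
    false_alerts = p_alert_if_no_hazard * (1.0 - p_hazard)
    return true_alerts / (true_alerts + false_alerts)


# Hypothetical sonar figures: a submarine is nearby 1% of the time, the sonar
# picks it up 90% of the time, and whales, wrecks, and so on produce a
# contact 5% of the time when no submarine is present.
p = hazard_probability_given_alert(0.01, 0.90, 0.05)
print(round(p, 3))   # 0.154
```

Under these assumed figures, fewer than one sonar contact in six is a real submarine, even though the sonar itself rarely misses one.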

Another important example is in protecting computers against hostile computer-software agents such as viruses. The virus-detection software, or virus-monitoring system, checks incoming files for suspicious bit patterns that could indicate a virus. If such a pattern is detected, the monitoring system's response will be to execute the coping procedure, which will likely involve deleting the suspicious file. However, the virus-detection software cannot usually determine for certain that the incoming file is hostile, only that it might well be. Dud signals are always possible.
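A toy version of such a pattern check might look as follows. The byte patterns, file names, and function names are all hypothetical, and real virus scanners are far more sophisticated; the sketch is meant only to show how a harmless file can raise a dud signal.

```python
# Hypothetical byte patterns; real scanners use far larger signature sets.
SUSPICIOUS_PATTERNS = [b"\xde\xad\xbe\xef", b"FORMAT C:"]


def looks_suspicious(data: bytes) -> bool:
    """Return True if any suspicious pattern occurs anywhere in the data."""
    return any(pattern in data for pattern in SUSPICIOUS_PATTERNS)


def handle_incoming(files: dict) -> dict:
    """Coping procedure: keep clean files, drop any file that raises an alert."""
    return {name: data for name, data in files.items() if not looks_suspicious(data)}


incoming = {
    "report.txt": b"quarterly figures",
    "payload.bin": b"\x00\xde\xad\xbe\xef\x00",
    "manual.txt": b"Never type FORMAT C: at the prompt",   # harmless: a dud signal
}
kept = handle_incoming(incoming)
print(sorted(kept))   # ['report.txt']
```

Here the harmless manual is deleted along with the hostile payload, because the monitor sees only the pattern, not the intent.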

The human immune system works in a similar fashion. The detection system is always operating, and if it detects foreign biological material in the body, it will quickly respond by producing large numbers of antibodies that attach to and mark the invaders. The marked invaders can then be easily targeted by killer cells and destroyed. However, once again, the detection system cannot always determine for sure that the foreign material is a threat, only that it might be.

It is important to grasp that, when a monitoring procedure is used, unlike the case of using precautionary procedures, the agent operates the system without knowledge of when or where the risk will appear, and most of the time without any significant slowdown of the system. The agent thus operates the system normally, as if no risk were present, confident that the continuously-operating environment monitoring procedure will detect risk in time to take short-term action to avoid or reduce loss of system throughput capacity.

An example of this is the use of fire alarms in buildings. Occupants normally function as if no fire risk is present, but immediately a fire alarm goes off, for whatever reason, indicating a risk, but not the certainty, of fire, the normal response procedure is to evacuate the building. In a wartime situation, where a biological weapons hazard is a possibility, a biological-agent detector would operate just like a fire alarm, this time signaling the possible presence of harmful biological agents, enabling persons in the region affected to respond by donning appropriate face masks, sealing a room, or leaving the area. A radiation detection and response system is yet another similar example.

Financial risk monitoring system

A common example of the application of monitoring procedures occurs in financial trading systems. Here the system is designed to make money trading (buying and selling) an individual security like a stock, bond, futures contract, option, or currency.

Such a financial system operates as follows. The monitoring system monitors the behavior of the security over time. When it detects certain price and trading volume patterns that are known from past behavior of the market to occur prior to a significant change in the price of the security, we have a signal.

If the signal is for a future up move, there is a risk of income loss if funds are not in (or "long") the security. If the signal is for a future down move, there is a risk of loss of income if the funds are not used to support a short sale of the security. The appropriate response is to sell, and sell short, if there is a risk of the security falling, and to buy back, and buy, if the converse risk prevails.

The best-case return KR + G might be for the time period with the largest total up and down moves, during which all significant up and down moves, and only those moves, are acted upon.

Unfortunately, the price behavior of a security is often so close to random that an effective monitoring procedure is impossible. Even where an apparently effective monitoring system can be constructed, remember that the system will detect only the risk of a big change in the security's price, and never the certainty. So sometimes the signal will be a dud, or false alarm, and action in response to the risk signalled will lead to a loss.

These losses due to dud signals will prevent the system from ever reaching the ideal best-case return I = KR + G. They may even cancel out all the benefit of the losses L, or Rr(E), eliminated by averting the risk, and worse, may even exceed those eliminated losses.
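The trade-off can be illustrated with a small calculation. This is a deliberately crude sketch under stated assumptions: every true signal averts the same loss, every dud costs the same amount, and all names and figures are hypothetical.

```python
def net_benefit(signals, dud_fraction, loss_averted_per_true_signal, loss_per_dud):
    """Losses averted by acting on true signals, minus losses caused by duds."""
    duds = signals * dud_fraction
    true_signals = signals - duds
    return true_signals * loss_averted_per_true_signal - duds * loss_per_dud


# Hypothetical figures: 100 signals, each worth 10 if true, costing 10 if a dud.
print(net_benefit(100, 0.25, 10.0, 10.0))   # 500.0: monitoring pays
print(net_benefit(100, 0.50, 10.0, 10.0))   # 0.0: duds cancel the benefit
print(net_benefit(100, 0.75, 10.0, 10.0))   # -500.0: worse than no monitoring
```

The three cases correspond to the three outcomes described above: a net gain, a benefit exactly cancelled by dud losses, and dud losses that exceed the losses eliminated.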

Two types of response procedure

As we have seen, an environment coping procedure contains two components: (1) a real-time monitoring procedure with built-in capability of detecting the presence of risk of throughput capacity loss due to a possible hazard in the unfolding environment, and (2) a response component consisting of a set of procedures, perhaps brief precautionary procedures, to respond to and at least partially eliminate the risk of the hazards detected.

There are two ways in which the response component of the coping procedure can work. The simplest involves some additional resource deployed by the response procedure to eliminate the risk, independently of the operation of the normal system resources R. In such a case, normal system throughput is not affected by the operation of the response procedure, that is, there is no slowdown effect, even if the risk signal is a dud. The more complex response involves a precautionary procedure that will tie up the resources R for a short period, reducing normal system throughput. The simpler kind of response procedure thus does not slow down the system; the more complex kind does.

An example should clarify this. Returning to the satellite example earlier, assume that the monitoring or detection component of the coping procedure has detected the risk of a collision with a piece of space debris. Suppose further that the satellite is used for communications. If we shift the satellite to a higher orbit temporarily to eliminate the risk, that would be a precautionary response procedure employing the system resources, which could put the satellite out of operation for a short while. On the other hand, if the satellite were equipped with an effective laser gun, it could destroy the piece of space debris, without any temporary disruption in the satellite's normal operation. It would not affect system throughput if some risk signals were duds and the laser sometimes fired at nonexistent debris; the response procedure would, however, be certain of destroying real threatening debris.
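The difference between the two kinds of response can be sketched numerically. The model below is an assumption made for illustration, not the book's risk monitoring equation: it supposes that every alert is answered successfully, so that no hazard losses occur, and that a slowdown response costs a fixed number of periods per alert. All names and figures are hypothetical.

```python
def throughput(periods, rate, alerts, downtime_per_alert=0):
    """Throughput over a run, assuming every alert is answered successfully.

    downtime_per_alert = 0 models a response using additional resources
    (no slowdown); a positive value models a precautionary response that
    ties up the system resources R for that many periods per alert.
    """
    lost_periods = min(periods, alerts * downtime_per_alert)
    return rate * (periods - lost_periods)


# Hypothetical run: 1,000 periods at 5 units each, 20 alerts (some of them duds).
print(throughput(1000, 5, 20))                        # 5000, laser-style response
print(throughput(1000, 5, 20, downtime_per_alert=3))  # 4700, orbit-shift response
```

In the no-slowdown case the number of alerts, duds included, has no effect on throughput. In the slowdown case every alert costs throughput, including the duds.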

Elimination of risk of loss of throughput capacity by means of a coping procedure, involving both a monitoring procedure and response procedures, is a more complex case than either of the two previous cases of risk elimination by either preventive resources P or precautionary procedures alone.

Risk monitoring equation with no slowdown effect

We begin our analysis of coping procedures with the simplest version of the risk monitoring equation, which applies when the simplest type of response procedure is used, with no slowdown effect due to temporary diversion of system resources R. Later, we will cover the other case, where the response procedure is a precautionary procedure that does cause a slowdown effect, because system resources R are diverted temporarily from normal operation. As we would expect, to handle this second case, we need to bring in precautionary procedures once again.

The no-slowdown risk-monitoring risk equation states that for a system with an environment coping procedure and resources R, expected throughput capacity I can be increased by increasing a complexity-measure parameter u in a real-time monitoring procedure component of the coping procedure, in accordance with the following modified risk equation formulations ....

Chapter 8 continues for another twenty pages, with the following section headings:

Risk monitoring equation with no slowdown effect
Incoming real-time data streams
Complexity measures with a simple monitoring procedure
Complexity measures for a general purpose monitoring procedure
Risk-meaningful databases
Derivation of the non-slowdown risk monitoring equation
Derivation of the slowdown risk monitoring equation
Existence of maximal throughput capacity
Economic costs
Monitoring procedures for portfolio management
The nightmare scenario--monitoring system vulnerability


COPYRIGHT: James Bradley 2002.


The official Publication Date was: March 01, 2001.
The publisher, Tharsis Professional Books (Tharsis Books), has arranged an Amazon.com listing for the book.
The U.S. Distributor for the book is BAKER & TAYLOR.
Commercial information is also available from BOWKER'S Books In Print.

