# FMEA Tutorial

Philip Mayfield

I was suspended approximately 200 feet above a ravine when the thought occurred to me: had the engineers of the Gondola lift that my family and I were riding used Failure Mode and Effects Analysis (FMEA) when designing the lift?

# What is Failure Mode and Effects Analysis (FMEA)?

FMEA is a simple tool used to identify and classify potential failure modes based upon their severity, probability of occurrence, and probability of detection. Put more simply, FMEA can be used to prevent engineering defects, from the very small to major disasters. What is a major disaster? Unfortunately, we do not have to look very hard to find them. Examples range from the explosion of the Shuttle Challenger shortly after takeoff to the nuclear accident near Chelyabinsk, which left the area often described as the most contaminated location in the world. Another example is the failure of the Tacoma Narrows Bridge in 1940, dubbed "Galloping Gertie", which failed spectacularly due to wind-induced oscillation, commonly described as mechanical resonance.

FMEA isn't limited in scope to major disasters that lead to loss of life. FMEA can be used to prevent "smaller" failures as well, such as the Pentium's floating-point division error, which cost Intel \$475 million; AT&T's switch failure in 1990, which brought down the AT&T network for 9 hours; and GE's decision to replace its refrigerators' reciprocating compressors with rotary versions, which led to GE replacing 1 million units in the 1980s (estimates range from \$450 million to \$900 million).

# Getting Started with FMEA

Let's get started with a simple example by going back to the Gondola lift. What would occur if the cable supporting the Gondola were to break? In FMEA parlance, the potential failure mode would be "cable break". This information is typically put into a spreadsheet designed for FMEA so that the information can be captured and shared. For example, I have placed our first failure mode, "Cable Breaks", into the matrix below. Note that there are other rows in which we could enter more failure modes, but for now we will focus on "Cable Breaks".

With this failure mode in mind, we now need to work through the columns to the right on the spreadsheet. The next column is "Potential effect of failure" and so we should ask ourselves what would happen if the cable breaks. In this simple example, the answer is obvious: the Gondola would fall to the ground. Our new FMEA worksheet is below (new entries are in red).

The next column is "SEV", which stands for severity. Severity is usually measured on a 10-point scale, with the most severe being a 10 and the least severe being a 1. Some organizations and standards, such as SAE, AIAG, ISO 9000, and ISO/TS 16949:2002, have established severity ratings. Below are the severity ratings that are included in Quantum XL.

| Rating | Severity of Effect |
|---|---|
| 10 | May result in safety issue or regulatory violation without warning |
| 9 | May result in safety issue or regulatory violation with warning |
| 8 | Primary function is lost or seriously degraded |
| 7 | Primary function is reduced and customer is impacted |
| 6 | Secondary function is lost or seriously degraded |
| 5 | Secondary function is reduced and customer is impacted |
| 4 | Loss of function or appearance such that most customers would return product or stop using service |
| 3 | Loss of function or appearance that is noticed by customers but would not result in a return or loss of service |
| 2 | Loss of function or appearance that is unlikely to be noticed by customers and would not result in a return or loss of service |
| 1 | Little to no impact |

Since the Gondola is suspended well above the ground, if it were to fall the result could certainly be the death or serious injury of the occupants. Using the table, a severity of 10 is defined as "May result in safety issue or regulatory violation without warning" which sounds like a good description of the Gondola falling. Our FMEA spreadsheet now looks like the following.

The next column is "Class" and will be the subject of a future article. This takes us to "Potential cause". Why would the cable break? We need to brainstorm the potential causes that might make the cable break. Below are a few examples for the purpose of this introduction.

• Low flying aircraft collides with the cable
• Corrosion of the cable leading to loss of strength
• Maximum strength of cable exceeded

I have abbreviated these three potential causes to "Aircraft Collision", "Corrosion", and "Max Strength Exceeded" and entered them into our FMEA spreadsheet below.
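For readers who prefer code to a spreadsheet, the worksheet so far can be sketched as one record per cause. This is only an illustration: the `FmeaRow` class and its field names are hypothetical and are not part of Quantum XL; the values come from the steps above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FmeaRow:
    """One worksheet row: a single cause of a single failure mode (hypothetical layout)."""
    failure_mode: str
    effect: str
    severity: int
    cause: str
    occurrence: Optional[int] = None  # "Occur" column, not yet rated
    detection: Optional[int] = None   # "Det" column, not yet rated


# Values entered so far in the tutorial.
FAILURE_MODE = "Cable Breaks"
EFFECT = "Gondola falls to the ground"
SEVERITY = 10

causes = ["Aircraft Collision", "Corrosion", "Max Strength Exceeded"]
rows = [FmeaRow(FAILURE_MODE, EFFECT, SEVERITY, cause) for cause in causes]

for row in rows:
    print(f"{row.failure_mode} | {row.effect} | SEV={row.severity} | {row.cause}")
```

Each cause gets its own row because occurrence and detection are rated per cause, while the failure mode, effect, and severity are shared.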

The next column is "Occur", which is short for occurrence. As with severity, organizations such as AIAG publish their own ratings for occurrence. The occurrence ratings used by Quantum XL are shown below.

| Rating | Probability of Occurrence |
|---|---|
| 10 | 1 in 2 |
| 9 | 1 in 10 |
| 8 | 1 in 50 |
| 7 | 1 in 250 |
| 6 | 1 in 1,000 |
| 5 | 1 in 5,000 |
| 4 | 1 in 10,000 |
| 3 | 1 in 50,000 |
| 2 | 1 in 250,000 |
| 1 | 1 in 1 Million |

The higher the number, the more likely the cause is to occur, with an occurrence value of 10 being the most likely. Starting with "Aircraft Collision", we need to ask ourselves how likely this is to happen. You might think that this is extremely unlikely, perhaps less than one in a million. However, on February 3, 1998, a US Marine Corps EA-6B jet struck the cable supporting a cable car at the ski resort located on Cermis Mountain in Northern Italy. The accident resulted in the deaths of the 20 people who were in the car at the time. Additionally, there have been over 20 incidents of jets cutting power lines in Northern Italy since 1990. Since this "cause" has already occurred, it is not a 1 in a million chance of happening. After taking into account the proximity of the lift to military training areas and the height of the lift, we decided to assign an occurrence of 3. A similar analysis resulted in an occurrence of 10 for corrosion (the cable is going to corrode) and 1 for Max Strength Exceeded.
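If you do have a numeric estimate of the probability, the occurrence table can be applied mechanically. The sketch below is one possible convention, not something the table itself prescribes: it assumes that a probability falling between two table entries is rounded up to the more pessimistic rating. The function name `occurrence_rating` is hypothetical. In the example above the ratings were assigned by judgment rather than by formula.

```python
# Occurrence table from the article: rating -> N in "1 in N".
ODDS_BY_RATING = {
    10: 2, 9: 10, 8: 50, 7: 250, 6: 1_000,
    5: 5_000, 4: 10_000, 3: 50_000, 2: 250_000, 1: 1_000_000,
}


def occurrence_rating(probability: float) -> int:
    """Return the lowest rating whose tabulated probability covers `probability`.

    Assumption: estimates between two table entries round up to the
    higher (more pessimistic) rating.
    """
    if not 0 < probability <= 1:
        raise ValueError("probability must be in the range (0, 1]")
    for rating in sorted(ODDS_BY_RATING):          # 1, 2, ..., 10
        if 1 / ODDS_BY_RATING[rating] >= probability:
            return rating
    return 10  # more likely than 1 in 2


print(occurrence_rating(1 / 50_000))  # exactly a table entry
print(occurrence_rating(1 / 30_000))  # between 1 in 50,000 and 1 in 10,000
```

Under the round-up assumption, the first call returns 3 and the second returns 4.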

You may have noted that we are in a section of the spreadsheet which is colored blue and labeled "Current process controls". This is set apart to show what is in effect now to prevent and detect these causes. Our next columns are "Prevention" followed by "Detection". This is what has been included in the design to prevent and/or detect this cause before it can lead to a failure.

Corrosion - If the cable is coated with a protective overcoat that prevents corrosion, this would be noted in the "prevention" column. If we use an electromagnetic instrument to detect corrosion damage in the cable, this would be noted in the "detection" column.

Max Strength Exceeded - The cable includes a 10X safety margin and instrumentation to detect increasing loads due to increased winds.

Aircraft Collision - The current system has no method of preventing or detecting an aircraft collision. Our updated table ...

The next column is "Det" which stands for detection. The detection ratings used by Quantum XL are in the table below.

| Rating | Likelihood of Detection |
|---|---|
| 10 | Absolutely uncertain that failure will be detected |
| 9 | Very remote chance that failure will be detected |
| 8 | Remote chance that failure will be detected |
| 7 | Very low chance that failure will be detected |
| 6 | Low chance that failure will be detected |
| 5 | Moderate chance that failure will be detected |
| 4 | Moderately high chance that failure will be detected |
| 3 | High chance that failure will be detected |
| 2 | Very high chance that failure will be detected |
| 1 | Almost certainty that failure will be detected |

Note that the detection scale is inverted: the less likely a failure is to be detected, the higher the rating, with 10 being the least detectable. We have almost no chance of detecting an aircraft collision before it occurs. Therefore, using the detection table, the "Det" value for "Aircraft Collision" is 10. The corrosion sensor is run daily, so the detection rating for "Corrosion" is 1. Finally, the load sensor runs continuously, so the detection rating for "Max Strength Exceeded" is also 1.

The last column is "RPN", or Risk Priority Number. The spreadsheet automatically calculates this value with the simple math RPN = SEV * Occur * Det. The higher the RPN, the greater the risk of the associated cause to the overall system. Normally, there would be more than the three items we have identified thus far. However, to keep the example short, I have created the Pareto chart from just these three causes.
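The RPN arithmetic for the three causes can be checked with a few lines of code. This is a minimal sketch using the ratings assigned in this tutorial; the function name `rpn` and the range check are illustrative additions, not a description of how the spreadsheet is implemented.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: RPN = SEV * Occur * Det, each rated 1 to 10."""
    for name, value in (("severity", severity),
                        ("occurrence", occurrence),
                        ("detection", detection)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detection


SEVERITY = 10  # "Cable Breaks" -> Gondola falls to the ground

# cause -> (Occur, Det), as assigned in the tutorial
ratings = {
    "Aircraft Collision": (3, 10),
    "Corrosion": (10, 1),
    "Max Strength Exceeded": (1, 1),
}

rpns = {cause: rpn(SEVERITY, occ, det) for cause, (occ, det) in ratings.items()}

for cause, value in rpns.items():
    print(f"{cause}: RPN = {value}")
```

This gives 300 for "Aircraft Collision", 100 for "Corrosion", and 10 for "Max Strength Exceeded".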

The goal of the Pareto is to identify the causes which present the greatest risk to the system. In this case, the largest risk item happens to be "Aircraft Collision".
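The ranking behind a Pareto chart is simply a sort by RPN, usually with a running cumulative share. The sketch below assumes the RPN values that follow from the ratings in this tutorial (10 * 3 * 10, 10 * 10 * 1, and 10 * 1 * 1); the function name `pareto` is hypothetical, and the code prints a text table rather than drawing a chart.

```python
def pareto(rpn_by_cause):
    """Rank causes by RPN (highest first) with cumulative percent of total RPN."""
    total = sum(rpn_by_cause.values())
    if total <= 0:
        raise ValueError("total RPN must be positive")
    ranked = sorted(rpn_by_cause.items(), key=lambda item: item[1], reverse=True)
    table = []
    cumulative = 0
    for cause, value in ranked:
        cumulative += value
        table.append((cause, value, 100 * cumulative / total))
    return table


# RPN = SEV * Occur * Det for each cause in the tutorial.
rpns = {
    "Corrosion": 10 * 10 * 1,
    "Max Strength Exceeded": 10 * 1 * 1,
    "Aircraft Collision": 10 * 3 * 10,
}

for cause, value, cumulative_pct in pareto(rpns):
    print(f"{cause:<22} {value:>4} {cumulative_pct:6.1f}%")
```

The first row of the output is the cause to address first, which here is "Aircraft Collision".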

It should be noted that we shouldn't stop here. If this reflected the current state of our system, then we would need to address the causes that carry the most risk. A new set of columns to the right of "Current process controls" is where we address any actions that we would like to take towards reducing the risk. The new area is entitled "Action results" and provides an area for us to enter any actions taken and the resulting Sev, Occur, and Det.

FMEA Spreadsheet with Action Area

At this point we would brainstorm ways to reduce the risk. Since "Aircraft Collision" has the highest risk, we need to take actions to reduce the RPN of this item. We can reduce the risk by either reducing the severity, reducing the probability of occurrence, or by increasing the probability of detection. For example, placing beacons on the tops of the towers that support the cable would make it easier for the pilots to see that a hazard is in their path (much like the large blinking red lights on the top of radio towers). This would greatly decrease the probability of occurrence. We would therefore update our FMEA to include these actions.
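To see how an action changes the risk, the RPN can be recomputed with the post-action ratings. The sketch below is illustrative only: the tutorial does not state the new occurrence rating after adding beacons, so the value of 1 used here is a hypothetical assumption, as are the `Ratings` class and its field names.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Ratings:
    """SEV, Occur, and Det for one cause (hypothetical helper)."""
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


# Current ratings for "Aircraft Collision" from the tutorial.
before = Ratings(severity=10, occurrence=3, detection=10)

# HYPOTHETICAL: assume beacons on the towers lower occurrence from 3 to 1.
# Severity and detection are unchanged by this action.
after = replace(before, occurrence=1)

print(f"RPN before action: {before.rpn}")
print(f"RPN after action:  {after.rpn}")
print(f"Reduction:         {before.rpn - after.rpn}")
```

Under that assumed rating, the RPN would drop from 300 to 100. Because the beacons do nothing for severity or detection, further reduction would require a different kind of action.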

Of course, we would want to identify other failure modes and work through the process for them as well before starting on our action list.

# FMEA Conclusion

In general, FMEA consists of identifying failure modes, determining the effects and causes of each mode, assigning severity (SEV), occurrence (Occur), and detection (Det) ratings, and calculating the risk priority number (RPN). The causes with the highest RPN are then addressed first.

This article serves as an introduction to help you get started with FMEA. In future articles we will expand to discuss the different types of FMEA.