Choosing the right tool for your analysis: RCM or FMEA

Reliability-Centered Maintenance (RCM) reviews and Failure Modes and Effects Analyses (FMEAs) have a lot in common, but there are some key differences. Rather than go into the mechanics of each, let’s look at the philosophy behind each to help you choose the appropriate tool for your organization.

RCM as an analysis event is often overshadowed now that people use the term RCM to mean a proactive operating philosophy, what I call manufacturing excellence. This is not a review of that philosophy. Here I am talking about the RCM analysis popularized by John Moubray and its legitimate offshoots (SAE JA1011_199908).

RCM belongs to the zero-failure culture. No failures are acceptable. An RCM analysis identifies all the potential failure points, and these are then engineered away. This may take the form of re-engineering equipment or processes, or of engineering a proactive inspection, to reduce the risk of an unplanned interruption (failure) to zero. RCM ranks failures as high, medium, or low, but the ultimate goal is to remove all potential failures, whatever their ranking. RCM is to the maintenance organization what zero defects is to the quality organization.

Just as true believers in the zero-defects philosophy removed quality inspectors, a true RCM organization would place less emphasis on mechanics rushing to breakdowns. There would be no unplanned maintenance. An RCM organization would treat a breakdown as an opportunity to review its engineering efforts and determine how to keep it from ever happening again. Driving unplanned maintenance to zero would be the vision of the whole organization, and resources would be applied accordingly. This requires a great deal of upfront engineering and precise execution of planned maintenance.

FMEA culture does accept some failures. Run to failure is an option in an FMEA philosophy, but that decision is made in advance and with eyes wide open. Determining the failures and their mitigating activities is similar in both RCM and FMEA. However, the FMEA assigns a number (typically on a 1 to 10 scale) to each failure mode’s severity, occurrence, and detection, and multiplies the three together to get the risk priority number (RPN), a key feature of the FMEA. The organization then chooses an RPN threshold. The threshold marks the point below which the organization has decided that the cost of reducing a failure is more than the cost of the failure itself, so that failure will not be engineered out. Resources are assigned only to engineering out failures whose RPNs are above the threshold. Therefore the organization accepts that failures with RPNs below the threshold will still occur: it has accepted a breakdown culture, to a certain degree. Such an organization relies on a combination of engineering and maintenance to perform, and it will have fewer engineering resources and more maintenance and troubleshooting resources than the RCM organization.
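To make the arithmetic concrete, here is a minimal Python sketch of the RPN calculation and the threshold split described above. It computes the RPN as the product of the severity, occurrence, and detection ratings (the standard FMEA convention), assumes the common 1 to 10 rating scales, and treats a failure mode whose RPN exactly equals the threshold as accepted (run to failure); your organization may draw that line differently. The `FailureMode` class, the `split_by_threshold` function, and the example failure modes and ratings are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One row of a hypothetical FMEA worksheet."""
    name: str
    severity: int    # assumed scale: 1 (negligible) to 10 (catastrophic)
    occurrence: int  # assumed scale: 1 (remote) to 10 (very frequent)
    detection: int   # assumed scale: 1 (almost certain to detect) to 10 (undetectable)

    def __post_init__(self):
        # Reject ratings outside the assumed 1-10 scales.
        for field_name in ("severity", "occurrence", "detection"):
            value = getattr(self, field_name)
            if not 1 <= value <= 10:
                raise ValueError(f"{field_name} must be 1-10, got {value}")

    @property
    def rpn(self) -> int:
        """Risk priority number = severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def split_by_threshold(modes, threshold):
    """Split failure modes into (engineer_out, run_to_failure).

    Modes with an RPN above the threshold get engineering resources;
    modes at or below it are accepted as run to failure (an assumption
    about how ties are handled).
    """
    engineer_out = [m for m in modes if m.rpn > threshold]
    run_to_failure = [m for m in modes if m.rpn <= threshold]
    return engineer_out, run_to_failure


if __name__ == "__main__":
    # Hypothetical failure modes and ratings, for illustration only.
    modes = [
        FailureMode("conveyor bearing seizure", severity=7, occurrence=4, detection=5),
        FailureMode("guard panel rattle", severity=2, occurrence=6, detection=2),
        FailureMode("hydraulic hose burst", severity=8, occurrence=3, detection=6),
    ]
    engineer_out, run_to_failure = split_by_threshold(modes, threshold=100)
    for m in engineer_out:
        print(f"engineer out:   {m.name} (RPN {m.rpn})")
    for m in run_to_failure:
        print(f"run to failure: {m.name} (RPN {m.rpn})")
```

With a threshold of 100, the bearing seizure (RPN 140) and the hose burst (RPN 144) are flagged for engineering out, while the panel rattle (RPN 24) is accepted as run to failure.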

The RCM approach, with its zero tolerance, and the FMEA approach, with its limited acceptance of breakdowns, are both legitimate operating philosophies. Both have many successful examples. Airlines and power producers are examples of industries that follow the zero-failure philosophy. Failures in these industries carry huge economic penalties for the providers, so the cost of an RCM implementation is easily recovered through cost avoidance. Failures in either industry also carry a risk of loss of life, and, actuarial tables aside, that cannot be measured in purely economic terms.

Many factories and producers adopt the FMEA philosophy of accepting risk. Problems arise, however, when management provides resources for an FMEA environment and expects RCM zero results. Management keeps responsibility for the budget and for approving projects, but assigns accountability for zero breakdowns to the maintenance department, or to maintenance and engineering together. This mismatch between accountability and responsibility is what causes some organizations to spiral out of control and become reactive cultures. A reactive culture is not a sustainable operating philosophy. Just to be clear: it is not sustainable to run your organization with a reactive maintenance philosophy.

So when choosing between FMEA and RCM, understand how your organization assigns accountability and responsibility for allocating and implementing engineering and maintenance resources. It is often advisable to lean toward the RCM zero philosophy. That way, the projects to engineer out the failures already exist as proposals, just waiting for funding to be approved. Let’s look at how a failure might be handled under each philosophy:

FMEA: a failure occurs with a low RPN. The organization demands an after-action review of the failure.

The maintenance manager reviews the original FMEA, confirms the RPN is still valid, and reports to the rest of the site leadership team that this was a failure “we” determined the organization could weather. The report also includes the cost of the failure and an estimate of what it would cost to mitigate it. Because mitigation would cost more than the failure, this confirms that run to failure was the most economical plan.

All is well until someone on the leadership team states that “they” were not part of the “we” and will not accept any failure at any time: a mismatch in philosophies. Now the organization has to decide again which philosophy it holds, or whether the RPN threshold should be lower. This could trigger a review of all the FMEAs against a lower RPN threshold, or the removal of all RPNs to embrace a zero culture.

Alternatively, the maintenance manager reviews the original FMEA and determines that the RPN has changed and is, in fact, now above the threshold. This triggers a project for this specific instance. It also triggers a review of all the FMEAs to recalculate their RPNs for current operating conditions, and it highlights the need for a trigger to review RPNs whenever operating conditions change.
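Both review triggers in the FMEA scenario (a leadership decision to lower the RPN threshold, and ratings that drift as operating conditions change) come down to the same check: which failure modes were accepted as run to failure when the FMEA was written but now fall above the threshold. The Python sketch below assumes each FMEA is stored as a dictionary of failure-mode ratings and that the RPN is the product of severity, occurrence, and detection; the function names, data layout, and example values are hypothetical.

```python
def rpn(ratings):
    """Risk priority number = severity x occurrence x detection."""
    return ratings["severity"] * ratings["occurrence"] * ratings["detection"]


def newly_above_threshold(original, current, old_threshold, new_threshold=None):
    """Return names of failure modes that were accepted (RPN at or below
    old_threshold) when the FMEA was written but now exceed new_threshold.

    original: {name: ratings} as recorded in the FMEA
    current:  {name: ratings} re-rated for today's operating conditions;
              modes missing from it are assumed unchanged
    new_threshold: defaults to old_threshold (only the ratings changed)
    """
    if new_threshold is None:
        new_threshold = old_threshold
    flagged = []
    for name, old_ratings in original.items():
        new_ratings = current.get(name, old_ratings)
        if rpn(old_ratings) <= old_threshold and rpn(new_ratings) > new_threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical FMEA entries, for illustration only.
    original = {
        "gearbox oil leak": {"severity": 4, "occurrence": 3, "detection": 5},   # RPN 60
        "motor overheating": {"severity": 6, "occurrence": 2, "detection": 4},  # RPN 48
    }
    # Operating conditions changed: the gearbox now leaks twice as often.
    current = {
        "gearbox oil leak": {"severity": 4, "occurrence": 6, "detection": 5},   # RPN 120
    }
    print(newly_above_threshold(original, current, old_threshold=100))
    # Leadership lowers the threshold from 100 to 50 with ratings unchanged.
    print(newly_above_threshold(original, {}, old_threshold=100, new_threshold=50))
```

In both cases only the gearbox oil leak is flagged, which is the cue to open a mitigation project for it; the motor overheating mode (RPN 48) stays below either threshold.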

RCM: a failure occurs. The organization demands an after-action review of the failure.

The maintenance manager reviews the files and finds the failure and the project associated with its mitigation. The project is presented to the leadership team with an ROI updated to reflect the recent failure. The leadership team decides on project resources and timing. The outcome may be that the project’s ROI is still not viable, in which case the project goes back to waiting status.
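One way to see how a single failure can change a project’s ROI: if the expected failure rate is estimated from the failure history, each new failure raises the expected cost the project would avoid. The Python sketch below is a simple, undiscounted illustration under stated assumptions: the project is assumed to eliminate the failure completely, the failure rate is estimated as failures observed divided by years observed, and the `project_roi` function and all figures are hypothetical.

```python
def project_roi(project_cost, cost_per_failure, failures_observed,
                years_observed, horizon_years):
    """Simple, undiscounted ROI of a hypothetical mitigation project.

    Assumes the project eliminates the failure completely, so the benefit
    is the expected failure cost avoided over the evaluation horizon.
    """
    if project_cost <= 0 or years_observed <= 0:
        raise ValueError("project_cost and years_observed must be positive")
    failures_per_year = failures_observed / years_observed
    avoided_cost = failures_per_year * cost_per_failure * horizon_years
    return (avoided_cost - project_cost) / project_cost


if __name__ == "__main__":
    # Hypothetical figures: a 50,000 project against a failure costing
    # 20,000 each time, judged over a 3-year horizon.
    before = project_roi(50_000, 20_000, failures_observed=2,
                         years_observed=5, horizon_years=3)
    after = project_roi(50_000, 20_000, failures_observed=3,
                        years_observed=5, horizon_years=3)
    print(f"ROI before the recent failure: {before:.0%}")
    print(f"ROI after the recent failure:  {after:.0%}")
```

With these figures the ROI rises from -52% to -28% after the third failure, but it is still negative, so the project would go back to waiting status.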

Both the RCM and FMEA philosophies are acceptable ways to run an organization. However, if the membership of the leadership team changes often, or operating conditions are constantly changing, it can be advantageous to run with the zero-failure philosophy of RCM. Operating under the FMEA philosophy may make more sense given the reality of limited funds, but it takes much more finesse and an understanding of risk analysis to promote and sustain.

Choose your methodology wisely and be able to explain its philosophy to both your peers and your team. Confidence in and support for the methodology are much more important than the specific acronym you apply. Please do choose a proactive approach, because reactive maintenance is not sustainable. It costs far too much in lost production, equipment wear, and the morale of the people who have to work in that environment.

Please share your stories of successful RCM or FMEA implementation.
