What Is FMEA? Failure Mode and Effects Analysis (for Beginners)

What Is FMEA?

FMEA stands for Failure Modes and Effects Analysis. It’s an engineering analysis performed to assess the potential causes of operational failures related to product design, assembly line production, and manufacturing equipment breakdowns.

No matter the business, the ultimate goal of FMEA is simple—to keep the production of high-quality products moving! To that end, the FMEA process involves taking data-driven actions to prevent any conceivable setback from happening from design to shipment.

According to the American Society for Quality, FMEA requires a cross-functional team of subject-matter experts to brainstorm how equipment may fail and what effects each failure would have. FMEA teams analyze the current processes of organizational systems, subsystems, and assemblies step-by-step to conceive of every potential outcome.

In addition to supporting equipment reliability, FMEA data can also provide insights that help promote better maintenance strategies, safety measures, and regulatory compliance.

FMEA vs. FMECA

At some point, you may also have come across the term Failure Modes, Effects, and Criticality Analysis (FMECA). Both FMEA and FMECA are methodologies that maintenance professionals use to identify, assess, and address potential failure modes.

While FMEA is an excellent tool for generating qualitative information, FMECA incorporates an additional assessment called a criticality analysis into the mix.

ISO 31000:2009: Risk Management Principles and Guidelines describes a criticality analysis as the process of assigning assets a criticality ranking based on their potential risks.

In this case, any factor that introduces uncertainty to reaching production goals is classified as a “risk.” Put simply, FMECA is FMEA + a criticality analysis.

Manufacturers, warehouses, and oil and gas providers are most likely to use this additional component to inform sensor-based maintenance programs.

With that said, both FMEA and FMECA help organizations meet ever-growing customer demands by enabling them to deliver quality, functional, and safe products. However, for the remainder of this article, we’ll focus on the ins and outs of FMEA.

The History of FMEA

During the 1940s, the U.S. Armed Forces developed FMEA to classify failures “according to their impact on mission success and personnel/equipment safety.”

Fast forward to half a billion individuals watching Neil Armstrong take “one small step for man, one giant leap for mankind” in 1969. Most viewers didn’t realize that Apollo 11 would never have landed on the moon without FMEA.

During an interview for the Johnson Space Center Oral History Project, Armstrong said NASA’s hardware reliability specifications allowed up to four failures in every 100,000 operations.

“I can only attribute that to the fact that every guy in the project, every guy at the bench building something, every assembler, every inspector, every guy that’s setting up the tests, cranking the torque wrench, and so on, is saying, ‘If anything goes wrong here, it’s not going to be my fault because my part is going to be better than I have to make it.’”

He went on to say his team wouldn’t have been successful without the thousands of NASA workers’ commitment to operational excellence. Soon after NASA adopted FMEA, other significant industries dependent on equipment reliability—automotive, healthcare, mining, and oil—followed suit.

How Does FMEA Relate to Maintenance?

FMEA helps maintenance professionals answer the age-old question: what should we fix first? Prioritizing the upkeep of hundreds of thousands of machines and parts isn’t easy.

Every O&M manager knows that running preventive maintenance on critical assets is ideal, but that doesn’t mean it always happens. Limited budgets, time constraints, and labor shortages often force operational leaders to make tough calls when assigning work orders.

This is where FMEA can be particularly useful. Maintenance leaders can use this systematic approach to identify potential equipment failure modes, estimate the frequency of their occurrence, and evaluate the seriousness of the impact.

The result is a clearer picture of how different operational processes impact one another, translating to optimized maintenance scheduling.

Acting upon realistic FMEA data leads to reduced equipment downtime, decreased breakdown maintenance expenses, and enhanced team member safety.

But launching an FMEA program isn’t for the casually run maintenance department. To be successful, maintenance teams must commit to following through with task-force recommendations.

In addition, maintenance managers must enforce standard operating procedures to gauge program progress consistently. The easiest way to organize, analyze, and act upon program data is to use a mobile computerized maintenance management software (CMMS) system.

3 Types of FMEA

Business leaders most commonly conduct FMEAs when designing, redesigning, and redeploying products and processes. FMEA also comes in handy when preparing control plans for operational processes. Though the analytical method includes several subsets, it’s most commonly broken down into three categories:

1. System FMEA

System FMEA, also referred to as functional FMEA, analyzes failure modes from a bird’s-eye view. It focuses on entire systems and subsystems related to system integration, workplace safety, and anything else that could affect overall productivity.

System FMEA recognizes that failures can occur between multiple assets and processes. In other words, a system failure is rarely an isolated event. The failure modes it covers include single-point failures that affect several touchpoints as well as complex failures isolated to specific machinery. Experts recommend performing system FMEA before settling on functional system designs.

2. Design FMEA

As the name suggests, the primary focus of Design FMEA is the design of products and services. Organizational leaders perform design FMEA to assess potential failures, safety concerns, and regulatory requirements related to end products.

The goal is to ensure company deliverables are safe, reliable, and of high quality for customers. Operational leaders perform design FMEA both before new deliverables move to production and during product design iterations. The result of a successful design FMEA session is a set of action items that will mitigate product malfunctions.

3. Process FMEA

Process FMEA focuses on the process used to make the product instead of the end product itself. Managers use process FMEA to identify issues within the production processes involved in getting a product to market.

For example, a laptop includes several parts, each the result of its own production process. If leadership identifies a problem with the computer’s processor, it will have to evaluate all of the procedures necessary to install it within the computer.

The scope of Process FMEA includes assembly operations, product labeling, parts management, and the transportation of materials. Organizations typically perform this type of FMEA during the “feasibility study phase” in pre-production. The ultimate objective is to build robust production processes that support design specifications with minimal downtime and required reworks.

It’s worth mentioning that some operational leaders segment FMEA into additional categories. Concept FMEA, Service FMEA, Hazard Analysis FMEA, and Software FMEA are some of the terms you may hear when discussing the topic with others. Regardless of the type of FMEA, the process always contains similar facets.

Four Key Elements of FMEA

FMEA relies on the following elements to drill down on potential operational issues:

Failure Modes: Failure modes refer to the various ways an asset, or its parts, can stop working correctly. For example, a centrifugal pump can experience hydraulic failure, mechanical failure, or corrosion.
Failure Causes: This component of the analysis involves good, old-fashioned brainstorming. Task-force leaders must execute systemic risk assessments of each potential failure mode based on past experiences. Surprisingly, root causes often trace back to common procedures, sister departments, system bottlenecks, and forms of operational waste.
Failure Mode Analysis: Once leadership has identified the potential causes of failure, it’s time to measure how often each failure is likely to occur and how severe it would be. This aspect of the FMEA involves ranking the estimated severity of each failure’s effects (e.g., on employee safety, business profitability, and asset reliability), the likelihood of its occurrence, and its detection probability. The team then multiplies the three rankings to arrive at a Risk Priority Number (RPN) to inform task prioritization for maintenance and operations.
Action and Review: Finally, the team develops an action plan to execute the FMEA’s recommendations. Again, the right CMMS can automate the entire process from digital checklist assignments to asset management to cost/time analysis, shaving off up to 20 hours per week compared to analog data management processes.

Now that you understand the basics of FMEA, let’s delve into the meat of the process—the failure mode analysis process, which includes ranking failure scores.

Understanding FMEA Failure Scores

An FMEA failure mode analysis includes the following three scoring attributes:

Severity: Severity rankings reveal which failure modes should be given priority. They reflect the factors that are essential to the business, its customers, and the industry at large. Leadership may weigh safety standards, business continuity, environmental requirements, and reputational damage when assigning severity rankings. A low severity ranking means that a failure mode has minimal impact on a business or its customers, while a high ranking signifies a more severe detrimental effect.
Detection: Detection ranking determines the probability of prevention before a potential failure occurs. In other words, it measures how likely workers are to detect the problem before it happens. Low-detection rankings represent easily detectable issues, while high-detection rankings represent unpredictable ones.
Occurrence: This ranking shows the probability of a failure mode occurring during an asset’s lifespan. A low-occurrence ranking means that the failure is unlikely to happen, while a high-occurrence ranking implies that the asset failing is a more likely occurrence.

Each attribute is assigned a ranking per category, typically between one and 10. For example, a failure mode that is unlikely to occur receives an occurrence ranking of one. The three rankings are then multiplied to yield a risk priority number (RPN) for each failure mode.
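As a minimal sketch of that multiplication, the Python function below checks each ranking before computing the RPN. The function name and the strict one-to-10 integer scale are assumptions made for illustration; your organization may use a different scale.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Multiply the three rankings into a Risk Priority Number (RPN)."""
    rankings = (
        ("severity", severity),
        ("occurrence", occurrence),
        ("detection", detection),
    )
    for name, value in rankings:
        # Assumption: every ranking is an integer on a 1-10 scale.
        if not isinstance(value, int) or not 1 <= value <= 10:
            raise ValueError(f"{name} must be an integer from 1 to 10, got {value!r}")
    return severity * occurrence * detection


lowest = risk_priority_number(1, 1, 1)
highest = risk_priority_number(10, 10, 10)
print(lowest, highest)  # 1 1000
```

On a one-to-10 scale, every RPN therefore falls between 1 and 1,000.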

Failure Score Example

For example, if a failure mode for an espresso machine has a severity ranking of 10, a detection ranking of two, and an occurrence ranking of four, its risk priority number would be 10 × 4 × 2 = 80. Remember, the long-term goal is to reduce your RPNs by proactively correcting catalysts of risk.

In some cases, it’s not possible to reduce asset severity ratings. For this reason, it’s best to focus on reducing what you can control—the occurrence and detection ratings. This means you will minimize the chances of a problem occurring while also enhancing your ability to detect failure before it happens.
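To illustrate, here is a small Python sketch that holds the espresso machine's severity ranking at 10 and lowers only the other two rankings. The "after" values are hypothetical and stand in for whatever improvements your team manages to make.

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number = severity * occurrence * detection."""
    return severity * occurrence * detection


# Rankings from the espresso machine example.
before = rpn(severity=10, occurrence=4, detection=2)

# Hypothetical improvement: severity cannot change, but better upkeep
# halves the occurrence ranking and a new check improves detection.
after = rpn(severity=10, occurrence=2, detection=1)

reduction = (before - after) / before
print(before, after, f"{reduction:.0%}")  # 80 20 75%
```

Even with severity untouched, the RPN in this sketch drops from 80 to 20.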

Ultimately, the higher the severity ranking, the more the failure will affect business productivity, customer satisfaction, and bottom lines.

How to Conduct a Failure Mode Effects Analysis

The best time to run an initial FMEA is during your organization’s product development phase. This allows plenty of time to implement modifications to streamline product design and manufacturing processes. Of course, you can perform FMEA any time you seek to improve quality control, enhance profitability, or increase efficiency.

Before we get started, it’s worth mentioning that a more comprehensive breakdown of how to conduct an FMEA can be found in SAE J1739_202101: Potential Failure Mode and Effects Analysis (FMEA) Including Design FMEA, Supplemental FMEA-MSR, and Process FMEA.

Read on for an overview to get started:

Step 1: Assemble Your FMEA Team

Despite the word “me” within FMEA, the analysis must include many key team members to succeed.

With this in mind, put together a task force to spearhead the initiative. In most cases, a production engineer or maintenance manager will lead the team. Other potentially relevant roles include process engineers, process designers, suppliers, marketers, and even customers.

Step 2: Gather Your Data

Once you have your team in place, gather the following information:

Next, determine which type of FMEA you will be conducting.

Step 3: Specify Your Scope

Identify the systems, subsystems, assemblies, and parts’ relationships your team will evaluate in the FMEA. In addition, set up standard operating procedures to conduct, execute, and follow up on your efforts.

You can use a flowchart to visualize the details more clearly. It’s crucial to ensure that every team member is on the same page before the project begins.

Step 4: Identify Potential Failure Modes

At this point, you’re ready to consider the various ways your assets can fail. Use maintenance history, frontline employee knowledge, manufacturer’s guidelines, and other helpful resources to brainstorm potential outcomes.

Your goal is to generate an exhaustive list of what could go wrong. It’s essential to emphasize that specific failure modes can trigger other failures within systems or subsystems. For this reason, always consider the overall context when identifying potential issues. Besides failure modes that result in assets completely breaking down, consider those that may result in:

Unintended functions or results from an asset.
Substandard performance.
Reduced functionality.

Isolate each asset and its components to identify all potential failure modes, including the hidden ones. Standard equipment failure modes include cracks, product deformation, torque fatigue, and electrical short circuits.

Step 5: Determine Severity Rankings

As mentioned earlier, severity rankings measure the overall effect of failure modes. Maintenance managers often use a scale from one to 10 to rank impact from low to high.

Generally, failure modes that impact safety or the company’s bottom line are given higher severity rankings. Failure modes with high severity rankings should always be given priority. For example, an automobile manufacturer may assign an airbag installation issue a high severity ranking because of its detrimental impact on safety and the production delays it can cause.

Step 6: Determine Occurrence Rankings

Occurrence ranking measures the frequency with which the failure modes are likely to happen. It’s also measured on a scale of one to 10.

When determining the occurrence ranking, it’s essential to consider all the potential causes of the failure mode and the existing prevention controls meant to stop it from happening.

Using the example mentioned in the previous step, why would the car manufacturer receive faulty airbags or delayed deliveries? It could be a mixup from the supplier, discrepancies when ordering, or a communication breakdown. Are there any controls to ensure these don’t happen? If yes, then incorrect airbag installation will receive a low or moderate occurrence ranking.

Step 7: Determine Detection Rankings

At this stage, determine the detection ranking for the identified failure modes. This means establishing how easy it will be to identify failure before it occurs. High detection rankings mean it’s nearly impossible to identify and resolve problems before failures occur.

Step 8: Calculate RPN and Prioritize Actions

Now, calculate the Risk Priority Number (RPN) for each failure mode by multiplying the three rankings: detection, occurrence, and severity. This will enable you to prioritize design, process improvement, and maintenance initiatives. Here’s a look at the RPN formula:

Risk Priority Number (RPN) = Severity * Occurrence * Detection

Keep in mind that different industries and organizations have different definitions of “critical.” Therefore, it’s up to your team to decide which RPN threshold marks a failure mode as critical and therefore a priority.

You should also assess failure modes with non-critical RPNs to determine how they relate to other systems and whether they could trigger failures in those systems. Such failure modes should also be given priority.
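Here is one way this step could look in Python. It is only a sketch: the threshold of 200, the `triggers_other_failures` flag, and the conveyor and label failure modes are all hypothetical, while the airbag rankings come from the example later in this article.

```python
CRITICAL_RPN = 200  # Hypothetical threshold; each team sets its own.

failure_modes = [
    {"name": "Incorrect airbag installed", "severity": 10, "occurrence": 4,
     "detection": 6, "triggers_other_failures": False},
    # Hypothetical failure modes, added only for comparison.
    {"name": "Conveyor belt slips", "severity": 4, "occurrence": 5,
     "detection": 3, "triggers_other_failures": True},
    {"name": "Label misprint", "severity": 2, "occurrence": 3,
     "detection": 2, "triggers_other_failures": False},
]


def rpn(mode):
    """Risk Priority Number = severity * occurrence * detection."""
    return mode["severity"] * mode["occurrence"] * mode["detection"]


def needs_priority(mode, threshold=CRITICAL_RPN):
    """Critical RPN, or a lower RPN that can set off failures elsewhere."""
    return rpn(mode) >= threshold or mode["triggers_other_failures"]


prioritized = [mode["name"] for mode in failure_modes if needs_priority(mode)]
print(prioritized)  # ['Incorrect airbag installed', 'Conveyor belt slips']
```

The conveyor failure has an RPN of only 60, yet it still makes the list because it can trigger failures in other systems.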

Step 9: Develop a Preventive Maintenance Program

This is the stage where you focus on the core purpose of your FMEA program—to mitigate the identified risks through data-informed actions. Develop a proactive maintenance program to help minimize the risk of asset failure. Priority should always be given to failure modes with high RPNs.

Because it’s impossible to deal with every potential failure mode at once, focus your energy on the ones that most impact productivity, safety, and customer satisfaction.

Collect maintenance data and analyze your efforts to determine the most suitable course of action for each failure mode. You also can make changes to functions or consider process and design improvements.

In addition, prioritize the assets with high severity rankings even when their RPN is low. Why? Because a high severity rating means the failure will significantly impact business productivity, customer satisfaction, and safety.
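One possible way to encode that ordering is a sort key that puts high-severity failure modes first and then ranks by RPN. In the sketch below, the cutoff of nine and all three failure modes are hypothetical values chosen for illustration.

```python
HIGH_SEVERITY = 9  # Hypothetical cutoff for "high severity".

# (name, severity, occurrence, detection); all values are illustrative.
modes = [
    ("Seal wear", 5, 6, 5),            # RPN 150
    ("Brake line rupture", 10, 1, 3),  # RPN 30, but severity 10
    ("Sensor drift", 3, 4, 4),         # RPN 48
]


def priority_key(mode):
    """Sort high-severity modes first, then by descending RPN."""
    _, severity, occurrence, detection = mode
    rpn = severity * occurrence * detection
    # False sorts before True, so high-severity modes come first.
    return (severity < HIGH_SEVERITY, -rpn)


ordered = [mode[0] for mode in sorted(modes, key=priority_key)]
print(ordered)  # ['Brake line rupture', 'Seal wear', 'Sensor drift']
```

The brake line failure has the lowest RPN of the three but lands at the top of the work list because of its severity ranking.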

Step 10: Recalculate Your RPNs

Finally, periodically recalculate your RPNs to see whether the actions you implemented are generating results. Your task force should re-rank each value (severity, occurrence, and detection) to calculate a new RPN for each failure mode and then compare it to the old RPN.

The goal is to continually develop and implement new strategies that lower RPNs to the point of no longer needing any interventions. For this reason, FMEA is an ongoing process—not a one-off event.
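The old-versus-new comparison in this step can be sketched in a few lines of Python. The failure modes and rankings below are hypothetical; in practice they would come from your own FMEA records.

```python
def rpn(scores):
    """Risk Priority Number from a (severity, occurrence, detection) tuple."""
    severity, occurrence, detection = scores
    return severity * occurrence * detection


# Hypothetical rankings from two review cycles.
previous = {"Pump seal leak": (7, 5, 4), "Motor overheating": (8, 3, 5)}
current = {"Pump seal leak": (7, 2, 3), "Motor overheating": (8, 3, 5)}


def rpn_changes(previous, current):
    """Map each failure mode to (old RPN, new RPN, change)."""
    changes = {}
    for name, old_scores in previous.items():
        old, new = rpn(old_scores), rpn(current[name])
        changes[name] = (old, new, new - old)
    return changes


for name, (old, new, delta) in rpn_changes(previous, current).items():
    status = "improved" if delta < 0 else "no improvement"
    print(f"{name}: {old} -> {new} ({status})")
```

In this sketch, the pump seal RPN falls from 140 to 42, while the unchanged motor RPN of 120 signals that its action plan needs another look.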

FMEA Process Example

Here’s a quick example of an FMEA process conducted for a vehicle airbag installation:

Function: Airbag installation

Failure Mode: Incorrect airbag installed

Failure Effects: Deployment malfunction resulting in physical injury to the driver

Severity Ranking: 10

Causes of Failure: Human error

Occurrence Ranking: 4

Prevention Controls: Manually inspect airbags after the supplier delivers them

Detection Controls: Use sensors to test airbag functionality after installation

Detection Ranking: 6

RPN: 240
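If you keep FMEA records in software, the same worksheet row could be captured as a small Python data class. The class and field names here are assumptions made for illustration, not part of any standard or of MaintainX; the values are taken from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaRow:
    """One worksheet row; the field names are illustrative."""
    function: str
    failure_mode: str
    failure_effects: str
    severity: int
    causes: str
    occurrence: int
    prevention_controls: str
    detection_controls: str
    detection: int

    @property
    def rpn(self) -> int:
        # Risk Priority Number = severity * occurrence * detection
        return self.severity * self.occurrence * self.detection


airbag = FmeaRow(
    function="Airbag installation",
    failure_mode="Incorrect airbag installed",
    failure_effects="Deployment malfunction resulting in physical injury to the driver",
    severity=10,
    causes="Human error",
    occurrence=4,
    prevention_controls="Manually inspect airbags after the supplier delivers them",
    detection_controls="Use sensors to test airbag functionality after installation",
    detection=6,
)
print(airbag.rpn)  # 240
```

Deriving the RPN from the three rankings, rather than storing it separately, keeps it in step whenever a ranking is updated.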

Common Roadblocks with FMEA

FMEA is a powerful tool to improve both process and product reliability. When conducted accurately, it’s been proven to help organizational leaders mitigate risks, minimize product development timelines, and safeguard quality control.

But not everyone who undertakes an FMEA process achieves the expected results. Some organizations encounter the following common roadblocks:

Poor Communication: Many businesses approach FMEA as a compliance item that needs to be crossed off the checklist. Failure to see FMEA as a tool for quality and reliability improvement leads to a half-hearted process without any real success.
Ineffective Team: An FMEA team that lacks representation from every part of the product or process lifecycle is likely to be ineffective, because it won’t have sufficient knowledge of the whole lifecycle. FMEA is a continuous process that spans the entire product lifecycle. This means that you have to address issues from the early design stage through development, manufacturing, and deployment for the process to be a success.
Disorganized Documents: Some teams, in a rush to complete the process, haphazardly compile information and documents in a disorganized manner. Unfortunately, they become victims of the failures of their own systems when attempting to review progress. Frustration, overwhelm, and confusion prevent them from recalculating RPNs and pivoting their efforts as needed.
Over-Reliance on RPN: Lastly, teams that solely focus on RPNs draw inaccurate conclusions about failure modes. It’s best to use the RPN as an overall benchmark while still considering the impact of the individual scores.
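The over-reliance problem is easy to see with a quick Python sketch. The three failure modes below are hypothetical and were chosen so that very different individual scores multiply to the same RPN.

```python
from collections import defaultdict

# Hypothetical (severity, occurrence, detection) rankings.
modes = {
    "Mode A": (10, 2, 4),  # severe but rare
    "Mode B": (2, 10, 4),  # minor but frequent
    "Mode C": (4, 4, 5),   # moderate on every score
}


def rpn_ties(modes):
    """Group failure modes that share an RPN, most severe first."""
    groups = defaultdict(list)
    for name, (severity, occurrence, detection) in modes.items():
        groups[severity * occurrence * detection].append((severity, name))
    return {
        rpn: [name for _, name in sorted(members, reverse=True)]
        for rpn, members in groups.items()
        if len(members) > 1
    }


print(rpn_ties(modes))  # {80: ['Mode A', 'Mode C', 'Mode B']}
```

All three modes score 80, so the RPN alone cannot tell you that Mode A carries a severity ranking of 10 while Mode B carries a two.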

Streamline FMEA with MaintainX

As we have shown, FMEA is a valuable tool for businesses aiming to increase asset reliability, prioritize O&M initiatives, and streamline product design. The key to success?

Implement the recommendations that result from your task force and stick to a periodic data review process! Progress won’t happen overnight, so make plans to revisit RPNs and team efforts.

MaintainX enables manufacturing teams to share crucial data points, enhance team communication, and glean insights from user-friendly advanced reporting. Our work order software consistently receives high marks for usability, comprehensiveness, and value on third-party review sites.

But don’t take anyone else’s word for it—try MaintainX for yourself.

Caroline Eisner

Caroline Eisner is a writer and editor with experience across the profit and nonprofit sectors, government, education, and financial organizations. She has held leadership positions in K16 institutions and has led large-scale digital projects, interactive websites, and a business writing consultancy.