Learning FMEA (Part 1): What is it and how to think about it

Learning to think in failure modes before they happen in the field.

7 Oct 2025

/

1 min read


I didn’t study manufacturing engineering. Most of what I know about reliability comes from failures in the field as they happen: a tap that leaks, a front panel that jams on unboxing, an LED that turns yellow in the first production lot.

So I’m writing this series as a way to learn on the job by documenting, testing, and applying. Thanks to two of my team members, Krishna (Industrial Designer) and Tejas (Design Engineer), who are constantly driving FMEA across our products and educating me about it as well. As I work through their documentation, I decided to put together a set of notes for myself so I can share them with the broader team and keep reflecting on them in the future.

There are four sources I'm learning from: (a) documenting all failures across the stages of deep R&D and machine design, (b) issues from assembly, processes, quality checks, and moulding, (c) the living document two of my team members are building, and (d) two books: The Basics of FMEA and FMEA from Theory to Execution.

1. What is FMEA?

FMEA stands for Failure Mode and Effects Analysis. It's basically a way to ask: if this thing we're building ever fails, what's the worst that could happen, how likely is it to happen, and how soon would we catch it?

A “failure mode” is simply the way something can go wrong. For example, a water-purifier tap can leak, jam, or break off. Each of those is a different mode of failure.

An “effect” is what that failure causes. It could be water damage, an angry user, a warranty claim, or a bad post on social media.

So, FMEA = listing the ways things fail + what those failures cause + what we can do about them. That’s it. But because it’s so simple, it scales beautifully from a tap lever to, probably, a jet engine.

2. Why is this important?

Because discovering failure late is expensive. Very expensive. Changing a CAD model costs almost nothing in comparison, but changing a mould costs lakhs. Changing a field-installed unit costs money and, most importantly, reputation.

FMEA brings failures forward in time so we can catch them while they’re cheap and harmless to fix. It’s also the only meeting where pessimists are welcome. Everyone gets to say “I told you so” in advance.

3. Three buckets: Design, Process, and User

  1. Design FMEA (DFMEA)
    This is done on the product itself. E.g. will the lever hinge fail from mechanical fatigue after 500 cycles or after 10,000 cycles? Each answer exposes a potential design weakness before tooling begins. DFMEA lives mostly inside CAD, material specs, and assembly diagrams. The goal is to minimise retooling and make the design inherently robust without needing later fixes.

  2. Process FMEA (PFMEA)
    This is done on the build process. Could the operator install the lid in the wrong orientation, or fit the wrong gasket? Could the water tank get contaminated during assembly? PFMEA is where design hands off to manufacturing, and both teams work together to control variation.

  3. User FMEA (UFMEA)
    This covers how people use (or misuse) the product. E.g. the user pulls the lever with a jerk, or accidentally wipes the fascia with a dirty cloth and scratches the fine matte finish. Users don't make these mistakes by choice, so it's our job to make misuse unlikely by design and correct use obvious. UFMEA helps us design for wet hands, low light, impatience, and known habits.


| Lens | Focus | Owners | Output |
|---|---|---|---|
| DFMEA | The product | R&D, industrial design, design engineering, electronics & firmware | Stronger architecture, design, fit, materials, etc. |
| PFMEA | The build process | Manufacturing, assembly, quality, supplier | Clear SOPs, fixtures & jigs, torque & inspection controls, part quality, manufacturing variability |
| UFMEA | Humans using the product | Design, user experience, service & install | Clear design with affordances, feedback, error-proofing |

4. How to rate & compare failure risks

Every potential failure is scored on three things:

| Factor | What is it | Scale |
|---|---|---|
| Severity (S) | How bad is it if it happens? | 1 – 10 |
| Occurrence (O) | How likely is it to happen? | 1 – 10 |
| Detection (D) | How likely are we to catch it before the user does? | 1 – 10 |

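Then you multiply them to get a Risk Priority Number (RPN): RPN = S × O × D, which ranges from 1 to 1,000. High RPN = danger zone. Low RPN = probably fine. One catch: for the multiplication to work, Detection is scored in reverse, so a high D means we're unlikely to catch the failure before the user does.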

This scoring is just a conversation starter. The real activity here is when a designer argues with a manufacturing engineer about why something should be a 6 instead of a 4. That argument is the learning. The learning compounds.
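To make the RPN arithmetic concrete, here is a minimal Python sketch. The `FailureMode` class, the `rank_by_rpn` helper, and every score in it are my own illustrative assumptions, not values from our actual risk sheet. It reuses the tap failure modes from earlier, and it assumes D is scored so that 10 means "almost never caught".

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One scored failure mode (hypothetical structure for illustration)."""
    name: str
    severity: int    # S: 1 (negligible) to 10 (worst case)
    occurrence: int  # O: 1 (rare) to 10 (almost certain)
    detection: int   # D: 1 (almost always caught) to 10 (almost never caught)

    def __post_init__(self) -> None:
        # Reject scores outside the 1-10 scale used in the table above.
        for label, score in (("S", self.severity),
                             ("O", self.occurrence),
                             ("D", self.detection)):
            if not 1 <= score <= 10:
                raise ValueError(f"{label} must be between 1 and 10, got {score}")

    @property
    def rpn(self) -> int:
        # RPN = S x O x D
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Made-up scores, purely to show the calculation.
tap_modes = [
    FailureMode("Tap leaks", severity=7, occurrence=4, detection=3),
    FailureMode("Tap jams", severity=5, occurrence=3, detection=2),
    FailureMode("Tap breaks off", severity=9, occurrence=2, detection=6),
]

for mode in rank_by_rpn(tap_modes):
    print(f"{mode.name}: RPN = {mode.rpn}")
# Tap breaks off: RPN = 108
# Tap leaks: RPN = 84
# Tap jams: RPN = 30
```

With these made-up numbers, the rare but hard-to-detect failure ("breaks off") ends up ranked above the more frequent leak, which is exactly the kind of result worth arguing about in the room.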

5. How to run FMEA

FMEA books list ten steps. Here’s the same thing in plain English (for my own understanding), along with where each step should be baked into the full product development process:

| Design stage | What's happening now in product development (high-level) | FMEA step |
|---|---|---|
| 1. Requirements & specifications defined | Product vision is locked, design brief is approved, and high-level requirements are frozen. | - |
| 2. Structure & functional analysis | Surface CAD from Industrial Designers is complete. Basically, the shape & form is closed. Initial mechanical layout starts now. | DFMEA / UFMEA: Review the product or process |
| 3. Concept development | Design Engineers start working here. Form–fit–function exploration with early DFMEA. | DFMEA / PFMEA / UFMEA: Brainstorm everything that could fail |
| 4. Risk analysis & failure study | Design and aesthetics (non-CMF) close here. Functional reliability targets are defined. | DFMEA / PFMEA / UFMEA: Note the effect of each failure |
| | Features of the machine are closed here. Feature-level trade-offs are also resolved. | DFMEA / UFMEA: Rate severity (S) |
| | Engineering assumptions are tested here. Material/process selection is in progress. | DFMEA / PFMEA: Rate occurrence (O) |
| | Reliability test planning begins. Controls and checkpoints are identified. | DFMEA / PFMEA / UFMEA: Rate detection (D) |
| | Risks sheet is created. Top 10/20 items become visible to all teams. | RPN = S × O × D |
| 5. Design detailing & DFM/DFA iterations | Final 3D is ready for manufacturing readiness. DFM/DFA reviews are under way. | DFMEA / PFMEA / UFMEA |
| | Tooling release preparation begins. Vendor & fixture readiness detailing also starts shortly after. | DFMEA / PFMEA / UFMEA |
| 6. Verification, validation & reliability testing | Testing phase is live. Re-validation of design + process changes. | DFMEA / PFMEA / UFMEA |
| 7. Pilot production & process optimisation | Manufacturing SOPs freeze here. Service and installation training begins. | PFMEA / UFMEA |
| 8. Launch & field feedback | Product is in the market. FMEA evolves via service feedback and data. | DFMEA / PFMEA / UFMEA |

That’s the whole game. It’s a simple but exhaustive process, and it forces everyone to speak the same language of risk and appreciate it.

6. Trade-offs of FMEA and cutting traditional biases

Trade-offs and biases are important to understand because I know what it feels like to fight against time. Time-to-market matters in hardware. There’s always a push to launch the next version, meet factory deadlines, and lock BOMs before the quarter ends. But somewhere in that rush, the slow, unglamorous thinking gets squeezed out, and that’s usually where reliability dies first. FMEA is slow because it’s the one time the team stops to ask “What if it breaks, and why?”

6.1. Time trade-off

FMEA adds days or weeks early in design, but saves months later in rework, tooling changes, and assembly firefighting. Every failure mode that was skipped and left untested eventually shows up as a customer issue. And when that happens, it costs time and trust.

6.2. Mindset trade-off

I'm guilty of designing for success. Most of us are. FMEA requires us to also start designing against failure. That’s not natural for builders. We love imagining how things work, not how they break. Of course, the danger is swinging too far to the opposite side, obsessing over unlikely edge cases and delaying good ideas.

We should treat FMEA as a hygiene and sanity check, and limit each cycle of FMEA to 15-20 high-impact risks. If an item's failure doesn't affect user trust, safety, or service cost, we can park it for later.
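The parking rule above can be sketched as a small triage function. This is only an illustration under my own assumptions: the `Risk` class, its three boolean flags, the `triage` helper, and the RPN values are hypothetical, and the "carton print" item is an invented example of a low-impact risk.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """A risk-sheet entry (hypothetical structure for illustration)."""
    name: str
    rpn: int
    affects_trust: bool = False
    affects_safety: bool = False
    affects_service_cost: bool = False

    @property
    def high_impact(self) -> bool:
        # High impact = touches user trust, safety, or service cost.
        return self.affects_trust or self.affects_safety or self.affects_service_cost


def triage(risks, cap=20):
    """Split risks into (this_cycle, parked).

    Only high-impact risks are eligible for the current cycle, highest RPN
    first, capped at `cap` items. Everything else is parked for later.
    """
    impactful = sorted((r for r in risks if r.high_impact),
                       key=lambda r: r.rpn, reverse=True)
    this_cycle = impactful[:cap]
    parked = impactful[cap:] + [r for r in risks if not r.high_impact]
    return this_cycle, parked


# Made-up entries and RPN values, purely for illustration.
risks = [
    Risk("LED turns yellow", rpn=40, affects_trust=True),
    Risk("Tap leaks", rpn=84, affects_trust=True, affects_service_cost=True),
    Risk("Carton print slightly off-centre", rpn=12),
]

this_cycle, parked = triage(risks, cap=20)
print([r.name for r in this_cycle])  # ['Tap leaks', 'LED turns yellow']
print([r.name for r in parked])      # ['Carton print slightly off-centre']
```

The cap is a parameter so a team can tighten it to 15 or loosen it per cycle. Parked items are kept, not deleted, so they can be revisited later.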

6.3. Cutting biases pragmatically

I've seen that the concept of exhaustive FMEA attracts inputs from two kinds of people who have their own biases:

  1. The over-believers: People from traditional manufacturing and large OEMs who treat FMEA as gospel. They love the process because their bias is toward completeness, which doesn't necessarily suit the different dynamics of a fast-moving start-up.

  2. The under-believers: People from start-ups (like us) who love velocity, iteration, and proof over paperwork. This also comes from a decade or more of exposure to building software. The bias is towards intuition. It's only when things break that we realise the fix isn't a software patch we can push, but a nightmare involving thousands of real machines in the field.

6.4. Middle ground

The goal is not to adopt or reject FMEA wholesale but to use it as a default thinking framework. If the analysis helps us understand failure and significantly reduces service revisits, we should do it. If it helps us learn faster so that we avoid similar failures on the next machine, we should do it. And we should consistently look for the minimum structured effort required to produce real insights.


I'll continue to write the next parts as I absorb this more deeply and get into running the process, the specifics of DFMEA, PFMEA, and UFMEA, and scoring. Maybe along the way I'll explore even faster and more efficient ways to do it. Maybe.

Godgeez®

Thank you for visiting & spending time on my website.

This site is where I think out loud, build in public, and document the parts of me that don’t fit neatly on LinkedIn.

P.S.: I built the website for myself. Hope you find it interesting!
