Learning FMEA (Part 1): What is it and how to think about it

Learning to think in failure modes before they happen in the field.

7 Oct 2025

1 min read

Hardware

I didn’t study manufacturing engineering. Most of what I know about reliability comes from failures in the field, as they happen: when a tap leaks, when the front panel jams on unboxing, when an LED turns yellow in the first production lot.

So I’m writing this series as a way to learn vocationally by documenting, testing, and applying. Thanks to two of my team members, Krishna (Industrial Designer) and Tejas (Design Engineer), who are constantly driving FMEA across our products and educating me about it as well. As I study their documentation, I decided to put together a set of notes for myself so I can share them with the broader team and keep reflecting on them in the future.

There are four sources I'm learning from: (a) documented failures across the stages of deep R&D and designing the machine, (b) issues from assembly, processes, quality checks, and moulding, (c) the always-active document two of my team members are building, and (d) two books: The Basics of FMEA and FMEA from Theory to Execution.

1. What is FMEA?

FMEA stands for Failure Mode and Effects Analysis. It's basically a way to ask: if this thing we're building ever fails, what's the worst that could happen, how likely is it to happen, and how soon would we catch it?

A “failure mode” is simply the way something can go wrong. For example, a water-purifier tap can leak, jam, or break off. Each of those is a different mode of failure.

An “effect” is what that failure causes. It could be water damage, an angry user, a warranty claim, or a bad post on social media.

So, FMEA = listing the ways things fail + what those failures cause + what we can do about them. That’s it. And because it’s so simple, it scales beautifully from a tap lever to, probably, a jet engine.
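
As a way to make that definition concrete, here is a minimal sketch of one FMEA row as a data structure. The class name, field names, and the pairing of effects to modes are my own illustration built from the tap example, not a format taken from either book.

```python
from dataclasses import dataclass, field


@dataclass
class FailureEntry:
    """One row of an FMEA sheet: how an item fails, what that causes, what we do."""
    item: str
    failure_mode: str
    effects: list[str]
    # Actions start empty on purpose: the team fills them in during review.
    actions: list[str] = field(default_factory=list)


# The tap example: three different modes of failure for the same part.
# Which effect goes with which mode is an illustrative assumption.
tap_entries = [
    FailureEntry("tap", "leaks", ["water damage", "warranty claim"]),
    FailureEntry("tap", "jams", ["angry user"]),
    FailureEntry("tap", "breaks off", ["warranty claim", "bad post on social media"]),
]


def modes_for(item: str, entries: list[FailureEntry]) -> list[str]:
    """Return every recorded failure mode for one item, in sheet order."""
    return [e.failure_mode for e in entries if e.item == item]


print(modes_for("tap", tap_entries))
```

Keeping modes, effects, and actions as separate fields mirrors the three parts of the definition, so a row with no actions yet is easy to spot.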

2. Why is this important?

Because discovering failure late is expensive. Very expensive. Changing a CAD model costs almost nothing in comparison, but changing a mould costs lakhs. Changing a field-installed unit costs money and, most importantly, reputation.

FMEA brings failures forward in time so we can catch them when they’re cheap and harmless to fix. It’s also the only meeting where pessimists are welcome. Everyone gets to say “I told you so” in advance.

3. Three buckets: Design, Process, and User

Design FMEA (DFMEA)
This is done on the product itself. E.g. will the lever hinge fail from mechanical fatigue after 500 cycles or after 10,000 cycles? Each answer exposes a potential design weakness before tooling begins. DFMEA lives mostly inside CAD, material specs, and assembly diagrams. The goal is to minimise retooling and make the design inherently robust, without needing later fixes.
Process FMEA (PFMEA)
This is done on the build process. Could the operator install the lid in the wrong orientation, or install the wrong gasket? Could the water tank get contaminated during assembly? PFMEA is where design hands off to manufacturing, and both teams work together to control variation.
User FMEA (UFMEA)
This is done on how people use (or misuse) the product. E.g. the user pulls the lever with a jerk, or accidentally wipes the fascia with a dirty cloth, scratching the fine matte finish. The user isn't making these mistakes by choice, so it's our job to make misuse unlikely by design and make the correct use obvious. UFMEA helps us design for wet hands, low light, impatience, and known habits.

Lens	Focus	Owners	Output
DFMEA	The product	R&D, industrial design, design engineering, electronics & firmware	Stronger architecture, design, fit, materials, etc.
PFMEA	The build process	Manufacturing, assembly, quality, supplier	Clear SOPs, fixtures & jigs, torque & inspection controls, part quality, manufacturing variability
UFMEA	Humans using the product	Design, user experience, service & install	Clear design with affordances, feedback, error-proofing

4. How to rate & compare failure risks

Every potential failure is scored on three things:

Factor	What is it	Scale
Severity (S)	How bad is it if it happens?	1 – 10 (10 = worst)
Occurrence (O)	How likely is it to happen?	1 – 10 (10 = most likely)
Detection (D)	How likely are we to catch it before the user does?	1 – 10 (1 = almost certainly caught, 10 = almost certainly missed)

Then you multiply them to assign a Risk Priority Number (RPN). RPN = S × O × D. High RPN = danger zone. Low RPN = probably fine.
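
A small sketch of the arithmetic, for my own reference. The ratings below are hypothetical placeholders, not real scores from our sheets, and I'm assuming the common convention that a higher D means the failure is harder to catch.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: RPN = S x O x D, each factor rated 1 to 10."""
    factors = (("severity", severity), ("occurrence", occurrence), ("detection", detection))
    for name, value in factors:
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detection


# Hypothetical (S, O, D) ratings for the tap example.
# Real numbers come out of the team discussion, not from this sketch.
ratings = {
    "tap leaks": (7, 4, 6),
    "tap jams": (5, 3, 2),
    "tap breaks off": (8, 2, 3),
}

# Highest RPN first, so the danger zone sits at the top of the list.
ranked = sorted(ratings, key=lambda mode: rpn(*ratings[mode]), reverse=True)
for mode in ranked:
    print(mode, rpn(*ratings[mode]))
```

With these made-up numbers the leak scores 7 × 4 × 6 = 168, the break-off 8 × 2 × 3 = 48, and the jam 5 × 3 × 2 = 30, so the leak gets discussed first.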

This scoring is just a conversation starter. The real activity here is when a designer argues with a manufacturing engineer about why something should be a 6 instead of a 4. That argument is the learning. The learning compounds.

5. How to run FMEA

FMEA books list ten steps. Here’s the same thing in plain English (for my own understanding), along with where each step should be baked into the full product development process.

First off, before FMEA even begins, there’s a phase where we should do a deep, structured breakdown of the product into functions → systems → sub-assemblies → components, and map requirements to each level.

Pre-FMEA

Stage	What happens here	Why it matters
1. Requirements gathering	Collect all user, safety, performance, and aesthetic requirements.	Defines what the product must achieve.
2. Function definition	Break the product into core functions: e.g. filtration, water dispensing, etc.	Translates requirements into what the product does.
3. System & sub-system breakdown	Divide the product into assemblies (filtration system, electronics, housing, dispensing unit, etc.) and then into sub-assemblies and components.	Shows where each function physically lives in the machine.
4. Requirement-to-function mapping	For each function or subsystem, assign which requirements apply. E.g. “Hygiene” → filtration & storage tank materials; “Strength” → bracket, housing.	Creates traceability. Every requirement has an owner.
5. Structure & function analysis	Create the “structure tree” and “function tree.” For each node (component), define what it does and what could go wrong if it doesn’t do that.	This becomes the input to DFMEA.
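
Steps 3 and 4 can be sketched as two small mappings plus a traceability check. The component names and the exact requirement assignments are illustrative assumptions based on the examples in the table, not our actual structure tree.

```python
# Hypothetical structure tree: system -> components (names are illustrative).
structure = {
    "filtration system": ["filter cartridge", "storage tank"],
    "housing": ["bracket", "front panel"],
    "dispensing unit": ["tap lever"],
}

# Requirement -> components it applies to, loosely following the step 4 examples.
requirements = {
    "Hygiene": ["filter cartridge", "storage tank"],
    "Strength": ["bracket", "front panel"],
}


def untraced_components(structure, requirements):
    """Components that no requirement points at yet (a traceability gap)."""
    covered = {c for comps in requirements.values() for c in comps}
    return [c for comps in structure.values() for c in comps if c not in covered]


def requirements_for(component, requirements):
    """All requirements that apply to one component, sorted by name."""
    return sorted(r for r, comps in requirements.items() if component in comps)


print(untraced_components(structure, requirements))
print(requirements_for("bracket", requirements))
```

In this toy data the tap lever has no requirement mapped to it, which is exactly the kind of gap the mapping step is meant to surface before DFMEA starts.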

Design stage	What's happening now in product development (high-level)	FMEA step
Requirements & specifications defined	Product vision is locked, design brief is approved, and high-level requirements are frozen.	Pre-FMEA groundwork starts next
Requirements breakdown & functional mapping	Break down high-level requirements into core functions.	Foundations for DFMEA
System → Assembly → Sub-assembly → Component Mapping	Identify all systems, subsystems, and components. Map which requirements apply to which components.	Establish traceability between requirements and physical parts
Structure & functional analysis	For each component, define its function and failure effect. Assign ownership of requirements to responsible functions/components.	Inputs prepared for FMEA sheets
Concept development	Early DFMEA starts. Teams brainstorm potential failure modes at the concept level even before final CAD. Identify what could go wrong, what causes it, and what effect it might have.	DFMEA / PFMEA / UFMEA (Concept-level FMEA)
Risk analysis & failure study	Concept shortlisting happens. Functional reliability targets defined. Design and aesthetics (non-CMF) close here.	DFMEA / PFMEA / UFMEA (note the effect of each failure)
Severity rating (S)	Teams discuss: if this fails, how bad is it?	DFMEA / PFMEA
Occurrence rating (O)	Reliability test planning begins. Identify where variation or failure likelihood is highest.	DFMEA / PFMEA / UFMEA
Detection rating (D)	Establish inspection points, testing methods, and control plans. Identify how early each risk can be caught.	DFMEA / PFMEA / UFMEA
Calculate RPN (S × O × D)	Risk sheet is created. Top 10/20 items become visible to all teams.	All FMEA types
Design detailing & DFM/DFA iterations	Final 3D CAD is detailed for manufacturing readiness. DFM/DFA reviews are under way.	DFMEA / PFMEA / UFMEA
Tooling release preparation	Final 3D CAD is ready for manufacturing. DFM/DFA reviews are in progress. High-risk items are addressed here.	DFMEA / PFMEA / UFMEA
Verification, validation & reliability testing	Testing phase live. Validation of design and process improvements. Capture learnings back into FMEA sheets.	DFMEA / PFMEA / UFMEA
Pilot production & process optimisation	Manufacturing SOPs freeze here. Service, installation training begins. Minor PFMEA refinements.	PFMEA / UFMEA
Launch & field feedback	Product is in the market. FMEA evolves via service feedback and data.	DFMEA / PFMEA / UFMEA (living document)

That’s the whole game. It’s a simple but exhaustive process, and it forces everyone to speak the same language of risk and appreciate it.

6. Trade-offs of FMEA and cutting traditional biases

Trade-offs and biases are important to understand because I know what it feels like to fight against time. Time-to-market matters in hardware. There’s always a push to launch the next version, meet factory deadlines, and lock BOMs before the quarter ends. But somewhere in that rush, the slow, unglamorous thinking gets squeezed out to meet timelines, and that’s usually where reliability dies first. FMEA is slow because it’s the only time when the team can ask “What if it breaks, and why?”

6.1. Time trade-off

FMEA adds days or weeks early in design, but saves months later in rework, tooling changes, and assembly firefighting. Every failure mode that gets skipped untested eventually shows up as a customer issue. And when that happens, it costs time and trust.

6.2. Mindset trade-off

I'm guilty of designing for success. Most of us are. FMEA requires us to also start designing against failure. That’s not natural for builders. We love imagining how things work, not how they break. Of course, the danger is swinging too far to the opposite side, obsessing over unlikely edge cases and delaying good ideas.

We should treat FMEA as a hygiene and sanity lens, and limit each cycle of FMEA to 15-20 high-impact risks. If an item's failure doesn't affect user trust, safety, or service cost, we can park it for later.
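
That rule of thumb can be sketched as a small triage function. The `Risk` fields, the cap of 20, and the choice to rank by RPN within a cycle are my assumptions for illustration; the only parts taken from the text are the 15-20 limit and the three impact criteria.

```python
from dataclasses import dataclass

MAX_RISKS_PER_CYCLE = 20  # upper end of the 15-20 range


@dataclass
class Risk:
    name: str
    rpn: int
    affects_user_trust: bool = False
    affects_safety: bool = False
    affects_service_cost: bool = False

    @property
    def high_impact(self) -> bool:
        """True if the failure touches user trust, safety, or service cost."""
        return self.affects_user_trust or self.affects_safety or self.affects_service_cost


def triage(risks, limit=MAX_RISKS_PER_CYCLE):
    """Split risks into (handle this cycle, park for later)."""
    # Assumption: within the high-impact group, higher RPN goes first.
    impactful = sorted((r for r in risks if r.high_impact), key=lambda r: r.rpn, reverse=True)
    this_cycle = impactful[:limit]
    parked = impactful[limit:] + [r for r in risks if not r.high_impact]
    return this_cycle, parked


# Hypothetical risks and scores, only to show the split.
risks = [
    Risk("tap leaks", 168, affects_service_cost=True),
    Risk("logo slightly off-centre", 12),
    Risk("tap breaks off", 48, affects_safety=True),
]
now, later = triage(risks, limit=1)
print([r.name for r in now], [r.name for r in later])
```

Parked items stay on the list rather than being deleted, so they can be picked up in a later cycle.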

6.3. Cutting biases pragmatically

I've seen that the concept of exhaustive FMEA attracts inputs from two kinds of people who have their own biases:

The over-believers: People from traditional manufacturing and large OEMs who treat FMEA as gospel. They love the process, and their bias is toward completeness, which doesn't necessarily fit the purpose or the different dynamics of a fast-moving start-up.
The under-believers: People from start-ups (like us) who love velocity, iteration, and proof over paperwork. This also comes from a decade or more of exposure to building software. The bias is towards intuition. It's only when things break that we realise the fix isn't a pushed software patch but a nightmare with 1000s of real machines in the field.

6.4. Middle ground

The goal is not to adopt or reject FMEA wholesale but to use it as a default thinking framework. If the analysis helps us understand failure and substantially reduces service revisits, we should do it. Are we learning fast enough to minimise similar failures on the next machine? If yes, we should do it. And we should consistently look for the minimum structured effort required to produce real insights.

I'll continue to write the next parts as I absorb this more deeply and get into running the process, the specifics of DFMEA, PFMEA, and UFMEA, and scoring. Maybe along the way I'll explore even faster and more efficient ways to do it. Maybe.
