5 Whys
We encounter issues in our work environments. Most of the time we just find a solution and move on. But some issues keep recurring and call for a deeper understanding, and some have a large impact on the business or on you. Such issues need to be handled differently: we need to find their root cause. It is always better to resolve the root cause than to treat the symptoms.
I often use the 5 whys method to perform root cause analysis (RCA). It is simple and effective.
Let us consider a simple example.
Why was I driving rashly?
Cause: I was late for a meeting.
Why was I late for the meeting?
Cause: I woke up late.
Why did I wake up late?
Cause: I slept late yesterday.
Why did I sleep late despite needing to be on time for the meeting?
Cause: There was a support call last night, and it took a while to find a solution.
Now you know how a support call led to reckless driving. The solution would have been to inform the meeting organizer that you could not attend the meeting. No need for heroism.
Now consider a practical example.
Problem: Production server stopped responding for some inventory queries.
Why did the production server stop responding?
Cause: The database connection was timing out.
Evidence: Attached relevant entries from the log file...
Why was the database connection timing out?
Cause: There was a deadlock on a table.
Evidence: Attached entries from DB logs.
Why was there a deadlock?
Cause: A newly added feature caused the deadlock.
Evidence: Log file entries attached.
Why did the new feature cause a deadlock?
Cause: Transaction handling in the new code was incorrect.
Evidence: Code changes for the fix.
Why did testing or code review not catch this issue?
Cause: The code review was rushed so that the important feature could be included.
Evidence: Internal communication.
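Since the chain above pairs every why with a cause and its evidence, it can be kept as a structured record when documenting the RCA. The sketch below is a minimal illustration, assuming a hypothetical `WhyStep` record and hypothetical `root_cause` and `render` helpers; it is not a standard RCA format, just one way to keep the steps and their evidence together.

```python
from dataclasses import dataclass


@dataclass
class WhyStep:
    """One step of a 5 whys chain (hypothetical record format)."""
    question: str
    cause: str
    evidence: str


def root_cause(chain: list[WhyStep]) -> str:
    """In a linear chain, the cause found by the last why is the root cause."""
    if not chain:
        raise ValueError("an RCA needs at least one why")
    return chain[-1].cause


def render(problem: str, chain: list[WhyStep]) -> str:
    """Format the chain as a plain-text RCA report."""
    lines = [f"Problem: {problem}"]
    for number, step in enumerate(chain, start=1):
        lines.append(f"{number}. {step.question}")
        lines.append(f"   Cause: {step.cause}")
        lines.append(f"   Evidence: {step.evidence}")
    return "\n".join(lines)


# The production example, recorded step by step.
chain = [
    WhyStep("Why did the production server stop responding?",
            "Database connection was timing out.", "Application log entries"),
    WhyStep("Why was the database connection timing out?",
            "There was a deadlock on a table.", "Database log entries"),
    WhyStep("Why was there a deadlock?",
            "A newly added feature caused it.", "Log file entries"),
    WhyStep("Why did the new feature cause a deadlock?",
            "Transaction handling was incorrect.", "Code changes for the fix"),
    WhyStep("Why did testing or code review not catch this issue?",
            "Code review was rushed to include the feature.", "Internal communication"),
]

if __name__ == "__main__":
    print(render("Production server stopped responding for some inventory queries.", chain))
    print("Root cause:", root_cause(chain))
```

Keeping the evidence next to each cause makes it easy for a reviewer to challenge a single step without redoing the whole analysis.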
Notice that this RCA was done after the issue was fixed. Often RCA happens while we are finding the solution, but we do it intuitively and never document it as RCA. It is recommended to document the RCA for all major breakdowns. Doing so sometimes brings out issues other than the one that was fixed, or reveals a better way to resolve the issue. During a production issue, the priority is to find the quickest solution. After the issue is fixed, take the time to identify the root cause and fix it properly.
Some points to note when doing RCA with 5 whys:
In some cases there will be more than one root cause. You will then see two answers to a single why, and the branches will diverge as you explore deeper.
In other cases the branches will converge into a single cause. Do not be discouraged when you find more than one cause; explore each of them.
If you find more than two root causes, check the analysis, as there may be mistakes in it. A system that really had that many root causes for one problem would be highly unstable.
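When a why has more than one answer, the analysis becomes a cause graph rather than a single chain. The sketch below is a minimal illustration under stated assumptions: `root_causes` and `needs_review` are hypothetical helpers, the graph maps each effect to the causes found by asking why about it, and the connection-pool branch is invented purely to show two branches converging. A cause with no deeper cause is treated as a root cause, and a shared cause is counted only once.

```python
def root_causes(graph: dict[str, list[str]], problem: str) -> list[str]:
    """Return the causes reachable from `problem` that have no deeper cause.

    Each node is visited once, so branches that converge on the same
    cause report that cause a single time.
    """
    roots: list[str] = []
    seen: set[str] = set()
    stack = [problem]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already explored through another branch
        seen.add(node)
        causes = graph.get(node, [])
        if not causes and node != problem:
            roots.append(node)  # no further "why" answer: a root cause
        stack.extend(causes)
    return roots


def needs_review(roots: list[str]) -> bool:
    """Rule of thumb from the text: more than two root causes means recheck the analysis."""
    return len(roots) > 2


# Hypothetical cause graph: the deadlock branch follows the production example,
# while the connection-pool branch is invented to show convergence.
graph = {
    "Server stopped responding": ["Database connection timing out"],
    "Database connection timing out": ["Deadlock on a table", "Connection pool exhausted"],
    "Deadlock on a table": ["Incorrect transaction handling"],
    "Connection pool exhausted": ["Incorrect transaction handling"],
    "Incorrect transaction handling": ["Rushed code review"],
}

if __name__ == "__main__":
    roots = root_causes(graph, "Server stopped responding")
    print(roots)                # ['Rushed code review']
    print(needs_review(roots))  # False
```

Here the two branches diverge at the timeout and converge again at the transaction handling, so only one root cause remains.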