6 min read

Root Cause Analysis: Leonard, Howard, and the 5 Whys

By Amanda Babb on Mar 10, 2021 9:50:40 AM

DIY or DIE!

For those of you watching from home, I have been on a home improvement journey for quite some time. Applying an Agile mindset to home improvement (or really anything I do) is one of my passions. Even at my most recent Women in Agile meeting, we discussed applying Agile concepts to daily life and feeding these back into building a great resumé. One of the principles of the Agile Manifesto reads: At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly. We all know this applies to Agile development practices, but it also applies to IT Service Management. Specifically, Incident and Problem Management. For me, it applies to my recent home improvement adventure. 

Strong fences make great neighbors

My neighbor and I spent the better part of a Saturday fixing our mutual fence. You see, I have two dogs: Leonard and Howard.

 IMG_4511IMG_4512

Both are rescues. Leonard is eight and was "free to a good home" while Howard is four and was adopted from my county's animal shelter. Both dogs have been with us since their puppyhood and, as any dog owner will say, they are the BEST. DOGS. EVER. Except when they're not. This was not the first time my neighbor and I had to work on the fence. Observe one of the troublemakers in his natural habitat. 

IMG_4507

This epic saga started in May of last year. I would diligently fix loose boards, prop items against the fence to "patch" holes, and monitor their outdoor activity while I was awake (awake being the key word here: 3 AM barking and fence-patching sessions are no fun). I supplied my neighbor with fence planks because, well, they're my dogs. We fixed the section above and let the others lapse until a series of shenanigans prompted my neighbor and me to spend our Saturday replacing three additional sections. My neighbor and I became united in making sure my two didn't escape. While my neighbor "doesn't care" that my dogs are in his yard, my (very good) boys take the opportunity to break out of his fence and wander the neighborhood. Howard usually comes back, but Leonard meanders through the streets, swims in pools or the lake, and generally causes mayhem until I can coax him into my car to come home.

IMG_4508

Not in my back yard...

Before this latest patch, I was determined to find the root cause. Before May of last year, this was not a problem. My puppers would frolic in the backyard and simply bark at other dogs in the neighborhood as they walked by. In addition to daily walks, I let them out several times per day so they could relieve themselves. While I was traveling, they were also well taken care of and monitored. What changed?

Root cause analysis is, simply put, problem solving. While it is widely used in the sciences and engineering, it is also a key element of IT Service Management Incident and Problem Management. When reacting to an incident, the team must restore functionality as quickly as possible. Upon resolution, root cause analysis helps us understand why the incident happened. It then prompts us to ask, "Is there an action I can take to prevent this from happening again?" Incident Management leads to Problem Management, and through root cause analysis, we can move from a reactive organization to a proactive organization.

Of the many techniques of root cause analysis, my favorite is the "Five Whys". It is the simplest technique: ask why until you've identified the root cause. Not like a petulant child, however. Asking the first why should be easy, then continuing to ask well-curated questions based on the previous answer helps you determine the root cause. I applied this to my situation: 

  • Why do I have to replace parts of the fence? 
    Because the dogs are chewing through the fence.
  • Why are the dogs chewing through the fence?
    Because they can access the backyard whenever they need.
  • Why can the dogs access the backyard whenever they need?
    Because we installed a dog door.

IMG_4509

HA! I found it. The root cause. And it didn't even take me all five whys. 

No root cause analysis technique stands alone. There exists a plethora of other techniques. Pareto charts rank causes by how often they occur, following the rule of thumb that roughly 80 percent of your problems are derived from 20 percent of the causes. An Ishikawa (fishbone) diagram looks at measurement, materials, methods, machines, management, and mother nature. Scatter plots let us look for correlation between two variables, which can hint at (but not prove) causation. Was the dog door the root cause? The existence of a dog door doesn't change the behavior of my boys. Having access to the backyard doesn't make them chew through the fence planks. Did we ask enough questions to actually identify the root cause? Did I also consider a Pareto analysis, an Ishikawa diagram, or a scatter plot to understand why I was constantly chasing my boys through the neighborhood?
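
To make the Pareto idea concrete, here is a minimal Python sketch. The incident log is entirely made up for illustration (it is not data from this post), and the names `pareto_table` and `incident_causes` are hypothetical. It ranks causes by frequency and tracks the cumulative share of incidents each one explains.

```python
from collections import Counter


def pareto_table(causes):
    """Return (cause, count, cumulative_percent) rows, most frequent cause first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


# Hypothetical incident log: every entry is invented for illustration only.
incident_causes = ["boredom"] * 8 + ["loose plank"] + ["squirrel"]

for cause, count, cumulative in pareto_table(incident_causes):
    print(f"{cause:12} {count:2d} {cumulative:5.1f}%")
```

In a real analysis, the causes at the top of the table (the "vital few") are the ones worth investigating first.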

I stopped at three whys: "I have a dog door."

What happens if I keep asking why? 

  • Why did we install a dog door? 
    Because Howard wasn't fully potty trained. 
  • Why wasn't Howard fully potty trained? 
    Because I didn't take the necessary time to train him. 

AHA! My Ishikawa diagram identified "management" as the issue. My Pareto identified the 80 percent as my time to train my puppers. My scatter plot showed the amount of time spent correlated to the amount of dog-induced shenanigans. I would add these to the post, but won't because...reasons. More importantly, I simply kept asking, "Why?" until I identified the root cause. 
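
The Five Whys chain above can also be written down as data, which makes it easy to see how the "root cause" changes depending on where you stop asking. The following is a minimal sketch, not a formal tool: the question and answer text is copied from this post, while the names `WHYS` and `root_cause` are hypothetical.

```python
# Question/answer pairs copied from the post, in the order they were asked.
WHYS = [
    ("Why do I have to replace parts of the fence?",
     "Because the dogs are chewing through the fence."),
    ("Why are the dogs chewing through the fence?",
     "Because they can access the backyard whenever they need."),
    ("Why can the dogs access the backyard whenever they need?",
     "Because we installed a dog door."),
    ("Why did we install a dog door?",
     "Because Howard wasn't fully potty trained."),
    ("Why wasn't Howard fully potty trained?",
     "Because I didn't take the necessary time to train him."),
]


def root_cause(whys, depth=None):
    """Return the answer reached after asking `depth` whys (all of them by default)."""
    asked = whys if depth is None else whys[:depth]
    if not asked:
        raise ValueError("ask at least one why")
    return asked[-1][1]


print(root_cause(WHYS, depth=3))  # where the analysis first stopped
print(root_cause(WHYS))           # where it ended after all five whys
```

Stopping at three whys points at the dog door; asking all five points at training time, which is the cause that can actually be acted on.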

Actions speak louder than words

Now that I have a root cause, what can I do to prevent this issue from recurring? When looking at Incident and Problem Management, Atlassian products such as Opsgenie and Statuspage can ingest, aggregate, correlate, and trigger the creation of Jira Service Management issues. With Confluence, we can create specific root cause analysis templates to be shared with our customers and stakeholders. However, it's up to our techniques and processes to help us determine the actions we need to take going forward.

For me and my puppers, it's simple. 

  1. Take at least 30 minutes out of my day for dedicated doggie exercise
  2. Reinforce good behavior while in the yard
  3. Lock the dog door overnight (no more 3AM "let me sing you the song of my people" moments)
  4. Finish replacing the aged planks on the fence

By taking these actions based on my root cause analysis, I should have this solved quickly with redundancies built in. My puppers will be safer and happier, I will have a beautiful new feature of my home, and the three of us will have less stress day-to-day. Using root cause analysis techniques, an Agile mindset, and lessons from IT Problem Management, I can easily solve this problem and any additional ones around my home.

BRB, gotta run and get some more fence planks.

IMG_4510

5 min read

Tips for Performing a Successful Root Cause Analysis

By Mary Roper on Mar 5, 2021 10:55:01 AM

Root Cause Analysis: The Under-appreciated Hero

When implementing an IT Service Management (ITSM) system, I always look forward to spending time on root cause analysis (RCA). Of course, Incident and Problem Management play the central role in ITSM design: it's crucial to give your teams, customers, and systems intuitive ways to communicate when something has gone wrong. However, it is equally important that organizations spend time identifying the key driver of these problems by performing an RCA to prevent them from recurring. This is because, at the end of the day, incidents and problems cost your organization money, and a good RCA can help save it. It's this viewpoint that has led me to dub RCA the under-appreciated hero of ITSM, and in this post I will share with you the aspects of a successful RCA that can help vanquish problems once and for all.

It's important to distinguish between Problem Management and Incident Management. In broad strokes, the goal of Problem Management is to get to the root cause: it aims to increase the mean time between failures by determining the root cause of one or more incidents and addressing it with an appropriate change to prevent recurrence. In this sense, it is a proactive approach. Incident Management, on the other hand, aims to reduce the mean time to recovery by responding and resolving quickly; its approach is reactive.
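
The two metrics can be computed from a simple incident log. The sketch below is illustrative only: the timestamps are hypothetical, the function names are made up for this example, and it assumes one common definition of mean time between failures (average uptime from the end of one incident to the start of the next). Some teams measure from start to start instead.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(incidents):
    """Average duration of the incidents (resolved time minus start time)."""
    durations = [resolved - started for started, resolved in incidents]
    return sum(durations, timedelta()) / len(durations)


def mean_time_between_failures(incidents):
    """Average uptime between one incident's resolution and the next one's start."""
    ordered = sorted(incidents)
    gaps = [
        next_start - prev_resolved
        for (_, prev_resolved), (next_start, _) in zip(ordered, ordered[1:])
    ]
    if not gaps:
        raise ValueError("need at least two incidents to measure a gap")
    return sum(gaps, timedelta()) / len(gaps)


# Hypothetical incident log: (started, resolved) pairs invented for illustration.
incidents = [
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 10, 0)),
    (datetime(2021, 3, 3, 10, 0), datetime(2021, 3, 3, 12, 0)),
    (datetime(2021, 3, 6, 12, 0), datetime(2021, 3, 6, 15, 0)),
]

print("MTTR:", mean_time_to_recovery(incidents))
print("MTBF:", mean_time_between_failures(incidents))
```

Incident Management works to shrink the first number; Problem Management works to grow the second.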

What is Root Cause Analysis?

The core function of root cause analysis is to uncover the core reason why a problem occurred. While there are many different tools and approaches to perform an RCA, I've consolidated the key steps into the diagram below: 

Root Cause Analysis Blog Post

  • Define the problem: First, make sure you and your teams align on "What happened?" and are speaking to the same problem.
  • Collect Data: Then, the focus needs to be "How did this happen?" and gathering data around the problem, whether customer testimony or incident reports.
  • Identify Causal Factors: Causal factors also help to answer "How did this happen?" In this step, teams should be guided toward identifying fixable causes.
  • Identify the Root Cause: Next, teams should leverage one of the techniques of the RCA process, such as the "Five Whys," Fishbone Diagram, or Fault-Tree Analysis, to drive to the root cause of all the causal factors. 
  • Recommend and Test the Solution: After the root cause has been identified, teams should work to develop a solution that gets recommended to the Executive team for approval before testing can begin. Once approved, the solution should enter a testing phase, where it can be rolled back if not successful. 
  • Implement and Monitor: Once the solution is implemented, teams should continue to monitor it in the production environment to ensure that it is working as expected. This active analysis step is why RCA is depicted as a cycle; if the solution did not resolve the problem, it could be that the cause the team identified was a causal factor rather than the root cause, and the team needs to begin the RCA process again.

Why Does It Matter?

I've worked with teams who have a well-defined RCA process and others who are just beginning. I reference this diagram when we focus on RCA because it helps to illustrate how simple a process RCA can be. There aren't rigid guidelines or rules to follow; organizations can adopt their own RCA policies. What many don't realize, especially those who have yet to adopt RCA as a business process, is that it has a big pay-off: cost savings.

Root cause analysis can be a cost saving tool for organizations for a couple of reasons. First, identifying and acting on problems early saves money. The longer a problem goes on the more money it costs the organization, and a properly deployed RCA process is built to help organizations become more proactive rather than reactive. Second, the main goal of the RCA process is to prevent incidents from cropping up again. If the incident does not reoccur, then there won't be downtime or lost production, saving money in the long run.  

How Can I Help My Organization Embrace RCA?

When working with organizations to implement an RCA process, there are several aspects that I help coach my clients on which can help the organization embrace RCA. They are:

  1. Talk about what went well... and what could have gone better
    1. When the team is starting the RCA process, guide them to start by discussing what happened and framing the problem. Then, go one step further and document what went well. This will provide you with data and help to explain what is not the issue or what not to blame. It's equally important to talk about what could have gone better, as this will likely begin the discussion and documentation of your causal factors.
  2. Make it work for you
    1. In some organizations, "Root Cause Analysis" can be viewed as too formal and intimidating. I've come across some resistance to RCA meetings due to their structure or even the invitee list. For these reasons, it's important to make sure you're adopting an RCA structure that feels natural for your organization. This could mean:
      1. Being mindful of the attendees, especially if the invitees include senior management and above. Ensure you include the right people in the room at the right time. Your front line team has the most firsthand knowledge of the systems or processes, and you will want them to feel comfortable participating candidly in any discovery meetings. 
      2. Having a neutral party leading the meetings. The leader shouldn't have anything to gain by the results of the RCA process and should be able to maintain a "blame free" atmosphere.
      3. Reframing RCA as something more approachable, such as a "Lessons Learned meeting," where the RCA process is still followed, but in a less formal way. For example, feedback and ideas can be gathered via sticky notes and shared on a board so that they stay anonymous.
  3. Root causes can only solve one problem
    1. Remember that the main goal of RCA is to avoid future incidents. Teams should not be applying a previous root cause to a current or future problem; if that is the case, it indicates that rather than identifying the root cause, the team actually identified a causal factor. In these instances, I've coached teams to go back and take their RCA process one step deeper, for example by asking another "Why" question if the "Five Whys" technique is used.
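
The third tip can be checked mechanically if past RCAs are recorded somewhere. The following is a minimal sketch under stated assumptions: the record layout, the sample problems, and the function name `repeated_root_causes` are all hypothetical and not taken from any specific tool. It flags any "root cause" that has been recorded for more than one problem, which suggests the team stopped at a causal factor.

```python
from collections import Counter


def repeated_root_causes(rca_records):
    """Return root causes recorded for more than one problem, sorted by name."""
    counts = Counter(record["root_cause"] for record in rca_records)
    return sorted(cause for cause, seen in counts.items() if seen > 1)


# Hypothetical RCA records: invented for illustration only.
rca_records = [
    {"problem": "checkout outage", "root_cause": "expired certificate"},
    {"problem": "login failures", "root_cause": "expired certificate"},
    {"problem": "slow reports", "root_cause": "missing database index"},
]

# Each flagged cause is a candidate for asking one more "Why".
print(repeated_root_causes(rca_records))
```

Here the repeated entry would prompt a deeper question, such as why certificates are expiring without anyone noticing.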

Ultimately, where incidents and problems cost your organization money, RCA saves it. It is for this reason that I think of RCA as an under-appreciated hero of ITSM. While the biggest barrier to accomplishing RCA can be time, putting in the time upfront to complete the RCA process will prevent repeat incidents from cropping up, saving your company time and resources in the long run. By implementing a few of these tips, I hope you come to appreciate RCA as I have. If you have any questions, let us know; we'd love to help.

