7 min read

Root Cause Analysis: Leonard, Howard, and the 5 Whys

By Amanda Babb on Mar 10, 2021 9:50:40 AM

DIY or DIE!

For those of you watching from home, I have been on a home improvement journey for quite some time. Applying an Agile mindset to home improvement (or really anything I do) is one of my passions. Even at my most recent Women in Agile meeting, we discussed applying Agile concepts to daily life and feeding these back into building a great résumé. One of the principles of the Agile Manifesto reads: At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly. We all know this applies to Agile development practices, but it also applies to IT Service Management, specifically Incident and Problem Management. For me, it applies to my recent home improvement adventure.

Strong fences make great neighbors

My neighbor and I spent the better part of a Saturday fixing our mutual fence. You see, I have two dogs: Leonard and Howard.

Both are rescues. Leonard is eight and was "free to a good home" while Howard is four and was adopted from my county's animal shelter. Both dogs have been with us since their puppyhood and, as any dog owner will say, they are the BEST. DOGS. EVER. Except when they're not. This was not the first time my neighbor and I had to work on the fence. Observe one of the troublemakers in his natural habitat. 

This epic saga started in May of last year. I would diligently fix loose boards, prop items against the fence to "patch" holes, and monitor their outdoor activity while I was awake (awake being the key word here: 3am barking and fence-patching sessions are no fun). I supplied my neighbor with fence planks because, well, they're my dogs. We fixed the section above and let the others lapse until a series of shenanigans prompted my neighbor and me to spend our Saturday replacing three additional sections. My neighbor and I became united in making sure my two didn't escape. While my neighbor "doesn't care" that my dogs are in his yard, my (very good) boys take the opportunity to break out of his fence and wander the neighborhood. Howard usually comes back, but Leonard meanders through the streets, swims in pools or the lake, and generally causes mayhem until I can coax him into my car to come home.

Not in my back yard...

Before this latest patch, I was determined to find the root cause. Before May of last year, this was not a problem. My puppers would frolic in the backyard and simply bark at other dogs in the neighborhood as they walked by. I let them out several times per day, in addition to daily walks, to make sure they were relieved. While I was traveling, they were also well taken care of and monitored. What changed?

Root cause analysis is, simply put, problem solving. While it is widely used in sciences and engineering, it is also a key element of IT Service Management Incident and Problem Management. When reacting to an incident, the team must restore functionality as quickly as possible. Upon resolution, root cause analysis helps us understand why the incident happened. It then prompts us to ask, "Is there an action I can take to prevent this from happening again?" Incident Management leads to Problem Management, and through root cause analysis, we can move from a reactive organization to a proactive organization.

Of the many techniques of root cause analysis, my favorite is the "Five Whys". It is the simplest technique: ask why until you've identified the root cause. Not like a petulant child, however. The first why should be easy to ask; after that, asking well-curated questions based on the previous answer helps you determine the root cause. I applied this to my situation:

  • Why do I have to replace parts of the fence? 
    Because the dogs are chewing through the fence.
  • Why are the dogs chewing through the fence?
    Because they can access the backyard whenever they need.
  • Why can the dogs access the backyard whenever they need?
    Because we installed a dog door.

HA! I found it. The root cause. And it didn't even take me all five whys. 

No root cause analysis technique stands alone, and there is a plethora of others. Pareto charts rank causes by how much they contribute, on the premise that roughly 80 percent of your problems come from 20 percent of the causes. An Ishikawa (fishbone) diagram looks at measurement, materials, methods, machines, management, and mother nature. Scatter plots let us look for correlation between two variables, which can hint at causation but does not prove it. Was the dog door the root cause? The existence of a dog door doesn't change the behavior of my boys. Having access to the backyard doesn't make them chew through the fence planks. Did we ask enough questions to actually identify the root cause? Did I also consider a Pareto analysis, an Ishikawa diagram, or a scatter graph to understand why I was constantly chasing my boys through the neighborhood?
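If you have never built a Pareto analysis, it is mostly sorting and a running total. Here is a minimal Python sketch of the idea. The function names `pareto` and `vital_few` are hypothetical, and the tallies are placeholder numbers invented for illustration, not measurements from my fence saga.

```python
def pareto(counts):
    """Rank causes by count and attach each one's cumulative share of the total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("need at least one recorded incident")
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for cause, count in ranked:
        running += count
        rows.append((cause, count, running / total))
    return rows


def vital_few(counts, threshold=0.8):
    """Return the top-ranked causes needed to reach `threshold` of all incidents."""
    chosen = []
    for cause, _count, cumulative in pareto(counts):
        chosen.append(cause)
        if cumulative >= threshold:
            break
    return chosen


# Hypothetical tallies, invented purely for illustration.
EXAMPLE_COUNTS = {
    "cause A": 12,
    "cause B": 5,
    "cause C": 2,
    "cause D": 1,
}

if __name__ == "__main__":
    for cause, count, cumulative in pareto(EXAMPLE_COUNTS):
        print(f"{cause}: {count} incidents, {cumulative:.0%} cumulative")
    print("Focus on:", ", ".join(vital_few(EXAMPLE_COUNTS)))
```

The 80 percent threshold is a rule of thumb rather than a law, so treat the output as a prompt for where to look first.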

I stopped at three whys: "I have a dog door."

What happens if I keep asking why? 

  • Why did we install a dog door? 
    Because Howard wasn't fully potty trained. 
  • Why wasn't Howard fully potty trained? 
    Because I didn't take the necessary time to train him. 

AHA! My Ishikawa diagram identified "management" as the issue. My Pareto identified the 80 percent as my time to train my puppers. My scatter plot showed the amount of time spent correlated to the amount of dog-induced shenanigans. I would add these to the post, but won't because...reasons. More importantly, I simply kept asking, "Why?" until I identified the root cause. 
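If you want to keep your whys somewhere more durable than your head, the chain is easy to capture as data. The following is a minimal Python sketch under a few assumptions: the `Why` class and the `candidate_root_cause` and `render` helpers are hypothetical names chosen for illustration, and the questions and answers are copied from the lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Why:
    question: str
    answer: str


# The chain from this post, in the order the questions were asked.
FENCE_CHAIN = [
    Why("Why do I have to replace parts of the fence?",
        "Because the dogs are chewing through the fence."),
    Why("Why are the dogs chewing through the fence?",
        "Because they can access the backyard whenever they need."),
    Why("Why can the dogs access the backyard whenever they need?",
        "Because we installed a dog door."),
    Why("Why did we install a dog door?",
        "Because Howard wasn't fully potty trained."),
    Why("Why wasn't Howard fully potty trained?",
        "Because I didn't take the necessary time to train him."),
]


def candidate_root_cause(chain, depth=None):
    """Return the answer reached after asking `depth` whys (all of them by default)."""
    if not chain:
        raise ValueError("ask at least one why")
    if depth is not None and depth < 1:
        raise ValueError("depth must be at least 1")
    asked = chain if depth is None else chain[:depth]
    return asked[-1].answer


def render(chain):
    """Format the chain as a numbered list, e.g. for pasting into an RCA page."""
    lines = []
    for number, why in enumerate(chain, start=1):
        lines.append(f"{number}. {why.question}")
        lines.append(f"   {why.answer}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(FENCE_CHAIN))
    print("Stopping at three whys:", candidate_root_cause(FENCE_CHAIN, depth=3))
    print("Asking all five:", candidate_root_cause(FENCE_CHAIN))
```

Passing `depth=3` reproduces the premature dog-door answer, while leaving it out returns the training answer, which is the whole argument for not stopping early.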

Actions speak louder than words

Now that I have a root cause, what can I do to prevent this issue from recurring? When looking at Incident and Problem Management, Atlassian products such as Opsgenie and Statuspage can ingest, aggregate, correlate, and trigger the creation of Jira Service Management issues. With Confluence, we can create specific root cause analysis templates to be shared with our customers and stakeholders. However, it's up to our techniques and processes to help us determine the actions we need to take going forward.

For me and my puppers, it's simple. 

  1. Take at least 30 minutes out of my day for dedicated doggie exercise
  2. Reinforce good behavior while in the yard
  3. Lock the dog door overnight (no more 3AM "let me sing you the song of my people" moments)
  4. Finish replacing the aged planks on the fence

By taking these actions based on my root cause analysis, I should have this solved quickly with redundancies built in. My puppers will be safer and happier, I will have a beautiful new feature of my home, and the three of us will have less stress day-to-day. Using root cause analysis techniques and an Agile mindset, and drawing from IT Problem Management, I can easily solve this problem and any additional ones around my home.

BRB, gotta run and get some more fence planks.

Topics: blog confluence plan problem statuspage incident-management itsm women-in-technology agile opsgenie jira-service-management health-check
4 min read

Leveraging Statuspage To Support Remote Teams

By Larry Brock on May 15, 2020 9:15:00 AM

As many writers from a variety of perspectives have observed, we are truly living in interesting times. Before we get into how Statuspage helps remote workers, I would like to express my sincerest wish that you and your loved ones are safe and remain so through the COVID-19 pandemic.

Due to the state of business in which we now find ourselves, many companies have transitioned their operations and workforce to a more distributed model. This has exposed or amplified many procedural failures, demonstrating how the severity of some issues can significantly impact business success. 

I have experienced a few of these workforce transitions and, almost without exception, the underlying failure already existed. Because people worked in close proximity to each other, they were able to mask the issue, and it was only exposed when the proximity changed. This change can occur for a multitude of reasons: the business may be experiencing phenomenal growth or, as in our current state of the world, external factors may be the cause. Regardless of the reason, change is inevitable and processes must adjust, adapt, and improve.

There's currently a plethora of information flooding the web on better ways to work, and while a lot of it is useful, what I haven't seen is content focused on how to make better use of tools you already have to solve some of these new challenges related to external forces.

Many organizations today use the fantastic Atlassian tool Statuspage to communicate the status of their services to their customers, users, and possibly any interested party on the Internet. What these organizations may not realize is that Statuspage is also a great way to communicate important information regarding system availability to their internal staff.

So, how can Statuspage be used to do this? Well, I'm glad you asked! Consider these situations:

  • A staff member needs an item from their desk and wants to know if the office is open and accessible.
  • A staff member expects the VPN gateway that they use to access internal systems to be fully operational, 24x7.
  • The staff of a particular department needs to receive timely updates on developments that may affect them as they attempt to complete a particular task.

In each of these situations, the need for information is similar in nature to that of a customer: both are trying to stay updated and informed.

If you're already using Statuspage to communicate to your customers, then you know and appreciate the power of letting customers determine what information they receive and the channel through which they receive it. Now, let's revisit the above situations, but with Statuspage:

  • A staff member who subscribes to the company's Internal Operations Status page will have already received an SMS message, a phone call, and/or an email from Statuspage about the closure of some offices when the incident was created under the Physical Locations service. They can check this message for more details or better yet, visit the Statuspage to see up-to-the-minute information regarding building access.
  • A staff member who is having trouble with VPN can check Statuspage to see if there are any notices about VPN, or they can subscribe to the incident in-progress to get updates and know when they can safely resume their connection.
  • The team that builds out your data center infrastructure or computer cluster is waiting on delayed equipment to arrive before converging on the data center from their various shelter-in-place locations. With Statuspage, you can easily broadcast the update about the equipment arrival to subscribers who follow the incident related to the delay.

These are just a few examples of how you can utilize Statuspage to arm your staff with valuable information using a tool that you already have available. Not using Statuspage? Look into this powerful communication tool, which eliminates the guesswork of who to contact and how, not to mention that it allows teams to focus more on their key functions of serving the organization. 

If you would like to learn more about how to leverage Statuspage, check out our webinar. We also have some great resources available on how different tools can help your remote teams, such as Workato and Jira.

Topics: atlassian statuspage work-from-home

Praecipio Consulting is an Atlassian Platinum Partner

This means that we have the most experience working with Atlassian tools and have insight into new products, features, and beta testing. Through our profound knowledge of Atlassian environments and their intricacies, we can guide your organization as you navigate these important changes.
