7 min read

Root Cause Analysis: Leonard, Howard, and the 5 Whys

By Amanda Babb on Mar 10, 2021 9:50:40 AM

Blogpost-display-image_Root Cause Analysis- Leonard, Howard, and the Five WhysDIY or DIE!

For those of you watching from home, I have been on a home improvement journey for quite some time. Applying an Agile mindset to home improvement (or really anything I do) is one of my passions. Even at my most recent Women in Agile meeting, we discussed applying Agile concepts to daily life and feeding these back into building a great resumé. One of the principles of the Agile Manifesto reads: At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly. We all know this applies to Agile development practices, but it also applies to IT Service Management. Specifically, Incident and Problem Management. For me, it applies to my recent home improvement adventure. 

Strong fences make great neighbors

My neighbor and I spent the better part of a Saturday fixing our mutual fence. You see, I have two dogs: Leonard and Howard.

 IMG_4511IMG_4512

Both are rescues. Leonard is eight and was "free to a good home" while Howard is four and was adopted from my county's animal shelter. Both dogs have been with us since their puppyhood and, as any dog owner will say, they are the BEST. DOGS. EVER. Except when they're not. This was not the first time my neighbor and I had to work on the fence. Observe one of the troublemakers in his natural habitat. 

IMG_4507

This epic saga started in May of last year. I would diligently fix loose boards, prop items against the fence to "patch" holes, and monitor their outdoor activity while I was awake (awake being the key word here: 3am barking and fence-patching sessions are no fun). I supplied my neighbor with fence planks because, well, they're my dogs. We fixed the section above and let the others lapse until a series of shenanigans prompted my neighbor and I to spend our Saturday replacing three additional sections. My neighbor and I became united in making sure my two didn't escape. While my neighbor "doesn't care" that my dogs are in his yard, my (very good) boys take the opportunity to break out of his fence and wander the neighborhood. Howard usually comes back, but Leonard meanders through the streets, swims in pools or the lake, and generally causes mayhem until I can coax him in my car to come home. 

IMG_4508

Not in my back yard...

Before this latest patch, I was determined to find the root cause. Previous to May of last year, this was not a problem. My puppers would frolic in the backyard and simply bark at other dogs in the neighborhood as they walked by. I made sure they were let out several times per day to make sure they were relieved in addition to daily walks. While I was traveling, they were also well-taken care of and monitored. What changed? 

Root cause analysis is, simply put, problem solving. While it is widely used in sciences and engineering, it is also a key element of IT Service Management Incident and Problem Management. When reacting to an incident, the team must restore functionality as quickly as possible. Upon resolution, root cause analysis helps us understand why. It then prompts us to ask, "Is there an action I can take to prevent this from happening again?" Incident Management leads to Problem management and through root cause analysis, we can move from a reactive organization to a proactive organization. 

Of the many techniques of root cause analysis, my favorite is the "Five Whys". It is the simplest technique: ask why until you've identified the root cause. Not like a petulant child, however. Asking the first why should be easy, then continuing to ask well-curated questions based on the previous answer helps you determine the root cause. I applied this to my situation: 

  • Why do I have to replace parts of the fence? 
    Because the dogs are chewing through the fence.
  • Why are the dogs chewing through the fence?
    Because they can access the backyard whenever they need.
  • Why can the dogs access the backyard whenever they need?
    Because we installed a dog door.

IMG_4509

HA! I found it. The root cause. And it didn't even take me all five whys. 

Any root cause analysis technique does not stand alone. There exists a plethora of other techniques. Pareto charts determine that 80 percent of your problems are derived from 20 percent of the causes. An Ishikawa (fishbone) diagram looks at measurement, materials, methods, machines, management, and mother nature. Scatter plots let us look at correlation and causation. Was the dog door the root cause? The existence of a dog door doesn't change the behavior of my boys. Having access to the backyard doesn't make them chew through the fence planks. Did we ask enough questions to actually identify the root cause? Did I also consider a Pareto analysis, an Ishikawa diagram, or a scatter graph to understand why I was constantly chasing my boys through the neighborhood? 

I stopped at three whys: "I have a dog door."

What happens if I keep asking why? 

  • Why did we install a dog door? 
    Because Howard wasn't fully potty trained. 
  • Why wasn't Howard fully potty trained? 
    Because I didn't take the necessary time to train him. 

AHA! My Ishikawa diagram identified "management" as the issue. My Pareto identified the 80 percent as my time to train my puppers. My scatter plot showed the amount of time spent correlated to the amount of dog-induced shenanigans. I would add these to the post, but won't because...reasons. More importantly, I simply kept asking, "Why?" until I identified the root cause. 

Actions speak louder than words

Now that I have a root cause, what is it that I can do to prevent this issue from recurring? When looking at Incident and Problem Management, Atlassian products such Opsgenie and Statuspage can ingest, aggregate, correlate, and trigger the creation of Jira Service Management issues. With Confluence, we can create specific root cause analysis templates to be shared with our customers and stakeholders. However, it's up to our techniques and processes to help us determine the actions we need to take going forward. 

For me and my puppers, it's simple. 

  1. Take at least 30 minutes out of my day for dedicated doggie exercise
  2. Reinforce good behavior while in the yard
  3. Lock the dog door overnight (no more 3AM "let me sing you the song of my people" moments)
  4. Finish replacing the aged planks on the fence

By taking these actions based on my root cause analysis, I should have this solved quickly with redundancies built in. My puppers will be safer and happier, I will have a beautiful new feature of my home, and the three of us will have less stress day-to-day. Using root cause analysis techniques, and Agile mindset, and drawing from IT Problem Management, I can easily solve this problem and any additional ones around my home.

BRB, gotta run and get some more fence planks.

IMG_4510

Topics: blog confluence plan problem statuspage incident-management itsm women-in-technology agile opsgenie jira-service-management health-check
5 min read

Tips for Performing a Successful Root Cause Analysis

By Mary Roper on Mar 5, 2021 10:55:01 AM

Blogpost-display-image_Tips for Performing a Successful Root Cause AnalysisRoot Cause Analysis: The Under-appreciated Hero

When implementing an IT Service Management (ITSM) system, I always look forward to spending time on root cause analysis (RCA). Of course Incident and Problem Management play the central role in ITSM design- it's crucial to give your teams, customers, and systems intuitive ways to communicate when something has gone wrong. However, it is equally important that organizations spend time identifying the key driver of these problems by performing an RCA to prevent them from reoccurring. This is because, at the end of the day, incidents and problems cost your organization money, and a good RCA can help save it. It's this viewpoint that has led me to dub RCA the under-appreciated hero of ITSM and in this post I will share with you the aspects of a successful RCA that can help vanquish problems once and for all. 

It's important to distinguish between Problem Management and Incident Management. In broad strokes: the goal of Problem Management is to get to root cause, and we can understand its goal to be increasing the meantime between failures by determining root cause of one or more incidents thereby addressing with appropriate change to prevent recurrence of the incident; in this sense it's a proactive approach. On the other hand, Incident Management's goal is to reduce the meantime to recovery by responding and resolving fast; its approach is reactive.

What is Root Cause Analysis?

The core function of root cause analysis is to uncover the core reason why a problem occurred. While there are many different tools and approaches to perform an RCA, I've consolidated the key steps into the diagram below: 

Root Cause Analysis Blog Post

  • Define the problem: First, make sure you and your teams align on "What happened?" and are speaking to the same problem.
  • Collect Data: Then, the focus needs to be "How did this happen?" and gathering data around the problem, whether customer testimony or incident reports.
  • Identify Casual Factors: Casual factors also help to answer "How did this happen," and in this step, teams should be guided to identifying fixable causes.
  • Identify the Root Cause: Next, teams should leverage one of the techniques of the RCA process, such as the "Five Whys," Fishbone Diagram, or Fault-Tree Analysis, to drive to the root cause of all the causal factors. 
  • Recommend and Test the Solution: After the root cause has been identified, teams should work to develop a solution that gets recommended to the Executive team for approval before testing can begin. Once approved, the solution should enter a testing phase, where it can be rolled back if not successful. 
  • Implement and Monitor: Once the solution is implemented, teams should continue to monitor it in the production environment to ensure that it is working as expected. This active analysis step is why RCA is depicted as a cycle; if the solution did not resolve the problem, it could be that the problem was a casual factor and the team needs to begin the RCA process again. 

Why Does It Matter?

I've worked with teams who have a well-defined RCA process and others who are just beginning. I reference this diagram when we focus on RCA because it helps to illustrate how simple of a process RCA can be. There aren't rigid guidelines or rules to follow; organizations can adopt their own RCA policies. What many don't realize, especially those who have yet to adopt RCA as a business process, is that it has a big pay-off: cost savings.

Root cause analysis can be a cost saving tool for organizations for a couple of reasons. First, identifying and acting on problems early saves money. The longer a problem goes on the more money it costs the organization, and a properly deployed RCA process is built to help organizations become more proactive rather than reactive. Second, the main goal of the RCA process is to prevent incidents from cropping up again. If the incident does not reoccur, then there won't be downtime or lost production, saving money in the long run.  

How Can I Help My Organization Embrace RCA?

When working with organizations to implement an RCA process, there are several aspects that I help coach my clients on which can help the organization embrace RCA. They are:

  1. Talk about what went well.....and what could have gone better
    1. When the team is starting the RCA process, guide them to start by discussing what happened and framing the problem. Then, go one step further and document what went well. This will provide you data and help to explain what is not the issue or what not to blame. It's equally important to talk about what could have gone better, as this will likely begin the discussion and documentation of your causal factors. 
  2. Make it work for you
    1. In some organizations, "Root Cause Analysis" can be viewed as too formal and intimidating. I've come across some resistance to them due to their structure or even the invitee list. For these reasons, it's important to make sure you're adopting a RCA structure that feels natural for your organization. This could mean:
      1. Being mindful of the attendees, especially if the invitees include senior management and above. Ensure you include the right people in the room at the right time. Your front line team has the most firsthand knowledge of the systems or processes, and you will want them to feel comfortable participating candidly in any discovery meetings. 
      2. Having a neutral party leading the meetings. The leader shouldn't have anything to gain by the results of the RCA process and should be able to maintain a "blame free" atmosphere.
      3. Reframing RCA as something more approachable, such as a "Lessons Learned meeting,"  where the RCA process is still followed, but in a less formal way. Feedback and idea can be gathered via sticky notes and shared on a board so that it is anonymous for example. 
  3. Root causes can only solve one problem
    1. Remember that the main goal of RCA is to avoid future incidents. Teams should not be applying a previous root cause to a current or future problem- if that is the case, then it indicates that rather than identifying the root cause, the team actually identified a casual factor. In these instances, I've coached teams to go back and take their RCA process one step deeper, for example asking another "Why" question if the "Five Whys" is used. 

The goal of Problem Management is to get to root cause. Incident Mgmt goal: reduce the meantime to recovery (by responding and resolving fast); reactive
Problem Mgmt goal: increase the meantime between failures (by determining root cause of one or more incidents thereby addressing with appropriate change to prevent recurrence of the incident); proactive.

Ultimately, where incidents and problems cost your organizations money, RCA saves it. It is for this reason that I think of RCA as an under-appreciated hero of ITSM. While the biggest barrier to accomplishing RCA can be time, putting in the time upfront to accomplish the RCA process will prevent repeat incidents from cropping up, saving your company time and resources in the long run. By implementing a few of these tips, I hope you come to appreciate RCA as I have, and if you have any questions let us know, we'd love to help. 

Topics: blog plan incident-management itsm health-check
3 min read

Three Things No One Tells You About Custom Fields in Jira

By Mary Roper on Mar 4, 2021 12:19:10 PM

Blogpost-display-image_Three Things No One Tells You About Custom FieldsCustom fields can be an over-looked configuration point in Jira, and it's easy to see why: they're easy to create, modify, and make available for your users. Although Jira ships with several system fields, it's inevitable that teams using Jira will reach a point where they require additional fields to input specific information into their issues. But in order to maintain Jira's performance as well as instance hygiene, it's important that Administrators take great care when it comes to custom field creation. That's why today we're sharing with you a few custom field insights we've gleaned over the years. Read on to learn three things no one tells you about custom fields. 

1. Technically, there is no limit to the number of custom fields you can have. BUT...

Custom fields do impact system performance in Jira. Below are some recent results breaking down each configuration item's impact on Jira. Here, we can see that custom fields have an impact on the speed of running a large instance. Your teams may feel this impact in the load time of issue screens. As an admin, one indication can be having a long page of custom fields to scroll through. Additionally, this is often accompanied by longer than usual load time for the custom field Administration page. 

Response Times for Jira Data Sets

To combat this, Jira Administrators should partner with the requestor and other impacted users to determine some guidelines for creating custom fields. For instance, requiring the requestor to submit an example of how they plan to report on the custom field or having the Administrator ensure the custom field can be used in the majority of projects (>=80%). Execution is crucial here: once the guidelines are aligned with management and stakeholders, it's crucial they are followed to prevent your custom field list from unnecessarily growing.

2. There are native alternatives to custom fields.

There are a few usual suspects to look for when reviewing custom fields. Duplicate custom fields ("Additional Comments" as a supplement to the "Comments" system field), variations of custom fields ("Vendor" vs "Vendors"), and department specific custom fields ("Company ABC" vs "Vendor") are a few custom fields that can needlessly drive up your custom field count. To prevent this from happening, Admins can offer their business partners alternative suggestions to creating a custom field by looking at the following:

  1. Utilize an existing custom field that may be more general, but is fit for the purpose to get the most out of what is already in place.
  2. Rather than implementing a custom field, Labels or Components can be used to help organize issues and categorize them for future reporting.
  3. Apply a custom field context to help maximize the potential for picker, select, checkbox, and radio button custom field types. Adding field context enables Administrators to pair different custom field select options or their default values to specific projects or issue types within the same project.

3. You can proactively manage custom fields.

Rather than waiting for custom fields to pile on and create a lag on the instance speed time, proactively scheduling time to scrub your instance for stale custom fields will help Administrators keep on top of their custom field list. This can be a visual check to understand what fields aren't associated to a screen- those are good candidates for removal- or if there are similarly named fields- those can be good candidates for consolidation. More information from Atlassian on how to identify and clean up these fields can be found here.

Ultimately, a well-maintained Jira instance includes a good understanding of custom fields and their overall impact on the system. As your instance grows overtime, the guidelines around custom field development will become all the more important. Bringing these tips to life will help your instance run at top speeds for your users. 

Need help making the best out of your Jira instance? Our experts know Jira inside-out: contact us and we'll get in touch!

Topics: jira blog best-practices optimization standardize configuration bespoke health-check
3 min read

Tips for maintaining a Jira instance

By Chris Hofbauer on Feb 11, 2021 12:07:37 PM

Blogpost-display-image_Tips for maintaining a Jira instanceAtlassian's Jira is a powerful tool to promote best practices of internal processes and provide efficiency to development teams within your organization. The powerful nature of the tool is not only with the features offered by Atlassian but with a vast variety of options at your disposal to customize the instance. These customizations can come from the native features and options available as well as the apps brought to you via the Atlassian Marketplace. While these can all be great in building your Jira instance to get the most out of it, they can also have the potential to be detrimental to the health of the instance and negatively affect your organization's teams. 

Marketplace apps

Following best practices when configuring your instance as well as proper control over the integrations added to your instance is critical. If not properly managed you can experience system issues resulting in downtime due to a number of reasons but most commonly high memory or CPU. While installing apps through the marketplace may seem trivial and rather safe, keep in mind that each install of these apps does modify the database and can also be creating items such as custom fields in your instance. Make sure to properly vet all apps, check the reviews in the marketplace for any reports of impact to the instance. Also, review any documentation for the app to see how the application integrates with your instance. Most importantly it's highly recommended to install any apps in a lower environment (Dev or QA) before installing it in production. Thoroughly testing all new installs will give you the best idea of how the application will impact your instance once installed into production. 

Configuration

In addition to the configuration items created by apps are the ones created manually. Being mindful when adding items such as custom fields, statuses, workflows, etc. can save headaches long-term. It's important to reuse configuration items wherever possible. Having numerous, similar or duplicate, custom fields and statuses will create an administrative burden. Having a large number of these items will also have an impact on exporting issues and projects as well as for instance performance when loading reports, project boards, and dashboards. 

User Management

Proper user management will help to keep licensing costs to a minimum as well as give better control over access to the instance. Use groups wherever possible in permission schemes, boards, and filters. Provide only Jira administrator access and Service Desk agent licenses to those that need it. All users may not need Service Desk agent licenses and since these are billed separately in the instance, assigning all users to the Service Desk group can incur unnecessary charges going forward. Frequent review of active users is important as well. Based on business rules, users who have not logged in for some time (3 to 6 months) may be able to be made inactive. Frequent review of these types of users will also allow you to keep access to a minimum, save licensing counts, and in turn reduce user tier costs.

Stale Data

Review stale or old data is critical in maintaining a Jira instance as well. Instances will begin to grow over time and as your organization and teams grow, so will the ticket count in your instance. The larger the instance size, the high likelihood for performance degradation and instance issues. Analyzing your instance for stale old data is a key step in maintaining a healthy instance. For stale data, take a look at any unresolved tickets as well as any older tickets that have no resolution or that are not in a "Closed" status. You will also want to review any projects that have not had a ticket created in them for a long period of time (we generally recommend 3 to 6 months). After thorough analysis, you will want to close any stale tickets and archive any projects that are deemed to no longer be in use. 

Praecipio Consulting's Managed Services

Praecipio Consulting offers guidance and services to help maintain your Jira instance and provide you with industry best practices. Through years of experience, we at Praecipio have developed a wealth of knowledge in properly configuring and managing Atlassian products that will ensure you get the most out of the product for every use case in your organization. As part of our Managed Services offering, we deploy our proprietary Health Checks. These Health Checks include a thorough review of various aspects of maintaining your instance. Praecipio's Health Checks are split into two main categories: Infrastructure and Process; and include topics such as Licensing, Database Health, Security Vulnerabilities, User Management, Upgrade Readiness, Performance, Process Consolidation, Stale Data, apps/App and Workflows. With these Health Checks and working with Praecipio Consulting's Managed Services, your instance will be in an optimal state for growth and longevity.

Topics: atlassian blog best-practices managed-services optimization health-check

Praecipio Consulting is an Atlassian Platinum Partner

This means that we have the most experience working with Atlassian tools and have insight into new products, features, and beta testing. Through our profound knowledge of Atlassian environments and their intricacies, we can guide your organization as you navigate these important changes.

atlassian-platinum-solution-partner-enterprise

In need of professional assistance?

WE'VE GOT YOUR BACK

Contact Us