3 min read

Incident Management Best Practices

By Charlotte D’Alfonso on Nov 1, 2022 12:00:20 PM

A company's users cannot access their reports. A company's website is down for 40% of its users. A new firewall rule causes integration with a channel partner to fail. A user cannot change the address in their profile. Incidents don't just impact your users. Your bottom line also takes a hit through lost data, lost employee time, and lost revenue. What is going on, and how do we stop it? The answer begins with having a strong incident management process.

What is incident management?

What are incidents? Incidents are unplanned events that disrupt or reduce the quality of your service (or threaten to do so).

A major incident is a critical disruption to a service that requires an emergency response. It has high impact and involves many people to resolve. A minor incident has low impact, and a front-line customer service agent can resolve it.

Incident management is the process of responding immediately when something goes wrong and restoring service to its operational state. This is one of the core IT Service Management guiding practices. Effective incident management requires a strong team culture, an incident management guiding practice, and tools such as Atlassian's Jira Service Management, which can be integrated with other third-party tools.

Challenges of incident management include:

  • Frequency of major incidents and outages
  • Use of multiple ticketing systems, monitoring systems, and communication channels, which can prevent effective automation, cause data loss, and make it harder to learn from the incident
  • Alert overload potentially leading to long and undetected outages
  • Configuration management difficulties leading to long diagnostic cycles
  • Poor communication and visibility

Best Practices and Incident Management Life Cycle

  • Have a single source of truth.
  • Follow a process.
  • Utilize a workflow where you can put safeguards around each step.
  • Have a response team designated in advance so work is delegated to the right person/people.
  • Automate any activities, notifications, alerts that will help shorten the process.

The Lifecycle with tips to improve

1. Detect - Use monitoring and alerting tools that automatically detect an incident and inform your team before your customers even notice.

2. Classify and respond - Assess the impact of the incident and classify it so the appropriate team can respond. Prioritizing and categorizing the incident as major or minor allows you to escalate it to the right people immediately if it needs a swarm of people to tackle the issue.

3. Communicate - Communicating quickly and regularly about incidents helps to build trust with customers. Automating communications can deliver a consistent message.

4. Investigate and diagnose - Leverage a Configuration Management Database (CMDB) for a faster resolution. A CMDB helps the response team understand the interdependencies and relationships within your IT infrastructure. Knowing this not only allows you to better diagnose potential causes of the incident but also to correct any domino effects of the incident. Set up an internal communication channel so that your response team can work together.

5. Learn and improve - Determine what can be done to prevent similar incidents from happening in the future and what actions were taken to mitigate and resolve the incident. This is called an "incident postmortem" or "post-incident review." This is also where you can determine service improvement and help identify better ways of working across teams.
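
To make the "classify and respond" step concrete, here is a minimal Python sketch of a major/minor classification rule. The 25 percent threshold, the `Incident` fields, and the responder names are all hypothetical placeholders chosen for illustration; your own incident management practice would define the real criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    summary: str
    affected_users_pct: float  # share of users affected, 0-100
    service_down: bool         # True if the service is unavailable


# Hypothetical threshold: treat anything affecting 25%+ of users as major.
MAJOR_USER_THRESHOLD_PCT = 25.0


def classify(incident: Incident) -> str:
    """Return "major" or "minor" using the illustrative rules above."""
    if incident.service_down or incident.affected_users_pct >= MAJOR_USER_THRESHOLD_PCT:
        return "major"
    return "minor"


def responders(incident: Incident) -> List[str]:
    """Pick who to notify; the role names are placeholders."""
    if classify(incident) == "major":
        return ["on-call engineer", "incident commander", "communications lead"]
    return ["front-line support agent"]


outage = Incident("Website down for 40% of users", 40.0, True)
profile = Incident("User cannot change address in profile", 0.1, False)

for incident in (outage, profile):
    print(incident.summary, "->", classify(incident), responders(incident))
```

Keeping the rule in one small function means the same classification can drive both escalation and automated communications.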

Conclusion

How do you completely eliminate future incidents? You don't! Trying to do so will slow your organization down. It will add complexity and too many checks to your software development process. The goal instead is to resolve incidents quickly and reduce future incidents by continuously learning and improving. Want to learn how to modernize your IT operations, facilitate collaboration, and deliver new services with agility? Download our eBook that walks you through ITSM practices that are essential for keeping up with today's fast-paced world and accelerating business transformation.

Engaging with an expert in full solutions will help you embed best practices into your organization and reduce incidents. Praecipio is here to help guide you in all steps of software development and best practices. If you'd like to chat with an expert, drop us a line; we'd be happy to help.

Topics: incident-management itsm
6 min read

How Atlassian Cloud Enables Organizations To Scale ITSM Practices

By Praecipio on Sep 8, 2022 10:00:00 AM

Cloud-based ITSM use is rapidly becoming prevalent across several different industries. The global cloud ITSM market is expected to increase with an annual growth rate of 22.3 percent between 2022 and 2030.

Why is this?

Choosing a cloud-based solution for your ITSM strategy can significantly increase the speed of your IT service delivery and save you money by reducing admin costs. But what works for a small organization can quickly fall apart when presented with the challenges of big-scale growth and the impact scaling has on your resources. 

To help you scale successfully, Atlassian Cloud offers features that enable you to extend your ITSM practices across different teams in your enterprise. 

Scaling ITSM with Atlassian Cloud

Atlassian Cloud allows you to scale IT Service Management (ITSM) seamlessly with features that help your organization overcome the barriers and difficulties of introducing new tools, services, and processes.

Uptime 

With ITSM, your entire planning, development, and release processes are grounded in customer satisfaction. If you experience an outage or other downtime, the ITSM goal of serving your customers well isn’t met. Not only is this disappointing and frustrating to your end-user, but it can result in poor business reviews, a loss of customers, and high costs.

As your business grows, your ITSM processes will need to grow with you. Changing your process and the tools can cause downtime. Atlassian Cloud adheres to strict Service Level Agreements (99.90 percent uptime for Premium products and 99.95 percent for Enterprise), which means that your systems will be available nearly 24/7, helping prevent any negative impact on your user experience. 
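
To put those percentages in perspective, the short Python sketch below converts an uptime commitment into the downtime it still allows. The 30-day window is an assumption made for the arithmetic; how an SLA is actually measured depends on the provider's terms.

```python
def allowed_downtime_minutes(uptime_pct: float, period_days: int = 30) -> float:
    """Minutes of downtime permitted by an uptime percentage over a period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


# Uptime figures quoted above; the 30-day period is an assumption.
for tier, pct in [("Premium", 99.90), ("Enterprise", 99.95)]:
    minutes = allowed_downtime_minutes(pct)
    print(f"{tier}: about {minutes:.1f} minutes of downtime per 30 days")
```

Over 30 days, 99.90 percent works out to roughly 43.2 minutes of allowable downtime and 99.95 percent to roughly 21.6 minutes.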

Security 

Scaling your ITSM practices enables you to consistently — and satisfactorily — meet your customers’ needs. However, rapidly expanding your services and ITSM can have some security risks.

Maintaining secure data access is one challenge your organization can face while scaling. Some strict security measures can be neglected during this transition, making your network vulnerable.

How do you stay on top of these security challenges while scaling your ITSM? 

Atlassian Cloud handles compliance on your behalf, minimizing internal resources spent planning and executing compliance roadmaps and working with auditors. Atlassian Cloud also offers data residency, which enables you to choose where your in-scope product data resides for Jira, JSM, and Confluence. You can choose whether you’d like to host your data in a defined geographic location or globally. Data residency allows you to keep your data secure and meet compliance requirements that accompany highly-regulated industries.

Atlassian Cloud also provides user provisioning and de-provisioning, reducing the risk of information breaches. Based on the principle of least privilege (PoLP), user provisioning and de-provisioning allow you to tightly control user access to your resources. De-provisioning also automatically removes access for users who leave the company, eliminating the security risks that former employees, especially disgruntled ones, can pose.

Finally, Atlassian Cloud implements thorough security measures and constantly monitors for issues related to your cloud infrastructure. If any issues are detected, Atlassian handles these potential threats before they cause damage to your cloud resources and app functionality.

And, because Atlassian Cloud is backed by multi-level redundancy, your system won’t go down while Atlassian handles any unexpected issues.

Flexibility 

As your business grows, you’ll adopt new features, tools, and perhaps more Atlassian products to your stack. With this growth, you’ll also need to extend your ITSM principles across different teams without worrying about hardware-related complications. 

Atlassian Cloud provides a comprehensive stack of Atlassian products that you can implement in ways that align with the capabilities and needs of your organization.

Furthermore, with Jira, you have access to flexible application and project types so you can manage projects in the best way for your teams. Additionally, Atlassian Cloud allows you to upgrade and downgrade resources depending on your business needs. 

Atlassian Cloud Suite of ITSM Tools and Your ESM Strategy 

Atlassian Cloud’s suite of ITSM tools helps your organization improve your Enterprise Service Management (ESM) strategy by supporting core ITSM principles. Some of its features include the following.

Incident Management 

In developing your ESM strategy, your organization must include plans or processes for responding to service disruption resulting from unplanned events and restoring the services to normal. To do this, ITSM teams rely on multiple applications and tools to track, monitor, resolve and even anticipate incidents. 

To keep up with the velocity of today’s incident management, the Cloud versions of Jira Service Management (JSM) and Jira Service Desk place all these functions in one place, enabling your ITSM team to have a transparent and collaborative response to incidents. With this, you can track and manage incidents from the incident report to its resolution in real-time and resume normal operation with the least possible hindrance.

Asset Management and Configuration

One key aspect you need to consider in your ESM strategy is Asset Management and Configuration. You can store hardware assets, software licenses, facility assets, and more using JSM’s cloud-based asset management and configuration services.

Jira Service Management Cloud provides a centralized asset database, making searching for asset and resource information less stressful.

Multiple members of your ITSM team can access assets and asset information from any device with an Internet connection — and in any location — without error or conflicting information. It also synchronizes your asset database across all your organization’s branch offices in real-time, reducing or eliminating asset loss.   

Service Delivery 

To provide an effective service to your end-user, you need to identify customers’ needs and any issues that arise. A quality ticketing/response system improves your service delivery through increased awareness and ability to triage, enhancing visibility into potential issues.

With JSM, your teams can receive incoming issues and requests from customers and team members. This enables you to better prioritize and understand the scope of issues and service requests so you can first address time-sensitive requests.  

Additionally, you can configure JSM to direct tickets to the appropriate ITSM team automatically. With this, the appropriate team can address the customer’s request and escalate issues if further assistance is required to address customer requests — while skipping the process of determining who should handle the ticket.
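
The idea behind automatic routing can be sketched in a few lines of plain Python. This is not JSM configuration or an Atlassian API; the categories, team names, and ticket fields are hypothetical and only illustrate the logic of prioritizing requests and mapping each one to a team.

```python
# Hypothetical mapping of request categories to teams.
ROUTES = {
    "network": "Network Operations",
    "access": "Identity and Access",
    "hardware": "Desktop Support",
}
DEFAULT_TEAM = "Service Desk"


def route(ticket: dict) -> str:
    """Return the team for a ticket, falling back to a default queue."""
    return ROUTES.get(ticket.get("category", ""), DEFAULT_TEAM)


def triage(tickets: list) -> list:
    """Order tickets by urgency (lower number = more urgent) and attach a team."""
    ordered = sorted(tickets, key=lambda t: t["priority"])
    return [(t["summary"], route(t)) for t in ordered]


tickets = [
    {"summary": "New laptop request", "category": "hardware", "priority": 3},
    {"summary": "VPN is down", "category": "network", "priority": 1},
    {"summary": "Question about invoices", "category": "billing", "priority": 2},
]

for summary, team in triage(tickets):
    print(f"{summary} -> {team}")
```

The fallback queue matters: a request that matches no rule still lands with someone rather than disappearing.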

Conclusion 

Operating in Atlassian Cloud enables your organization to expand ITSM capabilities throughout your entire organization. 

While scaling your ITSM practices may seem daunting, it doesn’t have to be with proper guidance and support. Praecipio Consulting, an Atlassian Platinum Solution Partner, can help you take the guesswork out of scaling ITSM. From developing a solid ESM strategy to tips on how to increase efficiency and eliminate downtime, Praecipio Consulting is here to help. Contact Praecipio Consulting today to start scaling your ITSM practices with Atlassian Cloud.

Topics: scalability security incident-management itsm atlassian-cloud
3 min read

Why ESM Should Be Part Of Your Business Strategy

By Praecipio on Aug 22, 2022 10:00:00 AM

You need effective communication across your organization’s departments to boost productivity and service delivery. Managing workflows, operations, and complaints in a growing workforce can be challenging, especially when dealing with siloed teams. Rooted in IT Service Management (ITSM) principles, Enterprise Service Management (ESM) is one of the most effective frameworks for managing collaboration and improving efficiency across IT and non-IT workflows.

The Service Desk Institute found that in 2021, 68 percent of organizations employed ESM strategies and that 80 percent of those organizations accelerated their digital transformation in 2020 with the help of ESM processes and tools. This widespread use of ESM is driven by its ability to manage and encourage corporate collaboration by providing an efficient portal for real-time communication and resource monitoring — ultimately boosting productivity.

Benefits of Adopting ESM in Your Business 

There are numerous advantages to adopting ESM, but today, we’ll discuss five of these benefits.

Reduce Operational Costs

Having many support personnel on the payroll will inflate the cost of running your business. ESM has incorporated tools like chatbots, virtual assistants, and smart analytics to significantly reduce the number of staff required to manage employee and customer issues. Additionally, automation can reduce maintenance and training costs by making workflows more efficient.

Improve Customer Experience

Satisfied customers are the key to meeting business objectives. One way to improve customer experience is by offering fast and real-time responses to inquiries. It’s difficult to guarantee a fast response time when your company is over-dependent on human interaction. 

ESM technologies use artificial intelligence (AI) to handle basic customer inquiries and complaints, helping to ensure that no customer issues are missed and that customers have access to support when they need it. Additionally, using ESM can help to ensure that all of your teams play an active role in delivering value to your customers and that the customer experience is treated as a top priority across your organization.

Improve Department Efficiency

When the departments in your company operate efficiently, the overall productivity of the enterprise increases. ESM provides effective collaborative and communication tools that can be used among departments, reducing or eliminating the need to manually print and distribute memos or reports.

ESM also helps in task monitoring to keep up with project specifications and due dates. You can use project management tools backed by automation to handle corporate tasks, including scheduling and resource monitoring. This can greatly reduce unnecessary human errors and oversights and minimize the time and financial investment in performing repetitive, manual management tasks.

Reduce Siloing 

ESM helps to reduce or eliminate siloing among teams in an enterprise. One of the leading causes of overall low productivity and performance in the enterprise is poor interaction among team members. When teams work independently vs. collaboratively, status reports may not always be communicated, and business objectives could hold different weights — or shift entirely — from team to team.

ESM offers fast and efficient interaction among team members. Using a central line of communication helps different teams interact with each other and offers a space to share relevant documents, analyses, and workflows. Plus, our experience shows that employees like working collaboratively within a single system.

Improved Incident Management

Managing emergencies and unexpected challenges is difficult, but it’s easier when you apply ESM capabilities. ESM tools like Jira Service Management have AI-enabled capabilities and automation incorporated into the management processes. This means that incidents are quickly flagged and the appropriate mitigation protocols are initiated.

Conclusion

In today's fast-paced business world, teams everywhere are experiencing growing pains due to disparate tools and delayed decision-making. ESM enables organizations to break down silos, drive business agility, and deliver high-velocity service experience, leading to increased customer and employee satisfaction. 

To learn more about where ESM fits into your business strategy and for guidance on how to adopt ESM, contact Praecipio.

Topics: incident-management itsm jira-service-management enterprise service management
6 min read

Root Cause Analysis: Leonard, Howard, and the 5 Whys

By Amanda Babb on Mar 10, 2021 9:50:40 AM

DIY or DIE!

For those of you watching from home, I have been on a home improvement journey for quite some time. Applying an Agile mindset to home improvement (or really anything I do) is one of my passions. Even at my most recent Women in Agile meeting, we discussed applying Agile concepts to daily life and feeding these back into building a great resumé. One of the principles of the Agile Manifesto reads: At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly. We all know this applies to Agile development practices, but it also applies to IT Service Management. Specifically, Incident and Problem Management. For me, it applies to my recent home improvement adventure. 

Strong fences make great neighbors

My neighbor and I spent the better part of a Saturday fixing our mutual fence. You see, I have two dogs: Leonard and Howard.

Both are rescues. Leonard is eight and was "free to a good home" while Howard is four and was adopted from my county's animal shelter. Both dogs have been with us since their puppyhood and, as any dog owner will say, they are the BEST. DOGS. EVER. Except when they're not. This was not the first time my neighbor and I had to work on the fence. Observe one of the troublemakers in his natural habitat. 

IMG_4507

This epic saga started in May of last year. I would diligently fix loose boards, prop items against the fence to "patch" holes, and monitor their outdoor activity while I was awake (awake being the key word here: 3am barking and fence-patching sessions are no fun). I supplied my neighbor with fence planks because, well, they're my dogs. We fixed the section above and let the others lapse until a series of shenanigans prompted my neighbor and me to spend our Saturday replacing three additional sections. My neighbor and I became united in making sure my two didn't escape. While my neighbor "doesn't care" that my dogs are in his yard, my (very good) boys take the opportunity to break out of his fence and wander the neighborhood. Howard usually comes back, but Leonard meanders through the streets, swims in pools or the lake, and generally causes mayhem until I can coax him into my car to come home.

Not in my back yard...

Before this latest patch, I was determined to find the root cause. Before May of last year, this was not a problem. My puppers would frolic in the backyard and simply bark at other dogs in the neighborhood as they walked by. I let them out several times per day to make sure they were relieved, in addition to daily walks. While I was traveling, they were also well taken care of and monitored. What changed?

Root cause analysis is, simply put, problem solving. While it is widely used in sciences and engineering, it is also a key element of IT Service Management Incident and Problem Management. When reacting to an incident, the team must restore functionality as quickly as possible. Upon resolution, root cause analysis helps us understand why. It then prompts us to ask, "Is there an action I can take to prevent this from happening again?" Incident Management leads to Problem Management, and through root cause analysis, we can move from a reactive organization to a proactive organization.

Of the many techniques of root cause analysis, my favorite is the "Five Whys". It is the simplest technique: ask why until you've identified the root cause. Not like a petulant child, however. Asking the first why should be easy, then continuing to ask well-curated questions based on the previous answer helps you determine the root cause. I applied this to my situation: 

  • Why do I have to replace parts of the fence? 
    Because the dogs are chewing through the fence.
  • Why are the dogs chewing through the fence?
    Because they can access the backyard whenever they need.
  • Why can the dogs access the backyard whenever they need?
    Because we installed a dog door.

HA! I found it. The root cause. And it didn't even take me all five whys. 

No root cause analysis technique stands alone. There exists a plethora of other techniques. Pareto charts help show whether 80 percent of your problems are derived from 20 percent of the causes. An Ishikawa (fishbone) diagram looks at measurement, materials, methods, machines, management, and mother nature. Scatter plots let us look for correlations that may point toward causation. Was the dog door the root cause? The existence of a dog door doesn't change the behavior of my boys. Having access to the backyard doesn't make them chew through the fence planks. Did we ask enough questions to actually identify the root cause? Did I also consider a Pareto analysis, an Ishikawa diagram, or a scatter graph to understand why I was constantly chasing my boys through the neighborhood?
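
For the curious, a Pareto analysis is easy to sketch in Python: count how often each cause shows up, then keep the most frequent causes until they cover about 80 percent of the occurrences. The escape log below is entirely made up for illustration, as are the cause labels.

```python
from collections import Counter


def pareto(causes, cutoff=0.8):
    """Return the most frequent causes that together reach the cutoff share."""
    counts = Counter(causes).most_common()  # sorted from most to least frequent
    total = sum(n for _, n in counts)
    vital, running = [], 0
    for cause, n in counts:
        if running / total >= cutoff:
            break
        vital.append(cause)
        running += n
    return vital


# Hypothetical escape log: one entry per escape, labeled with a suspected cause.
escape_log = (
    ["not enough exercise"] * 12
    + ["loose fence plank"] * 5
    + ["squirrel"] * 2
    + ["thunderstorm"] * 1
)

print(pareto(escape_log))
```

With this invented log, two of the four causes account for 17 of the 20 escapes, which is the kind of "vital few" a Pareto chart is meant to surface.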

I stopped at three whys: "I have a dog door."

What happens if I keep asking why? 

  • Why did we install a dog door? 
    Because Howard wasn't fully potty trained. 
  • Why wasn't Howard fully potty trained? 
    Because I didn't take the necessary time to train him. 
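
If you want to keep a Five Whys chain somewhere more durable than your head, it fits naturally in a simple list of question-and-answer pairs. The sketch below records my five whys in Python; the `root_cause` helper is just an illustrative name, and it assumes the last answer in the chain is the root cause.

```python
# Each entry is (question, answer); each question builds on the previous answer.
five_whys = [
    ("Why do I have to replace parts of the fence?",
     "Because the dogs are chewing through the fence."),
    ("Why are the dogs chewing through the fence?",
     "Because they can access the backyard whenever they need."),
    ("Why can the dogs access the backyard whenever they need?",
     "Because we installed a dog door."),
    ("Why did we install a dog door?",
     "Because Howard wasn't fully potty trained."),
    ("Why wasn't Howard fully potty trained?",
     "Because I didn't take the necessary time to train him."),
]


def root_cause(chain):
    """Assume the final answer in the chain is the root cause."""
    return chain[-1][1]


for depth, (question, answer) in enumerate(five_whys, start=1):
    print(f"{depth}. {question}\n   {answer}")

print("Root cause:", root_cause(five_whys))
```

Stopping the list at the third entry reproduces my first (wrong) conclusion, which is a nice reminder to keep asking.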

AHA! My Ishikawa diagram identified "management" as the issue. My Pareto identified the 80 percent as my time to train my puppers. My scatter plot showed the amount of time spent correlated to the amount of dog-induced shenanigans. I would add these to the post, but won't because...reasons. More importantly, I simply kept asking, "Why?" until I identified the root cause. 

Actions speak louder than words

Now that I have a root cause, what is it that I can do to prevent this issue from recurring? When looking at Incident and Problem Management, Atlassian products such as Opsgenie and Statuspage can ingest, aggregate, correlate, and trigger the creation of Jira Service Management issues. With Confluence, we can create specific root cause analysis templates to be shared with our customers and stakeholders. However, it's up to our techniques and processes to help us determine the actions we need to take going forward.

For me and my puppers, it's simple. 

  1. Take at least 30 minutes out of my day for dedicated doggie exercise
  2. Reinforce good behavior while in the yard
  3. Lock the dog door overnight (no more 3AM "let me sing you the song of my people" moments)
  4. Finish replacing the aged planks on the fence

By taking these actions based on my root cause analysis, I should have this solved quickly with redundancies built in. My puppers will be safer and happier, I will have a beautiful new feature of my home, and the three of us will have less stress day-to-day. Using root cause analysis techniques and an Agile mindset, and drawing from IT Problem Management, I can easily solve this problem and any additional ones around my home.

BRB, gotta run and get some more fence planks.

Topics: blog confluence plan problem statuspage incident-management itsm women-in-technology agile opsgenie jira-service-management health-check
5 min read

Tips for Performing a Successful Root Cause Analysis

By Praecipio on Mar 5, 2021 10:55:01 AM

Root Cause Analysis: The Under-appreciated Hero

When implementing an IT Service Management (ITSM) system, I always look forward to spending time on root cause analysis (RCA). Of course, Incident and Problem Management play the central role in ITSM design: it's crucial to give your teams, customers, and systems intuitive ways to communicate when something has gone wrong. However, it is equally important that organizations spend time identifying the key driver of these problems by performing an RCA to prevent them from recurring. This is because, at the end of the day, incidents and problems cost your organization money, and a good RCA can help save it. It's this viewpoint that has led me to dub RCA the under-appreciated hero of ITSM, and in this post I will share with you the aspects of a successful RCA that can help vanquish problems once and for all.

It's important to distinguish between Problem Management and Incident Management. In broad strokes, the goal of Problem Management is to get to the root cause. It aims to increase the mean time between failures by determining the root cause of one or more incidents and addressing it with an appropriate change that prevents the incident from recurring; in this sense, it's a proactive approach. Incident Management's goal, on the other hand, is to reduce the mean time to recovery by responding and resolving fast; its approach is reactive.
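
Both metrics are simple averages, as the Python sketch below shows. The incident timestamps are hypothetical, and the definition of time between failures used here (end of one incident to the start of the next) is one common convention among several, so treat it as an assumption rather than a standard.

```python
from datetime import datetime, timedelta

# Hypothetical incident log: (started, resolved) pairs.
incidents = [
    (datetime(2021, 3, 1, 9, 0), datetime(2021, 3, 1, 10, 0)),
    (datetime(2021, 3, 8, 9, 0), datetime(2021, 3, 8, 9, 30)),
    (datetime(2021, 3, 15, 9, 0), datetime(2021, 3, 15, 11, 0)),
]


def mean_time_to_recovery(log) -> timedelta:
    """Average duration from the start of an incident to its resolution."""
    durations = [resolved - started for started, resolved in log]
    return sum(durations, timedelta()) / len(durations)


def mean_time_between_failures(log) -> timedelta:
    """Average healthy time from one resolution to the next incident's start."""
    ordered = sorted(log)
    gaps = [nxt[0] - prev[1] for prev, nxt in zip(ordered, ordered[1:])]
    return sum(gaps, timedelta()) / len(gaps)


print("Mean time to recovery:", mean_time_to_recovery(incidents))
print("Mean time between failures:", mean_time_between_failures(incidents))
```

Incident Management tries to push the first number down; Problem Management tries to push the second number up.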

What is Root Cause Analysis?

The core function of root cause analysis is to uncover the core reason why a problem occurred. While there are many different tools and approaches to perform an RCA, I've consolidated the key steps into the diagram below: 

Root Cause Analysis Blog Post

  • Define the problem: First, make sure you and your teams align on "What happened?" and are speaking to the same problem.
  • Collect Data: Then, the focus needs to be "How did this happen?" and gathering data around the problem, whether customer testimony or incident reports.
  • Identify Causal Factors: Causal factors also help to answer "How did this happen," and in this step, teams should be guided to identifying fixable causes.
  • Identify the Root Cause: Next, teams should leverage one of the techniques of the RCA process, such as the "Five Whys," Fishbone Diagram, or Fault-Tree Analysis, to drive to the root cause of all the causal factors. 
  • Recommend and Test the Solution: After the root cause has been identified, teams should work to develop a solution that gets recommended to the Executive team for approval before testing can begin. Once approved, the solution should enter a testing phase, where it can be rolled back if not successful. 
  • Implement and Monitor: Once the solution is implemented, teams should continue to monitor it in the production environment to ensure that it is working as expected. This active analysis step is why RCA is depicted as a cycle; if the solution did not resolve the problem, it could be that the problem was a causal factor and the team needs to begin the RCA process again.

Why Does It Matter?

I've worked with teams who have a well-defined RCA process and others who are just beginning. I reference this diagram when we focus on RCA because it helps to illustrate how simple a process RCA can be. There aren't rigid guidelines or rules to follow; organizations can adopt their own RCA policies. What many don't realize, especially those who have yet to adopt RCA as a business process, is that it has a big pay-off: cost savings.

Root cause analysis can be a cost-saving tool for organizations for a couple of reasons. First, identifying and acting on problems early saves money. The longer a problem goes on, the more money it costs the organization, and a properly deployed RCA process is built to help organizations become proactive rather than reactive. Second, the main goal of the RCA process is to prevent incidents from cropping up again. If the incident does not recur, then there won't be downtime or lost production, saving money in the long run.

How Can I Help My Organization Embrace RCA?

When working with organizations to implement an RCA process, there are several aspects that I help coach my clients on which can help the organization embrace RCA. They are:

  1. Talk about what went well... and what could have gone better
    1. When the team is starting the RCA process, guide them to start by discussing what happened and framing the problem. Then, go one step further and document what went well. This will provide you data and help to explain what is not the issue or what not to blame. It's equally important to talk about what could have gone better, as this will likely begin the discussion and documentation of your causal factors. 
  2. Make it work for you
    1. In some organizations, "Root Cause Analysis" can be viewed as too formal and intimidating. I've come across some resistance to them due to their structure or even the invitee list. For these reasons, it's important to make sure you're adopting an RCA structure that feels natural for your organization. This could mean:
      1. Being mindful of the attendees, especially if the invitees include senior management and above. Ensure you include the right people in the room at the right time. Your front line team has the most firsthand knowledge of the systems or processes, and you will want them to feel comfortable participating candidly in any discovery meetings. 
      2. Having a neutral party leading the meetings. The leader shouldn't have anything to gain by the results of the RCA process and should be able to maintain a "blame free" atmosphere.
      3. Reframing RCA as something more approachable, such as a "Lessons Learned meeting," where the RCA process is still followed, but in a less formal way. For example, feedback and ideas can be gathered via sticky notes and shared on a board so that they stay anonymous.
  3. Root causes can only solve one problem
    1. Remember that the main goal of RCA is to avoid future incidents. Teams should not be applying a previous root cause to a current or future problem. If that is the case, it indicates that rather than identifying the root cause, the team actually identified a causal factor. In these instances, I've coached teams to go back and take their RCA process one step deeper, for example asking another "Why" question if the "Five Whys" is used.

Ultimately, where incidents and problems cost your organization money, RCA saves it. It is for this reason that I think of RCA as an under-appreciated hero of ITSM. While the biggest barrier to accomplishing RCA can be time, putting in the time upfront to accomplish the RCA process will prevent repeat incidents from cropping up, saving your company time and resources in the long run. By implementing a few of these tips, I hope you come to appreciate RCA as I have. If you have any questions, let us know; we'd love to help.

Topics: blog plan incident-management itsm health-check

Praecipio Consulting is an Atlassian Platinum Partner

This means that we have the most experience working with Atlassian tools and have insight into new products, features, and beta testing. Through our profound knowledge of Atlassian environments and their intricacies, we can guide your organization as you navigate these important changes.
