Most common incident key performance indicators
‘Alerts created’ is a sensible place to start. If you use an alerting solution within your managed services software, it’s important to know how many alerts (notifications) are created over a specific period, whether daily, weekly or monthly. Most such tools will produce reports and dashboards for analysis.
If the volume spikes or dips, the reasons can be investigated and strategies developed to flatten the curve.
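As a rough illustration of spotting spikes, the sketch below counts alerts per day and flags any day that sits well above the average. The function names, the ISO 8601 timestamp format and the "mean plus a multiple of the standard deviation" rule are all assumptions made for the example, not features of any particular alerting product.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev


def alerts_per_day(timestamps):
    """Count alerts per calendar day from ISO 8601 timestamp strings."""
    return Counter(datetime.fromisoformat(ts).date() for ts in timestamps)


def flag_spikes(daily_counts, threshold=1.5):
    """Return the days whose count exceeds mean + threshold * std deviation.

    The threshold is an arbitrary illustrative choice; tune it to your data.
    """
    counts = list(daily_counts.values())
    if len(counts) < 2:
        return []  # not enough history to say what "normal" looks like
    cutoff = mean(counts) + threshold * pstdev(counts)
    return sorted(day for day, n in daily_counts.items() if n > cutoff)
```

The same approach works for weekly or monthly buckets by grouping on the ISO week or the month instead of the date.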
‘First contact resolution rate’ is the proportion of customer interactions (incidents or service requests) that are resolved during the initial contact with the customer, usually at Tier 1 (Level 1), and is a crucial customer service metric. If a client’s problem can be fixed during the initial interaction, end-user satisfaction is at its highest and the load on other support personnel, at the same or higher support levels, is reduced.
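A minimal sketch of the calculation follows. The ticket fields `resolved` and `contacts` are hypothetical; your service desk tool will have its own way of recording whether a ticket was closed on the first interaction.

```python
def first_contact_resolution_rate(tickets):
    """Fraction of tickets resolved on the very first contact.

    Each ticket is a dict with two assumed fields:
      'resolved' (bool) - whether the ticket was resolved at all
      'contacts' (int)  - how many interactions it took
    Returns None when there are no tickets to measure.
    """
    tickets = list(tickets)
    if not tickets:
        return None
    first_time = sum(1 for t in tickets if t["resolved"] and t["contacts"] == 1)
    return first_time / len(tickets)
```

For example, three first-contact fixes out of four tickets gives a rate of 0.75, or 75%.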
‘Incidents over time’
This is similar to ‘alerts created’, except that incidents are generally raised by end users. Is the frequency of incidents increasing or decreasing over time? If the number is rising, or staying higher than expected, you can investigate why and decide what can be done to address the situation.
MTBF is the ‘mean time between failures’. This is the average interval observed between a tech product’s ‘repairable failures’ and can be monitored to keep track of a product’s dependability and uptime.
MTBF serves as a starting point for more complicated analysis. If MTBF is lower than expected, or lower than the system can tolerate, then extra focus on this aspect of your ITSM will pay dividends.
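One common way to calculate MTBF is to divide total operating time (uptime) by the number of failures in the observation window. The sketch below assumes that convention and assumes outages are recorded as (failed_at, restored_at) pairs; both the function name and the data shape are illustrative.

```python
from datetime import datetime, timedelta


def mtbf(window_start, window_end, outages):
    """Mean time between failures over an observation window.

    outages: list of (failed_at, restored_at) datetime pairs, assumed to
    fall inside the window and not to overlap one another.
    Returns a timedelta, or None if no failures were recorded.
    """
    if not outages:
        return None
    downtime = sum((end - start for start, end in outages), timedelta())
    uptime = (window_end - window_start) - downtime
    return uptime / len(outages)
```

With two 12-hour outages in a 10-day window, uptime is 9 days and MTBF comes out at 4.5 days.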
MTTA is the ‘mean time to acknowledge’: the average time it takes the support team to acknowledge an incident after an alert is raised and to start working on a resolution. Measuring how responsive support members are to problems is crucial to any service desk.
Once team management is aware that incidents are being acknowledged too slowly, it can address the cause, whether that is a resourcing issue, under-staffing or confusion over who is responsible.
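MTTA can be sketched as the average gap between the moment an alert fires and the moment someone acknowledges it. The (alerted_at, acknowledged_at) pair format below is an assumption for the example; unacknowledged alerts are simply left out of the average.

```python
from datetime import datetime, timedelta


def mean_time_to_acknowledge(alerts):
    """Average delay between an alert firing and its acknowledgement.

    alerts: list of (alerted_at, acknowledged_at) datetime pairs, where
    acknowledged_at is None if nobody has picked the alert up yet.
    Returns a timedelta, or None if nothing has been acknowledged.
    """
    delays = [ack - fired for fired, ack in alerts if ack is not None]
    if not delays:
        return None
    return sum(delays, timedelta()) / len(delays)
```

Leaving unacknowledged alerts out flatters the figure, so it is worth reporting their count alongside the average.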
MTTD is the ‘mean time to detect’: the average time it takes to detect an issue. It is more commonly used in fields such as cybersecurity, where systems are monitored for breaches.
MTTR is the ‘mean time to R’, where R can be respond, recover, repair or resolve.
‘Mean time to resolve’ is used most often. It accounts not just for the time spent diagnosing and fixing an issue, but also for the time spent making sure the issue doesn’t recur. Whatever the ‘R’, this is a crucial KPI which must be monitored and optimised to ensure an enviable service desk.
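Because the ‘R’ varies, it helps to make the choice explicit in any calculation. In the sketch below the incident records are plain dicts, and the field names `opened_at`, `responded_at` and `resolved_at` are hypothetical; the point is only that the same incidents give very different figures depending on which end timestamp is used.

```python
from datetime import datetime, timedelta


def mean_time_to(incidents, end_field, start_field="opened_at"):
    """Average the gap between start_field and end_field across incidents.

    Incidents that lack the end timestamp (still open, say) are skipped.
    Returns a timedelta, or None if no incident qualifies.
    """
    durations = [
        inc[end_field] - inc[start_field]
        for inc in incidents
        if inc.get(end_field) is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Calling `mean_time_to(incidents, "responded_at")` and `mean_time_to(incidents, "resolved_at")` on the same data makes it clear which MTTR a report is quoting.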
On-call time measures how much time support members spend on call and helps to ensure that team members are not overworked.
An SLA is a ‘service level agreement’ between a service provider and a consumer of that service, usually a client. It specifies quantifiable measures such as system or product uptime, response and resolution times for incidents, and other responsibilities.
An SLO (‘service level objective’) is a component within an SLA. The SLA will include multiple SLOs which might, for instance, pertain to uptime or to a maximum resolution time for incidents of a given priority (P1, P2, P3…).
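To make the idea of a per-priority resolution SLO concrete, the sketch below checks resolved incidents against a table of targets. The target durations and the incident fields (`id`, `priority`, `resolution_time`) are invented for illustration; real values come from the SLA itself.

```python
from datetime import timedelta

# Hypothetical maximum resolution times per priority, for illustration only.
RESOLUTION_TARGETS = {
    "P1": timedelta(hours=4),
    "P2": timedelta(hours=8),
    "P3": timedelta(days=3),
}


def slo_breaches(incidents, targets=RESOLUTION_TARGETS):
    """Return the ids of incidents that took longer than their target.

    Each incident is a dict with assumed fields 'id', 'priority' and
    'resolution_time' (a timedelta).
    """
    return [
        inc["id"]
        for inc in incidents
        if inc["resolution_time"] > targets[inc["priority"]]
    ]
```

An incident resolved exactly on the target is treated here as meeting the SLO; whether the boundary counts as a breach is something the SLA wording should settle.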
The proportion of time that systems are accessible and working properly is referred to as uptime. A newcomer to ITSM may wonder why anything less than 100% uptime is tolerated, but modern systems are complicated and have myriad dependencies, so 100% is, over the long term, a lofty goal. An uptime of 99.9% is regarded as a good value for this metric.
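It can help to translate an uptime percentage into the downtime it actually permits. The sketch below does the arithmetic in both directions; the function names are illustrative and the 30-day period is just an example.

```python
from datetime import timedelta


def uptime_percentage(period, downtime):
    """Uptime as a percentage of the period, given total downtime."""
    return 100.0 * (period - downtime) / period


def allowed_downtime(period, target_percent):
    """Downtime budget for a period at a given uptime target."""
    return period * (1 - target_percent / 100)


# Example: a 99.9% target over a 30-day month
budget = allowed_downtime(timedelta(days=30), 99.9)
print(budget)
```

Over 30 days, 99.9% uptime leaves a budget of roughly 43 minutes of downtime, which shows how little room even a ‘good’ target allows.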