Track: Service Engineering and Service Management
Abstract
By reducing Mean Time to Recovery (MTTR), companies can minimize the duration of system outages and restore services promptly, thereby reducing the impact on the business. Typically, in a contact center or IT service desk environment, a ticket tagged as critical means that a serious issue has been identified and defined as a problem (e.g., a bug or incident, not an enhancement request) that could disrupt a client’s or trial’s workflow to the point where existing or new revenue is at risk. This research presents a case study of an IT service desk company based in Manila, Philippines, examining how it handled “Critical Tickets” arising from system-wide outages or unexpected downtime of software products. The causes of high MTTR were identified from Fiscal Year 2022 data, which served as the primary basis of the study. The identified root causes are the following: (1) failure to meet the required Service Level Agreement (SLA) for a minimum Average Handling Time of 15 minutes; (2) the promptness of the scheduled EOC in acknowledging and responding to emergency critical system alerts; and (3) the availability of dedicated software development engineers and product specialists from a different region to respond to critical paging alerts. Based on these causes, the study proposed two major solutions to address the uncontrolled MTTR in responding to Critical Tickets. The first is the enhancement of the current Customer Relationship Management (CRM) User Interface (UI), evaluated through usability testing based on ISO 9241-11 metrics. The SUM result of the enhanced CRM UI was 6.97 higher than that of the existing CRM design. Effectiveness improved to 100% quality, efficiency reached 90.45% quality, and user satisfaction reached 90.67%. The second is the implementation of RoboEOC, which fully automates the paging alert system and replaces the EOC as the main personnel who respond to and acknowledge a ticket once it is tagged as critical.
This enables automated paging of the concerned team (on-call software engineers or product specialists) based on the product against which the ticket was filed. RoboEOC also features “Squadcast-On call scheduling”, which shows who is in charge of a specific critical ticket and when. A workflow simulation using ARENA was also performed to test whether the newly proposed workflow significantly lowers the MTTR and the handling time of engineers and specialists.