The digital transformation of organizations is accelerating for numerous reasons: existing systems are being upgraded, new tools introduced, existing workflows digitalized and new digital workflows rolled out at ever-increasing frequency. At the same time, developments such as IoT, mobile and blockchain are driving a continual, and in part exponential, increase in the volume of accumulated data.
These trends almost inevitably make IT landscapes, which are already complex and fragmented today, increasingly heterogeneous and error-prone. Operating, supporting and analyzing them, and optimizing them for efficiency, speed and quality while continually ensuring conformance and compliance, therefore becomes an ever-growing challenge.
IT service management in particular highlights this trend, since executing digitalized business processes requires the smoothest possible interplay across different systems and departments.
Specifically, it must be known as precisely as possible, at all times, where an issue exists or could arise, and it must be possible to initiate the appropriate measures promptly, either to resolve the issue or to prevent it from occurring in the first place.
At the same time, there is a demand to continually improve the service experience on the customer side without sacrificing the efficiency of the IT service processes.
In this article, we use examples to illustrate how process mining combined with artificial intelligence can continually optimize the "service experience" as well as efficiency, speed and quality under conditions of increasing IT complexity, while ensuring conformance and compliance through proactive and predictive IT service management fully in line with ITIL Continual Service Improvement (CSI).
Process Mining and Execution in the End-to-End Scenario of "Issue-to-Resolve"
In our example, IT service management, which belongs to the end-to-end "issue-to-resolve" process area, is supported throughout by the cloud-based ServiceNow platform, in which cases, incidents, service requests and change requests can be processed consistently, without media or system breaks.
For analysis and reporting, ServiceNow already provides IT service managers with a comprehensive set of built-in analysis and reporting options, such as ticket counts and SLA fulfillment rates broken down by criteria like IT asset class or product category, customer, ticket type, priority and time period.
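To make the contrast with the exploratory approach concrete, the following minimal Python sketch shows the kind of aggregated, predefined KPI such reports deliver: ticket counts and SLA fulfillment rates per chosen grouping dimension. The `Ticket` fields and the sample data are hypothetical and do not reflect ServiceNow's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ticket:
    ticket_id: str
    category: str   # IT asset class or product category
    priority: str
    sla_met: bool   # True if the ticket was resolved within its SLA


def sla_fulfillment_by(tickets, key):
    """Return {group: (ticket_count, sla_fulfillment_rate)} for a grouping function."""
    counts = defaultdict(int)
    met = defaultdict(int)
    for ticket in tickets:
        group = key(ticket)
        counts[group] += 1
        met[group] += int(ticket.sla_met)
    return {group: (counts[group], met[group] / counts[group]) for group in counts}


tickets = [
    Ticket("INC001", "Laptop", "P2", True),
    Ticket("INC002", "Laptop", "P1", False),
    Ticket("INC003", "ERP", "P3", True),
    Ticket("INC004", "Laptop", "P3", True),
]
by_category = sla_fulfillment_by(tickets, key=lambda t: t.category)
by_priority = sla_fulfillment_by(tickets, key=lambda t: t.priority)
print(by_category)
```

Every dimension has to be chosen up front; a pattern along a dimension nobody thought to group by never shows up, which is exactly the limitation of this affirmative approach.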
Unfortunately, these classical business intelligence methods are not sufficient for truly understanding processes, for detecting and directly resolving issues in process flows, and thus for continually improving processes. Their affirmative orientation makes them well suited to tracking optimization opportunities that are already known, but they lack the exploratory capabilities needed to identify previously unknown weaknesses and acute operational process issues.
The Celonis Execution Management System is therefore an ideal complement to ServiceNow when it comes to analyzing processes as they actually run, in near real time, diagnosing systemic and acute operational problems, and intervening preventively or remedially in running process instances that are already problematic or likely to become so.
Thanks to the ServiceNow connectors available in the Celonis App Store, the data connection between ServiceNow and Celonis can be set up quickly and easily across all IT service processes. Once the connection is established, Celonis can continually retrieve current transaction data from ServiceNow (in our example, service ticket data plus metadata and master data) in order to perform end-to-end analyses.
Classical process mining makes it possible to display, analyze, simulate and benchmark IT service processes and their variants, including key figures, based on the ServiceNow data. It also allows these processes to be examined for conformance and compliance, and for the root causes and actual impacts of non-conformance and non-compliance.
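As a minimal sketch of the core idea behind variant analysis (not Celonis's implementation), the following Python code groups a hypothetical event log of (case ID, activity, timestamp) records into one trace per ticket, counts the resulting process variants and computes each ticket's throughput time. The activity names and timestamps are illustrative assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log extracted from ticket history: (case_id, activity, timestamp)
EVENTS = [
    ("INC001", "Create", "2021-03-01 08:00"),
    ("INC001", "Assign", "2021-03-01 08:10"),
    ("INC001", "Resolve", "2021-03-01 09:00"),
    ("INC002", "Create", "2021-03-01 08:05"),
    ("INC002", "Assign", "2021-03-01 08:20"),
    ("INC002", "Reassign", "2021-03-01 10:00"),
    ("INC002", "Resolve", "2021-03-02 08:05"),
    ("INC003", "Create", "2021-03-01 09:00"),
    ("INC003", "Assign", "2021-03-01 09:05"),
    ("INC003", "Resolve", "2021-03-01 09:45"),
]


def build_traces(events):
    """Group events by case and order each case's events by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))
    return {case_id: sorted(steps) for case_id, steps in by_case.items()}


def count_variants(traces):
    """A variant is the ordered sequence of activities of a case."""
    return Counter(tuple(activity for _, activity in steps) for steps in traces.values())


def throughput_hours(traces):
    """Time from the first to the last event of each case, in hours."""
    return {
        case_id: (steps[-1][0] - steps[0][0]).total_seconds() / 3600
        for case_id, steps in traces.items()
    }


traces = build_traces(EVENTS)
variants = count_variants(traces)
for variant, n in variants.most_common():
    print(n, " -> ".join(variant))
print(throughput_hours(traces))
```

Here two tickets follow the direct path and one detours through a reassignment, which also shows up as a much longer throughput time.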
The "Action Engine" of the Celonis Execution Management System goes even further: it automates process steps, or proactively and predictively provides IT service managers, via their personal inboxes and optional push notifications, with concrete operational recommendations, so-called "next best actions", for resolving issues in a targeted way, containing them, or even avoiding them before they arise.
Examples include automated interventions, or recommended manual ones, for multi-hop tickets, repeated "Hey Joe" tickets from specific user groups, unassigned high-priority tickets, and tickets that are incorrectly assigned or likely to breach an SLA.
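The following sketch illustrates how two of these signals could be derived with simple rules. It is a hypothetical illustration, not the Action Engine's logic: the `OpenTicket` structure, the hop threshold `MAX_HOPS` and the priority cut-off `HIGH_PRIORITY` are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_HOPS = 3        # hypothetical: 3+ changes of assignment group = "multi-hop"
HIGH_PRIORITY = 2   # hypothetical: priorities 1 and 2 count as high


@dataclass
class OpenTicket:
    ticket_id: str
    priority: int                      # 1 = highest
    assignment_group: Optional[str]    # None = not yet assigned
    group_history: List[str] = field(default_factory=list)  # groups in order


def count_hops(history):
    """Number of times the ticket moved to a different assignment group."""
    return sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)


def next_best_actions(tickets):
    """Return (ticket_id, recommendation) pairs for rule-based findings."""
    actions = []
    for t in tickets:
        if t.assignment_group is None and t.priority <= HIGH_PRIORITY:
            actions.append((t.ticket_id, "assign unassigned high-priority ticket"))
        if count_hops(t.group_history) >= MAX_HOPS:
            actions.append((t.ticket_id, "review routing of multi-hop ticket"))
    return actions


open_tickets = [
    OpenTicket("INC010", 1, None),
    OpenTicket("INC011", 3, "Network", ["ServiceDesk", "Network", "ServiceDesk", "Network"]),
    OpenTicket("INC012", 2, "ERP", ["ServiceDesk", "ERP"]),
]
for ticket_id, action in next_best_actions(open_tickets):
    print(ticket_id, action)
```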
Added Utility Through Machine Learning
With the aid of artificial intelligence, the Celonis Execution Management System can also detect relationships that classical process mining analysis cannot identify and that require the additional capabilities of machine learning.
Situations of this kind include significant and sudden changes in key figures, outliers beyond a particular range, and certain types of events and deviations from expected developments, in each case relative to common models and trends. Time-related anomalies are of fundamental interest here, since the history of individual service ticket types shows that such anomalies, or the reports of them, typically accumulate at certain times.
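One common, simple way to flag outliers outside an expected range is a robust z-score relative to a trailing window. The sketch below is an illustrative assumption, not the method Celonis uses: it compares each day's ticket count with the median and median absolute deviation (MAD) of the preceding `window` days and flags values more than `k` scaled MADs away. The window length, the threshold of 3.5 and the floor of one ticket on the scale are hypothetical choices.

```python
from statistics import median


def robust_outliers(values, window=14, k=3.5, min_scale=1.0):
    """Flag values that deviate strongly from the median of the preceding window.

    The scale is the median absolute deviation (MAD) times 1.4826, which makes it
    comparable to a standard deviation for normally distributed data. It is floored
    at `min_scale` (here: one ticket) so that a perfectly flat history does not
    lead to division by zero.
    """
    flags = []
    for i, x in enumerate(values):
        if i < window:
            flags.append(False)  # not enough history yet
            continue
        history = values[i - window:i]
        med = median(history)
        mad = median(abs(v - med) for v in history)
        scale = max(1.4826 * mad, min_scale)
        flags.append(abs(x - med) / scale > k)
    return flags


daily_tickets = [20, 22, 19, 21, 20, 23, 18, 21, 20, 22, 19, 21, 20, 22, 60, 21]
flags = robust_outliers(daily_tickets)
print([i for i, f in enumerate(flags) if f])
```

Using the median and MAD instead of mean and standard deviation keeps a single spike, such as the 60 tickets above, from inflating the baseline for the days that follow.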
To create and run complex machine learning algorithms, Celonis uses its own integrated machine learning module, which on the one hand already contains preconfigured use cases, such as duplicate detection, and on the other hand also supports custom use cases, as in the present example.
In IT service management, these methods enable the automatic detection of anomalies over time in the service tickets raised for an IT asset class or product category. The responsible IT service manager can then be notified and can contact the IT asset manager or product manager directly to determine the reason for the unusual ticket frequency, whether it concerns the level, the volatility or the dynamics of the change.
The reasons for such anomalies can be highly varied, ranging from system changes rolled out without sufficient prior user training on what has changed to more extensive system malfunctions or malware attacks. This kind of machine learning application is naturally more effective the higher the overall service ticket volume, which is why the typical target group for our example comprises group help desks, managed service providers and system houses.
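Why volume matters can be made plausible with a short calculation. Assuming, as a simplification, that the number of tickets $N$ per interval is Poisson-distributed with mean $\lambda$, the random fluctuation shrinks relative to the mean as $\lambda$ grows. A relative increase $\delta$ shifts the mean by $\delta\lambda$, which corresponds to $z$ standard deviations:

```latex
N \sim \mathrm{Poisson}(\lambda), \qquad \mathbb{E}[N] = \lambda, \qquad \sigma_N = \sqrt{\lambda}

z = \frac{\delta \lambda}{\sqrt{\lambda}} = \delta \sqrt{\lambda}

\delta = 0.2:\qquad \lambda = 4 \;\Rightarrow\; z = 0.4, \qquad \lambda = 400 \;\Rightarrow\; z = 4
```

Under this assumption, a 20% increase is buried in the noise at four tickets per day but lies four standard deviations above normal at 400 tickets per day.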
Meeting the Typical Challenges of Machine Learning
In practice, machine learning initiatives are often characterized by high complexity and high costs, as well as by the promise of high added value that frequently goes unfulfilled in productive use. As a result, according to surveys, up to 85% of all machine learning initiatives fail in operational use.
Celonis addresses these challenges through seamless end-to-end integration into the Execution Management System. This allows immediate use across company-wide processes without complex application integrations and, thanks to existing scalable computing resources, without cost-intensive resources for the complete data science life cycle, including model runs and data pipelines.
Moreover, the machine learning environment has unrestricted access to the previously cleaned and pre-structured process data, and the integrated infrastructure for testing and training machine learning models with PyCelonis, a Celonis-specific Python package, provides access to all Celonis modules and their contents. In addition, process mining itself provides full transparency on all basic process-related KPIs, against which the ROI of a Celonis machine learning project can be assessed continually and transparently.
Mode of Functioning and Procedure Within Celonis
Celonis uses preconfigured dashboards to provide transparency on the various process key figures, including overviews of all service-ticket-related information, such as trends in ticket volume per IT asset class or product category.
By regularly observing the displayed ticket history, an experienced IT service manager will usually be able to spot anomalies that warrant investigation.
Such manual observation in real time is time-consuming, not very reliable and generally inadequate, and it is precisely the kind of task that machine learning can automate very well.
For the statistical analyses and the machine learning models required in this example, the Celonis Machine Learning Workbench provides a fully integrated Jupyter Notebook development environment.
To actually detect anomalies in service ticket volume, the existing Celonis data model is used. It stores all relevant ServiceNow process data and contains, among other things, the key figure for the number of service tickets as well as the associated ticket metadata and master data, which serve as the basis for training the detection algorithm.
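Before a model can be trained, ticket records have to be turned into one regular count series per IT asset class or product category. The minimal sketch below assumes hypothetical input records of (category, opening date); days without tickets are filled with zero so that each series has exactly one value per day, which seasonal methods rely on.

```python
from collections import Counter
from datetime import date, timedelta


def daily_counts(tickets, start, end):
    """Build one zero-filled daily ticket-count series per category.

    tickets: iterable of (category, opened_on) tuples, opened_on being a date.
    Returns {category: [count on start, count on start + 1 day, ..., count on end]}.
    Tickets opened outside [start, end] are ignored.
    """
    counter = Counter(tickets)
    categories = sorted({category for category, _ in counter})
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    return {c: [counter[(c, d)] for d in days] for c in categories}


tickets = [
    ("Laptop", date(2021, 3, 1)),
    ("Laptop", date(2021, 3, 1)),
    ("ERP", date(2021, 3, 3)),
    ("Laptop", date(2021, 3, 3)),
]
series = daily_counts(tickets, date(2021, 3, 1), date(2021, 3, 3))
print(series)  # {'ERP': [0, 0, 1], 'Laptop': [2, 0, 1]}
```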
The "seasonal decomposition" algorithm is used to extract the daily seasonal components. In principle, other preconfigured algorithms can also be used, for example to perform other data cleaning or enrichment steps.
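Classical additive seasonal decomposition splits a series into trend, seasonal and residual components: the trend is a centered moving average over one seasonal period, the seasonal component is the average detrended value at each position in the cycle (centered so it sums to zero), and the residual is what remains. The following pure-Python sketch shows this textbook variant; it is not necessarily the exact configuration used in Celonis. Assuming hourly ticket counts, `period=24` captures the daily cycle; with daily counts, `period=7` would capture a weekly cycle instead.

```python
import math
from statistics import mean


def centered_moving_average(x, period):
    """Trend estimate; None where the window does not fit (series edges)."""
    half = period // 2
    trend = [None] * len(x)
    for i in range(half, len(x) - half):
        window = x[i - half:i + half + 1]
        if period % 2:  # odd period: plain moving average over `period` points
            trend[i] = sum(window) / period
        else:           # even period: 2 x period MA, half weight on both ends
            trend[i] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def seasonal_decompose_additive(x, period):
    """Classical additive decomposition x = trend + seasonal + residual."""
    if len(x) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    trend = centered_moving_average(x, period)
    # Average the detrended values at each position of the cycle ...
    buckets = [[] for _ in range(period)]
    for i, (xi, ti) in enumerate(zip(x, trend)):
        if ti is not None:
            buckets[i % period].append(xi - ti)
    indices = [mean(b) for b in buckets]
    # ... and center them so the seasonal component sums to zero over one cycle.
    offset = mean(indices)
    indices = [s - offset for s in indices]
    seasonal = [indices[i % period] for i in range(len(x))]
    resid = [None if t is None else xi - t - s for xi, t, s in zip(x, trend, seasonal)]
    return trend, seasonal, resid


# Synthetic hourly ticket counts over one week: linear trend plus a daily cycle.
hourly = [50 + 0.1 * t + 10 * math.sin(2 * math.pi * (t % 24) / 24) for t in range(24 * 7)]
trend, seasonal, resid = seasonal_decompose_additive(hourly, period=24)
print(max(abs(r) for r in resid if r is not None))  # close to 0 for this noise-free series
```

In practice, a library implementation such as `seasonal_decompose` from `statsmodels.tsa.seasonal` would typically be used instead. On real data, the residual component is what the subsequent anomaly detection inspects.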
Once trained, the machine learning model can be applied to new service ticket data for anomaly detection: it identifies which data points of each IT asset class or product category are normal or unusual and writes this information back to the data model.
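The following sketch shows, in simplified form, how a fitted model could be applied to new data per category and turned into rows to write back. As a deliberate simplification, the hypothetical "model" here is only the average value at each seasonal position plus the standard deviation of the training residuals; a point is flagged when it deviates from its expected value by more than `k` standard deviations. The row layout and the `min_sigma` floor are assumptions, and uploading the rows to the data model (for example via PyCelonis) is not shown.

```python
from statistics import mean, pstdev


def fit_profile(history, period):
    """Simplified 'model': expected count per seasonal position + residual spread.

    Assumes history[0] falls on seasonal position 0.
    """
    if len(history) < 2 * period:
        raise ValueError("need at least two full seasonal cycles")
    expected = [mean(history[p::period]) for p in range(period)]
    residuals = [x - expected[i % period] for i, x in enumerate(history)]
    return expected, pstdev(residuals)


def flag_new_points(category, new_values, start_index, profile, k=3.0, min_sigma=1.0):
    """Label each new point as normal or unusual; rows are shaped for writing back."""
    expected, sigma = profile
    sigma = max(sigma, min_sigma)  # floor avoids flagging tiny deviations on flat history
    period = len(expected)
    rows = []
    for j, value in enumerate(new_values):
        pos = (start_index + j) % period
        rows.append({
            "category": category,
            "t": start_index + j,
            "value": value,
            "expected": expected[pos],
            "is_anomaly": abs(value - expected[pos]) > k * sigma,
        })
    return rows


history = [10, 12, 11, 13, 9, 4, 3] * 4   # four weeks of daily counts, one category
profile = fit_profile(history, period=7)
rows = flag_new_points("Laptop", [10, 12, 30, 13, 9, 4, 3], start_index=28, profile=profile)
print([r["t"] for r in rows if r["is_anomaly"]])
```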
Based on the extended data model, various key figures, such as "anomalies per day", and time series plots can be displayed.
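Once the flags are stored alongside the ticket counts, a KPI such as "anomalies per day" is a simple aggregation. The sketch below assumes a hypothetical row format with `day`, `category` and `is_anomaly` fields.

```python
from collections import Counter
from operator import itemgetter


def anomalies_per_day(rows, by_category=False):
    """Count flagged points per day, optionally split by category."""
    key = itemgetter("day", "category") if by_category else itemgetter("day")
    return Counter(key(r) for r in rows if r["is_anomaly"])


rows = [
    {"day": "2021-03-03", "category": "Laptop", "is_anomaly": True},
    {"day": "2021-03-03", "category": "ERP", "is_anomaly": True},
    {"day": "2021-03-03", "category": "Printer", "is_anomaly": False},
    {"day": "2021-03-04", "category": "Laptop", "is_anomaly": True},
]
print(anomalies_per_day(rows))
print(anomalies_per_day(rows, by_category=True))
```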
The central function, however, is to notify an IT service manager promptly and automatically as soon as an anomaly occurs in his or her area of responsibility. It is therefore advisable to schedule both the ETL cycle and the anomaly detection to run at the shortest practical intervals.
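A minimal sketch of the notification step, assuming a hypothetical ownership table that maps each IT asset class or product category to its responsible IT service manager. Because the detection is meant to run at short intervals, the same anomaly would otherwise be reported on every run; the sketch therefore remembers which (category, day) pairs have already been reported. Actual delivery (inbox entry, push notification, e-mail) is out of scope.

```python
from collections import defaultdict

# Hypothetical ownership table: category -> responsible IT service manager
RESPONSIBLE = {"Laptop": "manager.a@example.com", "ERP": "manager.b@example.com"}


def build_notifications(rows, already_notified, responsible=RESPONSIBLE):
    """Group new anomalies by responsible manager, skipping ones reported in earlier runs.

    `already_notified` is a set of (category, day) pairs; it is updated in place so
    that the next scheduled run does not repeat the same notification.
    """
    outbox = defaultdict(list)
    for r in rows:
        key = (r["category"], r["day"])
        if not r["is_anomaly"] or key in already_notified:
            continue
        manager = responsible.get(r["category"])
        if manager is None:
            continue  # no owner on record; a real setup might use a fallback queue
        message = f"{r['category']}: unusual ticket volume on {r['day']} ({r['value']} tickets)"
        outbox[manager].append(message)
        already_notified.add(key)
    return dict(outbox)


rows = [
    {"category": "Laptop", "day": "2021-03-03", "value": 30, "is_anomaly": True},
    {"category": "ERP", "day": "2021-03-03", "value": 5, "is_anomaly": False},
    {"category": "Laptop", "day": "2021-03-04", "value": 28, "is_anomaly": True},
]
sent = set()
first_run = build_notifications(rows, sent)
second_run = build_notifications(rows, sent)  # same data again: nothing new to send
print(first_run)
print(second_run)  # {}
```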
Authors: Daniel Misof and Patrick Schneider, Scheer GmbH