Hate seeing that error pop up and having to troubleshoot it? Wouldn’t it be far less taxing if the problem identified itself, along with its solutions, instead of leaving you scrambling for a fix?
Modern IT networks are increasingly important as the bedrock of successful commercial operation for organizations in every sector: retail, manufacturing, banking, financial services, aerospace, transportation, government and public sector charities, to name just a few. With the prevalence of e-commerce and digital business, all organizations need “always on” IT networks, systems and applications to run their day-to-day operations.
The Impact of the Modern Day IT Infrastructure
Modern IT networks are also becoming increasingly complex. What may have started out some years back as a simple infrastructure landscape comprising switches, routers, firewalls, modems, wired LANs and wired WANs has rapidly evolved into an IT ecosystem that includes wired and wireless network equipment, a myriad of gateways, hubs and bridges, and more recently IoT devices, smart sensors and apps.
The impact of all of the above on IT network operations personnel is manifold: increased workload, as the number of incidents being logged is rising rapidly, and increased Mean Time to Repair (MTTR), as troubleshooting the root cause takes longer when the infrastructure has more components. What is often not stated are the other impacts: increased pressure on already over-worked IT operations teams from in-house senior management, especially Line of Business (LoB) owners who cannot conduct business during an outage, and from irate customers. Don’t forget that partners and suppliers may also be hampered in conducting business when your network is not fully available.
There are a number of Operational Maturity Models that are being increasingly adopted by companies to map their IT operations capability and help shape ways to increase efficiency and effectiveness. Examples include Gartner’s IT Infrastructure and Operations Maturity Model which identifies six overall levels of maturity:
- Business Partnership
Regardless of the Maturity Models that are available, there are 4 primary modes of operation that define the capability of an IT operations function:
- Mode 1 - Reactive
- Mode 2 - Proactive
- Mode 3 - Predictive
- Mode 4 - Pre-Emptive
Let’s look at each of these in turn and why organizations today are increasingly looking for ways to elevate their IT operations function to higher levels of capability.
Mode 1 – Reactive
Organizations operating in this mode are essentially operating a “Break/Fix” model. When an incident is reported by an end user or customer, an incident (or ticket) will be logged, usually in a helpdesk system and auto-routed to an IT operations engineer for analysis and resolution. Typically this mode relies on human know-how.
Mode 2 – Proactive
In this mode, the organization is able to identify that a network component or system may be about to go off-line or suffer an outage and the reason is known. For example, a disk may be filling up and an alert has been triggered when a threshold has been breached. This mode typically relies on human know-how and “constraint-based IT tools” (i.e. admin can set a threshold for a particular parameter that can be measured and send notifications when it is breached).
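The threshold idea above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular monitoring product works: the `Threshold` class, the `breached` function and the metric names are hypothetical, and the readings are made-up values.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """A hypothetical constraint: alert when a metric reaches a limit."""
    metric: str
    limit: float  # same unit as the metric, e.g. percent of disk used


def breached(readings: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one notification message per threshold that has been breached."""
    alerts = []
    for t in thresholds:
        value = readings.get(t.metric)
        # Metrics with no reading are skipped rather than treated as breaches.
        if value is not None and value >= t.limit:
            alerts.append(f"{t.metric} at {value:.0f} (threshold {t.limit:.0f})")
    return alerts


# Illustrative values only: a disk that is 92% full against a 90% threshold.
thresholds = [Threshold("disk_used_pct", 90.0), Threshold("cpu_load_pct", 95.0)]
readings = {"disk_used_pct": 92.0, "cpu_load_pct": 40.0}
alerts = breached(readings, thresholds)
for message in alerts:
    print(message)
```

In practice the admin's know-how goes into choosing the limits; the tool only compares readings against them and sends the notification.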
Mode 3 – Predictive
For this mode, organizations have identified the issues that might affect normal network operations, and understand both the root cause and a method to fix them.
Mode 4 – Pre-Emptive
This is the highest operating mode. The IT team has identified the issues that might affect normal network operations, understood the root cause and already taken action to fix them.
Progressing to a Higher Operations Maturity Model
So how do organizations progress to these higher level modes of operations and what are the business benefits?
Operating in Reactive mode (break/fix) is the least effective and efficient model and generally is found in organizations which troubleshoot and solve problems with manpower. Over time this model becomes increasingly costly and is not scalable.
Organizations that introduce value-add IT support tools such as helpdesk software and network monitoring software, and that document their IT support processes for handling all IT incidents, can log incidents and develop a knowledge repository. This repository builds up over time and, in turn, records patterns of similar issues. Combining the information in these tools with human know-how allows an organization to migrate to Proactive mode.
The path to migrate from Proactive to Predictive should then be relatively straightforward. Organizations can extend and/or enhance their IT support tools and processes to not only locate faults and identify the root cause of the failure, but also reduce the MTTR by using the data in the tools to quickly point IT operations personnel to information that shows how to fix the issue(s).
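As a rough sketch of how a knowledge repository might point an engineer to a fix, the Python below counts which fixes resolved past incidents and suggests the most frequent ones first. The incident records, `build_repository` and `suggest_fixes` are all hypothetical; a real helpdesk system would have its own data model and search.

```python
from collections import defaultdict

# Hypothetical resolved incidents: (component, symptom, fix that worked).
resolved_incidents = [
    ("core-switch", "port flapping", "replace faulty SFP module"),
    ("core-switch", "port flapping", "replace faulty SFP module"),
    ("core-switch", "port flapping", "reseat cable"),
    ("file-server", "disk full", "purge old log files"),
]


def build_repository(incidents):
    """Count how often each fix resolved a given (component, symptom) pair."""
    repo = defaultdict(lambda: defaultdict(int))
    for component, symptom, fix in incidents:
        repo[(component, symptom)][fix] += 1
    return repo


def suggest_fixes(repo, component, symptom):
    """Return known fixes for the pair, most frequently successful first."""
    fixes = repo.get((component, symptom), {})
    return sorted(fixes, key=lambda fix: -fixes[fix])


repo = build_repository(resolved_incidents)
suggestions = suggest_fixes(repo, "core-switch", "port flapping")
for fix in suggestions:
    print(fix)
```

An unseen pair returns an empty list, which is the cue to fall back on human know-how and to record the eventual fix for next time.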
But what if we could go a step further again? In all three modes of operation above we are in effect still operating reactively, albeit at different levels of efficiency. Given the strategic importance for many businesses of having an “always on” IT network, the holy grail is to reach the point where they can pre-empt issues before they happen and take advance action to prevent them from occurring. With today’s modern IT helpdesk and network monitoring tools, organizations have collated vast data sets that hold a rich store of Intellectual Capital that can be harnessed to drive even more efficiency and effectiveness.
Companies are now augmenting their IT Operations and Infrastructure tooling with Artificial Intelligence/Machine Learning (AI/ML) and analytics tools to proactively scan their networks and identify potential issues that would cause outages or expose security threats. The outputs of these tools can be combined with the outputs from other systems that are searching or scanning other sources of Intellectual Capital in your organization to generate Decision Grade Information.
Consider, for example, that you are the managed services network provider for several very large and mid-sized corporations. Your toolset could also scan the database that holds your customer contracts and search for Service Level Agreement (SLA) commitments. Let’s assume that your systems have identified potential issues that would cause outages for several of your major customers if not acted upon soon; you may decide to prioritize the order in which customers’ issues are fixed based on their overall Contract Value or agreed SLA thresholds. Combining “operational network incident data” with such “commercial/contractual” data is a very powerful decision-support aid for prioritizing fixes on financial criteria as well as technical factors.
With the advent of software defined networks and devices, solutions to deal with such issues or threats can be automatically deployed without the need for human intervention. However, AI/ML is still an evolving science, so the greatest results come from combining human know-how, based on years of IT network experience, with the information available from these next generation tools and with human-assisted tuning of the toolsets.
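A minimal Python sketch of that kind of prioritization follows. The customers, figures and field names are invented for illustration, and the ordering rule (tightest SLA first, then highest contract value) is just one assumed policy; each provider would pick its own criteria.

```python
from dataclasses import dataclass


@dataclass
class PredictedIssue:
    """A potential outage joined with hypothetical contract data."""
    customer: str
    sla_fix_hours: float   # contractual time-to-fix commitment (assumed field)
    contract_value: float  # overall contract value (assumed field)


def prioritise(issues):
    """Order issues by tightest SLA first, then by highest contract value."""
    return sorted(issues, key=lambda i: (i.sla_fix_hours, -i.contract_value))


# Made-up data: operational findings combined with contract details.
issues = [
    PredictedIssue("Customer B", sla_fix_hours=8, contract_value=5_000_000),
    PredictedIssue("Customer C", sla_fix_hours=4, contract_value=750_000),
    PredictedIssue("Customer A", sla_fix_hours=4, contract_value=2_000_000),
]

queue = prioritise(issues)
for issue in queue:
    print(issue.customer, issue.sla_fix_hours, issue.contract_value)
```

Swapping the two elements of the sort key would rank by contract value first, which shows how small the change is between a contract-led and an SLA-led policy.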
We know that IT networks are going to become even more pervasive and complex. We also know that there is a current and growing shortage of skilled IT personnel to maintain, run and fix these networks. It’s time to start deploying next generation Infrastructure & Operational Maturity tools and offer a service which combines the power of the human brain and the computational outputs from AI/ML.
By moving your organization to Pre-Emptive operating mode, you will positively impact your customers’ experience and reduce your IT infrastructure operational costs.