When you evaluate a monitoring solution for your environment, consider the solution’s ability to integrate with other applications and services. Most notably, a monitoring solution should be able to integrate with an IT service management (ITSM) system in order to automate the ticketing process. You’ll realize three benefits as a direct result of this integration: visibility into your operation, shortened service restoration times, and reduced blame-shifting among technical teams.
Visibility into operations
In the infrastructure world, engineers and administrators are notorious for two things: solving problems before they become outages, and failing to document the problems they solve. Maybe failing is a bit unkind; let’s say forgetting instead. In either case, many environments suffer from a lack of visibility into operations because technical staff (whether intentionally or otherwise) obfuscate technical problems.
An integrated monitoring and ticketing system addresses this condition by automating the documentation of incidents. As soon as your monitoring solution detects an event, the ticketing integration sends the necessary information (such as the server name, alarm name, and time) to your ITSM solution, which creates a record of the incident. Now your service desk is aware of the issue without any extra action from the technical staff, which means engineering can stay focused on resolving issues. And because the service desk is a customer-facing organization, service desk technicians will have real-time visibility into application and infrastructure problems that may cause a sudden increase in user calls. No more “is something wrong with Exchange?” questions; now they’ll know when Exchange is in trouble.
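As a rough illustration of the handoff described above, the following Python sketch maps a monitoring event onto an incident record. Everything in it is an assumption made for illustration: the `MonitoringEvent` class, the `build_incident_payload` function, the field names, and the sample values do not come from any particular monitoring or ITSM product, and a real integration would follow the schema your ITSM vendor documents.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MonitoringEvent:
    """Hypothetical event emitted by a monitoring tool."""
    server_name: str
    alarm_name: str
    detected_at: datetime
    severity: str = "major"


def build_incident_payload(event: MonitoringEvent) -> dict:
    """Map a monitoring event onto the fields of a hypothetical ITSM incident."""
    return {
        "short_description": f"{event.alarm_name} on {event.server_name}",
        "configuration_item": event.server_name,
        "opened_at": event.detected_at.isoformat(),
        "severity": event.severity,
        "source": "monitoring",  # marks the ticket as machine-generated
    }


# Illustrative sample event; the server name and timestamp are made up.
event = MonitoringEvent(
    server_name="exch01",
    alarm_name="Mail queue length high",
    detected_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
)
payload = build_incident_payload(event)

# In a real integration this payload would be sent to the ITSM system;
# here it is only printed.
print(json.dumps(payload, indent=2))
```

The sketch stops short of sending the record anywhere, because the transport (REST call, email, message queue) depends entirely on the products involved.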
Quantifying the performance of an infrastructure or engineering team poses a challenge to even experienced managers; you can’t simply track the number of servers or switches deployed per day, or the number of patches applied by each administrator. In contrast, service desk metrics are easy to capture. How many tickets did each tech close today? How long did each ticket take? How many calls were resolved without follow-up? This information is easy to capture because ticketing systems record all of these metrics and make them available. By tracking your events the same way you track tickets, you can generate reports on the overall health of your infrastructure just as you do with your service desk. Now you can see how many events are generated each day, week, or month, how long each incident lasts, and which servers and systems generate the most tickets.
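The three reports mentioned above (event counts per period, incident duration, and the systems that generate the most tickets) can be sketched in a few lines of Python. The ticket records, field names, and function names below are hypothetical; a real report would be built on whatever export or query interface your ticketing system provides.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical ticket records exported from a ticketing system.
tickets = [
    {"server": "file01", "opened": datetime(2024, 1, 8, 2, 0),
     "resolved": datetime(2024, 1, 8, 2, 20)},
    {"server": "file01", "opened": datetime(2024, 1, 15, 2, 0),
     "resolved": datetime(2024, 1, 15, 2, 40)},
    {"server": "exch01", "opened": datetime(2024, 1, 15, 9, 30),
     "resolved": datetime(2024, 1, 15, 10, 30)},
]


def events_per_day(records):
    """Count how many tickets were opened on each calendar day."""
    return Counter(r["opened"].date() for r in records)


def mean_duration(records):
    """Average time between a ticket being opened and resolved."""
    if not records:
        return timedelta()
    total = sum((r["resolved"] - r["opened"] for r in records), timedelta())
    return total / len(records)


def noisiest_servers(records, top=3):
    """Servers ranked by the number of tickets they generated."""
    return Counter(r["server"] for r in records).most_common(top)


print(events_per_day(tickets))
print(mean_duration(tickets))
print(noisiest_servers(tickets))
```

With the sample data, the file server that raises a ticket every week rises to the top of the list even though each of its incidents is short.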
Shortened service restoration times
Justice Louis D. Brandeis famously observed, “Sunlight is said to be the best of disinfectants…” While it’s unlikely he was referring to technology (after all, Brandeis died in 1941), his statement is relevant to this discussion. When you “cut a ticket” for your events, you shine sunlight on the operation of your infrastructure. Systemic problems, such as that file server that freezes up once a week but is back online before an outage can be declared, receive attention from management through regular ticketing reviews. This situation has two immediate effects. First, all of the “normal” problems in your infrastructure are now exposed to new inquiry, and second, engineering teams have a new incentive to identify and resolve the root causes of these “normal” problems. Your infrastructure’s availability and performance improve as a result of both of these effects.
Reduced blame-shifting among technical teams
“It’s not us, it must be them” is a phrase that’s uttered at least once a day in any IT shop. Blame-shifting happens among all teams. Server teams blame the network, application teams blame the database, and the network team blames the users. But while the teams are blaming one another, the clock is ticking, and the service interruption persists.
Integration between monitoring and ticketing defuses these contentious relationships by exposing problems from all areas to the same scrutiny. A manager with access to the ITSM solution can see events from all technical teams, thereby avoiding anecdotes and focusing solely on empirical evidence. Again, the focus can now be put on restoring service, not on assigning blame.
“You can’t manage what you don’t count.” – Arnold Felberbaum, adjunct professor, NYU Tandon School of Engineering
To effectively manage any busy IT operation, you need a direct view into the day-to-day events that occur in even the most well-designed and administered environment, and you need the ability to quantify and measure those events. When you integrate your monitoring solution with your ticketing solution, you acquire visibility into your operations, resolve problems faster, and ease tensions between your technical teams. The combination of these benefits contributes to a more efficient infrastructure for your applications.