Playing Operation with Systems


We live in a hectic world of day-to-day operations and aggressive business requirements that must be facilitated and made possible by IT — often in a vacuum. And we do all this while being expected to keep operations running consistently, repeatedly, and without interruption.

These are the challenges, and rightfully the woes, of the typical systems administrator or systems operator. What is often lost on those at levels above or below these core operators is that they are always on: not only when things are at a lull and they’re performing regular maintenance, but also when things are at a peak and everyone is running around with their hair on fire.

So what exactly is it that bridges the skill sets between an operator and a systems administrator?

Patch Tuesday is every day

Anyone who has a role in IT knows the pains of Patch Tuesday. But the larger your organization and the more applications you support, the more often you realize that Tuesday is any day that ends in a “y,” and often that day runs late into the night.

This role is daunting enough to drive even the most seasoned professionals crazy. Imagine you were a doctor, and every single week someone was rushed into your care, and instead of simply checking them for a fever you had to perform invasive surgery while they were awake and breathing, standing right there in front of you. Now imagine doing that to hundreds or thousands of patients a day, knowing that if anything went wrong with a single one of them, it would be your head on the chopping block!

This is what IT professionals and systems administrators and operators face on a daily basis. It doesn’t matter if your company makes mythical widgets that are bought by other mythical organizations like Contoso, or you’re a financial services company that provides banking, credit, and information used by hundreds of millions of consumers. Same head. Same chopping block.

How not to turn noses red

If there is one takeaway we can gather from the game of Operation, which really is the best parallel for actual IT systems operations, it is that we often get only one chance. One chance to do things right, and if we fail, BUZZ. RED LIGHTS. And sometimes that turns into a resume-generating event.

So what can we do about it? We live in a world where we do not have to rely on tiny tweezers to solve complex problems, and where we have at our disposal a deep and wide community of people who have likely experienced the same pains we have, time and time again. The best part? They often publish their results and findings for others to use. Beyond having access to alternative solutions and ideas, we have the ability to reduce or mitigate some of the challenges of operations, so we can get back to a known good state if things go south. Here are a few things to remember to help ensure a successful marriage of systems and operations:

• If you’re going to do it more than once, document it the first time.
• If you’re going to do it more than once a week, automate it.
• If you’re going to perform it against a single system, back it up.
• If you haven’t done it before, test it.
• If you haven’t tested it before, clone and snapshot it.

And while this is by no means a definitive list, the key is to proceduralize as much as you can in your systems and operations. A lot of actions will be repetitive, regular, and consistent. Each additional system should not add a full machine’s worth of work, but should instead be a repeat of what you’ve already done and can do again reliably.
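For readers who like to see the rules of thumb as a procedure, here is a minimal Python sketch, not a definitive implementation. It assumes the thing being changed is a single file, and the names apply_with_backup, run_on_fleet, change, and verify are hypothetical helpers made up for this illustration. The idea is simply: back it up, make the change, test it, and get back to good if the test fails.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict, Iterable


def apply_with_backup(
    target: Path,
    change: Callable[[Path], object],
    verify: Callable[[Path], bool],
) -> bool:
    """Back up `target`, apply `change`, and restore the backup if `verify` fails.

    Returns True when the change is kept, False when it was rolled back.
    (Hypothetical helper for illustration only.)
    """
    backup = target.with_name(target.name + ".bak")
    shutil.copy2(target, backup)  # back it up before touching it
    try:
        change(target)            # the operation itself
        ok = verify(target)       # test it
    except Exception:
        ok = False
    if not ok:
        shutil.copy2(backup, target)  # roll back to the known good copy
    return ok


def run_on_fleet(
    targets: Iterable[Path],
    change: Callable[[Path], object],
    verify: Callable[[Path], bool],
) -> Dict[str, bool]:
    """Repeat the same procedure on every target and report per-target results."""
    return {str(t): apply_with_backup(t, change, verify) for t in targets}


if __name__ == "__main__":
    # Demo against a throwaway file so nothing real is modified.
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "app.conf"
        conf.write_text("max_connections=100\n")

        kept = apply_with_backup(
            conf,
            change=lambda p: p.write_text("max_connections=200\n"),
            verify=lambda p: p.read_text().startswith("max_connections="),
        )
        print("change kept:", kept)
```

Because the procedure lives in one function, the hundredth system costs about as much effort as the second: the same steps are applied by run_on_fleet rather than retyped by hand.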

There is one last bullet to add that was not mentioned above: Don’t try to reinvent the wheel. Spend a few minutes to see if someone has already invented, packaged, and documented it so you can use it. As discussed earlier, we have an awesome community in IT. Someone else may have already done what you’re planning to do, so learn from their experiences, their downfalls, and their successes.

If you do that, you may just come out of this with all your body parts intact!

This blog post was previously published on SolarWinds’ IT Resource Center.