Aligning People and Processes to Maintain a Resilient IT Infrastructure
During the last decade, the increasing pace of business and the growing dependence on IT for business operations have forced organizations to invest in technology to manage and protect their critical information assets. But keeping today's businesses operational and resilient requires more than leading-edge technologies - it requires a significant and continual investment in the people and processes that operate and support these technologies.
As dependence increases, the potential for an IT failure to disrupt business operations becomes a serious management concern. Organizations must find a way to reduce exposure to IT risks, decrease costs, and build greater capacity for IT to drive business innovation.
Despite the progressive shift to automate and streamline operations with leading-edge technology, it's ultimately people who are responsible for maintaining a resilient IT infrastructure and business continuity. Managing multiple job sites, upgrading or patching systems, as well as storing and securing critical customer, employee and partner data all take manpower. While the cause of IT failures can include technology and environmental compatibility issues, the root cause of IT failure frequently lies in process and skills issues. According to a recent study conducted by Symantec and researchers at the University of Maryland and MIT, 53 percent of IT failures were linked to process issues involving asset management, testing, change control and patching. In addition, more than 40 percent of IT failures analyzed were tied to gaps in end-user expertise and product knowledge. Graph 1 provides a breakdown of the study results.
Graph 1. Frequency of root cause of IT failure.
Master the Basics
A telecommunications carrier recently learned the value of processes the hard way. Because no protocols were in place for rotating and reusing backup tapes, the wrong tapes were erased and prepared for reuse. IT staff realized they had selected the wrong tapes only after the tapes had been cleared. Establishing and following set processes for rotating backup tapes would have saved the data, which was never recovered.
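As an illustration only, a rotation protocol of the kind the carrier lacked can be reduced to an explicit check that runs before any tape is erased. The sketch below is hypothetical: the `Tape` fields, the 30-day retention window, and the function names are assumptions made for this example and do not come from the incident described above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Tape:
    label: str
    last_backup: date    # date of the most recent backup written to this tape
    verified_copy: bool  # True if the data is confirmed to exist elsewhere


def tapes_safe_to_erase(tapes, today, retention_days=30):
    """Return labels of tapes that pass every rotation check.

    The retention window is an assumed policy value, not a recommendation.
    """
    cutoff = today - timedelta(days=retention_days)
    return [
        t.label
        for t in tapes
        if t.last_backup <= cutoff and t.verified_copy
    ]


def erase(tape, approved_labels):
    """Refuse to erase any tape that is not on the approved list."""
    if tape.label not in approved_labels:
        raise PermissionError(f"{tape.label} is not cleared for reuse")
    return f"{tape.label} erased"
```

The point of the design is that the erase step cannot run on a tape unless the eligibility check has named it first, so selecting the wrong tape produces an error rather than data loss.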
Processes must be established for disseminating information across teams of all types. The size and geographic location of IT departments often affect the flow of communication; however, sharing best practices and lessons learned with cross-functional groups is vital for increasing productivity and eliminating further IT headaches. For instance, take a healthcare provider with three major sites. After two sites were infected with a virus, alerting the third site to the danger would have prevented an infection from the same virus six months later. Why was information not shared? The answer: no communication processes were in place to share experiences and learning across locations.
Reacting to the Unexpected
In addition to building processes that keep operations running under normal conditions, IT staff must simultaneously prepare for worst-case scenarios. At the slightest hint of trouble, IT staff need to know how to react and what procedure to follow based on their accumulated awareness and expertise. When something unexpected happens that falls outside the realm of established protocol, individuals need to know where to turn for assistance or what steps to follow in order to make thoughtful and calculated decisions that will not adversely affect the IT environment. After resolving the incident, action should be taken to determine the root cause and to prevent future incidents. Dealing with the unexpected requires both familiarity with existing processes and ready access to strong expertise and knowledge.
Recently, a financial institution rolled out a weekend upgrade to its cluster environment. As the roll-out progressed, a configuration issue cropped up. Although the institution had processes in place to roll out an upgrade to its environment, there was no protocol to follow for an unsuccessful roll-out. The problem was compounded because the key systems architect was on vacation at the time. Recognizing the potential for problems with the upgrade would have enabled the institution to better prepare for and respond to the issue by having the resources available to support problem resolution in a timely manner.
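A protocol for an unsuccessful roll-out can be made concrete by refusing to begin an upgrade until every step has a matching rollback action and an escalation contact is named. The following is a minimal sketch under those assumptions; `run_upgrade`, its parameters, and the shape of the result are hypothetical and are not drawn from the institution's actual procedures.

```python
def run_upgrade(apply_steps, rollback_steps, escalation_contacts):
    """Apply upgrade steps in order; on failure, undo completed steps in reverse.

    apply_steps: list of (name, callable) pairs, run in order.
    rollback_steps: dict mapping each step name to a callable that undoes it.
    escalation_contacts: people to notify if the roll-out fails.
    """
    if not escalation_contacts:
        raise ValueError("no escalation contact is available for this roll-out")
    missing = [name for name, _ in apply_steps if name not in rollback_steps]
    if missing:
        raise ValueError(f"no rollback defined for: {', '.join(missing)}")

    completed = []
    for name, step in apply_steps:
        try:
            step()
            completed.append(name)
        except Exception as exc:
            # Undo only what was actually applied, most recent first.
            for done in reversed(completed):
                rollback_steps[done]()
            return {
                "status": "rolled_back",
                "failed_step": name,
                "error": str(exc),
                "notify": list(escalation_contacts),
            }
    return {"status": "upgraded", "failed_step": None, "error": None, "notify": []}
```

Requiring the contact list up front addresses the situation described above, where the one person able to resolve the problem was unavailable when it occurred.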
Processes - More than Words on a Page
For many IT departments, either the processes for handling change are not comprehensive enough or the organization does not have the right pieces in place to keep them resilient. In fact, of the 53 percent of cases caused by process issues, 11 percent were due to poor execution rather than poor or missing processes. Although no set of processes can adequately address every incident, ITIL and Six Sigma practices provide solid starting frameworks and disciplines for implementing and reliably using processes in a variety of circumstances. Adopting such practices will also help mitigate many incidents.
While processes can play important roles in handling unexpected events and ensuring mistakes don't happen, it's people who help ensure the right steps occur. Investments in proper education and training not only help IT organizations significantly improve their knowledge and skill base, but also can prepare them to manage and mitigate IT risk.
For example, a recent study conducted by IDC [1] showed that well-trained teams were twice as likely to properly protect their PCs from security threats and were 60 percent more likely to successfully complete backup jobs. With more than 40 percent of IT failures tied to gaps in skills and training, the need for proper instruction is evident. The following graph from IDC depicts the major factors that contributed most to the success of critical IT functions.
Graph 2. Key success factors for critical IT functions.
Ensuring the team has the proper skills enables its members to design and put in place the most efficient and effective processes. It equips individuals to identify and mitigate unforeseen risks, avoid unplanned product downtime and apply best practices for utilizing technology.
Part of creating a resilient infrastructure is building a high-performance culture that can manage change effectively. In addition to training, holding IT staff to the highest operational standards, such as those held by other critical business operations within a company, will help streamline the implementation of proper procedures. Much like the manufacturing industry, which tolerates little or no downtime, IT organizations should strive to minimize their tolerance for downtime by adhering to stricter policies and procedures.
In order to successfully make this paradigm shift, organizations should do the following:
Although there will never be a process for every situation, IT teams can eliminate the root cause of failures - and identify the cause of failures more easily - by establishing and following a standard set of protocols and equipping people with the knowledge to manage and adapt them properly. Only then can organizations build a culture and skill set that addresses the issues standard protocols cannot.
© Copyright 2007 Auerbach Publications