IT Performance Improvement

Auerbach Publications

Aligning People and Processes to Maintain a Resilient IT Infrastructure

Bob Yang, Catherine Anderson and George Westerman

During the last decade, the increasing pace of business and the growing dependence on IT for business operations have forced organizations to invest in technology to manage and protect their critical information assets. But keeping today's businesses operational and resilient requires more than leading-edge technologies: it requires a significant and continual investment in the people and processes that operate and support these technologies.

As dependence increases, the potential for an IT failure to disrupt business operations becomes a serious management concern. Organizations must find a way to reduce exposure to IT risks, decrease costs, and build greater capacity for IT to drive business innovation.

Despite the progressive shift to automate and streamline operations with leading-edge technology, it's ultimately people who are responsible for maintaining a resilient IT infrastructure and business continuity. Managing multiple job sites, upgrading or patching systems, as well as storing and securing critical customer, employee and partner data all take manpower. While the cause of IT failures can include technology and environmental compatibility issues, the root cause of IT failure frequently lies in process and skills issues. According to a recent study conducted by Symantec and researchers at the University of Maryland and MIT, 53 percent of IT failures were linked to process issues involving asset management, testing, change control and patching. In addition, more than 40 percent of IT failures analyzed were tied to gaps in end-user expertise and product knowledge. Graph 1 provides a breakdown of the study results.

Graph 1. Frequency of root cause of IT failure.

Master the Basics
Regular or routine activities should have established processes that are known to everyone involved. Processes enable workers to treat all components the same way, reducing the effort and potential risk that would be entailed if each component were managed differently. Furthermore, process is an effective substitute for knowledge embedded deep in a single expert's mind. What may be common knowledge for one person is not always known to another, and workforce turnover often leaves gaps in the knowledge base. In the event of an employee's permanent absence (or even a temporary one, such as sickness or vacation), a lack of processes can prove devastating if information is not passed along from one colleague to the next. Processes help facilitate a smooth knowledge transfer and ensure best practices are always followed.

A telecommunications carrier recently learned the value of processes the hard way. Without protocols in place for rotating and reusing backup tapes, the wrong backup tapes were erased and prepared for reuse. IT staff realized they had selected the wrong tapes only after the tapes had already been cleared. Establishing and following set processes for rotating backup tapes would have saved the data, which was never recovered.
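
A rotation process of this kind can be made explicit as a guard that runs before any tape is erased. The following is a minimal sketch under stated assumptions, not the carrier's actual procedure: the `Tape` record, the retention window, and both function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Tape:
    label: str          # hypothetical tape identifier
    written_on: date    # date of the backup stored on the tape


def tapes_safe_to_reuse(tapes, today, retention_days):
    """Return the tapes whose backups are older than the retention window."""
    cutoff = today - timedelta(days=retention_days)
    return [t for t in tapes if t.written_on < cutoff]


def erase_for_reuse(selected_labels, tapes, today, retention_days):
    """Approve erasure only if every selected tape is past retention.

    The whole request is refused if even one selected tape is still
    protected (or unknown), so a wrong selection is caught before any
    data is lost.
    """
    allowed = {t.label for t in tapes_safe_to_reuse(tapes, today, retention_days)}
    blocked = [label for label in selected_labels if label not in allowed]
    if blocked:
        raise ValueError(f"refusing to erase protected tapes: {blocked}")
    return list(selected_labels)  # labels cleared for erasure
```

The design choice worth noting is that the check rejects the entire request rather than silently skipping protected tapes, which forces the operator to review the selection.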

Processes must be established for disseminating information across teams of all types. The size and geographic distribution of IT departments often affect the flow of communication; however, sharing best practices and lessons learned with cross-functional groups is vital for increasing productivity and eliminating further IT headaches. For instance, take a healthcare provider with three major sites. After two sites were infected with a virus, alerting the third site to the pending danger would have prevented an infection from the same virus six months later. Why was information not shared? The answer: no communication processes were in place to share experiences and learning across locations.

Reacting to the Unexpected
Establishing and following processes provides two key benefits to IT personnel responding to incidents. First, established processes leave behind an audit trail of changes and activities that can be referred to when determining the source of a crisis. Second, depending on the needs of each individual situation, personnel can customize pre-determined protocols instead of creating new ones on the fly, saving significant time, effort, and potential for error. The processes define a checklist of critical tasks to be performed and questions to be asked, allowing people to focus their attention on identifying additional tasks rather than trying to remember all of the basics. When unexpected events occur it's nice to know that certain standards will be kept and staff can spend time effectively addressing the most critical and unique elements of the problem.
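
Both benefits, the audit trail and the customizable checklist, can be sketched in a few lines. This is an illustrative sketch only; the `Runbook` class and its method names are assumptions introduced here and do not come from the study or from any particular tool.

```python
from datetime import datetime, timezone


class Runbook:
    """A predefined checklist that records every action it performs."""

    def __init__(self, name, steps):
        self.name = name
        self.steps = list(steps)   # (description, callable) pairs
        self.audit_trail = []      # chronological record of what was done

    def with_extra_step(self, description, action):
        """Return a copy of the standard checklist customized for one incident."""
        return Runbook(self.name, self.steps + [(description, action)])

    def run(self):
        """Execute each step in order and log when it ran and what it returned."""
        for description, action in self.steps:
            result = action()
            timestamp = datetime.now(timezone.utc).isoformat()
            self.audit_trail.append((timestamp, description, result))
        return self.audit_trail
```

Because `with_extra_step` returns a new runbook, the standard checklist stays unchanged while responders adapt a copy to the situation at hand.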

In addition to building processes that keep operations running under normal conditions, IT staff must simultaneously prepare for worst-case scenarios. At the slightest hint of trouble, IT staff need to know how to react and what procedure to follow based on their accumulated awareness and expertise. When something unexpected happens that falls outside the realm of established protocol, individuals need to know where to turn for assistance or what steps to follow in order to make thoughtful and calculated decisions that will not adversely affect the IT environment. After resolving the incident, action should be taken to determine the root cause and to prevent future incidents. Dealing with the unexpected requires both familiarity with existing processes and ready access to strong expertise and knowledge.

Recently, a financial institution rolled out a weekend upgrade to its cluster environment. As the roll-out progressed, a configuration issue cropped up. Although the institution had processes in place to roll out an upgrade to its environment, there was no protocol to follow for an unsuccessful roll-out. The problem was compounded because the key systems architect was on vacation at the time. Recognizing the potential for problems with the upgrade would have enabled the institution to better prepare for and respond to the issue by having the resources available to support problem resolution in a timely manner.
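
A protocol for an unsuccessful roll-out can be as simple as pairing every upgrade step with a way to undo it and naming more than one person to call. The sketch below is hypothetical: the `run_upgrade` function, its parameters, and the structure of its return value are assumptions made for illustration, not a description of the institution's systems.

```python
def run_upgrade(steps, rollback, escalation_contacts):
    """Apply upgrade steps in order; on failure, undo them and report who to call.

    steps: list of (name, callable) pairs; a callable raises on failure.
    rollback: dict mapping a step name to a callable that undoes that step.
    escalation_contacts: ordered list of people to contact, so that one
        person's absence does not block the response.
    """
    completed = []
    for name, action in steps:
        try:
            action()
            completed.append(name)
        except Exception as error:
            # Undo the steps that already succeeded, most recent first.
            for done in reversed(completed):
                rollback[done]()
            return {
                "status": "rolled_back",
                "failed_step": name,
                "error": str(error),
                "escalate_to": list(escalation_contacts),
            }
    return {"status": "upgraded", "completed": completed}
```

Listing several escalation contacts addresses the situation in the example, where the only person with the needed expertise was unavailable.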

Processes - More than Words on a Page
Even when processes are in place, organizations struggle with getting IT staff to follow established procedures. Unfortunately, pages of notes or thick binders with step-by-step processes for handling routine or crisis situations will not guarantee success.

In many IT departments, either the processes for handling change are not comprehensive enough or the organization does not have the right pieces in place to keep them resilient. In fact, of the 53 percent of cases caused by process issues, 11 percent were due to poor execution rather than poor or missing processes. Although no set of processes can adequately address all incidents, ITIL and Six Sigma practices provide solid starting frameworks and disciplines for implementing and reliably using processes in a variety of circumstances. Adopting such practices will also help mitigate many incidents.

While processes can play important roles in handling unexpected events and ensuring mistakes don't happen, it's people who help ensure the right steps occur. Investments in proper education and training not only help IT organizations significantly improve their knowledge and skill base, but also can prepare them to manage and mitigate IT risk.

For example, a recent study conducted by IDC [1] showed that well-trained teams were twice as likely to properly protect their PCs from security threats and were 60 percent more likely to successfully complete backup jobs. With more than 40 percent of IT failures tied to gaps in skill and training, the need for proper instruction is evident. The following graph from IDC depicts the major factors that contributed most to the success of critical IT functions.

Graph 2. Key success factors for critical IT functions.

Ensuring that team members have the proper skills enables them to design and put in place the most efficient and effective processes. It equips individuals with the ability to identify and mitigate unforeseen risks, avoid unplanned product downtime, and apply best practices for utilizing technology.

Part of creating a resilient infrastructure is building a high-performance culture that can manage change effectively. In addition to training, holding IT staff to the highest operational standards, such as those held by other critical business operations within a company, will help streamline the implementation of proper procedures. Much like the manufacturing industry, which tolerates little or no downtime, IT organizations should strive to minimize their tolerance for downtime by adhering to stricter policies and procedures.

To make this paradigm shift successfully, organizations should do the following:

  • Recognize the value and need for investing in training, certification and expertise amongst staff.
  • Provide a Six Sigma-like level of attention to IT operations around process definition, documentation, performance measurement, and continuous improvement.
  • Focus on understanding the true root cause of issues rather than settling for convenient explanations, separating near-term incident management from longer-term problem management.
  • Recognize warning signs and learn from near misses. Become preoccupied with small failures as a signal of deeper process or skills issues that should be addressed before larger failures occur.
  • Build a culture of resilience so that everyone in the organization can react appropriately when inevitable problems occur.

Although there will never be a process for every situation, IT teams can eliminate the root cause of failures, and identify the cause of failures more easily, by establishing and following a standard set of protocols and equipping people with the knowledge to manage and adapt them properly. Only then can organizations build a culture and skill set that addresses the issues standard protocols cannot.

[1] Source: IDC White Paper sponsored by Symantec, "Information Security and Availability: The Impact of Training in IT Organizational Performance," Doc #206922, June 2007.

About the Authors
Bob Yang (senior director, Symantec Education Services), Catherine Anderson (Smith School of Business, University of Maryland) and George Westerman (Center for Information Systems Research, MIT Sloan School of Management).

© Copyright 2007 Auerbach Publications