<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8"> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN"> <html> <head> <title>Using Backup and Recovery to Track and Forecast Data Growth</title> <link rel=stylesheet type="text/css" href="http://ittoday.info/ITPerformanceImprovement/ITPIStyleSheet.css"> </head> <body style="background-color:#FCFCFC; font-family:sans-serif"> <center> <div class="banner"> <table width="1050" border="0" cellpadding="5" style="background-color:#1F7AA3"> <tbody> <tr valign="center"> <td width="25"> <p>&nbsp;</p> </td> <td width="210" align="left" > <form action="http://www.ittoday.info/ITPerformanceImprovement/ITPISearchResults.html" id="cse-search-box"> <p class="banner"> Search This Site <br> <input type="hidden" name="cx" value="007808019282534143292:vj47kjtjrzk" /> <input type="hidden" name="cof" value="FORID:9" /> <input type="hidden" name="ie" value="UTF-8" /> <input type="text" name="q" size="15" /> <input type="submit" name="sa" value="Search" /> <SCRIPT type=text/javascript src="http://www.google.com/coop/cse/brand?form=cse-search-box&amp;lang=en"></SCRIPT> </form> </p> </td> <td width="400" align="center" valign="center"> <a href="http://ittoday.info/ITPerformanceImprovement/"><img src="http://www.ittoday.info/images/ITPIBannerTranspW.gif" border="0" align="top"></a> </td> <td width="30"> <p>&nbsp;</p> </td> <td width="275" align="left"> <form method="GET" action="http://oi.vresp.com?fid=f853d545d5" target="vr_optin_popup"> <p class="banner"> Subscribe Free to ITPI</b> <br> <input name="email_address" size="20"/> <input type="submit" value="Subscribe"/> <font style="font-size:.9em; text-align:left; vertical-align:center;"> <br> <b>Enter e-mail address</b> <br>Powered by <a class="banner"
href="http://www.verticalresponse.com/?ref=oif" title="Email Marketing by VerticalResponse">VerticalResponse</a> </font> </form> </p> </td> </tr> </tbody> </table> </div> </center> <center> <table width="1100" valign="top" border="0" cellspacing="20"> <tbody> <tr valign="top"> <td> <table width="160" cellpadding="10" border="0" style="background-color:#DDEBF1"> <tbody> <tr> <td> <center> <a href="http://www.auerbach-publications.com"><img src="http://www.ittoday.info/images/Auerbach.gif" border="0" align="bottom" width="62" height="62"></a> </center> <p style="font-size:.75em; text-align:left; vertical-align:center; color:black">For more than 50 years, Auerbach Publications has been printing <a href="http://www.ittoday.info/catalog/cataloghome.htm">cutting-edge books on all IT topics</a>. <br> <br> </p> </td> </tr> <tr> <td> <center> <a href="http://www.ittoday.info"><img src="http://www.ittoday.info/images/ITTodayLogo125px.jpg" border="0" align="top" ></a> </center> <p style="font-size:.75em; text-align:left; vertical-align:center; color:black"><a href="http://www.ism-journal.com/ITToday/ITTarchives.htm" target="blank">Read archived articles</a> or become a new <a href="http://www.ittoday.info">subscriber</a> to <b>IT Today,</b> a free newsletter. <br> <br> </p> </td> </tr> <tr> <td> <center> <a href="http://www.infosectoday.com/"><img src="http://www.ittoday.info/images/InfoSecLogo_125px.jpg" border="0" align="top"></a> </center> <p style="font-size:.75em; text-align:left; vertical-align:center; color:black">This free newsletter offers strategies and insight to managers and hackers alike. <a href="http://www.infosectoday.com/">Become a new subscriber</a> today.
<br> </p> </td> </tr> <tr> <td> <center> <hr width="50%"> </center> </td> </tr> <tr> <td> <p style="font-size:1em; text-align:center; vertical-align:top; color:black;"> <b>Partners</b> <br> </p> </td> </tr> <tr> <td> <center> <a href="http://www.productivitypress.com" target=blank><img border=0 hspace=0 align=center src="http://www.ittoday.info/images/Productivity_Press.jpg"> </a> <br> <br> <center> </td> </tr> <tr> <td> <center> <A href="http://www.guidedinsights.com" target=blank> <img src="http://www.ittoday.info/images/Guided.jpg" border="0"> </a> <center> <!-- <p><a href="http://www.guidedinsights.com" target="blank">Guided Insights</a> helps global project teams speed time to results through better collaboration across time zones, cultures and other boundaries. Special areas of focus are remote team leadership, facilitation skills, virtual team collaboration, project jumpstart workshops and design and facilitation of virtual meetings.</p> --> </td> </tr> <tr> <td> <hr width="50%"> </center> </td> </tr> <tr> <td> <p style="font-size:1em; text-align:center; vertical-align:top; color:black;"> <b>Contact</b> </p> <p style="font-size:.75em; text-align:left; vertical-align:top; color:black; font-weight:normal;"> Interested in submitting an article?&nbsp;Want to comment about an article? 
<br> <br> Contact <a href="mailto:John.Wyzalek@TaylorandFrancis.com">John Wyzalek</a>, editor of <b>IT Performance Improvement.</b> <br> </td> </tr> </tbody> </table> </td> <td width="1px"> <p style="font-size:.05em; text-align:center; vertical-align:top; color:black;">&nbsp;</p> </td> <td width="550"> <table border="0"> <tbody> <tr valign="top"> <!--September 2010--> <td> <h2>Using Backup and Recovery to Track and Forecast Data Growth</h2> <p style="font-size:1em; text-align:left; vertical-align:top; color:black; font-weight:bold;">Preston de Guise</p> <p class="text">Even before the harsh economic reality caused by the global financial crisis, management at many companies felt overwhelmed by the challenge of predicting and budgeting for storage growth. </p> <p class="text"> A conventional problem with trying to track and budget storage growth is that it's not actually the storage we care about; it is the <i>data</i> residing on that storage that is tangible to the business. This leads to generic storage growth estimates, such as &#147;40&#037; per annum,&#148; that effectively become a self-fulfilling prophecy with little empirical evidence behind them. In other words, the estimate is fed into budgetary requests, the approved budget is spent on storage, and the resulting storage growth then becomes a year-on-year &#147;fact.&#148; </p> <p class="text"> While it's relatively easy to track the number of hard drives installed in enterprise arrays, or the size of the LUNs allocated to individual systems, this information doesn't readily map to the amount of <i>used</i> data, and therefore doesn't readily relate storage growth to data growth. Furthermore, the increasing adoption of thin provisioning&#047;just-in-time storage allocation shows how inflexible traditional allocation methods are.
For instance, it's not uncommon in a traditional allocation environment to see systems with hundreds of gigabytes, or sometimes terabytes, of allocated storage that they don't yet (and may never) need. Thin provisioning, of course, allows this space to be pseudo-allocated while physical storage is actually consumed at a much more granular rate. </p> <p class="text"> Tracking and forecasting data growth is therefore more complex than just calculating the total capacity in SANs, DAS and NAS. What IT organisations may not appreciate, though, is that when viewed from the perspective of a comprehensive backup strategy, monitoring data growth (or at least production data growth) can be quite straightforward. </p> <p class="text"> Moreover, as thin provisioning&#047;just-in-time allocation becomes more widely used within businesses, data growth forecasts derived from backup systems offer an added benefit: storage administrators and management can accurately track when incremental provisioning is likely to be required, and avoid being &#147;caught out.&#148; This becomes even more important when thin provisioning is coupled with virtualisation, because a failure to allocate additional storage at the right time may impact not one, but possibly dozens of production hosts. </p> <p class="text"> If we, for instance, consider a &#147;classic&#148; backup model involving weekly full backups with daily incrementals, then every week's full backup represents a snapshot of the amount of data used. To be sure, it may not represent a snapshot of <i>all</i> data used within the environment, but it will at least be representative of all the <i>important</i> or <i>business critical</i> data used within the environment. </p> <p class="text">This data-usage snapshot, though, is not just a single number.
It's likely (depending on the configuration) that at least the following details can be extracted from the system with relatively little effort: </p> <ul class="text"> <li>For each host backed up, the amount of data backed up per filesystem</li> <li>For NAS systems, the amount of data backed up per file share</li> <li>For databases, the amount of data backed up per database, or at least for all databases on the system</li> </ul> <p class="text">As a single snapshot of information, all of the above is interesting to a business, but hardly useful in itself for determining data growth. It becomes valuable when the weekly data-used details are extracted and compiled into longer-running statistics. For smaller businesses, this might literally be a spreadsheet, with appropriate breakdowns or pivot tables to categorise data growth by business function, business unit or simply originating host. </p> <p class="text">Larger businesses may either choose to fully automate this collation process through comprehensive scripting, or, if the expected savings from accurate data growth analysis justify the cost, purchase commercial software explicitly designed for this purpose. (One example is EMC's Data Protection Advisor (DPA) software.) </p> <p class="text">When developing models of data growth trends, and ultimately using them to forecast storage expansion, it's important that the data gathered from the backup system be coupled with an understanding of business practices and cycles. This allows regular spikes and troughs to be factored into forecasts more smoothly. For example, project-based companies may see significant changes in data usage as projects are closed or archived; educational facilities need to be aware of the trends surrounding semesters; and financial organisations are likely to see growth vary around end-of-month, end-of-quarter and end-of-financial-year processing.
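</p>
<p class="text">The collation step described above, in which weekly full-backup sizes are compiled into longer-running statistics per business unit or originating host, can be scripted in a few lines. The following is a minimal Python sketch, not a feature of any particular backup product. It assumes the backup system's reports have already been exported as hypothetical <code>(week, host, filesystem, bytes)</code> records, and that a hand-maintained <code>host_to_group</code> mapping (also hypothetical) assigns each host to a business unit or function.</p>

```python
from collections import defaultdict

# Hypothetical export format: one record per filesystem (or file share, or
# database) per weekly full backup: (ISO week label, host, filesystem, bytes).
Record = tuple[str, str, str, int]


def collate_by_group(records: list[Record],
                     host_to_group: dict[str, str]) -> dict[str, dict[str, int]]:
    """Sum weekly full-backup sizes for each business group.

    host_to_group is a hypothetical, hand-maintained mapping of host name to
    business unit or function; unmapped hosts are grouped under their own name.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for week, host, _filesystem, size in records:
        group = host_to_group.get(host, host)
        totals[group][week] += size
    # Return plain dicts with weeks in chronological (lexical ISO week) order.
    return {group: dict(sorted(weeks.items())) for group, weeks in totals.items()}


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    records = [
        ("2010-W01", "mail01", "/var", 120),
        ("2010-W01", "mail01", "/home", 80),
        ("2010-W01", "fs02", "/data", 300),
        ("2010-W02", "mail01", "/var", 125),
        ("2010-W02", "fs02", "/data", 310),
    ]
    groups = {"mail01": "Messaging", "fs02": "Finance"}
    for group, weeks in collate_by_group(records, groups).items():
        print(group, weeks)
```

<p class="text">With the sample records, the script prints <code>Messaging {'2010-W01': 200, '2010-W02': 125}</code> and <code>Finance {'2010-W01': 300, '2010-W02': 310}</code>. Hosts missing from the mapping are reported under their own name, so newly added systems still appear in the statistics rather than silently dropping out of the totals.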
</p> <p class="text">Often this is as simple as developing forecasting models that use rolling averages. For instance, to mitigate periodic spikes and troughs, calculating the figure for any month <i>n</i> as the average of months <i>n</i>, <i>n-1</i> and <i>n-2</i> can produce a smoother prediction of the growth cycle. </p> <p class="text">Consider the used-data amounts in Table 1, collated as monthly totals (for simplicity) over a year. </p> <center> <img src="http://ittoday.info/ITPerformanceImprovement/Articles/Sept2010deGuise1.JPG" border="0"> </center> <p class="text"> <b>Table 1.</b> Sample Data Amounts Used in One Year </p> <p class="text">Figure 1 graphs these data amounts and shows a data usage trend. </p> <center> <img src="http://ittoday.info/ITPerformanceImprovement/Articles/Sept2010deGuise2.JPG" border="0"> </center> <p class="text"> <b>Figure 1. </b>Data Usage Trend </p> <p class="text">However, because of the peaks and troughs, the figure for any single month does not represent actual data growth over the entire period. Applying the rolling average described above (and assuming no data is available for months from the previous year), we get the averages shown in Table 2. </p> <center> <img src="http://ittoday.info/ITPerformanceImprovement/Articles/Sept2010deGuise3.JPG" border="0"> </center> <p class="text"> <b>Table 2. </b>Average Data Usage </p> <p class="text">Figure 2 graphs the rolling average instead of the raw data and shows a more representative data growth model. </p> <center> <img src="http://ittoday.info/ITPerformanceImprovement/Articles/Sept2010deGuise4.JPG" border="0"> </center> <p class="text"> <b>Figure 2. </b>Data Growth Trend </p> <p class="text">Effectively, then, collecting information about backup sizes needs to be a continuous process; each new set of data from full backups can be added to the model, constantly improving the accuracy and extending the reach of the predictions.
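</p>
<p class="text">The rolling-average model above, refreshed as each new month of full-backup totals arrives, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the author's own tooling. The three-month trailing window follows the description above, with shorter windows for the first months because no previous-year data is assumed. The <code>months_until_full</code> helper is a hypothetical addition: it applies a simple linear extrapolation of the smoothed trend to estimate when a thin-provisioned allocation will need topping up. The sample figures are invented and are not the values in Table 1.</p>

```python
import math
from typing import Optional


def rolling_average(values: list[float], window: int = 3) -> list[float]:
    """Average each month n with months n-1 and n-2 (for window=3).

    With no data from the previous year, the first months are averaged over
    however many months are available (one month, then two).
    """
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


def mean_monthly_growth(smoothed: list[float]) -> float:
    """Average month-on-month change across the smoothed series."""
    if len(smoothed) >= 2:
        return (smoothed[-1] - smoothed[0]) / (len(smoothed) - 1)
    raise ValueError("need at least two months of data")


def months_until_full(smoothed: list[float], capacity: float) -> Optional[int]:
    """Whole months until the smoothed trend reaches capacity.

    Assumes (hypothetically) that growth continues linearly at the mean
    monthly rate. Returns 0 if capacity is already reached and None if the
    trend is flat or shrinking.
    """
    headroom = capacity - smoothed[-1]
    if headroom <= 0:
        return 0
    growth = mean_monthly_growth(smoothed)
    if growth <= 0:
        return None
    return math.ceil(headroom / growth)


if __name__ == "__main__":
    # Hypothetical monthly used-data totals in GB (not the Table 1 figures).
    usage = [100, 130, 110, 140, 120, 150]
    smoothed = rolling_average(usage)
    print([round(v, 1) for v in smoothed])
    print(months_until_full(smoothed, capacity=200))
```

<p class="text">With these sample figures the smoothed series is <code>[100.0, 115.0, 113.3, 126.7, 123.3, 136.7]</code>, and against a hypothetical 200 GB allocation the trend reaches capacity in about 9 months. Appending each new month's total to <code>usage</code> and re-running the calculation keeps the forecast current, in line with treating backup-size collection as a continuous process.</p>
<p class="text">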
In an environment already using traditional storage predictions (e.g., &#147;40&#037; growth per annum&#148;), there will of course be an overlap period during which the old model is retained while the accuracy of the new model is established. Over time, though (at least for the type of data protected by the enterprise backup system), the data growth model will become accurate enough to form the core component of storage expansion forecasting.&nbsp;&#9830; <br> <br> <b>Read more <a href="http://ittoday.info/ITPerformanceImprovement/index.htm"><i>IT Performance Improvement</i></a></b> </p> <!--DISCLAIMER NOTICE AND COPYRIGHT--> <p class="copyright"> <br> <br> Certain names and logos on this page and others may constitute trademarks, service marks, or trade names of <a HREF="http://www.crcpress.com" TARGET="_parent">Taylor & Francis LLC.</a> Copyright &#169; 2008&#8211;2010 Taylor & Francis LLC. All rights reserved. </p> </td> </tr> </tbody> </table> </td> <td width="300"> <table> <tbody> <tr> <td> <table style="background-color:#E0E0D1;" cellpadding="10" margin="5" border="0" valign="top"> <tbody> <tr> <td colspan=2> <h4>Related Book</h4> </td> </tr> <tr valign="top"> <td> <center> <img src="http://www.ittoday.info/catalog/images/covers80w/AU6396.jpg" Border=0> </center> </td> <td> <p class=text> <a href="http://www.crcpress.com/shopping_cart/products/product_detail.asp?isbn=9781420076394&AF=WAUER" target="blank">Enterprise Systems Backup and Recovery</a> <br> <br>Preston de Guise </p> </td> </tr> <tr> <td colspan="2"> <p class=text>Instead of focusing on any individual backup product, this book recommends corporate procedures and policies that need to be established for comprehensive data protection. </p> </td> </tr> <tr> <td colspan="2"> </td> </tr> <tr> <td colspan="2"> <h4>About the Author</h4> <p class="text"> <b>Preston de Guise</b> is a long-term data protection consultant. He is also an author, a programmer and an IT geek.
He writes the <a href="http://nsrd.info/blog/" target="blank">Networker Blog</a> and can be followed on <a href="http://twitter.com/prestondeguise" target="blank">Twitter</a>. He currently works for <a href="http://www.idataresolutions.com" target="blank"> IDATA Resolutions</a>. <br> </p> </td> </tr> </tbody> </table> </td> </tr> </tbody> </table> </td> </tr> </tbody> </table> </table> </body> </html>