
Virtualization, Storage Tiers and Manual Data Movement

Hubbert Smith

The main problem with unstructured data growth is rooted in the difficulty of managing the data lifecycle. Data begins life hot and frequently accessed; then it ages, and at some point it should be moved to a lower tier. As Figure 1 illustrates, moving data to a lower tier breaks the links.

Figure 1. The Challenge of Migrating Unstructured NAS Data

In this example, "Volume-A" (2) is filling up, and aging data needs to be moved to "Volume-B" (3). Physically copying the data is no big deal; the problem is that every link breaks. The servers (1) look for data on Vol-A, where it once was, but the data is now on Vol-B and the servers cannot find it. Yuk. The path of least resistance is to simply expand "Volume-A". But RAID groups must consist of the same, consistent drive type (for example, a RAID group of fifteen 600GB 15K drives), so expanding that RAID group means adding more 600GB 15K drives, with no option to expand with capacity-optimized drives.

The alternative is a virtual file approach, which uses file stubs and pointers (see Figure 2). The client sees the stub, and the virtualization software consults a lookup table, which maps the stub to the location of the real data. The big advantage is that the stubs (i.e., pointers) can remain in place, so client access to files is never disrupted, while the files themselves are migrated as they age. This translates to less expensive storage, less power, less data center floor space, less backup of the old data, and more backup of the new and changing data.
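To make the stub-and-lookup idea concrete, here is a minimal Python sketch. The table, paths, and resolve() helper are hypothetical illustrations; a real file virtualization product implements this inside the NAS protocol layer, not in application code.

    # Minimal sketch of file virtualization: clients address files by a
    # stable virtual path; a lookup table maps each virtual path (the
    # stub) to wherever the real data currently lives.

    # Hypothetical lookup table: virtual path -> physical location
    lookup_table = {
        "/projects/report.doc": "/mnt/volume-a/report.doc",      # Tier-1
        "/projects/old-plan.doc": "/mnt/volume-b/old-plan.doc",  # Tier-2
    }

    def resolve(virtual_path: str) -> str:
        """Return the physical location behind a client-visible stub."""
        return lookup_table[virtual_path]

    # The client only ever sees the virtual path; the physical location
    # behind it can change without the client noticing.
    print(resolve("/projects/report.doc"))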

Figure 2. File Virtualization and Migration Using a Pointer System

Clients have pointers (1) to NAS data. The virtualization system provides file pointers (2), which allow clients to find the actual data (3). The beauty of this system is that if the storage administrator chooses to move some portion of the data from Tier-1 storage, "Volume-A" (4), to Tier-2 storage, "Volume-B" (5), the files can be moved from Volume-A to Volume-B and the pointers (2) are updated to point to Volume-B. It's all good: data migrated, no server or client reconfiguration required, no unstoppable growth of expensive storage; and it all still works. That's the manual version, where humans are involved in data movement.
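Continuing that hypothetical sketch, the manual migration step amounts to copying the file to Tier-2, repointing the table entry, and reclaiming the Tier-1 space; the virtual path the clients use never changes. The migrate() helper and volume paths below are assumptions for illustration.

    import os
    import shutil

    # Hypothetical lookup table: virtual path -> current physical location
    lookup_table = {
        "/projects/report.doc": "/mnt/volume-a/report.doc",  # Tier-1 today
    }

    def migrate(virtual_path: str, tier2_dir: str) -> None:
        """Move the real data to Tier-2 and repoint the stub.

        Clients keep using the same virtual path throughout."""
        old_location = lookup_table[virtual_path]
        new_location = os.path.join(tier2_dir, os.path.basename(old_location))
        shutil.copy2(old_location, new_location)   # physically copy the data
        lookup_table[virtual_path] = new_location  # update the pointer
        os.remove(old_location)                    # reclaim Tier-1 space

    # migrate("/projects/report.doc", "/mnt/volume-b")
    # resolve() would now return the Volume-B location; no client
    # reconfiguration required.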

There is a more advanced version in which the system has some fast storage and some slower/bigger storage and can figure out what data to put where, using a similar pointer system, without human intervention. As wild as that sounds, this computer science problem has been solved many, many times: servers and PCs have RAM, and the operating system manages what data is stored in RAM and what data is stored on hard disk drive(s). Even better, many server CPUs have good-better-best caches: the L1, L2, and L3 caching systems.
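A homegrown approximation of that automatic behavior might look like the following sketch, which assumes tier mount points and a 90-day idle threshold and demotes files by last access time; commercial auto-tiering does this continuously and transparently behind the pointer layer.

    import os
    import shutil
    import time

    TIER1 = "/mnt/tier1"  # assumed fast, expensive storage
    TIER2 = "/mnt/tier2"  # assumed big, capacity-optimized storage
    MAX_AGE_DAYS = 90     # assumed policy: demote after 90 idle days

    def demote_cold_files() -> None:
        """Scan Tier-1 and move files not accessed recently to Tier-2."""
        cutoff = time.time() - MAX_AGE_DAYS * 86400
        for name in os.listdir(TIER1):
            path = os.path.join(TIER1, name)
            if os.path.isfile(path) and os.stat(path).st_atime < cutoff:
                shutil.move(path, os.path.join(TIER2, name))
                # A virtualization layer would update its pointer here so
                # clients keep finding the file at the same virtual path.

    # demote_cold_files()  # run periodically, e.g., from cron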

Migration, by contrast, is a manual process for moving data from an old hardware platform to a newer, better one (retiring old hardware saves money and improves the consistency, manageability, and supportability of IT hardware).

Tiering is similar, but rather than retiring hardware, we keep the hardware and move the data as it goes through its lifecycle. Virtualization is the tool that enables data movement from Tier-1 to Tier-2 to archive. The important function is to keep Tier-1 clear of stale data and make room for new incoming hot data (see Table 1).

Table 1. Tiering Before and After

Information Lifecycle Management (ILM or HSM)

Information Lifecycle Management (ILM) and Hierarchical Storage Management (HSM) have both been around for years and attack the same problems as migration and tiered data movement. Some IT shops find these tools invaluable; others have found another way to manage aging data (usually involving scripts). A while back (around 2004) I interviewed a data center manager for a bank. His bank had a process of keeping recent banking records (less than 32 days old) on Tier-1 storage; any records older than 32 days went to high-capacity storage using SATA drives, and any data older than a year was archived. I asked what portion of the data was in Tier-1 and what portion was in Tier-2; he replied 10% to 90%. He used capabilities in the banking database to select and extract older data and move it from Tier-1 storage to Tier-2. He used processes and scripting, homegrown automation, rather than ILM or HSM to manage the bank's tiering. He had the advantage of dealing with highly structured relational database data and was able to take advantage of the database tools.
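As a rough illustration of that database-driven approach (with sqlite3 standing in for the bank's database, and the table names, schema, and 32-day cutoff assumed), a periodic sweep might look like this:

    import sqlite3

    # sqlite3 stands in for the bank's real database; "transactions" and
    # "transactions_archive" are hypothetical table names.
    def sweep_aged_records(conn: sqlite3.Connection) -> None:
        """Move records older than 32 days from the hot table to the archive.

        In the real deployment the archive table would live on Tier-2
        (capacity-optimized SATA) storage."""
        with conn:  # one transaction: copy, then delete
            conn.execute(
                "INSERT INTO transactions_archive "
                "SELECT * FROM transactions "
                "WHERE posted_date < date('now', '-32 days')"
            )
            conn.execute(
                "DELETE FROM transactions "
                "WHERE posted_date < date('now', '-32 days')"
            )

    # Example setup and run, entirely in memory:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (id INTEGER, posted_date TEXT)")
    conn.execute("CREATE TABLE transactions_archive (id INTEGER, posted_date TEXT)")
    conn.execute("INSERT INTO transactions VALUES (1, date('now', '-60 days'))")
    sweep_aged_records(conn)
    print(conn.execute("SELECT COUNT(*) FROM transactions_archive").fetchone())  # (1,)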

The point is, managing data migration and storage tiering will save money. My banking friend managed the data aging and saved his employer big money. The alternative would have been to keep growing the Tier-1 storage. ♦


This article is an excerpt from the book:

Data Center Storage:
Cost-Effective Strategies, Implementation, and Management

We overspend on data center storage … yet we fall short of business requirements. It's not about the technologies. It's about the proper application of technologies to deliver storage services efficiently and affordably. It's about meeting business requirements dependent on data center storage. Spend less, deliver more.

Data Center Storage: Cost-Effective Strategies, Implementation, and Management provides an industry insider’s insight on how to properly scope, plan, evaluate, and implement storage technologies to maximize performance, capacity, reliability, and power savings. It provides business and use-case focused coverage of storage technology, including storage area networks (SAN), capacity-optimized drives, and solid-state drives. It offers key insights on financially responsible spending for data center storage.

About the Author

Hubbert Smith is an enterprise storage veteran with 25+ years of experience at Kodak, NCR, Intel, WDC, Samsung, and currently with LSI. He is a published author (Serial ATA Architectures and Applications) and patent holder (USPTO 7,007,142, network data storage-related operations). Mr. Smith has successfully managed 25 tech projects, negotiated and executed 15 significant technology-related business deals, and was a significant contributor in establishing the Serial ATA industry.