Virtualization, Storage Tiers and Manual Data Movement
The main problem with unstructured data growth is rooted in the difficulty of managing the data lifecycle. Data begins life hot and frequently accessed; then it ages, and at some point it should be moved to a lower tier. As Figure 1 illustrates, moving data to a lower tier breaks the links.
Figure 1. The Challenge of Migrating Unstructured NAS Data
In this example, "Volume-A" (2) is filling up, and aging data needs to be moved to "Volume-B" (3). Physically copying the data is no big deal; the problem is that every link breaks. The servers (1) look for the data on Volume-A, where it once was, but the data is now on Volume-B and the servers cannot find it. Yuk. The path of least resistance is simply to expand "Volume-A". However, a RAID group must consist of a single, consistent drive type (e.g., fifteen 600 GB 15K RPM drives), so expanding that RAID group means adding more 600 GB 15K drives; there is no option to expand with capacity-optimized drives.
The alternative is a virtual file approach, which uses file stubs and pointers (see Figure 2). The client sees the stub, and the virtualization software consults a lookup table that identifies the stub and the location of the real data. The big advantage is that the stubs (i.e., pointers) can remain in place and therefore do not disrupt client access to files, while the files themselves are migrated as they age. This translates to less expensive storage, less power, less data center floor space, less backup of the old data, and more backup resources for the new and changing data.
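The stub-plus-lookup-table idea can be sketched in a few lines. This is a minimal illustration, not any vendor's design: the `Stub` record, the `LookupTable` class, its integer stub IDs and the "volume:path" location strings are all hypothetical. The point it shows is that the client holds only the stub, and migration changes a table entry, never the stub.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stub:
    """What the client sees in its share: a visible name plus an opaque ID (hypothetical format)."""
    visible_path: str
    stub_id: int


class LookupTable:
    """Hypothetical virtualization-layer table: stub ID -> current physical location."""

    def __init__(self) -> None:
        self._locations: dict[int, str] = {}
        self._next_id = 1

    def create_stub(self, visible_path: str, physical_location: str) -> Stub:
        # Hand the client a stub and remember where the real data lives.
        stub = Stub(visible_path, self._next_id)
        self._locations[stub.stub_id] = physical_location
        self._next_id += 1
        return stub

    def locate(self, stub: Stub) -> str:
        # The client presents only the stub; the table says where the data really is.
        return self._locations[stub.stub_id]

    def relocate(self, stub: Stub, new_location: str) -> None:
        # Migration updates this entry; the stub the client holds never changes.
        self._locations[stub.stub_id] = new_location


if __name__ == "__main__":
    table = LookupTable()
    stub = table.create_stub("/finance/2009-report.pdf", "volume-a:/finance/2009-report.pdf")
    table.relocate(stub, "volume-b:/finance/2009-report.pdf")
    print(stub.visible_path)   # /finance/2009-report.pdf
    print(table.locate(stub))  # volume-b:/finance/2009-report.pdf
```

Because the client never caches the physical location, the administrator can move data as often as needed without touching client configuration.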
Figure 2. File Virtualization and Migration Using a Pointer System
Clients have pointers (1) to NAS data. The virtualization system provides file pointers (2) that allow clients to find the actual data (3). The beauty of this system is that if the storage administrator chooses to move some portion of the data from Tier-1 storage, "Volume-A" (4), to Tier-2 storage, "Volume-B" (5), the files can be moved from Volume-A to Volume-B and the pointers (2) are updated to point to Volume-B. It's all good: the data is migrated, no server or client reconfiguration is required, there is no unstoppable growth of expensive storage, and it all still works. That's the manual version, where humans are involved in data movement.
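The manual move from Volume-A to Volume-B can be sketched with ordinary files. This is a minimal sketch under stated assumptions: the `pointers` dictionary is a hypothetical stand-in for the virtualization layer's pointer table, and two temporary directories stand in for the Tier-1 and Tier-2 volumes. The ordering (copy, then repoint, then delete) is one reasonable way to keep the file reachable throughout the move; real products may do this differently.

```python
import shutil
import tempfile
from pathlib import Path


def migrate(pointers: dict[str, Path], logical_name: str, target_volume: Path) -> Path:
    """Move the file behind logical_name to target_volume and repoint it.

    pointers is a hypothetical pointer table: client-visible name -> physical path.
    """
    source = pointers[logical_name]
    target_volume.mkdir(parents=True, exist_ok=True)
    destination = target_volume / source.name
    shutil.copy2(source, destination)     # 1. copy the data to the lower tier
    pointers[logical_name] = destination  # 2. repoint clients to the new copy
    source.unlink()                       # 3. only then free the Tier-1 space
    return destination


def read_file(pointers: dict[str, Path], logical_name: str) -> bytes:
    # Clients always resolve through the pointer, never a hard-coded volume path.
    return pointers[logical_name].read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        volume_a = Path(root, "volume-a")  # stand-in for Tier-1 "Volume-A"
        volume_b = Path(root, "volume-b")  # stand-in for Tier-2 "Volume-B"
        volume_a.mkdir()
        (volume_a / "ledger-2009.csv").write_bytes(b"old records")
        pointers = {"/share/ledger-2009.csv": volume_a / "ledger-2009.csv"}

        migrate(pointers, "/share/ledger-2009.csv", volume_b)
        print(read_file(pointers, "/share/ledger-2009.csv"))          # b'old records'
        print(pointers["/share/ledger-2009.csv"].parent.name)         # volume-b
```

The client-facing name `/share/ledger-2009.csv` is identical before and after the move; only the pointer entry changed.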
There is a more advanced version in which the system has some fast storage and some slower, bigger storage and figures out what data to put where, using a similar pointer system, without human intervention. As wild as that sounds, this computer science problem has been solved many, many times: servers and PCs have RAM, and the operating system manages what data is stored in RAM and what data is stored on hard disk drive(s). Even better, many servers have good-better-best L1, L2 and L3 caches.
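To make the caching analogy concrete, here is a toy two-tier store that decides placement on its own. It assumes a least-recently-used (LRU) policy, the same basic idea an operating system uses for memory pages; that policy, the `TwoTierStore` class and its capacity counted in items are illustrative assumptions, not a description of any particular tiering product.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy automated tiering: a small fast tier backed by a large slow tier.

    Assumption: least-recently-used (LRU) demotion; real products may use
    access frequency, file age or other heuristics instead.
    """

    def __init__(self, fast_capacity: int) -> None:
        self.fast_capacity = fast_capacity
        self.fast: OrderedDict[str, bytes] = OrderedDict()  # most recently used at the end
        self.slow: dict[str, bytes] = {}

    def write(self, key: str, data: bytes) -> None:
        # New or rewritten data is hot, so it lands on the fast tier.
        self.slow.pop(key, None)
        self.fast[key] = data
        self.fast.move_to_end(key)
        self._demote_overflow()

    def read(self, key: str) -> bytes:
        if key in self.fast:
            self.fast.move_to_end(key)  # hit: mark as recently used
            return self.fast[key]
        data = self.slow.pop(key)       # miss: promote from the slow tier
        self.fast[key] = data
        self._demote_overflow()
        return data

    def tier_of(self, key: str) -> str:
        return "fast" if key in self.fast else "slow"

    def _demote_overflow(self) -> None:
        # Push the coldest items down until the fast tier fits its capacity.
        while len(self.fast) > self.fast_capacity:
            cold_key, cold_data = self.fast.popitem(last=False)
            self.slow[cold_key] = cold_data


if __name__ == "__main__":
    store = TwoTierStore(fast_capacity=2)
    for name in ("a", "b", "c"):
        store.write(name, name.encode())
    print(store.tier_of("a"))  # slow: "a" was least recently used
    store.read("a")            # promotes "a", which demotes "b"
    print(store.tier_of("a"), store.tier_of("b"))  # fast slow
```

No human chooses what moves; placement follows access patterns, exactly as with RAM versus disk.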
Migration is a manual process for moving data from an old hardware platform to a newer, better one (retiring old hardware saves money and improves the consistency, manageability and supportability of IT hardware).
Tiering is similar, but rather than retiring hardware, we keep the hardware and move the data as it goes through its lifecycle. Virtualization is the tool that enables data movement from Tier-1 to Tier-2 to archive. The important function is to keep Tier-1 clear of stale data and make room for new incoming hot data (see Table 1).
Table 1. Tiering Before and After
Information Lifecycle Management (ILM) and Hierarchical Storage Management (HSM)
Information Lifecycle Management (ILM) and Hierarchical Storage Management (HSM) have both been around for years and attack the same problems as migration and tiered data movement. Some IT shops find these tools invaluable; others have found another way to manage aging data (usually involving scripts). A while back (around 2004), I interviewed a data center manager at a bank. His bank kept recent banking records (less than 32 days old) on Tier-1 storage; records older than 32 days went to high-capacity storage built on SATA drives, and any data older than a year was archived. When I asked what portion of the data was on Tier-1 and what portion was on Tier-2, he replied 10% to 90%. He used capabilities in the banking database to select and extract older data and move it from Tier-1 to Tier-2 storage. He relied on processes and scripting, home-grown automation, rather than ILM or HSM to manage the bank's tiering. He had the advantage of dealing with highly structured relational database data and was able to take advantage of the database tools.
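The bank's approach, using the database itself to select old records and move them down a tier, might look something like the following. This is a hedged sketch, not the bank's actual scripts: it uses Python's built-in `sqlite3` module, and the `records(id, booked_on, amount)` schema, the in-memory databases and the attached `tier2` database are all hypothetical stand-ins. In practice the main database would sit on Tier-1 storage and the attached one on SATA capacity storage.

```python
import sqlite3
from datetime import date, timedelta


def open_tiers() -> sqlite3.Connection:
    """Hypothetical setup: main = Tier-1 database, tier2 = ATTACHed Tier-2 database.

    In-memory databases keep the sketch self-contained; in practice each
    would be a file on the corresponding storage tier.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS tier2")
    for schema in ("main", "tier2"):
        conn.execute(
            f"CREATE TABLE {schema}.records "
            "(id INTEGER PRIMARY KEY, booked_on TEXT NOT NULL, amount REAL)"
        )
    return conn


def age_out(conn: sqlite3.Connection, today: date, max_age_days: int = 32) -> int:
    """Move records at least max_age_days old from Tier-1 to Tier-2; return rows moved."""
    # ISO-8601 date strings sort chronologically, so a string comparison works.
    cutoff = (today - timedelta(days=max_age_days)).isoformat()
    with conn:  # one transaction: the copy and the delete are committed together
        conn.execute(
            "INSERT INTO tier2.records SELECT * FROM main.records WHERE booked_on <= ?",
            (cutoff,),
        )
        cur = conn.execute("DELETE FROM main.records WHERE booked_on <= ?", (cutoff,))
    return cur.rowcount


if __name__ == "__main__":
    conn = open_tiers()
    conn.executemany(
        "INSERT INTO main.records (booked_on, amount) VALUES (?, ?)",
        [("2004-06-20", 10.0), ("2004-05-29", 20.0), ("2004-04-01", 30.0)],
    )
    conn.commit()
    print(age_out(conn, today=date(2004, 6, 30)))  # 2
```

A yearly archive step could follow the same pattern with a 365-day cutoff and a third attached database; that extension is likewise an assumption, not something the bank's process is documented to have used.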
The point is that managing data migration and storage tiering saves money. My banking friend managed data aging and saved his employer a great deal of money; the alternative would have been to keep growing Tier-1 storage.
Certain names and logos on this page and others may constitute trademarks, servicemarks, or tradenames of Taylor & Francis LLC. Copyright © 2008-2013 Taylor & Francis LLC. All rights reserved.