
Metrics for Hard Disk Drives and Solid State Devices

Hubbert Smith

Ever run into one of those personalities who, when you ask the time of day, tells you how to build a clock? We're just looking for the time of day—we don't want or need to know how to build a clock. This metaphor applies well to data center storage. Storage vendors and suppliers have been known to claim leadership based on one (and usually only one) dimension of their product.

My favorite example is when a newbie enterprise solid state product marketing person claimed a high performance number (something like 100,000 IOPS, or input-output operations per second) and then crowed that achieving the same number of IOPS would require some hundreds of 15K rpm drives. While we can relate to the notion that a vendor needs marketing sound bites, such as the world's best storage performance benchmark, we also know performance benchmarks alone are not the whole story. We know IOPS can be large or small; the workload can be random or sequential; the workload can be some mix of reads and writes. The one-dimensional "market-ecture" above, though factually correct, does not remotely resemble anything in the real world. The benchmark above assumed the smallest possible block size (512 bytes), a 100 percent random workload, and a 100 percent read, 0 percent write workload: a situation never encountered in mainstream data storage.

  • In the real world, block sizes vary, but the typical block size is 4 KB, or 4,096 bytes (not 512 bytes).
  • In the real world, the workload is sometimes random and sometimes sequential (not 100 percent random).
  • In the real world, there is a mix of reads and writes; the rule of thumb is a 70:30 read:write ratio (not read-only).
  • And obviously, the workload (mix of block sizes, read versus write, and random versus sequential) can vary based on the storage task/application, as well as on the time of day, week, month, quarter, or even year, as the sketch below illustrates.
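To make that concrete, here is a minimal sketch of how a realistic mix deflates a headline benchmark. The device numbers are hypothetical, chosen only to illustrate the arithmetic; the blending itself is a standard weighted harmonic mean of per-operation rates.

    # Sketch: how a realistic workload mix deflates a headline IOPS number.
    # All device figures below are hypothetical, for illustration only.

    def blended_iops(read_iops, write_iops, read_fraction):
        """Weighted harmonic mean: throughput of a mixed read/write stream.

        read_iops / write_iops: throughput on pure 4KB random reads/writes.
        read_fraction: share of operations that are reads (0.7 = the 70:30 rule).
        """
        write_fraction = 1.0 - read_fraction
        return 1.0 / (read_fraction / read_iops + write_fraction / write_iops)

    # Hypothetical SSD: 100,000 IOPS on 512-byte pure reads (the sound bite),
    # far fewer on 4KB random reads, fewer still on 4KB random writes.
    sound_bite = 100_000
    read_4k, write_4k = 40_000, 10_000   # assumed, not from any data sheet

    realistic = blended_iops(read_4k, write_4k, read_fraction=0.70)
    print(f"marketing number: {sound_bite:,} IOPS")
    print(f"70:30 4KB blend : {realistic:,.0f} IOPS")   # roughly 21,000 IOPS

With these illustrative figures, the honest number is roughly a fifth of the sound bite, before any queuing effects are even considered.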

Our approach is to focus on real-world benchmarks, real-world use cases, and key components. We make a conscious effort to cull the noise, the irrelevant, and the imponderable from the equation.

We'll discover how to establish your own relevant criteria, applicable to your shop, rather than buying into those one-dimensional talking points. To be fair, and to counterbalance the self-serving crowd: the data center storage industry has no shortage of good folks whose first instinct is to make things right with the customer. The techniques and approaches we'll cover will help you clearly identify those good folks, in contrast to the other kind.

For HDD building blocks, our approach is to structure a decision-making process around key metrics: price, performance, power, and capacity. As our objective is to turn raw data into useful information, we can take these four key variables (raw data) and evaluate them using key ratios (useful information), as shown in Table 1.

Table 1. Hard Disk Drive Key Ratios (Bigger is Better)

Notice that the benefit (performance or capacity) is always in the numerator of each ratio, and the expense (cost or power) is always in the denominator.

This way, bigger is always better.

The key ratios chart in Table 2 serves to simplify the total storage view. It tells us 10K rpm drives are better than 15K rpm drives in GB/$, GB/watt, and IOPS/$, but not in IOPS/watt. It also serves as the underlying data for Figures 1 through 7.

Table 2. Hard Disk Drive Key Ratios Raw Data (Sources: Vendor HDD data sheets, Nextag.com for approximate price, storagereview.com for approximate Web server performance.)
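As a rough sketch of how Table 1's ratios turn Table 2-style raw data into comparable numbers, the Python below encodes the four ratios. The 10K and 15K figures are assumptions loosely patterned on drives of that era, not the table's actual values.

    # The four key ratios of Table 1, applied to Table 2-style raw data.
    # The drive figures below are assumptions, not the article's actual values.

    from dataclasses import dataclass

    @dataclass
    class Drive:
        name: str
        capacity_gb: float   # raw data: capacity
        iops: float          # raw data: performance (Web-server-style IOPS)
        price_usd: float     # raw data: street price
        watts: float         # raw data: operating power

        # Benefit in the numerator, expense in the denominator:
        # bigger is always better.
        def gb_per_dollar(self):   return self.capacity_gb / self.price_usd
        def gb_per_watt(self):     return self.capacity_gb / self.watts
        def iops_per_dollar(self): return self.iops / self.price_usd
        def iops_per_watt(self):   return self.iops / self.watts

    DRIVES = [
        Drive("15K rpm 146G", 146, 300, 370.0, 15.0),   # assumed figures
        Drive("10K rpm 450G", 450, 250, 260.0, 13.5),   # assumed figures
    ]

    for d in DRIVES:
        print(f"{d.name}: GB/$ {d.gb_per_dollar():.2f}  "
              f"GB/W {d.gb_per_watt():.1f}  "
              f"IOPS/$ {d.iops_per_dollar():.2f}  "
              f"IOPS/W {d.iops_per_watt():.1f}")

With these assumed figures, the output mirrors Table 2's pattern: the 10K drive wins GB/$, GB/watt, and IOPS/$, while the 15K drive wins only IOPS/watt.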

Storage system engineering is sometimes (but not always) about performance, and it's important to see the entire picture, including price, power, and capacity (Figure 1).

Figure 1. Key Metrics for 2.5" Small Form Factor 15K rpm, 146G

Clearly, the strength of this product is IOPS/watt. It's noticeably anemic in the areas of GB/$, GB/watt, and IOPS/$. The creators of this small form factor 2.5" HDD product were motivated by their IT customers to add more storage performance in over-full data centers with limited power, limited A/C, and limited floor space (sound familiar?).

In situations where slow storage interferes with end-user productivity (and, as a result, this costs the company money), this class of performance-optimized HDD or SSD is the right tool for the job. But in situations where storage performance has a minimal impact on end-user productivity (e.g., e-mail), there are other, more financially responsible tools for the job.

Let's review the same chart for a typical 10K rpm drive (Figure 2).

Figure 2. Key Metrics for a 3.5" 10K rpm, 400G

The 10K rpm product diagrammed above shows some balance across IOPS/$, IOPS/watt, GB/$, and GB/watt. This characterization is for a 3.5" drive. It consumes more power, but it also has more platter space with better total capacity, better capacity/$, and good sequential performance. The ratios improve our shared understanding of the merits of a specific HDD (the basic building block of storage).

Figure 3. Combined Chart for Evaluation of Key Storage Ratios: IOPS/$, IOPS/watt, GB/$, GB/watt (Bigger is Better)

What does Figure 3 tell us? We are looking at the tradeoffs between a 10K rpm, 450G drive and a 15K rpm, 146G drive. In this example (no surprise) the 10K rpm drive exceeds the 15K drive in GB/watt and GB/$. Also (no surprise) the 15K drive exceeds the 10K drive in IOPS/watt. The interesting surprise, however, is that the 10K drive exceeds the 15K drive in IOPS/$.

So, do we conclude we should use 10K rpm drives throughout the system?

This analysis indicates we should use 10K rpm drives as the default. But when performance is the top criterion, this analysis leads us to apply 15K rpm drives.
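One way to operationalize that default-plus-exception rule is a small selection helper. This sketch reuses the hypothetical Drive class and DRIVES list above; the "balanced" weighting is my own illustrative choice, not a rule from the book.

    # Hypothetical helper: pick a drive class by the ratio that matters most
    # for the service level. Reuses the Drive sketch and DRIVES list above.

    PRIORITY_TO_SCORE = {
        "capacity":    lambda d: d.gb_per_dollar(),
        "balanced":    lambda d: d.gb_per_dollar() * d.iops_per_dollar(),
        "performance": lambda d: d.iops_per_watt(),
    }

    def pick_drive(drives, priority):
        """Return the drive that maximizes the score mapped to the priority."""
        return max(drives, key=PRIORITY_TO_SCORE[priority])

    # With the assumed figures, "capacity" and "balanced" select the 10K rpm
    # drive (the default), while "performance" selects the 15K rpm drive.
    print(pick_drive(DRIVES, "balanced").name)
    print(pick_drive(DRIVES, "performance").name)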

This is just a simple example. And the analysis gets even more interesting when we add enterprise solid state devices (SSDs).

We know some systems should be optimized for performance, others for capacity, and still others for a combination of both. With this insight and structure, we'll be able to objectively compare and buy the right tool for the job. Later in the book, our sections on Service Level Agreements (SLAs) will map to this approach. Sometimes we need a moving van, sometimes a sports car, right? This approach balances technology against power and cost of ownership. This metrics/ratio approach will drive closure on the question of "when is good enough really good enough?"

It gets better in Figure 4.

Figure 4. Key Metrics for a 5400 rpm, 2000G Drive (est. 120 IOPS, est. $180; Bigger is Better)

We could double the scale of this diagram and the GB/watt (almost 300) and GB/$ (11) would still be off the scale.
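Here is a quick arithmetic check of those off-the-scale ratios, using only the caption's figures. The wattage is not given, so the ~300 GB/watt claim is used to back into an implied power draw, which is an assumption on my part.

    # Back-of-envelope check of Figure 4, from the caption's figures alone.
    capacity_gb, price_usd, iops = 2000, 180, 120

    print(capacity_gb / price_usd)   # ~11.1 GB/$, matching the "11" above
    implied_watts = capacity_gb / 300
    print(implied_watts)             # ~6.7 W implied by ~300 GB/watt; a
                                     # plausible figure for a 5400 rpm drive
    print(iops / price_usd)          # ~0.67 IOPS/$
    print(iops / implied_watts)      # ~18 IOPS/watt, with the implied wattage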

The capacity-optimized disk drive is an incredible tool to achieve economic delivery of service levels. Capacity-optimized drives are not the right tool for situations where storage performance has an impact on user productivity and therefore costs the company money, but in almost every other instance the capacity-optimized drive is a tool that truly can save money and still get the job done.

There are data center professionals who have serious reservations about the reliability of high-capacity drives in the enterprise, as well as about the use of SATA (serial advanced technology attachment) as an enterprise drive interface. It's likely these reservations are based on stale information and assumptions. Highly reliable capacity-optimized drives have been shipping for the better part of a decade. They are available with both SAS interfaces (for dual-controller implementations) and SATA (for single-controller and server-direct-attached implementations). These enterprise-class capacity-optimized drives (RAID Edition or NL/nearline) demonstrate a mean time to failure of 1.2 million hours, consistent with other 10K and 15K drives.

Although there is much more to the subject than we touch on here (we will cover it in later sections on manual tiering and automated tiering), solid state devices make great sense when used in conjunction with capacity-optimized drives. SSDs make limited sense in general IT applications employing single-tiered approaches. But an approach that uses SSDs plus capacity-optimized HDDs properly, in two-tier applications, offers a significant advantage in IOPS/$, IOPS/watt, GB/$, and GB/watt over any single-tier storage system (see Figures 5, 6, and 7).
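Here is a rough sketch of that two-tier arithmetic, pairing one SSD (Figure 5's caption figures) with one capacity-optimized drive (Figure 4's) against a single-tier 15K rpm drive. The wattages and the 15K figures are assumptions, and in practice the IOPS only add up if the hot data actually lands on the SSD tier.

    # Two-tier sketch: one SSD plus one capacity-optimized drive, compared
    # with a single-tier 15K rpm drive. Wattages and 15K figures are assumed.

    def combined_ratios(devices):
        gb    = sum(d["gb"]   for d in devices)
        iops  = sum(d["iops"] for d in devices)
        usd   = sum(d["usd"]  for d in devices)
        watts = sum(d["w"]    for d in devices)
        return {"GB/$": gb / usd, "IOPS/$": iops / usd,
                "GB/W": gb / watts, "IOPS/W": iops / watts}

    ssd     = {"gb": 100,  "iops": 2000, "usd": 600, "w": 2.0}   # watts assumed
    cap_hdd = {"gb": 2000, "iops": 120,  "usd": 180, "w": 7.0}   # watts assumed
    hdd_15k = {"gb": 146,  "iops": 300,  "usd": 370, "w": 15.0}  # all assumed

    print("SSD + capacity HDD :", combined_ratios([ssd, cap_hdd]))
    print("15K rpm single tier:", combined_ratios([hdd_15k]))

Even with generous assumptions for the 15K drive, the blend wins all four ratios, which is the core of the tiering argument.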

Figure 5. Key Metrics Comparison for SSD; Assume 100GB, 2000 IOPS, $600 (Bigger is Better)

Figure 6. Key Metrics Comparison for SSD with Capacity-Optimized 7200 rpm Drives (Bigger is Better)

Figure 7. Key Metrics Comparison for 15K, 10K, 5400 rpm, and SSD (Bigger is Better)

Notice which storage device classes are strongest: capacity-optimized and SSD. Everything else is a compromise. In the upcoming sections we will walk before we run, so I will set SSDs aside for the next few sections; we'll cover where and why they make financial sense later in the book.

The bottom line is that these products (high-capacity HDD, 15K rpm HDD, SSDs) align to storage service levels (Tier 2 and Tier 1). When these technologies, plus people, plus processes, are intelligently applied to deliver on service levels and to manage aging data from tier to tier and on to tape archival, the operational and capital savings are compelling.

Table 3 shows the underlying data driving the charts in this article.

Table 3. Raw Data for Key Metrics Comparison for 15K, 10K, 5400 rpm, and SSD (Source: Vendor data sheets for typical drives, typical performance data (Web server IOPS) from storagereview.com, typical prices from nextag.com)

My point is that considering the ratios of IOPS/$, IOPS/watt, GB/$, and GB/watt enables us to avoid getting tangled in information spaghetti. Using key ratios, we trade confusion for clarity as we compare one class of drives to another. New HDD products will emerge with improved capacities, improved performance, and improved pricing, but the same ratios will still apply.

I hope that makes sense, and that we can declare "confusion avoided" instead of falling victim to analysis paralysis or stalling our investigation.

A side note on the drives we've examined: You may read this and think, "He said 400 GB, didn't he mean 450 GB?" At the time I put this section together the 400G 10K rpm drive was based on four platters of 100 GB each. Along the way, the platter density changed from 100 GB to 150 GB per platter. Now we see a 450 GB 10K rpm product, not a 400 GB product. That's the nature of the HDD industry.

The 450 GB product is based around three platters of 150 GB each (significantly less expensive to produce; its higher bit density offers higher sequential performance). The raw data will change quickly; it's the ratios that are the main event. Ratios turn data into information. ♦


This article is an excerpt from the book:

Data Center Storage:
Cost-Effective Strategies, Implementation, and Management

We overspend on data center storage … yet, we fall short of business requirements. It's not about the technologies. It's about the proper application of technologies to deliver storage services efficiently and affordably. It's about meeting business requirements dependent on data center storage. Spend less, deliver more.

Data Center Storage: Cost-Effective Strategies, Implementation, and Management provides an industry insider's insight on how to properly scope, plan, evaluate, and implement storage technologies to maximize performance, capacity, reliability, and power savings. It provides business and use-case focused coverage of storage technology, including storage area networks (SAN), capacity-optimized drives, and solid-state drives. It offers key insights on financially responsible spending for data center storage.

About the Author

Hubbert Smith is an enterprise storage veteran with 25+ years of experience at Kodak, NCR, Intel, WDC, Samsung, and currently LSI. He is a published author (Serial ATA Architectures and Applications) and a patent holder (USPTO 7,007,142, network data storage-related operations). Mr. Smith has successfully managed 25 tech projects, negotiated and executed 15 significant technology-related business deals, and was a significant contributor in establishing the Serial ATA industry.