How to Choose Hardware for Data Storage


A guide to the complexity of hardware choices

The first problem faced by anyone looking to buy an IT product, be it hardware or software, is the sheer number of choices, which can be paralysing. Limiting that choice is arguably a primary reason for Apple's success. However, for businesses looking to implement enterprise-level IT solutions on a budget, there is no way to sidestep learning something about the technology on offer.

Data storage is no exception to the paradox of choice. Is Flash storage appropriate for all of your data? Should you stick to traditional hard drives? What about Cloud or hybrid-Cloud services? Do you want DAS, SAN or NAS? What is RAID? What are iSCSI and Fibre Channel? Does any of this matter? Do different services provide meaningfully different interfaces or quality levels?

There is a host of vendors that will help you make the right choice. However, a little background knowledge can start you heading in the right direction and save you money. This article introduces the basics of hardware choices for your data storage network. We compare Flash to traditional hard drives and cover the basic RAID options for SSD or HDD configurations.

Making the Right Hardware Choice: Flash/SSD vs. HDD

The basics of physical drives

HDD [Hard Disk Drive]

HDD is the traditional computer storage unit: a spinning magnetic disk read and written by a mechanical head. Systems boot more slowly from HDD, and it is slower in everyday use than SSD. However, HDD is a cheap and proven technology. A major benefit over SSD is that HDD suffers no inherent deterioration from rewriting data.

Physicality and fragmentation are the two fundamental issues with HDD. Because files are stored on a rotating surface, HDDs work best when files are written in contiguous blocks. This can become impossible as the drive fills up. Read/write algorithms have helped minimise this problem. However, an HDD's reliance on physical movement inherently limits its performance.

The mechanical nature of HDDs also exposes them to failure through simple breakage and makes them more susceptible to physical damage. This increases the likelihood of an unexpected failure.

SSD [Solid State Drive] and Flash

SSD technically refers to any storage device without moving parts. Flash is SSD, but not all SSD is Flash. In everyday usage, however, ‘SSD’ has come to mean a Flash-based drive. Most people are familiar with Flash USB drives, but Flash also runs your phone, along with a growing number of laptops and enterprise storage arrays. The main benefits of Flash are its speed and compact size.

Flash stores data by electrically programming its memory cells. Introduced in 1984, Flash memory requires cells to be erased before new data can be written to them. Historically, the wear caused by this erase-and-rewrite cycle limited the lifespan of the cells. In the last 10 years, techniques such as the TRIM command, which lets the operating system tell the drive which blocks no longer hold useful data, have greatly reduced this problem*. However, it has not been completely eradicated.
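To make the erase-before-write rule concrete, here is a toy Python model of a single Flash block. It is a simplified sketch, not how any real controller works: the `FlashBlock` class, its page count and the single erase counter are hypothetical, chosen only to show why rewriting data in place costs an erase cycle, and erase cycles are what wear cells out.

```python
from typing import List, Optional


class FlashBlock:
    """Toy model of one Flash erase block (hypothetical, for illustration only)."""

    def __init__(self, num_pages: int) -> None:
        self.pages: List[Optional[bytes]] = [None] * num_pages  # None = erased page
        self.erase_count = 0  # each erase cycle wears the cells a little

    def program(self, index: int, data: bytes) -> None:
        # A page can be programmed only once; it must be erased before reuse.
        if self.pages[index] is not None:
            raise ValueError(f"page {index} must be erased before it is rewritten")
        self.pages[index] = data

    def erase(self) -> None:
        # Erasure always clears the whole block, never a single page.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1

    def rewrite_page(self, index: int, data: bytes) -> None:
        # Changing one page in place: save the block, erase it, program it back.
        saved = list(self.pages)
        saved[index] = data
        self.erase()
        for i, page in enumerate(saved):
            if page is not None:
                self.program(i, page)


block = FlashBlock(num_pages=4)
block.program(0, b"old")
block.program(1, b"keep")
block.rewrite_page(0, b"new")
print(block.pages, block.erase_count)  # [b'new', b'keep', None, None] 1
```

Real controllers avoid this worst case by writing the new version to an already-erased page elsewhere and cleaning up later; TRIM helps by telling the drive which old pages no longer need to be preserved.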

There are three basic types of Flash storage: SLC [Single Level Cell], MLC [Multi Level Cell] and TLC [Triple Level Cell]. These scale from storing 1 bit per cell to 3 bits per cell. Storing more bits per cell lowers the cost per gigabyte, but at the expense of long-term durability. TLC is the cheapest per gigabyte, but it will wear out faster in a write-heavy environment because of the constant erasing and rewriting of data. Most All-Flash Arrays [AFA] use MLC as a compromise.
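The trade-off follows from simple arithmetic: a cell storing n bits must distinguish 2^n charge levels. Fewer cells per gigabyte is why denser cells are cheaper; more levels packed into the same voltage range is why they are harder to read reliably and wear out sooner. The sketch below ignores error correction and spare area, so treat the cell counts as idealised.

```python
# Bits per cell for the three common Flash cell types.
CELL_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3}


def charge_levels(bits_per_cell: int) -> int:
    """Distinct charge (voltage) states a cell must hold to store this many bits."""
    return 2 ** bits_per_cell


def cells_per_gigabyte(bits_per_cell: int) -> int:
    """Cells needed for one gigabyte (10**9 bytes), rounded up."""
    total_bits = 8 * 10**9
    return -(-total_bits // bits_per_cell)  # ceiling division


for name, bits in CELL_TYPES.items():
    print(f"{name}: {charge_levels(bits)} levels per cell, "
          f"{cells_per_gigabyte(bits):,} cells per GB")
# SLC: 2 levels per cell, 8,000,000,000 cells per GB
# MLC: 4 levels per cell, 4,000,000,000 cells per GB
# TLC: 8 levels per cell, 2,666,666,667 cells per GB
```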

Flash drives consume up to 50% less power than HDDs of similar capacity and are capable of much faster read/write speeds*. Average HDDs top out at around 120 MB/s, while many Flash drives can read and write at more than 500 MB/s.
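Putting those throughput figures into practical terms, the sketch below estimates how long it takes to read 100 GB at each speed. It assumes sustained sequential transfer at the quoted rates and decimal units (1 GB = 1,000 MB); real workloads with small random reads widen the gap further in Flash's favour.

```python
def transfer_seconds(size_gb: float, throughput_mb_s: float) -> float:
    """Time to move `size_gb` gigabytes at a sustained `throughput_mb_s`."""
    return size_gb * 1000 / throughput_mb_s


for label, speed in [("HDD", 120), ("Flash", 500)]:
    secs = transfer_seconds(100, speed)
    print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min) to read 100 GB")
# HDD: 833 s (13.9 min) to read 100 GB
# Flash: 200 s (3.3 min) to read 100 GB
```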

Flash drives are faster, do not suffer from fragmentation, are more resistant to physical damage, are less susceptible to unexpected failures, and are dramatically smaller. However, you pay for this in upfront costs*. HDD costs around $0.03 per GB, while SSD will run closer to $0.20 per GB*.
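At those price points the gap grows quickly with capacity. The sketch below prices 10 TB of raw media using the article's figures; it deliberately leaves out controllers, power, cooling and rack space (where Flash claws some money back), and real prices shift over time.

```python
# Article's approximate price points, held as cents per GB to avoid float rounding.
PRICE_CENTS_PER_GB = {"HDD": 3, "SSD": 20}


def raw_media_cost(capacity_tb: int, cents_per_gb: int) -> float:
    """Cost in dollars of the raw drives only (1 TB = 1,000 GB)."""
    return capacity_tb * 1000 * cents_per_gb / 100


for media, cents in PRICE_CENTS_PER_GB.items():
    print(f"10 TB of {media}: ${raw_media_cost(10, cents):,.0f}")
# 10 TB of HDD: $300
# 10 TB of SSD: $2,000
```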

  • Flash is the fast option
  • Flash is small and uses less power, but is more expensive
  • Flash degrades over time, but coding techniques have helped solve this issue
  • Flash is less susceptible to unexpected failures than HDD

How to Make the Right Hardware Choice: The Limitations of All-Flash Arrays

Flash is great, but it might not solve all of your problems

Hardware Choices in the Context of Your Software: Code Written For Disk

Code developed around the characteristics of disk can cause unnecessary latency and system wear when running on Flash*. Long code paths, the generation of significant metadata and the caching of large amounts of granular snapshots can become a problem for Flash drives*.

Flash drives write data in pages but can only erase it in larger blocks. This means that changing even one byte in a file can force the drive to rewrite far more data than was actually modified. This issue is called ‘write amplification’. If it is not accounted for, either in programming or in write techniques, it can undermine some of Flash's speed benefits and aggravate rewrite deterioration.
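Write amplification is usually expressed as a factor: bytes physically written to Flash divided by bytes the host asked to write. The sketch below computes it for the naive worst case, where every touched block is rewritten in full. The 256 KiB block size is an assumption for illustration, and modern controllers remap pages precisely to avoid this worst case, so real factors are far lower.

```python
def write_amplification(host_bytes: int, flash_bytes: int) -> float:
    """Write amplification factor: bytes written to Flash / bytes the host asked to write."""
    return flash_bytes / host_bytes


def naive_block_rewrite(host_bytes: int, block_bytes: int) -> int:
    """Worst case with no remapping: each touched block is rewritten in full.

    Assumes the write starts on a block boundary.
    """
    blocks_touched = -(-host_bytes // block_bytes)  # ceiling division
    return blocks_touched * block_bytes


BLOCK_BYTES = 256 * 1024  # assumed block size; real drives differ

for host in (1, 4096, BLOCK_BYTES):
    flash = naive_block_rewrite(host, BLOCK_BYTES)
    print(f"{host:>7} B requested -> {flash:>7} B written, "
          f"WAF = {write_amplification(host, flash):,.0f}")
# WAF works out to 262,144 for 1 byte, 64 for 4 KiB and 1 for a full block
```

Small, scattered writes are the expensive case, which is why code that issues many tiny updates (as disk-era code often does) hurts Flash most.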

These problems can be substantially mitigated through compression algorithms, which many Flash systems use as standard. A business can also sidestep the issue by rewriting its software. More simply, the problem can be bulldozed through by over-provisioning storage and memory capacity.
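Compression helps because fewer bytes written means fewer pages programmed and, eventually, fewer erase cycles. Here Python's standard `zlib` module stands in for whatever algorithm a given Flash array actually uses (an assumption; vendors differ), and the log-style sample data is invented for illustration.

```python
import zlib


def bytes_after_compression(payload: bytes, level: int = 6) -> int:
    """How many bytes would reach the Flash cells if `payload` were compressed first."""
    return len(zlib.compress(payload, level))


# Repetitive data such as logs compresses well; already-compressed
# media (video, archives) barely shrinks, so results vary widely.
log_like = b"2024-01-01 INFO request served in 12 ms\n" * 1000
written = bytes_after_compression(log_like)
print(f"{len(log_like):,} B -> {written:,} B "
      f"({1 - written / len(log_like):.0%} fewer bytes written)")
```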

Hybrid Environments and the Priority of Information

Consider whether or not all of your data is equal. For many organisations, older data becomes ‘inactive’ — no one is interested in accessing it, but it can’t be deleted*. Under those conditions, optimising access speed by purchasing an expensive new Flash array is not particularly important.   

A solution to this is a hybrid environment — something that can also be utilised to deal with old code. The difficulties here are the IT challenges of migrating the right data to the Flash array and maintaining a multi-tiered system. Purchasing a storage accelerator can help solve this problem.
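At its simplest, the tiering decision in a hybrid environment is a policy that places recently used data on Flash and inactive data on cheaper HDD. The sketch below is hypothetical: the `DataSet` type, the `choose_tier` function, the dataset names and the 30-day window are all illustrative assumptions, and commercial tiering tools use richer signals such as access frequency and I/O size.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DataSet:
    name: str
    last_accessed: datetime


def choose_tier(item: DataSet, now: datetime,
                hot_window: timedelta = timedelta(days=30)) -> str:
    """Put recently used data on Flash and inactive data on HDD."""
    return "flash" if now - item.last_accessed <= hot_window else "hdd"


now = datetime(2024, 6, 1)  # fixed date so the example is repeatable
datasets = [
    DataSet("orders-current", now - timedelta(days=2)),
    DataSet("invoices-2019", now - timedelta(days=900)),
]
for ds in datasets:
    print(ds.name, "->", choose_tier(ds, now))
# orders-current -> flash
# invoices-2019 -> hdd
```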

Hardware Choices in the Context of Your Network: Are Your Hard Drives Actually Your Weak Link?

You have to think about your overall network capabilities. If your connection speeds and network architecture won’t allow you to take advantage of the speed of Flash, you simply move your access bottleneck somewhere else. Buying an All-Flash Array [AFA] won’t solve all your problems if your network is poorly put together.
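A data path runs no faster than its slowest stage. The sketch below compares the article's 500 MB/s Flash figure with raw Ethernet line rates (1 Gb/s is 125 MB/s). It ignores protocol overhead, which lowers real link throughput further, and treats a single drive as the storage side; a multi-drive array would make the network look even more like the bottleneck.

```python
def link_mb_per_s(gigabits_per_s: float) -> float:
    """Raw line rate in MB/s (1 Gb/s = 125 MB/s), ignoring protocol overhead."""
    return gigabits_per_s * 1000 / 8


def effective_throughput(*stages_mb_s: float) -> float:
    """A data path runs no faster than its slowest stage."""
    return min(stages_mb_s)


FLASH_MB_S = 500.0  # the article's figure for a single Flash drive

for gbps in (1, 10):
    usable = effective_throughput(FLASH_MB_S, link_mb_per_s(gbps))
    print(f"{gbps} Gb/s link: {usable:.0f} MB/s reaches the application")
# 1 Gb/s link: 125 MB/s reaches the application
# 10 Gb/s link: 500 MB/s reaches the application
```

On a 1 Gb/s link, Flash delivers barely more than the 120 MB/s an average HDD already manages.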

Making the Right Hardware Choice: Is Flash Right for You?

The efficacy of Flash generally comes down to time and budget. If you already have a lot of HDD capacity and operate in a legacy environment written for HDD, keeping some of that hardware around may be advisable and will certainly be economical. However, barring a quantum computing breakthrough, the future is Flash.

Because of the size and power efficiencies of Flash, businesses can save on rack space and energy costs, while achieving higher performance. Flash is truly optimised for an environment in which information is written only a few times and read many more, and where speed matters a lot. If this applies to you, Flash is your best bet.  

  • Do the particularities of the environment you already have make Flash a good option?
  • Think about hybrid data storage solutions

Making a Future-Proof Hardware Choice: The Future of Flash

The basics of what to look out for when it comes to hardware

SSDs have historically been designed to accommodate HDD I/O interfaces — SAS or SATA. New advances in interfacing technology may change that and push All-Flash Arrays [AFAs] into a league of their own, particularly where enterprise storage is concerned.

Nonvolatile Memory Express [NVMe] and NVMe over Fabrics [NVMe-oF] are interface protocols, already available, that allow storage to be accessed through thousands of parallel command queues rather than a single traditional command queue. The speed gains on offer are huge compared with traditional Flash interfaces and HDDs.
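The difference in parallelism is easiest to see as a ceiling on commands in flight. The figures below are the commonly quoted protocol limits: AHCI/SATA Native Command Queuing offers one queue 32 commands deep, while the NVMe specification allows roughly 64K I/O queues, each roughly 64K commands deep (treat the NVMe numbers as approximate). These are protocol maximums, not what any particular drive or host actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueModel:
    name: str
    queues: int
    depth: int  # commands each queue can hold

    @property
    def max_outstanding(self) -> int:
        """Upper bound on commands in flight at once."""
        return self.queues * self.depth


AHCI = QueueModel("AHCI/SATA (NCQ)", queues=1, depth=32)
NVME = QueueModel("NVMe (approx. spec ceiling)", queues=64 * 1024, depth=64 * 1024)

for m in (AHCI, NVME):
    print(f"{m.name}: {m.queues:,} queue(s) x {m.depth:,} = "
          f"{m.max_outstanding:,} commands in flight")
```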

Storage-class Memory [SCM] or Persistent Memory [PMEM] is a developmental technology that operates as both memory and storage, simultaneously or interchangeably*. It is essentially storage so fast that it can be used in place of DRAM. This is a technology to watch rather than one to buy today.

How to Configure Your Hardware Choices: RAID [Redundant Array of Inexpensive (or Independent) Disks]

Something most data storage systems utilise

RAID is the most basic configuration option you need to think about after you purchase your hardware. It refers to several different techniques for configuring data storage across multiple hard drives. The aim is to increase speed and/or reliability by spreading or mirroring data across multiple drives.

The next article in this series will discuss network options. However, when running more than one physical storage unit, RAID is something you should think about no matter how you decide to connect that array to your network. The most common configurations are RAID 0, RAID 1 and RAID 10.

RAID 0 means using 2 or more drives and striping the data across all of them. This increases read/write performance but decreases reliability by tying your data to the health of every drive: if any one drive fails, you lose everything. If you are using new Flash hardware supported by active backup procedures, RAID 0 could save you money, because you get the full capacity and speed of every drive. However, it is risky.
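The RAID 0 striping described above can be sketched as dealing fixed-size chunks out to the drives in turn. This is a simplified illustration: the `stripe` and `reassemble` helpers are hypothetical, and the 2-byte chunk size is tiny for readability (real arrays use chunks of tens or hundreds of kilobytes). It also shows why one failed drive is fatal: the missing chunks exist nowhere else.

```python
from typing import List, Optional


def stripe(data: bytes, drives: int, chunk: int) -> List[List[bytes]]:
    """RAID 0: deal fixed-size chunks out to the drives in round-robin order."""
    layout: List[List[bytes]] = [[] for _ in range(drives)]
    for n, start in enumerate(range(0, len(data), chunk)):
        layout[n % drives].append(data[start:start + chunk])
    return layout


def reassemble(layout: List[Optional[List[bytes]]]) -> bytes:
    """Read the chunks back in the same order; a failed drive is passed as None."""
    if any(drive is None for drive in layout):
        # No parity and no mirror: the missing chunks exist nowhere else.
        raise IOError("drive failure: RAID 0 has no redundancy, data is lost")
    rows = max(len(drive) for drive in layout)
    out = []
    for row in range(rows):
        for drive in layout:
            if row < len(drive):
                out.append(drive[row])
    return b"".join(out)


layout = stripe(b"ABCDEFGH", drives=2, chunk=2)
print(layout)              # [[b'AB', b'EF'], [b'CD', b'GH']]
print(reassemble(layout))  # b'ABCDEFGH'
try:
    reassemble([layout[0], None])  # simulate the second drive failing
except IOError as err:
    print(err)
```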

RAID 1 means using 2 or more drives to maximise redundancy. The data is mirrored across the drives, so if one fails, everything is recoverable. The catch is capacity: no matter how many drives you use in a RAID 1 configuration, you only get the usable capacity of one drive, so two drives give you half of your raw capacity. Reads can be spread across the mirrors, but writes are no faster than a single drive. This is a good choice when running ageing HDD hardware.

RAID 10 attempts to combine the performance of RAID 0 with the redundancy of RAID 1 and requires a minimum of 4 drives. The drives are mirrored in pairs, and data is striped across the pairs. With 4 drives, this gives you roughly double the performance and capacity of a single drive (double what the same 4 drives would give you in RAID 1), and it can survive the loss of up to 2 drives, provided they are not both in the same mirrored pair.
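The capacity and fault-tolerance trade-offs of RAID 0, 1 and 10 can be summarised with a small calculator. This is a sketch under stated assumptions: all drives are identical, RAID 10 is built from mirrored pairs striped together, and the `raid_summary` helper is hypothetical rather than part of any RAID tool. "Guaranteed" means the array survives that many failures whichever drives fail; "best case" means the failures happen to land in different mirrored pairs.

```python
from typing import NamedTuple


class RaidSummary(NamedTuple):
    usable_tb: float
    min_failures_survived: int  # guaranteed, whichever drives fail
    max_failures_survived: int  # best case, e.g. RAID 10 failures in different pairs


def raid_summary(level: str, drives: int, drive_tb: float) -> RaidSummary:
    """Capacity and fault tolerance for RAID 0, 1 and 10 with identical drives."""
    if level == "0" and drives >= 2:
        # Striping only: every drive adds capacity, any failure loses the array.
        return RaidSummary(drives * drive_tb, 0, 0)
    if level == "1" and drives >= 2:
        # Every drive holds a full copy, so usable space is one drive's worth.
        return RaidSummary(drive_tb, drives - 1, drives - 1)
    if level == "10" and drives >= 4 and drives % 2 == 0:
        # Mirrored pairs striped together: one drive per pair is usable capacity.
        pairs = drives // 2
        return RaidSummary(pairs * drive_tb, 1, pairs)
    raise ValueError(f"unsupported combination: RAID {level} with {drives} drives")


for level in ("0", "1", "10"):
    s = raid_summary(level, drives=4, drive_tb=2.0)
    print(f"RAID {level}: {s.usable_tb:g} TB usable, survives "
          f"{s.min_failures_survived} (guaranteed) to {s.max_failures_survived} (best case) failures")
# RAID 0: 8 TB usable, survives 0 (guaranteed) to 0 (best case) failures
# RAID 1: 2 TB usable, survives 3 (guaranteed) to 3 (best case) failures
# RAID 10: 4 TB usable, survives 1 (guaranteed) to 2 (best case) failures
```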

Many other RAID configurations are worth reading about*. However, note that although RAID 1 and RAID 10 add redundancy, they do not back up your data. You are still vulnerable to viruses and to human errors such as accidentally saving over important files. RAID simply protects your data from hard drive failure or, in the case of RAID 0, increases your vulnerability to failure in order to optimise speed.

SUMMARY: The Right Hardware Choices Often Rely on Hybrid Solutions

If you have the budget and are operating without much legacy code or hardware, Flash is a great choice. It is small, easy to use and fast. However, you first need to consider whether you need faster access at all*. Depending on how you use your storage network, an All-Flash Array could be an unnecessary investment.

Then look at your existing infrastructure. If you are already running an HDD network, consider keeping that infrastructure around, at least until it reaches the end of its natural lifespan. Finally, make sure you plan your migration to Flash carefully.

Although data storage is complex, most solutions offer flexibility as you grow. There is an increasing number of hybrid control tools on the market, and it is becoming less useful to view data storage as a set of distinct choices and more useful to see it as a continual journey of optimising a number of different solutions*. Flash is part of that journey, but it isn't a leap you have to make all at once.

RAID configurations are something you should take advantage of. If you have a dynamic backup system, you could save money by getting the full capacity and speed of your drives with a RAID 0 configuration. If you are operating a bunch of ageing HDDs, choose a redundant option such as RAID 1 or RAID 10 to protect your data as those devices approach the end of their lifespan.

Once you understand the hardware choice you should make, it is time to think about how your data storage devices should be linked to your network. That means learning about DAS, NAS, SAN and hybrid/integrated systems.

Sources:

* Trim (computing)
* Flash Storage
* SSD vs. HDD: What’s the Difference?
* An Explanation of Read and Write Speeds: How Read/Write Speeds Differ Between SSDs and HDDs
* Coding for SSDs – Part 2: Architecture of an SSD and Benchmarking
* All Flash Arrays and latency
* First XPoint, then Z-NAND: Oh dear, server-makers. SCM is happening
* RAID 2, RAID 3, RAID 4, RAID 6 Explained with Diagram
* Selecting a Storage Solution: The 3 Factors to Consider
* Your Next Storage Purchase: 5 Considerations
* Overcoming the All-Flash Array Implementation Challenges


Troy Platts

Troy has spent over 20 years helping organisations solve their data, storage and compute conundrums. He is a regular speaker at vendor events and spends any free time he has keeping abreast of advances in data platform technologies. He also makes a mean curry.
