Things your Mother never told you about RAID arrays

What follows is a dissertation I received about RAID arrays.

It seemed like good info to post to the list.

Todd Anderson

----------------------------------------------------------------------------

Hard Disk Fault Tolerance

When more than one physical disk is used by the server, it is possible to
configure them in a 'stripe set' or a 'mirrored set' that is intended to
improve the speed of disk operations and to provide for tolerance in the
case of drive failure. This is a complex area, because the striping or
mirroring can be controlled by dedicated hardware, the operating system, or
even disk controller drivers. Each has merits under different circumstances.


Also, it is possible to include redundant drives that offer a certain level
of fault tolerance (useful for a mission critical system). RAID 1, RAID 5
and RAID 10 all provide for fault tolerance.

RAID Levels

The RAID concept (Redundant Arrays of Inexpensive Disks) was first proposed
in 1987 by researchers Patterson, Gibson, and Katz at the University of
California at Berkeley. They proposed that the cost, performance, size
limitations, and reliability of storage subsystems could be improved by
combining several small disks into a disk "array" that is perceived as a
single disk by applications (and the operating system). They described five
ways of combining disks, numbered 1 through 5. Later, RAID 0 was added to
the list. Combination RAID levels are also possible to an extent: RAID 10 is
a combination of RAID 1 (mirroring) and RAID 0 (striping), exploiting the
benefits of each type. Each
configuration, however, has different costs, benefits, and drawbacks. RAID
levels 2, 3 and 4 are not commercially viable, and are excluded from this
document.

RAID 0

RAID level 0 is also called "striping". In this configuration, a group of
disks act as a single disk (a "stripe set") with data striped across all the
disks. The striping strategy divides the total storage space into
equal-size sections or "stripe blocks" that are allocated round-robin among
all the disks. The size of the stripe block is configurable and is normally
a megabyte or larger. Technically RAID 0 is not RAID at all, as there is
no redundancy.
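The round-robin allocation just described can be sketched in a few lines of
Python; the function name and the four-disk example are illustrative only,
not part of any real implementation:

```python
def raid0_locate(logical_block: int, num_disks: int) -> tuple[int, int]:
    """Map a logical stripe block to (disk index, block offset on that disk).

    Stripe blocks are allocated round-robin among all the disks, as
    described above.
    """
    disk = logical_block % num_disks      # which disk holds this block
    offset = logical_block // num_disks   # block position within that disk
    return disk, offset

# With four disks, consecutive logical blocks land on disks 0, 1, 2, 3, 0, ...
print([raid0_locate(b, 4) for b in range(6)])
# → [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

This round-robin spread is also why sequential transfers go fast: consecutive
blocks live on different spindles and can be fetched in parallel.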

Benefits
* Read and write performance for large sequential files is greatly enhanced
because all the disks in the stripe set can be accessed in parallel. Read
and write performance for small random access transfers is about the same as
for separate disks, but under special circumstances can
actually be worse.

* Disk activity will, on the average, be balanced across all the disks in
the stripe set so no one disk will become a "hot spot" with much more
activity than the others. This improves performance on the average for all
users.

* Several small disks can be combined into a single large logical disk to
provide increased storage capacity.

Drawbacks
* Reliability is extremely low. Failure of a single disk causes loss of the
entire stripe set. The mean time to failure (MTTF) of the entire stripe set
is inversely proportional to the number of disks.
* Small file handling can be cumbersome if the file sizes are slightly
larger than the stripes.
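The MTTF point can be illustrated with a toy calculation. The independence
assumption and the 500,000-hour figure below are hypothetical examples, not
vendor data:

```python
def stripe_set_mttf(disk_mttf_hours: float, num_disks: int) -> float:
    """Approximate MTTF of a stripe set whose disks fail independently.

    Any single failure destroys the whole set, so under the usual
    exponential-failure approximation the set's MTTF is the single-disk
    MTTF divided by the number of disks.
    """
    return disk_mttf_hours / num_disks

for n in (1, 2, 4, 8):
    print(n, "disks:", stripe_set_mttf(500_000, n), "hours")
```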

RAID 1

RAID level 1 is also called "mirroring". In this configuration, a duplicate
copy of each disk is stored on a second "mirror" drive. All data are written
to both sides of the mirror. This provides redundancy and fault tolerance
because each disk is duplicated. If one fails, its mirror can still be
accessed and no data are lost because of the failure. One can configure RAID
1 to have more than one duplicate copy of each disk (e.g. "triple
mirroring"); however, this is generally regarded as overkill.

Benefits
* Reliability is very high.
* A disk failure has no impact on WRITE performance.
* If any disk fails, its mirror can continue to function and the failed disk
can be replaced when it is convenient. Many modern storage subsystems are
designed so that failed components can be replaced while the storage
subsystem is fully operational.
* Read performance is better than single drives because data can be read
from either side of the mirror.

Drawbacks
* Cost is highest. The cost per megabyte is slightly more than doubled (or
tripled if two redundant copies are made) compared to single drives. On the
other hand, disks are increasingly
inexpensive thereby offsetting this issue to a large extent.
* There can be a very slight degradation in write performance compared to
single drives.

RAID 5

In RAID level 5 configurations, data are striped across several disks along
with "parity" data. The striping strategy is the same as for RAID 0
(relatively large stripe blocks). The parity data is distributed across the
drives in such a way that a data group and its parity information
are always written to different devices. This technique allows
reconstruction of all data that was present on any single drive that has
failed. Some implementations store all parity on a single dedicated drive;
others distribute the parity information across all the drives in the
array.
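The reconstruction trick rests on the XOR operation: the parity block is the
XOR of the data blocks in a stripe, so XOR-ing the surviving blocks with the
parity regenerates the missing one. A minimal sketch, with invented block
contents:

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"   # data blocks on three drives
parity = xor_blocks(d0, d1, d2)          # parity stored on a fourth device

# If the drive holding d1 fails, XOR-ing the survivors recovers its data:
rebuilt = xor_blocks(d0, d2, parity)
print(rebuilt == d1)   # → True
```

The same property explains why only one drive per stripe may fail: with two
blocks missing, the XOR no longer pins down either of them.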

Benefits

* READ performance can be quite good, especially where multiple disks are
RAID'ed into an array.
* Reliability will be high. If one disk fails, the remaining disks continue
to function without loss of data or availability. The failed disk can be
replaced when it is convenient.
* Cost is lower when compared with mirroring (RAID 1). A typical RAID 5
array consists of three disks, making 2/3 of the storage purchased available
(3 x 18GB drives would give around 36GB of usable space). (Note: you can set
up RAID 5 arrays of up to seven drives, with hot spares included.)
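The capacity arithmetic in the last point is easy to check; the drive count
and size are just examples:

```python
def raid5_usable_gb(drive_gb: float, num_drives: int) -> float:
    """Usable RAID 5 capacity: one drive's worth of space holds parity."""
    return drive_gb * (num_drives - 1)

print(raid5_usable_gb(18, 3))   # → 36, matching the 3 x 18GB example above
```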

Drawbacks

* WRITE performance has the potential to be terrible for database-based
applications. This is because the parity data must be updated whenever a
block is written. In the worst case, writing a single database block
requires four or more I/O operations (more still when a block is split over
stripes). The following operations are required:
1. Read the old stripe block
2. Read the old parity data
3. Exclusive or the old data group with the parity data
4. Merge the new database block into the old data group
5. Exclusive or the new data with the parity data
6. Write the new stripe block
7. Write the new parity data
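The steps above amount to a read-modify-write parity update. They can be
sketched as follows; the `xor_bytes` helper and the four-byte "blocks" are
invented for illustration:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_update(old_data: bytes, old_parity: bytes, new_data: bytes):
    """Return the (data, parity) pair to write after updating one block."""
    # Steps 1-2 (the reads) are modelled here as the function's inputs.
    # Step 3: XOR the old data's contribution out of the parity...
    parity = xor_bytes(old_parity, old_data)
    # Steps 4-5: ...then XOR the new data's contribution in.
    parity = xor_bytes(parity, new_data)
    # Steps 6-7 (the writes) are modelled as the return value.
    return new_data, parity

# Sanity check: the incremental update matches a full parity recomputation.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_bytes(d0, d1)
new_d0 = b"XXXX"
_, new_parity = raid5_update(d0, parity, new_d0)
print(new_parity == xor_bytes(new_d0, d1))   # → True
```

Note that the untouched block `d1` never has to be read, which is exactly why
the small-write penalty is four I/Os rather than a whole-stripe rewrite.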

"But since the data and parity are on separate drives, they can be read in
parallel" you say. Whilst this is true, reading two disks at the same time
uses up half your disk bandwidth. As you
can see, writing can use even more than half, giving the potential for
serious WRITE performance concerns.
* Although a failed disk can be replaced, reconstructing its contents
requires reading the entire contents of every other disk in the array and
writing the rebuilt contents to the replacement disk. This can adversely
affect overall system performance whilst the operation takes place.
* In the case of a single parity disk on low-cost RAID 5 implementations,
this disk is often considered a single point of failure.

RAID 10

RAID level 10 is a combination of Mirroring and Striping. This takes a
minimum of four identical drives to implement, and gives all of the
advantages of both the striping and mirroring concepts, and reduces the
disadvantages. In this configuration, a duplicate copy of each pair of
striped disks is stored on a second pair of mirror drives. All data are
written to both sides of the mirror. This provides redundancy and fault
tolerance because each disk is duplicated. If one fails, its mirror can
still be accessed and no data are lost because of the failure. This provides
the high performance of mirrored READs and striped WRITEs. In a
multiple-disk environment this costs no more than RAID 1, and only one
third more than RAID 5.
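The cost comparison can be made concrete as usable-capacity fractions for
the levels discussed. Identical drives are assumed throughout, and the
function below is purely illustrative:

```python
def usable_fraction(level: str, num_disks: int) -> float:
    """Fraction of purchased capacity that is usable for data."""
    if level == "RAID0":
        return 1.0                           # no redundancy at all
    if level in ("RAID1", "RAID10"):
        return 0.5                           # every block is stored twice
    if level == "RAID5":
        return (num_disks - 1) / num_disks   # one disk's worth of parity
    raise ValueError(f"unknown level: {level}")

for level, n in [("RAID1", 2), ("RAID5", 3), ("RAID10", 4)]:
    print(level, usable_fraction(level, n))
```

So a four-drive RAID 10 and a two-drive RAID 1 give the same 50% fraction,
while a three-drive RAID 5 yields 2/3 of the purchased space.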

Benefits

* Reliability is very high. A disk failure can have almost no impact on
performance.
* If any disk fails, its mirror can continue to function and the failed disk
can be replaced when it is convenient. Many modern storage subsystems are
designed so that failed components can be replaced while the storage
subsystem is fully operational (Hot Swap).
* Read and write performance for large sequential files (such as those that
databases use) is greatly enhanced because all the disks in the stripe set
can be accessed in parallel.
* Read and write performance for small random access transfers is about
twice as good as for separate disks, whilst over-all read performance can be
three or more times that of a single drive.
* Disk activity will, on the average, be balanced across all the disks in
the stripe set so no one disk will become a "hot spot" with much more
activity than the others. This improves
performance on the average for all users.

Drawbacks

* Cost is high due to the mirroring, 50% of the hardware is redundant.
However disks are no longer expensive items.
* You cannot add "another 18GB" without buying two 18GB disks, which seems a
little odd to many people.

Note: If you are contemplating RAID 1, and wish to have multiple disks, RAID
10 will give you a performance boost over two "straight" RAID1 mirrored
implementations.

Implementation types

There are four common RAID implementations: in the operating system, in the
disk controller, via dedicated RAID controller hardware, or in a separate
subsystem. Separate disk subsystems are discussed later in this document,
but summarized here for the sake of clarity.

A RAID implementation that is part of the operating system is cheap - you
don't have to pay many dollars for it (or not much). It isn't worth much
either, and should be avoided wherever possible: the additional load put on
the server and the lack of hardware-level control make this approach
very undesirable.

RAID implementation in a disk controller uses less processor power and
system resources than an operating system based solution. Usually, special
device drivers are required for these disk controllers. These types of
systems are inexpensive but provide very limited fault tolerance. Failed
disks cannot be replaced while the system is operational. This is, however,
better than no RAID at all.

The third option is RAID implementation via a dedicated RAID controller
card, combined with hot-swappable disks. These are finding their way onto
even entry-level
servers, and give the advantage of not having to take the machine off-line
to replace a disk; they are a very cost effective method of
adding RAID to a server.

A RAID implementation that is a separate subsystem from the computer is
best. It can have its own power supplies, battery backup, spare controllers,
hot-standby disks, and other capabilities all of which are independent of
the computer. Failure of a component in the computer should
not affect the disk subsystem and vice versa. An excellent example of this
type of system is Data General's Clariion disk array. Many other vendors
also provide these types of systems. This is discussed in more detail later
in this document.

RAID Recap

RAID 0, or disk striping, is a simple strategy: data is written evenly
across all disks in the array, providing optimal read and write performance
as operations are performed across multiple disks. A major
disadvantage of RAID 0 is that it does not provide any fault tolerance and
all data is lost if one disk in the array fails. RAID 0 offers high
performance READs and WRITEs, but has no fault tolerance. This type of array
is suitable for frequently accessed temporary files. We do not advise using
RAID 0 to store important data. This is not technically RAID, as there is
no redundancy.

RAID 1, or mirroring, provides excellent fault tolerance as data is written
to both disks on a mirrored pair and each is a working copy of the data.
Writing to a mirrored pair takes no longer than writing to a single disk and
good disk controllers can read data almost twice as fast by
reading from both disks in parallel. However, RAID 1 implementations result
in a loss of 50% of disk space that you have purchased, as data is written
twice.

RAID 5, or disk striping with parity, extends the concept of disk striping
and uses parity to provide fault tolerance at the cost of additional reads
and writes when data blocks are updated. Data redundancy is provided by the
parity information. Although RAID 5 write performance is
substantially slower than disk striping without parity (RAID 0), striping
with parity generally offers similar READ performance to disk mirroring
(RAID 1). Disk space is needed for the parity information, rendering some
portion of disk space unusable for data. However, this loss is less than
that of RAID 1 implementations. This is a RAID level that is promoted by
many hardware vendors as it is less costly than RAID 1/10 whilst still
offering high levels of data security.

RAID 10 (RAID 1 + RAID 0, also written RAID 0+1 or RAID 1+0) is also known
as mirroring with striping. This level uses a striped array of disks, which
are then
mirrored to another identical set of striped disks. For example, a striped
array can be created using three disks. The striped array of disks
is then mirrored using another set of three striped disks. RAID 10 provides
the performance benefits of disk striping with the disk redundancy of
mirroring, without the performance issues associated with parity
maintenance. RAID 10 provides the highest read/write performance of any of
the RAID levels at the expense of using twice as many disks. This is advised
for high security and high performance situations.

External Disk Systems. This includes such esoteric systems as the Clariion
disk array. This is a separate subsystem that is not a physical part of
your server. They are very fast, very secure, and often highly expensive.
They are currently the ultimate in speed and safety, and are highly
recommended for enterprise-level servers.

With redundancy, RAID 1, 5 and 10 implementations allow the server to
remain operational even if a drive fails. Some drive arrays also allow
repair and rebuild of the failed drive while the server continues
uninterrupted operations, often automatically; other systems (usually less
expensive ones) demand that the server is taken off-line whilst the rebuild
is accomplished. Again, these issues are complex and cannot be thoroughly
covered by this document; they should be discussed with your hardware vendor
and implementation team.

Hardware-Based Disk Cache

Systems that contain a large READ disk cache within the controller can
dramatically improve the performance of the application server. This option
is relatively inexpensive when purchased with the overall system and
therefore is recommended, particularly for larger implementations. WRITE
caches can cause integrity issues under extremely rare and complex
circumstances. Most caches can be configured easily enough to avoid this
rare issue by turning off the write cache.