RAIDCTL(8)                  System Manager's Manual                  RAIDCTL(8)

NAME
     raidctl — configure and control RAIDframe disk devices

SYNOPSIS
     raidctl [-v] -A [yes | no | forceroot | softroot] dev
     raidctl [-v] -a component dev
     raidctl [-v] -B dev
     raidctl [-v] -C config_file dev
     raidctl [-v] -c config_file dev
     raidctl [-v] -F component dev
     raidctl [-v] -f component dev
     raidctl [-v] -G dev
     raidctl [-v] -g component dev
     raidctl [-v] -I serial_number dev
     raidctl [-v] -i dev
     raidctl [-v] -M [yes | no | set params] dev
     raidctl [-v] -m dev
     raidctl [-v] -P dev
     raidctl [-v] -p dev
     raidctl [-v] -R component dev
     raidctl [-v] -r component dev
     raidctl [-v] -S dev
     raidctl [-v] -s dev
     raidctl [-v] -U unit dev
     raidctl [-v] -u dev

DESCRIPTION
raidctl
is the user-land control program for
raid(4), the RAIDframe disk
device. raidctl
is primarily used to dynamically
configure and unconfigure RAIDframe disk devices. For more information about
the RAIDframe disk device, see
raid(4).
This document assumes the reader has at least rudimentary knowledge of RAID and RAID concepts.
The command-line options for raidctl
are
as follows:
-A yes dev
     Make the RAID set auto-configurable. The RAID set will be automatically
     configured at boot before the root file system is mounted. Note that all
     components of the set must be of type RAID in the disklabel.

-A no dev
     Turn off auto-configuration for the RAID set.

-A forceroot dev
     Make the RAID set auto-configurable, and also mark the set as being
     eligible to be the root partition. All components of the set must be of
     type RAID in the disklabel. Note that only certain architectures
     (currently alpha, amd64, i386, pmax, sandpoint, sparc, sparc64, and vax)
     support booting a kernel directly from a RAID set. Please note that
     forceroot mode was referred to as root mode on earlier versions of
     NetBSD. For compatibility reasons, root can be used as an alias for
     forceroot.

-A softroot dev
     Like forceroot, but only change the root device if the boot device is
     part of the RAID set.

-a component dev
     Add component as a hot spare for the device dev.

-B dev
     Initiate a copyback of reconstructed data from a spare disk to its
     original disk. This is performed after a component has failed, and the
     failed drive has been reconstructed onto a spare drive.

-C config_file dev
     As for -c, but forces the configuration to take place. Fatal errors due
     to uninitialized components are ignored. This is required the first time
     a RAID set is configured.

-c config_file dev
     Configure the RAIDframe device dev according to the configuration given
     in config_file. A description of the contents of config_file is given
     later.

-F component dev
     Fail the specified component of the device, and immediately begin a
     reconstruction of the failed disk onto an available hot spare.

-f component dev
     Mark the specified component as having failed, but do not initiate a
     reconstruction of that component.

-G dev
     Generate the configuration of the RAIDframe device in a format suitable
     for use with the -c or -C options.

-g component dev
     Get the component label for the specified component.

-I serial_number dev
     Initialize the component labels on each component of the device.
     serial_number is used as a serial number for the RAID set. This step
     MUST be performed when a new RAID set is created.

-i dev
     Initialize the RAID device. In particular, (re-)write the parity on the
     selected device. This MUST be done for all RAID sets before the RAID
     device is labeled and before file systems are created on it.

-M yes dev
     Enable the use of a parity map on the RAID set. This reduces the amount
     of parity that must be checked after an unclean shutdown. Changes to
     this setting take effect the next time the set is configured.

-M no dev
     Disable the use of a parity map on the RAID set. This takes effect the
     next time the set is configured.

-M set cooldown tickms regions dev
     Alter the parameters of the parity map. A parameter given as 0 is left
     unchanged. See raid(4) for details on the parity map parameters.

-m dev
     Display status information about the parity map on the RAID set, if
     any. If used with -v then the current contents of the parity map will
     be output (in hexadecimal format) as well.

-P dev
     Check the status of the parity on the RAID set, and initialize
     (re-write) the parity if it is not known to be up-to-date. This is
     normally used after a system crash to ensure the integrity of the
     parity.

-p dev
     Check the status of the parity on the RAID set. Displays a status
     message, and returns successfully if the parity is up-to-date.

-R component dev
     Fail the specified component, if necessary, and immediately begin a
     reconstruction back to component. This is useful for reconstructing
     back onto a component after it has been replaced following a failure.

-r component dev
     Remove the specified component (which must be a hot spare or a failed
     component) from the RAID set.

-S dev
     Check the status of parity re-writing, component reconstruction, and
     component copyback. The output indicates the amount of progress
     achieved in each of these areas.

-s dev
     Display the status of the RAIDframe device for each of the components
     and spares.

-U unit dev
     Set the last_unit field in all the components of the RAID set, so that
     the next time the RAID set is auto-configured it uses that unit number.

-u dev
     Unconfigure the RAIDframe device.

-v
     Be more verbose. For operations such as reconstructions, parity
     re-writing, and copybacks, provide a progress indicator.
The device used by raidctl
is specified by
dev. dev may be either the full
name of the device, e.g., /dev/rraid0d, for the i386
architecture, or /dev/rraid0c for many others, or
just simply raid0 (for
/dev/rraid0[cd]). It is recommended that the
partitions used to represent the RAID device are not used for file
systems.
There are 4 required sections of a configuration file, and 2 optional sections. Each section begins with a ‘START’, followed by the section name, and the configuration parameters associated with that section. The first section is the ‘array’ section, and it specifies the number of columns, and spare disks in the RAID set. For example:
START array
3 0
indicates an array with 3 columns, and 0 spare disks. Old configurations specified a 3rd value in front of the number of columns and spare disks. This old value, if provided, must be specified as 1:
START array
1 3 0
The second section, the ‘disks’ section, specifies the actual components of the device. For example:
START disks
/dev/sd0e
/dev/sd1e
/dev/sd2e
specifies the three component disks to be used in the RAID device.
Disk wedges may also be specified with the NAME=<wedge name> syntax.
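For example, a ‘disks’ section built from named wedges might look like the following sketch (the wedge names here are purely illustrative):

START disks
NAME=raid0-disk0
NAME=raid0-disk1
NAME=raid0-disk2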
If any of the specified drives cannot be found when the RAID device is
configured, then they will be marked as ‘failed’, and the
system will operate in degraded mode. Note that it is
imperative that the order of the components in the
configuration file does not change between configurations of a RAID device.
Changing the order of the components will result in data loss if the set is
configured with the -C
option. In normal
circumstances, the RAID set will not configure if only
-c
is specified, and the components are
out-of-order.
The next section, which is the ‘spare’ section, is optional, and, if present, specifies the devices to be used as ‘hot spares’ — devices which are on-line, but are not actively used by the RAID driver unless one of the main components fails. A simple ‘spare’ section might be:
START spare
/dev/sd3e
for a configuration with a single spare component. If no spare drives are to be used in the configuration, then the ‘spare’ section may be omitted.
The next section is the ‘layout’ section. This section describes the general layout parameters for the RAID device, and provides such information as sectors per stripe unit, stripe units per parity unit, stripe units per reconstruction unit, and the parity configuration to use. This section might look like:
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5
The sectors per stripe unit specifies, in blocks, the interleave factor; i.e., the number of contiguous sectors to be written to each component for a single stripe. Appropriate selection of this value (32 in this example) is the subject of much research in RAID architectures. The stripe units per parity unit and stripe units per reconstruction unit are normally each set to 1. While certain values above 1 are permitted, a discussion of valid values and the consequences of using anything other than 1 are outside the scope of this document. The last value in this section (5 in this example) indicates the parity configuration desired. Valid entries include:

0    RAID level 0. No parity; only simple striping.
1    RAID level 1. Mirroring. The parity is the mirror.
4    RAID level 4. Striping across components, with parity stored on the last component.
5    RAID level 5. Striping across components, with parity distributed across all components.

There are other valid entries here, including those for Even-Odd parity, RAID level 5 with rotated sparing, Chained declustering, and Interleaved declustering, but as of this writing the code for those parity operations has not been tested with NetBSD.
The next required section is the ‘queue’ section. This is most often specified as:
START queue
fifo 100
where the queuing method is specified as fifo (first-in, first-out), and the size of the per-component queue is limited to 100 requests. Other queuing methods may also be specified, but a discussion of them is beyond the scope of this document.
The final section, the ‘debug’ section, is optional. For more details on this the reader is referred to the RAIDframe documentation discussed in the HISTORY section.
See EXAMPLES for a more complete configuration file example.
FILES

/dev/{,r}raid*    raid device special files.

EXAMPLES

It is strongly recommended that, before using raidctl for any real file systems, administrators be familiar with raidctl, and that they understand how the component reconstruction process works. The examples in this section will focus on configuring a number of different RAID sets of varying degrees of redundancy. By working through these examples, administrators should be able to develop a good feel for how to configure a RAID set, and how to initiate reconstruction of failed components.
In the following examples ‘raid0’ will be used to denote the RAID device. Depending on the architecture, /dev/rraid0c or /dev/rraid0d may be used in place of raid0.
The first step to configuring a RAID set is to identify the components that will be used. All components should be the same size. The disklabel type for RAID components is FS_RAID, and a typical disklabel entry for a RAID component might look like:
f: 1800000 200495 RAID # (Cyl. 405*- 4041*)
While FS_BSDFFS
will also work as the
component type, the type FS_RAID
is preferred for
RAIDframe use, as it is required for features such as auto-configuration. As
part of the initial configuration of each RAID set, each component will be
given a ‘component label’. A ‘component label’
contains important information about the component, including a
user-specified serial number, the column of that component in the RAID set,
the redundancy level of the RAID set, a ‘modification
counter’, and whether the parity information (if any) on that
component is known to be correct. Component labels are an integral part of
the RAID set, since they are used to ensure that components are configured
in the correct order, and used to keep track of other vital information
about the RAID set. Component labels are also required for the
auto-detection and auto-configuration of RAID sets at boot time. For a
component label to be considered valid, that particular component label must
be in agreement with the other component labels in the set. For example, the
serial number, ‘modification counter’, and number of columns
must all be in agreement. If any of these are different, then the component
is not considered to be part of the set. See
raid(4) for more information
about component labels.
Once the components have been identified, and the disks have
appropriate labels, raidctl
is then used to
configure the raid(4) device. To
configure the device, a configuration file which looks something like:
START array
# numCol numSpare
3 1

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e

START spare
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
32 1 1 5

START queue
fifo 100
is created in a file. The above configuration file specifies a RAID 5 set consisting of the components /dev/sd1e, /dev/sd2e, and /dev/sd3e, with /dev/sd4e available as a ‘hot spare’ in case one of the three main drives should fail. A RAID 0 set would be specified in a similar way:
START array
# numCol numSpare
4 0

START disks
/dev/sd10e
/dev/sd11e
/dev/sd12e
/dev/sd13e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
64 1 1 0

START queue
fifo 100
In this case, devices /dev/sd10e, /dev/sd11e, /dev/sd12e, and /dev/sd13e are the components that make up this RAID set. Note that there are no hot spares for a RAID 0 set, since there is no way to recover data if any of the components fail.
For a RAID 1 (mirror) set, the following configuration might be used:
START array
# numCol numSpare
2 0

START disks
/dev/sd20e
/dev/sd21e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
In this case, /dev/sd20e and /dev/sd21e are the two components of the mirror set. While no hot spares have been specified in this configuration, they easily could be, just as they were specified in the RAID 5 case above. Note as well that RAID 1 sets are currently limited to only 2 components. At present, n-way mirroring is not possible.
The first time a RAID set is configured, the
-C
option must be used:
raidctl -C raid0.conf raid0
where raid0.conf is the name of the RAID
configuration file. The -C option forces the configuration
to succeed, even if any of the component labels are incorrect. The
-C
option should not be used lightly in situations
other than initial configurations, as if the system is refusing to configure
a RAID set, there is probably a very good reason for it. After the initial
configuration is done (and appropriate component labels are added with the
-I
option) then raid0 can be configured normally
with:
raidctl -c raid0.conf raid0
When the RAID set is configured for the first time, it is necessary to initialize the component labels, and to initialize the parity on the RAID set. Initializing the component labels is done with:
raidctl -I 112341 raid0
where ‘112341’ is a user-specified serial number for the RAID set. This initialization step is required for all RAID sets. As well, using different serial numbers between RAID sets is strongly encouraged, as using the same serial number for all RAID sets will only serve to decrease the usefulness of the component label checking.
Initializing the RAID set is done via the
-i
option. This initialization
MUST be done for all RAID sets, since
among other things it verifies that the parity (if any) on the RAID set is
correct. Since this initialization may be quite time-consuming, the
-v
option may be also used in conjunction with
-i
:
raidctl -iv raid0
This will give more verbose output on the status of the initialization:
Initiating re-write of parity
Parity Re-write status:
 10% |****                                   | ETA: 06:03 /
The output provides a ‘Percent Complete’ in both a numeric and graphical format, as well as an estimated time to completion of the operation.
Since it is the parity that provides the ‘redundancy’ part of RAID, it is critical that the parity is correct as much as possible. If the parity is not correct, then there is no guarantee that data will not be lost if a component fails.
Once the parity is known to be correct, it is then safe to perform disklabel(8), newfs(8), or fsck(8) on the device or its file systems, and then to mount the file systems for use.
Under certain circumstances (e.g., the additional component has not arrived, or data is being migrated off of a disk destined to become a component) it may be desirable to configure a RAID 1 set with only a single component. This can be achieved by using the word “absent” to indicate that a particular component is not present. In the following:
START array
# numCol numSpare
2 0

START disks
absent
/dev/sd0e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_1
128 1 1 1

START queue
fifo 100
/dev/sd0e is the real component, and will
be the second disk of a RAID 1 set. The first component is simply marked as
being absent. Configuration (using -C
and
-I
12345 as above) proceeds
normally, but initialization of the RAID set will have to wait until all
physical components are present. After configuration, this set can be used
normally, but will be operating in degraded mode. Once a second physical
component is obtained, it can be hot-added, the existing data mirrored, and
normal operation resumed.
The size of the resulting RAID set will depend on the number of data components in the set. Space is automatically reserved for the component labels, and the actual amount of space used for data on a component will be rounded down to the largest possible multiple of the sectors per stripe unit (sectPerSU) value. Thus, the amount of space provided by the RAID set will be less than the sum of the size of the components.
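As a back-of-the-envelope illustration using the numbers from these examples (and the 64-block label reservation mentioned in the WARNINGS section):

1800000 sectors per component
-    64 sectors reserved for the component label
= 1799936 sectors, an exact multiple of sectPerSU (32)

Each component therefore contributes 1799936 blocks of data space (the numBlocks value seen in the status output later in these examples), and the three-component RAID 5 set provides 2 x 1799936 = 3599872 sectors of usable space, since one component's worth of space holds parity.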
The command:

raidctl -p raid0

can be used to check the current status of the parity. To check the parity and rebuild it if necessary (for example, after an unclean shutdown) the command:
raidctl -P raid0
is used. Note that re-writing the parity can be done while other operations on the RAID set are taking place (e.g., while doing a fsck(8) on a file system on the RAID set). However: for maximum effectiveness of the RAID set, the parity should be known to be correct before any data on the set is modified.
To see how the RAID set is doing, the following command can be used to show the RAID set's status:
raidctl -s raid0
The output will look something like:
Components:
          /dev/sd1e: optimal
          /dev/sd2e: optimal
          /dev/sd3e: optimal
Spares:
          /dev/sd4e: spare
Component label for /dev/sd1e:
   Row: 0   Column: 0   Num Rows: 1   Num Columns: 3
   Version: 2   Serial Number: 13432   Mod Counter: 65
   Clean: No   Status: 0
   sectPerSU: 32   SUsPerPU: 1   SUsPerRU: 1
   RAID Level: 5   blocksize: 512   numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Component label for /dev/sd2e:
   Row: 0   Column: 1   Num Rows: 1   Num Columns: 3
   Version: 2   Serial Number: 13432   Mod Counter: 65
   Clean: No   Status: 0
   sectPerSU: 32   SUsPerPU: 1   SUsPerRU: 1
   RAID Level: 5   blocksize: 512   numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Component label for /dev/sd3e:
   Row: 0   Column: 2   Num Rows: 1   Num Columns: 3
   Version: 2   Serial Number: 13432   Mod Counter: 65
   Clean: No   Status: 0
   sectPerSU: 32   SUsPerPU: 1   SUsPerRU: 1
   RAID Level: 5   blocksize: 512   numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
This indicates that all is well with the RAID set. Of importance here are the component lines which read ‘optimal’, and the ‘Parity status’ line. ‘Parity status: clean’ indicates that the parity is up-to-date for this RAID set, whether or not the RAID set is in redundant or degraded mode. ‘Parity status: DIRTY’ indicates that it is not known if the parity information is consistent with the data, and that the parity information needs to be checked. Note that if there are file systems open on the RAID set, the individual components will not be ‘clean’ but the set as a whole can still be clean.
To check the component label of /dev/sd1e, the following is used:
raidctl -g /dev/sd1e raid0
The output of this command will look something like:
Component label for /dev/sd1e:
   Row: 0   Column: 0   Num Rows: 1   Num Columns: 3
   Version: 2   Serial Number: 13432   Mod Counter: 65
   Clean: No   Status: 0
   sectPerSU: 32   SUsPerPU: 1   SUsPerRU: 1
   RAID Level: 5   blocksize: 512   numBlocks: 1799936
   Autoconfig: No
   Last configured as: raid0
If for some reason (testing, for example) it is necessary to pretend a drive has failed, the following will perform that function:

raidctl -f /dev/sd2e raid0
The system will then be performing all operations in degraded mode, where missing data is re-computed from existing data and the parity. In this case, obtaining the status of raid0 will return (in part):
Components:
          /dev/sd1e: optimal
          /dev/sd2e: failed
          /dev/sd3e: optimal
Spares:
          /dev/sd4e: spare
Note that with the use of -f
a
reconstruction has not been started. To both fail the disk and start a
reconstruction, the -F
option must be used:
raidctl -F /dev/sd2e raid0
The -f
option may be used first, and then
the -F
option used later, on the same disk, if
desired. Immediately after the reconstruction is started, the status will
report:
Components:
          /dev/sd1e: optimal
          /dev/sd2e: reconstructing
          /dev/sd3e: optimal
Spares:
          /dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 10% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
This indicates that a reconstruction is in progress. To find out
how the reconstruction is progressing the -S
option
may be used. This will indicate the progress in terms of the percentage of
the reconstruction that is completed. When the reconstruction is finished
the -s
option will show:
Components:
          /dev/sd1e: optimal
          /dev/sd2e: spared
          /dev/sd3e: optimal
Spares:
          /dev/sd4e: used_spare
[...]
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
At this point there are at least two options. First, if
/dev/sd2e is known to be good (i.e., the failure was
either caused by -f
or -F
,
or the failed disk was replaced), then a copyback of the data can be
initiated with the -B
option. In this example, this
would copy the entire contents of /dev/sd4e to
/dev/sd2e. Once the copyback procedure is complete,
the status of the device would be (in part):
Components:
          /dev/sd1e: optimal
          /dev/sd2e: optimal
          /dev/sd3e: optimal
Spares:
          /dev/sd4e: spare
and the system is back to normal operation.
The second option after the reconstruction is to simply use /dev/sd4e in place of /dev/sd2e in the configuration file. For example, the configuration file (in part) might now look like:
START array
3 0

START disks
/dev/sd1e
/dev/sd4e
/dev/sd3e
This can be done as /dev/sd4e is completely interchangeable with /dev/sd2e at this point. Note that extreme care must be taken when changing the order of the drives in a configuration. This is one of the few instances where the devices and/or their orderings can be changed without loss of data! In general, the ordering of components in a configuration file should never be changed.
If a component fails and there are no hot spares available on-line, the status of the RAID set might (in part) look like:
Components:
          /dev/sd1e: optimal
          /dev/sd2e: failed
          /dev/sd3e: optimal
No spares.
In this case there are a number of options. The first option is to add a hot spare using:
raidctl -a /dev/sd4e raid0
After the hot add, the status would then be:
Components:
          /dev/sd1e: optimal
          /dev/sd2e: failed
          /dev/sd3e: optimal
Spares:
          /dev/sd4e: spare
Reconstruction could then take place using
-F
as described above.
A second option is to rebuild directly onto /dev/sd2e. Once the disk containing /dev/sd2e has been replaced, one can simply use:
raidctl -R /dev/sd2e raid0
to rebuild the /dev/sd2e component. As the rebuilding is in progress, the status will be:
Components:
          /dev/sd1e: optimal
          /dev/sd2e: reconstructing
          /dev/sd3e: optimal
No spares.
and when completed, will be:
Components:
          /dev/sd1e: optimal
          /dev/sd2e: optimal
          /dev/sd3e: optimal
No spares.
In circumstances where a particular component is completely unavailable after a reboot, a special component name will be used to indicate the missing component. For example:
Components:
          /dev/sd2e: optimal
          component1: failed
No spares.
indicates that the second component of this RAID set was not detected at all by the auto-configuration code. The name ‘component1’ can be used anywhere a normal component name would be used. For example, to add a hot spare to the above set, and rebuild to that hot spare, the following could be done:
raidctl -a /dev/sd3e raid0
raidctl -F component1 raid0
at which point the data missing from ‘component1’ would be reconstructed onto /dev/sd3e.
When more than one component is marked as ‘failed’
due to a non-component hardware failure (e.g., loss of power to two
components, adapter problems, termination problems, or cabling issues) it is
quite possible to recover the data on the RAID set. The first thing to be
aware of is that the first disk to fail will almost certainly be out-of-sync
with the remainder of the array. If any IO was performed between the time
the first component is considered ‘failed’ and when the second
component is considered ‘failed’, then the first component to
fail will not contain correct data, and should be ignored.
When the second component is marked as failed, however, the RAID device will
(currently) panic the system. At this point the data on the RAID set (not
including the first failed component) is still self consistent, and will be
in no worse state of repair than had the power gone out in the middle of a
write to a file system on a non-RAID device. The problem, however, is that
the component labels may now have 3 different ‘modification
counters’ (one value on the first component that failed, one value on
the second component that failed, and a third value on the remaining
components). In such a situation, the RAID set will not autoconfigure, and
can only be forcibly re-configured with the -C
option. To recover the RAID set, one must first remedy whatever physical
problem caused the multiple-component failure. After that is done, the RAID
set can be restored by forcibly configuring the raid set
without the component that failed first. For example, if
/dev/sd1e and /dev/sd2e fail
(in that order) in a RAID set of the following configuration:
START array
4 0

START disks
/dev/sd1e
/dev/sd2e
/dev/sd3e
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100
then the following configuration (say "recover_raid0.conf")
START array
4 0

START disks
absent
/dev/sd2e
/dev/sd3e
/dev/sd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
64 1 1 5

START queue
fifo 100
can be used with
raidctl -C recover_raid0.conf raid0
to force the configuration of raid0. A
raidctl -I 12345 raid0
will be required in order to synchronize the component labels. At this point the file systems on the RAID set can then be checked and corrected. To complete the re-construction of the RAID set, /dev/sd1e is simply hot-added back into the array, and reconstructed as described earlier.
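A sketch of those final steps, assuming (by analogy with the ‘component1’ example above) that the absent first component is named ‘component0’:

raidctl -a /dev/sd1e raid0
raidctl -F component0 raid0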
RAID sets can also be layered to create more complex and much larger RAID sets. For example, a RAID 0 set can be constructed using four RAID 5 sets as its components:

START array
# numCol numSpare
4 0

START disks
/dev/raid1e
/dev/raid2e
/dev/raid3e
/dev/raid4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_0
128 1 1 0

START queue
fifo 100
A similar configuration file might be used for a RAID 0 set constructed from components on RAID 1 sets. In such a configuration, the mirroring provides a high degree of redundancy, while the striping provides additional speed benefits.
RAID sets can also be auto-configured at boot. To make a set auto-configurable, simply prepare the RAID set as above, and then use:

raidctl -A yes raid0
to turn on auto-configuration for that set. To turn off auto-configuration, use:
raidctl -A no raid0
RAID sets which are auto-configurable will be configured before the root file system is mounted. These RAID sets are thus available for use as a root file system, or for any other file system. A primary advantage of using the auto-configuration is that RAID components become more independent of the disks they reside on. For example, SCSI ID's can change, but auto-configured sets will always be configured correctly, even if the SCSI ID's of the component disks have become scrambled.
Having a system's root file system (/) on a RAID set is also allowed, with the ‘a’ partition of such a RAID set being used for /. To use raid0a as the root file system, simply use:
raidctl -A forceroot raid0
To return raid0a to be just an auto-configuring set simply use the
-A
yes arguments.
Note that kernels can only be directly read from RAID 1 components
on architectures that support that (currently alpha, i386, pmax, sandpoint,
sparc, sparc64, and vax). On those architectures, the
FS_RAID
file system is recognized by the bootblocks,
and will properly load the kernel directly from a RAID 1 component. For
other architectures, or to support the root file system on other RAID sets,
some other mechanism must be used to get a kernel booting. For example, a
small partition containing only the secondary boot-blocks and an alternate
kernel (or two) could be used. Once a kernel is booting however, and an
auto-configuring RAID set is found that is eligible to be root, then that
RAID set will be auto-configured and used as the root device. If two or more
RAID sets claim to be root devices, then the user will be prompted to select
the root device. At this time, RAID 0, 1, 4, and 5 sets are all supported as
root devices.
A typical RAID 1 setup with root on RAID might be as follows:
1. wd0a contains a complete, bootable, basic NetBSD installation.
2. wd1a also contains a complete, bootable, basic NetBSD installation.
3. wd0e and wd1e form a RAID 1 set, raid0, which holds the root file system.
4. wd0f and wd1f form a RAID 1 set, raid1, which is used for swap space.
5. wd0g and wd1g form a RAID 1 set, raid2, which holds /usr, /home, or other data, if desired.

RAID sets raid0, raid1, and raid2 are all marked as auto-configurable. raid0 is marked as being a root file system. When new kernels are installed, the kernel is not only copied to /, but also to wd0a and wd1a. The kernel on wd0a is required, since that is the kernel the system boots from. The kernel on wd1a is also required, since that will be the kernel used should wd0 fail. The important point here is to have redundant copies of the kernel available, in the event that one of the drives fails.
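A hypothetical sketch of that kernel-update step for such a layout, assuming /altroot is an existing spare mount point:

mount /dev/wd0a /altroot
cp /netbsd /altroot/netbsd
umount /altroot
mount /dev/wd1a /altroot
cp /netbsd /altroot/netbsd
umount /altroot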
There is no requirement that the root file system be on the same disk as the kernel. For example, obtaining the kernel from wd0a, and using sd0e and sd1e for raid0, and the root file system, is fine. It is critical, however, that there be multiple kernels available, in the event of media failure.
Multi-layered RAID devices (such as a RAID 0 set made up of RAID 1 sets) are not supported as root devices or auto-configurable devices at this point. (Multi-layered RAID devices are supported in general, however, as mentioned earlier.) Note that in order to enable component auto-detection and auto-configuration of RAID devices, the line:
options RAID_AUTOCONFIG
must be in the kernel configuration file. See raid(4) for more details.
When a RAID set is used for swap space, it is recommended that the line:

swapoff=YES

be added to /etc/rc.conf. This ensures that swap is turned off before the RAID sets are unconfigured at system shutdown.
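For example, with raid1 dedicated to swap as in the layout above, a hypothetical /etc/fstab entry to enable that swap space would be:

/dev/raid1b none swap sw 0 0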
The final operation performed by raidctl is to unconfigure a raid(4) device. This is accomplished via a simple:
raidctl -u raid0
at which point the device is ready to be reconfigured.
Performance Tuning

As with most performance tuning, benchmarking under real-life loads may be the only way to measure expected performance. Understanding some of the underlying technology is also useful in tuning. The goal of this section is to provide pointers to those parameters which may make significant differences in performance.
For a RAID 1 set, a SectPerSU value of 64 or 128 is typically sufficient. Since data in a RAID 1 set is arranged in a linear fashion on each component, selecting an appropriate stripe size is somewhat less critical than it is for a RAID 5 set. However: a stripe size that is too small will cause large IO's to be broken up into a number of smaller ones, hurting performance. At the same time, a large stripe size may cause problems with concurrent accesses to stripes, which may also affect performance. Thus values in the range of 32 to 128 are often the most effective.
Tuning RAID 5 sets is trickier. In the best case, IO is presented to the RAID set one stripe at a time. Since the entire stripe is available at the beginning of the IO, the parity of that stripe can be calculated before the stripe is written, and then the stripe data and parity can be written in parallel. When the amount of data being written is less than a full stripe worth, the ‘small write’ problem occurs. Since a ‘small write’ means only a portion of the stripe on the components is going to change, the data (and parity) on the components must be updated slightly differently. First, the ‘old parity’ and ‘old data’ must be read from the components. Then the new parity is constructed, using the new data to be written, and the old data and old parity. Finally, the new data and new parity are written. All this extra data shuffling results in a serious loss of performance, and is typically 2 to 4 times slower than a full stripe write (or read). To combat this problem in the real world, it may be useful to ensure that stripe sizes are small enough that a ‘large IO’ from the system will use exactly one large stripe write. As is seen later, there are some file system dependencies which may come into play here as well.
Since the size of a ‘large IO’ is often (currently) only 32K or 64K, on a 5-drive RAID 5 set it may be desirable to select a SectPerSU value of 16 blocks (8K) or 32 blocks (16K). Since there are 4 data stripe units in each stripe, the maximum data per stripe is 64 blocks (32K) or 128 blocks (64K). Again, empirical measurement will provide the best indicators of which values will yield better performance.
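Expressed as a configuration fragment, the smaller of those two choices would correspond to a ‘layout’ section like the following (a sketch for this hypothetical 5-drive set, not a general recommendation):

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level_5
16 1 1 5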
The parameters used for the file system are also critical to good performance. For newfs(8), for example, increasing the block size to 32K or 64K may improve performance dramatically. As well, changing the cylinders-per-group parameter from 16 to 32 or higher is often not only necessary for larger file systems, but may also have positive performance implications.
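As a hypothetical example of the block-size change, using a 32K block size with a matching 4K fragment size (the fragment size must be at least one-eighth of the block size):

newfs -b 32768 -f 4096 /dev/rraid0e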
In summary, the steps to create a new RAID set, initialize it, label it, and mount a file system on it are:

raidctl -C raid0.conf raid0
raidctl -I 123456 raid0
raidctl -i raid0
disklabel raid0 > /tmp/label
vi /tmp/label
disklabel -R -r raid0 /tmp/label
newfs /dev/rraid0e
mount /dev/raid0e /mnt

Use:

raidctl -c raid0.conf raid0

to re-configure the RAID set the next time it is needed, or put raid0.conf into /etc where it will automatically be started by the /etc/rc.d scripts.
HISTORY

The raidctl
command first appeared as a
program in CMU's RAIDframe v1.1 distribution. This version of
raidctl
is a complete re-write, and first appeared
in NetBSD 1.4.
The RAIDframe Copyright is as follows:

Copyright (c) 1994-1996 Carnegie-Mellon University. All rights reserved.

Permission to use, copy, modify and distribute this software and its documentation is hereby granted, provided that both the copyright notice and this permission notice appear in all copies of the software, derivative works or modified versions, and any portions thereof, and that both notices appear in supporting documentation.

CARNEGIE MELLON ALLOWS FREE USE OF THIS SOFTWARE IN ITS "AS IS" CONDITION. CARNEGIE MELLON DISCLAIMS ANY LIABILITY OF ANY KIND FOR ANY DAMAGES WHATSOEVER RESULTING FROM THE USE OF THIS SOFTWARE.

Carnegie Mellon requests users of this software to return to

   Software Distribution Coordinator or Software.Distribution@CS.CMU.EDU
   School of Computer Science
   Carnegie Mellon University
   Pittsburgh PA 15213-3890

any improvements or extensions that they make and grant Carnegie the rights to redistribute these changes.
WARNINGS

Recomputation of parity MUST be performed whenever there is a chance that it may have been compromised. This includes after system crashes, or before a RAID device has been used for the first time. Failure to keep parity correct will be catastrophic should a component ever fail — it is better to use RAID 0 and get the additional space and speed, than it is to use parity, but not keep the parity correct. At least with RAID 0 there is no perception of increased data security.
When replacing a failed component of a RAID set, it is a good idea to zero out the first 64 blocks of the new component to ensure the RAIDframe driver doesn't erroneously detect a component label in the new component. This is particularly true on RAID 1 sets because there is at most one correct component label in a failed RAID 1 installation, and the RAIDframe driver picks the component label with the highest serial number and modification value as the authoritative source for the failed RAID set when choosing which component label to use to configure the RAID set.
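For example, if the replacement disk has been inserted as sd2 (an assumed device name) and its ‘e’ partition will become the new component, its first 64 blocks could be cleared with:

dd if=/dev/zero of=/dev/rsd2e bs=512 count=64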
NetBSD 9.4                      January 6, 2016                      NetBSD 9.4