Understanding RAID: Types and Configuration Steps
Day 7: Learning RAID Setup on Linux
What is RAID?
RAID (Redundant Array of Independent Disks) is like having backup copies of your important files stored in different places on several hard drives or solid-state drives (SSDs).
If one drive stops working, your data is still safe because you have other copies stored on the other drives.
It’s like having a safety net to protect your files from being lost if one of your drives breaks down.
More formally, RAID is a technology that combines multiple physical disk drives into a single logical unit for data storage. It is commonly used underneath Database Management Systems (DBMS) and other servers that cannot afford to lose data.
The main purpose of RAID is to improve data reliability, availability, and performance.
There are different levels of RAID, each offering a different balance of these benefits.
How Does RAID Work?
Let us understand how RAID works with an example. Imagine you have a group of friends and you want to keep your favorite book safe. Instead of giving the whole book to just one friend, you split it up and hand the pieces out, along with enough extra copies or notes that the book can be rebuilt if any one friend loses their share. That is similar to how RAID works with hard drives. It spreads your data across multiple drives and, in the redundant RAID levels, stores enough extra information that the data is still safe if one drive fails.
What is a RAID Controller?
A RAID controller is like a boss for your hard drives in a big storage system.
It works between your computer’s operating system and the actual hard drives, organizing them into groups to make them easier to manage.
This helps speed up how fast your computer can read and write data, and it also adds a layer of protection in case one of your hard drives breaks down.
So, it’s like having a smart helper that makes your hard drives work better and keeps your important data safer.
Types of RAID Controller
There are three types of RAID controller:
Hardware Based: In hardware-based RAID, there’s a physical controller that manages the whole array. This controller can handle the whole group of hard drives together. It’s designed to work with different types of hard drives, like SATA (Serial Advanced Technology Attachment) or SCSI (Small Computer System Interface). Sometimes, this controller is built right into the computer’s main board, making it easier to set up and manage your RAID system. It’s like having a captain for your team of hard drives, making sure they work together smoothly.
Software Based: In software-based RAID, the controller doesn’t have its own special hardware, so it uses the computer’s main processor and memory to do its job. It performs the same functions as a hardware-based RAID controller, like managing the hard drives and keeping your data safe. But because it shares resources with other programs on your computer, it might not run as fast. So, while it’s still helpful, it might not give you as big a speed boost as a hardware-based RAID system.
Firmware Based: Firmware-based RAID controllers are like helpers built into the computer’s main board. They work with the main processor, just like software-based RAID. But the firmware itself only handles the RAID while the computer starts up. Once the operating system is running, a special driver takes over the RAID job. These controllers aren’t as expensive as hardware ones, but they make the computer’s main processor work harder. People also call them hardware-assisted software RAID, hybrid model RAID, or fake RAID.
Types of RAID
There are several levels of RAID, each providing a different balance of performance, data redundancy, and storage capacity.
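To make the capacity side of that trade-off concrete, here is a minimal sketch in POSIX shell. The function name usable_capacity is made up for this illustration, and it assumes all disks in the array are the same size.

```shell
# usable_capacity LEVEL DISKS SIZE_GB
# Prints the usable capacity in GB for an array of equally sized disks.
usable_capacity() {
    level=$1; disks=$2; size=$3
    case "$level" in
        0)   echo $(( disks * size )) ;;         # striping: every disk holds data
        1)   echo "$size" ;;                     # mirroring: every disk holds the same data
        4|5) echo $(( (disks - 1) * size )) ;;   # one disk's worth of space holds parity
        *)   echo "unsupported level: $level" >&2; return 1 ;;
    esac
}

usable_capacity 0 2 1000   # prints 2000
usable_capacity 1 2 1000   # prints 1000
usable_capacity 5 3 1000   # prints 2000
```

Two 1000 GB disks give 2000 GB in RAID 0 but only 1000 GB in RAID 1, which is the price of redundancy.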
1. RAID 0 (Striping):
Data is split into chunks (stripes) and written across multiple disks.
Increases read/write performance because data is written to multiple disks simultaneously.
No redundancy. If one disk fails, all data is lost.
Use Case: Suitable for tasks where performance is prioritized over data safety, such as video editing or gaming.
Advantages
It is easy to implement.
It uses the full capacity of every drive, since no space is spent on redundancy.
Disadvantages
The loss of a single drive results in the loss of the whole array.
It’s not a good choice for a critical system.
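The striping idea can be sketched in a few lines of shell. This is only an illustration of round-robin chunk placement, not how any particular RAID implementation is written; stripe_layout is a hypothetical helper name.

```shell
# stripe_layout CHUNKS DISKS
# Prints which disk each chunk lands on under simple round-robin striping.
stripe_layout() {
    chunks=$1; disks=$2
    i=0
    while [ "$i" -lt "$chunks" ]; do
        echo "chunk $i -> disk $(( i % disks ))"
        i=$(( i + 1 ))
    done
}

stripe_layout 4 2
# chunk 0 -> disk 0
# chunk 1 -> disk 1
# chunk 2 -> disk 0
# chunk 3 -> disk 1
```

Because consecutive chunks sit on different disks, they can be read or written at the same time. It also shows why losing either disk leaves every file with holes in it.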
2. RAID 1 (Mirroring):
Data is written identically to two or more disks (mirrored). Each disk contains the same data.
Provides redundancy. If one disk fails, the data is still available on the other disk(s).
Reduced storage capacity: with two disks, 50% of the total capacity is used for mirroring. No performance gain on writes, though read performance may improve.
Use Case: Ideal for systems where data redundancy is critical, such as databases or mission-critical applications.
Advantages
It provides full redundancy.
It can improve data safety and read speed.
Disadvantages
It is expensive per usable gigabyte, because every block is stored at least twice.
Usable storage capacity is lower.
3. RAID-2 (Bit-Level Striping with Dedicated Hamming-Code Parity)
In RAID-2, data is striped and checked for errors at the bit level. The Hamming code method is used to detect and correct errors in the data.
It uses dedicated drives to store the error-correction information.
The structure of RAID-2 is complex because it uses two groups of disks: one group stores the bits of each data word, and the other stores the error-correction code for that word.
It is not commonly used.
Advantages
It uses Hamming code, so it can correct single-bit errors as well as detect them.
It keeps the error-correction information on dedicated drives, separate from the data.
Disadvantages
It has a complex structure and a high cost due to the extra drives.
It requires extra drives for error detection and correction.
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
It uses byte-level striping with a dedicated parity drive.
At this level, parity information is computed for each stripe and written to the dedicated parity drive.
If a drive fails, its data is reconstructed from the remaining drives and the parity drive.
Advantages
Data can be transferred in bulk.
Data can be accessed in parallel.
Disadvantages
It requires an additional drive for parity.
It performs slowly for small files.
5. RAID-4 (Block-Level Striping with Dedicated Parity)
Instead of duplicating data, this level stripes data in blocks and stores parity for each stripe on a dedicated drive.
Advantages
It can reconstruct the data if at most one drive is lost.
Disadvantages
It can’t reconstruct the data when more than one drive is lost.
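Parity-based recovery relies on XOR: the parity block is the XOR of the data blocks in a stripe, so XOR-ing the surviving blocks with the parity gives back the missing one. The sketch below uses three small integers as stand-ins for data blocks; the values are arbitrary and chosen only for illustration.

```shell
# Three data "blocks", shown here as small integers for readability.
d0=170   # binary 10101010
d1=204   # binary 11001100
d2=240   # binary 11110000

# The parity block is the XOR of all data blocks in the stripe.
parity=$(( d0 ^ d1 ^ d2 ))

# Suppose the drive holding d1 fails. XOR-ing the surviving data
# blocks with the parity block yields the missing block.
recovered=$(( d0 ^ d2 ^ parity ))

echo "parity=$parity recovered=$recovered"   # prints parity=150 recovered=204
```

The same arithmetic also shows the limit stated above: with two blocks missing there is one equation and two unknowns, so the data cannot be rebuilt.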
6. RAID-5 (Block-Level Striping with Distributed Parity)
This is a slight modification of the RAID-4 system where the only difference is that the parity rotates among the drives.
Data and parity information (used for recovery in case of a failure) are striped across three or more disks.
Provides both redundancy and performance. Can tolerate the failure of one disk without losing data.
Slower write performance due to the need to calculate parity. Requires at least three disks.
Often used in servers where a balance of redundancy, storage capacity, and performance is needed.
Advantages
Data can be reconstructed using parity bits.
Read performance is good because data is striped across all the drives.
Disadvantages
It is more complex to implement, and one drive’s worth of space is used for parity.
If two drives fail at the same time, the data is lost.
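The rotating parity can be pictured with a small layout printer. The rotation used here (parity starts on the last disk and moves one disk to the left per stripe) is an assumption made for the illustration; real implementations support several layouts. parity_layout is a hypothetical helper name.

```shell
# parity_layout STRIPES DISKS
# Prints one row per stripe, marking the parity block as P and data blocks as D.
parity_layout() {
    stripes=$1; disks=$2
    s=0
    while [ "$s" -lt "$stripes" ]; do
        # Assumed rotation: parity starts on the last disk and moves one
        # disk to the left with each stripe.
        pdisk=$(( disks - 1 - s % disks ))
        row="stripe $s:"
        d=0
        while [ "$d" -lt "$disks" ]; do
            if [ "$d" -eq "$pdisk" ]; then row="$row P"; else row="$row D"; fi
            d=$(( d + 1 ))
        done
        echo "$row"
        s=$(( s + 1 ))
    done
}

parity_layout 3 3
# stripe 0: D D P
# stripe 1: D P D
# stripe 2: P D D
```

In RAID-4 the P column would stay on the same disk for every stripe, which makes that one disk a bottleneck for writes. Spreading P across the disks avoids that.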
Advantages of RAID
Data redundancy: By keeping copies of the data, or parity information, on several disks, RAID can shield data from disk failures.
Performance enhancement: RAID can enhance performance by distributing data over several drives, enabling several read/write operations to run simultaneously.
Scalability: RAID is scalable, so storage capacity can be expanded by adding more disks to the array.
Versatility: RAID is applicable to a wide range of devices, such as workstations, servers, and personal PCs.
Disadvantages of RAID
Cost: RAID implementation can be costly, particularly for arrays with large capacities.
Complexity: The setup and management of RAID might be challenging.
Decreased performance: The parity calculations necessary for some RAID configurations, including RAID 5 and RAID 6, may result in a decrease in speed.
Single point of failure: Although RAID offers data redundancy, it is not a comprehensive backup solution. The array’s whole contents could be lost if the RAID controller malfunctions.
Linux RAID (Software RAID)
In Linux, RAID can be managed via mdadm (Multiple Device Admin), the tool used to configure, manage, and monitor software RAID arrays. With software RAID, the RAID array is managed by the operating system rather than a dedicated hardware controller.
How to Configure RAID on CentOS 7/9
When setting up a server for robustness and performance, configuring RAID is a critical step.
As described above, RAID combines multiple physical disks into a single logical unit for data redundancy, performance improvement, or both.
CentOS 7/9, a popular choice for servers, allows RAID to be configured at both the software and hardware levels.
This article will guide you through the process of setting up software RAID with mdadm on a CentOS 7/9 system.
Step 1: Install mdadm
mdadm is a tool for managing software RAID on Linux. Install it using the following command:
yum install mdadm
Step 2: Partitioning Drives
Partition the drives that will be part of the RAID array. Use fdisk or parted to create partitions of type 'Linux RAID Autodetect'.
fdisk /dev/sdx
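If you prefer a non-interactive approach, parted can do the same job in script mode. The sketch below is a dry run by default, because these commands destroy any existing data on the disk. The run helper and the DRY_RUN variable are made up for this example, and /dev/sdx is the same placeholder device name used above.

```shell
# Hypothetical helper: prints each command instead of running it unless
# DRY_RUN=0 is set, because the commands below destroy existing data.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; else echo "+ $*"; fi
}

# Placeholder device name; replace it with the real disk.
disk=/dev/sdx

run parted --script "$disk" mklabel gpt              # new, empty GPT partition table
run parted --script "$disk" mkpart primary 0% 100%   # one partition spanning the disk
run parted --script "$disk" set 1 raid on            # flag partition 1 as a RAID member
```

Review the printed commands first, then rerun as root with DRY_RUN=0 once you are sure the device name is right. Repeat for each disk that will join the array.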
Step 3: Creating the RAID Array
Use mdadm to create your RAID array. For example, to create a RAID 1 array with two devices:
mdadm --create --verbose /dev/md0 --level=1 --raid-devices=2 /dev/sdx1 /dev/sdy1
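Other levels follow the same pattern, with a different --level value and enough member devices. As a rough sketch, the helper below only builds and prints the command after checking the minimum device count, and it does not create anything. The name mdadm_create_cmd and the third device /dev/sdz1 are assumptions for illustration.

```shell
# mdadm_create_cmd LEVEL DEVICE...
# Prints an mdadm --create command for /dev/md0 after checking that enough
# member devices were given for the requested level.
mdadm_create_cmd() {
    level=$1; shift
    case "$level" in
        0|1) min=2 ;;
        5)   min=3 ;;
        *)   echo "unsupported level: $level" >&2; return 1 ;;
    esac
    if [ "$#" -lt "$min" ]; then
        echo "RAID $level needs at least $min devices, got $#" >&2
        return 1
    fi
    echo "mdadm --create --verbose /dev/md0 --level=$level --raid-devices=$# $*"
}

mdadm_create_cmd 5 /dev/sdx1 /dev/sdy1 /dev/sdz1
# mdadm --create --verbose /dev/md0 --level=5 --raid-devices=3 /dev/sdx1 /dev/sdy1 /dev/sdz1
```

Printing the command first gives you a chance to double-check the device names before running it as root.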
Step 4: Configuring the Filesystem
Once the array is created, you will need to format it with a filesystem, such as ext4:
mkfs.ext4 /dev/md0
Step 5: Mounting the RAID Array
Create a mount point and mount the array:
mkdir /mnt/raid
mount /dev/md0 /mnt/raid
Step 6: Persistent Configuration
To ensure the RAID array is reassembled at boot, add it to /etc/mdadm.conf and update the initramfs:
mdadm --detail --scan >> /etc/mdadm.conf
dracut -f
Then, add the mount point to /etc/fstab to mount it automatically at boot.
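An /etc/fstab entry has six fields. The small helper below prints one for the array created in this guide, as a minimal sketch that assumes the default mount options are acceptable. fstab_line is a hypothetical name, and the output still has to be appended to /etc/fstab by hand.

```shell
# fstab_line DEVICE MOUNTPOINT FSTYPE
# Prints an /etc/fstab entry with default mount options.
fstab_line() {
    # Fields: device, mount point, filesystem type, options, dump flag, fsck order
    printf '%s  %s  %s  defaults  0  0\n' "$1" "$2" "$3"
}

fstab_line /dev/md0 /mnt/raid ext4
# /dev/md0  /mnt/raid  ext4  defaults  0  0
```

Many administrators put UUID=... in the first field instead of /dev/md0, using the filesystem UUID reported by blkid, so that the entry keeps working if the device name changes.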
Step 7: Monitoring RAID Health
Regularly check the health of the RAID array using:
mdadm --detail /dev/md0
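The kernel also reports array status in /proc/mdstat, where a status field such as [UU] means all members are up and an underscore, as in [_U], marks a missing member. The sketch below scans that kind of text for degraded arrays. degraded_arrays is a hypothetical helper, and the sample text is illustrative only, not output captured from a real system.

```shell
# degraded_arrays: reads /proc/mdstat-style text on stdin and prints the
# name of every array whose status field (for example [_U]) shows a
# missing member.
degraded_arrays() {
    awk '
        /^md/ { name = $1 }                       # remember the current array
        /\[[U_]+\]/ && $NF ~ /_/ { print name }   # an underscore marks a missing disk
    '
}

# Illustrative sample only, not output captured from a real system.
sample='md0 : active raid1 sdy1[1] sdx1[0]
      1047552 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sdb1[1]
      1047552 blocks super 1.2 [2/1] [_U]'

printf '%s\n' "$sample" | degraded_arrays   # prints md1
```

On a real system the same function can be fed with degraded_arrays < /proc/mdstat, for example from a cron job that sends an alert when the output is not empty.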
Conclusion
Setting up RAID on CentOS 7/9 involves multiple steps, from installing the necessary tools to creating and mounting the RAID array. By following this guide, you should be able to configure RAID for your system effectively. Remember, RAID is not a backup solution, but it is a powerful tool for preventing downtime and data loss due to disk failures.