by Antoine - categories : OS
You might have heard about ZFS over the last few years and its miraculous virtues. For licensing reasons (the CDDL remains incompatible with the GNU GPL), ZFS is not natively available for GNU/Linux yet. Actually it is, but only in userspace through FUSE, which is not that great. So I took a look at OpenSolaris, the open source UNIX from Sun Microsystems, where ZFS is fully integrated and is the default file system, replacing UFS after years of loyal service.
A few days later, I switched my filer from Debian to OpenSolaris.
The Zettabyte File System is a revolutionary and unique file system developed by Sun, very different from other file systems because of the way it works, its countless integrated features, and its simplicity of administration.
The storage pool is one of the main characteristics of ZFS.
Classic file systems are tied to a single physical volume (hard drive) and mapped onto either:
- partitions (e.g. the /boot file system on the /dev/sda1 partition)
- or logical volumes, grouped into volume groups (the equivalent of pools), themselves made of several hard drives (e.g. the /home file system on the /dev/mapper/vg0-home logical volume, which is part of the /dev/vg0 volume group)
This last case needs a logical volume manager (LVM) in order to manage more than one hard drive at once, brought together in the volume groups we create. With its storage pools, ZFS uses roughly the same concept as volume groups; the (big) difference is that there are no logical volumes in the layer above.
So to summarize, we have :
- layer 1 : physical volumes (hard drives; ex : /dev/rdsk/c4d1)
- layer 2 : pools (ex : rpool)
- layer 3 : datasets (or filesystems; ex : rpool/export)
The biggest change comes with the last layer: we do NOT define the size of file systems/datasets as we do with logical volumes. Datasets share all the available space in the pool among themselves: say goodbye to partition resizing operations and to wasted disk space. It is still possible to cap the space a file system may use, preventing it from filling the whole pool to the detriment of the others (a QUOTA), or to guarantee it a minimum amount of space so it never runs short (a RESERVATION). But most of the time there is no reason to predefine each file system's size with ZFS; it is designed to work this way.
Moreover, the free space in a pool can easily be increased by adding a hard drive: a single command adds the drive to the pool, which is instantly credited with the new capacity. Simple.
This kind of operation would be far more tedious on a classic RAID + LVM configuration. Administering ZFS is smoother and far less complex. Creating and managing file systems is really simple: there is no need to run dozens of commands or edit configuration files.
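To make the shared-space idea concrete, here is a minimal Python sketch of a pool whose datasets have no size of their own. It is only a toy model, not how ZFS is implemented: the `Pool` class, its method names, the drive sizes and the `rpool/home` dataset are assumptions made for illustration, and redundancy is ignored (a plain striped pool is assumed).

```python
class Pool:
    """Toy model of a storage pool: datasets have no fixed size and
    all draw from the same free space."""

    def __init__(self, drives_gb):
        self.drives_gb = list(drives_gb)  # capacity of each drive, in GB
        self.used_gb = {}                 # dataset name -> GB written

    @property
    def capacity_gb(self):
        return sum(self.drives_gb)

    @property
    def free_gb(self):
        return self.capacity_gb - sum(self.used_gb.values())

    def create_dataset(self, name):
        # No size argument: a dataset is just a name inside the pool.
        self.used_gb[name] = 0

    def write(self, name, gb):
        if gb > self.free_gb:
            raise OSError("pool is full")
        self.used_gb[name] += gb

    def add_drive(self, gb):
        # On a real system this is the single command mentioned above,
        # something like `zpool add rpool c4d2` (device name hypothetical).
        self.drives_gb.append(gb)


pool = Pool([500])                   # one 500 GB drive
pool.create_dataset("rpool/export")
pool.create_dataset("rpool/home")    # hypothetical dataset name
pool.write("rpool/export", 300)
free_before = pool.free_gb           # 200 GB left for every dataset

pool.add_drive(500)                  # capacity grows immediately
pool.write("rpool/home", 400)        # would not have fit before
free_after = pool.free_gb            # 300 GB
```

Neither dataset was ever resized: the second write succeeds simply because the pool itself got bigger.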
As said above, you cannot predefine the size of a dataset when you create it, but you can set a limit so that it cannot take too much space in the pool. This is called a quota. For example, set a 10 GB quota on a dataset and it won't use more than 10 GB of its pool. You can set a quota that is greater than the pool itself: there is obviously no point in doing that unless you raise the pool capacity later by adding a new drive. On the other hand, you can reserve space in the pool for a dataset. Example: a 10 GB reservation for the "mydata" dataset ensures that there will always be 10 GB dedicated to it in the pool. If "mydata" holds 4 GB of data and there is only 6 GB left in the pool, other datasets won't be able to use this reserved space.
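The "mydata" example can be checked with a bit of arithmetic. The sketch below is a simplified accounting model, not ZFS's real space accounting: the `Dataset` and `Pool` classes and the `rpool/other` and `rpool/capped` names are hypothetical. On a real system the two settings are dataset properties, set with something like `zfs set quota=10G rpool/mydata` and `zfs set reservation=10G rpool/mydata`.

```python
class Dataset:
    def __init__(self, name, quota=None, reservation=0):
        self.name = name
        self.quota = quota              # max GB it may use (None = no limit)
        self.reservation = reservation  # GB guaranteed to it
        self.used = 0


class Pool:
    def __init__(self, capacity):
        self.capacity = capacity        # GB
        self.datasets = {}

    def add(self, ds):
        self.datasets[ds.name] = ds
        return ds

    def available_to(self, ds):
        """GB that `ds` can still write."""
        # Each other dataset holds what it uses, or its reservation if larger.
        claimed_by_others = sum(
            max(o.used, o.reservation)
            for o in self.datasets.values() if o is not ds
        )
        room = self.capacity - claimed_by_others - ds.used
        if ds.quota is not None:
            room = min(room, ds.quota - ds.used)
        return max(room, 0)

    def write(self, ds, gb):
        if gb > self.available_to(ds):
            raise OSError("out of space or over quota")
        ds.used += gb


# The reservation example from the text, in a 20 GB pool.
pool = Pool(20)
mydata = pool.add(Dataset("rpool/mydata", reservation=10))
other = pool.add(Dataset("rpool/other"))  # hypothetical neighbour
pool.write(other, 10)
pool.write(mydata, 4)
left_for_other = pool.available_to(other)    # 0: the 6 GB left are reserved
left_for_mydata = pool.available_to(mydata)  # 6

# A quota caps a dataset even when the pool has plenty of room.
big = Pool(100)
capped = big.add(Dataset("rpool/capped", quota=10))
left_for_capped = big.available_to(capped)   # 10
```

The 6 GB that are physically free belong to "mydata" alone, because 10 GB were promised to it and it has only used 4.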
ZFS can enable gzip compression on a dataset. Only data written after the feature has been activated is compressed. Existing data is left as it is and remains uncompressed until it is rewritten.
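The "only new writes are compressed" behaviour is easy to picture with a small Python model. This is a sketch under assumptions: the `Dataset` class and block names are hypothetical, and Python's `zlib` module stands in for the compressor (it uses the same DEFLATE algorithm as gzip). On a real system the switch is a dataset property, set with something like `zfs set compression=gzip rpool/export`.

```python
import zlib


class Dataset:
    def __init__(self):
        self.compression = False
        self.blocks = {}  # block name -> (is_compressed, stored bytes)

    def write(self, name, data):
        # The setting is consulted at write time only.
        if self.compression:
            self.blocks[name] = (True, zlib.compress(data))
        else:
            self.blocks[name] = (False, data)

    def read(self, name):
        compressed, stored = self.blocks[name]
        return zlib.decompress(stored) if compressed else stored

    def stored_size(self, name):
        return len(self.blocks[name][1])


payload = b"ZFS " * 1000           # 4000 highly repetitive bytes

ds = Dataset()
ds.write("old", payload)           # written before compression is enabled
ds.compression = True
ds.write("new", payload)           # written after

old_size = ds.stored_size("old")   # still 4000 bytes on disk
new_size = ds.stored_size("new")   # far smaller

ds.write("old", payload)           # rewriting the old block compresses it
rewritten_size = ds.stored_size("old")
```

Reads return the same bytes either way; only the space used on disk differs.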
ZFS offers several data redundancy levels:
- Striped is the classic RAID0 equivalent, and the default layout when a pool is created with several hard drives. It is not really redundancy, as data is spread across the drives rather than replicated.
- Mirrored is the RAID1 equivalent, where each drive is an identical replica of the others. This mode offers the best security, but on the other hand you lose the capacity of each replica drive.
- RaidZ1, the RAID5 equivalent, is a single-parity mode with the parity distributed across all drives (at least 3 drives in practice). It tolerates one drive failure, so it gives the best security/capacity ratio.
- RaidZ2, the RAID6 equivalent, is much like RaidZ1 except that it has double parity (at least 4 drives in practice), so it tolerates two drive failures.
RaidZ is not just classic RAID renamed: it comes with features related to data integrity and security, such as closing the RAID write hole. Moreover, you can combine layouts, for example RAID10, which is a stripe across several mirrors (RAID 1 + 0).
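The capacity trade-off between these layouts can be worked out with simple arithmetic. The helper below is a rough sketch: the `layout_summary` function and its layout names are hypothetical, it assumes equal-sized drives, it ignores metadata overhead, and the minimum drive counts are the ones mentioned in the list above.

```python
# Drive counts as mentioned in the text above (an assumption of this sketch).
MIN_DRIVES = {"stripe": 1, "mirror": 2, "raidz1": 3, "raidz2": 4}


def layout_summary(layout, drives, drive_gb):
    """Return (usable_gb, drive_failures_tolerated) for equal-sized drives."""
    if layout not in MIN_DRIVES:
        raise ValueError(f"unknown layout: {layout}")
    if drives < MIN_DRIVES[layout]:
        raise ValueError(f"{layout} needs at least {MIN_DRIVES[layout]} drives")

    if layout == "stripe":
        return drives * drive_gb, 0            # all capacity, no redundancy
    if layout == "mirror":
        return drive_gb, drives - 1            # every drive is a full copy
    if layout == "raidz1":
        return (drives - 1) * drive_gb, 1      # one drive's worth of parity
    return (drives - 2) * drive_gb, 2          # raidz2: two drives' worth


for layout in ("stripe", "mirror", "raidz1", "raidz2"):
    usable, tolerated = layout_summary(layout, 4, 500)
    print(f"{layout:7s} 4 x 500 GB -> {usable} GB usable, "
          f"survives {tolerated} drive failure(s)")
```

With four 500 GB drives, this gives 2000 GB for a stripe, 500 GB for a four-way mirror, 1500 GB for RaidZ1 and 1000 GB for RaidZ2, which shows why RaidZ1 is described as the best security/capacity ratio.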
Self-healing is one of the best data-safety features offered by ZFS in the event of a drive failure or data corruption on one or several drives of a mirrored or RaidZ pool.
Each block has a 256-bit checksum in order to detect any data corruption. If an error is detected on a drive, and a valid copy of the block exists on a redundant drive, the pool repairs itself and restores the damaged copy from a healthy one. The same applies if a drive is dead: the user only needs to replace it with a new one, and ZFS rebuilds it from a healthy replica. This is called resilvering, and it is completely transparent. ZFS also comes with a scrubbing feature, which reads all the data in a pool, verifies it against its checksums, and repairs any bad copy from the redundant drives.
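The detect-and-repair cycle described above can be sketched in a few lines of Python. This is a toy model of a mirror, not ZFS's on-disk logic: the `MirroredPool` class and its methods are hypothetical, each drive is just a dictionary, and SHA-256 stands in as the 256-bit checksum (which algorithm a given pool actually uses is not something this sketch assumes).

```python
import hashlib


class MirroredPool:
    def __init__(self, copies=2):
        self.drives = [dict() for _ in range(copies)]  # block id -> bytes
        self.checksums = {}                            # block id -> digest

    def write(self, block_id, data):
        self.checksums[block_id] = hashlib.sha256(data).digest()
        for drive in self.drives:
            drive[block_id] = data

    def _read_and_heal(self, block_id):
        expected = self.checksums[block_id]
        copies = [drive.get(block_id) for drive in self.drives]
        # Find the first copy whose checksum matches.
        good = next(
            (c for c in copies
             if c is not None and hashlib.sha256(c).digest() == expected),
            None,
        )
        if good is None:
            raise IOError("no valid copy left for " + block_id)
        # Overwrite every missing or corrupted copy with the good one.
        repaired = 0
        for drive, copy in zip(self.drives, copies):
            if copy != good:
                drive[block_id] = good
                repaired += 1
        return good, repaired

    def read(self, block_id):
        return self._read_and_heal(block_id)[0]

    def scrub(self):
        """Verify every block on every drive; return the number of repairs."""
        return sum(self._read_and_heal(b)[1] for b in self.checksums)

    def replace_drive(self, index):
        """Swap in an empty drive and rebuild it (resilvering)."""
        self.drives[index] = {}
        return self.scrub()


pool = MirroredPool()
pool.write("b1", b"important data")
pool.write("b2", b"more data")

pool.drives[0]["b1"] = b"importXnt data"   # silent corruption on drive 0
data = pool.read("b1")                     # served from drive 1, drive 0 fixed
healed = pool.drives[0]["b1"] == b"important data"

rebuilt = pool.replace_drive(1)            # new empty drive: 2 blocks copied
```

A read is enough to trigger a repair of the block being read; a scrub does the same for every block, including those nobody has read in a long time.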