You might have heard about ZFS and its miraculous virtues over the last few years. For licensing reasons (the CDDL remains incompatible with the GNU GPL), ZFS is not natively available for GNU/Linux yet. Actually it is, but only through FUSE in userspace, which is not that great. So I took a look at OpenSolaris, the open source UNIX from Sun Microsystems, where ZFS is fully integrated and is the default file system, replacing UFS after years of loyal service.

A few days later, I switched my filer from Debian to OpenSolaris.

The Zettabyte File System is a revolutionary and unique FS developed by Sun, very different from other FS in the way it works, its countless integrated functions, and its simplicity of administration.

Pools

The notion of a storage pool is one of the main characteristics of ZFS.

A classic FS is set up on a single volume and mapped:

  • onto a partition (e.g. the /boot FS on the /dev/sda1 partition)
  • or onto a logical volume, part of a volume group (the equivalent of a pool), itself made of several hard drives (e.g. the /home FS on the /dev/mapper/vg0-home logical volume, the latter being part of the /dev/vg0 volume group)

This last case needs a logical volume manager (LVM) in order to manage more than one hard drive at once, brought together in the volume groups we create. With its storage pools, ZFS uses roughly the same concept as volume groups; the (big) difference is that there are no logical volumes in the layer above.

So to summarize, we have:

  • layer 1: physical volumes (hard drives; e.g. /dev/rdsk/c4d1)
  • layer 2: pools (e.g. rpool)
  • layer 3: datasets (or filesystems; e.g. rpool/export)

The biggest change comes with the last layer: we do NOT define the size of the FS/datasets like we do with logical volumes. Datasets share the full available space of the pool among themselves: say goodbye to partition resizing operations and to wasted disk space. Not to mention that limits can be set for each FS, either to prevent it from filling the whole pool to the detriment of the other FS (a QUOTA), or to guarantee that it never runs short of space (a RESERVATION). But most of the time there is no reason to predefine the size of each FS with ZFS; it is designed to work this way.
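To make the quota and reservation idea more concrete, here is how I picture the accounting, as a toy Python model. It is only a sketch: the Pool class, its method names and the unit-less sizes are all made up for illustration, and the real ZFS space accounting is considerably more involved.

```python
class Pool:
    """Toy model (hypothetical): datasets share one free-space budget."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.datasets = {}  # name -> {"used", "quota", "reservation"}

    def create(self, name, quota=None, reservation=0):
        self.datasets[name] = {"used": 0, "quota": quota, "reservation": reservation}

    @staticmethod
    def _claimed(ds):
        # A dataset claims the larger of what it uses and what is reserved for it.
        return max(ds["used"], ds["reservation"])

    def available(self, name):
        ds = self.datasets[name]
        others = sum(self._claimed(d) for n, d in self.datasets.items() if n != name)
        room = self.capacity - others - ds["used"]
        if ds["quota"] is not None:
            # A quota caps the dataset even when the pool has room left.
            room = min(room, ds["quota"] - ds["used"])
        return max(room, 0)

    def write(self, name, size):
        if size > self.available(name):
            raise OSError(f"{name}: out of space")
        self.datasets[name]["used"] += size


pool = Pool(capacity=100)
pool.create("home", quota=60)       # may never grow past 60
pool.create("db", reservation=30)   # always has 30 set aside
pool.create("scratch")              # no limits at all
pool.write("home", 60)
print(pool.available("home"), pool.available("db"), pool.available("scratch"))  # 0 40 10
```

Nothing was sized up front: "scratch" simply sees whatever the pool has left once the usage and the reservations of the others are taken into account.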

Moreover, the free space of a pool can easily be increased by adding a hard drive to it: a single command adds the drive to the pool, which is instantly credited with the new capacity. Simple.
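The single command in question is zpool add. As a minimal sketch, here is a small Python wrapper around it. The pool name "tank" and the device "c4d2" are placeholders I picked for the example, and the helper functions are mine, not part of any ZFS tooling. By default it only prints the command, since actually running it needs root and a real spare disk.

```python
import subprocess


def zpool_add_command(pool, device):
    """Build the argv that adds one device to an existing pool."""
    return ["zpool", "add", pool, device]


def grow_pool(pool, device, dry_run=True):
    cmd = zpool_add_command(pool, device)
    if dry_run:
        # Show what would be executed without touching any pool.
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


grow_pool("tank", "c4d2")  # dry run: prints "zpool add tank c4d2"
```

Keep in mind that a device added this way becomes part of the pool for good, so the dry run is worth reading twice before switching it off.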

This kind of operation would be far more tedious on a classic RAID + LVM configuration.

Snapshots and clones

ZFS snapshots are read-only point-in-time copies of a filesystem. They consume no extra space initially: only the blocks that later diverge from the live filesystem have to be kept separately. Creating one is nearly instantaneous, and a dataset can hold thousands of them without performance degradation. Clones are writable copies of snapshots, effectively giving you instant, space-efficient forks of any dataset.
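The reason a snapshot costs nothing at first is copy-on-write: blocks are never overwritten in place, so a snapshot only has to remember which blocks were live at that moment. Here is a toy Python model of the idea. The Dataset class and its methods are invented for this sketch, and copying a dictionary of references is a simplification of what ZFS really does with its block tree.

```python
class Dataset:
    """Toy copy-on-write dataset (hypothetical model, not the ZFS on-disk format)."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes, the live view
        self.snapshots = {}  # snapshot name -> frozen block map

    def write(self, blkno, data):
        # Never modify a block in place: the live map points at a new object.
        self.blocks[blkno] = data

    def snapshot(self, name):
        # Only the map of references is copied, not the data behind it.
        self.snapshots[name] = dict(self.blocks)

    def clone(self, name):
        # A clone is a new writable dataset that starts from a snapshot's blocks.
        child = Dataset()
        child.blocks = dict(self.snapshots[name])
        return child

    def unique_blocks(self, name):
        """Block numbers held only by the snapshot, i.e. the space it costs."""
        snap = self.snapshots[name]
        return {n for n, d in snap.items() if self.blocks.get(n) is not d}


ds = Dataset()
ds.write(0, b"hello")
ds.write(1, b"world")
ds.snapshot("monday")
print(ds.unique_blocks("monday"))  # set(): the snapshot is free so far
ds.write(1, b"there")
print(ds.unique_blocks("monday"))  # {1}: only the diverged block is charged
```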

RAID-Z

ZFS comes with its own integrated RAID levels, RAID-Z (single parity), RAID-Z2 (double) and RAID-Z3 (triple), that eliminate the "write hole" problem of traditional RAID. Since ZFS is both the volume manager and the filesystem, it always writes full stripes using copy-on-write, so a crash in the middle of a write cannot leave data and parity out of sync, and there is no need for a separate RAID controller.
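For the single parity case, the principle is plain XOR: the parity block is the XOR of the data blocks, and any one lost block is the XOR of the survivors. The Python sketch below shows just that arithmetic, with fixed-size blocks and function names of my own choosing. It does not model the variable-width stripes of the real RAID-Z, nor the more elaborate parity math of RAID-Z2 and RAID-Z3.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    # One stripe = the data blocks followed by a single parity block.
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    # XOR of all surviving blocks gives back the missing one,
    # whether it was a data block or the parity block itself.
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = make_stripe([b"ZFS!", b"pool", b"data"])
print(rebuild(stripe, 1))  # b'pool'
```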

Data integrity

ZFS keeps a checksum of every block, stored in the parent block pointer rather than next to the data itself, so a damaged block cannot vouch for its own integrity. On every read, the checksum is verified. If a block is corrupted and redundancy is available (mirror or RAID-Z), ZFS automatically repairs it on the fly. Few other filesystems offer end-to-end data integrity like this out of the box.
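The verify-then-repair loop can be sketched in a few lines of Python. This is a toy two-way mirror, not ZFS code: the Mirror class is hypothetical, SHA-256 is used only because it is at hand in the standard library, and the checksum is simply handed back to the caller, standing in for wherever the filesystem keeps it.

```python
import hashlib


def checksum(data):
    return hashlib.sha256(data).digest()


class Mirror:
    """Toy self-healing mirror (hypothetical model)."""

    def __init__(self, copies=2):
        self.disks = [dict() for _ in range(copies)]

    def write(self, blkno, data):
        for disk in self.disks:
            disk[blkno] = data
        return checksum(data)  # the caller keeps this, apart from the data

    def read(self, blkno, expected):
        # Look for the first copy whose content matches the expected checksum.
        good = next((d[blkno] for d in self.disks
                     if checksum(d[blkno]) == expected), None)
        if good is None:
            raise IOError("all copies failed verification")
        # Heal every copy that does not match the verified one.
        for disk in self.disks:
            if disk[blkno] != good:
                disk[blkno] = good
        return good


m = Mirror()
cs = m.write(7, b"important")
m.disks[0][7] = b"imp0rtant"   # silent corruption on the first disk
print(m.read(7, cs))           # b'important'
print(m.disks[0][7])           # b'important', repaired during the read
```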