
ZFS Mirror to RAID-Z

Another geeky post here, because these are somehow easier to write than more personal posts. I recently decided to upgrade my network attached storage (NAS) server from two hard drives to four in order to take advantage of ZFS's RAID-Z layout. I had always meant to do this, but I could only afford two hard disks when I built the server, and had just planned on upgrading eventually.

The most immediate benefit of moving from mirrored disks to RAID-Z is usable capacity: roughly 3/4 of the raw space instead of roughly 1/2, which means that by doubling the number of hard drives I'll be tripling the amount of usable storage space! (Not that I was using more than a quarter of the space I had before, but that's beside the point.)
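To put rough numbers on that (with 2 TB disks):

2 x 2 TB mirror   -> ~2 TB usable  (everything stored twice)
4 x 2 TB RAID-Z1  -> ~6 TB usable  (three disks of data, one of parity)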

The challenge here was doing the whole upgrade in place, because I don't have a fifth hard drive to back everything up to during the swap. I found a nice description of how to do exactly this on Emil Mikulic's blog, but his write-up is based on an old version of FreeBSD, which I think caused a couple of problems when I tried to use his method on my machine. (My server is running FreeBSD 10.1-RELEASE-p15 for amd64; I think he was using FreeBSD 8.2-RELEASE.)

The gist of his technique is to offline one of the mirrored disks in the original zpool:

zpool offline M31 label/d2.eli

then create a new zpool backed by the recently offlined disk, the two new disks, and a file-backed vnode:

truncate -s 2000398929920 /disk-backing
mdconfig -a -t vnode -S 4096 -f /disk-backing -u 0
zpool create M32 raidz md0 label/d2.eli \
    label/d3.eli label/d4.eli

and finally, offline the file-backed vnode:

zpool offline M32 md0

It's then possible to copy everything over from the (now non-redundant) old zpool to the (degraded) new raidz pool, and finally destroy the old zpool and replace the offlined vnode with the fourth hard disk.
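Before starting the copy it's worth double-checking that both pools look the way you expect, for example:

zpool status M31 M32
zpool list

The old pool should show a single disk with no redundancy left, and the new pool should show its raidz vdev with one member OFFLINE and the pool itself DEGRADED.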

The problem I had was with creating a zpool from a sparse file. For some reason zpool create wants to fill out the whole file, which obviously can't work here, since the file is supposed to be the size of an entire hard disk. I found a forum post suggesting that this is due to ZFS TRIM support (which was introduced in FreeBSD 10, and can be disabled by following the instructions in the FreeBSD manual), but that didn't seem to fix the problem. The forum post did point out that zpool replace doesn't fill new disks in the pool. The poster had a problem with autoexpansion afterwards, but I figured I'd give it a shot, and fortunately I didn't run into the same issue. The following is my workaround, based on Emil's instructions, with all steps included for completeness:
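If I remember right, the knob in question is the vfs.zfs.trim.enabled loader tunable, i.e. something like this in /boot/loader.conf (though, as I said, it didn't help in my case):

vfs.zfs.trim.enabled=0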

(Note: my original zpool was called M31, and it was backed by geli-encrypted disks on /dev/label/d1 and /dev/label/d2.)

glabel label d3 /dev/ada3
glabel label d4 /dev/ada4
# set up full-disk encryption on the new disks:
dd if=/dev/random of=/root/d3.key bs=64 count=1
dd if=/dev/random of=/root/d4.key bs=64 count=1
geli init -s 4096 -K /root/d3.key /dev/label/d3
geli init -s 4096 -K /root/d4.key /dev/label/d4
geli attach -k /root/d3.key label/d3
geli attach -k /root/d4.key label/d4
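## (optional) sanity check: the encrypted providers should now
## show up as /dev/label/d3.eli and /dev/label/d4.eli:
geli status
ls /dev/label/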

## Detach one of the original mirrored disks
## from the original zpool
zpool detach M31 label/d2.eli

## Make new zpool based on the three free disks,
## and one (small) file:
truncate -s 1G /d0
mdconfig -a -t vnode -S 4096 -f /d0 -u 0
zpool create -f M32 raidz md0 label/d2.eli \
    label/d3.eli label/d4.eli
zpool set autoexpand=on M32
zpool offline M32 md0
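## (optional) note the pool size at this point; with the 1G file
## as its smallest member the raidz only amounts to a few GB:
zpool list M32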

## Now, /d0 got filled with 1G of zeros
## but we want M32 to be based on four 2TB disks.
truncate -s 2000398929920 /d1
  # (Note: get actual size from `geom ELI list`)
mdconfig -a -t vnode -S 4096 -f /d1 -u 1
zpool replace M32 md0 md1
## Somehow that leaves /d1 sparse, but allows the
## zpool to autoexpand
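## (optional) confirm the expansion actually happened; the pool
## should now report the full size of four 2TB disks:
zpool list M32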

## Now be sure to remove both md vnodes and their backing files:
zpool offline M32 md1
mdconfig -d -u 0
mdconfig -d -u 1
rm /d0 /d1

## Now we have two zpools, both degraded
## but usable, and we need to copy the data from
## M31 to M32:
zfs get all > /zfs-get-all.txt
zfs snapshot -r M31@cp1
zfs send -R M31@cp1 | zfs recv -v -F -d M32
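## (optional) once that finishes, compare the used space on the
## two pools; the numbers should be roughly the same:
zfs list -r M31 M32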

## You can apparently continue using M31, even making
## snapshots. After that last command finishes you can
## catch up with any changes that were made by shutting
## off any snapshot scripts you have, and doing:
zfs umount -a
zfs snapshot -r M31@cp2
zfs rollback M32@cp1
zfs send -R -I M31@cp1 M31@cp2 | zfs recv -v -d M32

## Then just swap out the new zpool with the old one:
zpool destroy M31
zpool export M32
zpool import M32 M31

## Replace the offlined disk with the real disk
zpool replace M31 md1 label/d1.eli
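## (optional) ZFS will now resilver onto the real disk; watch
## the progress with:
zpool status M31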

## And finally, check that everything worked
zfs get all > /zfs-get-all-new.txt
diff /zfs-get-all.txt /zfs-get-all-new.txt

## Files should be identical except things like
## creation dates, available space, etc.
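## (optional) for extra peace of mind, a scrub reads every block
## and verifies it against its checksum:
zpool scrub M31
zpool status M31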

And that's it! Hopefully, once all of that is done, what's left is a ~5TB file system whose contents are identical to the pre-upgrade state of the original zpool. Any one of the four hard disks can now fail without any data loss, and thanks to ZFS's checksumming the integrity of the data is verifiable and any corruption is recoverable.

Written on August 31st, 2015 by JPH