Introduction to ZFS Replication

Up your OpenZFS data management game and handle hardware failure with minimal data loss

In Basics of ZFS Snapshot Management, we demonstrated how easy and convenient it is to create snapshots and use them to restore data on the local system.

In this article, we’ll demonstrate how to replicate snapshots to another system. This feature of OpenZFS really ups the data management game, providing a mechanism for handling a hardware failure with minimal data loss and downtime. Replication is also a convenient way to quickly spin up a copy of an existing system on new hardware, say when you purchase a new laptop, or to deploy a whole lab of similar systems. It can also be used to mirror the contents of your home directory on two different systems.

The replication design used by OpenZFS is pretty ingenious. Unlike cloning software, replication does not do a byte-for-byte copy. Instead, zfs send converts snapshots into a serialized stream of data and zfs receive transforms the streams back into files and directories. Received snapshots are treated as a live file system, meaning that the data in the snapshot can be directly accessed on the receiving system.
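
Because the stream is just serialized data, it can go anywhere a stream can go. As a purely illustrative sketch with hypothetical dataset names, you could even save a snapshot to a file and restore it later:

zfs send tank/mydata@snap1 > /tmp/snap1.stream      # serialize the snapshot into an ordinary file
zfs receive tank/mydata-copy < /tmp/snap1.stream    # turn the stream back into a live dataset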

In practical terms, snapshots are replicated to another system over the network, possibly in another geographic location, and typically on a schedule. This assumes that the other system has enough storage capacity to accept the replicated data, plus the replicated changes over time, and that the network can handle the transfer. It can seem a bit daunting to estimate the storage required for data that changes over time, as well as the necessary network capacity.

Fortunately, replication itself is easy to configure and understand. In this article we’ll keep things simple, and practice replicating small amounts of data to a virtual machine. Once you’re comfortable with how the commands work, you can start to apply them to real systems and larger amounts of data.

Things to Know First

Replication requires both systems to have at least one OpenZFS pool. The pools do not need to be identical: for example, they can be different sizes, use different RAIDZ levels, or have different properties. However, any pool feature you have explicitly enabled and are using on the sending pool must also be supported on the receiving pool.

Depending upon the size of the snapshot and the speed of the network, the first replication can take a very long time to complete, especially when replicating an entire pool. If possible, perform an initial replication when the network is not busy. Once the first replication is complete, subsequent replications of incremental data are quick. When replicating pools, be aware that the replicated data will not be accessible on the receiving system until the replication is complete.

Finally, it is very important to be aware of the available capacity on the receiving system and the size of the snapshot being sent. If you are scripting a replication schedule, include a space check before starting the replication.
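
Here is a minimal sketch of such a check. The pool, snapshot, and host names follow this article’s examples; the dry-run flags are standard zfs send options, but adapt the names and error handling to your environment:

#!/bin/sh
# Minimal sketch of a pre-flight space check before replicating.
SNAP="tank/usr/home/dru@homedir"
DEST_HOST="10.0.2.15"
DEST_FS="backups/dru"

# Dry-run (-n) with parseable output (-P): the "size" line is the estimate in bytes.
NEEDED=$(zfs send -nP "$SNAP" | awk '/^size/ {print $2}')

# Ask the receiving pool how many bytes it has available (-Hp gives a bare number).
AVAIL=$(ssh "$DEST_HOST" zfs get -Hp -o value available "${DEST_FS%%/*}")

if [ "$NEEDED" -ge "$AVAIL" ]; then
    echo "Snapshot needs $NEEDED bytes but only $AVAIL are free; aborting." >&2
    exit 1
fi

zfs send "$SNAP" | ssh "$DEST_HOST" zfs receive "$DEST_FS"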

Preparing the Receiving System

For these replication examples, a laptop is the sending system and a virtual machine is the receiving system.

Note: The commands in this article are run as the root user. While you could use zfs allow to give a user permission for the send and receive commands, consider that replication involves another system and usually the transfer of system files or pools. This is different from giving a regular user permission to snapshot or restore their own data locally.
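
For reference, such a delegation might look roughly like this; the user name dru is taken from the examples later in this article, and the exact permission set you need may differ:

# On the sending system: allow the user to snapshot and send this dataset.
zfs allow -u dru send,snapshot,mount tank/usr/home/dru

# On the receiving system: allow the user to receive streams into the backups pool.
zfs allow -u dru receive,create,mount backups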

I installed FreeBSD 13 into a VirtualBox virtual machine configured with two 16GB virtual storage devices. During installation, one virtual storage device was formatted with ZFS, DHCP was configured on the virtual network interface, and the default option of enabling SSH was selected. The IP address of this receiving system is 10.0.2.15.

Create a Pool to Hold the Replicated Snapshots

To FreeBSD, the two virtual storage devices appear as ada0, which holds the zroot pool created during installation, and ada1, which is still available:

ls /dev/ad*
/dev/ada0    /dev/ada0p1    /dev/ada0p2    /dev/ada0p3    /dev/ada1

I’ll use the zpool create command to create a pool named backups on the ada1 device:

zpool create backups /dev/ada1 

This system now has 2 pools: zroot contains the operating system, and backups will be used to hold the replicated snapshots:

zpool list
NAME      SIZE   ALLOC  FREE   CKPOINT  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH
backups   15.5G  336K   15.5G  -        -         0%    0%   1.00x  ONLINE
zroot     15.5G  336K   15.5G  -        -         0%    0%   1.00x  ONLINE

Configure SSH Access

OpenZFS uses SSH to encrypt the replication stream during the network transfer. By default, root is not allowed to ssh into a FreeBSD system. Since root will be sending the replication stream, change this line in the SSH daemon configuration file (/etc/ssh/sshd_config):

#PermitRootLogin no 
PermitRootLogin yes 

Then, tell the SSH daemon to reload its configuration:

service sshd reload 

The receiving system is now configured.

Preparing the Sending System

On the laptop (sending system), I want to check that the root user has an SSH key pair. This authentication method requires a copy of the public key on the receiving system. By using a key pair without a passphrase, replication becomes fully scriptable, without prompting for user input.

If a key pair does not exist, generate one as root. Press enter at all of the prompts to accept the defaults and not require a passphrase:

ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.

Next, send a copy of the public key to the receiving system. This command assumes that you know the root password on the receiving system:

cat .ssh/id_rsa.pub | ssh 10.0.2.15 'cat >> .ssh/authorized_keys'
Password for root@10.0.2.15:

Finally, verify that you can ssh to the receiving system without being prompted for a password or passphrase. If it works, you should simply get the command prompt of the receiving system. Type exit to log out of the ssh session.
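
If you want a check that can never hang waiting for a password, which is handy in scripts, ssh’s BatchMode option makes the connection fail instead of prompting. This is a small optional sketch:

ssh -o BatchMode=yes 10.0.2.15 true && echo "key-based login works" || echo "key-based login failed"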

Testing a Replication

Let’s start by creating a test dataset, populating it with a small amount of data, and taking a snapshot:

zfs create tank/usr/home/dru/test
cp -R /etc/* /usr/home/dru/test/
zfs snapshot tank/usr/home/dru/test@testbackup
zfs list -t snapshot
NAME                                USED  AVAIL  REFER  MOUNTPOINT
tank/usr/home/dru/test@testbackup      0      -  2.23M  -

Remember: it is very important to be aware of the amount of snapshot data (REFER) to ensure it will fit on the receiving system. In this case, 2.23M is a trivial amount of data, even for the small 15.5G pool on the receiving system.

The command to replicate that snapshot creates a send stream with verbose statistics (-v) of the tank/usr/home/dru/test@testbackup snapshot, then pipes (|) that stream to ssh, which logs in as root on 10.0.2.15 so the receiving system can receive the stream and save it to the backups pool:

zfs send -v tank/usr/home/dru/test@testbackup | ssh 10.0.2.15 zfs receive backups
full send of tank/usr/home/dru/test@testbackup estimated size is 2.13M
total estimated size is 2.13M
TIME        SENT   SNAPSHOT
cannot receive new filesystem stream: destination 'backups' exists
must specify -F to overwrite it
warning: cannot send 'tank/usr/home/dru/test@testbackup': signal received

This error indicates an important difference between replicating a snapshot of a pool and replicating a snapshot of a dataset. When sending a snapshot of a dataset, you must append a dataset name to the name of the destination pool. You can name that dataset whatever you want, as long as it doesn’t already exist on the destination pool. In this example, I’ll specify a name of test:

zfs send -v tank/usr/home/dru/test@testbackup | ssh 10.0.2.15 zfs receive backups/test
full send of tank/usr/home/dru/test@testbackup estimated size is 2.13M
total estimated size is 2.13M
TIME        SENT   SNAPSHOT

Note that the output indicates the estimated required storage capacity. Press Ctrl+C to abort the command if the receiving system doesn’t have the required capacity. In this case, the snapshot is so small that the transfer is almost instantaneous. I can verify it worked by checking the receiving system:

ssh 10.0.2.15
ls /backups/
test
ls /backups/test

The test location was created automatically, and the second listing displays the contents of /etc/, the original source of the dataset’s data.
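
As an aside, if you would rather see the size estimate before any data moves at all, zfs send can do a dry run with -n; combined with -v, it prints the estimate and exits without sending anything. For example, against the snapshot above:

zfs send -nv tank/usr/home/dru/test@testbackup
full send of tank/usr/home/dru/test@testbackup estimated size is 2.13M
total estimated size is 2.13M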

Let’s try replicating a larger snapshot. Note that I specify a different location on the destination pool to hold this snapshot of dru’s home directory:

zfs snapshot tank/usr/home/dru@homedir
zfs send -v tank/usr/home/dru@homedir | ssh 10.0.2.15 zfs receive backups/dru
full send of tank/usr/home/dru@homedir estimated size is 3.13G
total estimated size is 3.13G
TIME        SENT   SNAPSHOT
07:47:14    3.37M  tank/usr/home/dru@homedir
08:09:53    3.15G  tank/usr/home/dru@homedir

This command perked along, reporting progress until the transfer completed successfully. Over this network, it took 22 minutes to send just over 3GB worth of data. Your transfer times will vary; note them over a variety of transfer sizes and levels of network activity to estimate your own baseline.
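
One low-effort way to record those numbers is to run the replication under /usr/bin/time. This is only a sketch, and backups/dru-timed is a hypothetical destination so the run doesn’t collide with the dataset created above:

/usr/bin/time sh -c 'zfs send tank/usr/home/dru@homedir | ssh 10.0.2.15 zfs receive backups/dru-timed'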

If you’re impatient and try to list the contents of /backups/dru/ before the transfer completes, you’ll find that the destination doesn’t exist until the transfer is finished. Once the transfer is complete, a listing of /backups/dru/ should look just like her home directory.

Incremental Replication

Once an initial replication is complete, you can test sending an incremental snapshot. Let’s add a few new files to dru’s home directory and list the differences from the last snapshot:

cp /var/log/messages* /usr/home/dru/
zfs diff tank/usr/home/dru@homedir
+    /usr/home/dru/messages
+    /usr/home/dru/messages.0.bz2
+    /usr/home/dru/messages.1.bz2
+    /usr/home/dru/messages.2.bz2

Let’s take a new snapshot that includes the 4 newly added files:

zfs snap tank/usr/home/dru@homedir-mod 

To replicate these differences, add the incremental flag (-i) and specify the names of the two snapshots. This command requires the first snapshot to already exist in the specified destination on the receiving system:

zfs send -vi tank/usr/home/dru@homedir tank/usr/home/dru@homedir-mod | ssh 10.0.2.15 zfs receive backups/dru
send from @homedir to tank/usr/home/dru@homedir-mod estimated size is 24.1M
TIME        SENT   SNAPSHOT
10:34:23    3.50M  tank/usr/home/dru@homedir-mod
10:34:32    24.1M  tank/usr/home/dru@homedir-mod

While the initial replication took 22 minutes, the incremental replication took 9 seconds.

If I ssh into the receiving system and do a listing, the 4 added files appear in /backups/dru/.

If you receive this error when performing an incremental replication:

cannot receive incremental stream: destination backups/dru has been modified since most recent snapshot

warning: cannot send 'tank/usr/home/dru@homedir-mod': signal received

it means that ZFS has determined that replicated data has changed on the destination; since the snapshots on the sending and receiving systems are no longer identical, ZFS aborts the replication. If the dataset was accidentally changed since the last snapshot, you can use the zfs rollback command on the receiving system to revert the changes. If you want to overwrite the data changes on the receiving system, use receive -F in the command to force the receiving system to roll back to the state of the last received snapshot so the systems are again in sync.
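
For example, here is a minimal sketch of both recovery options, using the dataset names from this article:

# Option 1, on the receiving system: discard the local changes by rolling back
# to the most recently received snapshot.
zfs rollback backups/dru@homedir

# Option 2, from the sending system: have receive -F perform that rollback
# automatically before applying the incremental stream.
zfs send -vi tank/usr/home/dru@homedir tank/usr/home/dru@homedir-mod | ssh 10.0.2.15 zfs receive -F backups/dru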

Conclusion

This article should get you started replicating data between systems. We recommend that you start with small amounts of data to get a better understanding of replication times within your own environment.