Zrep is designed to be a robust, yet simple, mechanism to keep a pair of zfs filesystems in sync, in a highly efficient manner. It can be used on as many filesystems on a server as you like. It relies on ssh trust between hosts.

There are two general areas where you might consider zrep:

- High availability style failover of filesystems between two servers
- A centralized backup server

Since the original design spec of zrep was for failover use, most of the documentation is geared towards that. However, there is a section at the end geared towards the secure backup server case.
Please note that, while the "backup server" usage requires only one-way trust, the other usage examples below presume that you have two-way ssh trust, and OS+filesystem targets similar to the following:

    host1 - zfs pool "pool1", with ZFS version equivalent to solaris 10 update 9+,
            root ssh trust to/from host2
    host2 - zfs pool "pool2", with ZFS version equivalent to solaris 10 update 9+,
            root ssh trust to/from host1

host1 and host2 are able to "ping" and "ssh" to/from each other.

If you have a ZFS that supports delegation of privileges, and you wish to run zrep without root, see the "non root user" section of this document.

Reminder: to allow root ssh trust, you may have to do the following (see the example after this list):
- create new "ssh keys" for root, in ~root/.ssh/
- copy over to otherhost:~root/.ssh/authorized_keys
- edit /etc/ssh/sshd_config to set "PermitRootLogin without-password"
- edit /etc/default/login to comment out CONSOLE=xxx
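For example, on systems using stock OpenSSH, that setup might look roughly like the following (the key type and paths are assumptions; adjust for your OS, and repeat in the other direction for two-way trust):

    host1# ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ''
    host1# cat /root/.ssh/id_rsa.pub | ssh host2 'cat >> /root/.ssh/authorized_keys'
    # then, on host2, set "PermitRootLogin without-password" in /etc/ssh/sshd_config
    # and restart sshd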
zrep snapshot naming conventions
zrep by default makes snapshots that look like:

    your_fs@zrep_123abc

Note that the number part is a HEXADECIMAL serial number, not a random string of numbers and letters.

Also note that if you override the "tag" used (mentioned later in this document), the initial "zrep_" part of the snapshot name will change to match whatever tag you set.
    host1# zrep -i pool1/prodfs host2 destpool/prodfs

This will create an initial snapshot on prodfs. It will then create a new filesystem, destpool/prodfs, on host2, and set it "readonly" there.

Special tips:
- If you want to set special options on the destination filesystem (e.g. compression and deduplication), you might choose to create the filesystem before the initial sync. Zrep may still work with a pre-created destination; however, you may end up having to manually set the properties zrep expects after "zrep init", with "zfs set ...", or "zrep changeconfig ...".
Alternatively, if your ZFS implementation supports using -o to set properties, you can use ZREP_INIT_REMOTE_PROPERTIES. For example,
    export ZREP_INIT_REMOTE_PROPERTIES="compression=on"; zrep init ....
Multiple properties need to be space-separated.

- For faster initialization, I strongly suggest that you use http://www.psc.edu/networking/projects/hpn-ssh/, a patched version of openssh. When you use it for both sshd and ssh, you can achieve throughput 2x or more better than standard ssh. Change which ssh binary zrep uses by setting the SSH environment variable to the explicit path of the binary you wish to use.
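For example, a rough sketch of pointing zrep at an alternate ssh binary (the path below is purely illustrative; use wherever your hpn-ssh build actually lives):

    export SSH=/opt/hpn-ssh/bin/ssh
    zrep -i pool1/prodfs host2 destpool/prodfs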
For regular, frequently run post-initial syncs, the amount of data to be copied will probably be relatively small, so the speed of ssh does not matter as much in that case.
Pre-existing filesystem
If for some reason you have a pre-existing (not-zrep-initialized) snapshotted filesystem replication pair that you want to convert to using zrep, you can do so by first renaming the most recent snapshot to match zrep snapshot naming conventions. (See the Overview section of this document.)

If you haven't already done replication, you can save yourself a bit of work by creating an initial snapshot that already matches zrep snapshot naming conventions, before doing the zfs send and receive.

Next, set the basic zfs properties zrep expects, such as zrep:src-host, etc.
You can do this with:

    srchost#  zrep changeconfig -f srcfs desthost destfs
    desthost# zrep changeconfig -f -d destfs srchost srcfs
You will then need to set the last-sent timestamp property on the master snapshot (actually called "zrep:sent" these days), which you can do easily via:

    srchost# zrep sentsync fs@zrep_snapnamehere

You should then be ready to do a "zrep sync fs".

Initialization for nested ZFS filesystems (Recursive flag)
If you wish to set up replication for prodfs, and all ZFS filesystems under it, then you can use the new environment variable as follows:

    export ZREP_R=-R

You need to have this set for the zrep init, and also for all subsequent zrep syncs.
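A minimal sketch of that workflow, reusing the host and pool names from the earlier examples:

    host1# export ZREP_R=-R
    host1# zrep -i pool1/prodfs host2 destpool/prodfs
    # ...and later, for every sync:
    host1# export ZREP_R=-R
    host1# zrep sync pool1/prodfs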
(Well, technically, you COULD set the -R flag for, say, an hourly sync, yet sync just the top filesystem without it more often. But... ick.)

I strongly suggest you do not mix and match nesting. Take an all-or-nothing approach: pick one parent ZFS filesystem, and then rely on that single anchor to replicate everything under it, consistently. Don't try to have one zrep init for the top level, but then use zrep init again on a sub-filesystem.
zrep won't stop you from doing this, but I suspect you may run into problems down the line.

Why you might NOT want to do this:
This will cause your data transfers to be serialized. You would probably get better throughput if you could overlap them. Then again... I haven't gotten around to rewriting the zrep global locking to cleanly allow for multiple runs. Soon(tm).

Additionally, zrep at present uses "zfs get -r". If you have a hundred nested filesystems, zrep status type operations will then take a lot longer than otherwise.
If you have a THOUSAND of them... I would imagine it would take a WHOLE lot longer!
I imagine wall-clock time would still only be affected by less than 60 seconds extra though, so... use at your own risk.
Here is how you tell zrep to replicate updates from the master to the other side:

    master# zrep sync pool1/prodfs

Alternatively, if you need to "pull" from the destination side, rather than push from the master, you can do:

    slave# zrep refresh pool1/prodfs

You can call this manually, or from a cron job as frequently as once a minute. It will know from initialization where to replicate the filesystem to, and do so.

If you have more than one filesystem initialized as a zrep master, you may also use
    # zrep -S all

(Note that at the current time, this runs as a single non-threaded process, so it may be faster for you to explicitly run separate zrep processes.)

You can safely set up a cronjob on both host1 and host2 to do "all", and it will "do the right thing", for the most part. However, to avoid seeing potential harmless errors for conflicts on overly long syncs, you can set a "quiet limit" for syncs.
    # zrep sync -q NUMBER-OF-SECONDS all

Then, if it has been less than NUMBER-OF-SECONDS since the last successful sync for a filesystem, it will benignly continue to the next filesystem, with a small note on stdout, even if it can't get a lock on a particular zrep registered filesystem to do a new sync.

If you have nested ZFS filesystems and are using zrep to sync them all in a single job, see the section about initialization of nested filesystems, to make sure that you set the required environment variable before doing your "zrep sync".
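For instance, a crontab entry on each host might look roughly like this (the path to zrep and the 900-second quiet limit are just illustrations):

    * * * * * /usr/local/bin/zrep sync -q 900 all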
Forced sync replication
By default, zrep uses the standard "zfs send" and "receive" commands. This actually allows certain actions to take place on the destination side, such as snapshots, that persist even after a supposed sync.

If you want to force the other side to look exactly like the sending side, you can add the "-f" option to zrep (which adds the -F option to receive).
"zrep refresh" also accepts -f.Note: -f must be given AFTER the sync directive to zrep.
Resume replication
Some implementations of ZFS support a resume feature to zfs sends.
If a sync job fails halfway through, and you would like to pick up where it left off rather than start from scratch, you can then add the "-r" option to "zrep sync" or "zrep refresh".

Replication sets
Sometimes it is desirable to do replication in synchronized sets, for example if you have some kind of database using multiple filesystems. In this case, it would be desirable to take down or pause the database very briefly to take a snapshot, then resume operations, while doing the data transfer in the background.
While zrep does not explicitly have a notion of sets (other than nested filesystems using zrep -R), you may use this kind of workflow with "snaponly", and "synconly":
    # pause database
    zrep snaponly
    # resume database
    zrep synconly
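A slightly fuller sketch of that sequence, assuming snaponly/synconly take a filesystem argument like the other zrep subcommands shown in this document, and with the database pause/resume commands left as placeholders:

    master# <quiesce database>
    master# zrep snaponly pool1/prodfs
    master# <resume database>
    master# zrep synconly pool1/prodfs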
In planned failover situations, you probably want to run something from your production side, rather than the standby side, to trigger failover. This is what you want to run:

    host1# zrep failover pool1/prodfs

This will reconfigure each side to know that the flow of data should now be host2 -> host1, and flip readonly bits appropriately.

Running "zrep -S all" on host1 will then ignore pool1/prodfs.
Running "zrep -S all" on host2 will then sync pool2/prodfs to pool1/prodfs.
If, in contrast, you have already completed an emergency "takeover" from the other side (eg: network was down and so host1 has no idea), you can officially acknowledge the remote side as master, with:
    host1# zrep failover -L pool1/prodfs
    host2# zrep takeover pool2/prodfs

This is basically the same as the "planned failover" example, but with the required syntax for running on the standby host.

For EMERGENCY failover purposes, where the primary host is down, you should instead force takeover by this host, with:
    host2# zrep takeover -L pool2/prodfs
If you have previously done a clean failover to a secondary server, and you want to make the original server primary again, simply use the failover command on the secondary server. You do not need the rest of this section.

If you had to force the secondary to be master (via "takeover") due to the primary server being down, then you need to first bring the pair to a partially synced state again, by rolling back any changes since the last sync, on the master. Then you can bring them into a synchronized state, and decide whether you want to fail back to the original master or not.
All steps required are shown below:
    host1# zrep failover -L pool1/prodfs
      # NOTE: "local-only" failover mode.
      # Will ROLL BACK filesystem data on host1, to last synced point.
      # host2 will not be touched.

    host2# zrep sync pool1/prodfs

    # And now, IF you want to make host1 the master again, either:
    host1# zrep takeover pool1/prodfs
    #   or
    host2# zrep failover pool1/prodfs

The above section is also for "split brain" scenarios. Choose which of your two masters you want to be the real master. Treat that as "host2" in the above example.
If you have a need to revert your filesystem to an earlier version (i.e. a zrep snapshot), then you can use the following procedure:
- On the master side, do a "zfs rollback" to a specific snapshot that ALSO SHOWS UP ON THE DESTINATION SIDE:
  zfs rollback fs@snap
- On the destination system:
  zfs rollback fs@snap
- zrep setlastsent fs@snap

And now you should be ready to resume normal zrep operations.
You can find the status of zrep managed filesystems in multiple ways. To simply find a list of them, use:

    hostX# zrep list

This will give you a list of all zrep related filesystems. If you want to see all the internal zrep related zfs properties, add the -v flag.

To see the status of all zrep "master" filesystems, use:
    hostX# zrep status

This will give a list of zrep managed filesystems the host is "master" for, and the date of the last successfully replicated snapshot.
To see the status of all zrep filesystems, use -a. If you have a mix of master and slave filesystems, you may wish to use the -v flag, which will show both the source and destination, as well as the last synced time. Please note that the order of flags does matter. So:
    hostX# zrep status -v -a
Config file
In the top section of the script itself, there are many optional environment variables mentioned. You can set them in your environment. Alternatively, the very latest versions of zrep allow for them to be set in a configuration file, "/etc/default/zrep".
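A minimal sketch of what that config file might contain, using only variables mentioned elsewhere in this document (the values are just illustrations):

    # /etc/default/zrep
    SSH=/usr/local/bin/ssh
    ZREP_SEND_FLAGS=-c
    ZREP_RENAME_UNSENT=no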
SSH option tuning

If you would like to provide custom options to the ssh command zrep invokes... or replace it with an entirely different program or wrapper... you may set the environment variable $SSH to whatever you wish.

Throughput optimization with mbuffer or bbcp
In addition to ssh tuning, it is sometimes desirable to use intermediate buffer utilities such as mbuffer or bbcp. When the endpoint is more than a few miles away, mbuffer can help with TCP latency behaviour. Alternatively, if you have very large pipes relative to the encryption throughput of a single cpu, bbcp will let you take advantage of multiple cpus.

See the comments near the top of the script itself, for the appropriate environment variables to set, to enable one or the other of those utilities.
Compression and encryption
Some people wish to use the ZFS-native encryption or compression, rather than relying solely on ssh. Zrep allows this by use of the ZREP_SEND_FLAGS environment variable. Set
    ZREP_SEND_FLAGS=-c
or
    ZREP_SEND_FLAGS=--raw
as desired (so long as your ZFS actually supports the flags).

Archive management
Higher number of archives on remote side
By default, the number of zrep-recognized snapshots will be the same on both sides.
This is controlled by the zrep:savecount property. You may set different values for each side, by using "zfs set" on the filesystem.
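For example, to keep more snapshots on the destination than on the source (the numbers are just illustrations, and the filesystem names follow the initialization example earlier):

    host1# zfs set zrep:savecount=5 pool1/prodfs
    host2# zfs set zrep:savecount=30 destpool/prodfs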
Long-lived archive snapshots

You CANNOT manually create snapshots on the remote side, because then incremental replication will fail, due to the target "not being the most recent snapshot".

Because of these issues, the best way to keep long-lived archives may be one of the following methods:
a) Create non-zrep-recognized snapshots on the local side, then delete them locally, but not remotely. "zfs send -I" will copy over even non-recognized "new" snapshots.
b) Schedule 'clone' jobs on the remote side: pick the most recent zrep snapshot, and create a filesystem clone. This will not duplicate the files on disk, and also will not interfere with ongoing zrep snapshot/replication activity. Again, care must be taken not to leave the clones around indefinitely, to the point where they fill available space.
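A rough sketch of option (b), run on the destination side (the snapshot and clone names are placeholders; pick the most recent zrep snapshot that actually exists there):

    desthost# zfs clone destpool/prodfs@zrep_000123 destpool/prodfs_archive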
The default usage for zrep is to replicate to a single filesystem. However, some people may wish for multiple destinations. To do this requires essentially running multiple zrep processes per filesystem.
CAUTION ! At present, you must ensure, yourself, that you do not overlap multiple zrep processes, if you are using ZREPTAG functionality. Run them sequentially, not in parallel.
I am planning an update to fix this issue, but at the moment there are problems with zrep global lock contention that may arise otherwise.

Set the environment variable ZREPTAG to a short, unique identifier for each destination (I strongly recommend you stick to the format "zrep-somethinghere"). Then init and sync as normal. For example:
    export ZREPTAG=zrep-dc
    zrep init pool/fs dc.your.com remotepool/fs
    zrep sync pool/fs
    ...
    export ZREPTAG=zrep-ca
    zrep init pool/fs ca.your.com remotepool/fs
    zrep sync pool/fs

Alternatively, you may use "zrep -t tagnamehere normal.zrep.command.here".
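For example, with the -t form, the two destinations above might be synced as:

    zrep -t zrep-dc sync pool/fs
    zrep -t zrep-ca sync pool/fs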
This section is for when Things Go Horribly Wrong... or if you just want to understand zrep better.

The first rule when debugging is: set "DEBUG=1" in your environment, to make zrep be a little more verbose.
Almost everything zrep does is controlled by ZFS properties.
ZFS allows you to set random properties of your choice, on filesystems.
So, for example,

    zfs set I_am:the_greatest=yes /some/filesystem

is perfectly valid usage (user-defined zfs property names just need to contain a colon).

By default, zrep prefixes all its values with "zrep:". You can see the values it sets on a filesystem (while skipping the many standard system-level properties) with "zrep list -v".
Sample output:

    $ zrep list -v rpool/PLAY/source
    rpool/PLAY/source:
    zrep:master     yes
    zrep:src-fs     rpool/PLAY/source
    zrep:dest-fs    rpool/PLAY/d1
    zrep:src-host   (myhostname)
    zrep:savecount  5
    zrep:dest-host  localhost

By default, zrep_#####_unsent snapshots will get left around for a while if a snapshot can't be sent. This is to keep file backup recovery points around, for those who like it, even if a remote sync fails. They should get expired at the normal rate. However, if you are running zrep at short intervals, this is not necessary, and you can set ZREP_RENAME_UNSENT to no, in your environment or the config file.
See also the Troubleshooting page.
Some folks may be looking just for a way to simplify their backup mechanisms, rather than doing fancy failover.
In this scenario, you will probably prefer one centralized backup server that has trusted root ssh OUT, but does not allow root ssh IN. (This way, it is possible to have independent clients, in separate zones of trust or system administration from each other.)
For this goal, both the initial setup and the replication are reversed from the normal way. Normally, zrep expects the data master to have full ssh privilege into the data replication target. However, when a backup server is the target, sysadmins usually prefer the target system to have the privileges instead of the client.
To implement a backup server type layout with zrep, the simplest way is when you are starting from scratch, with a brand new filesystem.
However, it is also possible to retroactively convert an existing production ZFS filesystem, to be backed up by zrep to a backup server.
Backup server with a clean new filesystem
The steps would look like this, as run on the backup server (a concrete sketch follows below):

- Set up the filesystem initially on the backup server
- Do "zrep init {localfs} {clienthost} {clientfs}"
- "zrep failover {localfs}"  # to make the client the data master

At this point you have now set up the initial state cleanly, and failed over so that the "client" side is the active master (i.e. the read/write one) for the pair. The backup server side will be in read-only mode, ready to receive syncs.
If you already have a client zfs filesystem with a bunch of data, but you don't want to give it full access on the backup server side, it is technically possible to do a manual initialization to the backup server. A smart sysadmin should be able to do a manual zfs replication, and then set the zfs properties themselves.

To then trigger incremental backups from the backup server side, you have to use a special zrep command: "zrep refresh".
Alternatively, you can use the new "zrep changeconfig -f" syntax, and finally convert an existing snapshot to a zrep-active one, as detailed in the Troubleshooting page.

You must then set up a special zrep job on the backup server, instead of the usual procedure of pushing from the master side. It will look just like a "zrep sync" job, but instead you will want to call
    zrep refresh pool/fs

Instead of the master side "pushing" new data, you will now be pulling the latest bits to your read-only backup server copy of the filesystem.
Backup server with an existing ZFS filesystem
This presumes you have ssh trust from the backup server to the client. It also presumes you have installed zrep on both sides, somewhere in the standard $PATH.

Hostnames are "client1" and "backupsrv".
Data filesystem is "data/fs".
Backup pool base is "backup/client1"
FYI, these instructions basically walk you through what a "zrep init" does. So if some time in the future the init process changes, these docs will need updating as well.
    # Create a snapshot of the client fs
    client1# zfs snap data/fs@zrep_000001

    # Set zrep base properties
    client1# zrep changeconfig -f data/fs backupsrv backup/client1/fs

    # Replicate to the backup server and set properties
    backupsrv# ssh client1 zfs send data/fs@zrep_000001 | zfs recv -F backup/client1/fs
    backupsrv# zrep changeconfig -f -d backup/client1/fs client1 data/fs

    # Let zrep know a valid sync has happened
    # (this will also set the "master" flag on client1)
    client1# zrep sentsync -L data/fs@zrep_000001

It should now be possible to run, at your preferred frequency:
    backupsrv# zrep refresh backup/client1/fs

You may also wish to do "zfs set readonly=on backup/client1/fs".
Disaster recovery of client from backup server mode
Note: This section is specific to running zrep as part of a "backup server" type situation. For more normal zrep usage, see higher up in this page.

If you get into the situation where you have lost a client filesystem (or worse yet, the entire remote server), the good news is that it should be relatively straightforward to reinitialize, using the following steps:
- Get the filesystem on the backup server, to be exactly how you need it to be.
- If there is some reason you wish to preserve some point-in-time snapshots, make your own by hand that don't start with "zrep"
- Clear all zfs snapshots and information, by using "zrep clear pool/fs". This should just remove metadata, but leave the filesystem intact.
- Do a full resync to the remote client, with
"zrep init pool/fs client remotepool/fs" on the backup server.
You are effectively now back at step 2 of the initial backup server setup. Finish that setup procedure.

That should be all that is required to get you up and running again!
Some people prefer to run zrep as a non-privileged user. If your implementation of ZFS supports the "allow" syntax, you may be able to do this, if you give the designated user the following privileges:

    zfs allow zrepuser \
      create,destroy,hold,mount,readonly,receive,rename,rollback,send,snapshot,userprop \
      your/fs/here
Author's note: I am pleased to know that people have been running my zrep script since 2012, in production, across the world! It is a great motivation to know that people are finding my efforts useful to them.
That being said, another great motivation is things like an amazon gift card (from the main US amazon.com site only!), or something from my amazon wishlist. :-D No gift too small, it's the thought that counts ;-)