Zrep is designed to be a robust, yet simple, mechanism to keep a pair of zfs filesystems in sync, in a highly efficient manner. It can be used on as many filesystems on a server as you like. It relies on ssh trust between hosts.

There are two general areas where you might consider zrep:

- High availability style failover of filesystems between two servers
- A centralized backup server

Since the original design spec of zrep was for failover use, most of the documentation is geared towards that. However, there is a section at the end geared towards the secure backup server case.
Please note that, while the "backup server" usage requires only one-way trust, the other usage examples below presume that you have two-way ssh trust, and OS+filesystem targets similar to the following:

    host1 - zfs pool "pool1", with ZFS version equivalent to solaris 10 update 9+,
            root ssh trust to/from host2
    host2 - zfs pool "pool2", with ZFS version equivalent to solaris 10 update 9+,
            root ssh trust to/from host1

host1 and host2 are able to "ping" and "ssh" to/from each other.

If you have a ZFS that supports delegation of privileges, and you wish to run zrep without root, see the "non root user" section of this document.

Reminder: to allow root ssh trust, you may have to do the following (see the example after this list):
- create new "ssh keys" for root, in ~root/.ssh/
- copy over to otherhost:~root/.ssh/authorized_keys
- edit /etc/ssh/sshd_config to set "PermitRootLogin without-password"
- edit /etc/default/login to comment out CONSOLE=xxx
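For example, on systems using stock OpenSSH, that setup might look roughly like the following (the key type and paths are assumptions; adjust for your OS, and repeat in the other direction for two-way trust):

    host1# ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ''
    host1# cat /root/.ssh/id_rsa.pub | ssh host2 'cat >> /root/.ssh/authorized_keys'
    # then, on host2, set "PermitRootLogin without-password" in /etc/ssh/sshd_config
    # and restart sshd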
zrep snapshot naming conventions
zrep by default makes snapshots that look like:

    your_fs@zrep_123abc

Note that the number part is a HEXADECIMAL serial number, not a random string of numbers and letters.

Also note that if you override the "tag" used (mentioned later in this document), the initial "zrep_" part of the snapshot name will change to match whatever tag you set.
    host1# zrep -i pool1/prodfs host2 destpool/prodfs

This will create an initial snapshot on prodfs. It will then create a new filesystem, destpool/prodfs, on host2, and set it "readonly" there.

Special tips:
- If you want to set special options on the destination filesystem (e.g. compression and deduplication), you might choose to create the filesystem before the initial sync. Zrep may still work with a pre-created destination; however, you may end up having to manually set the properties zrep expects after "zrep init", with "zfs set ...", or "zrep changeconfig ...".
Alternatively, if your ZFS implementation supports using -o to set properties, you can use ZREP_INIT_REMOTE_PROPERTIES. For example,
    export ZREP_INIT_REMOTE_PROPERTIES="compression=on"; zrep init ....
Multiple properties need to be space-separated.

- For faster initialization, I strongly suggest that you use http://www.psc.edu/networking/projects/hpn-ssh/, a patched version of openssh. When you use it for both sshd and ssh, you can achieve throughput 2x or more better than standard ssh. Change which ssh binary zrep uses by setting the SSH environment variable to the explicit path of the binary you wish to use.
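For example, a rough sketch of pointing zrep at an alternate ssh binary (the path below is purely illustrative; use wherever your hpn-ssh build actually lives):

    export SSH=/opt/hpn-ssh/bin/ssh
    zrep -i pool1/prodfs host2 destpool/prodfs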
For regular, frequently run post-initial syncs, the amount of data to be copied will probably be relatively small, so the speed of ssh does not matter as much in that case.
Pre-existing filesystem
If for some reason you have a pre-existing (not-zrep-initialized) snapshotted filesystem replication pair that you want to convert to using zrep, you can do so by first renaming the most recent snapshot to match zrep snapshot naming conventions. (See the Overview section of this document.)

If you haven't already done replication, you can save yourself a bit of work by creating an initial snapshot that already matches zrep snapshot naming conventions, before doing the zfs send and receive.

Next, set the basic zfs properties zrep expects, such as zrep:src-host, etc.
You can do this with:

    srchost#  zrep changeconfig -f srcfs desthost destfs
    desthost# zrep changeconfig -f -d destfs srchost srcfs
You will then need to set the last-sent timestamp property on the master snapshot (actually called "zrep:sent" these days), which you can do easily via:

    srchost# zrep sentsync fs@zrep_snapnamehere

You should then be ready to do a "zrep sync fs".

Initialization for nested ZFS filesystems (Recursive flag)
If you wish to set up replication for prodfs, and all ZFS filesystems under it, then you can use the new environment variable as follows:

    export ZREP_R=-R

You need to have this set for the zrep init, and also for all subsequent zrep syncs.
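A minimal sketch of that workflow, reusing the host and pool names from the earlier examples:

    host1# export ZREP_R=-R
    host1# zrep -i pool1/prodfs host2 destpool/prodfs
    # ...and later, for every sync:
    host1# export ZREP_R=-R
    host1# zrep sync pool1/prodfs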
(Well, technically, you COULD set the -R flag for, say, an hourly sync, yet sync just the top filesystem without it more often. But... ick.)

I strongly suggest you do not mix and match nesting. Take an all-or-nothing approach: pick one parent ZFS filesystem, and then rely on that single anchor to replicate everything under it, consistently. Don't try to have one zrep init for the top level, but then use zrep init again on a sub-filesystem.
zrep won't stop you from doing this, but I suspect you may run into problems down the line.

Why you might NOT want to do this:
This will cause your data transfers to be serialized. You would probably get better throughput if you could overlap them. Then again... I haven't gotten around to rewriting the zrep global locking to cleanly allow for multiple runs. Soon(tm).

Additionally, zrep at present uses "zfs get -r". If you have a hundred nested filesystems, zrep status type operations will then take a lot longer than otherwise.
If you have a THOUSAND of them... I would imagine it would take a WHOLE lot longer!
I imagine wall-clock time would still only be affected by less than 60 seconds extra though, so... use at your own risk.
Here is how you tell zrep to replicate updates from the master to the other side:

    master# zrep sync pool1/prodfs

Alternatively, if you need to "pull" from the destination side, rather than push from the master, you can do:

    slave# zrep refresh pool1/prodfs

You can call this manually, or from a cron job as frequently as once a minute. It will know from initialization where to replicate the filesystem to, and do so.

If you have more than one filesystem initialized as a zrep master, you may also use
    # zrep -S all

(Note that at the current time, this runs as a single non-threaded process, so it may be faster for you to explicitly run separate zrep processes.)

You can safely set up a cronjob on both host1 and host2 to do "all", and it will "do the right thing", for the most part. However, to avoid seeing potential harmless errors for conflicts on overly long syncs, you can set a "quiet limit" for syncs.
    # zrep sync -q NUMBER-OF-SECONDS all

Then, if it has been less than NUMBER-OF-SECONDS since the last successful sync for a filesystem, it will benignly continue to the next filesystem, with a small note on stdout, even if it can't get a lock on a particular zrep registered filesystem to do a new sync.

If you have nested ZFS filesystems and are using zrep to sync them all in a single job, see the section about initialization of nested filesystems, to make sure that you set the required environment variable before doing your "zrep sync".
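For instance, a crontab entry on each host might look roughly like this (the path to zrep and the 900-second quiet limit are just illustrations):

    * * * * * /usr/local/bin/zrep sync -q 900 all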
Forced sync replication
By default, zrep uses the standard "zfs send" and "receive" commands. This actually allows certain actions to take place on the destination side, such as snapshots, that persist even after a supposed sync.

If you want to force the other side to look exactly like the sending side, you can add the "-f" option to zrep (which adds the -F option to receive).
"zrep refresh" also accepts -f.Note: -f must be given AFTER the sync directive to zrep.
Resume replication
Some implementations of ZFS support a resume feature to zfs sends.
If a sync job fails halfway through, and you would like to pick up where it left off rather than start from scratch, you can then add the "-r" option to "zrep sync" or "zrep refresh".

Replication sets
Sometimes it is desirable to do replication in synchronized sets, for example if you have some kind of database using multiple filesystems. In this case, it would be desirable to take down or pause the database very briefly to take a snapshot, then resume operations, while doing the data transfer in the background.
While zrep does not explicitly have a notion of sets (other than nested filesystems using zrep -R), you may use this kind of workflow with "snaponly", and "synconly":
    # pause database
    zrep snaponly
    # resume database
    zrep synconly
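A slightly fuller sketch of that sequence, assuming snaponly/synconly take a filesystem argument like the other zrep subcommands shown in this document, and with the database pause/resume commands left as placeholders:

    master# <quiesce database>
    master# zrep snaponly pool1/prodfs
    master# <resume database>
    master# zrep synconly pool1/prodfs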
In planned failover situations, you probably want to run something from your production side, rather than the standby side, to trigger failover. This is what you want to run:

    host1# zrep failover pool1/prodfs

This will reconfigure each side to know that the flow of data should now be host2 -> host1, and flip readonly bits appropriately.

Running "zrep -S all" on host1 will then ignore pool1/prodfs.
Running "zrep -S all" on host2 will then sync pool2/prodfs to pool1/prodfs.
If, in contrast, you have already completed an emergency "takeover" from the other side (eg: network was down and so host1 has no idea), you can officially acknowledge the remote side as master, with:
    host1# zrep failover -L pool1/prodfs
    host2# zrep takeover pool2/prodfs

This is basically the same as the "planned failover" example, but with the required syntax for running on the standby host.

For EMERGENCY failover purposes, where the primary host is down, you should instead force takeover by this host, with:
    host2# zrep takeover -L pool2/prodfs
If you have previously done a clean failover to a secondary server, and you want to make the original server primary again, simply use the failover command on the secondary server. You do not need the rest of this section.

If you had to force the secondary to be master (via "takeover") due to the primary server being down, then you need to first bring the pair to a partially synced state again, by rolling back any changes since the last sync, on the master. Then you can bring them into a synchronized state, and decide whether you want to fail back to the original master or not.
All steps required are shown below:
    host1# zrep failover -L pool1/prodfs
      # NOTE: "local-only" failover mode.
      # Will ROLL BACK filesystem data on host1, to last synced point.
      # host2 will not be touched.

    host2# zrep sync pool1/prodfs

    # And now, IF you want to make host1 the master again, either:
    host1# zrep takeover pool1/prodfs
    #   or
    host2# zrep failover pool1/prodfs

The above section is also for "split brain" scenarios. Choose which of your two masters you want to be the real master. Treat that as "host2" in the above example.
If you have a need to revert your filesystem to an earlier version (i.e. a zrep snapshot), then you can use the following procedure:
- On the master side, do a "zfs rollback" to a specific snapshot that ALSO SHOWS UP ON THE DESTINATION SIDE:
  zfs rollback fs@snap
- On the destination system:
  zfs rollback fs@snap
- zrep setlastsent fs@snap

And now you should be ready to resume normal zrep operations.
You can find the status of zrep managed filesystems in multiple ways. To simply find a list of them, use:

    hostX# zrep list

This will give you a list of all zrep related filesystems. If you want to see all the internal zrep related zfs properties, add the -v flag.

To see the status of all zrep "master" filesystems, use:
    hostX# zrep status

This will give a list of zrep managed filesystems the host is "master" for, and the date of the last successfully replicated snapshot.
To see the status of all zrep filesystems, use -a. If you have a mix of master and slave filesystems, you may wish to use the -v flag, which will show both the source and destination, as well as the last synced time. Please note that the order of flags does matter. So:
    hostX# zrep status -v -a
Config file
In the top section of the script itself, there are many optional environment variables mentioned. You can set them in your environment. Alternatively, the very latest versions of zrep allow for them to be set in a configuration file, "/etc/default/zrep".
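A minimal sketch of what that config file might contain, using only variables mentioned elsewhere in this document (the values are just illustrations):

    # /etc/default/zrep
    SSH=/usr/local/bin/ssh
    ZREP_SEND_FLAGS=-c
    ZREP_RENAME_UNSENT=no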
SSH option tuning

If you would like to provide custom options to the ssh command zrep invokes... or replace it with an entirely different program or wrapper... you may set the environment variable $SSH to whatever you wish.

Throughput optimization with mbuffer or bbcp
In addition to ssh tuning, it is sometimes desirable to use intermediate buffer utilities such as mbuffer or bbcp. When the endpoint is more than a few miles away, mbuffer can help with TCP latency behaviour. Alternatively, if you have very large pipes relative to the encryption throughput of a single cpu, bbcp will let you take advantage of multiple cpus.

See the comments near the top of the script itself, for the appropriate environment variables to set, to enable one or the other of those utilities.
Compression and encryption
Some people wish to use the ZFS-native encryption or compression, rather than relying solely on ssh. Zrep allows this by use of the ZREP_SEND_FLAGS environment variable. Set
    ZREP_SEND_FLAGS=-c
or
    ZREP_SEND_FLAGS=--raw
as desired (so long as your ZFS actually supports the flags).

Archive management
Higher number of archives on remote side
By default, the number of zrep-recognized snapshots will be the same on both sides.
This is controlled by the zrep:savecount property. You may set different values for each side, by using "zfs set" on the filesystem.
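For example, to keep more snapshots on the destination than on the source (the numbers are just illustrations, and the filesystem names follow the initialization example earlier):

    host1# zfs set zrep:savecount=5 pool1/prodfs
    host2# zfs set zrep:savecount=30 destpool/prodfs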
Long-lived archive snapshots

You CANNOT manually create snapshots on the remote side, because then incremental replication will fail, due to the target "not being the most recent snapshot".

Because of these issues, the best way to keep long-lived archives may be one of the following methods:
a) Create non-zrep-recognized snapshots on the local side, then delete them locally, but not remotely. "zfs send -I" will copy over even non-recognized "new" snapshots.
b) Schedule 'clone' jobs on the remote side: pick the most recent zrep snapshot, and create a filesystem clone. This will not duplicate the files on disk, and also will not interfere with ongoing zrep snapshot/replication activity. Again, care must be taken not to leave the clones around indefinitely, to the point where they fill available space.
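A rough sketch of option (b), run on the destination side (the snapshot and clone names are placeholders; pick the most recent zrep snapshot that actually exists there):

    desthost# zfs clone destpool/prodfs@zrep_000123 destpool/prodfs_archive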
The default usage for zrep is to replicate to a single filesystem. However, some people may wish for multiple destinations. To do this requires essentially running multiple zrep processes per filesystem.
CAUTION ! At present, you must ensure, yourself, that you do not overlap multiple zrep processes, if you are using ZREPTAG functionality. Run them sequentially, not in parallel.
I am planning an update to fix this issue, but at the moment there are problems with zrep global lock contention that may arise otherwise.

Set the environment variable ZREPTAG to a short, unique identifier for each destination (I strongly recommend you stick to the format "zrep-somethinghere"). Then init and sync as normal. For example:
    export ZREPTAG=zrep-dc
    zrep init pool/fs dc.your.com remotepool/fs
    zrep sync pool/fs
    ...
    export ZREPTAG=zrep-ca
    zrep init pool/fs ca.your.com remotepool/fs
    zrep sync pool/fs

Alternatively, you may use "zrep -t tagnamehere normal.zrep.command.here".
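For example, with the -t form, the two destinations above might be synced as:

    zrep -t zrep-dc sync pool/fs
    zrep -t zrep-ca sync pool/fs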
This section is for when Things Go Horribly Wrong... or if you just want to understand zrep better.

The first rule when debugging is: set "DEBUG=1" in your environment, to make zrep be a little more verbose.
Almost everything zrep does is controlled by ZFS properties.
ZFS allows you to set random properties of your choice, on filesystems.
So, for example,

    zfs set I_am:the_greatest=yes /some/filesystem

is perfectly valid usage (user-defined zfs property names just need to contain a colon).

By default, zrep prefixes all its values with "zrep:". You can see the values it sets on a filesystem (while skipping the many standard system-level properties) with "zrep list -v".
Sample output:

    $ zrep list -v rpool/PLAY/source
    rpool/PLAY/source:
    zrep:master     yes
    zrep:src-fs     rpool/PLAY/source
    zrep:dest-fs    rpool/PLAY/d1
    zrep:src-host   (myhostname)
    zrep:savecount  5
    zrep:dest-host  localhost

By default, zrep_#####_unsent snapshots will get left around for a while if a snapshot can't be sent. This is to keep file backup recovery points around, for those who like it, even if a remote sync fails. They should get expired at the normal rate. However, if you are running zrep at short intervals, this is not necessary, and you can set ZREP_RENAME_UNSENT to no, in your environment or the config file.
See also the Troubleshooting page.
Some folks may be looking just for a way to simplify their backup mechanisms, rather than doing fancy failover.
In this scenario, you will probably prefer one centralized backup server that has trusted root ssh OUT, but does not allow root ssh IN. (This way, it is possible to have independent clients, in separate zones of trust or system administration from each other.)
For this goal, both the initial setup and the replication are reversed from the normal way. Normally, zrep expects the data master to have full ssh privilege into the data replication target. However, when a backup server is the target, sysadmins usually prefer the target system to have the privileges instead of the client.
To implement a backup server type layout with zrep, the simplest way is when you are starting from scratch, with a brand new filesystem.
However, it is also possible to retroactively convert an existing production ZFS filesystem, to be backed up by zrep to a backup server.
Backup server with a clean new filesystem
The steps would look like this, as run on the backup server (a concrete sketch follows below):

- Set up the filesystem initially on the backup server
- Do "zrep init {localfs} {clienthost} {clientfs}"
- "zrep failover {localfs}"  # to make the client the data master

At this point you have now set up the initial state cleanly, and failed over so that the "client" side is the active master (i.e. the read/write one) for the pair. The backup server side will be in read-only mode, ready to receive syncs.
If you already have a client zfs filesystem with a bunch of data, but you don't want to give it full access on the backup server side, it is technically possible to do a manual initialization to the backup server. A smart sysadmin should be able to do a manual zfs replication, and then set the zfs properties themselves.

To then trigger incremental backups from the backup server side, you have to use a special zrep command: "zrep refresh".
Alternatively, you can use the new "zrep changeconfig -f" syntax, and finally convert an existing snapshot to a zrep-active one, as detailed in the Troubleshooting page.

You must then set up a special zrep job on the backup server, instead of the usual procedure of pushing from the master side. It will look just like a "zrep sync" job, but instead you will want to call
    zrep refresh pool/fs

Instead of the master side "pushing" new data, you will now be pulling the latest bits to your read-only backup server copy of the filesystem.
Backup server with an existing ZFS filesystem
This presumes you have ssh trust from the backup server to the client. It also presumes you have installed zrep on both sides, somewhere in the standard $PATH.

Hostnames are "client1" and "backupsrv".
Data filesystem is "data/fs".
Backup pool base is "backup/client1"
FYI, these instructions basically walk you through what a "zrep init" does. So if some time in the future the init process changes, these docs will need updating as well.
    # Create a snapshot of the client fs
    client1# zfs snap data/fs@zrep_000001

    # Set zrep base properties
    client1# zrep changeconfig -f data/fs backupsrv backup/client1/fs

    # Replicate to the backup server and set properties
    backupsrv# ssh client1 zfs send data/fs@zrep_000001 | zfs recv -F backup/client1/fs
    backupsrv# zrep changeconfig -f -d backup/client1/fs client1 data/fs

    # Let zrep know a valid sync has happened
    # (this will also set the "master" flag on client1)
    client1# zrep sentsync -L data/fs@zrep_000001

It should now be possible to run, at your preferred frequency:
    backupsrv# zrep refresh backup/client1/fs

You may also wish to do "zfs set readonly=on backup/client1/fs".
Disaster recovery of client from backup server mode
Note: This section is specific to running zrep as part of a "backup server" type situation. For more normal zrep usage, see higher up in this page.

If you get into the situation where you have lost a client filesystem (or worse yet, the entire remote server), the good news is that it should be relatively straightforward to reinitialize, using the following steps:
- Get the filesystem on the backup server, to be exactly how you need it to be.
- If there is some reason you wish to preserve some point-in-time snapshots, make your own by hand that don't start with "zrep"
- Clear all zfs snapshots and information, by using "zrep clear pool/fs". This should just remove metadata, but leave the filesystem intact.
- Do a full resync to the remote client, with
"zrep init pool/fs client remotepool/fs" on the backup server.
You are effectively now back at step 2 of the initial backup server setup. Finish that setup procedure.

That should be all that is required to get you up and running again!
Some people prefer to run zrep as a non-privileged user. If your implementation of ZFS supports the "allow" syntax, you may be able to do this, if you give the designated user the following privileges:

    zfs allow zrepuser \
      create,destroy,hold,mount,readonly,receive,rename,rollback,send,snapshot,userprop \
      your/fs/here
Author's note: I am pleased to know that people have been running my zrep script since 2012, in production, across the world! It is a great motivation to know that people are finding my efforts useful to them.
That being said, another great motivation is things like an amazon gift card (from the main US amazon.com site only!), or something from my amazon wishlist. :-D No gift too small, it's the thought that counts ;-)