Wombat Documentation V1.0

Server Setup

The server, where the wombat.pl process will run, has several module dependencies. These should be available through CPAN or included in modern Perl distributions. I’ve listed the ones here that were not included by default with Perl at about version 5.8.1. If you’re not running at least Perl 5.8.0, then there isn’t much point in running wombat because the threading will not work. Note also that Perl needs to be built with threading enabled!

Thread::Conveyor
Provides a thread safe queue for data structures.
File::Rsync
A convenient wrapper for making rsync calls.

You can easily test for the presence of these modules with something like:

perl -e 'use Thread::Conveyor'

If you’re missing a module, you can install it “magically” with CPAN:

perl -MCPAN -e 'install Thread::Conveyor'
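
A quick way to check both modules at once is a small shell loop. This is only a sketch: check_module is a helper name made up for this example, not something wombat ships, and it assumes perl is on your PATH.

```shell
#!/bin/sh
# check_module NAME: report whether Perl can load the named module.
check_module() {
    if perl -e "use $1" >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "MISSING $1"
    fi
}

# The two non-core modules listed above.
for mod in Thread::Conveyor File::Rsync; do
    check_module "$mod"
done

# Thread support: prints useithreads='define'; when Perl was built with threads.
perl -V:useithreads 2>/dev/null || echo "perl not found"
```

Any line that reports MISSING names a module to install with the CPAN command above.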

The server will also need to have rsync, obviously. Because wombat makes extensive use of the rsync option “--link-dest”, you’ll need to have rsync 2.5.6 or greater.
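
The version requirement can be checked from a script. The following is a rough sketch: version_ge and rsync_version are hypothetical helpers written for this example, and the parsing assumes the first line of “rsync --version” looks like “rsync  version X.Y.Z  protocol version N”.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is greater than or equal to B.
version_ge() {
    a=$1 b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*} y=${b%%.*}
        # Drop the leading field; an exhausted version counts as 0.
        case $a in *.*) a=${a#*.} ;; *) a= ;; esac
        case $b in *.*) b=${b#*.} ;; *) b= ;; esac
        x=${x:-0} y=${y:-0}
        if [ "$x" -gt "$y" ]; then return 0; fi
        if [ "$x" -lt "$y" ]; then return 1; fi
    done
    return 0
}

# rsync_version: print the numeric version of the installed rsync, if any.
rsync_version() {
    rsync --version 2>/dev/null |
        sed -n '1s/^rsync  *version \([0-9][0-9.]*\).*/\1/p'
}

if command -v rsync >/dev/null 2>&1; then
    found=$(rsync_version)
    if version_ge "$found" 2.5.6; then
        echo "rsync $found is new enough for --link-dest"
    else
        echo "rsync $found is too old for --link-dest"
    fi
fi
```

The comparison is numeric per field, so 2.10.0 is correctly treated as newer than 2.5.6.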

Actually “installing” the wombat binary is as easy as creating a directory for the script and adding a cron entry. Wombat currently expects to find its configuration file in the same directory as the binary, so I recommend installing wombat in something like /usr/local/wombat. The configuration file controls the time and days that images are created, so you’ll want to run the actual binary every hour, or at least as often as your smallest time interval. This can be done easily with a root cron entry like:

0 * * * * /usr/local/wombat/wombat.pl >> /var/log/wombat.log

This will run wombat every hour on the hour. Note that wombat currently outputs all of its status messages to STDOUT, so I’ve included a redirection into a logfile. You could leave off the redirection which would cause cron to send the output via email. Support for specifying a logfile in the configuration file is on my todo list.

Also note that the easiest way to run wombat is as root, but this is by no means the best way from a security point of view. See the section “Running as Non-Root” below for an idea of how to avoid connecting to clients as root.

The last thing that needs to be configured on the server is the space(s) to hold the images. This can be any local path on the server and will be specified in the configuration file. Generally speaking, all you need is one big filesystem. However, one of the features of wombat is that it supports multiple write destinations or “writeto”s. Each writeto should be on a separate physical device, the idea being that if your image repository should fail, you will only lose 1/nth of your images where n is the number of writetos specified. Wombat will automatically alternate between the writeto spaces, ensuring that the most recent images are never on the same device. However, keep in mind that increasing the number of writetos increases the total disk space required since each writeto must be able to hold at least one full image to link against.
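
How wombat chooses the next writeto internally is not described here, so the following only illustrates the idea of simple round-robin alternation. The pick_writeto helper and the image counter are assumptions made for this example; the two paths are the ones used in the sample configuration later in this document.

```shell
#!/bin/sh
# pick_writeto N PATH...: choose a destination for image number N, round-robin.
pick_writeto() {
    n=$1
    shift
    # Skip n modulo (number of paths) entries, then print the next one.
    shift $(( n % $# ))
    echo "$1"
}

for i in 0 1 2 3; do
    echo "image $i -> $(pick_writeto "$i" /dumps/0 /dumps/1)"
done
```

With two writetos, consecutive images always land on different paths, which is the property described above.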

Running as Non-Root

Allowing remote ssh sessions as the root user is not recommended, especially if the machine is outside of the local network, say across the internet. It’s also not necessary for wombat to function. Here are the steps to configure wombat to connect to a remote machine as a non-root user:

Create a non-root user and set up ssh keys

The exact mechanism for adding a user will depend on your operating system, but is generally as easy as useradd -m wombat. Once the user is added, you can create the ssh keys and copy id_rsa.pub into authorized_keys (see Client Setup). However, when you create the entry in authorized_keys, you will want to add a few options:

command="ro-rsync",from="wombatserver",no-port-forwarding,no-X11-forwarding,no-pty ssh-rsa.....

The most important option is the “command” option. This restricts incoming ssh sessions to allow only the command “ro-rsync” to be executed. We are going to create this command in the next step.

Create ro-rsync script

The idea is to create a script that will examine the incoming rsync command and deny any attempt to run something other than rsync or to write files to the machine being backed up. Thus the name “ro-rsync”, for read-only rsync. To do this, we create a script like the following:

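#!/usr/bin/perl
use strict;
use warnings;

# sshd stores the command requested by the client in this variable.
my $cmd = $ENV{SSH_ORIGINAL_COMMAND} || '';

# Only allow rsync in server/sender mode, which reads from this machine.
if ( $cmd !~ /^rsync --server --sender / ) {
  die "Invalid command ($cmd) : ro-rsync restricted\n";
}

# Refuse shell metacharacters, since exec with a single string uses the shell.
if ( $cmd =~ /[;&|`<>\$\n]/ ) {
  die "Invalid characters in command ($cmd) : ro-rsync restricted\n";
}

exec "sudo $cmd" or die "exec(sudo $cmd) failed: $!\n";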

Create this script as /usr/bin/ro-rsync and run “chmod +x /usr/bin/ro-rsync” to ensure it is executable.

Install and configure “sudo”

You will need to install the sudo package on the remote machine. This is because the wombat user will not have the necessary permissions to back up files that are not world readable. Using sudo grants the wombat user root privileges for the specific task of running rsync. The relevant section of the /etc/sudoers file might look something like this:

Cmnd_Alias RSYNC = /usr/bin/rsync
wombat  ALL= NOPASSWD: RSYNC

Set up the remote user in wombat.conf

Simply set rcmd_user wombat, or whatever the name of the remote user you created is.

That should be it! If it doesn’t work, try running a test rsync from the command line:

rsync --archive --rsh='ssh -l wombat' remotehost:/path/to/file /tmp/

If this doesn’t work, try removing the

command="ro-rsync"

option from authorized_keys and see if you can ssh in without a password as the wombat user:

ssh wombat@remotehost

If this doesn’t work, don’t forget to check that authorized_keys is 0600 and the .ssh directory is 0700.
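
The permission fix can be applied in one step. This is a minimal sketch: fix_ssh_perms is a hypothetical helper for this example, and it assumes the key file lives in the usual .ssh/authorized_keys location under the given home directory.

```shell
#!/bin/sh
# fix_ssh_perms HOME_DIR: apply the permissions mentioned above for key logins.
fix_ssh_perms() {
    chmod 700 "$1/.ssh" &&
    chmod 600 "$1/.ssh/authorized_keys"
}

# Example (run on the remote host as the wombat user):
#   fix_ssh_perms "$HOME"
```

If logins still fail after this, the sshd log on the remote host is usually the next place to look.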

The above steps will improve the security of the remote host, since they restrict access to the specific task of backing up files. You might be tempted to also have wombat run as a non-root user on the wombat server itself. This will fail because rsync needs root permissions to create files owned by different UIDs. In order for this to work, the wombat.pl script itself would need to be rewritten to call all of its rsync, directory management, pruning, and DBM file functions with sudo. It’s certainly possible, but not currently worth the effort for the minimal improvement in security. If your wombat server isn’t secure, you have more serious problems since it has all of your most important files on it!

Client Setup

Client configuration consists of making sure the wombat server has remote access, that the client has rsync installed, and then adding the filesystems to be backed up to wombat.conf. For connecting, wombat currently supports ssh, rsh, rsyncd, smbmount (for Windows), and anything else that “rsync -e” supports for establishing the rsync connection to a client. While using ssh for the rsync connection is not required, it is highly recommended as it is both easy and secure.

In order to use ssh, you will want to configure a public/private key with an empty passphrase. To do this, create a key pair on the wombat server as follows:

ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
...

Note: you could just as easily choose a DSA key with -t dsa, in which case the files will be called id_dsa and everything else works the same, although recent OpenSSH releases disable DSA keys by default. You can also set the key length with the -b option if you are truly paranoid; the default length depends on your OpenSSH version.

The id_rsa file is the private key and id_rsa.pub is the public key. All that needs to be done is to append the contents of id_rsa.pub to /root/.ssh/authorized_keys on any client that the wombat server needs to connect to with ssh. Once you copy the key, you should be able to ssh into the client without entering a password. If it doesn’t work, you might check /etc/ssh/sshd_config on the client and make certain that PubkeyAuthentication is set to yes (RSAAuthentication only applies to the obsolete SSH protocol 1).

If it still doesn’t work, check that the authorized_keys file and the id_rsa* files are all chmod 0600. If that still doesn’t work, start consulting logfiles.

You can tighten the security up a bit more by specifying that the key is only good from the wombat server by pre-pending a from directive to the authorized_keys entry on the client like so:

from="wombatserver" ssh-rsa AAAAB3NzaC1yc2EA...

If you would rather use rsh, then it is simply a matter of making sure that the rsh server is installed and enabled (have a look at /etc/xinetd.d/rsh on RedHat/Fedora and probably other distros, or /etc/inetd.conf if you’re running something more in line with old *nix) and then creating a /root/.rhosts file like so:

echo "wombatserver root" >> /root/.rhosts
chmod 0600 /root/.rhosts

Of course there are a million other ways you could do this as well, for example /etc/hosts.equiv if your platform supports that. You can even tighten up rsh by disabling rlogin (so no interactive connections) and then using /etc/hosts.allow (i.e. tcp wrappers) to restrict which machines can access your rsh service:


[root@client]# cat /etc/hosts.allow
in.rshd: wombatserver

[root@client]# cat /etc/hosts.deny 
in.rshd: ALL

There may be situations where using ssh or rsh is not desirable, simply because allowing them grants more permissions than are necessary for making a backup. Specifically, if the wombat server can connect with ssh/rsh as root, then all of the clients are only as secure as the wombat server. One solution to this is to use rsyncd instead. Using rsyncd allows the individual clients to configure which filesystems the wombat server can access. Moreover, the access can be granted as read-only. The disadvantage is that if changes need to be made to the filesystems that are being backed up, they now have to be made at both the client and the server.

Setting up rsyncd is a simple matter of creating the rsyncd configuration file, typically /etc/rsyncd.conf, and then starting the rsync daemon. A simple rsyncd.conf file might look something like:

hosts allow = wombatserver
read only = true

[usrdata]
        path = /usr/data

The target can now be specified in wombat.conf as client::usrdata, where usrdata is the module name. On Fedora/RedHat there is typically already a /etc/xinetd.d/rsync file that can be used to enable the rsync daemon by setting disable=no or by simply running /sbin/chkconfig rsync on. Otherwise you will need to add a startup script to run rsync with the --daemon option.

Windows Clients

So, what about those pesky Windows clients? The easiest way to use wombat to backup a Windows client is to install Cygwin. Not only is this probably the best way, but it is really darn useful for a lot of other stuff as well. If you install Cygwin, be sure to install OpenSSH and the rsync packages.

You certainly don’t have to use Cygwin to get a working sshd on Windows, but the Cygwin sshd works the best of the free ones that I’ve tried. A couple of tips about Cygwin: once it is installed, you can configure sshd and install it as a service by running the command:

ssh-host-config

It will ask a bunch of questions, and for most of them you can probably accept the default. However, be certain to answer yes when prompted about installing sshd as a service. Once sshd is installed, edit /etc/passwd and change the username “Administrator” to “root” and the home path “/home/Administrator” to “/home/root”. This will map the remote username “root” to the local Windows username “Administrator”. Then make root’s ssh configuration directory with mkdir -p /home/root/.ssh. Now you should be able to copy the same public key you use for *nix clients to /home/root/.ssh/authorized_keys in Cygwin and be able to ssh into your Windows box as root using public-key authentication. Slick!

Now the bad news. Using rsync over ssh with Cygwin just doesn’t seem to work. Try entering this into google and you’ll see lots of chatter about this problem. Some people claim that it works for them, or for certain versions of rsync, etc, but I’ve never had it work reliably. Generally, the rsync will start up and after transferring a few kilobytes it will hang indefinitely. Don’t shut off that sshd though! Once you get over that weird feeling you might get when you ssh into your Windows machine, you’ll really come to rely on it. Go ahead, check out /proc/cpuinfo, or /proc/meminfo, or try running vmstat. Yeah, it almost makes Windows cool.

Since ssh doesn’t seem to play well with rsync, we could try rsh. I never tried, but I think it should be possible. I’ve been using rsyncd with Windows and I have not had any problems, even over a slow VPN connection. You can install rsync as a daemon in Windows from the Cygwin prompt by running the command:

cygrunsrv --install "rsyncd" --path /usr/bin/rsync --args "--daemon --no-detach" \
  --desc "Starts a rsync daemon for accepting incoming rsync connections" \
  --disp "Rsync Daemon" --type auto

Now a simple “net start rsyncd” should start up rsync as a daemon! All that’s left to do is create a /etc/rsyncd.conf file – the format is the same as for Linux.

The last method that wombat supports for backing up Windows shares is SMB. To enable this, use the smb option in wombat.conf like so:

smb(username,password)

Wombat will then use smbmount to mount the windows share and then rsync the contents as if the filesystem were local. While this does work, it is horribly slow. The reason is that rsync must parse the file on the client to determine what has changed. Since the remote file system is being mounted locally, the entire content of every file has to be transferred to determine if it has changed. Worse, if there are any changes, the changes have to be copied over again. If you are on a slow connection this will be abysmal. However, if you are on a fast connection and ssh/rsh/rsyncd options are just not going to work for whatever reason, then this should do the trick.

Wombat.conf Formatting

Before we get into the specifics of the configuration file, we need to discuss formatting. Commands are terminated with a newline unless the newline occurs inside a block or a list. A block is created using curly braces {} and a list by using regular parens (). The elements of a list can be separated with commas, whitespace, or newlines. A comma can be assigned in a list by quoting the element, and likewise a parenthesis can be assigned in a list by escaping it. Multi dimensional lists are not supported.

Variables are separated from their assigned values by whitespace; quotes can be used if the assignment itself contains spaces. Variables can be assigned integers, strings, and intervals. The interval type is very similar to the intervals that crontab understands:

n-m
A simple range element – assigns all integers between n and m. If the upper or lower bound is not specified, then the range is open ended, so 2- means every integer greater than or equal to 2.
n,m
A simple list – assigns the integers n and m. Lists can also include other interval elements.
*/k
All possible values incremented by k. This assignment typically only makes sense when the variable has an intrinsic range, like the hours in a day. So for example, */2 would mean every even hour of the day: 0,2,4, etc.
n-m/k
A stepped range – every integer between n and m in increments of k.
!
The negation operator – take the complement of any element or set of elements.

Interval elements can be combined into fairly complex lists, for example:

1,5,!10-14,*/7,24- == 1,5,7,21,24,25,26,27...
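
To make the element rules concrete, here is a rough shell sketch that expands an interval over a given range. It is not wombat’s parser: expand_interval and elem_matches are names invented for this example, and the semantics are guesses chosen so that the output matches the example above for a range of 1 to 31. Specifically, it assumes that */k selects the multiples of k inside the range, that n-m/k steps from n, and that a negated element removes its values from whatever the other elements select.

```shell
#!/bin/sh
# elem_matches VALUE ELEMENT MIN MAX: succeed if VALUE is selected by ELEMENT.
elem_matches() {
    v=$1 e=$2 lo=$3 hi=$4 step=1 star=0
    # Split off an optional /k step.
    case $e in */*) step=${e##*/}; e=${e%/*} ;; esac
    case $e in
        '*') star=1 ;;
        *-*) # Open-ended bounds fall back to the range limits.
             [ -n "${e%%-*}" ] && lo=${e%%-*}
             [ -n "${e#*-}" ] && hi=${e#*-} ;;
        *)   lo=$e; hi=$e ;;
    esac
    [ "$v" -ge "$lo" ] && [ "$v" -le "$hi" ] || return 1
    if [ "$star" -eq 1 ]; then
        [ $(( v % step )) -eq 0 ]
    else
        [ $(( (v - lo) % step )) -eq 0 ]
    fi
}

# expand_interval SPEC MIN MAX: print the selected values, comma separated.
expand_interval() {
    spec=$1 min=$2 max=$3 out=
    n=$min
    while [ "$n" -le "$max" ]; do
        keep=0 drop=0 rest=$spec
        while [ -n "$rest" ]; do
            elem=${rest%%,*}
            case $rest in *,*) rest=${rest#*,} ;; *) rest= ;; esac
            case $elem in
                '!'*) elem_matches "$n" "${elem#\!}" "$min" "$max" && drop=1 ;;
                *)    elem_matches "$n" "$elem" "$min" "$max" && keep=1 ;;
            esac
        done
        if [ "$keep" -eq 1 ] && [ "$drop" -eq 0 ]; then
            out="$out${out:+,}$n"
        fi
        n=$((n + 1))
    done
    echo "$out"
}

expand_interval '1,5,!10-14,*/7,24-' 1 31
```

Under these assumptions, 14 is selected by */7 but removed again by !10-14, which is why it does not appear in the expanded list.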

Finally, all assignments are scoped inside blocks and blocks inherit. That is, if you assign a variable inside a block, then it will return to its original value once the block exits. So:

var 4		--> var=4 at this point
{
  do some stuff --> var=4 at this point
  var 6		
  do some stuff --> var=6 at this point
}
		--> var=4 at this point

Wombat.conf Settings

Here is a rundown of the supported commands and variables:

workers n
Starts n workers. The workers are responsible for connecting to the clients and creating the rsync images. If you have a lot of hosts to backup and sufficient network/disk bandwidth, you will want to increase this value. The default is 1 worker.
pruners n
Starts n pruners. The pruners are responsible for cleaning up old images. The default number of pruners is 1 which is likely to be sufficient unless you need to do a lot of pruning.
maxconnect n
This parameter controls the maximum number of workers that are allowed to connect simultaneously to the same host. For example, if you happen to have a client with 10 filesystems that need to be backed up and you are running 10 workers, it is entirely possible that all 10 workers might connect at the same time. That would be very bad. The maxconnect option will keep that from happening. The default value is 2.
rcmd
This sets the command that wombat will use to connect to the client and start the rsync. The default value is “ssh”. The value assigned here is basically passed through to rsync as the “--rsh” option, so anything that would work there should work here.
rcmd_user
Setting rcmd_user has the effect of calling rsync with --rsh="ssh -l rcmd_user" when rcmd is ssh, effectively connecting to the target as user rcmd_user. Taking advantage of this feature requires carefully following the steps in the “Running as Non-Root” section on how to set up ro-rsync and sudo.
runhours interval
Sets the hours that an image should be created. This won’t necessarily be the time that an image actually gets created, but rather when it is due. The default value is “1-23” or all hours of the day.
rundays interval
Sets the days of the week that an image should be created. The default value is 0-6, or Sunday through Saturday. Note that 0=7=Sunday.
retry n
If a scheduled image fails, it will be retried every subsequent hour, irrespective of runhours/rundays, a maximum of n times. The default is 0.
writeto ( /path/1 /path/2, … )
The writeto command defines a list of paths to which the backup images will be written. If more than one path is specified, subsequent images will be written to each path in the order listed. The idea is to use multiple write paths that write to separate physical devices, so that if a device fails not all the backup images will be lost.
alias _aliasname string
alias _aliasname { block }
alias _aliasname ( list )
The alias command provides a way to utilize replaceable parameters in the config file. Note that all alias names must start with an underscore character “_”. The first form replaces all future occurrences of _aliasname with the string specified. This could be used to replace a complicated interval definition with an easy to read name. For example: alias _myhours 8-17,19,21,23. Now _myhours could be used in place of the interval statement in the configuration file. The second form assigns a block of configuration settings to _aliasname. This can be used to create groups of default settings, for example:

alias _business_time {
  runhours 8-18
  rundays 1-5
  retry 8
}

The last form assigns a list to _aliasname. This might be a convenient way to configure the smb authentication. For example:

alias _smbauth ( username,password)
.. do stuff ..
smb _smbauth

Notice that since _smbauth is a list, we don’t have to include the parens on the smb call. Consider the equivalent:

alias _smbauth username,password
.. do stuff ..
smb ( _smbauth )

Now since we defined _smbauth as a string, we have to put the parens on the smb call so that it will treat _smbauth as a list.

preserve (preserve_statement1, preserve_statement2… )
The preserve command defines the rules that the pruner will use to decide which images to keep and how long to keep them. If you do not configure a preserve list, then all images will be preserved. If you do configure a preserve list, then any image not specifically marked for preservation will be removed.

The individual preserve statements consist of three parts. The first part is a series of time elements which are used to select which images are governed by the preserve statement. The time elements are matched directly against the scheduled time of each image and are defined using time intervals and the keywords hour, wday, yday, mday and mon. For example, “hour 12,18” would match any image whose scheduled time was at 12:00 or 18:00 and “mon 3” would match any image from March. You can use as many of these elements as is necessary to select the desired images.

The second component is the length of time to keep the images. This component of the statement is demarcated by the for operator and the time is computed using an integer and one of the time strings: hours, days, weeks, months, and years. For example, “for 3 days” would preserve the selected images for a time period of 3 days. Only one “for” definition is allowed in a preserve statement. If a “for” definition is not present, then all images governed by the statement will be preserved.

The third component is the atleast operator. This component specifies a minimum number of matching images to preserve even if they exceed the time defined by the for operator. This is a good setting to use to ensure that you do not lose all of your images in the event that a client becomes unavailable for a long period of time.

Here is an example of a preserve statement, lifted from the sample config file:

  preserve (
    "for 8 hours atleast 4"
    "hour 12,18 for 2 weeks"
    "hour 18 wday 5 for 1 month"
    "hour 18 mday 22-31 wday 5 for 6 months"
  )

The first preserve statement “for 8 hours atleast 4” tells the pruner that images older than 8 hours are safe to remove, provided that at least four images remain. Notice that this is an example of a preserve statement with an empty first component, which therefore matches all images.
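
The interplay of “for” and “atleast” can be sketched in a few lines of shell. This is an illustration only, not the pruner’s actual code: the decide helper is hypothetical, image ages are given in whole hours with the newest first, and it assumes that atleast protects the newest matching images.

```shell
#!/bin/sh
# Rule equivalent to "for 8 hours atleast 4".
max_age=8
atleast=4

# decide AGE...: ages in hours, newest image first; prints keep or prune for each.
decide() {
    rank=0
    for age in "$@"; do
        rank=$((rank + 1))
        if [ "$age" -le "$max_age" ] || [ "$rank" -le "$atleast" ]; then
            echo "keep $age"
        else
            echo "prune $age"
        fi
    done
}

decide 1 3 9 20 30 50
```

Here the images aged 9 and 20 hours are past the 8 hour limit, but they are kept because they are among the four newest.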

The second statement “hour 12,18 for 2 weeks” tells the pruner to keep any image scheduled for 12:00 or 18:00 for a period of 2 weeks. Similarly, the third statement preserves any image scheduled for 18:00 on a Friday (wday=5) for 1 month.

Finally, the last statement will preserve any image scheduled for 18:00 on the last Friday of the month for 6 months. In months where two Fridays fall between the 22nd and the end of the month, it will also preserve the image from the second to last Friday; this cannot happen in a 28-day February. Note that this works because there must be at least one Friday between the 22nd and the 28th of any month, even February.
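
The calendar reasoning behind “mday 22-31 wday 5” can be checked with a short script. This is a sketch written for this document, not part of wombat: day_of_week, days_in_month and matching_fridays are hypothetical helpers, and the weekday is computed with Sakamoto’s method using the same numbering as wombat.conf (0 = Sunday, 5 = Friday).

```shell
#!/bin/sh
# day_of_week YEAR MONTH DAY: print 0-6, where 0 is Sunday (Sakamoto's method).
day_of_week() {
    y=$1 m=$2 d=$3
    if [ "$m" -lt 3 ]; then y=$((y - 1)); fi
    set -- 0 3 2 5 0 3 5 1 4 6 2 4
    shift $((m - 1))
    echo $(( (y + y / 4 - y / 100 + y / 400 + $1 + d) % 7 ))
}

# days_in_month YEAR MONTH: print the number of days, honoring leap years.
days_in_month() {
    case $2 in
        4|6|9|11) echo 30 ;;
        2) if [ $(( $1 % 4 )) -eq 0 ] &&
              { [ $(( $1 % 100 )) -ne 0 ] || [ $(( $1 % 400 )) -eq 0 ]; }
           then echo 29; else echo 28; fi ;;
        *) echo 31 ;;
    esac
}

# matching_fridays YEAR MONTH: print the days from the 22nd onward that are Fridays.
matching_fridays() {
    year=$1 month=$2 day=22 out=
    last=$(days_in_month "$year" "$month")
    while [ "$day" -le "$last" ]; do
        if [ "$(day_of_week "$year" "$month" "$day")" -eq 5 ]; then
            out="$out${out:+ }$day"
        fi
        day=$((day + 1))
    done
    echo "$out"
}

for month in 2 5 6; do
    echo "2025-$month: $(matching_fridays 2025 "$month")"
done
```

A month with one matching day keeps only its last Friday; a month with two matching days also keeps the Friday before it.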

sync ( host:target host:target … ) { settings }
The sync command specifies which hosts and filesystems will be backed up and allows for specifying additional settings specific to these hosts and filesystems. The host:target specification has three forms:

hostname:/path/on/remote/host
In this form, the remote filesystem “/path/on/remote/host” on the host “hostname” will be backed up using rsync over the command channel specified by rcmd (either ssh or rsh). Note that the presence or lack of a trailing slash on the path element will not change the behavior of the rsync command.
hostname::module_name
In this form, rsync will try to connect to rsyncd on the remote host and backup the path that is specified in the remote /etc/rsyncd.conf for module_name.
hostname:share_name/path/on/remote/host
In this form, when combined with the smb command, wombat will use smbmount to connect to share_name on the remote host and then rsync the path specified as a subdirectory of the share. This should be used only as a last measure to backup a Windows machine as it is very slow. See the Windows Clients section for more details.

Any settings specified in the {} block will take precedence over the current settings. This allows for creating global settings and then overriding them for a specific image set. The only thing not allowed in the block section of a sync command is an additional sync command. You may also use an alias inside of the block to load a group of settings at once. For example:

alias _root_dynamic {
  runhours 4
  rundays 0-6
  retry 20
  preserve (
    "for 1 week atleast 7"
    "wday 0,4 for 3 months atleast 20"
  )
}

sync ( fileserver:/var ) {
  writeto ( /dumps/0 /dumps/1 )
  _root_dynamic
}

In this example, the alias _root_dynamic will be expanded inside the block for the sync of fileserver:/var, and all of the settings will be incorporated.

smb ( username, password )
This command tells wombat to use smbmount to mount a remote Windows share and rsync via the locally mounted path. The username and password should match the Windows credentials needed for access to the share. Note that when using this command you must also use the smb form of the sync target specification: hostname:share_name/path/to/files.

Comments

  1. It might be useful to tell users of RHEL, Fedora, CentOS etc. to check the sudoers file for the requiretty setting and comment it out: #Defaults requiretty
    Otherwise non-root users won’t be able to run rsync etc.

    Regards

    Riiiik
