
      /\
     /  \            (C) Copyright 2006 Parliament Hill Computers Ltd.
     \  /            All rights reserved.
      \/
       .             Author: Alain Williams, <addw@phcomp.co.uk> April 2006
       .
        .            SCCS: @(#)Documentation	1.9 07/05/09 18:02:37
          .
 



Overview
********

RsyncBackup is a script that backs machines up over a network using rsync.
It maintains one directory tree per backed up machine per day - this makes it
easy to find/restore a consistent set of files from a particular day.

Files that have not changed are shared between the different daily backup trees
by means of hard links. This saves disk space on the archive machine and also
shortens the time taken to back up. It works because, typically, most files do
not change from day to day.
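
The hard link idea can be tried out by hand. What follows is only a sketch: it
uses plain 'ln' in a temporary directory, and the helper same_inode and the file
names are made up for the illustration. This document does not say which
mechanism RsyncBackup itself uses (rsync's --link-dest option is one common way
to get the same effect).

```shell
#!/bin/sh
# Illustration only: two "daily" trees sharing one unchanged file.
# same_inode is a hypothetical helper, not part of RsyncBackup.

same_inode() {
    # True when both paths are names for the same file (same inode number).
    [ "$(ls -di "$1" | awk '{print $1}')" = "$(ls -di "$2" | awk '{print $1}')" ]
}

demo=$(mktemp -d) || exit 1
mkdir -p "$demo/20070811/etc" "$demo/20070812/etc"
echo "nameserver 192.0.2.1" > "$demo/20070811/etc/resolv.conf"

# The file did not change overnight, so the next day's tree gets a hard
# link to the same data rather than a second copy.
ln "$demo/20070811/etc/resolv.conf" "$demo/20070812/etc/resolv.conf"

if same_inode "$demo/20070811/etc/resolv.conf" "$demo/20070812/etc/resolv.conf"
then
    echo "both days share one copy of the data"
fi

# Removing one day's tree leaves the other day's file intact.
rm -rf "$demo/20070811"
cat "$demo/20070812/etc/resolv.conf"
rm -rf "$demo"
```

The data is only freed once the last daily tree that links to it has been removed.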

The idea is that one central archive server initiates backups on several other machines.


Configuration on the machine being backed up
********************************************

Rsync needs to allow the archive machine read access to the directories that need
to be backed up.

Sample: /etc/rsyncd.conf
	use chroot = no

	# Something so that the backup server can read the entire machine:
	[backup]
		comment = Backup export
		path = /
		uid = 0
		hosts allow = archive.example.co.uk

Don't forget that the configuration files on the archive machine ought to be backed up as well.
So arrange to run RsyncBackup on some other machine and have it back up your archive machine.
These config files will probably not use a lot of disk space.


Running the backup
******************

This will typically be done via root's cron on the archive machine. Here is an example:

30      0       *       *       *       /usr/local/bin/RsyncBackup -c -1 -t 10 -r server -d 'etc home usr/local root'

The remote machine (the one being backed up) is 'server'
The directories backed up are: 'etc home usr/local root'
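
If several machines are backed up, the Installation notes suggest staggering the
start times. The sketch below prints one crontab line per machine, each starting
a fixed number of minutes after the previous one. The function stagger_cron and
the machine names server2 and server3 are made up for the illustration; the
RsyncBackup options are simply those of the example above.

```shell
#!/bin/sh
# stagger_cron HOUR MINUTE GAP machine...
# Hypothetical helper: prints crontab lines, one per machine, GAP minutes apart.
# Give HOUR and MINUTE without leading zeros (08 would be read as octal).
stagger_cron() (
    start=$(( $1 * 60 + $2 ))
    gap=$3
    shift 3
    n=0
    for machine in "$@"; do
        t=$(( (start + n * gap) % 1440 ))      # wrap round past midnight
        printf "%d\t%d\t*\t*\t*\t/usr/local/bin/RsyncBackup -c -1 -t 10 -r %s -d 'etc home usr/local root'\n" \
            $(( t % 60 )) $(( t / 60 )) "$machine"
        n=$(( n + 1 ))
    done
)

# First machine at 00:30, the others 45 minutes apart.
stagger_cron 0 30 45 server server2 server3
```

Check the output before pasting it into root's crontab; a sensible gap depends on
how long each machine takes to back up.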


Archive Created
***************

The archive directory (on the archive machine) for a machine 'server' will be: /arch/server
That directory will contain directories with names being the date of back up, eg: 20070812
There will be one such directory for every date on which a backup was done.

The symbolic link LATEST points to the directory of the last successful backup.

Below this level will be directories just like on the machine that was backed up.
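
Given the layout described above, a few small shell functions make it easy to
look around an archive. This is a sketch only; the function names are made up
and nothing here is part of RsyncBackup. It assumes that the dated directories
are named with exactly eight digits and that LATEST is a symbolic link to one of
them.

```shell
#!/bin/sh
# Hypothetical helpers for browsing an archive directory such as /arch/server.

# List the dated backup directories (YYYYMMDD), oldest first.
list_backup_dates() (
    cd "$1" || exit 1
    for d in [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -d "$d" ] && echo "$d"
    done
    exit 0
)

# Print the date that the LATEST symbolic link currently points to.
latest_backup_date() (
    cd -P "$1/LATEST" || exit 1
    basename "$(pwd -P)"
)

# Print every backup date that holds a given path, eg etc/hosts.
dates_containing() (
    for d in $(list_backup_dates "$1"); do
        [ -e "$1/$d/$2" ] && echo "$d"
    done
    exit 0
)
```

For example, 'dates_containing /arch/server home/john/report.txt' would list the
days on which that (hypothetical) file was present in the backup.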


Samba share definition
**********************

This is to allow people to recover files from backup using Microsoft shares.
Beware: the definitions below allow anyone to recover any file; this is not what
you would want if the files being archived are confidential.

[archive]
   comment = Archive Directory for archive
   path = /arch/archive
   browseable = yes
   writable = no
   public = yes

[server]
   comment = Archive Directory for server
   path = /arch/server
   browseable = yes
   writable = no
   public = yes

[logs]
   comment = Archive Logs
   path = /var/log/backups
   browseable = yes
   writable = no
   public = yes


Option Explanation
******************


  Archive cleaning options
    You will eventually fill the space available for the archive, so you need to clean out
    archives that are too old to be wanted. It is, however, nice to keep the occasional snapshot
    for a long time.

    -c
    	Clean out (remove) old backups that are more than 14 days old.
    	This might not save as much space as you might think. If few files change from day to day
	then most of what you save are directories.

    -C days
    	Set the age, in days, beyond which backups will be Cleaned (default 14).
	This works by looking at the archive directory name (eg 20070825) and deciding how old
	it is.

    -1
    	Don't clean a backup made on 1st of the month, eg leave 20070801
	This can be useful to keep the occasional snapshot.

    -y yrs
    	Clean out 1st of the month backups that are more than yrs years old.
	A year is deemed 366 days long - to be on the safe side.
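
The way the cleaning options fit together can be sketched in shell. This is one
reading of the descriptions above, not the script's actual code: the function
names are made up, and it is an assumption that -y only matters when -1 is also
given. The age is worked out from the directory name alone, using integer
arithmetic.

```shell
#!/bin/sh
# Hypothetical sketch of the cleaning rules; not taken from RsyncBackup.

# Convert YYYYMMDD to a day count (Julian day number).
date_to_days() (
    y=${1%????}; md=${1#????}
    m=${md%??};  d=${md#??}
    m=${m#0};    d=${d#0}             # stop "08" being read as octal
    a=$(( (14 - m) / 12 ))
    y=$(( y + 4800 - a ))
    m=$(( m + 12 * a - 3 ))
    echo $(( d + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045 ))
)

# should_clean NAME TODAY DAYS KEEP_FIRST YEARS
#   NAME, TODAY  dates in directory form, eg 20070825
#   DAYS         age limit in days (-C, default 14)
#   KEEP_FIRST   1 if -1 was given, else 0
#   YEARS        value given to -y, or 0 if -y was not given
# Succeeds (exit status 0) when the backup would be removed.
should_clean() (
    age=$(( $(date_to_days "$2") - $(date_to_days "$1") ))
    case $1 in
        ??????01)
            if [ "$4" -eq 1 ]; then
                # Kept as a monthly snapshot unless -y says it is too old.
                [ "$5" -gt 0 ] && [ "$age" -gt $(( $5 * 366 )) ]
                exit
            fi ;;
    esac
    [ "$age" -gt "$3" ]
)
```

With today taken as 20070825, 'should_clean 20070810 20070825 14 0 0' succeeds
(15 days old), whereas 'should_clean 20070801 20070825 14 1 0' fails because the
1st of the month backup is kept.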

  What to back up options

    -r name
    	Backup the remote machine 'name'. Name is a DNS name, you could probably also use an IP address.
	This acts as the default name for local storage of the archive (see -R).

    -s src
    	The name of the rsync source (module name) on the machine that is being backed up. This module
	name will be specified in /etc/rsyncd.conf on the machine being backed up. The default name
	is 'backup'.
	This module should permit at least read access to the files that are to be backed up.
	The phrase 'module name' is an rsync concept.

    -d dirs
    	List of directories to backup is 'dirs'. These should be specified without the leading '/',
	all the directories will be with respect to '/'.
	These directories will be copied into directories under the date directory (eg 20070825/home).

    -D dirs
    	List of snapshot directories to backup is 'dirs'. These should be specified without the leading '/',
	all the directories will be with respect to '/'.
	These directories will be copied into directories under the snapshot directory (eg SNAPSHOT/home).
	The purpose of this is to have one copy, ie do not store by distinct date of backup. You
	probably want to use the '-d' option in preference to '-D'.

    -F
    	Do not cross mount points onto other File systems. This is especially useful with directories
	that contain chrooted environments, as these will contain a mount of /proc. Eg var/named.


  Where to store the back up
    Ie where things go on the machine that is performing the back up.

    -l dir
    	Where backups are stored on the local machine. The default is '/arch'. The backup for
	the machine 'server' will be kept under '/arch/server'.
	This directory must exist before the program is run.

    -R name
	Name to use for local storage of archive (default is the value of -r). This is used
	for the directory names under '/arch' and under '/var/log/backups'.
	You generally do not need to use this option, the remote name (-r) is probably
	good enough; however if the remote name is very long or an IP address you may want
	to specify another name for local use.


    -S file
	Touch a file with the start time of a successful backup. The point is that
	files may change or be created while a backup is taking place.

	Such a file with the name RsyncBackupSuccessTime will be used by the WriteRsyncBackupToTape
	program.

	If the name 'file' is not absolute (does not start with a '/') it will be created
	in the top level archive directory (eg /arch/server). If the name starts
	with a '_' it will be created in the directory for a particular day
	(eg /arch/server/20070812).
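
The three naming rules for -S can be written as a single 'case' statement. This
is a sketch of the rules as described above, with a made up function name; it
assumes that a leading '_' is kept as part of the file name, which the text does
not actually state.

```shell
#!/bin/sh
# resolve_success_file FILE ARCHIVE_DIR DATE
# Hypothetical helper: prints where the file given to -S would be created.
resolve_success_file() {
    case $1 in
        /*) echo "$1" ;;            # absolute name: used as given
        _*) echo "$2/$3/$1" ;;      # leading '_': in that day's directory
        *)  echo "$2/$1" ;;         # otherwise: top level archive directory
    esac
}

resolve_success_file RsyncBackupSuccessTime /arch/server 20070812
```

The order of the patterns matters: the absolute case is tested first so that a
name such as /tmp/_x is still treated as absolute.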


  Miscellaneous options

    -b kbps
	Set average bandwidth to kbps (Kilobytes per second). Default no limit.
	You probably only need this if backing up over the Internet.

    -o
    	Where does logging output go to ?
	By default output will be sent to /var/log/backups/RemoteMachineName/YYYYMMDD-HHMM.
	If the program is being run interactively (stderr is a terminal) output will only go
	to the terminal. If the '-o' option is given and the program is interactive then
	output will also go to the log file.
	If the log directory does not exist it will be created.

    -M addr
	Mail a success/fail message and the log file to email address addr.
	This program is probably being run from cron and any output will
	be emailed by cron - iff the -o option is given.
	Alternatively give this option and specify where output is sent.

    -f
    	mail only on Failure to the address given by -M.
	The point is that it can be useful to only send mail on a failure,
	if mail is always sent people will often ignore it and thus not notice
	when a failure happens.
	This does nothing unless -M is also specified.

    -t n
	compare the Total number of files and the disk space used to be within n% of yesterday.
	10 may be a good value for n. A large change of file numbers can either indicate a problem
	with usage of the system (why have many files been created or removed) or a problem with
	the backup - files not being backed up.

	You may find that the disk used for the same files differs between two machines.
	This may not be a problem as different file systems on different machines may have different
	block sizes and the like.
	The disk used is reported in units of a Kilobyte.

    -x
	eXplain - provide a brief help message.

    -Z
	Don't compress rsync data transfer. Use this if you are backing up over a fast local network
	and/or the machine CPUs are not that fast.


Installation
************

You will probably designate one machine as the archive server, ie the machine
that stores the backups of all the other machines. You should probably store
the backups on a nice big disk partition all of its own - using a mirrored
disk would be a good idea.

1) Install rsync on all the machines
   Install RsyncBackup into /usr/local/bin and make it executable.
   NOTE that the script uses the Korn Shell (ksh), you may need to install this
   on your machine.

2) Create a directory (default /arch) on the archive server.

3) Create a /etc/rsyncd.conf on each machine to be backed up.

4) Test that rsync works. The following command will do for the 'server' to be backed up:

	rsync rsync://server/backup/

   This should give you a directory listing of the module name 'backup' on the
   machine 'server'.

   If it doesn't work, check:
   * /var/log/messages on the server (and similar)
   * the permissions in /etc/rsyncd.conf
   * What does the server think the archive machine is called when a network
     connection is made -- ie is your reverse DNS correct ?
   * Is rsync running on the server ?

5) Create a directory to store each machine's backup, eg /arch/server

6) Run RsyncBackup on one smallish directory, eg /etc:

	/usr/local/bin/RsyncBackup -r server -d etc

  Check that this works, check the files in /arch/server/LATEST/etc, check the
  log file in /var/log/backups/server/

7) Install an appropriate entry into root's crontab.
   If you are backing many machines up, stagger the start times.
   If you don't run this as root then you will not be able to preserve
   file ownership, this will make it more difficult to restore a
   remote machine properly.

8) Note that the first backup will take a long time since files need to
   be copied over the network. Once this has been done it will be much quicker.
   You should avoid installing the crontab entry until one successful copy
   has been made - ie do the first one by hand or as an 'at' job.

9) Don't forget to backup the config files on the server itself, probably you will
   want to do this both on the archive machine itself (ie get it to back itself up)
   and back the archive server up onto some other machine.

10) Send the author an email saying that you are using the program along with any
   suggestions, bug fixes or beer tokens.

You will also want to arrange to take backups off site occasionally. How often is
up to you - the less frequently you do it, the more you lose if your entire building
is destroyed.

It is a good idea to keep your archive machine as far away as possible from the machines
that it is backing up - preferably in another building. Be careful about physical security,
if someone steals your archive machine then they have a copy of all your company secrets.


Restoring files
***************

Most of the time you are going to be restoring a directory where files have been damaged or lost.
When doing this it is best to restore to a different directory so that the user
can merge any changes that have been made since the last backup was done.

Alternatively move what remains of the directory and restore everything else; this is what is
illustrated below.

	cd /home/john/bigproject/
	mkdir OLD_FILES
	mv * OLD_FILES

You may wish to check that there are not any interesting 'hidden' files (names starting '.').

	ls -la
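
If you would rather move the hidden files aside in the same step, something like
the following sketch will do it. The function move_aside is made up for the
illustration; it moves every entry, including names starting with '.', into
OLD_FILES and stops at the first error.

```shell
#!/bin/sh
# move_aside DIR
# Hypothetical helper: moves everything in DIR (hidden files too) into
# DIR/OLD_FILES, which must not already exist.
move_aside() {
    dir=$1
    mkdir "$dir/OLD_FILES" || return 1
    for entry in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
        # Skip patterns that matched nothing (but keep dangling symlinks).
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        # Do not try to move OLD_FILES into itself.
        [ "$entry" = "$dir/OLD_FILES" ] && continue
        mv "$entry" "$dir/OLD_FILES/" || return 1
    done
    return 0
}
```

For the example above you would type 'move_aside /home/john/bigproject' and then
restore into the now empty directory.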

To get the files back, rsync is probably easiest. If the archive server has a suitable
/etc/rsyncd.conf (ie like the one on the machine backed up, here assumed to define a module
named 'arch' with path /arch) you could then type:

	rsync  -aH --numeric-ids --stats rsync://archive.example.co.uk/arch/server/LATEST/home/john/bigproject/ .

That would restore the directory to the time of the last backup. Then leave the user (John) to
merge files from OLD_FILES.

If the archive server does not have rsync configured as above, then you could create
another entry in server's /etc/rsyncd.conf, something like:

	# Allow files to be put here by the archive server:
	[restore]
		comment = Backup restore tmp
		path = /tmp
		uid = 0
		read only = false
		hosts allow = archive.example.co.uk

On the file server you type:

	mkdir /tmp/REST

Then on the archive machine you type:

	cd /arch/server/LATEST/home/john/bigproject/
	rsync -aH --numeric-ids . rsync://server/restore/REST/

and back on the file server you type:

	cd /home/john/bigproject/
	cp -a /tmp/REST/. .
	rm -rf /tmp/REST
