From fb9943e7fbf1e8ffc83f0ea135ac9f75f44c2b97 Mon Sep 17 00:00:00 2001
From: Joerg Jaspert
Date: Wed, 5 Nov 2008 23:49:58 +0100
Subject: readme add pretty large readme

Signed-off-by: Joerg Jaspert
---
 README | 216 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 202 insertions(+), 14 deletions(-)

(limited to 'README')

diff --git a/README b/README
index 0adab69..9687b85 100644
--- a/README
+++ b/README
@@ -1,23 +1,211 @@
 Archvsync
 =========
 
-This is the centralized archvsync configuration. It is a simple git
-repository that is currently hosted on ftp-master.debian.org AKA ries.
+This is the central repository for the Debian mirror scripts. The scripts
+in this repository are written for the purposes of maintaining a Debian
+archive mirror (and shortly, a Debian bug mirror), but they should be
+easily generalizable.
 
-Use the following conventions in here:
+Currently the following scripts are available:
 
- - As much as possible should be in the master branch (and as such
-   affect every machine).
+ * ftpsync      - Used to sync an archive using rsync
+ * runmirrors   - Used to notify leaf nodes of available updates
+ * dircombine   - Internal script to manage the mirror user's $HOME
+                  on debian.org machines
+ * typicalsync  - Generates a typical Debian mirror
+ * udh          - We are lazy, just a shorthand to avoid typing the
+                  commands, ignore... :)
 
- - Use machine specific branches for those things that *have* to be
-   machine specific. Examples for such cases are the
-   .ssh/authorized_keys files as well as the actual list of hosts to
-   push to / mirror from.
+Usage
+=====
+For impatient people, short usage instruction:
+ - Create a dedicated user for the whole mirror.
+ - Create a separate directory for the mirror, writeable by the new user.
+ - Place the ftpsync script in the mirror user's $HOME/ (or $HOME/bin)
+ - Place the ftpsync.conf.sample into $HOME/etc as ftpsync.conf and edit
+   it to suit your system.
+   You should at the very least change the TO= line.
+ - Set up .ssh/authorized_keys for the mirror user and place the public key of
+   your upstream mirror into it. Preface it with
+no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty,command="~/bin/ftpsync",from="IPADDRESS"
+   and replace IPADDRESS with that of your upstream mirror.
+ - You are finished
 
 
-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
-!!! NOTE: Do NOT store ssh private keys or any kind of passwords in !!!
-!!! here. The existance of etc/secrets is just for the existance of the !!!
-!!! directory. Nothing more. !!!
-!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
 
+In order to receive different pushes or syncs from different archives,
+name the config file ftpsync-$ARCHIVE.conf and call the ftpsync script
+with the commandline "sync:archive:$ARCHIVE". Replace $ARCHIVE with a
+sensible value.
+
+
+
+Debian mirror script minimum requirements
+=========================================
+As always, you may use whatever scripts you want for your Debian mirror.
+However, if you want to be listed as a Primary mirror it must support
+the following minimal functionality:
+
+ - Must perform a 2-stage sync
+   The archive mirroring must be done in 2 stages. The first rsync run
+   must ignore the index files. The correct exclude options for the
+   first rsync run are:
+   --exclude Packages* --exclude Sources* --exclude Release* --exclude ls-lR*
+   The first stage must not delete any files.
+
+   The second stage should then transfer the above excluded files and
+   delete files that no longer belong on the mirror.
+
+   Rationale: If archive mirroring is done in a single stage, there will be
+   periods of time during which the index files will reference files not
+   yet mirrored.
+
+ - Must not ignore pushes whil(e|st) running.
+   If a push is received during a run of the mirror sync, it MUST NOT
+   be ignored. The whole synchronization process must be rerun.
+
+   Rationale: Most implementations of Debian mirror scripts will leave the
+   mirror in an inconsistent state in the event of a second push being
+   received while the first sync is still running. It is likely that in
+   the near future, the frequency of pushes will increase.
+
+ - Should understand multi-stage pushes.
+   The script should parse the arguments it gets via ssh, and if they
+   contain a hint to only sync stage1 or stage2, then ONLY those steps
+   SHOULD be performed.
+
+   Rationale: This enables us to coordinate the timing of the first
+   and second stage pushes and minimize the time during which the
+   archive is desynchronized. This is especially important for mirrors
+   that are involved in a round robin or GeoDNS setup.
+
+   The minimum arguments the script has to understand are:
+   sync:stage1   Only sync stage1
+   sync:stage2   Only sync stage2
+   sync:all      Do everything. Default if none of stage1/2 are
+                 present.
+   There are more possible arguments; for a complete list, see the
+   ftpsync script in our git repository.
+
+
+
+ftpsync
+=======
+
+This script is based on the old anonftpsync script. It has been rewritten
+to add flexibility and fix a number of outstanding issues.
+
+Some of the advantages of the new version are:
+ - Nearly every aspect is configurable
+ - Correct support for multiple pushes
+ - Support for multi-stage archive synchronisations
+ - Support for hook scripts at various points
+ - Support for multiple archives, even if they are pushed using one ssh key
+
+ Correct support for multiple pushes
+ -----------------------------------
+ When the script receives a second push while it is running and syncing
+ the archive it won't ignore it. Instead it will rerun the
+ synchronisation step to ensure the archive is correctly synchronised.
+
+ Scripts that fail to do that risk ending up with an inconsistent archive.
+
+
+ Can do multi-stage archive synchronisations
+ -------------------------------------------
+ The script can be told to only perform the first or second stage of the
+ archive synchronisation.
+
+ This enables us to send all the binary packages and sources to a
+ number of mirrors, and then tell all of them to sync the
+ Packages/Release files at once. This will keep the timeframe in which
+ the mirrors are out of sync very small and will greatly help things like
+ DNS RR entries or even the planned GeoDNS setup.
+
+
+ Can run hook scripts
+ --------------------
+ ftpsync currently allows 5 hook scripts to run at various points of the
+ mirror sync run.
+
+ Hook1: After lock is acquired, before first rsync
+ Hook2: After first rsync, if successful
+ Hook3: After second rsync, if successful
+ Hook4: Right before leaf mirror triggering
+ Hook5: After leaf mirror trigger (only if we have slave mirrors; HUB=true)
+
+ Note that Hook3 and Hook4 are likely to be called directly after each other.
+ The difference is that Hook3 is called *every* time the second rsync
+ succeeds even if the mirroring needs to re-run due to a second push.
+ Hook4 is only executed if mirroring is completed.
+
+
+ Support for multiple archives, even if they are pushed using one ssh key
+ ------------------------------------------------------------------------
+ If you get multiple archives from your upstream mirror (say Debian,
+ Debian-Backports and Volatile), previously you had to use 3 different ssh
+ keys to be able to automagically synchronize them. This script can do it
+ all with just one key, if your upstream mirror tells you which archive.
+ See "Commandline/SSH options" below for further details.
+
+
+For details of all available options, please see the extensive documentation
+in the sample configuration file.
+
+
+Commandline/SSH options
+=======================
+Script options may be set either on the local command line, or passed by
+specifying an ssh "command".
+Local commandline options always have precedence over the
+SSH_ORIGINAL_COMMAND ones.
+
+Currently this script understands the options listed below. To make them
+take effect they MUST be prefixed with "sync:".
+
+Option       Behaviour
+stage1       Only do stage1 sync
+stage2       Only do stage2 sync
+all          Do a complete sync (default)
+archive:foo  Sync archive foo (if the file $HOME/etc/ftpsync-foo.conf
+             exists and is configured)
+callback     Call back when done (needs proper ssh setup for this to
+             work). It will always use the "command" callback:$HOSTNAME
+             where $HOSTNAME is the one defined in config and
+             will happen before slave mirrors are triggered.
+
+So, to get the script to sync all of the archive behind bpo and call back when
+it is complete, use an upstream trigger of
+ssh $USER@$HOST sync:all sync:archive:bpo sync:callback
+
+
+runmirrors
+==========
+This script is used to tell leaf mirrors that it is time to synchronize
+their copy of the archive. This is done by parsing a mirror list and
+using ssh to "push" the leaf nodes. You can read much more about the
+principle behind the push at [1]; essentially it tells the receiving
+end to run a pre-defined script. As the whole setup is extremely limited
+and the ssh key is not usable for anything other than the pre-defined
+script this is the most secure method for such an action.
+
+This script supports two types of pushes: The normal single stage push,
+as well as the newer multi-stage push.
+
+The normal push, as described above, will simply push the leaf node and
+then go on with the other nodes.
+
+The multi-staged push first pushes a mirror and tells it to only do a
+stage1 sync run. Then it waits for the mirror (and all others being pushed
+in the same run) to finish that run, before it tells all of the staged
+mirrors to do the stage2 sync.
+
+This way you can do a nearly-simultaneous update of multiple hosts.
+This is useful in situations where periods of desynchronization should
+be kept as small as possible.
+Examples of scenarios where this might be useful include multiple hosts
+in a DNS Round Robin entry.
+
+For details on the mirror list please see the documented
+runmirrors.mirror.sample file.
+
+
+[1] http://blog.ganneff.de/blog/2007/12/29/ssh-triggers.html
-- 
cgit v1.2.3
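
The option handling described under "Commandline/SSH options" in the README
above can be sketched roughly as follows. This is a minimal illustration and
not code taken from ftpsync: the function name parse_sync_args and its
one-line output format are made up for this example, and it assumes that
unknown words, and words without the "sync:" prefix, are simply ignored.

```shell
#!/bin/bash
# Hypothetical helper (not part of ftpsync): turn "sync:" options into settings.
parse_sync_args() {
    # Defaults: complete sync, default config file, no callback.
    local stage="all" archive="" callback="false" arg opt
    for arg in "$@"; do
        case "$arg" in
            sync:*) opt="${arg#sync:}" ;;  # strip the mandatory "sync:" prefix
            *) continue ;;                 # assumption: unprefixed words have no effect
        esac
        case "$opt" in
            stage1|stage2|all) stage="$opt" ;;        # the last stage option wins
            archive:*) archive="${opt#archive:}" ;;   # e.g. archive:bpo -> bpo
            callback) callback="true" ;;
        esac
    done
    # A named archive selects ftpsync-NAME.conf, otherwise plain ftpsync.conf.
    local conf="ftpsync${archive:+-$archive}.conf"
    echo "stage=$stage conf=$conf callback=$callback"
}

# The upstream trigger from the README example:
parse_sync_args sync:all sync:archive:bpo sync:callback
```

For the trigger shown in the README this prints
"stage=all conf=ftpsync-bpo.conf callback=true". A real script would also have
to check that the selected file exists under $HOME/etc, and decide how local
command line options take precedence over SSH_ORIGINAL_COMMAND; neither is
modelled here.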