Repository Replication

There are several scenarios in which it is quite handy to have a Subversion repository whose version history is exactly the same as some other repository's. Perhaps the most obvious one is the maintenance of a simple backup repository, used when the primary repository has become inaccessible due to a hardware failure, network outage, or other such annoyance. Other scenarios include deploying mirror repositories to distribute heavy Subversion load across multiple servers, use as a soft-upgrade mechanism, and so on.

As of version 1.4, Subversion provides a program for managing scenarios like these—svnsync. svnsync works by essentially asking the Subversion server to “replay” revisions, one at a time. It then uses that revision information to mimic a commit of the same to another repository. Neither repository needs to be locally accessible to machine on which svnsync is running—its parameters are repository URLs, and it does all its work through Subversion's repository access (RA) interfaces. All it requires is read access to the source repository and read/write access to the destination repository.

Note

When using svnsync against a remote source repository, the Subversion server for that repository must be running Subversion version 1.4 or better.

Assuming you already have a source repository that you'd like to mirror, the next thing you need is an empty target repository which will actually serve as that mirror. This target repository can use either of the available filesystem data-store back-ends (see the section called “Choosing a Data Store”), but it must not yet have any version history in it. The protocol via which svnsync communicates revision information is highly sensitive to mismatches between the versioned histories contained in the source and target repositories. For this reason, while svnsync cannot demand that the target repository be read-only, [36] allowing the revision history in the target repository to change by any mechanism other than the mirroring process is a recipe for disaster.

Warning

Do not modify a mirror repository in such a way as to cause its version history to deviate from that of the repository it mirrors. The only commits and revision property modifications that ever occur on that mirror repository should be those performed by the svnsync tool.

Another requirement of the target repository is that the svnsync process be allowed to modify certain revision properties. svnsync stores its bookkeeping information in special revision properties on revision 0 of the destination repository. Because svnsync works within the framework of that repository's hook system, the default state of the repository (which is to disallow revision property changes; see pre-revprop-change) is insufficient. You'll need to explicitly implement the pre-revprop-change hook, and your script must allow svnsync to set and change its special properties. With those provisions in place, you are ready to start mirroring repository revisions.

Tip

It's a good idea to implement authorization measures which allow your repository replication process to perform its tasks while preventing other users from modifying the contents of your mirror repository at all.

Let's walk through the use of svnsync in a somewhat typical mirroring scenario. We'll pepper this discourse with practical recommendations which you are free to disregard if they aren't required by or suitable for your environment.

As a service to the fine developers of our favorite version control system, we will be mirroring the public Subversion source code repository and exposing that mirror publicly on the Internet, hosted on a different machine than the one on which the original Subversion source code repository lives. This remote host has a global configuration which permits anonymous users to read the contents of repositories on the host, but requires users to authenticate in order to modify those repositories. (Please forgive us for glossing over the details of Subversion server configuration for the moment—those are covered thoroughly in Chapter 6, Server Configuration.) And for no other reason than that it makes for a more interesting example, we'll be driving the replication process from a third machine, the one which we currently find ourselves using.

First, we'll create the repository which will be our mirror. This and the next couple of steps do require shell access to the machine on which the mirror repository will live. Once the repository is all configured, though, we shouldn't need to touch it directly again.

$ ssh admin@svn.example.com \
      "svnadmin create /var/svn/svn-mirror"
admin@svn.example.com's password: ********
$

At this point, we have our repository, and due to our server's configuration, that repository is now “live” on the Internet. Now, because we don't want anything modifying the repository except our replication process, we need a way to distinguish that process from other would-be committers. To do so, we use a dedicated username for our process. Only commits and revision property modifications performed by the special username syncuser will be allowed.

We'll use the repository's hook system both to allow the replication process to do what it needs to do, and to enforce that only it is doing those things. We accomplish this by implementing two of the repository event hooks—pre-revprop-change and start-commit. Our pre-revprop-change hook script is found in Example 5.2, “Mirror repository's pre-revprop-change hook script”, and basically verifies that the user attempting the property changes is our syncuser user. If so, the change is allowed; otherwise, it is denied.

Example 5.2. Mirror repository's pre-revprop-change hook script

#!/bin/sh 

USER="$3"

if [ "$USER" = "syncuser" ]; then exit 0; fi

echo "Only the syncuser user may change revision properties" >&2
exit 1

That covers revision property changes. Now we need to ensure that only the syncuser user is permitted to commit new revisions to the repository. We do this using a start-commit hook scripts like the one in Example 5.3, “Mirror repository's start-commit hook script”.

Example 5.3. Mirror repository's start-commit hook script

#!/bin/sh 

USER="$2"

if [ "$USER" = "syncuser" ]; then exit 0; fi

echo "Only the syncuser user may commit new revisions" >&2
exit 1

After installing our hook scripts and ensuring that they are executable by the Subversion server, we're finished with the setup of the mirror repository. Now, we get to actually do the mirroring.

The first thing we need to do with svnsync is to register in our target repository the fact that it will be a mirror of the source repository. We do this using the svnsync initialize subcommand.

$ svnsync help init
initialize (init): usage: svnsync initialize DEST_URL SOURCE_URL

Initialize a destination repository for synchronization from
another repository.
…
$ svnsync initialize http://svn.example.com/svn-mirror \
                     http://svn.collab.net/repos/svn \
                     --sync-username syncuser --sync-password syncpass
Copied properties for revision 0.
$

Our target repository will now remember that it is a mirror of the public Subversion source code repository. Notice that we provided a username and password as arguments to svnsync—that was required by the pre-revprop-change hook on our mirror repository.

Note

The URLs provided to svnsync must point to the root directories of the target and source repositories, respectively. The tool does not handle mirroring of repository subtrees.

Note

In Subversion 1.4, the values given to svnsync's --username and --password command-line options were used for authentication against both the source and destination repositories. This caused problems when a user's credentials weren't exactly the same for both repositories, especially when running in non-interactive mode (with the --non-interactive option) might experience problems.

This has been fixed in Subversion 1.5 with the introduction of two new pairs of options. Use the --source-username and --source-password options to provide authentication credentials for the source repository; use --sync-username and --sync-password to provide credentials for the destination repository. (The old --username and --password options still exist for compatibility, but we advise against using them.)

And now comes the fun part. With a single subcommand, we can tell svnsync to copy all the as-yet-unmirrored revisions from the source repository to the target. [37] The svnsync synchronize subcommand will peek into the special revision properties previously stored on the target repository, and determine what repository it is mirroring and that the most recently mirrored revision was revision 0. Then it will query the source repository and determine what the latest revision in that repository is. Finally, it asks the source repository's server to start replaying all the revisions between 0 and that latest revision. As svnsync get the resulting response from the source repository's server, it begins forwarding those revisions to the target repository's server as new commits.

$ svnsync help synchronize
synchronize (sync): usage: svnsync synchronize DEST_URL

Transfer all pending revisions to the destination from the source
with which it was initialized.
…
$ svnsync synchronize http://svn.example.com/svn-mirror
Transmitting file data ........................................
Committed revision 1.
Copied properties for revision 1.
Transmitting file data ..
Committed revision 2.
Copied properties for revision 2.
Transmitting file data .....
Committed revision 3.
Copied properties for revision 3.
…
Transmitting file data ..
Committed revision 23406.
Copied properties for revision 23406.
Transmitting file data .
Committed revision 23407.
Copied properties for revision 23407.
Transmitting file data ....
Committed revision 23408.
Copied properties for revision 23408.
$

Of particular interest here is that for each mirrored revision, there is first a commit of that revision to the target repository, and then property changes follow. This is because the initial commit is performed by (and attributed to) the user syncuser, and datestamped with the time as of that revision's creation. Also, Subversion's underlying repository access interfaces don't provide a mechanism for setting arbitrary revision properties as part of a commit. So svnsync follows up with an immediate series of property modifications which copy all the revision properties found for that revision in the source repository into the target repository. This also has the effect of fixing the author and datestamp of the revision to match that of the source repository.

Also noteworthy is that svnsync performs careful bookkeeping that allows it to be safely interrupted and restarted without ruining the integrity of the mirrored data. If a network glitch occurs while mirroring a repository, simply repeat the svnsync synchronize command and it will happily pick up right where it left off. In fact, as new revisions appear in the source repository, this is exactly what you to do in order to keep your mirror up-to-date.

There is, however, one bit of inelegance in the process. Because Subversion revision properties can be changed at any time throughout the lifetime of the repository, and don't leave an audit trail that indicates when they were changed, replication processes have to pay special attention to them. If you've already mirrored the first 15 revisions of a repository and someone then changes a revision property on revision 12, svnsync won't know to go back and patch up its copy of revision 12. You'll need to tell it to do so manually by using (or with some additionally tooling around) the svnsync copy-revprops subcommand, which simply re-replicates all the revision properties for a particular revision or range thereof.

$ svnsync help copy-revprops
copy-revprops: usage: svnsync copy-revprops DEST_URL [REV[:REV2]]

Copy the revision properties in a given range of revisions to the
destination from the source with which it was initialized.
…
$ svnsync copy-revprops http://svn.example.com/svn-mirror 12
Copied properties for revision 12.
$

That's repository replication in a nutshell. You'll likely want some automation around such a process. For example, while our example was a pull-and-push setup, you might wish to have your primary repository push changes to one or more blessed mirrors as part of its post-commit and post-revprop-change hook implementations. This would enable the mirror to be up-to-date in as near to realtime as is likely possible.

Also, while it isn't very commonplace to do so, svnsync does gracefully mirror repositories in which the user as whom it authenticates only has partial read access. It simply copies only the bits of the repository that it is permitted to see. Obviously such a mirror is not useful as a backup solution.

As far as user interaction with repositories and mirrors goes, it is possible to have a single working copy that interacts with both, but you'll have to jump through some hoops to make it happen. First, you need to ensure that both the primary and mirror repositories have the same repository UUID (which is not the case by default). See the section called “Managing Repository UUIDs” for more about managing repository UUIDs.

Once the two repositories have the same UUID, you can use svn switch --relocate to point your working copy to whichever of the repositories you wish to operate against, a process which is described in svn switch. There is a possible danger here, though, in that if the primary and mirror repositories aren't in close synchronization, a working copy up-to-date with, and pointing to, the primary repository will, if relocated to point to an out-of-date mirror, become confused about the apparent sudden loss of revisions it fully expects to be present, and throws errors to that effect. If this occurs, you can relocate your working copy back to the primary repository and then either wait until the mirror repository is up-to-date, or backdate your working copy to a revision you know is present in the sync repository and then retry the relocation.

Finally, be aware that the revision-based replication provided by svnsync is only that—replication of revisions. Only information carried by the Subversion repository dump file format is available for replication. As such, svnsync has the same sorts of limitations that the repository dump stream has, and does not include such things as the hook implementations, repository or server configuration data, uncommitted transactions, or information about user locks on repository paths.



[36] In fact, it can't truly be read-only, or svnsync itself would have a tough time copying revision history into it.

[37] Be forewarned that while it will take only a few seconds for the average reader to parse this paragraph and the sample output which follows it, the actual time required to complete such a mirroring operation is, shall we say, quite a bit longer.