Purging unused Berkeley DB logfiles

Until recently, the largest offender of disk space usage with respect to BDB-backed Subversion repositories was the log files in which Berkeley DB performs its pre-writes before modifying the actual database files. These files capture all the actions taken along the route of changing the database from one state to another—while the database files, at any given time, reflect a particular state, the log files contain all the many changes along the way between states. Thus, they can grow and accumulate quite rapidly.

Fortunately, beginning with the 4.2 release of Berkeley DB, the database environment has the ability to remove its own unused log files automatically. Any repositories created using an svnadmin which is compiled against Berkeley DB version 4.2 or greater will be configured for this automatic log file removal. If you don't want this feature enabled, simply pass the --bdb-log-keep option to the svnadmin create command. If you forget to do this, or change your mind at a later time, simply edit the DB_CONFIG file found in your repository's db directory, comment out the line which contains the set_flags DB_LOG_AUTOREMOVE directive, and then run svnadmin recover on your repository to force the configuration changes to take effect. See the section called “Berkeley DB Configuration” for more information about database configuration.

Without some sort of automatic log file removal in place, log files will accumulate as you use your repository. This is actually somewhat of a feature of the database system—you should be able to recreate your entire database using nothing but the log files, so these files can be useful for catastrophic database recovery. But typically, you'll want to archive the log files that are no longer in use by Berkeley DB, and then remove them from disk to conserve space. Use the svnadmin list-unused-dblogs command to list the unused log files:

$ svnadmin list-unused-dblogs /var/svn/repos
$ rm `svnadmin list-unused-dblogs /var/svn/repos`
## disk space reclaimed!


BDB-backed repositories whose log files are used as part of a backup or disaster recovery plan should not make use of the log file autoremoval feature. Reconstruction of a repository's data from log files can only be accomplished when all the log files are available. If some of the log files are removed from disk before the backup system has a chance to copy them elsewhere, the incomplete set of backed-up log files is essentially useless.