Choosing a Data Store

As of version 1.1, Subversion provides two options for the type of underlying data store that each repository uses. This data store is often referred to as “the back-end” or, somewhat confusingly, “the (versioned) filesystem”. One type of data store keeps everything in a Berkeley DB (or BDB) database environment; repositories that use this type are often referred to as being “BDB-backed”. The other type stores data in ordinary flat files, using a custom format. Subversion developers have adopted the habit of referring to this latter data storage mechanism as FSFS[28]: a versioned filesystem implementation that stores its data directly in the native OS filesystem rather than through a database library or some other abstraction layer.

Table 5.1, “Repository Data Store Comparison” gives a comparative overview of Berkeley DB and FSFS repositories.

Table 5.1. Repository Data Store Comparison

| Category | Feature | Berkeley DB | FSFS |
|---|---|---|---|
| Reliability | Data integrity | when properly deployed, extremely reliable; Berkeley DB 4.4 brings auto-recovery | older versions had some rarely demonstrated, but data-destroying bugs |
| | Sensitivity to interruptions | very; crashes and permission problems can leave the database “wedged”, requiring journaled recovery procedures | quite insensitive |
| Accessibility | Usable from a read-only mount | no | yes |
| | Platform-independent storage | no | yes |
| | Usable over network filesystems | generally, no | yes |
| | Group permissions handling | sensitive to user umask problems; best if accessed by only one user | works around umask problems |
| Scalability | Repository disk usage | larger (especially if logfiles aren't purged) | smaller |
| | Number of revision trees | database; no problems | some older native filesystems don't scale well with thousands of entries in a single directory |
| | Directories with many files | slower | faster |
| Performance | Checking out latest revision | no meaningful difference | no meaningful difference |
| | Large commits | slower overall, but cost is amortized across the lifetime of the commit | faster overall, but finalization delay may cause client timeouts |

There are advantages and disadvantages to each of these two back-end types. Neither of them is more “official” than the other, though the newer FSFS is the default data store as of Subversion 1.2. Both are reliable enough to trust with your versioned data. But as you can see in Table 5.1, “Repository Data Store Comparison”, the FSFS back-end provides quite a bit more flexibility in terms of its supported deployment scenarios. More flexibility means you have to work a little harder to find ways to deploy it incorrectly. Those reasons, plus the fact that not using Berkeley DB means there's one fewer component in the system, largely explain why today almost everyone uses the FSFS back-end when creating new repositories.
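As a rough illustration of how the “Accessibility” rows of Table 5.1 might feed into this choice, the following Python sketch lists the reasons a planned deployment would argue against Berkeley DB. The `Deployment` class, its field names, and the `bdb_concerns` function are hypothetical, invented for this example; they are not part of Subversion.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Deployment:
    """Hypothetical description of where a repository will live."""
    read_only_mount: bool = False          # served from read-only media?
    network_filesystem: bool = False       # stored on a network filesystem?
    multiple_os_users: bool = False        # accessed by several OS accounts?
    moved_between_platforms: bool = False  # copied across OSes/architectures?


def bdb_concerns(d: Deployment) -> List[str]:
    """Return the Table 5.1 entries that argue against Berkeley DB here."""
    concerns = []
    if d.read_only_mount:
        concerns.append("not usable from a read-only mount")
    if d.network_filesystem:
        concerns.append("generally not usable over network filesystems")
    if d.multiple_os_users:
        concerns.append("sensitive to umask problems; best with one user")
    if d.moved_between_platforms:
        concerns.append("storage is not platform-independent")
    return concerns


if __name__ == "__main__":
    plan = Deployment(network_filesystem=True, multiple_os_users=True)
    for reason in bdb_concerns(plan):
        print("Berkeley DB:", reason)
```

An empty result only means that none of these particular rows rules out Berkeley DB; the reliability, scalability, and performance rows of the table still apply.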

Fortunately, most programs which access Subversion repositories are blissfully ignorant of which back-end data store is in use. And you aren't even necessarily stuck with your first choice of a data store—in the event that you change your mind later, Subversion provides ways of migrating your repository's data into another repository that uses a different back-end data store. We talk more about that later in this chapter.
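For the occasions when you do need to know which back-end a repository uses, or want to plan a move to the other one, here is a minimal Python sketch. It assumes that the repository records its back-end in a small text file named `db/fs-type`, and that a repository without that file predates Subversion 1.1 and is therefore BDB-backed. The helper names `detect_backend` and `migration_commands` are hypothetical; the `svnadmin create --fs-type`, `svnadmin dump`, and `svnadmin load` commands they refer to are the standard administrative tools.

```python
from pathlib import Path
from typing import List


def detect_backend(repo_path: str) -> str:
    """Return the back-end name recorded in <repo>/db/fs-type."""
    fs_type_file = Path(repo_path) / "db" / "fs-type"
    if not fs_type_file.is_file():
        # Assumption: no fs-type file means a pre-1.1 repository,
        # which can only be Berkeley DB.
        return "bdb"
    return fs_type_file.read_text().strip()


def migration_commands(old_repo: str, new_repo: str,
                       fs_type: str = "fsfs") -> List[List[str]]:
    """Build (but do not run) the commands for a dump-and-load migration."""
    return [
        # Create an empty repository with the desired back-end.
        ["svnadmin", "create", "--fs-type", fs_type, new_repo],
        # Dump the old repository; its output is meant to be piped
        # into the load command that follows.
        ["svnadmin", "dump", old_repo],
        ["svnadmin", "load", new_repo],
    ]


if __name__ == "__main__":
    for argv in migration_commands("/path/to/old-repos", "/path/to/new-repos"):
        print(" ".join(argv))
```

The sketch only builds the argument lists so that you can inspect them first; running them, and piping the dump output into the load command, is left to the administrator.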

The following subsections provide a more detailed look at the available back-end data store types.

[28] Often pronounced “fuzz-fuzz”, if Jack Repenning has anything to say about it. (This book, however, assumes that the reader is thinking “eff-ess-eff-ess”.)