This section is from the "Version Control with Subversion" book, by Ben Collins-Sussman, Brian W. Fitzpatrick and C. Michael Pilato.
As of version 1.1, Subversion provides two options for the type of underlying data store that each repository uses. This data store is often referred to as “the back-end” or, somewhat confusingly, “the (versioned) filesystem”. One type of data store keeps everything in a Berkeley DB (or BDB) database environment; repositories that use this type are often referred to as being “BDB-backed”. The other type stores data in ordinary flat files, using a custom format. Subversion developers have adopted the habit of referring to this latter data storage mechanism as FSFS[28]: a versioned filesystem implementation that stores its data directly in the native OS filesystem, rather than via a database library or some other abstraction layer.
Table 5.1, “Repository Data Store Comparison”, gives a comparative overview of Berkeley DB and FSFS repositories.
Table 5.1. Repository Data Store Comparison
Category | Feature | Berkeley DB | FSFS |
---|---|---|---|
Reliability | Data integrity | when properly deployed, extremely reliable; Berkeley DB 4.4 brings auto-recovery | older versions had some rarely demonstrated, but data-destroying bugs |
Reliability | Sensitivity to interruptions | very; crashes and permission problems can leave the database “wedged”, requiring journaled recovery procedures | quite insensitive |
Accessibility | Usable from a read-only mount | no | yes |
Accessibility | Platform-independent storage | no | yes |
Accessibility | Usable over network filesystems | generally, no | yes |
Accessibility | Group permissions handling | sensitive to user umask problems; best if accessed by only one user | works around umask problems |
Scalability | Repository disk usage | larger (especially if logfiles aren't purged) | smaller |
Scalability | Number of revision trees | database; no problems | some older native filesystems don't scale well with thousands of entries in a single directory |
Scalability | Directories with many files | slower | faster |
Performance | Checking out latest revision | no meaningful difference | no meaningful difference |
Performance | Large commits | slower overall, but cost is amortized across the lifetime of the commit | faster overall, but finalization delay may cause client timeouts |
There are advantages and disadvantages to each of these two back-end types. Neither of them is more “official” than the other, though the newer FSFS is the default data store as of Subversion 1.2. Both are reliable enough to trust with your versioned data. But as you can see in Table 5.1, “Repository Data Store Comparison”, the FSFS back-end provides quite a bit more flexibility in terms of its supported deployment scenarios. More flexibility means you have to work a little harder to find ways to deploy it incorrectly. Those reasons, plus the fact that not using Berkeley DB means there's one fewer component in the system, largely explain why today almost everyone uses the FSFS back-end when creating new repositories.
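In practice, the back-end is chosen when a repository is created, through the `--fs-type` option of `svnadmin create`. The following Python sketch only assembles and runs that command line. The helper names `create_repository_command` and `create_repository` are hypothetical, chosen for illustration, and the sketch assumes that `svnadmin` is on the `PATH` and was built with support for the requested back-end.

```python
import subprocess

# The two back-end names accepted by `svnadmin create --fs-type`
# that this section discusses.
VALID_FS_TYPES = ("fsfs", "bdb")


def create_repository_command(repo_path, fs_type="fsfs"):
    """Return the argument list that creates a repository with the given back-end.

    Hypothetical helper: it builds the command but does not run it.
    """
    if fs_type not in VALID_FS_TYPES:
        raise ValueError(f"unknown back-end type: {fs_type!r}")
    return ["svnadmin", "create", "--fs-type", fs_type, repo_path]


def create_repository(repo_path, fs_type="fsfs"):
    """Run `svnadmin create`; raises CalledProcessError if svnadmin fails."""
    subprocess.run(create_repository_command(repo_path, fs_type), check=True)
```

For example, `create_repository_command("/var/svn/repos")` yields the arguments for `svnadmin create --fs-type fsfs /var/svn/repos`, where the path is only a placeholder.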
Fortunately, most programs which access Subversion repositories are blissfully ignorant of which back-end data store is in use. And you aren't even necessarily stuck with your first choice of a data store—in the event that you change your mind later, Subversion provides ways of migrating your repository's data into another repository that uses a different back-end data store. We talk more about that later in this chapter.
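As a rough sketch of what such a migration can look like, the Python code below creates a new repository with the desired back-end and pipes the output of `svnadmin dump` into `svnadmin load`. It also shows one way to check which back-end an existing repository uses, on the assumption that the repository records the back-end name in its `db/fs-type` file. The function names `detect_fs_type` and `migrate` are hypothetical, and the code assumes `svnadmin` is on the `PATH`; treat it as an illustration rather than a complete migration procedure.

```python
import subprocess
from pathlib import Path


def detect_fs_type(repo_path):
    """Return the back-end name recorded in db/fs-type, or None if absent.

    Assumption: the repository stores "fsfs" or "bdb" in that file.
    """
    marker = Path(repo_path) / "db" / "fs-type"
    if not marker.is_file():
        return None
    return marker.read_text().strip()


def migrate(src_repo, dst_repo, dst_fs_type):
    """Copy all revisions from src_repo into a new repository at dst_repo."""
    # Create the destination with the requested back-end.
    subprocess.run(
        ["svnadmin", "create", "--fs-type", dst_fs_type, dst_repo],
        check=True,
    )
    # Stream the dump directly into the load, without a temporary file.
    dump = subprocess.Popen(
        ["svnadmin", "dump", "--quiet", src_repo],
        stdout=subprocess.PIPE,
    )
    load = subprocess.Popen(
        ["svnadmin", "load", "--quiet", dst_repo],
        stdin=dump.stdout,
    )
    # Close our copy of the pipe so dump sees a broken pipe if load exits early.
    dump.stdout.close()
    load_status = load.wait()
    dump_status = dump.wait()
    if dump_status != 0 or load_status != 0:
        raise RuntimeError(
            f"migration failed (dump={dump_status}, load={load_status})"
        )
```

A dump and load of this kind carries over the versioned data only; hook scripts and configuration files in the old repository would still need to be copied by hand.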
The following subsections provide a more detailed look at the available back-end data store types.
[28] Often pronounced “fuzz-fuzz”, if Jack Repenning has anything to say about it. (This book, however, assumes that the reader is thinking “eff-ess-eff-ess”.)