How Subversion saves disk space

To keep the repository small, Subversion uses deltification (or, “deltified storage”) within the repository itself. Deltification involves encoding the representation of a chunk of data as a collection of differences against some other chunk of data. If the two pieces of data are very similar, this deltification results in storage savings for the deltified chunk—rather than taking up space equal to the size of the original data, it takes up only enough space to say, “I look just like this other piece of data over here, except for the following couple of changes”. The result is that most of the repository data that tends to be bulky—namely, the contents of versioned files—is stored at a much smaller size than the original “fulltext” representation of that data. And for repositories created with Subversion 1.4 or later, the space savings are even better—now those fulltext representations of file contents are themselves compressed.


Because all of the data that is subject to deltification in a BDB-backed repository is stored in a single Berkeley DB database file, reducing the size of the stored values will not immediately reduce the size of the database file itself. Berkeley DB will, however, keep internal records of unused areas of the database file, and consume those areas first before growing the size of the database file. So while deltification doesn't produce immediate space savings, it can drastically slow future growth of the database.