Subversion's use of locales

The Subversion client, svn, honors the current locale configuration in two ways. First, it notices the value of the LC_MESSAGES variable and attempts to print all messages in the specified language. For example:

$ export LC_MESSAGES=de_DE
$ svn help cat
cat: Gibt den Inhalt der angegebenen Dateien oder URLs aus.
Aufruf: cat ZIEL[@REV]...
…

This behavior works identically on both Unix and Windows systems. Note, though, that while your operating system might have support for a certain locale, the Subversion client still may not be able to speak the particular language. In order to produce localized messages, human volunteers must provide translations for each language. The translations are written using the GNU gettext package, which results in translation modules that end with the .mo filename extension. For example, the German translation file is named de.mo. These translation files are installed somewhere on your system. On Unix, they typically live in /usr/share/locale/, while on Windows they're often found in the \share\locale\ folder in Subversion's installation area. Once installed, a module is named after the program it provides translations for. For example, the de.mo file may ultimately end up installed as /usr/share/locale/de/LC_MESSAGES/subversion.mo. By browsing the installed .mo files, you can see which languages the Subversion client is able to speak.

The second way in which the locale is honored involves how svn interprets your input. The repository stores all paths, filenames, and log messages in Unicode, encoded as UTF-8. In that sense, the repository is internationalized—that is, the repository is ready to accept input in any human language. This means, however, that the Subversion client is responsible for sending only UTF-8 filenames and log messages into the repository. In order to do this, it must convert the data from the native locale into UTF-8.

For example, suppose you create a file named caffè.txt, and then when committing the file, you write the log message as “Adesso il caffè è più forte”. Both the filename and log message contain non-ASCII characters, but because your locale is set to it_IT, the Subversion client knows to interpret them as Italian. It uses an Italian character set to convert the data to UTF-8 before sending them off to the repository.

Note that while the repository demands UTF-8 filenames and log messages, it does not pay attention to file contents. Subversion treats file contents as opaque strings of bytes, and neither client nor server makes an attempt to understand the character set or encoding of the contents.