Chapter 22: Localization

Chapter 22 deals with the C++ localization facilities.

class locale
class codecvt
class ctype
class messages
Bjarne Stroustrup on Locales
Nathan Myers on Locales
Correct Transformations

class locale

Notes made during the implementation of locales can be found here.

class codecvt

Notes made during the implementation of codecvt can be found here.

The following is the abstract from the implementation notes:

The standard class codecvt attempts to address conversions between different character encoding schemes. In particular, the standard attempts to detail conversions between the implementation-defined wide characters (hereafter referred to as wchar_t) and the standard type char that is so beloved in classic "C" (which can now be referred to as narrow characters.) This document attempts to describe how the GNU libstdc++-v3 implementation deals with the conversion between wide and narrow characters, and also presents a framework for dealing with the huge number of other encodings that iconv can convert, including Unicode and UTF8. Design issues and requirements are addressed, and examples of correct usage for both the required specializations for wide and narrow characters and the implementation-provided extended functionality are given.
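For the required wchar_t/char specialization mentioned in the abstract, a minimal sketch of a narrow-to-wide conversion might look like the following. The helper name widen is hypothetical and is not taken from the implementation notes; the sketch also assumes the whole input can be converted in one call and treats any result other than ok as a failure.

```cpp
#include <cwchar>      // std::mbstate_t
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the library): convert a narrow string
// to a wide string using the codecvt facet of the given locale.
inline std::wstring widen(std::string const& narrow, std::locale const& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> Cvt;
    Cvt const& cvt = std::use_facet<Cvt>(loc);

    if (narrow.empty())
        return std::wstring();

    // Each wide character consumes at least one narrow byte, so a buffer
    // with one wide character per input byte is always large enough.
    std::wstring wide(narrow.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    char const* from_next = 0;
    wchar_t*    to_next   = 0;

    std::codecvt_base::result r =
        cvt.in(state,
               narrow.data(), narrow.data() + narrow.size(), from_next,
               &wide[0], &wide[0] + wide.size(), to_next);

    // Assumption for this sketch: anything other than "ok" is an error.
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("narrow-to-wide conversion failed");

    // Trim the buffer to the number of wide characters actually written.
    wide.resize(to_next - &wide[0]);
    return wide;
}
```

With the classic locale, a call such as widen("abc", std::locale::classic()) should yield the wide string L"abc"; other locales depend on what the platform provides.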

class ctype

Notes made during the implementation of ctype can be found here.

class messages

Notes made during the implementation of messages can be found here.

Bjarne Stroustrup on Locales

Dr. Bjarne Stroustrup has released a pointer to Appendix D of his book, The C++ Programming Language (3rd Edition). It is a detailed description of locales and how to use them.

He also writes:

Please note that I still consider this detailed description of locales beyond the needs of most C++ programmers. It is written with experienced programmers in mind and novices will do best to avoid it.

Nathan Myers on Locales

An article entitled "The Standard C++ Locale" was published in Dr. Dobb's Journal and can be found here.

Correct Transformations

A very common question on newsgroups and mailing lists is, "How do I do <foo> to a character string?" where <foo> is a task such as changing all the letters to uppercase, to lowercase, testing for digits, etc. A skilled and conscientious programmer will follow the question with another, "And how do I make the code portable?"

(Poor innocent programmer, you have no idea the depths of trouble you are getting yourself into. 'Twould be best for your sanity if you dropped the whole idea and took up basket weaving instead. No? Fine, you asked for it...)

Changing the case of a letter, or classifying a character as numeric, graphical, etc., depends on the cultural context of the program at runtime. So the portability question must be settled first: only once the program has been localized to a particular natural language can the specific task be performed. Unfortunately, specializing a function for a human language is not as simple as declaring extern "Danish" int tolower (int);.

The C++ code to do all this proceeds in the same way. First, a locale is created. Then facets of that locale are used to perform the individual tasks. Continuing the example from Chapter 21, we wish to use the following convenience functions:

   namespace std {
     // Non-member functions, so they carry no const qualifier.
     template <class charT>
       charT
       toupper (charT c, const locale& loc);
     template <class charT>
       charT
       tolower (charT c, const locale& loc);
   }

Each of these functions extracts the appropriate "facet" (here, std::ctype<charT>) from the locale loc and calls the corresponding member function of that facet, passing c as its argument. The resulting character is returned.
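As a rough sketch of that facet lookup, the convenience functions can be thought of as one-line wrappers around std::use_facet. The names my_toupper and my_tolower are hypothetical stand-ins chosen for illustration, not the library's actual implementation:

```cpp
#include <locale>

// Hypothetical stand-in for the two-argument std::toupper, written out
// to show the facet lookup explicitly.
template <class charT>
charT my_toupper(charT c, const std::locale& loc)
{
    // Fetch the character-classification facet for this character type...
    const std::ctype<charT>& ct = std::use_facet<std::ctype<charT> >(loc);
    // ...and let the facet perform the conversion.
    return ct.toupper(c);
}

// Hypothetical stand-in for the two-argument std::tolower.
template <class charT>
charT my_tolower(charT c, const std::locale& loc)
{
    return std::use_facet<std::ctype<charT> >(loc).tolower(c);
}
```

Because the facet is selected by character type, the same template serves char and wchar_t alike.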

For the C/POSIX locale, the results are the same as calling the classic C toupper/tolower functions that were used in previous examples. For other locales, the code should Do The Right Thing.

Of course, these functions take a second argument, whereas std::transform expects an operation that takes a single parameter. So we write simple wrapper structs to handle that.

The next-to-final version of the code started in Chapter 21 looks like:

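   #include <iterator>    // for back_inserter
   #include <locale>      // for locale and the two-argument toupper/tolower
   #include <string>
   #include <algorithm>   // for transform

   // Wraps the two-argument std::toupper so it can be used as a
   // one-argument operation.
   struct ToUpper
   {
       ToUpper(std::locale const& l) : loc(l) {}
       char operator() (char c) const  { return std::toupper(c,loc); }
   private:
       // Held by value so that a temporary locale cannot dangle;
       // copying a locale is cheap.
       std::locale loc;
   };
   
   // Same wrapper for the two-argument std::tolower.
   struct ToLower
   {
       ToLower(std::locale const& l) : loc(l) {}
       char operator() (char c) const  { return std::tolower(c,loc); }
   private:
       std::locale loc;
   };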
   
   int main ()
   {
      std::string  s("Some Kind Of Initial Input Goes Here");
      ToUpper      up(std::locale::classic());
      ToLower      down(std::locale::classic());
   
      // Change everything into upper case.
      std::transform(s.begin(), s.end(), s.begin(), up);
   
      // Change everything into lower case.
      std::transform(s.begin(), s.end(), s.begin(), down);
   
      // Change everything back into upper case, but store the
      // result in a different string.
      std::string  capital_s;
      std::transform(s.begin(), s.end(), std::back_inserter(capital_s), up);
   }

The ToUpper and ToLower structs can be generalized for other character types by making operator() a member function template.
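A minimal sketch of that generalization follows. The names ToUpperAny and uppercase_copy are hypothetical and introduced only for illustration; the struct holds its locale by value, on the assumption that copies are cheap enough not to matter here.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Sketch: the same idea as ToUpper, but operator() is a member function
// template, so one object works for char, wchar_t, and so on.
struct ToUpperAny
{
    explicit ToUpperAny(std::locale const& l) : loc(l) {}

    template <class charT>
    charT operator() (charT c) const { return std::toupper(c, loc); }

private:
    std::locale loc;   // held by value
};

// Hypothetical helper: return an uppercased copy of any basic_string.
template <class String>
String uppercase_copy(String s, std::locale const& loc)
{
    std::transform(s.begin(), s.end(), s.begin(), ToUpperAny(loc));
    return s;
}
```

The character type is deduced at each call, so the same functor can be passed to std::transform over a std::string or a std::wstring.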

The final version of the code uses bind2nd to eliminate the wrapper structs, but the resulting code is tricky. I have not shown it here because no compilers currently available to me will handle it.
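For readers with a C++11 or later compiler, a lambda achieves the same effect without wrapper structs. This is not the bind2nd version referred to above (std::bind2nd was deprecated in C++11 and removed in C++17), merely a sketch of an alternative; the helper name upper_in_place is hypothetical.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical helper: uppercase a string in place using the given locale.
// Requires C++11 or later for the lambda.
inline void upper_in_place(std::string& s, std::locale const& loc)
{
    // The lambda captures the locale by reference and supplies it as the
    // second argument, leaving a one-parameter operation for transform.
    std::transform(s.begin(), s.end(), s.begin(),
                   [&loc](char c) { return std::toupper(c, loc); });
}
```

The lambda plays the role that the second-argument binder would have played: it fixes the locale and exposes a single-parameter call.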

See license.html for copying conditions. Comments and suggestions are welcome, and may be sent to the libstdc++ mailing list.
