Library Overview

Options Description Component
Parsers Component
Storage Component
Annotated List of Symbols

In the tutorial section, we saw several examples of library usage. Here we will describe the overall library design including the primary components and their function.

The library has three main components:

The options description component, which describes the allowed options and what to do with the values of the options.
The parsers component, which uses this information to find option names and values in the input sources and return them.
The storage component, which provides the interface to access the value of an option. It also converts the string representation of values that parsers return into desired C++ types.

To be a little more concrete, the options_description class is from the options description component, the parse_command_line function is from the parsers component, and the variables_map class is from the storage component.

In the tutorial we've learned how those components can be used by the main function to parse the command line and config file. Before going into the details of each component, a few notes about the world outside of main.

For that outside world, the storage component is the most important. It provides a class which stores all option values, and that class can be freely passed around your program to modules which need access to the options. All the other components need to be used only in the place where the actual parsing is done. However, it might also make sense for the individual program modules to describe their options and pass them to the main module, which will merge all options. Of course, this is only important when the number of options is large and declaring them in one place becomes troublesome.

Options Description Component

Syntactic information
Semantic information
Positional options

The options description component has three main classes: option_description, value_semantic and options_description. The first two together describe a single option. The option_description class contains the option's name, description and a pointer to value_semantic, which, in turn, knows the type of the option's value and can parse the value, apply the default value, and so on. The options_description class is a container for instances of option_description.

For almost every library, those classes could be created in a conventional way: that is, you'd create new options using constructors and then call the add method of options_description. However, that's overly verbose for declaring 20 or 30 options. This concern led to the creation of the syntax that you've already seen:

options_description desc;
desc.add_options()
    ("help", "produce help")
    ("optimization", value<int>()->default_value(10), "optimization level")
    ;

The call to the value function creates an instance of a class derived from the value_semantic class: typed_value. That class contains the code to parse values of a specific type, and contains a number of methods which can be called by the user to specify additional information. (This essentially emulates named parameters of the constructor.) Calls to operator() on the object returned by add_options forward arguments to the constructor of the option_description class and add the new instance.

Note that in addition to value, the library provides the bool_switch function, and users can write their own functions which return other subclasses of value_semantic with different behaviour. For the remainder of this section, we'll talk only about the value function.

The information about an option is divided into syntactic and semantic. Syntactic information includes the name of the option and the number of tokens which can be used to specify the value. This information is used by parsers to group tokens into (name, value) pairs, where value is just a vector of strings (std::vector<std::string>). The semantic layer is responsible for converting the value of the option into more usable C++ types.

This separation is an important part of library design. The parsers use only the syntactic layer, which takes away some of the freedom to use overly complex structures. For example, it's not easy to parse syntax like:

calc --expression=1 + 2/3

because it's not possible to parse

1 + 2/3

without knowing that it's a C expression. With a little help from the user the task becomes trivial, and the syntax clear:

calc --expression="1 + 2/3"
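
The split between the two layers can be pictured with a small standalone sketch. It does not use the library: raw_option, group_tokens and to_int are hypothetical names, and only the "--name=value" and "--name value ..." forms are handled. The syntactic step groups tokens into (name, value) pairs without knowing any types, and the semantic step converts the strings afterwards.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// One parsed option: its name plus the raw tokens that make up its value.
using raw_option = std::pair<std::string, std::vector<std::string>>;

// Syntactic layer (hypothetical): groups tokens into (name, value) pairs.
// It never interprets the value tokens; they stay plain strings.
std::vector<raw_option> group_tokens(const std::vector<std::string>& args) {
    std::vector<raw_option> result;
    for (const std::string& arg : args) {
        if (arg.compare(0, 2, "--") == 0) {
            const std::string body = arg.substr(2);
            const std::string::size_type eq = body.find('=');
            if (eq == std::string::npos) {
                // "--name": the value tokens, if any, follow separately.
                result.emplace_back(body, std::vector<std::string>{});
            } else {
                // "--name=value": everything after '=' is one value token.
                result.emplace_back(body.substr(0, eq),
                                    std::vector<std::string>{body.substr(eq + 1)});
            }
        } else if (!result.empty()) {
            // A token without "--" is attached to the preceding option.
            result.back().second.push_back(arg);
        }
    }
    return result;
}

// Semantic layer (hypothetical): converts the first raw token to an int.
int to_int(const raw_option& opt) {
    return std::stoi(opt.second.at(0));
}
```

With the quoted form, the shell hands over "--expression=1 + 2/3" as a single token, so group_tokens yields one value string and leaves its interpretation to the semantic step.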

Syntactic information

The syntactic information is provided by the boost::program_options::options_description class and some methods of the boost::program_options::value_semantic class. The simplest usage is illustrated below:

          options_description desc;
          desc.add_options()
          ("help", "produce help message")
          ;

This declares one option named "help" and associates a description with it. The user is not allowed to specify any value.

To make an option accept a value, you'd need the value function mentioned above:

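          options_description desc;
          desc.add_options()
          // exactly one value token is required
          ("compression", value<string>(), "compression level")
          // the value token may be omitted; "0" is then used
          ("verbose", value<string>()->implicit_value("0"), "verbosity level")
          // the value may span several tokens, so it is stored in a vector
          ("email", value< vector<string> >()->multitoken(), "email to send to");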

With these declarations, the user must specify a value for the first option, using a single token. For the second option, the user may either provide a single token for the value, or no token at all. For the last option, the value can span several tokens. For example, the following command line is OK:

          test --compression 10 --verbose --email beadle@mars beadle2@mars

Semantic information

The semantic information is completely provided by the boost::program_options::value_semantic class. For example:

options_description desc;
desc.add_options()
    ("compression", value<int>()->default_value(10), "compression level")
    ("email", value< vector<string> >()
        ->composing()->notifier(&your_function), "email")
    ;

These declarations specify that the default value of the first option is 10, that the second option can appear several times and all instances should be merged, and that after parsing is done, the library will call your_function, passing the value of the "email" option as its argument.

Positional options

Our definition of an option as a (name, value) pair is simple and useful, but in one special case of the command line, there's a problem. A command line can include a positional option, which does not specify any name at all, for example:

          archiver --compression=9 /etc/passwd

Here, the "/etc/passwd" element does not have any option name.

One solution is to ask users to extract positional options themselves and process them as they like. However, there's a nicer approach -- provide a method to automatically assign the names for positional options, so that the above command line can be interpreted the same way as:

          archiver --compression=9 --input-file=/etc/passwd

The positional_options_description class allows the command line parser to assign the names. The class specifies how many positional options are allowed, and for each allowed option, specifies the name. For example:

          positional_options_description pd;
          pd.add("input-file", 1);

specifies that exactly one positional option, the first, will be given the name "input-file".

It's possible to specify that a number of positional options, or even all of them, be given the same name:

          positional_options_description pd;
          pd.add("output-file", 2).add("input-file", -1);

In the above example, the first two positional options will be associated with name "output-file", and any others with the name "input-file".
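
The naming rule can be modelled with a few lines of standalone code. The positional_names class below is a hypothetical stand-in written for this illustration, not the library's class; it assumes that each entry covers a fixed number of consecutive positions and that a count of -1 covers all remaining positions.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the naming rules; not the library's class.
class positional_names {
public:
    // Assign `name` to the next `max_count` positions; -1 means "all the rest".
    positional_names& add(const std::string& name, int max_count) {
        entries_.emplace_back(name, max_count);
        return *this;
    }

    // Name for the zero-based positional index, or "" if no name is assigned.
    std::string name_for_position(unsigned position) const {
        unsigned first = 0;  // index of the first position the entry covers
        for (const auto& entry : entries_) {
            if (entry.second < 0) return entry.first;
            const unsigned count = static_cast<unsigned>(entry.second);
            if (position < first + count) return entry.first;
            first += count;
        }
        return "";
    }

private:
    std::vector<std::pair<std::string, int>> entries_;
};
```

In this model, a parser that meets the third unnamed token would ask for name_for_position(2) and then treat the token as the value of the option with that name.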

Parsers Component

The parsers component splits input sources into (name, value) pairs. Each parser looks for possible options and consults the options description component to determine if the option is known and how its value is specified. In the simplest case, the name is explicitly specified, which allows the library to decide if such an option is known. If it is known, the value_semantic instance determines how the value is specified. (If it is not known, an exception is thrown.) Common cases are when the value is explicitly specified by the user, and when the value cannot be specified by the user, but the presence of the option implies some value (for example, true). So, the parser checks that the value is specified when needed and not specified when not needed, and returns a new (name, value) pair.

To invoke a parser you typically call a function, passing the options description and command line or config file or something else. The results of parsing are returned as an instance of the parsed_options class. Typically, that object is passed directly to the storage component. However, it also can be used directly, or undergo some additional processing.

There are three exceptions to the above model -- all related to traditional usage of the command line. While they require some support from the options description component, the additional complexity is tolerable.

The name specified on the command line may be different from the option name -- it's common to provide a "short option name" alias to a longer name. It's also common to allow an abbreviated name to be specified on the command line.
Sometimes it's desirable to specify a value as several tokens. For example, an option "--email-recipient" may be followed by several emails, each as a separate command line token. This behaviour is supported, though it can lead to parsing ambiguities and is not enabled by default.
The command line may contain positional options -- elements which don't have any name. The command line parser provides a mechanism to guess names for such options, as we've seen in the tutorial.

Storage Component

The storage component is responsible for:

Storing the final values of options in a special class and in regular variables.
Handling priorities among different sources.
Calling user-specified notify functions with the final values of options.

Let's consider an example:

        variables_map vm;
        store(parse_command_line(argc, argv, desc), vm);
        store(parse_config_file("example.cfg", desc), vm);
        notify(vm);

The variables_map class is used to store the option values. The two calls to the store function add values found on the command line and in the config file. Finally, the call to the notify function runs the user-specified notify functions and stores the values into regular variables, if needed.

The priority is handled in a simple way: the store function will not change the value of an option if it's already assigned. In the example above, if the command line specifies the value for an option, any value in the config file is ignored.
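
The "first source wins" rule can be sketched without the library. In the snippet below, value_store and store_first_wins are hypothetical names, values are kept as plain strings, and std::map::insert stands in for the rule because it leaves an existing key untouched.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the storage; values are kept as strings here.
using value_store = std::map<std::string, std::string>;

// Copies parsed values into the store. std::map::insert does nothing when
// the key is already present, so the source stored first takes priority.
void store_first_wins(const value_store& parsed, value_store& vm) {
    for (const auto& entry : parsed) {
        vm.insert(entry);
    }
}
```

Storing the command-line values first and the config-file values second therefore keeps the command-line value whenever both sources set the same option.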

Warning

Don't forget to call the notify function after you've stored all parsed values.

Annotated List of Symbols

The following table describes all the important symbols in the library, for quick access.

Symbol	Description
Options description component
options_description	describes a number of options
value	defines the option's value
Parsers component
parse_command_line	parses command line
parse_config_file	parses config file
parse_environment	parses environment
Storage component
variables_map	storage for option values