program_options/doc/overview.xml

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
     "http://www.boost.org/tools/boostbook/dtd/boostbook.dtd"
[
    <!ENTITY % entities SYSTEM "program_options.ent" >
    %entities;
]>
<section id="program_options.overview">
  <title>Library Overview</title>

  <para>In the tutorial section, we saw several examples of library usage.
    Here we will describe the overall library design including the primary
    components and their function.
  </para>

  <para>The library has three main components:
    <itemizedlist>
      <listitem>
        <para>The options description component, which describes the allowed options
          and what to do with the values of the options.
        </para>
      </listitem>
      <listitem>
        <para>The parsers component, which uses this information to find option names
          and values in the input sources and return them.
        </para>
      </listitem>
      <listitem>
        <para>The storage component, which provides the
          interface to access the value of an option. It also converts the string
          representation of values that parsers return into desired C++ types.
        </para>
      </listitem>
    </itemizedlist>
  </para>

  <para>To be a little more concrete, the <code>options_description</code>
  class is from the options description component, the
  <code>parse_command_line</code> function is from the parsers component, and the
  <code>variables_map</code> class is from the storage component. </para>

  <para>In the tutorial we've learned how those components can be used by the
    <code>main</code> function to parse the command line and config
    file. Before going into the details of each component, a few notes about
    the world outside of <code>main</code>.
  </para>

  <para>
    For that outside world, the storage component is the most important. It
    provides a class which stores all option values and that class can be
    freely passed around your program to modules which need access to the
    options. All the other components can be used only in the place where
    the actual parsing is the done.  However, it might also make sense for the
    individual program modules to describe their options and pass them to the
    main module, which will merge all options. Of course, this is only
    important when the number of options is large and declaring them in one
    place becomes troublesome.
  </para>

<!--
  <para>The design looks very simple and straight-forward, but it is worth
  noting some important points:
    <itemizedlist>
      <listitem>
        <para>The options description is not tied to specific source. Once
        options are described, all parsers can use that description.</para>
      </listitem>
      <listitem>
        <para>The parsers are intended to be fairly dumb. They just
          split the input into (name, value) pairs, using strings to represent
          names and values. No meaningful processing of values is done.
        </para>
      </listitem>
      <listitem>
        <para>The storage component is focused on storing options values. It
        </para>
      </listitem>


    </itemizedlist>

  </para>
-->

  <section>
    <title>Options Description Component</title>

    <para>The options description component has three main classes:
      &option_description;, &value_semantic; and &options_description;. The
      first two together describe a single option. The &option_description;
      class contains the option's name, description and a pointer to &value_semantic;,
      which, in turn, knows the type of the option's value and can parse the value,
      apply the default value, and so on. The &options_description; class is a
      container for instances of &option_description;.
    </para>

    <para>For almost every library, those classes could be created in a
      conventional way: that is, you'd create new options using constructors and
      then call the <code>add</code> method of &options_description;. However,
      that's overly verbose for declaring 20 or 30 options. This concern led
      to creation of the syntax that you've already seen:
<programlisting>
options_description desc;
desc.add_options()
    ("help", "produce help")
    ("optimization", value&lt;int&gt;()->default_value(10), "optimization level")
    ;
</programlisting>
    </para>

    <para>The call to the <code>value</code> function creates an instance of
      a class derived from the <code>value_semantic</code> class: <code>typed_value</code>.
      That class contains the code to parse
      values of a specific type, and contains a number of methods which can be
      called by the user to specify additional information. (This
      essentially emulates named parameters of the constructor.) Calls to
      <code>operator()</code> on the object returned by <code>add_options</code>
      forward arguments to the constructor of the <code>option_description</code>
      class and add the new instance.
    </para>

    <para>
      Note that in addition to the
      <code>value</code>, library provides the <code>bool_switch</code>
      function, and user can write his own function which will return
      other subclasses of <code>value_semantic</code> with
      different behaviour. For the remainder of this section, we'll talk only
      about the <code>value</code> function.
    </para>

    <para>The information about an option is divided into syntactic and
      semantic. Syntactic information includes the name of the option and the
      number of tokens which can be used to specify the value. This
      information is used by parsers to group tokens into (name, value) pairs,
      where value is just a vector of strings
      (<code>std::vector&lt;std::string&gt;</code>). The semantic layer
      is responsible for converting the value of the option into more usable C++
      types.
    </para>

    <para>This separation is an important part of library design. The parsers
      use only the syntactic layer, which takes away some of the freedom to
      use overly complex structures. For example, it's not easy to parse
      syntax like: <screen>calc --expression=1 + 2/3</screen> because it's not
      possible to parse <screen>1 + 2/3</screen> without knowing that it's a C
      expression. With a little help from the user the task becomes trivial,
      and the syntax clear: <screen>calc --expression="1 + 2/3"</screen>
    </para>

    <section>
      <title>Syntactic information</title>
      <para>The syntactic information is provided by the
        <classname>boost::program_options::options_description</classname> class
        and some methods of the
        <classname>boost::program_options::value_semantic</classname> class.
        The simplest usage is illustrated below:
        <programlisting>
          options_description desc;
          desc.add_options()
          ("help", "produce help message")
          ;
        </programlisting>
        This declares one option named "help" and associates a description with
        it. The user is not allowed to specify any value.
      </para>

      <para>To make an option accept a value, you'd need the
        <code>value</code> function mentioned above:
        <programlisting>
          options_description desc;
          desc.add_options()
          ("compression", "compression level", value&lt;string&gt;())
          ("verbose", "verbosity level", value&lt;string&gt;()->implicit())
          ("email", "email to send to", value&lt;string&gt;()->multitoken());
        </programlisting>
        With these declarations, the user must specify a value for
        the first option, using a single token. For the second option, the user
        may either provide a single token for the value, or no token at
        all. For the last option, the value can span several tokens. For
        example, the following command line is OK:
        <screen>
          test --compression 10 --verbose --email beadle@mars beadle2@mars
        </screen>
      </para>
    </section>

    <section>
      <title>Semantic information</title>

      <para>The semantic information is completely provided by the
        <classname>boost::program_options::value_semantic</classname> class. For
        example:
<programlisting>
options_description desc;
desc.add_options()
    ("compression", "compression level", value&lt;int&gt;()->default(10));
    ("email", "email", value&lt; vector&lt;string&gt; &gt;()
        ->composing()->notify(&amp;your_function);
</programlisting>
        These declarations specify that default value of the first option is 10,
        that the second option can appear several times and all instances should
        be merged, and that after parsing is done, the library will  call
        function <code>&amp;your_function</code>, passing the value of the
        "email" option as argument.
      </para>
    </section>

    <section>
      <title>Positional options</title>

      <para>Our definition of option as (name, value) pairs is simple and
        useful, but in one special case of the command line, there's a
        problem. A command line can include a <firstterm>positional option</firstterm>,
        which does not specify any name at all, for example:
        <screen>
          archiver --compression=9 /etc/passwd
        </screen>
        Here, the "/etc/passwd" element does not have any option name.
      </para>

      <para>One solution is to ask the user to extract positional options
        himself and process them as he likes. However, there's a nicer approach
        -- provide a method to automatically assign the names for positional
        options, so that the above command line can be interpreted the same way
        as:
        <screen>
          archiver --compression=9 --input-file=/etc/passwd
        </screen>
      </para>

      <para>The &positional_options_desc; class allows the command line
        parser to assign the names. The class specifies how many positional options
        are allowed, and for each allowed option, specifies the name. For example:
        <programlisting>
          positional_options_description pd; pd.add("input-file", 1, 1);
        </programlisting> specifies that for exactly one, first, positional
        option the name will be "input-file".
      </para>

      <para>It's possible to specify that a number, or even all positional options, be
        given the same name.
        <programlisting>
          positional_options_description pd;
          pd.add("output-file", 2, 2).add_optional("input-file", 0, -1);
        </programlisting>
        In the above example, the first two positional options will be associated
        with name "output-file", and any others with the name "input-file".
      </para>

    </section>

    <!-- Note that the classes are not modified during parsing -->

  </section>

  <section>
    <title>Parsers Component</title>

    <para>The parsers component splits input sources into (name, value) pairs.
      Each parser looks for possible options and consults the options
      description component to determine if the option is known and how its value
      is specified. In the simplest case, the name is explicitly specified,
      which allows the library to decide if such option is known. If it is known, the
      &value_semantic; instance determines how the value is specified. (If
      it is not known, an exception is thrown.) Common
      cases are when the value is explicitly specified by the user, and when
      the value cannot be specified by the user, but the presence of the
      option implies some value (for example, <code>true</code>). So, the
      parser checks that the value is specified when needed and not specified
      when not needed, and returns new (name, value) pair.
    </para>

    <para>
      To invoke a parser you typically call a function, passing the options
      description and command line or config file or something else.
      The results of parsing are returned as an instance of the &parsed_options;
      class. Typically, that object is passed directly to the storage
      component. However, it also can be used directly, or undergo some additional
      processing.
    </para>

    <para>
      There are three exceptions to the above model -- all related to
      traditional usage of the command line. While they require some support
      from the options description component, the additional complexity is
      tolerable.
      <itemizedlist>
        <listitem>
          <para>The name specified on the command line may be
            different from the option name -- it's common to provide a "short option
            name" alias to a longer name. It's also common to allow an abbreviated name
            to be specified on the command line.
          </para>
        </listitem>
        <listitem>
          <para>Sometimes it's desirable to specify value as several
          tokens. For example, an option "--email-recipient" may be followed
          by several emails, each as a separate command line token. This
          behaviour is supported, though it can lead to parsing ambiguities
          and is not enabled by default.
          </para>
        </listitem>
        <listitem>
          <para>The command line may contain positional options -- elements
            which don't have any name. The command line parser provides a
            mechanism to guess names for such options, as we've seen in the
            tutorial.
          </para>
        </listitem>
      </itemizedlist>
    </para>

  </section>


  <section>
    <title>Storage Component</title>

    <para>The storage component is responsible for:
      <itemizedlist>
        <listitem>
          <para>Storing the final values of an option into a special class and in
            regular variables</para>
        </listitem>
        <listitem>
          <para>Handling priorities among different sources.</para>
        </listitem>

        <listitem>
          <para>Calling user-specified <code>notify</code> functions with the final
         values of options.</para>
        </listitem>
      </itemizedlist>
    </para>

    <para>Let's consider an example:
      <programlisting>
        variables_map vm;
        store(parse_command_line(argc, argv, desc), vm);
        store(parse_config_file("example.cfg", desc), vm);
        notify(vm);
      </programlisting>
      The <code>variables_map</code> class is used to store the option
      values. The two calls to the <code>store</code> function add values
      found on the command line and in the config file. Finally the call to
      the <code>notify</code> function runs the user-specified notify
      functions and stores the values into regular variables, if needed.
    </para>

    <para>The priority is handled in a simple way: the <code>store</code>
      function will not change the value of an option if it's already
      assigned. In this case, if the command line specifies the value for an
      option, any value in the config file is ignored.
    </para>

    <warning>
      <para>Don't forget to call the <code>notify</code> function after you've
      stored all parsed values.</para>
    </warning>

  </section>

  <section>
    <title>Annotated List of Symbols</title>

    <para>The following table describes all the important symbols in the
      library, for quick access.</para>

    <informaltable pgwide="1">

      <tgroup cols="2">
        <colspec colname='c1'/>
        <colspec colname='c2'/>
        <thead>

          <row>
            <entry>Symbol</entry>
            <entry>Description</entry>
          </row>
        </thead>

        <tbody>

          <row>
            <entry namest='c1' nameend='c2'>Options description component</entry>
          </row>

          <row>
            <entry>&options_description;</entry>
            <entry>describes a number of options</entry>
          </row>
          <row>
            <entry>&value;</entry>
            <entry>defines the option's value</entry>
          </row>

          <row>
            <entry namest='c1' nameend='c2'>Parsers component</entry>
          </row>

          <row>
            <entry>&parse_command_line;</entry>
            <entry>parses command line</entry>
          </row>
          <row>
            <entry>&parse_config_file;</entry>
            <entry>parses config file</entry>
          </row>

          <row>
            <entry>&parse_environment;</entry>
            <entry>parses environment</entry>
          </row>

          <row>
            <entry namest='c1' nameend='c2'>Storage component</entry>
          </row>

          <row>
            <entry>&variables_map;</entry>
            <entry>storage for option values</entry>
          </row>

        </tbody>
      </tgroup>

    </informaltable>

  </section>

</section>

<!--
     Local Variables:
     mode: xml
     sgml-indent-data: t
     sgml-parent-document: ("program_options.xml" "section")
     sgml-set-face: t
     End:
-->