
Rules

We saw in the previous section how parse() is flexible in what types it will accept as attribute out-parameters.

That flexibility is a blessing and a curse. For instance, say you wanted to use the parser +char_ to parse a std::string, and capture the result in a std::string. +char_ generates an attribute of std::vector<char> when parsing a sequence of char, so you'd have to write the result into a vector first, including all the allocations that implies, and then you'd have to allocate space in a string, and copy the entire result. Not great. The flexibility of attribute out-parameters lets you avoid that. On the other hand, if you want to parse your result into a std::vector<char>, but accidentally pass a std::string, the code is well-formed. Usually, we expect type mismatches like this to be ill-formed in C++. Fortunately, rules help you address both these concerns.
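You can see the trade-off without the library at all. The sketch below is a plain C++ toy model, not Boost.Parser code. It uses a hypothetical function template, parse_one_or_more_chars(), which behaves like an attribute out-parameter: it accepts any container with push_back() and writes into it directly.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Toy model only (hypothetical, not part of Boost.Parser): behaves like +char_
// writing into a caller-supplied out-parameter of any container type.
template <typename Container>
bool parse_one_or_more_chars(std::string_view input, Container & out)
{
    if (input.empty())
        return false;          // "+" requires at least one element
    for (char c : input)
        out.push_back(c);      // written straight into the caller's container
    return true;
}
```

Passing a std::string avoids building a std::vector<char> and then copying it; that is the blessing. The curse is that if you meant to fill a std::vector<char> but accidentally pass a std::string, this still compiles.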

Using rules to nail down attribute flexibility

Every rule has a specific attribute type. If one is not specified, the rule has no attribute. The fact that the attribute is a specific type allows you to remove attribute flexibility. For instance, say we have a rule defined like this:

bp::rule<struct doubles, std::vector<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);

You can then use it in a call to parse(), and parse() will return a std::optional<std::vector<double>>:

auto const result = bp::parse(input, doubles, bp::ws);

If you call parse() with an attribute out-parameter, it must be exactly std::vector<double>:

std::vector<double> vec_result;
bp::parse(input, doubles, bp::ws, vec_result); // Ok.
std::deque<double> deque_result;
bp::parse(input, doubles, bp::ws, deque_result); // Ill-formed!

If we wanted to use a std::deque<double> as the attribute type of our rule:

// Attribute changed to std::deque<double>.
bp::rule<struct doubles, std::deque<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);

int main()
{
    std::deque<double> deque_result;
    bp::parse(input, doubles, bp::ws, deque_result); // Ok.
}

So, the attribute flexibility is still available, but only within the rule: the parser bp::double_ % ',' can parse into a std::vector<double> or a std::deque<double>, but the rule doubles can only parse into the exact attribute type it was declared to generate.

The reason for this is that, inside the rule parsing implementation, there is code something like this:

using attr_t = ATTR(doubles); // the rule's declared attribute type
attr_t attr;
parse(first, last, doubles_def, attr);
attribute_out_param = std::move(attr);

Here, attribute_out_param is the attribute out-parameter passed to parse(). If that final move assignment is ill-formed, so is the call to parse().
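To make the mechanism concrete, here is a plain C++ toy model of it. It is not Boost.Parser's actual implementation. The hypothetical parse_double_list() plays the role of the flexible bp::double_ % ',' and fills any container. The hypothetical toy_rule<Attr> always parses into a temporary of its declared type Attr first and then move-assigns that temporary to the out-parameter.

```cpp
#include <cassert>
#include <cstdlib>
#include <deque>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Flexible "parser" (toy stand-in for bp::double_ % ','): a comma-separated
// list of doubles, appended to any container that has push_back().
template <typename Container>
bool parse_double_list(std::string const & input, Container & out)
{
    char const * p = input.c_str();
    while (true) {
        char * end = nullptr;
        double const d = std::strtod(p, &end);  // skips leading whitespace
        if (end == p)
            return false;                       // expected a number
        out.push_back(d);
        p = end;
        while (*p == ' ')
            ++p;
        if (*p == '\0')
            return true;                        // consumed all input
        if (*p != ',')
            return false;                       // junk after a number
        ++p;                                    // consume the separator
    }
}

// Toy rule: parses into a temporary of its declared attribute type Attr,
// then move-assigns it to the out-parameter.
template <typename Attr>
struct toy_rule
{
    template <typename Out>
    bool parse(std::string const & input, Out & out) const
    {
        Attr attr;
        if (!parse_double_list(input, attr))
            return false;
        out = std::move(attr);  // does not compile unless Out accepts an Attr rvalue
        return true;
    }
};

// A deque cannot be assigned from a vector, so calling
// toy_rule<std::vector<double>>::parse() with a std::deque<double> is ill-formed.
static_assert(!std::is_assignable_v<std::deque<double> &, std::vector<double> &&>);
static_assert(std::is_assignable_v<std::vector<double> &, std::vector<double> &&>);
```

Both toy_rule<std::vector<double>> and toy_rule<std::deque<double>> reuse the same flexible parse_double_list(). Each rule, however, only hands back its own declared type.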

Using rules to exploit attribute flexibility

So, even though a rule reduces the flexibility of attributes it can generate, the fact that it is so easy to write a new rule means that we can use rules themselves to get the attribute flexibility we want across our code:

namespace bp = boost::parser;

// We only need to write the definition once...
auto const generic_doubles_def = bp::double_ % ',';

bp::rule<struct vec_doubles, std::vector<double>> vec_doubles = "vec_doubles";
auto const & vec_doubles_def = generic_doubles_def; // ... and re-use it,
BOOST_PARSER_DEFINE_RULES(vec_doubles);

// Attribute changed to std::deque<double>.
bp::rule<struct deque_doubles, std::deque<double>> deque_doubles = "deque_doubles";
auto const & deque_doubles_def = generic_doubles_def; // ... and re-use it again.
BOOST_PARSER_DEFINE_RULES(deque_doubles);

Now we have one of each, and we did not have to copy any parsing logic that would have to be maintained in two places.

Forward declaration

One of the advantages of using rules is that you can declare all your rules up front and then use them immediately afterward, before any of them is defined. This lets you write rules that refer to each other, even recursively, without creating circular dependencies among their C++ definitions:

namespace bp = boost::parser;
using namespace bp::literals; // for '{'_l and '['_l

// Assume we have some polymorphic type that can be an object/dictionary,
// array, string, int, or bool, called `value_type`.

bp::rule<class string, std::string> const string = "string";
bp::rule<class object_element, bp::tuple<std::string, value_type>> const object_element = "object-element";
bp::rule<class object, value_type> const object = "object";
bp::rule<class array, value_type> const array = "array";
bp::rule<class value_tag, value_type> const value = "value";

auto const string_def = bp::lexeme['"' >> *(bp::char_ - '"') > '"'];
auto const object_element_def = string > ':' > value;
auto const object_def = '{'_l >> -(object_element % ',') > '}';
auto const array_def = '['_l >> -(value % ',') > ']';
auto const value_def = bp::int_ | bp::bool_ | string | array | object;

BOOST_PARSER_DEFINE_RULES(string, object_element, object, array, value);

Here we have a parser for a JavaScript-value-like type value_type. A value_type may be an array, which itself may contain other arrays, objects, strings, etc. Since we need to be able to parse objects within arrays and vice versa, the array and object rules must refer to value, and value must refer back to them. Declaring all the rules before any of their definitions is what makes that possible.
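The same idea applies in plain C++: mutually recursive functions work because they are declared before they are defined. The sketch below is a hand-written recursive-descent analogue, not Boost.Parser code. It covers a hypothetical cut-down grammar with only non-negative ints and arrays. In it, parse_value() and parse_array() call each other the way value and array refer to each other above.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical cut-down grammar (ints and arrays only):
//   value ::= int | array
//   array ::= '[' (value (',' value)*)? ']'
struct cursor
{
    std::string_view s;
    std::size_t pos = 0;
};

bool parse_value(cursor & c);   // declared first...
bool parse_array(cursor & c);   // ...so each can call the other

void skip_ws(cursor & c)
{
    while (c.pos < c.s.size() && std::isspace(static_cast<unsigned char>(c.s[c.pos])))
        ++c.pos;
}

// Consume ch (after optional whitespace) if it is next.
bool eat(cursor & c, char ch)
{
    skip_ws(c);
    if (c.pos < c.s.size() && c.s[c.pos] == ch) {
        ++c.pos;
        return true;
    }
    return false;
}

bool parse_int(cursor & c)
{
    skip_ws(c);
    std::size_t const start = c.pos;
    while (c.pos < c.s.size() && std::isdigit(static_cast<unsigned char>(c.s[c.pos])))
        ++c.pos;
    return c.pos != start;
}

bool parse_value(cursor & c)
{
    return parse_int(c) || parse_array(c);
}

bool parse_array(cursor & c)
{
    if (!eat(c, '['))
        return false;
    if (eat(c, ']'))
        return true;                 // empty array
    do {
        if (!parse_value(c))         // recursion back into value
            return false;
    } while (eat(c, ','));
    return eat(c, ']');
}

// Succeeds only if the whole input is one value.
bool parse_all(std::string_view s)
{
    cursor c{s};
    bool const ok = parse_value(c);
    skip_ws(c);
    return ok && c.pos == s.size();
}
```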

_val()

Inside all of a rule's semantic actions, the expression _val(ctx) is a reference to the attribute that the rule generates. This can be useful when you want subparsers to build up the attribute in a specific way:

namespace bp = boost::parser;
using namespace bp::literals;

bp::rule<class ints, std::vector<int>> const ints = "ints";
auto twenty_zeros = [](auto & ctx) { _val(ctx).resize(20, 0); };
auto push_back = [](auto & ctx) { _val(ctx).push_back(_attr(ctx)); };
auto const ints_def = "20-zeros"_l[twenty_zeros] | +bp::int_[push_back];
BOOST_PARSER_DEFINE_RULES(ints);
Tip

That's just an example. It's almost always better to do things without using semantic actions. We could have instead written ints_def as "20-zeros"_l >> bp::attr(std::vector<int>(20)) | +bp::int_, which has the same semantics, is a lot easier to read, and is a lot less code.

Locals

The rule template takes a third template parameter we have not discussed yet: a locals type. An object of that type is available within the rule's semantic actions as _locals(ctx), which gives your rule some local state, if it needs it:

struct foo_locals
{
    char first_value = 0;
};

namespace bp = boost::parser;

bp::rule<class foo, int, foo_locals> const foo = "foo";

auto record_first = [](auto & ctx) { _locals(ctx).first_value = _attr(ctx); };
auto check_against_first = [](auto & ctx) {
    char const first = _locals(ctx).first_value;
    char const attr = _attr(ctx);
    if (attr == first)
        _pass(ctx) = false;
    _val(ctx) = (int(first) << 8) | int(attr);
};

auto const foo_def = bp::cu[record_first] >> bp::cu[check_against_first];
BOOST_PARSER_DEFINE_RULES(foo);

foo matches the input if it can match two elements of the input in a row, but only if they are not the same value. Without locals, it's a lot harder to write parsers that have to track state as they parse.
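In hand-written C++, the closest analogue of a locals struct is an ordinary local variable. It is created fresh for each invocation and shared by every step of that invocation. This toy sketch, a hypothetical parse_two_distinct() that is not Boost.Parser code, mirrors foo above with record_first and check_against_first written out inline.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

struct foo_locals
{
    char first_value = 0;
};

// Toy analogue of foo: looks at the first two characters only.
std::optional<int> parse_two_distinct(std::string_view input)
{
    if (input.size() < 2)
        return std::nullopt;
    foo_locals locals;                       // fresh per call, like _locals(ctx)
    locals.first_value = input[0];           // what record_first does
    char const attr = input[1];              // what check_against_first sees
    if (attr == locals.first_value)
        return std::nullopt;                 // like _pass(ctx) = false
    return (int(locals.first_value) << 8) | int(attr);
}
```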

Parameters

Sometimes, it is convenient to parameterize parsers. Consider this parsing rule from the YAML 1.2 spec:

[137] c-flow-sequence(n,c) ::= “[” s-separate(n,c)?
                               ns-s-flow-seq-entries(n,in-flow(c))? “]”

This YAML rule says that the parsing should proceed into two YAML subrules, both of which have these n and c parameters. It is certainly possible to transliterate these YAML parsing rules to something that uses unparameterized Boost.Parser rules, but it is quite painful to do so.

You give parameters to a rule by calling its with() member. The values you pass to with() are used to create a boost::parser::tuple that is available in semantic actions attached to the rule, using _params(ctx).

namespace bp = boost::parser;
using namespace bp::literals; // for ' '_l

// Declare our rules.
bp::rule</* ... */> foo = "foo";
bp::rule</* ... */> bar = "bar";

// Get the first parameter for this rule.
auto first_param = [](auto & ctx) {
    using namespace boost::hana::literals;
    return _params(ctx)[0_c];
};
auto const foo_def = bp::repeat(first_param)[' '_l]; // Match ' ' the number of times indicated by the first parameter to foo.

// Assume that bar has a locals struct with a local_indent member, and
// that set_local_indent and local_indent are lambdas that respectively write
// and read _locals(ctx).local_indent.

// Parse an integer, and then pass that as a parameter to foo.
auto const bar_def = bp::int_[set_local_indent] >> foo.with(local_indent);

BOOST_PARSER_DEFINE_RULES(foo, bar);

Passing parameters to rules like this allows you to easily write parsers that change the way they parse depending on contextual data that they have already parsed.
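Outside a parser library, a parameterized rule is just a function with an extra argument. The toy sketch below mirrors foo and bar above. It uses hypothetical functions match_spaces() and parse_count_then_spaces(), is plain C++ rather than Boost.Parser code, and does no whitespace skipping. The "bar" function reads an integer and passes it as the parameter of the "foo" function, which then matches exactly that many spaces.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Analogue of foo.with(n): match exactly n spaces starting at pos.
bool match_spaces(std::string_view s, std::size_t & pos, int n)
{
    for (int i = 0; i < n; ++i, ++pos) {
        if (pos >= s.size() || s[pos] != ' ')
            return false;                    // fewer than n spaces
    }
    return true;
}

// Analogue of bar: parse an integer, then pass it to foo as a parameter.
bool parse_count_then_spaces(std::string_view s)
{
    std::size_t pos = 0;
    int local_indent = 0;                    // plays the role of _locals(ctx).local_indent
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
        local_indent = local_indent * 10 + (s[pos++] - '0');
    if (pos == 0)
        return false;                        // no integer at the start
    return match_spaces(s, pos, local_indent) && pos == s.size();
}
```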

The _p variable template

Getting at one of a rule's arguments and passing it as an argument to another parser can be very verbose. _p is a variable template that allows you to refer to the nth argument to the current rule, so that you can, in turn, pass it to one of the rule's subparsers. Using this, foo_def above can be rewritten as:

auto const foo_def = bp::repeat(bp::_p<0>)[' '_l];

Using _p can save you from writing a bunch of lambdas that each get an argument out of the parse context using _params(ctx)[0_c] or similar.

