We saw in the previous section how parse()
is flexible in what types it will accept as attribute out-parameters.
That flexibility is a blessing and a curse. For instance, say you wanted
to use the parser +char_ to parse a std::string,
and capture the result in a std::string.
+char_ generates an
attribute of std::vector<char> when
parsing a sequence of char,
so you'd have to write the result into a vector first, including all the
allocations that implies, and then you'd have to allocate space in a string,
and copy the entire result. Not great. The flexibility of attribute out-parameters
lets you avoid that. On the other hand, if you want to parse your result
into a std::vector<char>,
but accidentally pass a std::string,
the code is still well-formed. Usually, we expect type mismatches like this to
be ill-formed in C++. Fortunately, rules help you address both of
these concerns.
Every rule has a specific attribute type. If one is not specified, the rule has no attribute. The fact that the attribute is a specific type allows you to remove attribute flexibility. For instance, say we have a rule defined like this:
bp::rule<struct doubles, std::vector<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);
You can then use it in a call to parse(),
and parse() will return a std::optional<std::vector<double>>:
auto const result = bp::parse(input, doubles, bp::ws);
If you call parse() with an attribute out-parameter,
it must be exactly std::vector<double>:
std::vector<double> vec_result;
bp::parse(input, doubles, bp::ws, vec_result);   // Ok.
std::deque<double> deque_result;
bp::parse(input, doubles, bp::ws, deque_result); // Ill-formed!
If we wanted to use a std::deque<double> as the attribute type of our rule, we would declare the rule with that type instead:
// Attribute changed to std::deque<double>.
bp::rule<struct doubles, std::deque<double>> doubles = "doubles";
auto const doubles_def = bp::double_ % ',';
BOOST_PARSER_DEFINE_RULES(doubles);

int main()
{
    std::deque<double> deque_result;
    bp::parse(input, doubles, bp::ws, deque_result); // Ok.
}
So, the attribute flexibility is still available, but only within
the rule: the parser bp::double_ % ',' can parse
into a std::vector<double>
or a std::deque<double>,
but the rule doubles must
parse into only the exact attribute it was declared to generate.
The reason for this is that, inside the rule parsing implementation, there is code something like this:
using attr_t = ATTR(doubles_def);
attr_t attr;
parse(first, last, parser, attr);
attribute_out_param = std::move(attr);
Here, attribute_out_param
is the attribute out-parameter we pass to parse().
If that final move assignment is ill-formed, the call to parse()
is ill-formed too.
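The effect of that final move assignment can be checked with a standard type trait, without involving the library at all. The following is only a sketch of the idea: can_receive_rule_attribute is a hypothetical name introduced here, and it models just the last assignment in the snippet above, not the actual rule implementation.

```cpp
#include <deque>
#include <type_traits>
#include <vector>

// Hypothetical helper (not part of Boost.Parser): true when an out-parameter
// of type OutParam can be move-assigned from a rule attribute of type
// RuleAttr, mirroring `attribute_out_param = std::move(attr);`.
template<typename OutParam, typename RuleAttr>
constexpr bool can_receive_rule_attribute =
    std::is_assignable_v<OutParam &, RuleAttr &&>;

// The rule generates std::vector<double>; the same type is accepted.
static_assert(
    can_receive_rule_attribute<std::vector<double>, std::vector<double>>);

// std::deque<double> has no assignment operator taking a std::vector<double>,
// so the assignment, and therefore the call to parse(), would be ill-formed.
static_assert(
    !can_receive_rule_attribute<std::deque<double>, std::vector<double>>);
```

Under this model, any out-parameter type that can be move-assigned from the rule's attribute passes the check, which is why the restriction comes from the rule's declared attribute type rather than from the underlying parser.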
So, even though a rule reduces the flexibility of attributes it can generate, the fact that it is so easy to write a new rule means that we can use rules themselves to get the attribute flexibility we want across our code:
namespace bp = boost::parser;

// We only need to write the definition once...
auto const generic_doubles_def = bp::double_ % ',';

bp::rule<struct vec_doubles, std::vector<double>> vec_doubles = "vec_doubles";
auto const & vec_doubles_def = generic_doubles_def; // ... and re-use it,
BOOST_PARSER_DEFINE_RULES(vec_doubles);

// Attribute changed to std::deque<double>.
bp::rule<struct deque_doubles, std::deque<double>> deque_doubles = "deque_doubles";
auto const & deque_doubles_def = generic_doubles_def; // ... and re-use it again.
BOOST_PARSER_DEFINE_RULES(deque_doubles);
Now we have one of each, and we did not have to copy any parsing logic that would have to be maintained in two places.
One of the advantages of using rules is that you can declare all your rules up front and then use them immediately afterward. This lets you write rules that refer to each other, even recursively, because every rule is declared before any definition uses it:
namespace bp = boost::parser;
using namespace bp::literals; // For the _l literal suffix.

// Assume we have some polymorphic type that can be an object/dictionary,
// array, string, or int, called `value_type`.

bp::rule<class string, std::string> const string = "string";
bp::rule<class object_element, bp::tuple<std::string, value_type>> const
    object_element = "object-element";
bp::rule<class object, value_type> const object = "object";
bp::rule<class array, value_type> const array = "array";
bp::rule<class value_tag, value_type> const value = "value";

auto const string_def = bp::lexeme['"' >> *(bp::char_ - '"') > '"'];
auto const object_element_def = string > ':' > value;
auto const object_def = '{'_l >> -(object_element % ',') > '}';
auto const array_def = '['_l >> -(value % ',') > ']';
auto const value_def = bp::int_ | bp::bool_ | string | array | object;

BOOST_PARSER_DEFINE_RULES(string, object_element, object, array, value);
Here we have a parser for a JavaScript-value-like type value_type.
value_type may be an array,
which itself may contain other arrays, objects, strings, etc. Since we need
to be able to parse objects within arrays and vice versa, we need each of
those two parsers to be able to refer to each other.
Inside all of a rule's semantic actions, the expression _val(ctx)
is a reference to the attribute that the rule generates. This can be useful
when you want subparsers to build up the attribute in a specific way:
namespace bp = boost::parser;
using namespace bp::literals;

bp::rule<class ints, std::vector<int>> const ints = "ints";
auto twenty_zeros = [](auto & ctx) { _val(ctx).resize(20, 0); };
auto push_back = [](auto & ctx) { _val(ctx).push_back(_attr(ctx)); };
auto const ints_def = "20-zeros"_l[twenty_zeros] | +bp::int_[push_back];
BOOST_PARSER_DEFINE_RULES(ints);
Tip: That's just an example. It's almost always better to do things without
using semantic actions; ints_def could have been written without them.
The rule
template takes another template parameter we have not discussed yet. You
can pass a third parameter to rule, which will be available
within semantic actions used in the rule as _locals(ctx). This
gives your rule some local state, if it needs it:
struct foo_locals
{
    char first_value = 0;
};

namespace bp = boost::parser;

bp::rule<class foo, int, foo_locals> const foo = "foo";

// Store the first parsed code unit in the rule's local state.
auto record_first = [](auto & ctx) { _locals(ctx).first_value = _attr(ctx); };

// Fail the parse if the second code unit equals the first; otherwise
// combine the two into the rule's int attribute.
auto check_against_first = [](auto & ctx) {
    char const first = _locals(ctx).first_value;
    char const attr = _attr(ctx);
    if (attr == first)
        _pass(ctx) = false;
    _val(ctx) = (int(first) << 8) | int(attr);
};

auto const foo_def = bp::cu[record_first] >> bp::cu[check_against_first];
BOOST_PARSER_DEFINE_RULES(foo);
foo matches the input if
it can match two elements of the input in a row, but only if they are not
the same value. Without locals, it's a lot harder to write parsers that have
to track state as they parse.
Sometimes, it is convenient to parameterize parsers. Consider this parsing rule from the YAML 1.2 spec:
[137] c-flow-sequence(n,c) ::= “[” s-separate(n,c)?
ns-s-flow-seq-entries(n,in-flow(c))? “]”
This YAML rule says that the parsing should proceed into two YAML subrules,
both of which have these n
and c parameters. It is certainly
possible to transliterate these YAML parsing rules to something that uses
unparameterized Boost.Parser rules, but it is quite painful
to do so.
You give parameters to a rule by calling its with()
member. The values you pass to with() are used to create a boost::parser::tuple that is available in
semantic actions attached to the rule, using _params(ctx).
namespace bp = boost::parser;

// Declare our rules.
bp::rule</* ... */> foo = "foo";
bp::rule</* ... */> bar = "bar";

// Get the first parameter for this rule.
auto first_param = [](auto & ctx) {
    using namespace boost::hana::literals;
    return _params(ctx)[0_c];
};

// Match ' ' the number of times indicated by the first parameter to foo.
auto const foo_def = bp::repeat(first_param)[' '_l];

// Assume that bar has a locals struct with a local_indent member, and
// that set_local_indent and local_indent are lambdas that respectively write
// and read _locals(ctx).local_indent.

// Parse an integer, and then pass that as a parameter to foo.
auto const bar_def = bp::int_[set_local_indent] >> foo.with(local_indent);

BOOST_PARSER_DEFINE_RULES(foo, bar);
Passing parameters to rules like this allows you
to easily write parsers that change the way they parse depending on contextual
data that they have already parsed.
Getting at one of a rule's arguments and passing it as an argument to another
parser can be very verbose. _p is a variable template
that allows you to refer to the nth
argument to the current rule, so that you can, in turn, pass it to one of
the rule's subparsers. Using this, foo_def
above can be rewritten as:
auto const foo_def = bp::repeat(bp::_p<0>)[' '_l];
Using _p
can save you from having to write a bunch of lambdas that each get
an argument out of the parse context using _params(ctx)[0_c] or
similar.