From 262c19e4414a59be1662b4f182d5d80488c46ab8 Mon Sep 17 00:00:00 2001 From: Zach Laine Date: Sat, 9 Mar 2024 20:21:02 -0600 Subject: [PATCH] Explain how seq_parser combining logic interacts with directives. Fixes #161. --- doc/parser.qbk | 11 +++++++++ doc/tutorial.qbk | 64 ++++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 73 insertions(+), 2 deletions(-) diff --git a/doc/parser.qbk b/doc/parser.qbk index cfa0bae5..89124b72 100644 --- a/doc/parser.qbk +++ b/doc/parser.qbk @@ -69,6 +69,17 @@ [def _cb_r_ [classref boost::parser::callback_rule `callback_rule`]] [def _cb_rs_ [classref boost::parser::callback_rules `callback_rule`s]] +[def _skp_p_ [classref boost::parser::skip_parser `skip_parser`]] +[def _xfm_p_ [classref boost::parser::transform_parser `transform_parser`]] +[def _noc_p_ [classref boost::parser::no_case_parser `no_case_parser`]] +[def _sv_p_ [classref boost::parser::string_view_parser `string_view_parser`]] +[def _raw_p_ [classref boost::parser::raw_parser `raw_parser`]] +[def _omt_p_ [classref boost::parser::omit_parser `omit_parser`]] +[def _rpt_p_ [classref boost::parser::repeat_parser `repeat_parser`]] +[def _lex_p_ [classref boost::parser::lexeme_parser `lexeme_parser`]] +[def _seq_p_ [classref boost::parser::seq_parser `seq_parser`]] +[def _seq_ps_ [classref boost::parser::seq_parser `seq_parser`s]] + [def _bp_tup_ [classref boost::parser::tuple `boost::parser::tuple`]] [def _bp_get_ [funcref boost::parser::get `boost::parser::get`]] [def _bh_tup_ `boost::hana::tuple`] diff --git a/doc/tutorial.qbk b/doc/tutorial.qbk index db73651c..ad3d2711 100644 --- a/doc/tutorial.qbk +++ b/doc/tutorial.qbk @@ -1196,9 +1196,35 @@ Non-directives might, but only when attaching a semantic action. The directives that are second order parsers are technically directives, but since they are also used to create parsers, it is more useful just to focus on that. 
The directives _rpt_ and _if_ were already described in the section on -parsers; we won't say more about them here. +parsers; we won't say much about them here. -That leaves the directives that affect aspects of the parse: +[heading Interaction with sequence parsers] + +Sequence and alternative parsers do not nest in most cases. (Let's consider +just sequence parsers to keep things simple, but all this logic applies to +alternative parsers as well.) `a >> b >> c` is the same as `(a >> b) >> c` +and `a >> (b >> c)`, and they are each represented by a single _seq_p_ with +three subparsers, `a`, `b`, and `c`. However, if something prevents two +_seq_ps_ from interacting directly, they *will* nest. For instance, `lexeme[a +>> b] >> c` is a _seq_p_ containing two parsers, `lexeme[a >> b]` and `c`. +This is because _lexeme_ takes its given parser and wraps it in a _lex_p_. +This in turn disables the sequence parser combining logic, since neither side +of the second `operator>>` in `lexeme[a >> b] >> c` is a _seq_p_. +Sequence parsers have several rules that govern what the overall attribute +type of the parser is, based on the positions and attributes of its subparsers +(see _attr_gen_). Therefore, it's important to know which directives create a +new parser (and what kind), and which ones do not; this is indicated for each +directive below. + +[heading The directives] + +[heading _rpt_] + +See _parsers_uses_. Creates a _rpt_p_. + +[heading _if_] + +See _parsers_uses_. Creates a _seq_p_. [heading _omit_] 
Just like _omit_, _raw_ causes all attribute-generation work within Similar to the re-use scenario for _omit_ above, _raw_ could be used to find the *locations* of all non-overlapping matches of `p` in a string. +Creates a _raw_p_. + [heading _string_view_] `_string_view_np_[p]` is very similar to `_raw_np_[p]`, except that it changes @@ -1241,6 +1271,8 @@ to find the *locations* of all non-overlapping matches of `p` in a string. Whether _raw_ or _string_view_ is more natural to use to report the locations depends on your use case, but they are essentially the same. +Creates a _sv_p_. + [heading _no_case_] `_no_case_np_[p]` enables case-insensitive parsing within the parse of `p`. @@ -1281,6 +1313,8 @@ sometimes expand a code point into multiple code points (e.g. case folding `"ẞ"` yields `"ss"`. When such a multi-code point expansion occurs, the expanded code points are in the NFKC normalization form.] +Creates a _noc_p_. + [heading _lexeme_] `_lexeme_np_[p]` disables use of the skipper, if a skipper is being used, @@ -1296,6 +1330,8 @@ _lexeme_: Without _lexeme_, our string parser would correctly match `"foo bar"`, but the generated attribute would be `"foobar"`. +Creates a _lex_p_. + [heading _skip_] _skip_ is like the inverse of _lexeme_. It enables skipping in the parse, @@ -1322,11 +1358,17 @@ call: The first occurrence of `zero_or_more` will use the skipper passed to _p_, which is _ws_; the second will use _blank_ as its skipper. +Creates a _skp_p_. + [heading _merge_, _sep_, and _transform_] These directives influence the generation of attributes. See _attr_gen_ section for more details on them. +_merge_ and _sep_ create a copy of the given _seq_p_. + +_transform_ creates a _xfm_p_. + [endsect] [section Combining Operations] @@ -1630,6 +1672,24 @@ created a merge group above, we disabled the default behavior in which the them. Since they are all treated as separate entities, and since they have different attribute types, the use of _merge_ is an error. 
+Many directives create a new parser out of the parser they are given. _merge_ +and _sep_ do not. Since they operate only on sequence parsers, all they do is +create a copy of the sequence parser they are given. The _seq_p_ template has +a template parameter `CombiningGroups`, and all _merge_ and _sep_ do is take a +given _seq_p_ and create a copy of it with a different `CombiningGroups` +template parameter. This means that _merge_ and _sep_ can be ignored in +`operator>>` expressions much like parentheses are. Consider an example. + + namespace bp = boost::parser; + constexpr auto parser1 = bp::separate[bp::int_ >> bp::int_] >> bp::int_; + constexpr auto parser2 = bp::lexeme[bp::int_ >> ' ' >> bp::int_] >> bp::int_; + +Note that _sep_ is a no-op here; it's only being used this way for this +example. These parsers have different attribute types. `_ATTR_np_(parser1)` +is `_bp_tup_(int, int, int)`. `_ATTR_np_(parser2)` is `_bp_tup_(_bp_tup_(int, +int), int)`. This is because `bp::lexeme[]` wraps its given parser in a new +parser. _sep_, like _merge_, does not. That's why, even though `parser1` and `parser2` +look so structurally similar, they have different attributes. [heading _transform_]