[/ / Distributed under the Boost Software License, Version 1.0. (See accompanying / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) /] [section Introduction] _Parser_ is a _comb_ library. That is, it consists of a set of low-level primitive parsers, and operations that can be used to combine those parsers into more complicated parsers. There are primitive parsers that parse /epsilon/ (the empty string), `char`s, `int`s, `float`s, etc. There are operations which combine parsers to create new parsers. For instance, the _kl_ operation takes an existing parser `p` and creates a new parser that matches zero or more occurrences of whatever `p` matches. Both callable objects and operator overloads are used for the combining operations. For instance, `operator*()` is used for _kl_, and you can also write `repeat(n)[p]` to create a parser for exactly `n` repetitions of `p`. _Parser_ also tries to accommodate the multiple ways that people often want to get a parse result out of their parsing code. Some parsing may best be done by returning an object that represents the result of the parse. Other parsing may best be done by filling in a preexisting data structure. Yet other parsing may best be done by parsing small sections of a large document, and reporting the results of subparsers as they are finished, via callbacks. _Parser_ accommodates all these ways of working, and even makes it possible to do callback-based or non-callback-based parsing without rewriting any code (except by changing the top-level call from _p_ to _cbp_). All of _Parser_'s public interfaces are sentinel- and range-friendly, just like the interfaces in `std::ranges`. _Parser_ is Unicode-aware through and through. When you parse ranges of `char`, _Parser_ does not assume any particular encoding _emdash_ not Unicode or any other encoding. Parsing of inputs *other than* plain `char`s assumes that the input is Unicode. In the Unicode-aware code paths, all parsing is done by matching code points. This means that you can feed UTF-8 strings into _Parser_, both as input and within your parser, and the right sort of matching occurs. For instance, if your parser is trying to match repetitions of the `char` `'\xcc'` (which is a lead byte from a UTF-8 sequence, and so is malformed UTF-8 if not followed by an appropriate UTF-8 code unit), it will *not* match the start of `"\xcc\x80"` (UTF-8 for the code point U+0300). _Parser_ knows that the matching must be whole-code-point, and so it interprets the `char` `'\xcc'` as the code point U+00CC. Error reporting is important to get right, and it is important to make errors easy to understand, especially for end-users. _Parser_ produces runtime parse error messages that are very similar to the diagnostics that you get when compiling with GCC and Clang (it even supports warnings that don't fail the parse). The exact token associated with a diagnostic can be reported to the user, with the containing line quoted, and with a marker pointing right at the token. _Parser_ takes care of this for you; your parser does not need to include any special code to make this happen. Of course, you can also replace the error handler entirely, if it doesn't fit your needs. Debugging complex parsers can be a real nightmare. _Parser_ makes it trivial to get a trace of your entire parse, with easy-to-read (and very verbose) indications of where each part of the trace is within the parse, the state of values produced by the parse, etc. Again, you don't need to write any code to make this happen _emdash_ you just pass a parameter to _p_. Dependencies are still a nightmare in C++, so _Parser_ can be used as a purely standalone library, independent of Boost. [endsect] [section Configuration and Optional Features] _Parser_ can be used entirely on its own. If Boost is available, extra functionality provided by Boost is also available. To use _Parser_ entirely on its own, simply define `BOOST_PARSER_DISABLE_HANA_TUPLE`. This will force _std_tup_ to be the tuple-template used throughout _Parser_. The Boost.Hana tuple is much nicer, because it has an `operator[]`; you will see this operator used throughout the tutorial and examples. [important _Parser_ defines a template alias _bp_tup_ that aliases to _bh_tup_ by default, and _std_tup_ when `BOOST_PARSER_DISABLE_HANA_TUPLE` is defined. You can future-proof your code slightly by using _bp_tup_, so that the code is well-formed, whether or not `BOOST_PARSER_DISABLE_HANA_TUPLE` is defined. For the same reason, _Parser_ also provides a generic _bp_get_ that works with both kinds of tuple (since _std_tup_ has no `operator[]` and _bh_tup_ does not work with `std::get`).] The presence of Boost headers is detected using `__has_include()`. When it is present, all the typical Boost conventions are used; otherwise, non-Boost alternatives are used. This applies to the use of `BOOST_ASSERT` versus `assert`, and printing typenames with Boost.TypeIndex versus with `std::typeinfo`. _Parser_ automatically treats aggregate `struct`s as if they were tuples in many cases. There is some metaprogramming logic that makes this work, and this logic has a hard limit on the size of a `struct` that it can operate on. There is a configuration macro _AGGR_SIZE_ that you can adjust if the default value is too small. Note that turning this value up significantly can significantly increase compile times. Also, MSVC seems to have a hard time with large values; I successfully set this value to `50` on MSVC, but `100` broke the MSVC build entirely. _Parser_ uses `std::optional` internally. There is no way to change this. However, when _Parser_ generates values as a result of the parse (see _attr_gen_), it can place them into other implementations of optional, if you tell it to do so. You tell it a template is usable as an optional by specializing the `enable_optional` template. For instance, here is how you would tell _Parser_ that `boost::optional` is an optional-type: template constexpr bool boost::parser::enable_optional> = true; The requirements on a template used as an optional are pretty simple, since _Parser_ does almost nothing but assign to them. For a type `O` to be a usable optional, you must be able to assign to `O`, and `O` must have an `operator*` that returns the stored value, or a (possibly cv-qualified) reference to the stored value. [endsect] [section This Library's Relationship to Boost.Spirit] [note If you are familiar with Spirit 2 and/or Spirit X3, you may be interested in this section. If you are not, and you have not read the tutorial for _Parser_ yet, very little of this will make sense.] _Spirit_ is a library that is already in Boost, and it has been around for a long time. However, it does not suit user needs in some ways. * Spirit 2 suffers from very long compile times. * Spirit 2 has error reporting that requires a lot of user intervention to work. * Spirit 2 requires user intervention, including a (long) recompile, to enable parse tracing. * Spirit X3 has rules that do not compose well _emdash_ the attributes produced by a rule can change depending on the context in which you use the rule. * Spirit X3 is missing many of the convenient interfaces to parsers that Spirit 2 had. For instance, you cannot add parameters to a parser. * All versions of Spirit have Unicode support, but it is quite difficult to get working. I wanted a library that does not suffer from any of the above limitations. It should be noted that while Spirit X3 only has a couple of flaws in the list above, the one related to rules is a deal-breaker. The ability to write rules, test them in isolation, and then re-use them throughout a complex parser is essential. Though no version of _Spirit_ (Spirit 2 or Spirit X3) suffers from all those limitations, there also does not exist any one version that avoids all of them. _Parser_ does so. However, there are a lot of great ideas in _Spirit_ that have been retained in _Parser_. Both libraries: * use the same operator overloads to combine parsers; * use approximately the same set of directives to influence the parse (e.g. `lexeme[]`); * provide loosely-coupled rules that are separately compilable (at least for Spirit X3); and * are built around a flexible parse context object that has state added to and removed from it during the parse (again, comparing to Spirit X3). [endsect]