<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Introduction</title>
<link rel="stylesheet" href="../boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Chapter 1. Boost.Parser">
<link rel="up" href="../index.html" title="Chapter 1. Boost.Parser">
<link rel="prev" href="../index.html" title="Chapter 1. Boost.Parser">
<link rel="next" href="configuration_and_optional_features.html" title="Configuration and Optional Features">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<div class="spirit-nav">
<a accesskey="p" href="../index.html"><img src="../images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../images/home.png" alt="Home"></a><a accesskey="n" href="configuration_and_optional_features.html"><img src="../images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="boost_parser.introduction"></a><a class="link" href="introduction.html" title="Introduction">Introduction</a>
</h2></div></div></div>
<p>
Boost.Parser is a <a href="https://en.wikipedia.org/wiki/Parser_combinator" target="_top">parser
combinator</a> library. That is, it consists of a set of low-level primitive
parsers, and operations that can be used to combine those parsers into more
complicated parsers.
</p>
<p>
There are primitive parsers that parse <span class="emphasis"><em>epsilon</em></span> (the empty
string), <code class="computeroutput"><span class="keyword">char</span></code>s, <code class="computeroutput"><span class="keyword">int</span></code>s, <code class="computeroutput"><span class="keyword">float</span></code>s,
etc.
</p>
<p>
There are operations that combine parsers to create new parsers. For instance,
the <a href="https://en.wikipedia.org/wiki/Kleene_star" target="_top">Kleene star</a>
operation takes an existing parser <code class="computeroutput"><span class="identifier">p</span></code>
and creates a new parser that matches zero or more occurrences of whatever
<code class="computeroutput"><span class="identifier">p</span></code> matches. The combining operations
are provided both as operator overloads and as callable objects. For instance,
<code class="computeroutput"><span class="keyword">operator</span><span class="special">*()</span></code>
is used for <a href="https://en.wikipedia.org/wiki/Kleene_star" target="_top">Kleene star</a>,
and you can also write <code class="computeroutput"><span class="identifier">repeat</span><span class="special">(</span><span class="identifier">n</span><span class="special">)[</span><span class="identifier">p</span><span class="special">]</span></code> to create
a parser for exactly <code class="computeroutput"><span class="identifier">n</span></code> repetitions
of <code class="computeroutput"><span class="identifier">p</span></code>.
</p>
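<p>
To make the idea of combining parsers concrete, here is a minimal, self-contained
sketch of two combinators written against the standard library only. It is a toy
model and not Boost.Parser's implementation or API: the names <code class="computeroutput">toy_parser</code>,
<code class="computeroutput">toy_char</code>, <code class="computeroutput">toy_star</code> and <code class="computeroutput">toy_repeat</code>
are hypothetical, and the "parser" here only reports how far it matched rather than
producing an attribute.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Toy model (an assumption for illustration, not Boost.Parser's design):
// a parser takes the input and a start offset, and returns the offset just
// past the match, or std::nullopt if it does not match.
using toy_parser =
    std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

// Primitive parser: matches exactly one occurrence of the character c.
inline toy_parser toy_char(char c) {
    return [c](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        assert(pos <= in.size());
        if (pos < in.size() && in[pos] == c) return pos + 1;
        return std::nullopt;
    };
}

// Kleene star: applies p zero or more times. It never fails; matching
// nothing is a valid result.
inline toy_parser toy_star(toy_parser p) {
    return [p](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        while (auto next = p(in, pos)) {
            if (*next == pos) break;  // p consumed nothing; stop instead of looping forever
            pos = *next;
        }
        return pos;
    };
}

// Exactly n repetitions of p; fails as soon as one repetition fails.
inline toy_parser toy_repeat(std::size_t n, toy_parser p) {
    return [n, p](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        for (std::size_t i = 0; i < n; ++i) {
            auto next = p(in, pos);
            if (!next) return std::nullopt;
            pos = *next;
        }
        return pos;
    };
}
```

<p>
With these definitions, <code class="computeroutput">toy_star(toy_char('a'))</code> applied to
<code class="computeroutput">"aaab"</code> at offset 0 stops at offset 3, while
<code class="computeroutput">toy_repeat(4, toy_char('a'))</code> fails on the same input because only
three repetitions are available.
</p>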
<p>
Boost.Parser also tries to accommodate the multiple ways that people often
want to get a parse result out of their parsing code. Some parsing may best
be done by returning an object that represents the result of the parse. Other
parsing may best be done by filling in a preexisting data structure. Yet other
parsing may best be done by parsing small sections of a large document, and
reporting the results of subparsers as they are finished, via callbacks. Boost.Parser
accommodates all these ways of working, and even makes it possible to switch between
callback-based and non-callback-based parsing without rewriting any code, other than
changing the top-level call from <code class="computeroutput"><a class="link" href="../boost/parser/parse_id2.html" title="Function template parse">parse()</a></code>
to <code class="computeroutput"><a class="link" href="../boost/parser/callback_parse_id6.html" title="Function template callback_parse">callback_parse()</a></code>.
</p>
<p>
All of Boost.Parser's public interfaces are sentinel- and range-friendly, just
like the interfaces in <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">ranges</span></code>.
</p>
<p>
Boost.Parser is Unicode-aware through and through. When you parse ranges of
<code class="computeroutput"><span class="keyword">char</span></code>, Boost.Parser does not assume
any particular encoding, whether Unicode or anything else. Parsing
of inputs <span class="bold"><strong>other than</strong></span> plain <code class="computeroutput"><span class="keyword">char</span></code>s
assumes that the input is Unicode. In the Unicode-aware code paths, all parsing
is done by matching code points. This means that you can feed UTF-8 strings
into Boost.Parser, both as input and within your parser, and the right sort
of matching occurs. For instance, if your parser is trying to match repetitions
of the <code class="computeroutput"><span class="keyword">char</span></code> <code class="computeroutput"><span class="char">'\xcc'</span></code>
(which is a lead byte from a UTF-8 sequence, and so is malformed UTF-8 if not
followed by an appropriate UTF-8 code unit), it will <span class="bold"><strong>not</strong></span>
match the start of <code class="computeroutput"><span class="string">"\xcc\x80"</span></code>
(UTF-8 for the code point U+0300). Boost.Parser knows that the matching must
be whole-code-point, and so it interprets the <code class="computeroutput"><span class="keyword">char</span></code>
<code class="computeroutput"><span class="char">'\xcc'</span></code> as the code point U+00CC.
</p>
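<p>
The difference between byte-wise and whole-code-point matching can be sketched
without the library. The following is only an illustration under stated assumptions:
<code class="computeroutput">toy_decode_utf8</code> and <code class="computeroutput">toy_count_leading</code>
are hypothetical helpers, the decoder is deliberately minimal (malformed or truncated
sequences become U+FFFD, and overlong forms and surrogates are not rejected), and none
of this reflects how Boost.Parser transcodes internally.
</p>

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: decode UTF-8 bytes into code points.
// Minimal sketch only: malformed or truncated sequences yield U+FFFD,
// and overlong encodings and surrogates are not checked.
inline std::vector<char32_t> toy_decode_utf8(std::string_view s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        const auto b = static_cast<unsigned char>(s[i]);
        // Sequence length implied by the lead byte (0 means "not a lead byte").
        const std::size_t len = b < 0x80          ? 1
                                : (b >> 5) == 0x06 ? 2
                                : (b >> 4) == 0x0E ? 3
                                : (b >> 3) == 0x1E ? 4
                                                   : 0;
        if (len == 0 || i + len > s.size()) {
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        // Payload bits of the lead byte: 7, 5, 4 or 3 bits depending on len.
        char32_t cp = len == 1 ? b : b & (0xFF >> (len + 1));
        bool ok = true;
        for (std::size_t k = 1; k < len; ++k) {
            const auto c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) {
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical helper: count how many leading code points of utf8 equal target.
inline std::size_t toy_count_leading(std::string_view utf8, char32_t target) {
    std::size_t n = 0;
    for (char32_t cp : toy_decode_utf8(utf8)) {
        if (cp != target) break;
        ++n;
    }
    return n;
}
```

<p>
Compared as raw bytes, the first byte of <code class="computeroutput">"\xcc\x80"</code> equals
<code class="computeroutput">'\xcc'</code>. Compared as code points,
<code class="computeroutput">toy_count_leading("\xcc\x80", char32_t{0x00CC})</code> finds no match,
because the two bytes decode to the single code point U+0300; U+00CC itself is encoded
in UTF-8 as <code class="computeroutput">"\xc3\x8c"</code>.
</p>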
<p>
Error reporting is important to get right, and it is important to make errors
easy to understand, especially for end-users. Boost.Parser produces runtime
parse error messages that are very similar to the diagnostics that you get
when compiling with GCC or Clang (it even supports warnings that don't fail
the parse). The exact token associated with a diagnostic can be reported to
the user, with the containing line quoted, and with a marker pointing right
at the token. Boost.Parser takes care of this for you; your parser does not
need to include any special code to make this happen. Of course, you can also
replace the error handler entirely if it doesn't fit your needs.
</p>
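<p>
As a rough picture of what a compiler-style diagnostic involves, the sketch below
builds a message from an input buffer and a byte offset: a
<code class="computeroutput">file:line:column</code> header, the quoted line, and a caret under
the offending position. The function <code class="computeroutput">toy_diagnostic</code> is hypothetical,
and the exact layout is an assumption modeled loosely on GCC and Clang output; Boost.Parser's
own messages may be formatted differently.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: format a compiler-style diagnostic for the byte at
// `offset` in `text`. The layout is an illustrative assumption, not the
// exact format that Boost.Parser emits.
inline std::string toy_diagnostic(std::string_view filename,
                                  std::string_view text,
                                  std::size_t offset,
                                  std::string_view message) {
    assert(offset <= text.size());

    // Walk up to the offset, tracking the 1-based line number and the
    // start of the line that contains the offset.
    std::size_t line = 1;
    std::size_t line_start = 0;
    for (std::size_t i = 0; i < offset; ++i) {
        if (text[i] == '\n') {
            ++line;
            line_start = i + 1;
        }
    }

    // The quoted line runs up to the next newline or the end of the input.
    std::size_t line_end = text.find('\n', offset);
    if (line_end == std::string_view::npos) line_end = text.size();

    const std::size_t column = offset - line_start;  // 0-based column

    std::string out;
    out.append(filename);
    out += ':';
    out += std::to_string(line);
    out += ':';
    out += std::to_string(column + 1);  // report columns as 1-based
    out += ": error: ";
    out.append(message);
    out += '\n';
    out.append(text.substr(line_start, line_end - line_start));
    out += '\n';
    out.append(column, ' ');  // pad so the caret sits under the offending byte
    out += "^\n";
    return out;
}
```

<p>
The padding here counts bytes, so the caret only lines up for single-byte characters
and when no tabs precede it; a production error handler would need to account for
display width.
</p>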
<p>
Debugging complex parsers can be a real nightmare. Boost.Parser makes it trivial
to get a trace of your entire parse, with easy-to-read (and very verbose) indications
of where each part of the trace is within the parse, the state of values produced
by the parse, etc. Again, you don't need to write any code to make this happen;
you just pass a parameter to <code class="computeroutput"><a class="link" href="../boost/parser/parse_id2.html" title="Function template parse">parse()</a></code>.
</p>
<p>
Dependencies are still a nightmare in C++, so Boost.Parser can be used as a
purely standalone library, independent of Boost.
</p>
</div>
<div class="copyright-footer">Copyright © 2020 T. Zachary Laine<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../index.html"><img src="../images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../images/home.png" alt="Home"></a><a accesskey="n" href="configuration_and_optional_features.html"><img src="../images/next.png" alt="Next"></a>
</div>
</body>
</html>