<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Introduction</title>
<link rel="stylesheet" href="../boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Chapter 1. Boost.Parser">
<link rel="up" href="../index.html" title="Chapter 1. Boost.Parser">
<link rel="prev" href="../index.html" title="Chapter 1. Boost.Parser">
<link rel="next" href="configuration_and_optional_features.html" title="Configuration and Optional Features">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<div class="spirit-nav">
<a accesskey="p" href="../index.html"><img src="../images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../images/home.png" alt="Home"></a><a accesskey="n" href="configuration_and_optional_features.html"><img src="../images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="boost_parser.introduction"></a><a class="link" href="introduction.html" title="Introduction">Introduction</a>
</h2></div></div></div>
<p>
Boost.Parser is a <a href="https://en.wikipedia.org/wiki/Parser_combinator" target="_top">parser
combinator</a> library. That is, it consists of a set of low-level primitive
parsers, and operations that can be used to combine those parsers into more
complicated parsers.
</p>
<p>
There are primitive parsers that parse <span class="emphasis"><em>epsilon</em></span> (the empty
string), <code class="computeroutput"><span class="keyword">char</span></code>s, <code class="computeroutput"><span class="keyword">int</span></code>s, <code class="computeroutput"><span class="keyword">float</span></code>s,
etc.
</p>
<p>
There are also operations that combine parsers to create new parsers. For instance,
the <a href="https://en.wikipedia.org/wiki/Kleene_star" target="_top">Kleene star</a>
operation takes an existing parser <code class="computeroutput"><span class="identifier">p</span></code>
and creates a new parser that matches zero or more occurrences of whatever
<code class="computeroutput"><span class="identifier">p</span></code> matches. Both callable objects
and operator overloads are used for the combining operations. For instance,
<code class="computeroutput"><span class="keyword">operator</span><span class="special">*()</span></code>
is used for <a href="https://en.wikipedia.org/wiki/Kleene_star" target="_top">Kleene star</a>,
and you can also write <code class="computeroutput"><span class="identifier">repeat</span><span class="special">(</span><span class="identifier">n</span><span class="special">)[</span><span class="identifier">p</span><span class="special">]</span></code> to create
a parser for exactly <code class="computeroutput"><span class="identifier">n</span></code> repetitions
of <code class="computeroutput"><span class="identifier">p</span></code>.
</p>
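<p>
The idea of combining parsers can be illustrated without the library at all. The
following is a minimal sketch in plain standard C++, not Boost.Parser's implementation
or API: the names <code class="computeroutput">parser</code>, <code class="computeroutput">ch</code>,
<code class="computeroutput">star</code> and <code class="computeroutput">repeat</code> are hypothetical
and exist only for this illustration. A parser here is assumed to be a function from
an input and a position to the position just past the match.
</p>

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>

// Hypothetical parser type for this sketch (not Boost.Parser's): given the
// input and a start position, return the position just past the match, or
// std::nullopt if there is no match.
using parser =
    std::function<std::optional<std::size_t>(std::string_view, std::size_t)>;

// Primitive parser: matches exactly one occurrence of the character c.
parser ch(char c) {
    return [c](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        if (pos < in.size() && in[pos] == c) return pos + 1;
        return std::nullopt;
    };
}

// Kleene star: matches zero or more occurrences of p, so it never fails.
parser star(parser p) {
    return [p](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        while (auto next = p(in, pos)) {
            if (*next == pos) break; // stop if p matched without consuming input
            pos = *next;
        }
        return pos;
    };
}

// Repetition: matches exactly n consecutive occurrences of p.
parser repeat(std::size_t n, parser p) {
    return [n, p](std::string_view in, std::size_t pos) -> std::optional<std::size_t> {
        for (std::size_t i = 0; i < n; ++i) {
            auto next = p(in, pos);
            if (!next) return std::nullopt;
            pos = *next;
        }
        return pos;
    };
}
```

<p>
In this sketch, <code class="computeroutput">star(ch('a'))</code> applied to
<code class="computeroutput">"aaab"</code> consumes the three leading
<code class="computeroutput">'a'</code>s, while <code class="computeroutput">repeat(4, ch('a'))</code>
fails on the same input.
</p>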
<p>
Boost.Parser also tries to accommodate the multiple ways that people often
want to get a parse result out of their parsing code. Some parsing may best
be done by returning an object that represents the result of the parse. Other
parsing may best be done by filling in a preexisting data structure. Yet other
parsing may best be done by parsing small sections of a large document, and
reporting the results of subparsers as they are finished, via callbacks. Boost.Parser
accommodates all these ways of working, and even makes it possible to do callback-based
or non-callback-based parsing without rewriting any code (except by changing
the top-level call from <code class="computeroutput"><a class="link" href="../boost/parser/parse_id2.html" title="Function template parse">parse()</a></code>
to <code class="computeroutput"><a class="link" href="../boost/parser/callback_parse_id6.html" title="Function template callback_parse">callback_parse()</a></code>).
</p>
<p>
All of Boost.Parser's public interfaces are sentinel- and range-friendly, just
like the interfaces in <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">ranges</span></code>.
</p>
<p>
Boost.Parser is Unicode-aware through and through. When you parse ranges of
<code class="computeroutput"><span class="keyword">char</span></code>, Boost.Parser does not assume
any particular encoding, Unicode or otherwise. Parsing
of inputs <span class="bold"><strong>other than</strong></span> plain <code class="computeroutput"><span class="keyword">char</span></code>s
assumes that the input is Unicode. In the Unicode-aware code paths, all parsing
is done by matching code points. This means that you can feed UTF-8 strings
into Boost.Parser, both as input and within your parser, and they are compared
code point by code point rather than byte by byte. For instance, if your parser is trying to match repetitions
of the <code class="computeroutput"><span class="keyword">char</span></code> <code class="computeroutput"><span class="char">'\xcc'</span></code>
(a UTF-8 lead byte, which is malformed UTF-8 unless it is
followed by an appropriate UTF-8 continuation byte), it will <span class="bold"><strong>not</strong></span>
match the start of <code class="computeroutput"><span class="string">"\xcc\x80"</span></code>
(UTF-8 for the code point U+0300). Boost.Parser knows that the matching must
be whole-code-point, and so it interprets the <code class="computeroutput"><span class="keyword">char</span></code>
<code class="computeroutput"><span class="char">'\xcc'</span></code> as the code point U+00CC.
</p>
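<p>
The difference between byte-wise and whole-code-point matching can be sketched in plain
standard C++. This is only an illustration of the behavior described above, not the
library's code: <code class="computeroutput">first_code_point</code>,
<code class="computeroutput">matches_code_point</code> and
<code class="computeroutput">matches_byte</code> are hypothetical helpers, and the decoder
assumes non-empty, well-formed UTF-8 and does no validation.
</p>

```cpp
#include <cstddef>
#include <string_view>

// Decodes the first code point of s. Assumption for this sketch: s is
// non-empty, well-formed UTF-8; no validation is performed.
char32_t first_code_point(std::string_view s) {
    auto byte = [&](std::size_t i) {
        return static_cast<char32_t>(static_cast<unsigned char>(s[i]));
    };
    char32_t b0 = byte(0);
    if (b0 < 0x80) return b0;                      // 1-byte sequence (ASCII)
    if ((b0 >> 5) == 0x6)                          // 2-byte sequence
        return ((b0 & 0x1F) << 6) | (byte(1) & 0x3F);
    if ((b0 >> 4) == 0xE)                          // 3-byte sequence
        return ((b0 & 0x0F) << 12) | ((byte(1) & 0x3F) << 6) | (byte(2) & 0x3F);
    return ((b0 & 0x07) << 18) | ((byte(1) & 0x3F) << 12) |  // 4-byte sequence
           ((byte(2) & 0x3F) << 6) | (byte(3) & 0x3F);
}

// Whole-code-point match: the char in the parser is interpreted as the code
// point with the same numeric value (so '\xcc' means U+00CC).
bool matches_code_point(char c, std::string_view input) {
    return !input.empty() &&
           first_code_point(input) ==
               static_cast<char32_t>(static_cast<unsigned char>(c));
}

// Byte-wise match, shown only for contrast.
bool matches_byte(char c, std::string_view input) {
    return !input.empty() && input[0] == c;
}
```

<p>
With these helpers, <code class="computeroutput">matches_byte('\xcc', "\xcc\x80")</code> is true,
but <code class="computeroutput">matches_code_point('\xcc', "\xcc\x80")</code> is false, because
the input decodes to U+0300. The code-point match does succeed against
<code class="computeroutput">"\xc3\x8c"</code>, the UTF-8 encoding of U+00CC.
</p>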
<p>
Error reporting is important to get right, and errors should be easy to understand,
especially for end users. Boost.Parser produces runtime parse error messages
that closely resemble the diagnostics you get when compiling with GCC or Clang
(it even supports warnings that don't fail
the parse). The exact token associated with a diagnostic can be reported to
the user, with the containing line quoted, and with a marker pointing right
at the token. Boost.Parser takes care of this for you; your parser does not
need to include any special code to make this happen. Of course, you can also
replace the error handler entirely, if it doesn't fit your needs.
</p>
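<p>
To make the shape of such a diagnostic concrete, here is a small sketch in plain standard
C++ that quotes the offending line and places a marker under the failing position. It is
not Boost.Parser's error handler: the function
<code class="computeroutput">format_diagnostic</code>, its parameters, and the exact message
layout are assumptions made for this illustration, and it assumes the position lies within
the input and that the line contains no tabs.
</p>

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: builds a compiler-style diagnostic for the byte offset
// pos in input. Layout (an assumption of this sketch):
//   <file>:<line>:<column>: error: <message>
//   <the line containing pos>
//   <spaces up to the column>^
std::string format_diagnostic(std::string_view file, std::string_view input,
                              std::size_t pos, std::string_view message) {
    // Walk up to pos, tracking the line number and where that line starts.
    std::size_t line = 1;
    std::size_t line_start = 0;
    for (std::size_t i = 0; i < pos && i < input.size(); ++i) {
        if (input[i] == '\n') {
            ++line;
            line_start = i + 1;
        }
    }

    // The quoted line ends at the next newline or at the end of the input.
    std::size_t line_end = input.find('\n', line_start);
    if (line_end == std::string_view::npos) line_end = input.size();

    std::size_t column = pos - line_start; // zero-based offset within the line

    std::string out;
    out.append(file).append(":").append(std::to_string(line)).append(":");
    out.append(std::to_string(column + 1)).append(": error: ");
    out.append(message).append("\n");
    out.append(input.substr(line_start, line_end - line_start)).append("\n");
    out.append(column, ' ').append("^\n"); // marker under the failing position
    return out;
}
```

<p>
For example, calling it with the input <code class="computeroutput">"a = 1\nb = ?\n"</code> and
offset 10 reports line 2, column 5, quotes the line <code class="computeroutput">b = ?</code>,
and puts the marker under the <code class="computeroutput">?</code>.
</p>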
<p>
Debugging complex parsers can be a real nightmare. Boost.Parser makes it trivial
to get a trace of your entire parse, with easy-to-read (and very verbose) indications
of where each part of the trace is within the parse, the state of values produced
by the parse, etc. Again, you don't need to write any code to make this happen;
you just pass a parameter to <code class="computeroutput"><a class="link" href="../boost/parser/parse_id2.html" title="Function template parse">parse()</a></code>.
</p>
<p>
Dependencies are still a nightmare in C++, so Boost.Parser can be used as a
purely standalone library, independent of Boost.
</p>
</div>
<div class="copyright-footer">Copyright © 2020 T. Zachary Laine<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../index.html"><img src="../images/prev.png" alt="Prev"></a><a accesskey="u" href="../index.html"><img src="../images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../images/home.png" alt="Home"></a><a accesskey="n" href="configuration_and_optional_features.html"><img src="../images/next.png" alt="Next"></a>
</div>
</body>
</html>