https://github.com/boostorg/spirit.git

Annotation tutorial example docs
@@ -197,6 +197,7 @@ __version__).
[include tutorial/num_list4.qbk]
[include tutorial/roman.qbk]
[include tutorial/employee.qbk]
[include tutorial/annotation.qbk]
[include tutorial/rexpr.qbk]
[endsect]

doc/x3/tutorial/annotation.qbk
@@ -0,0 +1,393 @@
[/==============================================================================
Copyright (C) 2001-2015 Joel de Guzman
Copyright (C) 2001-2011 Hartmut Kaiser

Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)

I would like to thank Rainbowverse, llc (https://primeorbial.com/)
for sponsoring this work and donating it to the community.
===============================================================================/]

[section Annotations - Decorating the ASTs]

Stop and think about it... We're actually generating ASTs (abstract
syntax trees) in our previous examples. We parsed a single structure and
generated an in-memory representation of it in the form of a struct: the
struct `employee`. If we changed the implementation to parse one or more
employees, the result would be a `std::vector<employee>`. We can go on and
add more hierarchy: teams, departments, corporations. Then we'll have an
AST representation of it all.

This example shows how to annotate the AST with iterator positions, using a
client-supplied `on_success` handler, so that the corresponding source text
can be accessed during post-processing. It also shows how to get the position
in the input stream that corresponds to a given element in the AST.

In addition, this example shows how to "inject" client data into the parser's
context using the `with` directive, so that the `on_success` handler can access
that data when it is called during the parse traversal.

The full cpp file for this example can be found here: [@../../../example/x3/annotation.cpp]

[heading The AST]

First, we'll update our previous employee struct, this time separating
the person into its own struct. So now, we have two structs, the `person`
and the `employee`. Note also that `person` and `employee` now inherit
from `x3::position_tagged`, which provides the positional information we
can use to find an AST node's position in the input stream at any time.

    namespace client { namespace ast
    {
        struct person : x3::position_tagged
        {
            person(
                std::string const& first_name = ""
              , std::string const& last_name = ""
            )
            : first_name(first_name)
            , last_name(last_name)
            {}

            std::string first_name, last_name;
        };

        struct employee : x3::position_tagged
        {
            int age;
            person who;
            double salary;
        };
    }}

Like before, we need to tell __fusion__ about our structs to make them first-class
fusion citizens that the grammar can utilize:

    BOOST_FUSION_ADAPT_STRUCT(client::ast::person,
        first_name, last_name
    )

    BOOST_FUSION_ADAPT_STRUCT(client::ast::employee,
        age, who, salary
    )

[heading x3::position_cache]

Before we proceed, let me introduce a helper class called the `position_cache`.
It is a simple class that collects iterator ranges that point to where each
element in the AST is located in the input stream. Given an AST, you can
ask the `position_cache` about its position. For example:

    auto pos = positions.position_of(my_ast);

Here, `my_ast` is the AST and `positions` is the `position_cache`. `position_of`
returns an iterator range that points to the start and end (`pos.begin()`
and `pos.end()`) of the input from which the AST was parsed. `positions.begin()`
and `positions.end()` point to the start and end of the entire input stream.

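To make the bookkeeping behind `annotate` and `position_of` concrete, here is a minimal, self-contained model in standard C++17. It does not use Spirit: `tagged`, `toy_position_cache`, `input_begin`/`input_end`, `person` and `demo_annotated_text` are hypothetical names, and the design (each tagged node stores two indices into a vector of iterators) is an assumption made for illustration, not a description of X3's actual source.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a position-tagged AST base: each node remembers
// two indices into the cache; -1 means "not annotated yet".
struct tagged
{
    int id_first = -1;
    int id_last = -1;
};

// Hypothetical stand-in for a position cache, parameterized on the container
// of iterators (compare position_cache<std::vector<iterator_type>> above).
template <typename Container>
class toy_position_cache
{
public:
    using iterator_type = typename Container::value_type;

    toy_position_cache(iterator_type first, iterator_type last)
      : first_(first), last_(last) {}

    // Record [first, last) for nodes derived from `tagged`; ignore all others.
    template <typename AST>
    void annotate(AST& ast, iterator_type first, iterator_type last)
    {
        if constexpr (std::is_base_of_v<tagged, AST>)
        {
            ast.id_first = static_cast<int>(positions_.size());
            positions_.push_back(first);
            ast.id_last = static_cast<int>(positions_.size());
            positions_.push_back(last);
        }
    }

    // Look the stored iterators back up. An unannotated node (index -1)
    // converts to a huge index, so at() throws std::out_of_range.
    std::pair<iterator_type, iterator_type> position_of(tagged const& ast) const
    {
        return { positions_.at(static_cast<std::size_t>(ast.id_first)),
                 positions_.at(static_cast<std::size_t>(ast.id_last)) };
    }

    // Start and end of the whole input.
    iterator_type input_begin() const { return first_; }
    iterator_type input_end() const { return last_; }

private:
    iterator_type first_, last_;
    Container positions_;
};

// A hypothetical AST node that opts in to position tracking.
struct person : tagged
{
    std::string name;
};

// Usage sketch: annotate a node with the span of "Amanda" (quotes included)
// and read the source text back from the cache.
inline std::string demo_annotated_text()
{
    std::string const input = "{ 23, \"Amanda\" }";
    using iterator_type = std::string::const_iterator;
    toy_position_cache<std::vector<iterator_type>> positions{input.begin(), input.end()};

    person p;
    iterator_type first = input.begin() + 6;   // the opening quote
    iterator_type last = first + 8;            // one past the closing quote
    positions.annotate(p, first, last);

    auto pos = positions.position_of(p);
    return std::string(pos.first, pos.second);
}
```

The `if constexpr` branch is one way to make `annotate` a harmless no-op for attributes such as `std::string` that carry no position tags; overloading on a true/false tag type would work just as well in C++14.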
[heading on_success]

The `on_success` handler gives you everything you want from semantic actions
without the visual clutter. Declarative code can and should be free from
imperative code. `on_success` as a concept and mechanism is an important
departure from how things were done in Qi, Spirit's previous version.

As demonstrated in the previous employee example, the preferred way to
extract data from an input source is by having the parser collect the data
for us into C++ structs as it traverses the input stream. Ideally, Spirit
X3 grammars are fully attributed and declared in such a way that you do
not have to add any imperative code and there should be no need for semantic
actions at all. The parser simply works as declared and you get your data
back as a result.

However, there are certain cases where there's no way to avoid introducing
imperative code. If we want to keep our grammars clean and declarative, free
from semantic actions that clutter them, `on_success` handlers provide an
alternative: hooks into client code that the parser executes upon a successful
parse, without polluting the grammar. Like semantic actions, `on_success`
handlers have access to the AST, the iterators, and the context. But, unlike
semantic actions, `on_success` handlers are cleanly separated from the actual
grammar.

[heading Annotation Handler]

As discussed, we annotate the AST with its position in the input stream with
our `on_success` handler:

    // tag used to get the position cache from the context
    struct position_cache_tag;

    struct annotate_position
    {
        template <typename T, typename Iterator, typename Context>
        inline void on_success(Iterator const& first, Iterator const& last
        , T& ast, Context const& context)
        {
            auto& position_cache = x3::get<position_cache_tag>(context).get();
            position_cache.annotate(ast, first, last);
        }
    };

`position_cache_tag` is a special tag we will use to get a reference to the
actual `position_cache`, client data that we will inject at the very start,
when we call parse. More on that later.

Our `on_success` handler gets a reference to the actual `position_cache`
and calls its `annotate` member function, passing in the AST and the iterators.
`position_cache.annotate(ast, first, last)` annotates the AST with information
required by `x3::position_tagged`. The trailing `.get()` is needed because
what we inject is a `std::reference_wrapper` (created with `std::ref`), and
`x3::get` hands that wrapper back to us.

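Why inject `std::ref(positions)` rather than `positions` itself? Assuming, as the tutorial's use of `std::ref` suggests, that injected data is kept by value, passing `positions` directly would hand the handler a copy, and the caller's cache would never see the annotations. The standard-C++ sketch below models that with a hypothetical `holder` type and `make_holder` helper (neither is part of X3).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical holder that keeps its injected data by value, the way the
// tutorial's use of std::ref suggests injected data is kept.
template <typename T>
struct holder
{
    T data;
};

template <typename T>
holder<T> make_holder(T data)
{
    return holder<T>{data};
}

// Returns {caller's size after writing through a copy,
//          caller's size after writing through a std::reference_wrapper}.
inline std::pair<std::size_t, std::size_t> demo_ref_injection()
{
    std::vector<int> log;

    auto by_value = make_holder(log);
    by_value.data.push_back(1);                  // changes the holder's copy only
    std::size_t const after_copy = log.size();

    auto by_ref = make_holder(std::ref(log));
    by_ref.data.get().push_back(2);              // .get() yields std::vector<int>&
    return {after_copy, log.size()};
}
```

`demo_ref_injection()` returns `{0, 1}`: only the write made through the `std::reference_wrapper` reaches the caller's vector, which mirrors why the handler calls `.get()` before `annotate`.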
[heading The Parser]

Now we'll write a parser for our employee. Unlike before, there is no
`employee` keyword: each employee is simply enclosed in braces, and the
input is a comma-separated list of one or more of them, each of the form:

    { age, "forename", "surname", salary }

Here we go:

    namespace parser
    {
        using x3::int_;
        using x3::double_;
        using x3::lexeme;
        using ascii::char_;

        struct quoted_string_class;
        struct person_class;
        struct employee_class;

        x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
        x3::rule<person_class, ast::person> const person = "person";
        x3::rule<employee_class, ast::employee> const employee = "employee";

        auto const quoted_string_def = lexeme['"' >> +(char_ - '"') >> '"'];
        auto const person_def = quoted_string >> ',' >> quoted_string;

        auto const employee_def =
                '{'
            >>  int_ >> ','
            >>  person >> ','
            >>  double_
            >>  '}'
            ;

        auto const employees = employee >> *(',' >> employee);

        BOOST_SPIRIT_DEFINE(quoted_string, person, employee);
    }

Take a step back and look at the previous employee example. We are incrementally
building on top of that.

[heading Rule Declarations]

    struct quoted_string_class;
    struct person_class;
    struct employee_class;

    x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
    x3::rule<person_class, ast::person> const person = "person";
    x3::rule<employee_class, ast::employee> const employee = "employee";

What has changed?

* We split the single employee rule into three smaller rules: `quoted_string`,
`person` and `employee`.
* We're using forward declared rule classes: `quoted_string_class`, `person_class`,
and `employee_class`.

[heading Rule Classes]

In this example, the rule classes, `quoted_string_class`, `person_class`, and
`employee_class`, provide the statically known IDs for the rules that X3
requires to perform its tasks. In addition to that, a rule class can also be
extended with user-defined customization hooks that are called:

* On success: After a rule successfully parses an input.
* On error: When an error, such as an expectation failure, occurs while parsing the rule.

These hooks are attached by deriving the rule class from a client-supplied
handler, such as our `annotate_position` handler above:

    struct person_class : annotate_position {};
    struct employee_class : annotate_position {};

With this in place, X3 checks whether a rule class has `on_success` or
`on_error` member functions and calls them appropriately when such events
occur.

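The idea of "check whether the rule class has an `on_success` member, and call it if so" can be illustrated with the standard C++17 detection idiom. This is a hypothetical sketch, not X3's implementation: `has_on_success`, `call_on_success` and `count_successes` are invented names, the two rule classes merely reuse the tutorial's names, and the "context" is just an `int*` standing in for the real parse context.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical detection trait: true if ID has a member
// on_success(first, last, attr, context) callable with these argument types.
template <typename ID, typename Iterator, typename Attr, typename Context,
          typename = void>
struct has_on_success : std::false_type {};

template <typename ID, typename Iterator, typename Attr, typename Context>
struct has_on_success<ID, Iterator, Attr, Context,
    std::void_t<decltype(std::declval<ID&>().on_success(
        std::declval<Iterator const&>(), std::declval<Iterator const&>(),
        std::declval<Attr&>(), std::declval<Context const&>()))>>
  : std::true_type {};

// Hypothetical hook dispatch: what a parser framework might do after the rule
// identified by ID has parsed [first, last) into attr.
template <typename ID, typename Iterator, typename Attr, typename Context>
void call_on_success(Iterator const& first, Iterator const& last,
                     Attr& attr, Context const& context)
{
    if constexpr (has_on_success<ID, Iterator, Attr, Context>::value)
    {
        ID id;
        id.on_success(first, last, attr, context);
    }
    // else: the rule class has no hook, so nothing happens
}

// A hypothetical handler in the style of annotate_position: here the
// "context" is just a pointer to a counter.
struct count_successes
{
    template <typename Iterator, typename Attr, typename Context>
    void on_success(Iterator const&, Iterator const&, Attr&, Context const& context)
    {
        ++*context;
    }
};

struct person_class : count_successes {};   // inherits the hook
struct quoted_string_class {};               // no hook
```

Calling `call_on_success<person_class>(...)` runs the inherited handler, while `call_on_success<quoted_string_class>(...)` compiles to nothing; an `on_error` hook could be detected the same way.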
[heading The with Directive]

With any parser `p`, one can inject data that semantic actions and handlers
can access later on, when they are called. The general syntax is:

    with<tag>(data)[p]

In our example, we use it to inject the `position_cache` into the parse,
so that our `annotate_position` `on_success` handler has access to it:

    auto const parser =
        // we pass our position_cache to the parser so we can access
        // it later in our on_success handlers
        with<position_cache_tag>(std::ref(positions))
        [
            employees
        ];

Typically this is done just before calling `x3::parse` or `x3::phrase_parse`.
`with` is a very lightweight operation. It is possible to inject as much
data as you want, even multiple `with` directives:

    with<tag1>(data1)
    [
        with<tag2>(data2)[p]
    ]

Multiple `with` directives can also (perhaps not obviously) be layered across
function calls: a caller injects some data, and the function it calls injects
more. Here's an outline:

    template <typename Parser>
    void bar(Parser const& p)
    {
        // Inject data2
        auto const parser = with<tag2>(data2)[p];
        x3::parse(first, last, parser);
    }

    void foo()
    {
        // Inject data1
        auto const parser = with<tag1>(data1)[my_parser];
        bar(parser);
    }

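To see how data injected at different levels can coexist, consider this hypothetical standard-C++17 model of a context as a chain of (tag, reference) nodes that is searched by tag. It is only loosely analogous to how X3 threads injected data through its context; `empty_context`, `context_node`, `with_`, `lookup`, `tag1`, `tag2` and `demo_nested_lookup` are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical model of a parse context: a chain of (Tag, value) nodes.
struct empty_context {};

template <typename Tag, typename T, typename Next>
struct context_node
{
    T& value;           // the injected data, held by reference
    Next const& next;   // the enclosing (outer) context
};

// Hypothetical with_<Tag>(data, ctx): push a new node in front of ctx.
template <typename Tag, typename T, typename Next>
context_node<Tag, T, Next> with_(T& value, Next const& next)
{
    return {value, next};
}

// lookup<Tag>: this node carries the requested tag, so return its value...
template <typename Tag, typename T, typename Next>
T& lookup(context_node<Tag, T, Next> const& ctx)
{
    return ctx.value;
}

// ...otherwise keep walking outward. Asking for a tag that was never
// injected fails at compile time once the chain reaches empty_context.
template <typename Wanted, typename Tag, typename T, typename Next,
          typename = std::enable_if_t<!std::is_same_v<Wanted, Tag>>>
auto& lookup(context_node<Tag, T, Next> const& ctx)
{
    return lookup<Wanted>(ctx.next);
}

struct tag1;
struct tag2;

// Mirrors foo()/bar(): the caller injects data1, the callee adds data2, and
// code deep inside the parse can still reach both.
inline std::string demo_nested_lookup()
{
    std::string data1 = "outer";
    int data2 = 42;

    empty_context const root{};
    auto const ctx1 = with_<tag1>(data1, root);
    auto const ctx2 = with_<tag2>(data2, ctx1);

    lookup<tag2>(ctx2) += 1;   // handlers may also modify injected data
    return lookup<tag1>(ctx2) + ":" + std::to_string(data2);
}
```

`demo_nested_lookup()` returns `"outer:43"`: the innermost node answers for `tag2`, and the lookup for `tag1` falls through to the outer node.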
[heading Let's Parse]

Now we have the complete parse mechanism:

    using iterator_type = std::string::const_iterator;
    using position_cache = boost::spirit::x3::position_cache<std::vector<iterator_type>>;

    std::vector<client::ast::employee>
    parse(std::string const& input, position_cache& positions)
    {
        using boost::spirit::x3::ascii::space;

        std::vector<client::ast::employee> ast;
        iterator_type iter = input.begin();
        iterator_type const end = input.end();

        using boost::spirit::x3::with;

        // Our parser
        using client::parser::employees;
        using client::parser::position_cache_tag;

        auto const parser =
            // we pass our position_cache to the parser so we can access
            // it later in our on_success handlers
            with<position_cache_tag>(std::ref(positions))
            [
                employees
            ];

        bool r = phrase_parse(iter, end, parser, space, ast);

        // ... Some error checking here

        return ast;
    }

Let's walk through the code.

First, we have some type aliases for 1) the iterator type we are using for the
parser, `iterator_type`, and 2) the `position_cache` type. The latter
is a template that accepts the type of container it will hold; in this case,
a `std::vector<iterator_type>`.

The main parse function accepts the input, a `std::string`, and a reference
to a `position_cache`, and returns an AST: `std::vector<client::ast::employee>`.

Inside the parse function, we first create an AST where parsed data will
be stored:

    std::vector<client::ast::employee> ast;

Finally, we create a parser, injecting a reference to the `position_cache`,
and call `phrase_parse`:

    using client::parser::employees;
    using client::parser::position_cache_tag;

    auto const parser =
        // we pass our position_cache to the parser so we can access
        // it later in our on_success handlers
        with<position_cache_tag>(std::ref(positions))
        [
            employees
        ];

    bool r = phrase_parse(iter, end, parser, space, ast);

On a successful parse, the AST, `ast`, will contain the actual parsed data.

[heading Getting The Source Positions]

Now that we have our main parse function, let's parse some example input
and show how we can obtain the position of an AST element returned by a
successful parse.

Given this input:

    std::string input = R"(
    {
        23,
        "Amanda",
        "Stefanski",
        1000.99
    },
    {
        35,
        "Angie",
        "Chilcote",
        2000.99
    },
    {
        43,
        "Dannie",
        "Dillinger",
        3000.99
    },
    {
        22,
        "Dorene",
        "Dole",
        2500.99
    },
    {
        38,
        "Rossana",
        "Rafferty",
        5000.99
    }
    )";

We call our parse function after instantiating a `position_cache` object
that will hold the source stream positions:

    position_cache positions{input.begin(), input.end()};
    auto ast = parse(input, positions);

We now have an AST, `ast`, that contains the parsed results. Let us get
the source positions of the 2nd employee:

    auto pos = positions.position_of(ast[1]); // zero based of course!

`pos` is an iterator range that contains iterators to the start and
end of `ast[1]` in the input stream.

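Once `pos` is available, a common next step is turning its start into a line and column for diagnostics. The helper below is a hypothetical standard-C++ sketch (`line_and_column` is not part of X3). It assumes only two iterators: the start of the whole input, for which the `positions.begin()` accessor described above could serve, and the start of the element's range, `pos.begin()`.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <utility>

// Hypothetical helper: 1-based (line, column) of `pos`, counting from `first`
// (the start of the whole input). Lines are separated by '\n'.
template <typename Iterator>
std::pair<std::size_t, std::size_t> line_and_column(Iterator first, Iterator pos)
{
    std::size_t line = 1;
    Iterator line_start = first;
    for (Iterator it = first; it != pos; ++it)
    {
        if (*it == '\n')
        {
            ++line;
            line_start = std::next(it);   // the next line starts after '\n'
        }
    }
    auto const column = static_cast<std::size_t>(std::distance(line_start, pos)) + 1;
    return {line, column};
}
```

With the tutorial's names, a call might look like `line_and_column(positions.begin(), pos.begin())`, and `std::string(pos.begin(), pos.end())` would recover the element's source text.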
[endsect]

@@ -27,8 +27,8 @@ First, let's create a struct representing an employee:
        struct employee
        {
            int age;
            std::string surname;
            std::string forename;
            std::string surname;
            double salary;
        };
    }}
@@ -44,12 +44,12 @@ to be a fully conforming fusion tuple:

    BOOST_FUSION_ADAPT_STRUCT(
        client::ast::employee,
        age, surname, forename, salary
        age, forename, surname, salary
    )

Now we'll write a parser for our employee. Inputs will be of the form:

    employee{ age, "surname", "forename", salary }
    employee{ age, "forename", "surname", salary }

Here goes:

@@ -5,19 +5,11 @@
file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]

[section:rexpr RExpressions - ASTs!]
[section:rexpr RExpressions - Recursive ASTs!]

Stop and think about it... We're actually generating ASTs (abstract
syntax trees) in our previoius examples. We parsed a single structure and
generated an in-memory representation of it in the form of a struct: the struct
employee. If we changed the implementation to parse one or more employees, the
result would be a std::vector<employee>. We can go on and add more hierarchy:
teams, departments, corporations. Then we'll have an AST representation of it
all.

In this example, we'll explore more on how to create ASTs. We will parse a
minimalistic JSON-like language and compile the results into our data structures
in the form of a tree.
In this example, we'll explore more on how to create hierarchical ASTs.
We will parse a minimalistic JSON-like language and compile the results
into our data structures in the form of a tree.

/rexpr/ is a parser for RExpressions, a language resembling a minimal subset
of json, limited to a dictionary (composed of key=value pairs) where the value
@@ -142,7 +142,7 @@ namespace client
// Our main parse entry point
///////////////////////////////////////////////////////////////////////////////

typedef std::string::const_iterator iterator_type;
using iterator_type = std::string::const_iterator;
using position_cache = boost::spirit::x3::position_cache<std::vector<iterator_type>>;

std::vector<client::ast::employee>