Annotation tutorial example docs

2026-01-19 04:42:11 +00:00 · 2018-03-07 14:05:01 +08:00
parent a0df3c098f
commit 130e6395d7
5 changed files with 402 additions and 16 deletions
--- a/doc/x3/spirit_x3.qbk
+++ b/doc/x3/spirit_x3.qbk
@@ -197,6 +197,7 @@ __version__).
 [include        tutorial/num_list4.qbk]
 [include        tutorial/roman.qbk]
 [include        tutorial/employee.qbk]
+[include        tutorial/annotation.qbk]
 [include        tutorial/rexpr.qbk]
 [endsect]

--- a/doc/x3/tutorial/annotation.qbk
+++ b/doc/x3/tutorial/annotation.qbk
@@ -0,0 +1,393 @@
+[/==============================================================================
+    Copyright (C) 2001-2015 Joel de Guzman
+    Copyright (C) 2001-2011 Hartmut Kaiser
+
+    Distributed under the Boost Software License, Version 1.0. (See accompanying
+    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
+
+    I would like to thank Rainbowverse, llc (https://primeorbial.com/)
+    for sponsoring this work and donating it to the community.
+===============================================================================/]
+
+[section Annotations - Decorating the ASTs]
+
+Stop and think about it... We're actually generating ASTs (abstract
+syntax trees) in our previous examples. We parsed a single structure and
+generated an in-memory representation of it in the form of a struct: the
+struct employee. If we changed the implementation to parse one or more
+employees, the result would be a std::vector<employee>. We can go on and
+add more hierarchy: teams, departments, corporations. Then we'll have an
+AST representation of it all.
+
+This example shows how to annotate the AST with the iterator positions
+for access to the source code when post-processing using a client-supplied
+`on_success` handler. The example will show how to get the position in
+the input source stream that corresponds to a given element in the AST.
+
+In addition, this example shows how to "inject" client data, using
+the "with" directive, that the `on_success` handler can access as it is
+called within the parse traversal through the parser's context.
+
+The full cpp file for this example can be found here: [@../../../example/x3/annotation.cpp]
+
+[heading The AST]
+
+First, we'll update our previous employee struct, this time separating
+the person into its own struct. So now, we have two structs, the `person`
+and the `employee`. Take note too that we now inherit `person` and `employee`
+from `x3::position_tagged` which provides positional information that we
+can use to tell the AST's position in the input stream at any time.
+
+    namespace client { namespace ast
+    {
+        struct person : x3::position_tagged
+        {
+            person(
+                std::string const& first_name = ""
+              , std::string const& last_name = ""
+            )
+            : first_name(first_name)
+            , last_name(last_name)
+            {}
+
+            std::string first_name, last_name;
+        };
+
+        struct employee : x3::position_tagged
+        {
+            int age;
+            person who;
+            double salary;
+        };
+    }}
+
+Like before, we need to tell __fusion__ about our structs to make them first-class
+fusion citizens that the grammar can utilize:
+
+    BOOST_FUSION_ADAPT_STRUCT(client::ast::person,
+        first_name, last_name
+    )
+
+    BOOST_FUSION_ADAPT_STRUCT(client::ast::employee,
+        age, who, salary
+    )
+
+[heading x3::position_cache]
+
+Before we proceed, let me introduce a helper class called the `position_cache`.
+It is a simple class that collects iterator ranges that point to where each
+element in the AST is located in the input stream. Given an AST, you can
+ask the position_cache about its position. For example:
+
+    auto pos = positions.position_of(my_ast);
+
+Where `my_ast` is the AST and `positions` is the `position_cache`, `position_of`
+returns an iterator range that points to the start and end (`pos.begin()`
+and `pos.end()`) positions where the AST was parsed from. `positions.begin()`
+and `positions.end()` point to the start and end of the entire input stream.
+
+[heading on_success]
+
+The `on_success` handler gives you everything you want from semantic actions without
+the visual clutter. Declarative code can and should be free from imperative
+code. `on_success` as a concept and mechanism is an important departure
+from how things are done in Spirit's previous version: Qi.
+
+As demonstrated in the previous employee example, the preferred way to
+extract data from an input source is by having the parser collect the data
+for us into C++ structs as it traverses the input stream. Ideally, Spirit
+X3 grammars are fully attributed and declared in such a way that you do
+not have to add any imperative code and there should be no need for semantic
+actions at all. The parser simply works as declared and you get your data
+back as a result.
+
+However, there are certain cases where there's no way to avoid introducing
+imperative code. Yet, if we want to keep our code clean and free from semantic
+actions that mess up our clean declarative grammars, `on_success` handlers
+are an alternative means to provide hooks to client code that is executed by the
+parser upon successful parse without polluting the grammar. Like semantic
+actions, `on_success` handlers also have access to the AST, the iterators,
+and context. But, unlike semantic actions, `on_success` handlers are cleanly
+separated from the actual grammar.
+
+[heading Annotation Handler]
+
+As discussed, we annotate the AST with its position in the input stream with
+our `on_success` handler:
+
+    // tag used to get the position cache from the context
+    struct position_cache_tag;
+
+    struct annotate_position
+    {
+        template <typename T, typename Iterator, typename Context>
+        inline void on_success(Iterator const& first, Iterator const& last
+        , T& ast, Context const& context)
+        {
+            auto& position_cache = x3::get<position_cache_tag>(context).get();
+            position_cache.annotate(ast, first, last);
+        }
+    };
+
+`position_cache_tag` is a special tag we will use to get a reference to the
+actual `position_cache`, client data that we will inject at the very start, when
+we call parse. More on that later.
+
+Our `on_success` handler gets a reference to the actual `position_cache`
+and calls its `annotate` member function, passing in the AST and the iterators.
+`position_cache.annotate(ast, first, last)` annotates the AST with information
+required by `x3::position_tagged`.
+
+[heading The Parser]
+
+Now we'll write a parser for our employees. This time, each input record
+will be of the form:
+
+    { age, "forename", "surname", salary }
+
+Here we go:
+
+    namespace parser
+    {
+        using x3::int_;
+        using x3::double_;
+        using x3::lexeme;
+        using ascii::char_;
+
+        struct quoted_string_class;
+        struct person_class;
+        struct employee_class;
+
+        x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
+        x3::rule<person_class, ast::person> const person = "person";
+        x3::rule<employee_class, ast::employee> const employee = "employee";
+
+        auto const quoted_string_def = lexeme['"' >> +(char_ - '"') >> '"'];
+        auto const person_def = quoted_string >> ',' >> quoted_string;
+
+        auto const employee_def =
+                '{'
+            >>  int_ >> ','
+            >>  person >> ','
+            >>  double_
+            >>  '}'
+            ;
+
+        auto const employees = employee >> *(',' >> employee);
+
+        BOOST_SPIRIT_DEFINE(quoted_string, person, employee);
+    }
+
+Take a step back and look at the previous Employee example. We are incrementally
+building on top of that.
+
+[heading Rule Declarations]
+
+    struct quoted_string_class;
+    struct person_class;
+    struct employee_class;
+
+    x3::rule<quoted_string_class, std::string> const quoted_string = "quoted_string";
+    x3::rule<person_class, ast::person> const person = "person";
+    x3::rule<employee_class, ast::employee> const employee = "employee";
+
+What has changed?
+
+* We split the single employee rule into three smaller rules: `quoted_string`,
+  `person` and `employee`.
+* We're using forward declared rule classes: `quoted_string_class`, `person_class`,
+  and `employee_class`.
+
+[heading Rule Classes]
+
+In this example, the rule classes, `quoted_string_class`, `person_class`, and
+`employee_class` provide statically known IDs for the rules required by X3 to
+perform its tasks. In addition to that, the rule class can also be extended
+to have some user-defined customization hooks that are called:
+
+* On success: After a rule successfully parses an input.
+* On error: After a rule fails to parse.
+
+This is done by deriving the rule class from a client-supplied handler, such as
+our `annotate_position` handler above:
+
+    struct person_class : annotate_position {};
+    struct employee_class : annotate_position {};
+
+The code above tells X3 to check whether the rule class has `on_success`
+or `on_error` member functions and to call them appropriately on
+such events.
+
+[heading The with Directive]
+
+For any parser `p`, one can inject any data that semantic actions and
+handlers can access later on when they are called. The general
+syntax is:
+
+    with<tag>(data)[p]
+
+For our particular example, we use it to inject the `position_cache` into
+the parse so that our `annotate_position` `on_success` handler can
+access it:
+
+    auto const parser =
+        // we pass our position_cache to the parser so we can access
+        // it later in our on_success handlers
+        with<position_cache_tag>(std::ref(positions))
+        [
+            employees
+        ];
+
+Typically this is done just before calling `x3::parse` or `x3::phrase_parse`.
+`with` is a very lightweight operation. It is possible to inject as much
+data as you want, even with multiple `with` directives:
+
+    with<tag1>(data1)
+    [
+        with<tag2>(data2)[p]
+    ]
+
+Multiple `with` directives can (perhaps not obviously) be injected from the
+outside caller function. Here's an outline:
+
+    template <typename Parser>
+    void bar(Parser const& p)
+    {
+        // Inject data2
+        auto const parser = with<tag2>(data2)[p];
+        x3::parse(first, last, parser);
+    }
+
+    void foo()
+    {
+        // Inject data1
+        auto const parser = with<tag1>(data1)[my_parser];
+        bar(parser);
+    }
+
+[heading Let's Parse]
+
+Now we have the complete parse mechanism:
+
+    using iterator_type = std::string::const_iterator;
+    using position_cache = boost::spirit::x3::position_cache<std::vector<iterator_type>>;
+
+    std::vector<client::ast::employee>
+    parse(std::string const& input, position_cache& positions)
+    {
+        using boost::spirit::x3::ascii::space;
+
+        std::vector<client::ast::employee> ast;
+        iterator_type iter = input.begin();
+        iterator_type const end = input.end();
+
+        using boost::spirit::x3::with;
+
+        // Our parser
+        using client::parser::employees;
+        using client::parser::position_cache_tag;
+
+        auto const parser =
+            // we pass our position_cache to the parser so we can access
+            // it later in our on_success handlers
+            with<position_cache_tag>(std::ref(positions))
+            [
+                employees
+            ];
+
+        bool r = phrase_parse(iter, end, parser, space, ast);
+
+        // ... Some error checking here
+
+        return ast;
+    }
+
+Let's walk through the code.
+
+First, we have some type aliases for 1) the iterator type we are using for
+the parser, `iterator_type`, and 2) the `position_cache` type. The latter
+is a template that accepts the type of container it will hold, in this case
+a `std::vector<iterator_type>`.
+
+The main parse function accepts an input, a `std::string`, and a reference
+to a `position_cache`, and returns an AST: `std::vector<client::ast::employee>`.
+
+Inside the parse function, we first create an AST where parsed data will
+be stored:
+
+    std::vector<client::ast::employee> ast;
+
+Then finally, we create a parser, injecting a reference to the `position_cache`,
+and call phrase_parse:
+
+    using client::parser::employees;
+    using client::parser::position_cache_tag;
+
+    auto const parser =
+        // we pass our position_cache to the parser so we can access
+        // it later in our on_success handlers
+        with<position_cache_tag>(std::ref(positions))
+        [
+            employees
+        ];
+
+    bool r = phrase_parse(iter, end, parser, space, ast);
+
+On successful parse, the AST, `ast`, will contain the actual parsed data.
+
+[heading Getting The Source Positions]
+
+Now that we have our main parse function, let's have an example source file to
+parse and show how we can obtain the position of an AST element, returned on
+a successful parse.
+
+Given this input:
+
+    std::string input = R"(
+    {
+        23,
+        "Amanda",
+        "Stefanski",
+        1000.99
+    },
+    {
+        35,
+        "Angie",
+        "Chilcote",
+        2000.99
+    },
+    {
+        43,
+        "Dannie",
+        "Dillinger",
+        3000.99
+    },
+    {
+        22,
+        "Dorene",
+        "Dole",
+        2500.99
+    },
+    {
+        38,
+        "Rossana",
+        "Rafferty",
+        5000.99
+    }
+    )";
+
+We call our parse function after instantiating a `position_cache` object
+that will hold the source stream positions:
+
+    position_cache positions{input.begin(), input.end()};
+    auto ast = parse(input, positions);
+
+We now have an AST, `ast`, that contains the parsed results. Let us get
+the source positions of the 2nd employee:
+
+    auto pos = positions.position_of(ast[1]); // zero based of course!
+
+`pos` is an iterator range that contains iterators to the start and
+end of `ast[1]` in the input stream.
+
+[endsect]
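
Note on annotation.qbk above: the relationship between `x3::position_tagged` and the `position_cache` can be pictured with a small model that uses only the standard library. This is a sketch under stated assumptions, not X3's implementation: every `toy_` name, `source_of`, and `demo` are hypothetical, and the model assumes the AST node stores two indices while the cache stores the iterators. In the tutorial the matched range reaches `annotate` through the `on_success` handler; here it is located by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for x3::position_tagged: the node keeps only two
// indices into the cache, not the iterators themselves.
struct toy_position_tagged
{
    int id_first = -1;
    int id_last = -1;
};

// Hypothetical stand-in for x3::position_cache (an assumption, not the real class).
template <typename Iterator>
class toy_position_cache
{
public:
    toy_position_cache(Iterator first, Iterator last)
      : first_(first), last_(last) {}

    // Record where a node was parsed from and tag the node with the indices.
    void annotate(toy_position_tagged& node, Iterator first, Iterator last)
    {
        node.id_first = static_cast<int>(positions_.size());
        positions_.push_back(first);
        node.id_last = static_cast<int>(positions_.size());
        positions_.push_back(last);
    }

    // Look up the iterator pair recorded for a node; at() throws
    // std::out_of_range if the node was never annotated.
    std::pair<Iterator, Iterator>
    position_of(toy_position_tagged const& node) const
    {
        return { positions_.at(node.id_first), positions_.at(node.id_last) };
    }

    // Start and end of the entire input.
    Iterator begin() const { return first_; }
    Iterator end() const { return last_; }

private:
    std::vector<Iterator> positions_;
    Iterator first_, last_;
};

struct toy_person : toy_position_tagged
{
    std::string first_name, last_name;
};

// Returns the slice of the input that was recorded for `node`.
template <typename Iterator>
std::string source_of(toy_position_cache<Iterator> const& cache,
                      toy_position_tagged const& node)
{
    auto range = cache.position_of(node);
    return std::string(range.first, range.second);
}

// Annotates one hand-built node and returns its recorded source text.
std::string demo()
{
    static std::string const input =
        R"({ 23, "Amanda", "Stefanski", 1000.99 })";
    using iterator_type = std::string::const_iterator;
    toy_position_cache<iterator_type> positions{input.begin(), input.end()};

    toy_person who;
    who.first_name = "Amanda";
    who.last_name = "Stefanski";

    // A parser would hand the matched range to the handler; we find it by hand.
    std::size_t const start = input.find('"');
    std::size_t const stop = input.rfind('"') + 1;
    positions.annotate(who, input.begin() + start, input.begin() + stop);

    return source_of(positions, who);
}
```

Calling `demo()` returns the text `"Amanda", "Stefanski"` (quotes included), which is the kind of lookup a later pass can use to map an AST element back to its source.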
--- a/doc/x3/tutorial/employee.qbk
+++ b/doc/x3/tutorial/employee.qbk
@@ -27,8 +27,8 @@ First, let's create a struct representing an employee:
        struct employee
        {
            int age;
-            std::string surname;
            std::string forename;
+            std::string surname;
            double salary;
        };
    }}
@@ -44,12 +44,12 @@ to be a fully conforming fusion tuple:

    BOOST_FUSION_ADAPT_STRUCT(
        client::ast::employee,
-        age, surname, forename, salary
+        age, forename, surname, salary
    )

 Now we'll write a parser for our employee. Inputs will be of the form:

-    employee{ age, "surname", "forename", salary }
+    employee{ age, "forename", "surname", salary }

 Here goes:

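
Note on the employee.qbk change above: the member order matters because the adapted struct is filled positionally, so it should follow the order of the fields in the input. A minimal standard-library sketch of that idea, using aggregate initialization as a stand-in for the Fusion-adapted sequence (`toy_employee` and `make_employee` are hypothetical names, not part of the example):

```cpp
#include <cassert>
#include <string>

// Hypothetical mirror of client::ast::employee after this change.
struct toy_employee
{
    int age;
    std::string forename;
    std::string surname;
    double salary;
};

// Members are filled in declaration order, much as a positional
// attribute sequence would fill them.
inline toy_employee make_employee()
{
    return toy_employee{23, "Amanda", "Stefanski", 1000.99};
}
```

With the old member order (`surname` before `forename`), the same input would have put "Amanda" into `surname`.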
--- a/doc/x3/tutorial/rexpr.qbk
+++ b/doc/x3/tutorial/rexpr.qbk
@@ -5,19 +5,11 @@
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 ===============================================================================/]

-[section:rexpr RExpressions - ASTs!]
+[section:rexpr RExpressions - Recursive ASTs!]

-Stop and think about it... We're actually generating ASTs (abstract
-syntax trees) in our previoius examples. We parsed a single structure and
-generated an in-memory representation of it in the form of a struct: the struct
-employee. If we changed the implementation to parse one or more employees, the
-result would be a std::vector<employee>. We can go on and add more hierarchy:
-teams, departments, corporations. Then we'll have an AST representation of it
-all.
-
-In this example, we'll explore more on how to create ASTs. We will parse a
-minimalistic JSON-like language and compile the results into our data structures
-in the form of a tree.
+In this example, we'll explore more on how to create hierarchical ASTs.
+We will parse a minimalistic JSON-like language and compile the results
+into our data structures in the form of a tree.

 /rexpr/ is a parser for RExpressions, a language resembling a minimal subset
 of json, limited to a dictionary (composed of key=value pairs) where the value
--- a/example/x3/annotation.cpp
+++ b/example/x3/annotation.cpp
@@ -142,7 +142,7 @@ namespace client
 // Our main parse entry point
 ///////////////////////////////////////////////////////////////////////////////

-typedef std::string::const_iterator iterator_type;
+using iterator_type = std::string::const_iterator;
 using position_cache = boost::spirit::x3::position_cache<std::vector<iterator_type>>;

 std::vector<client::ast::employee>