Add error reporting when encountering unexpected (left over) code points at
the end of an otherwise-successful parse, when doing non-prefix parsing.
2026-01-19 04:22:13 +00:00 · 2025-03-30 16:06:41 -05:00
parent 07153117ff
commit a3ca1193b2
3 changed files with 82 additions and 39 deletions
--- a/doc/tutorial.qbk
+++ b/doc/tutorial.qbk
@@ -3399,9 +3399,9 @@ _w_eh_ (see _p_api_).  If you do not set one, _default_eh_ will be used.
 [heading How diagnostics are generated]

 _Parser_ only generates error messages like the ones in this page at failed
-expectation points, like `a > b`, where you have successfully parsed `a`, but
-then cannot successfully parse `b`.  This may seem limited to you.  It's
-actually the best that we can do.
+expectation points (like `a > b`, where you have successfully parsed `a`, but
+then cannot successfully parse `b`), and where end of input is expected but
+not found.  This may seem limited to you.  It's actually the best we can do.

 In order for error handling to happen other than at expectation points, we
 have to know that there is no further processing that might take place.  This
@@ -3409,21 +3409,26 @@ is true because _Parser_ has `P1 | P2 | ... | Pn` parsers ("`or_parser`s").
 If any one of these parsers `Pi` fails to match, it is not allowed to fail the
 parse _emdash_ the next one (`Pi+1`) might match.  If we get to the end of the
 alternatives of the or_parser and `Pn` fails, we still cannot fail the
-top-level parse, because the `or_parser` might be a subparser within a parent
-`or_parser`.
+top-level parse, because this `or_parser` might be a subparser within a parent
+`or_parser`.  The only exception is when all of the following hold: the
+top-level parse has otherwise succeeded; it is *not* a prefix parse; and part
+of the input range is left over.  In that case, there is an implicit
+expectation that the end of the parse and the end of input are the same
+location, and this implicit expectation has just been violated.

-Ok, so what might we do?  Perhaps we could at least indicate when we ran into
-end-of-input.  But we cannot, for exactly the same reason already stated.  For
-any parser `P`, reaching end-of-input is a failure for `P`, but not
-necessarily for the whole parse.
+Note that we cannot fail the top-level parse merely because some parser ran
+into end-of-input, for exactly the same reason already stated.  For any
+parser `P`, reaching end-of-input is a failure for `P`, but not necessarily
+for the whole parse.

-Perhaps we could record the farthest point ever reached during the parse, and
-report that at the top level, if the top level parser fails.  That would be
-little help without knowing which parser was active when we reached that
-point.  This would require some sort of repeated memory allocation, since in
-_Parser_ the progress point of the parser is stored exclusively on the stack
-_emdash_ by the time we fail the top-level parse, all those far-reaching stack
-frames are long gone.  Not the best.
+Ok, so what other kinds of error reporting might we do?  Perhaps we could
+record the farthest point ever reached during the parse, and report that at
+the top level, if the top-level parser fails.  That would be of little help
+without knowing which parser was active when we reached that point.  This
+would require some sort of repeated memory allocation, since in _Parser_ the
+progress point of the parser is stored exclusively on the stack _emdash_ by
+the time we fail the top-level parse, all those far-reaching stack frames are
+long gone.  Not the best.

 Worse still, knowing how far you got in the parse and which parser was active
 is not very useful.  Consider this.
@@ -3440,15 +3445,16 @@ Was the error in the input putting the `'a'` at the beginning or putting the
 failed, and never mention `c_b`, you are potentially just steering them in the
 wrong direction.

-All error messages must come from failed expectation points.  Consider parsing
-JSON.  If you open a list with `'['`, you know that you're parsing a list, and
-if the list is ill-formed, you'll get an error message saying so.  If you open
-an object with `'{'`, the same thing is possible _emdash_ when missing the
-matching `'}'`, you can tell the user, "That's not an object", and this is
-useful feedback.  The same thing with a partially parsed number, etc.  If the
-JSON parser does not build in expectations like matched braces and brackets,
-how can _Parser_ know that a missing `'}'` is really a problem, and that no
-later parser will match the input even without the `'}'`?
+All error messages must come from failed expectation points (or from a
+missing end of input after an otherwise-successful non-prefix parse).
+Consider parsing JSON.  If you open a list with `'['`, you know that you're
+parsing a list, and if the list is ill-formed, you'll get an error
+message saying so.  If you open an object with `'{'`, the same thing is
+possible _emdash_ when missing the matching `'}'`, you can tell the user,
+"That's not an object", and this is useful feedback.  The same thing with a
+partially parsed number, etc.  If the JSON parser does not build in
+expectations like matched braces and brackets, how can _Parser_ know that a
+missing `'}'` is really a problem, and that no later parser will match the
+input even without the `'}'`?

 [important The bottom line is that you should build expectation points into
 your parsers using `operator>` as much as possible.]
--- a/include/boost/parser/parser.hpp
+++ b/include/boost/parser/parser.hpp
@@ -2715,20 +2715,28 @@ namespace boost { namespace parser {
            }
        }

-        template<typename I, typename S, typename T>
-        std::optional<T>
-        if_full_parse(I & first, S last, std::optional<T> retval)
+        template<typename I, typename S, typename ErrorHandler, typename T>
+        T if_full_parse(
+            I initial_first,
+            I & first,
+            S last,
+            ErrorHandler const & error_handler,
+            T retval)
        {
-            if (first != last)
-                retval = std::nullopt;
-            return retval;
-        }
-        template<typename I, typename S>
-        bool if_full_parse(I & first, S last, bool retval)
-        {
-            if (first != last)
-                retval = false;
-            return retval;
+            if (first != last) {
+                if (retval) {
+                    parse_error<I> const error(first, "end of input");
+                    if (error_handler(initial_first, last, error) ==
+                        error_handler_result::rethrow) {
+                        throw error; // no active exception, so not `throw;`
+                    }
+                }
+                if constexpr (std::is_same_v<T, bool>)
+                    retval = false;
+                else
+                    retval = std::nullopt;
+            }
+            return std::move(retval);
        }

        // The notion of comaptibility is that, given a parser with the
@@ -8817,9 +8825,12 @@ namespace boost { namespace parser {
        auto r_ = detail::make_input_subrange(r);
        auto first = r_.begin();
        auto const last = r_.end();
+        auto const initial_first = first;
        return reset = detail::if_full_parse(
+                   initial_first,
                   first,
                   last,
+                   parser.error_handler_,
                   parser::prefix_parse(first, last, parser, attr, trace_mode));
    }

@@ -8922,8 +8933,13 @@ namespace boost { namespace parser {
        auto r_ = detail::make_input_subrange(r);
        auto first = r_.begin();
        auto const last = r_.end();
+        auto const initial_first = first;
        return detail::if_full_parse(
-            first, last, parser::prefix_parse(first, last, parser, trace_mode));
+            initial_first,
+            first,
+            last,
+            parser.error_handler_,
+            parser::prefix_parse(first, last, parser, trace_mode));
    }

    /** Parses `[first, last)` using `parser`, skipping all input recognized
@@ -9058,9 +9074,12 @@ namespace boost { namespace parser {
        auto r_ = detail::make_input_subrange(r);
        auto first = r_.begin();
        auto const last = r_.end();
+        auto const initial_first = first;
        return reset = detail::if_full_parse(
+                   initial_first,
                   first,
                   last,
+                   parser.error_handler_,
                   parser::prefix_parse(
                       first, last, parser, skip, attr, trace_mode));
    }
@@ -9169,9 +9188,12 @@ namespace boost { namespace parser {
        auto r_ = detail::make_input_subrange(r);
        auto first = r_.begin();
        auto const last = r_.end();
+        auto const initial_first = first;
        return detail::if_full_parse(
+            initial_first,
            first,
            last,
+            parser.error_handler_,
            parser::prefix_parse(first, last, parser, skip, trace_mode));
    }

@@ -9287,9 +9309,12 @@ namespace boost { namespace parser {
        auto r_ = detail::make_input_subrange(r);
        auto first = r_.begin();
        auto const last = r_.end();
+        auto const initial_first = first;
        return detail::if_full_parse(
+            initial_first,
            first,
            last,
+            parser.error_handler_,
            parser::callback_prefix_parse(first, last, parser, callbacks));
    }

@@ -9423,9 +9448,12 @@ namespace boost { namespace parser {
        auto r_ = detail::make_input_subrange(r);
        auto first = r_.begin();
        auto const last = r_.end();
+        auto const initial_first = first;
        return detail::if_full_parse(
+            initial_first,
            first,
            last,
+            parser.error_handler_,
            parser::callback_prefix_parse(
                first, last, parser, skip, callbacks, trace_mode));
    }
--- a/test/parser.cpp
+++ b/test/parser.cpp
@@ -292,6 +292,15 @@ int main()
            }
            BOOST_TEST(parse(str, parser_1));
            BOOST_TEST(!parse(str, parser_2));
+            {
+                BOOST_TEST(!parse(str, char_));
+                std::ostringstream err, warn;
+                stream_error_handler eh("", err, warn);
+                BOOST_TEST(!parse(str, with_error_handler(char_, eh)));
+                BOOST_TEST(
+                    err.str() ==
+                    "1:1: error: Expected end of input here:\nab\n ^\n");
+            }
        }
        {
            std::string str = "ab";