mirror of https://github.com/boostorg/parser.git synced 2026-01-19 04:22:13 +00:00

Add error reporting when encountering unexpected (left over) code points at
the end of an otherwise-successful parse, when doing non-prefix parsing.
Zach Laine
2025-03-30 16:06:41 -05:00
parent 07153117ff
commit a3ca1193b2
3 changed files with 82 additions and 39 deletions

@@ -3399,9 +3399,9 @@ _w_eh_ (see _p_api_). If you do not set one, _default_eh_ will be used.
[heading How diagnostics are generated]
_Parser_ only generates error messages like the ones in this page at failed
expectation points, like `a > b`, where you have successfully parsed `a`, but
then cannot successfully parse `b`. This may seem limited to you. It's
actually the best that we can do.
expectation points (like `a > b`, where you have successfully parsed `a`, but
then cannot successfully parse `b`), and at an unexpected end of input. This
may seem limited to you. It's actually the best that we can do.
In order for error handling to happen other than at expectation points, we
have to know that there is no further processing that might take place. This
@@ -3409,21 +3409,26 @@ is true because _Parser_ has `P1 | P2 | ... | Pn` parsers ("`or_parser`s").
If any one of these parsers `Pi` fails to match, it is not allowed to fail the
parse _emdash_ the next one (`Pi+1`) might match. If we get to the end of the
alternatives of the or_parser and `Pn` fails, we still cannot fail the
top-level parse, because the `or_parser` might be a subparser within a parent
`or_parser`.
top-level parse, because this `or_parser` might be a subparser within a parent
`or_parser`. The only exception to this is when: we have finished the
top-level parse; the top-level parse is *not* a prefix parse; and there is
still a part of the input range that is left over. In that case, there is an
implicit expectation that the end of the parse and the end of input are the
same location, and this implicit expectation has just been violated.
Ok, so what might we do? Perhaps we could at least indicate when we ran into
end-of-input. But we cannot, for exactly the same reason already stated. For
any parser `P`, reaching end-of-input is a failure for `P`, but not
necessarily for the whole parse.
Note that we cannot fail the top-level parse when we run into end-of-input.
We cannot for exactly the same reason already stated. For any parser `P`,
reaching end-of-input is a failure for `P`, but not necessarily for the whole
parse.
Perhaps we could record the farthest point ever reached during the parse, and
report that at the top level, if the top level parser fails. That would be
little help without knowing which parser was active when we reached that
point. This would require some sort of repeated memory allocation, since in
_Parser_ the progress point of the parser is stored exclusively on the stack
_emdash_ by the time we fail the top-level parse, all those far-reaching stack
frames are long gone. Not the best.
Ok, so what other kinds of error reporting might we do? Perhaps we could
record the farthest point ever reached during the parse, and report that at
the top level, if the top level parser fails. That would be little help
without knowing which parser was active when we reached that point. This
would require some sort of repeated memory allocation, since in _Parser_ the
progress point of the parser is stored exclusively on the stack _emdash_ by
the time we fail the top-level parse, all those far-reaching stack frames are
long gone. Not the best.
Worse still, knowing how far you got in the parse and which parser was active
is not very useful. Consider this.
@@ -3440,15 +3445,16 @@ Was the error in the input putting the `'a'` at the beginning or putting the
failed, and never mention `c_b`, you are potentially just steering them in the
wrong direction.
All error messages must come from failed expectation points. Consider parsing
JSON. If you open a list with `'['`, you know that you're parsing a list, and
if the list is ill-formed, you'll get an error message saying so. If you open
an object with `'{'`, the same thing is possible _emdash_ when missing the
matching `'}'`, you can tell the user, "That's not an object", and this is
useful feedback. The same thing with a partially parsed number, etc. If the
JSON parser does not build in expectations like matched braces and brackets,
how can _Parser_ know that a missing `'}'` is really a problem, and that no
later parser will match the input even without the `'}'`?
All error messages must come from failed expectation points (or unexpected end
of input). Consider parsing JSON. If you open a list with `'['`, you know
that you're parsing a list, and if the list is ill-formed, you'll get an error
message saying so. If you open an object with `'{'`, the same thing is
possible _emdash_ when missing the matching `'}'`, you can tell the user,
"That's not an object", and this is useful feedback. The same thing with a
partially parsed number, etc. If the JSON parser does not build in
expectations like matched braces and brackets, how can _Parser_ know that a
missing `'}'` is really a problem, and that no later parser will match the
input even without the `'}'`?
[important The bottom line is that you should build expectation points into
your parsers using `operator>` as much as possible.]
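
To make the difference between a soft failure and a failed expectation point concrete, here is a hand-rolled toy. It is not _Parser_ code: `expectation_failure`, `match_char`, `match_digit`, `parse_list`, `parse_value`, and `diagnose` are all hypothetical names invented for this sketch. It models `value = list | digit` with `list = '[' > digit > ']'`, assuming single-digit elements and plain byte offsets as positions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Thrown when an expectation point fails. The parse is committed at that
// point, so no alternative can rescue it and the error can be reported.
struct expectation_failure : std::runtime_error
{
    expectation_failure(std::size_t where, std::string const & what) :
        std::runtime_error(what), pos(where)
    {}
    std::size_t pos;
};

// Soft match: failure only returns false, so an enclosing alternative is
// free to try something else.
inline bool match_char(std::string_view in, std::size_t & pos, char c)
{
    if (pos < in.size() && in[pos] == c) {
        ++pos;
        return true;
    }
    return false;
}

inline bool match_digit(std::string_view in, std::size_t & pos)
{
    if (pos < in.size() &&
        std::isdigit(static_cast<unsigned char>(in[pos]))) {
        ++pos;
        return true;
    }
    return false;
}

// list = '[' > digit > ']'. Once '[' has matched, the rest is expected.
inline bool parse_list(std::string_view in, std::size_t & pos)
{
    if (!match_char(in, pos, '['))
        return false; // soft failure: another alternative may still match
    if (!match_digit(in, pos))
        throw expectation_failure(pos, "expected digit");
    if (!match_char(in, pos, ']'))
        throw expectation_failure(pos, "expected ']'");
    return true;
}

// value = list | digit
inline bool parse_value(std::string_view in, std::size_t & pos)
{
    return parse_list(in, pos) || match_digit(in, pos);
}

// Returns "ok", "no match", or "<offset>: <message>" for a failed
// expectation point.
inline std::string diagnose(std::string_view in)
{
    std::size_t pos = 0;
    try {
        return parse_value(in, pos) ? "ok" : "no match";
    } catch (expectation_failure const & e) {
        return std::to_string(e.pos) + ": " + e.what();
    }
}
```

In this sketch `diagnose("[1")` yields `2: expected ']'` because the parse was already committed to a list, while `diagnose("x")` can only report `no match`, since neither alternative got far enough to establish what the input was meant to be.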

@@ -2715,20 +2715,28 @@ namespace boost { namespace parser {
}
}
template<typename I, typename S, typename T>
std::optional<T>
if_full_parse(I & first, S last, std::optional<T> retval)
template<typename I, typename S, typename ErrorHandler, typename T>
T if_full_parse(
I initial_first,
I & first,
S last,
ErrorHandler const & error_handler,
T retval)
{
if (first != last)
retval = std::nullopt;
return retval;
}
template<typename I, typename S>
bool if_full_parse(I & first, S last, bool retval)
{
if (first != last)
retval = false;
return retval;
if (first != last) {
if (retval && error_handler(
initial_first,
last,
parse_error<I>(first, "end of input")) ==
error_handler_result::rethrow) {
throw;
}
if constexpr (std::is_same_v<T, bool>)
retval = false;
else
retval = std::nullopt;
}
return std::move(retval);
}
// The notion of compatibility is that, given a parser with the
@@ -8817,9 +8825,12 @@ namespace boost { namespace parser {
auto r_ = detail::make_input_subrange(r);
auto first = r_.begin();
auto const last = r_.end();
auto const initial_first = first;
return reset = detail::if_full_parse(
initial_first,
first,
last,
parser.error_handler_,
parser::prefix_parse(first, last, parser, attr, trace_mode));
}
@@ -8922,8 +8933,13 @@ namespace boost { namespace parser {
auto r_ = detail::make_input_subrange(r);
auto first = r_.begin();
auto const last = r_.end();
auto const initial_first = first;
return detail::if_full_parse(
first, last, parser::prefix_parse(first, last, parser, trace_mode));
initial_first,
first,
last,
parser.error_handler_,
parser::prefix_parse(first, last, parser, trace_mode));
}
/** Parses `[first, last)` using `parser`, skipping all input recognized
@@ -9058,9 +9074,12 @@ namespace boost { namespace parser {
auto r_ = detail::make_input_subrange(r);
auto first = r_.begin();
auto const last = r_.end();
auto const initial_first = first;
return reset = detail::if_full_parse(
initial_first,
first,
last,
parser.error_handler_,
parser::prefix_parse(
first, last, parser, skip, attr, trace_mode));
}
@@ -9169,9 +9188,12 @@ namespace boost { namespace parser {
auto r_ = detail::make_input_subrange(r);
auto first = r_.begin();
auto const last = r_.end();
auto const initial_first = first;
return detail::if_full_parse(
initial_first,
first,
last,
parser.error_handler_,
parser::prefix_parse(first, last, parser, skip, trace_mode));
}
@@ -9287,9 +9309,12 @@ namespace boost { namespace parser {
auto r_ = detail::make_input_subrange(r);
auto first = r_.begin();
auto const last = r_.end();
auto const initial_first = first;
return detail::if_full_parse(
initial_first,
first,
last,
parser.error_handler_,
parser::callback_prefix_parse(first, last, parser, callbacks));
}
@@ -9423,9 +9448,12 @@ namespace boost { namespace parser {
auto r_ = detail::make_input_subrange(r);
auto first = r_.begin();
auto const last = r_.end();
auto const initial_first = first;
return detail::if_full_parse(
initial_first,
first,
last,
parser.error_handler_,
parser::callback_prefix_parse(
first, last, parser, skip, callbacks, trace_mode));
}
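
The control flow that the reworked `if_full_parse` adds can be modeled outside the library. The following is a standalone approximation under stated assumptions, not the code from this commit: `handler_result`, `full_parse_check`, `prefix_parse_char`, and `parse_char` are hypothetical names, the position is a plain offset rather than an iterator, and the handler is an ordinary callable. It shows the one rule the commit introduces: the handler is consulted only when the prefix parse succeeded and input is left over.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for the library's error_handler_result.
enum class handler_result { fail, rethrow };

// Simplified model of the check. `consumed` is how far the prefix parse
// advanced; `on_error` is called only when the prefix parse succeeded
// but some input is left over.
template<typename Handler, typename T>
std::optional<T> full_parse_check(
    std::string_view input,
    std::size_t consumed,
    Handler const & on_error,
    std::optional<T> result)
{
    assert(consumed <= input.size());
    if (consumed != input.size()) {
        if (result &&
            on_error(consumed, "end of input") == handler_result::rethrow) {
            // This sketch throws a concrete exception; a bare `throw;` is
            // only meaningful while another exception is being handled.
            throw std::runtime_error("expected end of input");
        }
        result = std::nullopt;
    }
    return result;
}

// Toy prefix parser: consumes exactly one character if there is one.
inline std::optional<char>
prefix_parse_char(std::string_view input, std::size_t & consumed)
{
    consumed = 0;
    if (input.empty())
        return std::nullopt;
    consumed = 1;
    return input.front();
}

// Non-prefix parse built from the two pieces above. Diagnostics are
// appended to `log` as "<offset>: expected <what>\n".
inline std::optional<char> parse_char(std::string_view input, std::string & log)
{
    std::size_t consumed = 0;
    auto result = prefix_parse_char(input, consumed);
    auto const on_error = [&log](std::size_t pos, std::string_view what) {
        log += std::to_string(pos);
        log += ": expected ";
        log += what;
        log += '\n';
        return handler_result::fail;
    };
    return full_parse_check(input, consumed, on_error, std::move(result));
}
```

As with `parse(str, char_)` on `"ab"` in the test added by this commit, `parse_char("ab", log)` fails and records a single "expected end of input" diagnostic at the first unconsumed character. An empty input fails without any diagnostic, because the prefix parse itself did not succeed.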

@@ -292,6 +292,15 @@ int main()
}
BOOST_TEST(parse(str, parser_1));
BOOST_TEST(!parse(str, parser_2));
{
BOOST_TEST(!parse(str, char_));
std::ostringstream err, warn;
stream_error_handler eh("", err, warn);
BOOST_TEST(!parse(str, with_error_handler(char_, eh)));
BOOST_TEST(
err.str() ==
"1:1: error: Expected end of input here:\nab\n ^\n");
}
}
{
std::string str = "ab";