mirror of
https://github.com/boostorg/parser.git
synced 2026-01-19 04:22:13 +00:00
Add error reporting when encountering unexpected (left over) code points at
the end of an otherwise-successful parse, when doing non-prefix parsing.
@@ -3399,9 +3399,9 @@ _w_eh_ (see _p_api_). If you do not set one, _default_eh_ will be used.
 [heading How diagnostics are generated]

 _Parser_ only generates error messages like the ones in this page at failed
-expectation points, like `a > b`, where you have successfully parsed `a`, but
-then cannot successfully parse `b`. This may seem limited to you. It's
-actually the best that we can do.
+expectation points (like `a > b`, where you have successfully parsed `a`, but
+then cannot successfully parse `b`), and at an unexpected end of input. This
+may seem limited to you. It's actually the best that we can do.

 In order for error handling to happen other than at expectation points, we
 have to know that there is no further processing that might take place. This
@@ -3409,21 +3409,26 @@ is true because _Parser_ has `P1 | P2 | ... | Pn` parsers ("`or_parser`s").
 If any one of these parsers `Pi` fails to match, it is not allowed to fail the
 parse _emdash_ the next one (`Pi+1`) might match. If we get to the end of the
 alternatives of the or_parser and `Pn` fails, we still cannot fail the
-top-level parse, because the `or_parser` might be a subparser within a parent
-`or_parser`.
+top-level parse, because this `or_parser` might be a subparser within a parent
+`or_parser`. The only exception to this is when: we have finished the
+top-level parse; the top-level parse is *not* a prefix parse; and there is
+still a part of the input range that is left over. In that case, there is an
+implicit expectation that the end of the parse and the end of input are the
+same location, and this implicit expectation has just been violated.

-Ok, so what might we do? Perhaps we could at least indicate when we ran into
-end-of-input. But we cannot, for exactly the same reason already stated. For
-any parser `P`, reaching end-of-input is a failure for `P`, but not
-necessarily for the whole parse.
+Note that we cannot fail the top-level parse when we run into end-of-input.
+We cannot for exactly the same reason already stated. For any parser `P`,
+reaching end-of-input is a failure for `P`, but not necessarily for the whole
+parse.

-Perhaps we could record the farthest point ever reached during the parse, and
-report that at the top level, if the top level parser fails. That would be
-little help without knowing which parser was active when we reached that
-point. This would require some sort of repeated memory allocation, since in
-_Parser_ the progress point of the parser is stored exclusively on the stack
-_emdash_ by the time we fail the top-level parse, all those far-reaching stack
-frames are long gone. Not the best.
+Ok, so what other kinds of error reporting might we do? Perhaps we could
+record the farthest point ever reached during the parse, and report that at
+the top level, if the top level parser fails. That would be little help
+without knowing which parser was active when we reached that point. This
+would require some sort of repeated memory allocation, since in _Parser_ the
+progress point of the parser is stored exclusively on the stack _emdash_ by
+the time we fail the top-level parse, all those far-reaching stack frames are
+long gone. Not the best.

 Worse still, knowing how far you got in the parse and which parser was active
 is not very useful. Consider this.
@@ -3440,15 +3445,16 @@ Was the error in the input putting the `'a'` at the beginning or putting the
 failed, and never mention `c_b`, you are potentially just steering them in the
 wrong direction.

-All error messages must come from failed expectation points. Consider parsing
-JSON. If you open a list with `'['`, you know that you're parsing a list, and
-if the list is ill-formed, you'll get an error message saying so. If you open
-an object with `'{'`, the same thing is possible _emdash_ when missing the
-matching `'}'`, you can tell the user, "That's not an object", and this is
-useful feedback. The same thing with a partially parsed number, etc. If the
-JSON parser does not build in expectations like matched braces and brackets,
-how can _Parser_ know that a missing `'}'` is really a problem, and that no
-later parser will match the input even without the `'}'`?
+All error messages must come from failed expectation points (or unexpected end
+of input). Consider parsing JSON. If you open a list with `'['`, you know
+that you're parsing a list, and if the list is ill-formed, you'll get an error
+message saying so. If you open an object with `'{'`, the same thing is
+possible _emdash_ when missing the matching `'}'`, you can tell the user,
+"That's not an object", and this is useful feedback. The same thing with a
+partially parsed number, etc. If the JSON parser does not build in
+expectations like matched braces and brackets, how can _Parser_ know that a
+missing `'}'` is really a problem, and that no later parser will match the
+input even without the `'}'`?

 [important The bottom line is that you should build expectation points into
 your parsers using `operator>` as much as possible.]
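The reasoning in the documentation hunks above can be illustrated with a toy model. This is a minimal sketch using only the standard library, not Boost.Parser code: `expectation_failure`, `lit`, `digits`, `list`, `value`, and `full_parse` are all hypothetical names invented for the illustration. It models `'[' > digits > ']'` as a sequence with expectation points, `list | digits` as an alternative that can never report an error on its own, and the implicit end-of-input expectation of a non-prefix parse.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Toy model only: none of these names are Boost.Parser API.
struct expectation_failure : std::runtime_error
{
    expectation_failure(std::size_t p, std::string const & what) :
        std::runtime_error("expected " + what), pos(p)
    {}
    std::size_t pos;
};

// Ordinary parsers: on failure they return false and leave `pos` untouched.
bool lit(std::string_view in, std::size_t & pos, char c)
{
    if (pos < in.size() && in[pos] == c) {
        ++pos;
        return true;
    }
    return false;
}

bool digits(std::string_view in, std::size_t & pos)
{
    std::size_t const start = pos;
    while (pos < in.size() &&
           std::isdigit(static_cast<unsigned char>(in[pos])))
        ++pos;
    return pos != start;
}

// Models `'[' > digits > ']'`: after '[' matches, a failure is an error.
bool list(std::string_view in, std::size_t & pos)
{
    if (!lit(in, pos, '['))
        return false; // plain failure; an alternative may still match
    if (!digits(in, pos))
        throw expectation_failure(pos, "digits");
    if (!lit(in, pos, ']'))
        throw expectation_failure(pos, "']'");
    return true;
}

// Models `list | digits`: a failed alternative is never reported by itself.
bool value(std::string_view in, std::size_t & pos)
{
    return list(in, pos) || digits(in, pos);
}

// Models a non-prefix parse. Returns "" on success, "no match" when the
// parse fails without any diagnostic, or the diagnostic text.
std::string full_parse(std::string_view in)
{
    std::size_t pos = 0;
    try {
        if (!value(in, pos))
            return "no match";
    } catch (expectation_failure const & e) {
        return std::string(e.what()) + " at " + std::to_string(e.pos);
    }
    assert(pos <= in.size());
    // The implicit expectation: the parse must end where the input ends.
    if (pos != in.size())
        return "expected end of input at " + std::to_string(pos);
    return "";
}
```

In this model, `full_parse("x")` can only say that nothing matched, while `full_parse("[12")` and `full_parse("42x")` can each point at a position, because in both cases an expectation (explicit or implicit) was violated.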
@@ -2715,20 +2715,28 @@ namespace boost { namespace parser {
     }
 }

-template<typename I, typename S, typename T>
-std::optional<T>
-if_full_parse(I & first, S last, std::optional<T> retval)
+template<typename I, typename S, typename ErrorHandler, typename T>
+T if_full_parse(
+    I initial_first,
+    I & first,
+    S last,
+    ErrorHandler const & error_handler,
+    T retval)
 {
-    if (first != last)
-        retval = std::nullopt;
-    return retval;
-}
-template<typename I, typename S>
-bool if_full_parse(I & first, S last, bool retval)
-{
-    if (first != last)
-        retval = false;
-    return retval;
+    if (first != last) {
+        if (retval && error_handler(
+                          initial_first,
+                          last,
+                          parse_error<I>(first, "end of input")) ==
+                          error_handler_result::rethrow) {
+            throw;
+        }
+        if constexpr (std::is_same_v<T, bool>)
+            retval = false;
+        else
+            retval = std::nullopt;
+    }
+    return std::move(retval);
 }

 // The notion of compatibility is that, given a parser with the
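The control flow of the new `if_full_parse` can be exercised in isolation. The following is a minimal standard-library sketch under stated assumptions, not the library's implementation: `handler_result`, `toy_parse_error`, and `toy_if_full_parse` are hypothetical stand-ins for `error_handler_result`, `parse_error`, and `if_full_parse`. One detail worth noting when reading the hunk: in standard C++, a bare `throw;` evaluated while no exception is being handled calls `std::terminate`, so this sketch throws an explicit exception object instead. Whether that matches the author's intent is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical stand-ins; only the control flow follows the hunk above.
enum class handler_result { fail, rethrow };

template<typename I>
struct toy_parse_error : std::runtime_error
{
    toy_parse_error(I it, std::string const & msg) :
        std::runtime_error(msg), iter(it)
    {}
    I iter; // where the leftover input starts
};

// `retval` is the result of a prefix parse: bool or std::optional<Attr>.
template<typename I, typename S, typename ErrorHandler, typename T>
T toy_if_full_parse(
    I initial_first,
    I & first,
    S last,
    ErrorHandler const & error_handler,
    T retval)
{
    if (first != last) {
        // Report only if the prefix parse itself succeeded; a failed parse
        // has no "end of input" expectation to violate.
        if (retval) {
            toy_parse_error<I> const e(first, "end of input");
            if (error_handler(initial_first, last, e) ==
                handler_result::rethrow)
                throw e; // assumption: explicit throw instead of `throw;`
        }
        // Leftover input turns success into failure for either result type.
        if constexpr (std::is_same_v<T, bool>)
            retval = false;
        else
            retval = std::nullopt;
    }
    return retval;
}
```

A handler here is any callable taking `(initial_first, last, error)` and returning a `handler_result`. Returning `handler_result::fail` makes the call return `false` or `std::nullopt`, and returning `handler_result::rethrow` makes it throw the `toy_parse_error`.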
@@ -8817,9 +8825,12 @@ namespace boost { namespace parser {
     auto r_ = detail::make_input_subrange(r);
     auto first = r_.begin();
     auto const last = r_.end();
+    auto const initial_first = first;
     return reset = detail::if_full_parse(
+        initial_first,
         first,
         last,
+        parser.error_handler_,
         parser::prefix_parse(first, last, parser, attr, trace_mode));
 }

@@ -8922,8 +8933,13 @@ namespace boost { namespace parser {
     auto r_ = detail::make_input_subrange(r);
     auto first = r_.begin();
     auto const last = r_.end();
+    auto const initial_first = first;
     return detail::if_full_parse(
-        first, last, parser::prefix_parse(first, last, parser, trace_mode));
+        initial_first,
+        first,
+        last,
+        parser.error_handler_,
+        parser::prefix_parse(first, last, parser, trace_mode));
 }

 /** Parses `[first, last)` using `parser`, skipping all input recognized
@@ -9058,9 +9074,12 @@ namespace boost { namespace parser {
     auto r_ = detail::make_input_subrange(r);
     auto first = r_.begin();
     auto const last = r_.end();
+    auto const initial_first = first;
     return reset = detail::if_full_parse(
+        initial_first,
         first,
         last,
+        parser.error_handler_,
         parser::prefix_parse(
             first, last, parser, skip, attr, trace_mode));
 }
@@ -9169,9 +9188,12 @@ namespace boost { namespace parser {
     auto r_ = detail::make_input_subrange(r);
     auto first = r_.begin();
     auto const last = r_.end();
+    auto const initial_first = first;
     return detail::if_full_parse(
+        initial_first,
         first,
         last,
+        parser.error_handler_,
         parser::prefix_parse(first, last, parser, skip, trace_mode));
 }

@@ -9287,9 +9309,12 @@ namespace boost { namespace parser {
     auto r_ = detail::make_input_subrange(r);
     auto first = r_.begin();
     auto const last = r_.end();
+    auto const initial_first = first;
     return detail::if_full_parse(
+        initial_first,
         first,
         last,
+        parser.error_handler_,
         parser::callback_prefix_parse(first, last, parser, callbacks));
 }

@@ -9423,9 +9448,12 @@ namespace boost { namespace parser {
     auto r_ = detail::make_input_subrange(r);
     auto first = r_.begin();
     auto const last = r_.end();
+    auto const initial_first = first;
     return detail::if_full_parse(
+        initial_first,
         first,
         last,
+        parser.error_handler_,
         parser::callback_prefix_parse(
             first, last, parser, skip, callbacks, trace_mode));
 }
@@ -292,6 +292,15 @@ int main()
     }
     BOOST_TEST(parse(str, parser_1));
     BOOST_TEST(!parse(str, parser_2));
+    {
+        BOOST_TEST(!parse(str, char_));
+        std::ostringstream err, warn;
+        stream_error_handler eh("", err, warn);
+        BOOST_TEST(!parse(str, with_error_handler(char_, eh)));
+        BOOST_TEST(
+            err.str() ==
+            "1:1: error: Expected end of input here:\nab\n ^\n");
+    }
 }
 {
     std::string str = "ab";
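The expected string in the new test suggests how the diagnostic is laid out: a `line:column` prefix, the message wrapped as "Expected ... here:", the offending line, and a caret under the first leftover code point. The sketch below rebuilds that layout with the standard library only. `format_expectation` is a hypothetical helper, not the library's `stream_error_handler`, and the 1-based line with 0-based column is an assumption inferred solely from the `1:1` shown for offset 1 in `"ab"`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper; the layout is inferred from the test string above.
std::string format_expectation(
    std::string_view input, std::size_t offset, std::string_view what)
{
    assert(offset <= input.size());

    // Find the line containing `offset` (assumed: 1-based line number).
    std::size_t line = 1;
    std::size_t line_start = 0;
    for (std::size_t i = 0; i < offset; ++i) {
        if (input[i] == '\n') {
            ++line;
            line_start = i + 1;
        }
    }
    // Assumed: 0-based column within that line.
    std::size_t const column = offset - line_start;

    std::size_t line_end = input.find('\n', line_start);
    if (line_end == std::string_view::npos)
        line_end = input.size();

    std::string out = std::to_string(line) + ":" + std::to_string(column) +
                      ": error: Expected ";
    out += what;
    out += " here:\n";
    out += input.substr(line_start, line_end - line_start); // offending line
    out += "\n";
    out.append(column, ' '); // pad so the caret sits under the offset
    out += "^\n";
    return out;
}
```

Called as `format_expectation("ab", 1, "end of input")`, this produces the same text the test compares against, as that text appears in the hunk.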