mirror of https://github.com/boostorg/parser.git synced 2026-01-19 04:22:13 +00:00

Address a TODO about documenting part of the loose attribute match behavior.

Zach Laine
2024-01-28 20:40:19 -06:00
parent f7d26dabae
commit 5b7889df61
2 changed files with 40 additions and 1 deletions


@@ -2367,6 +2367,45 @@ using a _ui_, and writing its attribute into a `double`. In general, you can
swap any type `T` out of the attribute, as long as the swap would not result
in some ill-formed assignment within the parse.
Here is another example that also produces surprising results, for a different
reason.

    namespace bp = boost::parser;
    constexpr auto parser = bp::char_('a') >> bp::char_('b') >> bp::char_('c') |
                            bp::char_('x') >> bp::char_('y') >> bp::char_('z');
    std::string str = "abc";
    bp::tuple<char, char, char> chars;
    bool b = bp::parse(str, parser, chars);
    assert(b);
    assert(chars == bp::tuple('c', '\0', '\0'));

This looks wrong, but is expected behavior. At every stage of the parse that
produces an attribute, _Parser_ tries to assign that attribute to some part of
the out-param attribute provided to _p_, if there is one. Note that
`_ATTR_np_(parser)` is `std::string`, because each sequence parser is three
`char_` parsers in a row, which forms a `std::string`; there are two such
alternatives, so the overall attribute is also `std::string`. During the
parse, when the first parser `bp::char_('a')` matches the input, it produces
the attribute `'a'` and needs to assign it to its destination. Some logic
inside the sequence parser indicates that this `'a'` contributes to the value
in the `0`th position in the result tuple, if the result is being written into
a tuple. Here, we passed a `bp::tuple<char, char, char>`, so it writes `'a'`
into the first element. Each subsequent `char_` parser does the same thing,
and writes over the first element. If we had passed a `std::string` as the
out-param instead, the logic would have seen that the out-param attribute is a
string, and would have appended `'a'` to it. Then each subsequent parser
would have appended to the string.
_Parser_ never compares the arity of the tuple passed to _p_ against the
expected attribute for the parser, so it does not notice when the tuple has
too many or too few elements. In this case, there are two extra elements that
are never touched. If there had been too few elements in the tuple, you would
have seen a compilation error. _Parser_ does not do this kind of type-checking
up front because the loose assignment logic is spread out among the individual
parsers; the top-level parse can determine what the expected attribute is, but
not whether a passed attribute of another type is a suitable stand-in.
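
The overwrite-versus-append behavior described above can be illustrated with a small standalone model. This is only a sketch and does not use Boost.Parser at all: `is_tuple`, `assign_loosely`, `parse_abc_into`, and `demo_loose_assignment` are hypothetical names invented for the illustration, and `std::tuple` stands in for `bp::tuple`. It assumes, as the text states, that all three `char_` attributes are routed to position `0` when the out-param is a tuple, and are appended when it is a string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <type_traits>

// Hypothetical trait for this sketch: detects whether Out is a std::tuple.
template<typename T>
struct is_tuple : std::false_type {};
template<typename... Ts>
struct is_tuple<std::tuple<Ts...>> : std::true_type {};

// Hypothetical stand-in for the per-parser assignment step. Each sub-parser
// only knows its own attribute and the slot it contributes to; nothing here
// checks how many elements the tuple has overall.
template<std::size_t Slot, typename Out>
void assign_loosely(Out & out, char attr)
{
    if constexpr (is_tuple<Out>::value) {
        // Tuple out-param: overwrite the chosen slot.
        std::get<Slot>(out) = attr;
    } else {
        // String out-param: append to it.
        out.push_back(attr);
    }
}

// Models three char parsers in sequence matching "abc". All three
// contribute to slot 0, mirroring the behavior described in the text.
template<typename Out>
Out parse_abc_into()
{
    Out out{}; // tuple elements start out as '\0'
    for (char c : std::string("abc"))
        assign_loosely<0>(out, c);
    return out;
}

// Usage: the tuple keeps only the last character written to slot 0, and its
// two extra elements are never touched; the string keeps every character.
inline void demo_loose_assignment()
{
    auto chars = parse_abc_into<std::tuple<char, char, char>>();
    assert(std::get<0>(chars) == 'c');
    assert(std::get<1>(chars) == '\0');
    assert(std::get<2>(chars) == '\0');

    auto str = parse_abc_into<std::string>();
    assert(str == "abc");
}
```

In this model, a tuple with no slot `0` to write to (for example `std::tuple<>`) would fail to compile at `std::get<Slot>`, which loosely parallels the compilation error the text mentions for a tuple with too few elements.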
[heading Unicode versus non-Unicode parsing]
A call to _p_ either considers the entire input to be in a UTF format (UTF-8,


@@ -1808,7 +1808,7 @@ TEST(parser, combined_seq_and_or)
std::string str = "abc";
tuple<char, char, char> chars;
EXPECT_TRUE(parse(str, parser, chars));
EXPECT_EQ(chars, tup('c', '\0', '\0')); // TODO: Document this behavior.
EXPECT_EQ(chars, tup('c', '\0', '\0'));
}
{