mirror of
https://github.com/boostorg/locale.git
synced 2026-01-19 04:22:08 +00:00
Don't use the stdlib UTF-8 codecvt facet on Cygwin either.
Handle Cygwin the same as native Windows.

The reason is an issue discovered when converting a UTF-8 sequence of 1000x U+2008A to wchar_t (UTF-16):
UTF-8: "\xF0\xA0\x82\x8A"
The correct result is 1000x L"\xD840\xDC8A".
The first 255 pairs are correct (1020 input bytes consumed), but the low surrogate of the 256th pair becomes `0xDC82`, which hints that the conversion repeats the second-to-last byte (index 1023) instead of reading the correct one.
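The numbers in the message can be checked by hand. The following is a minimal sketch, not code from Boost.Locale: `decode_utf8_4` and `to_surrogates` are hypothetical helper names, and no input validation is done. It shows the bit layout of a 4-byte UTF-8 sequence and how the surrogate pair is derived from the code point.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: decode one 4-byte UTF-8 sequence into a code point.
// No validation is performed; this only illustrates the bit layout.
inline std::uint32_t decode_utf8_4(unsigned char b0, unsigned char b1, unsigned char b2, unsigned char b3)
{
    return (std::uint32_t(b0 & 0x07) << 18) | (std::uint32_t(b1 & 0x3F) << 12)
           | (std::uint32_t(b2 & 0x3F) << 6) | std::uint32_t(b3 & 0x3F);
}

// Hypothetical helper: split a code point above U+FFFF into a UTF-16 surrogate pair.
inline std::pair<std::uint16_t, std::uint16_t> to_surrogates(std::uint32_t cp)
{
    const std::uint32_t v = cp - 0x10000;
    return {std::uint16_t(0xD800 + (v >> 10)), std::uint16_t(0xDC00 + (v & 0x3FF))};
}
```

With these helpers, `decode_utf8_4(0xF0, 0xA0, 0x82, 0x8A)` gives U+2008A and `to_surrogates` splits it into `0xD840`, `0xDC8A`. Feeding the third byte twice, `decode_utf8_4(0xF0, 0xA0, 0x82, 0x82)`, gives a low surrogate of `0xDC82`, which matches the corrupted value reported above.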
@@ -18,11 +18,14 @@ namespace boost { namespace locale { namespace impl_std {
     std::locale
     create_codecvt(const std::locale& in, const std::string& locale_name, char_facet_t type, utf8_support utf)
     {
-#if defined(BOOST_WINDOWS)
+#if defined(BOOST_WINDOWS) || defined(__CYGWIN__)
         // This isn't fully correct:
         // It will treat the 2-Byte wchar_t as UTF-16 encoded while it may be UCS-2
         // std::basic_filebuf explicitly disallows using such multi-byte codecvts
         // but it works in practice so far, so use it instead of failing for codepoints above U+FFFF
+        //
+        // Additionally, the stdlib in Cygwin has issues converting long UTF-8 sequences likely due to left-over
+        // state across buffer boundaries. E.g. the low surrogate after a sequence of 255 UTF-16 pairs gets corrupted.
         if(utf != utf8_support::none)
             return util::create_utf8_codecvt(in, type);
 #endif
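The new comment attributes the corruption to state that is mishandled across buffer boundaries. As an illustration of what correct handling looks like, here is a small sketch of an incremental decoder that keeps its partial code point between calls. It is not the Cygwin stdlib code and not the Boost.Locale implementation: `Utf8ToUtf16`, `convert_chunked` and `repeated_input` are hypothetical names, the input is assumed to be valid UTF-8, and the chunk size is a free parameter chosen by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical incremental UTF-8 -> UTF-16 decoder (assumes valid input).
struct Utf8ToUtf16 {
    std::uint32_t cp = 0; // code point bits collected so far
    int pending = 0;      // continuation bytes still expected

    // Decode one chunk and append UTF-16 code units to `out`.
    // `cp` and `pending` persist between calls, so a sequence may straddle two chunks.
    void feed(const char* data, std::size_t size, std::u16string& out)
    {
        for(std::size_t i = 0; i < size; ++i) {
            const unsigned char b = static_cast<unsigned char>(data[i]);
            if(pending == 0) {
                if(b < 0x80) { cp = b; }
                else if(b < 0xE0) { cp = b & 0x1F; pending = 1; }
                else if(b < 0xF0) { cp = b & 0x0F; pending = 2; }
                else { cp = b & 0x07; pending = 3; }
            } else {
                cp = (cp << 6) | (b & 0x3F);
                --pending;
            }
            if(pending == 0)
                emit(out);
        }
    }

    // Append the completed code point, as a surrogate pair if it is above U+FFFF.
    void emit(std::u16string& out) const
    {
        if(cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const std::uint32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
};

// Convert `input` in chunks of at most `chunk` bytes (`chunk` must be > 0).
inline std::u16string convert_chunked(const std::string& input, std::size_t chunk)
{
    Utf8ToUtf16 dec;
    std::u16string out;
    for(std::size_t pos = 0; pos < input.size(); pos += chunk) {
        const std::size_t n = (input.size() - pos < chunk) ? input.size() - pos : chunk;
        dec.feed(input.data() + pos, n, out);
    }
    return out;
}

// Build `count` repetitions of U+2008A encoded as UTF-8.
inline std::string repeated_input(std::size_t count)
{
    std::string s;
    for(std::size_t i = 0; i < count; ++i)
        s += "\xF0\xA0\x82\x8A";
    return s;
}
```

Calling `convert_chunked(repeated_input(1000), 1023)` places a chunk boundary between the third and fourth byte of the 256th sequence, the position where the commit message observed the corruption. Because the decoder state is carried over, the output should be identical to a single-chunk conversion.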