mirror of https://github.com/boostorg/spirit.git synced 2026-01-27 07:22:09 +00:00

4 Commits

Author SHA1 Message Date
Nikita Kniazev
15362c0263 to_utf8: more tests 2023-03-17 03:36:25 +03:00
Nikita Kniazev
785a5f35d2 Refix to_utf8 for MinGW and Clang on Windows 2020-04-21 02:12:06 +03:00
Kevin Puetz
c0288e1e8b _WIN32 does not always imply wchar_t is UTF-16
Win32 only defines the types from wtypes.h, like WCHAR, LPWCSTR, etc.
These may or may not be the same thing as wchar_t.

MinGW, cygwin, and wineg++ all support -f(no-)short-wchar,
with the caveat that libstdc++ must be compiled with the same option.
Doing so is quite unusual for MinGW or cygwin, but more common for wineg++
as it enables building a winelib app with system glibc/libstdc++.
Win32's WCHAR is then unsigned short, or with C++11 perhaps char16_t.

MSVC does explicitly document that its wchar_t is always UTF-16:
https://docs.microsoft.com/en-us/cpp/cpp/char-wchar-t-char16-t-char32-t?view=vs-2019

C99/C++11 compilers should provide __STDC_ISO_10646__ if wchar_t is Unicode

GCC, Clang, and ICC all provide __SIZEOF_WCHAR_T__ to distinguish
-fshort-wchar (defaulted by mingw/cygwin) from -fno-short-wchar
https://gcc.gnu.org/onlinedocs/gcc-9.2.0/cpp/Common-Predefined-Macros.html

This takes the approach of assuming that a 2-byte Unicode wchar_t
might be UTF-16 (and still works if it's actually UCS-2, it just never
finds any surrogate pairs), and 4-byte Unicode must be UCS-4.
2020-01-29 23:14:01 -06:00
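
The approach described in the commit message above can be sketched roughly as follows. This is an illustration only, not Spirit's actual code: `decode_wide` is a hypothetical helper, and it branches on `sizeof(wchar_t)` instead of the `__SIZEOF_WCHAR_T__` / `__STDC_ISO_10646__` macros the message mentions, so that it compiles on any compiler. A 2-byte wchar_t is treated as possibly UTF-16, and a 4-byte wchar_t is taken to hold UCS-4 directly.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of Spirit): turn a wide string into
// Unicode code points, choosing the decoding by the width of wchar_t.
inline std::vector<std::uint32_t> decode_wide(std::wstring const& in)
{
    std::vector<std::uint32_t> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);
        if (sizeof(wchar_t) == 2) {
            // 2-byte wchar_t: assume UTF-16. Pure UCS-2 input also works,
            // because it simply never contains a surrogate pair.
            cp &= 0xFFFFu;
            if (cp >= 0xD800u && cp <= 0xDBFFu && i + 1 < in.size()) {
                std::uint32_t lo =
                    static_cast<std::uint32_t>(in[i + 1]) & 0xFFFFu;
                if (lo >= 0xDC00u && lo <= 0xDFFFu) {
                    // Combine high and low surrogate into one code point.
                    cp = 0x10000u + ((cp - 0xD800u) << 10) + (lo - 0xDC00u);
                    ++i; // the low surrogate has been consumed
                }
            }
        }
        // 4-byte wchar_t: the value is already a UCS-4 code point.
        out.push_back(cp);
    }
    return out;
}
```

Unpaired surrogates are passed through unchanged in this sketch; a real implementation would have to decide whether to reject or replace them.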
Nikita Kniazev
3fbde9c195 to_utf8: Fixed wchar_t handling on Windows
Spirit was assuming that wchar_t is 32-bit and the content is UCS-4.
This is wrong: the actual representation is implementation-defined [lex.ccon]/6.
However, on most Unix platforms this assumption is valid and gives the
expected outcome, but on Windows wchar_t is 16-bit and the content is UTF-16.
2018-10-28 17:43:05 +03:00
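
Whatever the width of wchar_t, the input has to be reduced to Unicode code points before it can be written out as UTF-8. A minimal sketch of that final encoding step, assuming the code point has already been decoded, might look like this. `append_utf8` is a hypothetical name used for illustration and is not Spirit's `to_utf8` implementation.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Spirit): append one Unicode code point
// to a std::string as UTF-8. Assumes cp <= 0x10FFFF; no validation is done.
inline void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80u) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800u) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0u | (cp >> 6));
        out += static_cast<char>(0x80u | (cp & 0x3Fu));
    } else if (cp < 0x10000u) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0u | (cp >> 12));
        out += static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu));
        out += static_cast<char>(0x80u | (cp & 0x3Fu));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0u | (cp >> 18));
        out += static_cast<char>(0x80u | ((cp >> 12) & 0x3Fu));
        out += static_cast<char>(0x80u | ((cp >> 6) & 0x3Fu));
        out += static_cast<char>(0x80u | (cp & 0x3Fu));
    }
}
```

Feeding a lone UTF-16 surrogate into such an encoder as if it were a UCS-4 value yields invalid UTF-8, which is why the 32-bit assumption breaks on Windows for characters outside the Basic Multilingual Plane.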