mirror of
https://github.com/boostorg/locale.git
synced 2026-01-19 16:32:09 +00:00
Improve docs
@@ -37,7 +37,6 @@ Almost every(!) facet has design flaws:
|
||||
|
||||
- \c std::ctype, which is responsible for case conversion, assumes that all conversions can be done on a per-character basis. This is
|
||||
probably correct for many languages but it isn't correct in general.
|
||||
\n
|
||||
-# Case conversion may change a string's length. For example, the German word "grüßen" should be converted to "GRÜSSEN" in upper
|
||||
case: the letter "ß" should be converted to "SS", but the \c toupper function works on a single-character basis.
|
||||
-# Case conversion is context-sensitive. For example, the Greek word "ὈΔΥΣΣΕΎΣ" should be converted to "ὀδυσσεύς", where the Greek letter
|
||||
@@ -48,11 +47,9 @@ Almost every(!) facet has design flaws:
|
||||
- \c std::numpunct and \c std::moneypunct do not specify the code points for digit representation at all,
|
||||
so they cannot format numbers with the digits used under Arabic locales. For example,
|
||||
the number "103" is expected to be displayed as "١٠٣" in the \c ar_EG locale.
|
||||
\n
|
||||
\c std::numpunct and \c std::moneypunct assume that the thousands separator is a single character. This is untrue
|
||||
for the UTF-8 encoding, where only the Unicode range 0-0x7F can be represented as a single character. As a result, localized numbers can't be
|
||||
represented correctly under locales that use the Unicode "EN SPACE" character for the thousands separator, such as Russian.
|
||||
\n
|
||||
This actually causes real problems under GCC and SunStudio compilers, where formatting numbers under a Russian locale creates invalid
|
||||
UTF-8 sequences.
|
||||
- \c std::time_put and \c std::time_get have several flaws:
|
||||
@@ -60,8 +57,6 @@ Almost every(!) facet has design flaws:
|
||||
countries dates may be displayed using different calendars.
|
||||
-# They always use a global time zone, not allowing specification of the time zone for formatting. The standard \c std::tm doesn't
|
||||
even include a timezone field at all.
|
||||
-# \c std::time_get is not symmetric with \c std::time_put, so you cannot parse dates and times created with \c std::time_put .
|
||||
(This issue is addressed in C++11 and in some STL implementations, such as the Apache standard C++ library.)
|
||||
- \c std::messages does not provide support for plural forms, making it impossible to correctly localize such simple strings as
|
||||
"There are X files in the directory".
|
||||
|
||||
@@ -75,13 +70,13 @@ ICU is a very good localization library, but it has several serious flaws:
|
||||
- It is absolutely unfriendly to C++ developers. It ignores popular C++ idioms (the STL, RTTI, exceptions, etc.), instead
|
||||
mostly mimicking the Java API.
|
||||
- It provides support for only one kind of string, UTF-16, when some users may want other Unicode encodings.
|
||||
For example, for XML or HTML processing UTF-8 is much more convenient and UTF-32 easier to use. Also there is no support for
|
||||
For example, for XML or HTML processing UTF-8 is much more convenient and UTF-32 easier to use. Also, there is no support for
|
||||
"narrow" encodings that are still very popular, such as the ISO-8859 encodings.
|
||||
|
||||
Boost.Locale provides direct integration with \c iostream, allowing a more natural way of formatting data. For example:
|
||||
|
||||
\code
|
||||
cout << "You have "<<as::currency << 134.45 << " in your account as of "<<as::datetime << std::time(0) << endl;
|
||||
cout << "You have "<<as::currency << 134.45 << " in your account as of "<< as::datetime << std::time(0) << endl;
|
||||
\endcode
|
||||
|
||||
\section why_icu_wrapper Why an ICU wrapper and not an implementation-from-scratch?
|
||||
@@ -145,21 +140,16 @@ There are several reasons:
|
||||
-# A Gregorian Date by definition can't be used to represent locale-independent dates, because not all
|
||||
calendars are Gregorian.
|
||||
-# \c ptime -- definitely could be used, but it has several problems:
|
||||
\n
|
||||
- It is created from a GMT or local time clock, whereas `time()` gives a representation that is independent of time zones
(usually GMT time), which should only later be represented in the time zone that the user requests.
|
||||
\n
|
||||
The time zone is not a property of time itself but rather a property of time formatting.
|
||||
\n
|
||||
- \c ptime already defines \c operator<< and \c operator>> for time formatting and parsing.
|
||||
- The existing facets for \c ptime formatting and parsing were not designed in a way that the user can override.
|
||||
The major formatting and parsing functions are not virtual. This makes it impossible to reimplement the formatting and
|
||||
parsing functions of \c ptime unless the developers of the Boost.DateTime library decide to change them.
|
||||
\n
|
||||
Also, the facets of \c ptime are not "correctly" designed in terms of division of formatting information and
|
||||
locale information. Formatting information should be stored within \c std::ios_base and information about
|
||||
locale-specific formatting should be stored in the facet itself.
|
||||
\n
|
||||
The user of the library should not have to create new facets to change simple formatting information like "display only
|
||||
the date" or "display both date and time."
|
||||
|
||||
@@ -174,30 +164,28 @@ do not actually know how the text should be encoded -- UTF-8, ISO-8859-1, ISO-88
|
||||
This may vary between different operating systems and depends on the current installation. So it is critical
|
||||
to provide all the required information.
|
||||
- ICU fully understands POSIX locales and knows how to treat them correctly.
|
||||
- They are native locale names for most operating system APIs (with the exception of Windows)
|
||||
- They are native locale names for most operating system APIs (except for Windows)
|
||||
|
||||
\section why_linear_chunks Why do most parts of Boost.Locale work only on linear/contiguous chunks of text?
|
||||
|
||||
There are two reasons:
|
||||
|
||||
- Boost.Locale relies heavily on the third-party APIs like ICU, POSIX or Win32 API, all of them
|
||||
work only on linear chunks of text, so providing non-linear API would just hide the
|
||||
- Boost.Locale relies heavily on third-party APIs like ICU, POSIX or the Win32 API, all of which
work only on linear chunks of text, so providing a non-linear API would just hide the
|
||||
real situation and would hurt performance.
|
||||
- In fact, all known libraries that work with Unicode (ICU, Qt, Glib, the Win32 API, the POSIX API
and others) accept input as single linear chunks of text, and there is a good reason for this:
|
||||
\n
|
||||
-# Most supported operations on text, such as collation and case handling, usually work on small
|
||||
chunks of text. For example: you probably would never want to compare two chapters of a book, but rather
|
||||
their titles.
|
||||
-# We should remember that even very large texts require quite a small amount of memory. For example,
|
||||
the entire book "War and Peace" takes only about 3MB of memory.
|
||||
\n
|
||||
|
||||
However:
|
||||
|
||||
- There are API's that support stream processing. For example: character set conversion using
|
||||
- There are APIs that support stream processing. For example: character set conversion using the
|
||||
\c std::codecvt API works on streams of any size without problems.
|
||||
- When new API is introduced into Boost.Locale in future, such that it likely works
|
||||
- When a new API that is likely to work on large chunks of text is introduced into Boost.Locale in the future,
it will provide an interface for non-linear text handling.
|
||||
|
||||
|
||||
@@ -207,27 +195,9 @@ There are several major reasons:
|
||||
|
||||
- This is how C++'s \c std::locale class is built. Each feature is represented using a subclass of
|
||||
\c std::locale::facet that provides an abstract API for specific operations it works on, see \ref std_locales.
|
||||
- This approach allows to switch underlying API without changing the actual application code even in run-time depending
|
||||
- This approach allows switching the underlying API without changing the actual application code, even at run time, depending
|
||||
on performance and localization requirements.
|
||||
- This approach reduces compilation times significantly. This is very important for library that may be
|
||||
- This approach reduces compilation times significantly. This is very important for a library that may be
|
||||
used in almost every part of a specific program.
|
||||
|
||||
\section why_no_special_character_type Why doesn't Boost.Locale provide char16_t/char32_t for non-C++11 platforms?
|
||||
|
||||
There are several reasons:
|
||||
|
||||
- C++11 defines \c char16_t and \c char32_t as distinct types, so substituting them with something like \c uint16_t or \c uint32_t
would not work. For example, writing a \c uint16_t to a \c uint32_t stream would write a number to the stream.
|
||||
- The C++ locales system works only if standard facets like \c std::num_put are installed into the
existing instance of \c std::locale. However, in many standard C++ libraries these facets are specialized for each
specific character type that the standard library supports, so an attempt to create a new facet would
fail because it is not specialized.
|
||||
|
||||
These are exactly the reasons why Boost.Locale fails with current limited C++11 characters support on GCC-4.5 (the second reason)
|
||||
and MSVC-2010 (the first reason)
|
||||
|
||||
Basically, it is impossible to use non-C++ character types with the C++ locales framework.
|
||||
|
||||
The best and most portable solution is to use C++'s \c char type with the UTF-8 encoding.
|
||||
|
||||
*/
|
||||
|
||||
@@ -7,32 +7,30 @@
|
||||
/*!
|
||||
\page status_of_cpp0x_characters_support Status of C++11 char16_t/char32_t support
|
||||
|
||||
The support of C++11 \c char16_t and \c char32_t is experimental, mostly does not work, and is not
|
||||
intended to be used in production with the latest compilers: GCC-4.5, MSVC10 until major
|
||||
compiler flaws are fixed.
|
||||
The support of C++11 \c char16_t and \c char32_t is experimental and is not
|
||||
intended to be used in production until various compiler/standard library flaws are fixed.
|
||||
|
||||
\section status_of_cpp0x_characters_support_gnu GNU GCC 4.5/C++11 Status
|
||||
Many recent C++ compilers provide decent support of C++11 characters, however often:
|
||||
|
||||
GNU C++ compiler provides decent support of C++11 characters however:
|
||||
|
||||
-# Standard library does not install any std::locale::facets for this support so any attempt
|
||||
-# The standard library does not install any std::locale::facets for this support so any attempt
|
||||
to format numbers using \c char16_t or \c char32_t streams would just fail.
|
||||
-# Standard library misses specialization for required \c char16_t/char32_t locale facets,
|
||||
-# The standard library lacks specializations for the required \c char16_t/char32_t locale facets,
so the "std" backend cannot be built because essential symbols are missing, and the \c codecvt facet
cannot be created either.
|
||||
|
||||
\section status_of_cpp0x_characters_support_msvc Visual Studio 2010 (MSVC10)/C++11 Status
|
||||
\section status_of_cpp0x_characters_support_msvc Visual Studio
|
||||
|
||||
MSVC provides all required facets however:
|
||||
MSVC has provided all required facets since VS 2010; however:
|
||||
|
||||
-# Standard library does not provide installations of std::locale::id for these facets
|
||||
-# The standard library does not provide installations of std::locale::id for these facets
in the DLL, so they are not usable with the \c /MD and \c /MDd compiler flags and require statically linking the runtime
library.
|
||||
-# \c char16_t and \c char32_t are not distinct types but rather aliases of unsigned short and unsigned
types, which contradicts the C++11 requirements, makes it impossible to write \c char16_t/char32_t to a stream,
and causes multiple faults.
|
||||
|
||||
If you want to build or test Boost.Locale with C++11 char16_t and char32_t support you should pass `cxxflags="-DBOOST_LOCALE_ENABLE_CHAR32_T -DBOOST_LOCALE_ENABLE_CHAR16_T"` to `b2` during build and define `BOOST_LOCALE_ENABLE_CHAR32_T` and `BOOST_LOCALE_ENABLE_CHAR32_T` when using Boost.Locale
|
||||
If you want to build or test Boost.Locale with C++11 char16_t and char32_t support
|
||||
you should pass `define=BOOST_LOCALE_ENABLE_CHAR32_T define=BOOST_LOCALE_ENABLE_CHAR16_T` to `b2` during build and define `BOOST_LOCALE_ENABLE_CHAR16_T` and `BOOST_LOCALE_ENABLE_CHAR32_T` when using Boost.Locale.
|
||||
|
||||
*/
|
||||
|
||||
|
||||
@@ -94,7 +94,7 @@ problems with this.
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Non UTF-8 encodings</th>
|
||||
<td>Yes</td><td>Yes</td><td>No</td><td>Yes</td>
|
||||
<td>Yes</td><td>Yes</td><td>Yes</td><td>Yes</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>Date/Time Formatting/Parsing</th>
|
||||
@@ -132,10 +132,6 @@ problems with this.
|
||||
<th>Unicode Normalization</th>
|
||||
<td>Yes</td><td>No</td><td>Vista and above</td><td>No</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>C++11 characters</th>
|
||||
<td>Yes</td><td>No</td><td>No</td><td>Yes</td>
|
||||
</tr>
|
||||
<tr>
|
||||
<th>OS Support</th>
|
||||
<td>Any</td><td>Linux, Mac OS X</td><td>Windows, Cygwin</td><td>Any</td>
|
||||
|
||||