mirror of
https://github.com/boostorg/locale.git
synced 2026-01-19 04:22:08 +00:00
100 lines
4.0 KiB
Plaintext
100 lines
4.0 KiB
Plaintext
//
|
|
// Copyright (c) 2009-2011 Artyom Beilis (Tonkikh)
|
|
//
|
|
// Distributed under the Boost Software License, Version 1.0.
|
|
// https://www.boost.org/LICENSE_1_0.txt
|
|
|
|
/*!
|
|
\page conversions Text Conversions
|
|
|
|
Boost.Locale provides several functions for basic string manipulation:
|
|
|
|
- \ref boost::locale::to_upper "to_upper": convert a string to upper case
|
|
- \ref boost::locale::to_upper "to_lower": convert a string to lower case
|
|
- \ref boost::locale::to_title "to_title": convert a string to title case
|
|
- \ref boost::locale::fold_case "fold_case": makes a string case-agnostic (see \ref term_case_folding "case folding")
|
|
- \ref boost::locale::normalize "normalize": convert equivalent code points to a consistent binary form (\ref term_normalization "normalization")
|
|
|
|
All these functions receive an \c std::locale object as parameter or use a global locale by default.
|
|
|
|
Global locale is used in all examples below.
|
|
|
|
\section conversions_case Case Handing
|
|
|
|
For example:
|
|
\code
|
|
std::string grussen = "grüßEN";
|
|
std::cout <<"Upper "<< boost::locale::to_upper(grussen) << std::endl
|
|
<<"Lower "<< boost::locale::to_lower(grussen) << std::endl
|
|
<<"Title "<< boost::locale::to_title(grussen) << std::endl
|
|
<<"Fold "<< boost::locale::fold_case(grussen) << std::endl;
|
|
\endcode
|
|
|
|
Would print:
|
|
|
|
\verbatim
|
|
Upper GRÜSSEN
|
|
Lower grüßen
|
|
Title Grüßen
|
|
Fold grüssen
|
|
\endverbatim
|
|
|
|
There are existing functions \c to_upper and \c to_lower in the Boost.StringAlgo library, however the
|
|
Boost.Locale functions operate on an entire string instead of performing incorrect character-by-character conversions.
|
|
|
|
For example:
|
|
|
|
\code
|
|
std::wstring grussen = L"grüßen";
|
|
std::wcout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
|
|
\endcode
|
|
|
|
Would give in output:
|
|
|
|
\verbatim
|
|
GRÜßEN GRÜSSEN
|
|
\endverbatim
|
|
|
|
Where a letter "ß" was not converted correctly to double-S in first case because of a limitation of \c std::ctype facet.
|
|
|
|
This is even more problematic in case of UTF-8 encodings where non US-ASCII are not converted at all.
|
|
For example, this code:
|
|
|
|
\code
|
|
std::string grussen = "grüßen";
|
|
std::cout << boost::algorithm::to_upper_copy(grussen) << " " << boost::locale::to_upper(grussen) << std::endl;
|
|
\endcode
|
|
|
|
Would modify ASCII characters only
|
|
|
|
\verbatim
|
|
GRüßEN GRÜSSEN
|
|
\endverbatim
|
|
|
|
\section conversions_normalization Unicode Normalization
|
|
|
|
Unicode normalization is the process of converting strings to a standard form, suitable for text processing and
|
|
comparison. For example, character "ü" can be represented by a single code point or a combination of the character "u" and the
|
|
diaeresis "¨". Normalization is an important part of Unicode text processing.
|
|
|
|
Unicode defines four normalization forms. Each specific form is selected by a flag passed
|
|
to \ref boost::locale::normalize() "normalize" function:
|
|
|
|
- NFD - Canonical decomposition - boost::locale::norm_nfd
|
|
- NFC - Canonical decomposition followed by canonical composition - boost::locale::norm_nfc or boost::locale::norm_default
|
|
- NFKD - Compatibility decomposition - boost::locale::norm_nfkd
|
|
- NFKC - Compatibility decomposition followed by canonical composition - boost::locale::norm_nfkc
|
|
|
|
For more details on normalization forms, read <a href="http://unicode.org/reports/tr15/#Norm_Forms">this report on unicode.org</a>.
|
|
|
|
\section conversions_notes Notes
|
|
|
|
- \ref boost::locale::normalize() "normalize" operates only on Unicode-encoded strings, i.e.: UTF-8, UTF-16 and UTF-32 depending on the
|
|
character width. So be careful when using non-UTF encodings as they may be treated incorrectly.
|
|
- \ref boost::locale::fold_case() "fold_case" is generally a locale-independent operation, but it receives a locale as a parameter to
|
|
determine the 8-bit encoding.
|
|
- All of these functions can work with an STL string, a NULL terminated string, or a range defined by two pointers. They always
|
|
return a newly created STL string.
|
|
- The length of the string may change; see the above example.
|
|
*/
|