<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/xhtml;charset=UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=9"/>
<meta name="generator" content="Doxygen 1.8.7"/>
<title>Boost.Nowide: Boost.Nowide</title>
<link href="tabs.css" rel="stylesheet" type="text/css"/>
<script type="text/javascript" src="jquery.js"></script>
<script type="text/javascript" src="dynsections.js"></script>
<link href="doxygen.css" rel="stylesheet" type="text/css" />
</head>
<body>
<div id="top"><!-- do not remove this div, it is closed by doxygen! -->
<div id="titlearea">
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 56px;">
<td style="padding-left: 0.5em;">
<div id="projectname">Boost.Nowide
</div>
</td>
</tr>
</tbody>
</table>
</div>
<!-- end header part -->
<!-- Generated by Doxygen 1.8.7 -->
<div id="navrow1" class="tabs">
<ul class="tablist">
<li class="current"><a href="index.html"><span>Main&#160;Page</span></a></li>
<li><a href="namespaces.html"><span>Namespaces</span></a></li>
<li><a href="annotated.html"><span>Classes</span></a></li>
<li><a href="files.html"><span>Files</span></a></li>
</ul>
</div>
</div><!-- top -->
<div class="header">
<div class="headertitle">
<div class="title">Boost.Nowide </div> </div>
</div><!--header-->
<div class="contents">
<div class="textblock"><p>Table of Contents:</p>
<ul>
<li><a class="el" href="index.html#main">What is Boost.Nowide</a></li>
<li><a class="el" href="index.html#main_rationale">Rationale</a><ul>
<li><a class="el" href="index.html#main_the_problem">The Problem</a></li>
<li><a class="el" href="index.html#main_the_solution">The Solution</a></li>
<li><a class="el" href="index.html#main_wide">Why Not Narrow and Wide?</a></li>
<li><a class="el" href="index.html#main_reading">Further Reading</a></li>
</ul>
</li>
<li><a class="el" href="index.html#using">Using The Library</a><ul>
<li><a class="el" href="index.html#using_standard">Standard Features</a></li>
<li><a class="el" href="index.html#using_custom">Custom API</a></li>
<li><a class="el" href="index.html#using_windows_h">The windows.h header</a></li>
<li><a class="el" href="index.html#using_integration">Integration with Boost.Filesystem</a></li>
</ul>
</li>
<li><a class="el" href="index.html#technical">Technical Details</a><ul>
<li><a class="el" href="index.html#technical_imple">Windows vs POSIX</a></li>
<li><a class="el" href="index.html#technical_cio">Console I/O</a></li>
</ul>
</li>
<li><a class="el" href="index.html#qna">Q &amp; A</a></li>
<li><a class="el" href="index.html#standalone_version">Standalone Version</a></li>
<li><a class="el" href="index.html#sources">Sources and Downloads</a></li>
</ul>
<h1><a class="anchor" id="main"></a>
What is Boost.Nowide</h1>
<p>Boost.Nowide is a library implemented by Artyom Beilis that makes cross-platform, Unicode-aware programming easier.</p>
<p>The library provides implementations of standard C and C++ library functions that accept UTF-8 strings on Windows, without requiring the use of the Wide API.</p>
<h1><a class="anchor" id="main_rationale"></a>
Rationale</h1>
<h2><a class="anchor" id="main_the_problem"></a>
The Problem</h2>
<p>Consider a simple application that splits a big file into chunks so that they can be sent by e-mail. It has to perform a few very simple tasks:</p>
<ul>
<li>Access command line arguments: <code>int main(int argc,char **argv)</code></li>
<li>Open an input file, open several output files: <code>std::fstream::open(char const *,std::ios::openmode m)</code></li>
<li>Remove the files in case of fault: <code>std::remove(char const *file)</code></li>
<li>Print a progress report onto the console: <code>std::cout &lt;&lt; file_name </code></li>
</ul>
<p>Unfortunately, it is impossible to implement this simple task portably in plain C++ if the file names contain non-ASCII characters.</p>
<p>A simple program that uses this API works on systems that use UTF-8 internally, which covers the vast majority of Unix-like operating systems: Linux, Mac OS X, Solaris, BSD. But it fails on files like <code>War and Peace - Война и мир - מלחמה ושלום.zip</code> under Microsoft Windows, because the native Unicode-aware Windows API is the Wide API, which uses UTF-16.</p>
<p>This seemingly trivial task is therefore very hard to implement in a cross-platform manner.</p>
<h2><a class="anchor" id="main_the_solution"></a>
The Solution</h2>
<p>Boost.Nowide provides a set of UTF-8 aware standard library functions that make Unicode-aware programming easier.</p>
<p>The library provides:</p>
<ul>
<li>Easy-to-use functions for converting UTF-8 to/from UTF-16</li>
<li>A class to make the <code>argc</code>, <code>argv</code> and <code>env</code> parameters of <code>main</code> use UTF-8</li>
<li>UTF-8 aware functions<ul>
<li><code>stdio.h</code> functions:<ul>
<li><code>fopen</code> </li>
<li><code>freopen</code> </li>
<li><code>remove</code> </li>
<li><code>rename</code> </li>
</ul>
</li>
<li><code>stdlib.h</code> functions:<ul>
<li><code>system</code> </li>
<li><code>getenv</code> </li>
<li><code>setenv</code> </li>
<li><code>unsetenv</code> </li>
<li><code>putenv</code> </li>
</ul>
</li>
<li><code>fstream</code> <ul>
<li><code>filebuf</code> </li>
<li><code>fstream/ofstream/ifstream</code> </li>
</ul>
</li>
<li><code>iostream</code> <ul>
<li><code>cout</code> </li>
<li><code>cerr</code> </li>
<li><code>clog</code> </li>
<li><code>cin</code> </li>
</ul>
</li>
</ul>
</li>
</ul>
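<p>As a rough illustration of what converting UTF-8 to/from UTF-16 involves, here is a minimal, standalone C++17 sketch. It is not the Boost.Nowide implementation: the names <code>utf8_to_utf16</code> and <code>utf16_to_utf8</code> are hypothetical, and the sketch uses <code>std::u16string</code> rather than <code>wchar_t</code> so that it behaves the same on every platform. Malformed input is reported as <code>std::nullopt</code>.</p>

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical helper (not part of Boost.Nowide): decode UTF-8 and re-encode
// it as UTF-16. Rejects invalid lead bytes, truncated sequences, bad
// continuation bytes, overlong forms, encoded surrogates and values > U+10FFFF.
std::optional<std::u16string> utf8_to_utf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (lead < 0x80)                       { cp = lead;        len = 1; }
        else if (lead >= 0xC2 && lead <= 0xDF) { cp = lead & 0x1F; len = 2; }
        else if (lead >= 0xE0 && lead <= 0xEF) { cp = lead & 0x0F; len = 3; }
        else if (lead >= 0xF0 && lead <= 0xF4) { cp = lead & 0x07; len = 4; }
        else return std::nullopt;                      // never valid as a lead byte
        if (len > in.size() - i) return std::nullopt;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return std::nullopt;                       // overlong, surrogate or out of range
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {                                       // needs a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
        i += len;
    }
    return out;
}

// Hypothetical helper: the reverse direction, UTF-16 to UTF-8.
// Returns std::nullopt if the input contains an unpaired surrogate.
std::optional<std::string> utf16_to_utf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            if (i + 1 == in.size()) return std::nullopt;
            const char32_t low = in[i + 1];
            if (low < 0xDC00 || low > 0xDFFF) return std::nullopt;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            return std::nullopt;
        }
        if (cp < 0x80) {                               // 1 byte
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {                       // 2 bytes
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {                     // 3 bytes
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                                       // 4 bytes
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

<p>Because well-formed UTF-8 has exactly one encoding per code point, a round trip such as <code>utf16_to_utf8(*utf8_to_utf16(s))</code> returns the original string for any valid UTF-8 input <code>s</code>.</p>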
<h2><a class="anchor" id="main_wide"></a>
Why Not Narrow and Wide?</h2>
<p>Why not provide both Wide and Narrow implementations so the developer can choose to use Wide characters on Unix-like platforms?</p>
<p>Several reasons:</p>
<ul>
<li><code>wchar_t</code> is not really portable: it can be 2 bytes, 4 bytes or even 1 byte, which makes Unicode-aware programming harder</li>
<li>The C and C++ standard libraries use narrow strings for OS interactions, and this library follows the same general rule. There is no such thing as <code>fopen(wchar_t const *,wchar_t const *)</code> in the standard library, so it is better to stick to the standards rather than re-implement the Wide API in "Microsoft Windows style"</li>
</ul>
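<p>A small illustration of the first point, assuming common toolchains (MSVC on Windows, GCC or Clang on Linux and macOS): the same wide string literal holding one emoji has a different length depending on the width of <code>wchar_t</code>. The helper names <code>wchar_bits</code> and <code>wide_units_for_emoji</code> are made up for this sketch.</p>

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Width of wchar_t in bits on this platform: usually 16 on Windows
// (where wide strings are UTF-16) and 32 on Linux and macOS (UTF-32).
constexpr std::size_t wchar_bits() { return sizeof(wchar_t) * CHAR_BIT; }

// Number of wchar_t units used by one code point outside the Basic
// Multilingual Plane (U+1F600): a surrogate pair (2 units) when wchar_t
// is 16 bits wide, a single unit when it is 32 bits wide.
std::size_t wide_units_for_emoji()
{
    const std::wstring s = L"\U0001F600";
    return s.size();
}
```

<p>Code that assumes one <code>wchar_t</code> per character therefore behaves differently from platform to platform.</p>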
<h2><a class="anchor" id="main_reading"></a>
Further Reading</h2>
<ul>
<li><a href="http://www.utf8everywhere.org/">www.utf8everywhere.org</a></li>
<li><a href="http://alfps.wordpress.com/2011/11/22/unicode-part-1-windows-console-io-approaches/">Windows console i/o approaches</a></li>
</ul>
<h1><a class="anchor" id="using"></a>
Using The Library</h1>
<h2><a class="anchor" id="using_standard"></a>
Standard Features</h2>
<p>The library is mostly header-only; only console I/O requires separate compilation under Windows.</p>
<p>As a developer, you are expected to use <code><a class="el" href="namespaceboost_1_1nowide.html" title="This namespace includes implementation of the standard library functions such that they accept UTF-8 s...">boost::nowide</a></code> functions instead of the functions available in the <code>std</code> namespace.</p>
<p>For example, here is a Unicode-unaware implementation of a line counter: </p><div class="fragment"><div class="line"><span class="preprocessor">#include &lt;fstream&gt;</span></div>
<div class="line"><span class="preprocessor">#include &lt;iostream&gt;</span></div>
<div class="line"></div>
<div class="line"><span class="keywordtype">int</span> main(<span class="keywordtype">int</span> argc,<span class="keywordtype">char</span> **argv)</div>
<div class="line">{</div>
<div class="line"> <span class="keywordflow">if</span>(argc!=2) {</div>
<div class="line"> std::cerr &lt;&lt; <span class="stringliteral">&quot;Usage: file_name&quot;</span> &lt;&lt; std::endl;</div>
<div class="line"> <span class="keywordflow">return</span> 1;</div>
<div class="line"> }</div>
<div class="line"></div>
<div class="line"> std::ifstream f(argv[1]);</div>
<div class="line"> <span class="keywordflow">if</span>(!f) {</div>
<div class="line"> std::cerr &lt;&lt; <span class="stringliteral">&quot;Can&#39;t open &quot;</span> &lt;&lt; argv[1] &lt;&lt; std::endl;</div>
<div class="line"> <span class="keywordflow">return</span> 1;</div>
<div class="line"> }</div>
<div class="line"> <span class="keywordtype">int</span> total_lines = 0;</div>
<div class="line"> <span class="keywordflow">while</span>(f) {</div>
<div class="line"> <span class="keywordflow">if</span>(f.get() == <span class="charliteral">&#39;\n&#39;</span>)</div>
<div class="line"> total_lines++;</div>
<div class="line"> }</div>
<div class="line"> f.close();</div>
<div class="line"> std::cout &lt;&lt; <span class="stringliteral">&quot;File &quot;</span> &lt;&lt; argv[1] &lt;&lt; <span class="stringliteral">&quot; has &quot;</span> &lt;&lt; total_lines &lt;&lt; <span class="stringliteral">&quot; lines&quot;</span> &lt;&lt; std::endl;</div>
<div class="line"> <span class="keywordflow">return</span> 0;</div>
<div class="line">}</div>
</div><!-- fragment --><p>To make this program handle Unicode properly, we make the following changes:</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;boost/nowide/args.hpp&gt;</span></div>
<div class="line"><span class="preprocessor">#include &lt;boost/nowide/fstream.hpp&gt;</span></div>
<div class="line"><span class="preprocessor">#include &lt;boost/nowide/iostream.hpp&gt;</span></div>
<div class="line"></div>
<div class="line"><span class="keywordtype">int</span> main(<span class="keywordtype">int</span> argc,<span class="keywordtype">char</span> **argv)</div>
<div class="line">{</div>
<div class="line"> <a class="code" href="classboost_1_1nowide_1_1args.html">boost::nowide::args</a> a(argc,argv); <span class="comment">// Fix arguments - make them UTF-8</span></div>
<div class="line"> <span class="keywordflow">if</span>(argc!=2) {</div>
<div class="line"> <a class="code" href="namespaceboost_1_1nowide.html#a1c43cbf142f4e42edb0fea6044d40bcb">boost::nowide::cerr</a> &lt;&lt; <span class="stringliteral">&quot;Usage: file_name&quot;</span> &lt;&lt; std::endl; <span class="comment">// Unicode aware console</span></div>
<div class="line"> <span class="keywordflow">return</span> 1;</div>
<div class="line"> }</div>
<div class="line"></div>
<div class="line"> <a class="code" href="classboost_1_1nowide_1_1basic__ifstream.html">boost::nowide::ifstream</a> f(argv[1]); <span class="comment">// argv[1] - is UTF-8</span></div>
<div class="line"> <span class="keywordflow">if</span>(!f) {</div>
<div class="line"> <span class="comment">// the console can display UTF-8</span></div>
<div class="line"> <a class="code" href="namespaceboost_1_1nowide.html#a1c43cbf142f4e42edb0fea6044d40bcb">boost::nowide::cerr</a> &lt;&lt; <span class="stringliteral">&quot;Can&#39;t open &quot;</span> &lt;&lt; argv[1] &lt;&lt; std::endl;</div>
<div class="line"> <span class="keywordflow">return</span> 1;</div>
<div class="line"> }</div>
<div class="line"> <span class="keywordtype">int</span> total_lines = 0;</div>
<div class="line"> <span class="keywordflow">while</span>(f) {</div>
<div class="line"> <span class="keywordflow">if</span>(f.get() == <span class="charliteral">&#39;\n&#39;</span>)</div>
<div class="line"> total_lines++;</div>
<div class="line"> }</div>
<div class="line"> f.close();</div>
<div class="line"> <span class="comment">// the console can display UTF-8</span></div>
<div class="line"> <a class="code" href="namespaceboost_1_1nowide.html#a3150e32a8082927f4843791c1dbc3587">boost::nowide::cout</a> &lt;&lt; <span class="stringliteral">&quot;File &quot;</span> &lt;&lt; argv[1] &lt;&lt; <span class="stringliteral">&quot; has &quot;</span> &lt;&lt; total_lines &lt;&lt; <span class="stringliteral">&quot; lines&quot;</span> &lt;&lt; std::endl;</div>
<div class="line"> <span class="keywordflow">return</span> 0;</div>
<div class="line">}</div>
</div><!-- fragment --><p>This simple and straightforward approach makes it easy to write Unicode-aware programs.</p>
<h2><a class="anchor" id="using_custom"></a>
Custom API</h2>
<p>Of course, this simple set of functions does not cover all needs. If you need to access the Wide API from a Windows application that uses UTF-8 internally, you can use functions like <code><a class="el" href="namespaceboost_1_1nowide.html#a6baacc1bb80c134a2ce37f13977b5500">boost::nowide::widen</a></code> and <code><a class="el" href="namespaceboost_1_1nowide.html#ac2b772caed760a75f611c2dc24153e4a">boost::nowide::narrow</a></code>.</p>
<p>For example: </p><div class="fragment"><div class="line">CopyFileW( <a class="code" href="namespaceboost_1_1nowide.html#a6baacc1bb80c134a2ce37f13977b5500">boost::nowide::widen</a>(existing_file).c_str(),</div>
<div class="line"> <a class="code" href="namespaceboost_1_1nowide.html#a6baacc1bb80c134a2ce37f13977b5500">boost::nowide::widen</a>(new_file).c_str(),</div>
<div class="line"> TRUE);</div>
</div><!-- fragment --><p>The conversion is done at the last stage, and you continue using UTF-8 strings everywhere else. You only switch to the Wide API at glue points.</p>
<p><code><a class="el" href="namespaceboost_1_1nowide.html#a6baacc1bb80c134a2ce37f13977b5500">boost::nowide::widen</a></code> returns <code>std::wstring</code>. Sometimes it is useful to avoid heap allocation and use on-stack buffers instead. Boost.Nowide provides the <code><a class="el" href="classboost_1_1nowide_1_1basic__stackstring.html" title="A class that allows creating temporary wide or narrow UTF strings from wide or narrow UTF source...">boost::nowide::basic_stackstring</a></code> class for this purpose.</p>
<p>The example above could be rewritten as:</p>
<div class="fragment"><div class="line"><a class="code" href="classboost_1_1nowide_1_1basic__stackstring.html">boost::nowide::basic_stackstring&lt;wchar_t,char,64&gt;</a> wexisting_file,wnew_file;</div>
<div class="line"><span class="keywordflow">if</span>(!wexisting_file.convert(existing_file) || !wnew_file.convert(new_file)) {</div>
<div class="line"> <span class="comment">// invalid UTF-8</span></div>
<div class="line"> <span class="keywordflow">return</span> -1;</div>
<div class="line">}</div>
<div class="line"></div>
<div class="line">CopyFileW(wexisting_file.c_str(),wnew_file.c_str(),TRUE);</div>
</div><!-- fragment --><dl class="section note"><dt>Note</dt><dd>There are a few convenience typedefs: <code>stackstring</code> and <code>wstackstring</code> using 256-character buffers, and <code>short_stackstring</code> and <code>wshort_stackstring</code> using 16-character buffers. If the string is longer, they fall back to memory allocation.</dd></dl>
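<p>The fallback described in the note can be pictured with the following simplified sketch. <code>small_cstring</code> is a hypothetical class, not the real <code>basic_stackstring</code>: it only copies a narrow string and performs no UTF conversion, but it is parameterized on a buffer size <code>N</code> in the same spirit as the buffer-size template argument of <code>basic_stackstring</code>.</p>

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical illustration of the "on-stack buffer, heap fallback" idea.
// Strings that fit (including the terminator) live in buffer_; longer
// strings are copied into a heap allocation owned by heap_.
template <std::size_t N>
class small_cstring {
public:
    explicit small_cstring(const char* s)
    {
        const std::size_t len = std::strlen(s);
        if (len + 1 <= N) {
            data_ = buffer_;                  // fits: use the in-object buffer
        } else {
            heap_.reset(new char[len + 1]);   // too long: fall back to allocation
            data_ = heap_.get();
        }
        std::memcpy(data_, s, len + 1);       // copy including the terminator
    }

    // data_ may point into this object's own buffer, so copying is disabled.
    small_cstring(const small_cstring&) = delete;
    small_cstring& operator=(const small_cstring&) = delete;

    const char* c_str() const { return data_; }
    bool uses_heap() const { return heap_ != nullptr; }

private:
    char buffer_[N];
    std::unique_ptr<char[]> heap_;
    char* data_ = nullptr;
};
```

<p>A member-wise copy would leave the copy's <code>data_</code> pointing into the original object's buffer, which is why the sketch deletes the copy operations instead of relying on the defaults.</p>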
<h2><a class="anchor" id="using_windows_h"></a>
The windows.h header</h2>
<p>The library does not include <code>windows.h</code>, to avoid polluting the global namespace with its numerous macros and types. Instead, the library declares the prototypes of the Win32 API functions it needs.</p>
<p>However, you can request that <code>windows.h</code> be used by defining <code>BOOST_USE_WINDOWS_H</code> before including any of the Boost.Nowide headers.</p>
<h2><a class="anchor" id="using_integration"></a>
Integration with Boost.Filesystem</h2>
<p>Boost.Filesystem supports selecting the narrow encoding it uses. Unfortunately, the default narrow encoding on Windows is not UTF-8. You can make UTF-8 the default encoding of Boost.Filesystem by calling <code><a class="el" href="namespaceboost_1_1nowide.html#a7f94a60d0a9e5534a6dcc41bbc826ee8">boost::nowide::nowide_filesystem()</a></code> at the beginning of your program.</p>
<h1><a class="anchor" id="technical"></a>
Technical Details</h1>
<h2><a class="anchor" id="technical_imple"></a>
Windows vs POSIX</h2>
<p>For Microsoft Windows, the library provides UTF-8 aware variants of some <code>std::</code> functions in the <code><a class="el" href="namespaceboost_1_1nowide.html" title="This namespace includes implementation of the standard library functions such that they accept UTF-8 s...">boost::nowide</a></code> namespace. For example, <code>std::fopen</code> becomes <code><a class="el" href="namespaceboost_1_1nowide.html#a45d432e5684010f865702af97d47f087" title="Same as fopen but file_name and mode are UTF-8 strings. ">boost::nowide::fopen</a></code>.</p>
<p>On POSIX platforms, the functions in <a class="el" href="namespaceboost_1_1nowide.html" title="This namespace includes implementation of the standard library functions such that they accept UTF-8 s...">boost::nowide</a> are aliases of their standard library counterparts:</p>
<div class="fragment"><div class="line"><span class="keyword">namespace </span>boost {</div>
<div class="line"><span class="keyword">namespace </span>nowide {</div>
<div class="line"><span class="preprocessor">#ifdef BOOST_WINDOWS</span></div>
<div class="line"><span class="keyword">inline</span> FILE *<a class="code" href="namespaceboost_1_1nowide.html#a45d432e5684010f865702af97d47f087">fopen</a>(<span class="keywordtype">char</span> <span class="keyword">const</span> *name,<span class="keywordtype">char</span> <span class="keyword">const</span> *mode)</div>
<div class="line">{</div>
<div class="line"> ...</div>
<div class="line">}</div>
<div class="line"><span class="preprocessor">#else</span></div>
<div class="line"><span class="keyword">using</span> std::fopen;</div>
<div class="line"><span class="preprocessor">#endif</span></div>
<div class="line">} <span class="comment">// nowide</span></div>
<div class="line">} <span class="comment">// boost</span></div>
</div><!-- fragment --><h2><a class="anchor" id="technical_cio"></a>
Console I/O</h2>
<p>Console I/O is implemented as a wrapper around <code>ReadConsoleW</code>/<code>WriteConsoleW</code> (used when the stream is attached to a "real" console) and <code>ReadFile</code>/<code>WriteFile</code> (used when the stream is piped or redirected).</p>
<p>This approach eliminates the need for manual code page handling. As long as the console uses a TrueType font, Unicode input and output work as intended.</p>
<h1><a class="anchor" id="qna"></a>
Q &amp; A</h1>
<p><b>Q: Why doesn't the library convert the string to/from the locale's encoding (instead of UTF-8) on POSIX systems?</b></p>
<p>A: It is inherently incorrect to convert strings to/from locale encodings on POSIX platforms.</p>
<p>You can create a file named "\xFF\xFF.txt" (invalid UTF-8), remove it, or pass its name as a parameter to a program, and it works whether the current locale is UTF-8 or not. Also, changing the locale from, for example, <code>en_US.UTF-8</code> to <code>en_US.ISO-8859-1</code> does not magically change the names of all files in the OS or the strings a user may pass to the program (this is different on Windows).</p>
<p>POSIX operating systems treat such strings as opaque, NUL-terminated byte sequences ("cookies").</p>
<p>So altering their content according to the locale would actually lead to incorrect behavior.</p>
<p>For example, here is a naive implementation of the standard program <code>rm</code>:</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;cstdio&gt;</span></div>
<div class="line"></div>
<div class="line"><span class="keywordtype">int</span> main(<span class="keywordtype">int</span> argc,<span class="keywordtype">char</span> **argv)</div>
<div class="line">{</div>
<div class="line"> <span class="keywordflow">for</span>(<span class="keywordtype">int</span> i=1;i&lt;argc;i++)</div>
<div class="line"> std::remove(argv[i]);</div>
<div class="line"> <span class="keywordflow">return</span> 0;</div>
<div class="line">}</div>
</div><!-- fragment --><p>It works with any locale, and converting the strings would lead to incorrect behavior.</p>
<p>The meaning of a locale under POSIX and Windows platforms is different and has very different effects.</p>
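<p>The claim about "\xFF\xFF.txt" can be checked with a few lines of standard C++. This sketch assumes a POSIX system whose file system stores names as raw bytes (for example ext4 on Linux) and a writable <code>/tmp</code> directory; some file systems, such as those used by macOS, may reject names that are not valid UTF-8. The helper name <code>create_and_remove</code> is hypothetical.</p>

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: creates an empty file with the given name and removes
// it again. The name is handed to the C library as raw bytes; no locale or
// encoding conversion takes place. Returns true if both steps succeed.
bool create_and_remove(const std::string& name)
{
    std::FILE* f = std::fopen(name.c_str(), "w");
    if (!f)
        return false;
    std::fclose(f);
    return std::remove(name.c_str()) == 0;
}
```

<p>Under these assumptions, <code>create_and_remove("/tmp/\xFF\xFF.txt")</code> is expected to succeed whatever the current locale is, because the bytes are never interpreted.</p>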
<h1><a class="anchor" id="standalone_version"></a>
Standalone Version</h1>
<p>It is possible to use the Nowide library without depending on the rest of Boost. There is a standalone version that provides all the functionality in the <code>nowide</code> namespace instead of <code><a class="el" href="namespaceboost_1_1nowide.html" title="This namespace includes implementation of the standard library functions such that they accept UTF-8 s...">boost::nowide</a></code>. The line counter example from the Standard Features section would look like this:</p>
<div class="fragment"><div class="line"><span class="preprocessor">#include &lt;nowide/args.hpp&gt;</span></div>
<div class="line"><span class="preprocessor">#include &lt;nowide/fstream.hpp&gt;</span></div>
<div class="line"><span class="preprocessor">#include &lt;nowide/iostream.hpp&gt;</span></div>
<div class="line"></div>
<div class="line"><span class="keywordtype">int</span> main(<span class="keywordtype">int</span> argc,<span class="keywordtype">char</span> **argv)</div>
<div class="line">{</div>
<div class="line"> nowide::args a(argc,argv); <span class="comment">// Fix arguments - make them UTF-8</span></div>
<div class="line"> <span class="keywordflow">if</span>(argc!=2) {</div>
<div class="line"> nowide::cerr &lt;&lt; <span class="stringliteral">&quot;Usage: file_name&quot;</span> &lt;&lt; std::endl; <span class="comment">// Unicode aware console</span></div>
<div class="line"> <span class="keywordflow">return</span> 1;</div>
<div class="line"> }</div>
<div class="line"></div>
<div class="line"> nowide::ifstream f(argv[1]); <span class="comment">// argv[1] - is UTF-8</span></div>
<div class="line"> <span class="keywordflow">if</span>(!f) {</div>
<div class="line"> <span class="comment">// the console can display UTF-8</span></div>
<div class="line"> nowide::cerr &lt;&lt; <span class="stringliteral">&quot;Can&#39;t open a file &quot;</span> &lt;&lt; argv[1] &lt;&lt; std::endl;</div>
<div class="line"> <span class="keywordflow">return</span> 1;</div>
<div class="line"> }</div>
<div class="line"> <span class="keywordtype">int</span> total_lines = 0;</div>
<div class="line"> <span class="keywordflow">while</span>(f) {</div>
<div class="line"> <span class="keywordflow">if</span>(f.get() == <span class="charliteral">&#39;\n&#39;</span>)</div>
<div class="line"> total_lines++;</div>
<div class="line"> }</div>
<div class="line"> f.close();</div>
<div class="line"> <span class="comment">// the console can display UTF-8</span></div>
<div class="line"> nowide::cout &lt;&lt; <span class="stringliteral">&quot;File &quot;</span> &lt;&lt; argv[1] &lt;&lt; <span class="stringliteral">&quot; has &quot;</span> &lt;&lt; total_lines &lt;&lt; <span class="stringliteral">&quot; lines&quot;</span> &lt;&lt; std::endl;</div>
<div class="line"> <span class="keywordflow">return</span> 0;</div>
<div class="line">}</div>
</div><!-- fragment --> <h1><a class="anchor" id="sources"></a>
Sources and Downloads</h1>
<p>The upstream sources can be found on GitHub: <a href="https://github.com/boostorg/nowide">https://github.com/boostorg/nowide</a></p>
<p>You can download the latest sources there:</p>
<ul>
<li>Standard Version: <a href="https://github.com/boostorg/nowide/archive/master.zip">nowide-master.zip</a></li>
<li>Standalone (Boost-independent) version: <a href="../nowide_standalone.zip">nowide_standalone.zip</a> </li>
</ul>
</div></div><!-- contents -->
<!-- start footer part -->
<hr class="footer"/><address class="footer"><small>
Generated by &#160;<a href="http://www.doxygen.org/index.html">
<img class="footer" src="doxygen.png" alt="doxygen"/>
</a> 1.8.7
</small></address>
</body>
</html>