<html>
|
|
|
|
<head>
|
|
<meta http-equiv="Content-Language" content="en-us">
|
|
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
|
|
<meta name="ProgId" content="FrontPage.Editor.Document">
|
|
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
|
|
<title>1.34 (Internationalization) Changes</title>
|
|
</head>
|
|
|
|
<body bgcolor="#FFFFFF">
|
|
|
|
<h1>1.34 (Internationalization) Changes</h1>
|
|
<h2>Introduction</h2>
|
|
<p>This release is a major upgrade for the Filesystem Library, in preparation
|
|
for submission to the C++ Standards Committee. Features of this release
|
|
include:</p>
|
|
<ul>
|
|
<li><a href="#Internationalization">Internationalization</a>, provided by
|
|
class templates <i>basic_path</i>, <i>basic_filesystem_error</i>, <i>
|
|
basic_directory_iterator</i>, and <i>basic_directory_entry</i>.<br>
|
|
</li>
|
|
<li><a href="#Simplification">Simplification</a> of the path interface,
|
|
including elimination of the distinction between native and generic formats,
|
|
and separation of name checking functionality from general path functionality.
|
|
Also simplification of <i>basic_filesystem_error</i>.<br>
|
|
</li>
|
|
<li><a href="#Rationalization">Rationalization</a> of predicate function
|
|
design, including the addition of several new functions.<br>
|
|
</li>
|
|
<li>Clearer specification by reference to [<a href="design.htm#POSIX-01">POSIX-01</a>],
|
|
the ISO/IEEE Single Unix Standard, with provisions for Windows and other
|
|
operating systems.<br>
|
|
</li>
|
|
<li><a href="#Preservation">Preservation</a> of existing user code whenever
|
|
possible.<br>
|
|
</li>
|
|
<li><a href="#More_efficient">More efficient operations</a> when iterating over directories.<br>
|
|
</li>
|
|
<li>A <a href="reference.html#recursive_directory_iterator">recursive
|
|
directory iterator</a> is now provided. </li>
|
|
</ul>
|
|
<p><a href="#Rationale">Rationale</a> for some of the changes is also provided.</p>
|
|
<h2><a name="Internationalization">Internationalization</a></h2>
|
|
<p>Class templates <i>basic_path</i>, <i>basic_filesystem_error</i>, and <i>
|
|
basic_directory_iterator</i> provide the basic mechanisms for
|
|
internationalization, in ways very similar to the C++ Standard Library's <i>
|
|
basic_string</i> and similar class templates. The following typedefs are also
|
|
provided:</p>
|
|
<blockquote>
|
|
<pre>typedef basic_path&lt;std::string, ...&gt; path;
typedef basic_path&lt;std::wstring, ...&gt; wpath;

typedef basic_filesystem_error&lt;path&gt; filesystem_error;
typedef basic_filesystem_error&lt;wpath&gt; wfilesystem_error;

typedef basic_directory_iterator&lt;path&gt; directory_iterator;
typedef basic_directory_iterator&lt;wpath&gt; wdirectory_iterator;</pre>
|
|
</blockquote>
|
|
<p>The string type used by Boost.Filesystem <i>basic_path</i> (std::string,
|
|
std::wstring, or whatever) is called the <i>internal</i> string type. The string
|
|
type used by the operating system for paths (often char*, sometimes wchar_t*) is
|
|
called the <i>external</i> string type. Conversion between internal and external
|
|
types is performed by path traits classes. The specific conversions for <i>path</i>
|
|
and <i>wpath</i> are implementation defined, with normative encouragement to use
|
|
the operating system's preferred file system encoding. For many modern POSIX-based
|
|
file systems the <i>wpath</i> external encoding is <a href="design.htm#Kuhn">
|
|
UTF-8</a>, while for modern Windows file systems such as NTFS it is
|
|
<a href="http://en.wikipedia.org/wiki/UTF-16">UTF-16</a>.</p>
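<p>The internal-to-external conversion described above can be sketched with a
toy traits class. This is only an illustration: <code>toy_wpath_traits</code>
and <code>to_external</code> are hypothetical names, not the library's traits
interface, and the sketch assumes that every <code>wchar_t</code> holds one
complete Unicode code point and that the external encoding is UTF-8.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical traits class: names the internal and external string types
// and converts from one to the other. Not the Boost.Filesystem interface.
struct toy_wpath_traits {
    typedef std::wstring internal_string_type;
    typedef std::string  external_string_type;

    // Encode each wide character as UTF-8.
    // Assumption: each wchar_t is a whole code point (no surrogate pairs).
    static external_string_type to_external(const internal_string_type& src) {
        external_string_type out;
        for (std::size_t i = 0; i < src.size(); ++i) {
            unsigned long cp = static_cast<unsigned long>(src[i]);
            if (cp < 0x80) {                       // 1 byte: plain ASCII
                out += static_cast<char>(cp);
            } else if (cp < 0x800) {               // 2 bytes
                out += static_cast<char>(0xC0 | (cp >> 6));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {             // 3 bytes
                out += static_cast<char>(0xE0 | (cp >> 12));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            } else {                               // 4 bytes
                out += static_cast<char>(0xF0 | (cp >> 18));
                out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
                out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                out += static_cast<char>(0x80 | (cp & 0x3F));
            }
        }
        return out;
    }
};
```

<p>A real traits class would also need the reverse conversion and would have to
follow whatever encoding the operating system actually expects.</p>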
|
|
<p>The <a href="reference.html#Operations-functions">operational functions</a> in
|
|
<a href="../../../boost/filesystem/operations.hpp">operations.hpp</a> are provided with overloads for
|
|
<i>path</i>, <i>wpath</i>, and user-defined <i>basic_path</i> types. A
|
|
<a href="reference.html#Requirements-on-implementations">"do-the-right-thing" rule</a>
|
|
applies to implementations, ensuring that the correct overload will be chosen.</p>
|
|
<h2><a name="Simplification">Simplification</a> of path interface</h2>
|
|
<p>Prior versions of the library required users of class <i>path</i> to identify
|
|
the format (native or generic) and name error-checking policy, either via a
|
|
second constructor argument or via a default mechanism. That approach caused
|
|
complaints, particularly from users not needing the name checking features. The
|
|
interface has now been simplified:</p>
|
|
<ul>
|
|
<li>The distinction between native and generic formats has been eliminated.
|
|
See <a href="#distinction">rationale</a>. Two-argument forms of path
|
|
constructors are now deprecated, with the second argument having no effect.
|
|
These constructors are only provided to ease the transition of existing code.<br>
|
|
</li>
|
|
<li>Path name checking functionality has been moved out of class path and into
|
|
separate free functions. This still provides name checking for those who need
|
|
it, but with much less impact on those who don't need it.</li>
|
|
</ul>
|
|
<p>Additionally,
|
|
<a href="reference.html#Class-template-basic_filesystem_error">basic_filesystem_error</a> has been put
|
|
on a diet and generally simplified.</p>
|
|
<p>Error codes have been moved to a separate library,
|
|
<a href="../../system/doc/index.html">Boost.System</a>.</p>
|
|
<p><code>"//:"</code> has been introduced as a path escape prefix to identify
|
|
native paths. Rationale: this simplifies the basic_path constructor interfaces and
eases use on platforms needing explicit native format identification.</p>
|
|
<h2><a name="Rationalization">Rationalization</a> of predicate functions</h2>
|
|
<p>In discussions and bug reports on the Boost developers mailing list, it
|
|
became obvious that Boost.Filesystem's exists(), symbolic_link_exists(), and
|
|
is_directory() predicate functions were poorly specified. There were suggestions
|
|
to add an is_accessible() function, but Peter Dimov argued that this amounted to
|
|
papering over the lack of a clear specification and would likely lead to future
|
|
problems.</p>
|
|
<p>Peter suggested that an interesting way to analyze the problem was to ask
|
|
what the expectations were for true and false values of the various predicates.
|
|
See the <a href="#table">table</a> below.</p>
|
|
<h3><a name="status">status</a>()</h3>
|
|
<p>As part of the predicate discussions, particularly with Rob Stewart, it
|
|
became obvious that sometimes applications need access to raw status information
|
|
without any possibility of an exception being thrown. The
|
|
<a href="reference.html#Status-functions">status()</a> function was added to meet this
|
|
need. It also proved clearer to specify the semantics of predicate functions in
|
|
terms of status().</p>
|
|
<h3><a name="is_file">is_file</a>()</h3>
|
|
<p>About the same time, Jeff Garland suggested that an
|
|
<a href="reference.html#Predicate-functions">is_file()</a> predicate would
|
|
complement <a href="reference.html#Predicate-functions">is_directory()</a>. In working on the analysis below, it became obvious
|
|
that the expectations for is_file() were different from the expectations for !is_directory(),
|
|
so is_file() was added. </p>
|
|
<h3><a name="is_other">is_other</a>()</h3>
|
|
<p>On some operating systems, it is possible to have a directory entry which is
neither a directory nor a file. The
|
|
<a href="reference.html#Predicate-functions">is_other()</a>
|
|
function identifies such cases.</p>
|
|
<h3>Should predicates throw on errors?</h3>
|
|
<p>Some conditions reported by operating systems as errors (see
|
|
<a href="#Footnote">footnote</a>) simply indicate that the predicate is
|
|
false, rather than indicating serious failure. But other errors represent
|
|
serious hardware or network problems, or permissions problems.</p>
|
|
<p>Some people, particularly Rob Stewart, argue that in a function like
|
|
<a href="reference.html#Predicate-functions">is_directory()</a>, any error should simply cause the function to return false. If
|
|
there is actually an underlying problem, it will be detected in due course when
|
|
a directory_iterator or fstream operation is attempted.</p>
|
|
<p>That view was rejected because of the following considerations:</p>
|
|
<ul>
|
|
<li>As a general principle, the earlier errors can be reported, the better.
|
|
The rationale is that it is often much cheaper to fix errors sooner rather
|
|
than later. I've also had a lot of negative experiences where failure to
|
|
detect errors early caused a lot of pain and unhappy customers. Some of these
|
|
were directly caused by ignoring error returns from file system operations.<br>
|
|
</li>
|
|
<li>Analysis of existing programs indicated that as many as 30% of the uses of
a predicate were not followed by directory_iterator or fstream operations on
|
|
the path in question. Instead, the applications performed reporting or
|
|
fall-back operations that would not fail, and thus were either misleading or
|
|
completely wrong if the <i>false</i> return value was in fact caused by
|
|
hardware or network failure, or permissions problems.</li>
|
|
</ul>
|
|
<p>However, the discussion did identify that there are valid cases where
|
|
non-throwing behavior is a requirement, and a programmer may prefer to deal with
|
|
file or directory attributes and errors at a very low, bit-mask, level. Function <a href="#status">status()</a>
|
|
was proposed to meet those needs.</p>
|
|
<h3><a name="Expectations">Expectations</a> <a name="table">table</a></h3>
|
|
<p>In the table below, <i>p</i> is a non-empty path.</p>
|
|
<p>Unless otherwise specified, all functions throw on hardware or general
|
|
failure errors, permission or access errors, symbolic link loop errors, and
|
|
invalid path errors. If an O/S fails to distinguish between error types,
|
|
predicate operations return false on such ambiguous errors.</p>
|
|
<p><i><b>Expectations</b></i> identify operations that are expected to succeed
|
|
or fail, assuming no hardware, permission, or access right errors, and no race
|
|
conditions.</p>
|
|
<table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
|
|
<tr>
|
|
<td width="22%" align="center"><b><i>Expression</i></b></td>
|
|
<td width="48%" align="center"><b><i>Expectations</i></b></td>
|
|
<td width="108%" align="center"><b><i>Semantics</i></b></td>
|
|
</tr>
|
|
<tr>
|
|
<td width="22%">is_directory(p)</td>
|
|
<td width="48%">Returns true if p is found and is a directory, else false.<br>
|
|
If true, then directory_iterator(p) would succeed.<br>
|
|
If false, then directory_iterator(p) would fail.</td>
|
|
<td width="108%">Throws: if <a href="#status">status()</a> & error_flag<br>
|
|
Returns: status() & directory_flag</td>
|
|
</tr>
|
|
<tr>
|
|
<td width="22%">is_file(p)</td>
|
|
<td width="48%">Returns true if p is found and is not a directory, else
|
|
false.<br>
|
|
If true, then ifstream(p) would succeed.<br>
|
|
False, however, does not imply ifstream(p) would fail (because some
|
|
operating systems allow directories to be opened as files, but stat() does
not set the "regular file" flag for them.)</td>
|
|
<td width="108%">Throws: if status() & error_flag<br>
|
|
Returns: status() & file_flag</td>
|
|
</tr>
|
|
<tr>
|
|
<td width="22%">exists(p) </td>
|
|
<td width="48%">Returns is_directory(p) || is_file(p) || is_other(p)</td>
|
|
<td width="108%">Throws: if status() & error_flag<br>
|
|
Returns: status() & (directory_flag|file_flag|other_flag)</td>
|
|
</tr>
|
|
<tr>
|
|
<td width="22%">is_symlink(p)</td>
|
|
<td width="48%">Returns true if p is found by shallow (non-transitive)
|
|
search, and is a symbolic link, else false.<br>
|
|
If true, and p points to q, then for any filesystem function f except those
|
|
specified as working shallowly on symlinks themselves, f(p) calls f(q), and
|
|
returns any value returned by f(q).</td>
|
|
<td width="108%">Throws: if <a href="#status">symlink_status</a>() &
|
|
error_flag<br>
|
|
Returns: symlink_status() & symlink_flag</td>
|
|
</tr>
|
|
<tr>
|
|
<td width="22%">!exists(p) && ((p.has_branch_path() && exists(p.branch_path()))
|| (!p.has_branch_path() && !p.has_root_path()))<br>
|
|
<i>In other words, if the path does not exist, and (the branch does exist,
|
|
or (there is no branch and no root)).</i></td>
|
|
<td width="48%">If true, create_directory(p) would succeed.<br>
|
|
If true, ofstream(p) would succeed.<br>
|
|
</td>
|
|
<td width="108%"> </td>
|
|
</tr>
|
|
<tr>
|
|
<td width="22%">directory_iterator it(p)</td>
|
|
<td width="48%">If it != directory_iterator(), assert(exists(*it)||is_symlink(*it)).
|
|
Note: exists(*it) may throw, and likewise status(*it) may return error_flag
|
|
- there is no guarantee of accessibility.</td>
|
|
<td width="108%"> </td>
|
|
</tr>
|
|
</table>
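<p>The Semantics column of the table can be sketched as follows. The flag names
are taken from the table, but their numeric values, the
<code>toy_</code>-prefixed function names, and the choice to pass the status
flags directly instead of a path are assumptions made to keep the sketch
self-contained.</p>

```cpp
#include <cassert>
#include <stdexcept>

// Flag names follow the table; the numeric values are assumptions.
enum status_flag {
    error_flag     = 1,
    directory_flag = 2,
    file_flag      = 4,
    other_flag     = 8,
    symlink_flag   = 16
};
typedef unsigned status_flags;

// "Throws: if status() & error_flag"
inline void toy_throw_if_error(status_flags s, const char* who) {
    if (s & error_flag) throw std::runtime_error(who);
}

// "Returns: status() & directory_flag"
inline bool toy_is_directory(status_flags s) {
    toy_throw_if_error(s, "is_directory");
    return (s & directory_flag) != 0;
}

// "Returns: status() & file_flag"
inline bool toy_is_file(status_flags s) {
    toy_throw_if_error(s, "is_file");
    return (s & file_flag) != 0;
}

// "Returns: status() & (directory_flag|file_flag|other_flag)"
inline bool toy_exists(status_flags s) {
    toy_throw_if_error(s, "exists");
    return (s & (directory_flag | file_flag | other_flag)) != 0;
}

// "Returns: symlink_status() & symlink_flag"; the caller passes the
// result of the shallow (symlink) status query.
inline bool toy_is_symlink(status_flags symlink_s) {
    toy_throw_if_error(symlink_s, "is_symlink");
    return (symlink_s & symlink_flag) != 0;
}
```

<p>With this arrangement a status value with no flags set makes every predicate
return false, while one with <code>error_flag</code> set makes every predicate
throw.</p>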
|
|
<h3><a name="Conclusion">Conclusion</a></h3>
|
|
<p>Predicate operations is_directory(), is_file(), is_symlink(), and exists()
|
|
with the indicated semantics form a self-consistent set that meets expectations.</p>
|
|
<h2><a name="Preservation">Preservation</a> of existing user code</h2>
|
|
<p>Although the change to a template-based approach required a complete overhaul
|
|
of the implementation code, the interface as used by existing applications is mostly unchanged.
|
|
Conversion problems which would
|
|
otherwise affect user code have been reduced by providing deprecated
|
|
functions to ease transition. The deprecated functions are:</p>
|
|
<blockquote>
|
|
<pre>// class basic_path - 2nd constructor argument ignored:
basic_path( const string_type & str, name_check );
basic_path( const typename string_type::value_type * s, name_check );

// class basic_path - old names provided for renamed functions:
string_type native_file_string() const;
string_type native_directory_string() const;

// class basic_path - now defined such that these no longer have any real effect:
static bool default_name_check_writable() { return false; }
static void default_name_check( name_check ) {}
static name_check default_name_check() { return 0; }

// non-deducible operations functions assume class path
inline path current_path()
inline const path & initial_path()

// the new basic_directory_entry provides leaf()
// to cover the common existing use case itr->leaf()
typename Path::string_type leaf() const;</pre>
|
|
</blockquote>
|
|
<p>If you do not want the deprecated functions to be included, define the macro BOOST_FILESYSTEM_NO_DEPRECATED.</p>
|
|
<p>The greatest impact on existing code is the change of directory iterator
|
|
value type from <code>path</code> to <code>directory_entry</code>. To ease the
|
|
most common directory iterator use case, <code>basic_directory_entry</code>
|
|
provides an automatic conversion to <code>basic_path</code>, and this also
|
|
serves to prevent breakage of a lot of existing code. See the
|
|
<a href="#More_efficient">next section</a> for discussion of rationale.</p>
|
|
<blockquote>
|
|
<pre>// the new basic_directory_entry provides:
operator const path_type &() const;</pre>
|
|
</blockquote>
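<p>How such a conversion operator keeps older code compiling can be sketched
with two toy classes. <code>toy_path</code>, <code>toy_directory_entry</code>,
and <code>name_length</code> are hypothetical stand-ins, not the library's
classes; only the shape of the conversion operator follows the declaration
above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stand-in for a path class.
struct toy_path {
    std::string s;
    explicit toy_path(const std::string& str) : s(str) {}
};

// Toy stand-in for the new iterator value type.
class toy_directory_entry {
public:
    explicit toy_directory_entry(const toy_path& p) : m_path(p) {}

    // New style access.
    const toy_path& path() const { return m_path; }

    // Implicit conversion, so code written against the old value type
    // (a path) still accepts an entry.
    operator const toy_path&() const { return m_path; }

private:
    toy_path m_path;
};

// Pre-existing user code that expects a path.
inline std::size_t name_length(const toy_path& p) { return p.s.size(); }
```

<p>A call such as <code>name_length(entry)</code> compiles unchanged because the
entry converts implicitly; <code>name_length(entry.path())</code> is the
explicit spelling.</p>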
|
|
<h2><a name="More_efficient">More efficient</a> operations when iterating over
|
|
directories</h2>
|
|
<p>Several common real-world operating systems (BSD derivatives, Linux, Windows)
|
|
provide status information during directory iteration. Caching of this status
|
|
information results in three to six times faster operation for typical predicate
|
|
operations. (For a directory containing 15,047 files, iteration took 1 second vs 6
seconds on a freshly booted system, and 0.3 seconds vs 0.9 seconds after prior use of
the directory.)</p>
|
|
<p>The efficiency gains from caching such status information were considered too
|
|
significant to ignore. Because the possibility of race-conditions differs
|
|
depending on whether the cached information is used or an actual system call is
|
|
performed, it was considered necessary to provide explicit functions utilizing
|
|
the cached information, rather than implicitly using the cache behind the
|
|
scenes.</p>
|
|
<p>Three options were explored for exposing the cached status information, with
|
|
full implementations of each. After initial implementation of option 1 exposed
|
|
the problems noted below, option 2 was tested as a possible engineering
|
|
tradeoff. Option 3
|
|
was finally chosen as the cleanest design.</p>
|
|
<table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
|
|
<tr>
|
|
<td width="8%" align="center"><b><i>Option</i></b></td>
|
|
<td width="25%" align="center"><i><b>How cache accessed</b></i></td>
|
|
<td width="94%" align="center"><i><b>Pros and Cons</b></i></td>
|
|
</tr>
|
|
<tr>
|
|
<td width="8%" valign="top" align="center"><i><b>1</b></i></td>
|
|
<td width="25%" valign="top">Predicate function overloads<br>
|
|
(basic_directory_iterator value_type is path)</td>
|
|
<td width="94%">
|
|
<ul>
|
|
<li>Very questionable design (friendship abuse, overload abuse, etc)</li>
|
|
<li>User cannot reuse cache</li>
|
|
<li>Readability problem; easy to miss difference between f(*it) and f(it)</li>
|
|
<li>Write-ability problem (error prone?)</li>
|
|
<li>Most common iterator use is brief: *it</li>
|
|
<li>Preserves existing code</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<td width="8%" valign="top" align="center"><b><i>2</i></b></td>
|
|
<td width="25%" valign="top">Predicate member functions of basic_directory_<span style="background-color: #FFFF00">iterator</span><br>
|
|
(basic_directory_iterator value_type is path)</td>
|
|
<td width="94%">
|
|
<ul>
|
|
<li>Somewhat cleaner design (although adding functions to an iterator is unusual)</li>
|
|
<li>User cannot reuse cache</li>
|
|
<li>Readability and write-ability are OK: f(*it) and it.f() sufficiently
|
|
different</li>
|
|
<li>Most common iterator use is brief: *it</li>
|
|
<li>Preserves existing code</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<td width="8%" valign="top" align="center"><b><i>3</i></b></td>
|
|
<td width="25%" valign="top">Predicate member functions of basic_directory_<span style="background-color: #FFFF00">entry</span><br>
|
|
(basic_directory_iterator value_type is basic_directory_entry)<br>
|
|
</td>
|
|
<td width="94%">
|
|
<ul>
|
|
<li>Cleanest design.</li>
|
|
<li>User can reuse cache.</li>
|
|
<li>Readability and write-ability are OK: f(*it) and it->f() sufficiently
|
|
different.</li>
|
|
<li>Most common iterator use is longer: it->path(), but by providing
|
|
"operator const basic_path &" it is still possible to write a bare *it.</li>
|
|
<li>Breaks some existing code. The "operator const basic_path &"
|
|
conversion eliminates breakage of the most common use case, while
|
|
providing a (deprecated) leaf() prevents breakage of the second most
|
|
common use case.</li>
|
|
</ul>
|
|
</td>
|
|
</tr>
|
|
</table>
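<p>Option 3 can be sketched as a toy model in which the entry carries the status
captured during iteration, so that a member predicate reads the cache while a
free function performs a fresh query. All names here
(<code>toy_entry</code>, <code>toy_query</code>, <code>toy_query_count</code>)
are hypothetical, and the "system call" is simulated by a counter.</p>

```cpp
#include <cassert>
#include <string>

// Counts simulated system calls so the effect of caching is observable.
int toy_query_count = 0;

enum toy_kind { toy_regular, toy_directory };

// Stand-in for an operating system status query. Assumption for the sketch:
// names ending in '/' are directories; a real implementation would ask the OS.
inline toy_kind toy_query(const std::string& name) {
    ++toy_query_count;
    return (!name.empty() && name[name.size() - 1] == '/') ? toy_directory
                                                           : toy_regular;
}

// The iterator's value type carries the status obtained during iteration.
class toy_entry {
public:
    toy_entry(const std::string& name, toy_kind cached)
        : m_name(name), m_cached(cached) {}

    const std::string& path() const { return m_name; }

    // Member predicate: answers from the cache, no query is made.
    bool is_directory() const { return m_cached == toy_directory; }

private:
    std::string m_name;
    toy_kind    m_cached;
};

// Free function predicate: always performs a fresh query.
inline bool toy_is_directory(const std::string& name) {
    return toy_query(name) == toy_directory;
}
```

<p>Because the two spellings are visibly different, the reader can tell whether
a possibly stale cached value or a fresh query is being used, which is the race
condition concern raised above.</p>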
|
|
<h2><a name="Rationale">Rationale</a></h2>
|
|
<h3>Elimination of the native versus generic <a name="distinction">distinction</a></h3>
|
|
<p>Elimination of user confusion and general design simplification was the
|
|
original motivation for elimination of the distinction between native and
|
|
generic paths.</p>
|
|
<p>During design work, a further technical argument was discovered. Consider the
|
|
path <code>"c:foo/bar"</code>. On many POSIX systems, <code>"c:foo"</code> is a
|
|
valid directory name, so we have a two element path and there is no issue of
|
|
native versus generic format. On Windows systems, however, <code>"c:"</code> is a
|
|
drive specification, so we have a three element path. All calls to the operating
|
|
system will result in <code>"c:"</code> being considered a drive specification;
|
|
there is no way that fact-of-life can be changed by claiming the format is
|
|
generic. The native versus generic distinction is thus useless and misleading
|
|
for POSIX, Windows, and probably most other operating systems.</p>
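<p>The two readings of <code>"c:foo/bar"</code> can be made concrete with a small
splitter. <code>toy_elements</code> is a hypothetical helper, not the library's
parsing code; its <code>drive_letters</code> flag simply switches between the
POSIX-like and Windows-like interpretations described above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a path on '/'. When drive_letters is true,
// a leading "X:" is treated as a separate drive element.
inline std::vector<std::string> toy_elements(const std::string& p,
                                             bool drive_letters) {
    std::vector<std::string> out;
    std::size_t pos = 0;

    if (drive_letters && p.size() >= 2 && p[1] == ':') {
        out.push_back(p.substr(0, 2));   // e.g. "c:"
        pos = 2;
    }

    while (pos < p.size()) {
        std::size_t slash = p.find('/', pos);
        if (slash == std::string::npos) slash = p.size();
        if (slash > pos) out.push_back(p.substr(pos, slash - pos));
        pos = slash + 1;                 // skip the separator
    }
    return out;
}
```

<p>With <code>drive_letters</code> false the result is <code>"c:foo"</code>,
<code>"bar"</code>; with it true the result is <code>"c:"</code>,
<code>"foo"</code>, <code>"bar"</code>.</p>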
|
|
<p>If paths for a particular operating system did require a distinction be made,
|
|
it could be done by requiring that native paths be prefixed with some unique
|
|
implementation-defined identification. For example, <code>"native-path:"</code>.
|
|
This would only be required for operating systems where (1) the distinction
|
|
mattered, and (2) there was no lexical way to distinguish the two forms. An
example would be a native operating system that used the same syntax as the Filesystem
|
|
Library's generic POSIX-like format, but processed the elements right-to-left
|
|
instead of left-to-right.</p>
|
|
<h3>Preservation of <a name="existing-code">existing code</a></h3>
|
|
<p>Allowing existing user code to continue to work with the updated version of
|
|
the library has obvious benefits in terms of preserving the effort users have
|
|
applied to both learning the library and writing code which uses the library.</p>
|
|
<p>There is an additional motivation; other than the name checking portion of
|
|
class path, the existing interface has proven to be useful and robust, so
|
|
there is no reason to fiddle with it.</p>
|
|
<h3><a name="Single_path_design">Single path design</a></h3>
|
|
<p>During preliminary internationalization discussion on the Boost developers
|
|
list, a design was considered for a single path class which could hold either
|
|
narrow or wide character based paths. That design was rejected because:</p>
|
|
<ul>
|
|
<li>The design was, for many applications, an over-generalization with runtime
|
|
memory and speed costs which would have to be paid for even when not needed.<br>
|
|
</li>
|
|
<li>There was concern that the design would be confusing to users, given that
|
|
the standard library already uses single-value-type strings, rather than
|
|
strings which morph value types as needed.<br>
|
|
</li>
|
|
<li>There were technical issues with conversions when a narrow path was
|
|
appended to a wide path, and vice versa. The concern was that double
|
|
conversions could cause incorrect results, that conversions best left to the
|
|
operating system would be performed, and that the technical complexity was too
|
|
great in relation to perceived benefits. User-defined types would only make
|
|
the problem worse.<br>
|
|
</li>
|
|
</ul>
|
|
<h3>No versions of <a href="reference.html#Status-functions">status()</a> which throw exceptions on
|
|
errors</h3>
|
|
<p>The rationale for not including versions of status()
|
|
which throw exceptions on errors is that (1) the primary purpose of this
|
|
function is to perform queries at a very low-level, where exceptions are usually
|
|
unwanted, and (2) exceptions on errors are already provided by the predicate
|
|
functions. There would be little or no efficiency gain from providing a throwing
|
|
version of status().</p>
|
|
<h3>Symlink identifying version of <a href="reference.html#Status-functions">status()</a> function</h3>
|
|
<p>A symlink identifying version of the status() function is distinguished by a
|
|
second argument. Often separately named functions are more appropriate than
|
|
overloading when behavior
|
|
differs, which is the case here, while overloads are more appropriate when
|
|
behavior is the same but argument types differ (Iain Hanson). Overloading was
|
|
chosen in this particular case because of a subjective judgment that a single
|
|
function name with an optional "symlink" second argument produced more
|
|
understandable code. The original implementation of the function used the name "symlink_status",
|
|
but that just didn't read right in real code.</p>
|
|
<h3>POSIX wpath_traits defaults to locale(""), but allows imbuing of locale</h3>
|
|
<p>Vladimir Prus pointed out that for Linux (and presumably other POSIX
operating systems), where wide character paths must be converted to narrow
characters, the default conversion should not depend on the operating system
alone, but on the std::locale("") default. For example, the usual encoding
|
|
for Russian on Linux (and Russian web sites) is KOI8-R (RFC1489). The ability to safely specify a different locale
|
|
is also provided, to meet unforeseen needs.</p>
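<p>The "default to <code>std::locale("")</code>, but allow imbuing" idea can be
sketched with standard library facilities only. The class name
<code>toy_conversion_locale</code> and its members are hypothetical and do not
reproduce the library's wpath_traits interface; the fallback to the classic
locale when the user's environment names an unknown locale is also an
assumption of this sketch.</p>

```cpp
#include <cassert>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical holder for the locale used in wide/narrow path conversion.
class toy_conversion_locale {
public:
    // The locale currently used for conversions.
    static const std::locale& get() { return storage(); }

    // Replace the conversion locale with one chosen by the program.
    static void imbue(const std::locale& loc) { storage() = loc; }

private:
    static std::locale& storage() {
        static std::locale loc = user_default();
        return loc;
    }

    // std::locale("") selects the user's preferred locale; it throws
    // std::runtime_error if the environment names a locale that is not
    // available, in which case this sketch falls back to the "C" locale.
    static std::locale user_default() {
        try {
            return std::locale("");
        } catch (const std::runtime_error&) {
            return std::locale::classic();
        }
    }
};
```

<p>A program that needs a specific encoding would call <code>imbue()</code> once
at startup, before any paths are converted.</p>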
|
|
<hr>
|
|
<p>Revised
|
|
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->18 March, 2008<!--webbot bot="Timestamp" endspan i-checksum="29005" --></p>
|
|
<p>© Copyright Beman Dawes, 2005</p>
|
|
<p>Distributed under the Boost Software License, Version 1.0.
|
|
(See accompanying file <a href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
|
|
copy at <a href="http://www.boost.org/LICENSE_1_0.txt">www.boost.org/LICENSE_1_0.txt</a>)</p>
|
|
|
|
</body>
|
|
|
|
</html>