mirror of
https://github.com/boostorg/numeric_conversion.git
synced 2026-01-23 17:52:12 +00:00
471 lines
26 KiB
HTML
471 lines
26 KiB
HTML
<HTML>
|
||
<HEAD>
|
||
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
|
||
<LINK REL="stylesheet" TYPE="text/css" HREF="../../../../boost.css">
|
||
<TITLE>Boost Numeric Conversion Library - Definitions</TITLE>
|
||
</HEAD>
|
||
<BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000ff" VLINK="#800080">
|
||
<TABLE BORDER="0" CELLPADDING="7" CELLSPACING="0" WIDTH="100%"
|
||
SUMMARY="header">
|
||
<TR>
|
||
<TH VALIGN="top" WIDTH="300">
|
||
<H3><A HREF="../../../../index.htm"><IMG HEIGHT="86" WIDTH="277"
|
||
ALT="C++ Boost" SRC="../../../../boost.png" BORDER="0"></A></H3> </TH>
|
||
<TH VALIGN="top">
|
||
<H1 ALIGN="center">Boost Numeric Conversion Library</H1>
|
||
<H1 ALIGN="center">Definitions</H1>
|
||
</TH>
|
||
</TR>
|
||
</TABLE>
|
||
<HR>
|
||
<H2>Contents</H2>
|
||
<DL CLASS="page-index">
|
||
<dt><A HREF="#intro">Introduction</A></dt>
|
||
<dt><A HREF="#typeval">Types and Values</A></dt>
|
||
<dt><A HREF="#stdtypes">C++ Arithmetic Types</A></dt>
|
||
<dt><A HREF="#numtypes">Numeric Types</A></dt>
|
||
<dt><A HREF="#range">Range and Precision</A></dt>
|
||
<dt><A HREF="#roundoff">Exact, Correctly Rounded and Out-Of-Range Representations</A></dt>
|
||
<dt><A HREF="#stdconv">Standard (numeric) Conversions</A></dt>
|
||
<dt><A HREF="#subranged">Subranged Conversion Direction, Subtype and Supertype</A></dt>
|
||
</DL>
|
||
|
||
|
||
|
||
<h2><A NAME="intro">Introduction</A></h2>
|
||
<P>This section provides definitions of terms used in the Numeric Conversion library.</p>
|
||
<p><b>Notation:</b>
|
||
<li><u>underlined text</u> denotes terms defined in the C++ standard.</li>
|
||
<li><b>bold face</b> denotes terms defined here but not in the standard.</li>
|
||
<p></p>
|
||
|
||
|
||
|
||
<hr>
|
||
<h2><A NAME="typeval">Types and Values</A></h2>
|
||
<p>As defined by the <u>C++ Object Model</u> (§1.7) the <u>storage</u> or
|
||
memory on which a C++ program runs is a contiguous sequence of <u>bytes</u>
|
||
where each byte is a contiguous sequence of <u>bits</u>.<br>
|
||
An <u>object</u> is a region of storage (§1.8) and has a type (§3.9).<br>
|
||
A <u>type</u> is a discrete set of values. <br>
|
||
An object of type T has an <u>object representation</u> which is the sequence
|
||
of bytes stored in the object (§3.9/4)<br>
|
||
An object of type T has a <u>value representation</u> which is the set of bits
|
||
that determine the <i>value</i> of an object of that type (§3.9/4). For
|
||
<u>POD</u> types (§3.9/10), this bitset is given by the object representation,
|
||
but not all the bits in the storage need to participate in the value representation
|
||
(except for character types): for example, some bits might be used for padding
|
||
or there may be trap-bits.</p>
|
||
<p>The <b>typed value</b> that is held by an object is
|
||
the value which is determined by its value representation.<br>
|
||
An <b>abstract value</b> (untyped) is
|
||
the conceptual information that is represented in a type
|
||
(i.e. the number π).<br>
|
||
The <b>intrinsic value</b> of an object is
|
||
the binary value of the sequence of unsigned characters which form its object representation.</p>
|
||
<p><i>Abstract values</i> can be <b>represented</b> in a given type.<br>
|
||
To <b>represent</b> an abstract value 'V' in a type 'T'
|
||
is to obtain a typed value 'v' which <i>corresponds</i> to the abstract value 'V'.<br>
|
||
The operation is denoted using the 'rep()' operator, as in: <code>v=rep(V)</code>.<br>
|
||
'v' is the <b>representation</b> of 'V' in the type 'T'.<br>
|
||
For example, the abstract value π can be represented in the type <code>'double'</code> as the
|
||
'double value M_PI' and in the type <code>'int'</code> as the 'int value 3'</p>
|
||
<p>Conversely, <i>typed values</i> can be <b>abstracted</b>.<br>
|
||
To <b>abstract</b> a typed value 'v' of type 'T' is to obtain the
|
||
abstract value 'V' whose representation in 'T' is 'v'.<br>
|
||
The operation is denoted using the 'abt()' operator, as in: <code>V=abt(v)</code>.<br>
|
||
'V' is the <b>abstraction</b> of 'v' of type 'T'.<br>
|
||
Abstraction is just an abstract operation (you can't do it); but it is defined nevertheless
|
||
because it will be used to give the definitions in the rest of this document.</p>
|
||
|
||
|
||
|
||
|
||
<hr>
|
||
<h2><A NAME="stdtypes">C++ Arithmetic Types</A></h2>
|
||
<P>The C++ language defines <u>fundamental types</u> (§3.9.1). The following
|
||
subsets of the fundamental types are intended to represent <i>numbers</i>:</p>
|
||
<li><u>signed integer types</u> (§3.9.1/2):<br>
|
||
<blockquote>
|
||
<code>{signed char, signed short int, signed int, signed long int}</code><br>
|
||
Can be used to represent general integer numbers (both negative and positive).
|
||
</blockquote>
|
||
</li>
|
||
<li><u>unsigned integer types</u> (§3.9.1/3):<br>
|
||
<blockquote>
|
||
<code>{unsigned char, unsigned short int, unsigned int, unsigned long int}</code><br>
|
||
Can be used to represent positive integer numbers <u>with modulo-arithmetic</u>.<br>
|
||
</blockquote>
|
||
<li><u>floating-point types</u> (§3.9.1/8):<br>
|
||
<blockquote>
|
||
<code>{float,double,long double}</code><br>
|
||
Can be used to represent real numbers.
|
||
</blockquote>
|
||
</li>
|
||
<li><u>integral or integer types</u> (§3.9.1/7):<br>
|
||
<blockquote>
|
||
<code>{{signed integers},{unsigned integers}, bool, char and wchar_t}</code>
|
||
</blockquote>
|
||
</li>
|
||
<li><u>arithmetic types</u> (§3.9.1/8):<br>
|
||
<blockquote>
|
||
<code>{{integer types},{floating types}}</code>
|
||
</blockquote>
|
||
</li>
|
||
<P>The integer types are required to have a <i>binary</i> value representation.<br>
|
||
Additionally, the signed/unsigned integer types of the same base type (short, int or long)
|
||
are required to have the same value representation, that is:</P>
|
||
<pre> int i = -3 ; // suppose value representation is: 10011 (sign bit + 4 magnitude bits)
|
||
unsigned int u = i ; // u is required to have the same 10011 as its value representation.
|
||
</pre>
|
||
<P>In other words, the integer types signed/unsigned X use the same value representation
|
||
but a different <i>interpretation</i> of it; that is, their <i>typed values</i>
|
||
might differ.<br>
|
||
Another consequence of this is that the range for signed X is always a smaller subset
|
||
of the range of unsigned X, as required by §3.9.1/3.</P>
|
||
<P>Note: always remember that unsigned types, unlike signed types, have modulo-arithmetic;
|
||
that is, they do not overflow.<br>
|
||
This means that:
|
||
<li> Always be extra careful when mixing signed/unsigned types</li>
|
||
<li> Use unsigned types only when you need modulo arithmetic or very very large numbers.
|
||
Don't use unsigned types just because you intend to deal with positive values only
|
||
(you can do this with signed types as well).</li>.
|
||
<p></P>
|
||
|
||
|
||
|
||
|
||
|
||
<hr>
|
||
<h2><A NAME="numtypes">Numeric Types</A></h2>
|
||
<p>This section introduces the following definitions intended to integrate arithmetic
|
||
types with user-defined types which behave like numbers. Some definitions are
|
||
purposely broad in order to include a vast variety of user-defined number
|
||
types.</p>
|
||
<p>Within this library, the term <i>number</i> refers to an abstract numeric value.</p>
|
||
<p>A type is <b>numeric</b> if:</p>
|
||
<li>It is an arithmetic type, or,</li>
|
||
<li>It is a user-defined type which</li>
|
||
<blockquote>
|
||
<li>Represents numeric abstract values (i.e. numbers).</li>
|
||
|
||
<li>Can be converted (either implicitly or explicitly) to/from at least one
|
||
arithmetic type.</li>
|
||
<li>Has <a href="#range">range</a> (possibly unbounded) and <a href="#range">precision</a>
|
||
(possibly dynamic or unlimited).</li>
|
||
<li>Provides an specialization of <code>std::numeric_limits</code>.</li>
|
||
</blockquote>
|
||
<p></p>
|
||
<p>A numeric type is <b>signed</b> if the abstract values it represent include negative numbers.<br>
|
||
A numeric type is <b>unsigned</b> if the abstract values it represent exclude negative numbers.<br>
|
||
A numeric type is <b>modulo</b> if it has modulo-arithmetic (does not overflow).<br>
|
||
A numeric type is <b>integer</b> if the abstract values it represent are whole numbers.<br>
|
||
A numeric type is <b>floating</b> if the abstract values it represent are real numbers.<br>
|
||
An <b>arithmetic value</b> is the typed value of an arithmetic type<br>
|
||
A <b>numeric value</b> is the typed value of a numeric type</p>
|
||
<p></p>
|
||
<p>These definitions simply generalize the standard notions of arithmetic types
|
||
and values by introducing a superset called <u>numeric</u>. All arithmetic types
|
||
and values are numeric types and values, but not vice versa, since user-defined
|
||
numeric types are not arithmetic types.</p>
|
||
<p>The following examples clarify the differences between arithmetic and numeric types (and values):</p>
|
||
<pre>// A numeric type which is not an arithmetic type (is user-defined)
|
||
// and which is intended to represent integer numbers (i.e., an 'integer' numeric type)
|
||
class MyInt
|
||
{
|
||
MyInt ( long long v ) ;
|
||
long long to_builtin();
|
||
} ;
|
||
namespace std {
|
||
template<> numeric_limits<MyInt> { ... } ;
|
||
}
|
||
|
||
// A 'floating' numeric type (double) which is also an arithmetic type (built-in),
|
||
// with a float numeric value.
|
||
double pi = M_PI ;
|
||
|
||
// A 'floating' numeric type with a whole numeric value.
|
||
// NOTE: numeric values are typed valued, hence, they are, for instance,
|
||
// integer or floating, despite the value itself being whole or including
|
||
// a fractional part.
|
||
double two = 2.0 ;
|
||
|
||
// An integer numeric type with an integer numeric value.
|
||
MyInt i(1234);
|
||
</pre>
|
||
|
||
|
||
|
||
|
||
<hr>
|
||
<h2><A NAME="range">Range and Precision</A></h2>
|
||
<p>Given a number set 'N', some of its elements are representable in a numeric type 'T'.<br>
|
||
The set of representable values of type 'T', or numeric set of 'T', is a set of numeric values
|
||
whose elements are the representation of some <i>subset</i> of 'N'.<br>
|
||
For example, the interval of 'int' values [INT_MIN,INT_MAX] is the set of representable values
|
||
of type 'int', i.e. the 'int' numeric set, and corresponds to the representation of the elements
|
||
of the interval of abstract values [abt(INT_MIN),abt(INT_MAX)] from the integer numbers.<br>
|
||
Similarly, the interval of 'double' values [-DBL_MAX,DBL_MAX] is the 'double' numeric set,
|
||
which corresponds to the subset of the real numbers from abt(-DBL_MAX) to abt(DBL_MAX).
|
||
</p>
|
||
<p>Let <b>next(x)</b> denote the lowest numeric value greater than x.<br>
|
||
Let <b>prev(x)</b> denote the highest numeric value lower then x.</p>
|
||
<p>Let <code><b>v=prev(next(V))</b></code> and <code><b>v=next(prev(V))</b></code> be identities that relate a numeric
|
||
typed value 'v' with a number 'V'.</p>
|
||
<p>An ordered pair of numeric values <i>x,y</i> s.t. <i>x<y</i> are <b>consecutive</b> iff
|
||
<code>next(x)==y</code>.</p>
|
||
<p>The abstract distance between consecutive numeric values is usually referred
|
||
to as a <u>Unit in the Last Place</u>, or <b>ulp</b> for short. A ulp is a quantity whose abstract
|
||
magnitude is <i>relative</i> to the numeric values it corresponds to: If the numeric set is not evenly
|
||
distributed, that is, if the abstract distance between consecutive numeric values varies along the set
|
||
-as is the case with the floating-point types-, the magnitude of 1ulp after the numeric value x
|
||
might be (usually is) different from the magnitude of a 1ulp after the numeric value y for x!=y.</p>
|
||
<p>Since numbers are inherently ordered, a <b>numeric set</b> of type 'T'
|
||
is an ordered sequence of numeric values (of type 'T') of the form:
|
||
</p>
|
||
<p><code>REP(T)={l,next(l),next(next(l)),...,prev(prev(h)),prev(h),h}</code>
|
||
</p>
|
||
<p>where 'l' and 'h' are respectively the lowest and highest values of type 'T', called the
|
||
<b>boundary values</b> of type T.</p>
|
||
<p>A numeric set is discrete. It has a <b>size</b> which is the number
|
||
of numeric values in the set, a <b>width</b> which is the abstract difference between
|
||
the highest and lowest boundary values: [abt(h)-abt(l)], and a <b>density</b>
|
||
which is the relation between its size and width: 'density=size/width'.<br>
|
||
The integer types have density 1, which means that there are no unrepresentable integer numbers
|
||
between abt(l) and abt(h) (i.e. there are no gaps). On the other hand,
|
||
floating types have density much smaller than 1, which means that there are
|
||
real numbers unrepresented between consecutive floating values (i.e. there are gaps).
|
||
</p>
|
||
<p>The interval of <u>abstract values</u> [abt(l),abt(h)] is the <b>range</b> of the type 'T',
|
||
denoted 'R(T)'.<br>
|
||
A range is a set of abstract values and not a set of numeric values. In other
|
||
documents, such as the C++ standard, the word 'range' is <i>sometimes</i> used
|
||
as synonym for 'numeric set', that is, as the ordered sequence of numeric values
|
||
from 'l' to 'h'. In this document, however, a range is an abstract interval
|
||
which subtends the numeric set.<br>
|
||
For example, the sequence [-DBL_MAX,DBL_MAX] is the numeric set of the type 'double', and
|
||
the real interval [abt(-DBL_MAX),abt(DBL_MAX)] is its range.<br>
|
||
Notice, for instance, that the range of a floating-point type is <i>continuous</i> unlike
|
||
its numeric set.<br>
|
||
This definition was chosen because:
|
||
<li>(a) The discrete set of numeric values is already given by the numeric set.</li>
|
||
<li>(b) Abstract intervals are easier to compare and overlap since only boundary values
|
||
need to be considered.</li><br>
|
||
This definition allows for a concise definition of 'subranged' as given in the last section.<br>
|
||
The width of a numeric set, as defined, is exactly equivalent to the width of a range.
|
||
<p></p>
|
||
<p>The <b>precision</b> of a type is given by the width or density of the numeric set.<br>
|
||
For integer types, which have density 1, the precision is conceptually equivalent to the range
|
||
and is determined by the number of bits used in the value representation: The higher the
|
||
number of bits the bigger the size of the numeric set, the wider the range, and the higher
|
||
the precision.<br>
|
||
For floating types, which have density <<1, the precision is given not by the
|
||
width of the range but by the density. In a typical implementation,
|
||
the range is determined by the number of bits used in the exponent, and the precision by
|
||
the number of bits used in the mantissa (giving the maximum number of significant digits
|
||
that can be exactly represented). The higher the number of exponent bits the
|
||
wider the range, while the higher the number of mantissa bits, the higher the precision.
|
||
</p>
|
||
|
||
|
||
|
||
|
||
|
||
|
||
<hr>
|
||
<h2><A NAME="roundoff">Exact, Correctly Rounded and Out-Of-Range Representations</A></h2>
|
||
<p>Given an abstract value 'V' and a type 'T' with its corresponding range [abt(l),abt(h)]:</p>
|
||
<p>If <code>V < abt(l)</code> or <code>V > abt(h)</code>, 'V' is <b>not representable</b>
|
||
(cannot be represented) in the type T, or, equivalently, it's representation in the type 'T'
|
||
is <b>out of range</b>, or <b>overflows</b>.<br>
|
||
If <code>V < abt(l)</code>, the <b>overflow is negative</b>.<br>
|
||
If <code>V > abt(h)</code>, the <b>overflow is positive</b>.
|
||
</p>
|
||
<p>If <code>V ≥ abt(l)</code> and <code>V ≤ abt(h)</code>,'V' is <b>representable</b>
|
||
(can be represented) in the type T, or, equivalently, its representation in the type 'T'
|
||
is in <b>in range</b>, or <b>does not overflow</b>.</p>
|
||
<p>Notice that a numeric type, such as a C++ unsigned type, can define that any 'V' does not
|
||
overflow by always representing not 'V' itself but the abstract value <code>U = [ V % (abt(h)+1) ]</code>,
|
||
which is always in range.</p>
|
||
<p>Given an abstract value 'V' represented in the type 'T' as 'v', the <b>roundoff</b> error
|
||
of the representation is the abstract difference: (abt(v)-V).<br>
|
||
Notice that a representation is an <i>operation</i>, hence, the roundoff error corresponds to
|
||
the representation operation and not to the numeric value itself (i.e. numeric values do not
|
||
have any error themselves)<br>
|
||
If the roundoff is 0, the representation is <b>exact</b>, and 'V' is <b>exactly representable</b>
|
||
in the type T.<br>
|
||
If the roundoff is not 0, the representation is <b>inexact</b>, and 'V' is <b>inexactly representable</b>
|
||
in the type T.</p>
|
||
<p>Given an abstract value 'V' representable in a type 'T', there are always two consecutive
|
||
numeric values of type 'T', 'prev' and 'next', such that <code>abt(prev) ≤ V ≤ abt(next)</code>.
|
||
These are called the <b>adjacents</b> of 'V' in the type 'T'.<br>
|
||
If a representation 'v' in a type 'T' -either exact or inexact-, is any of the adjacents of 'V'
|
||
in that type, that is, if <code>v==prev or v==next</code>, the representation is
|
||
<b>faithfully rounded</b>. If the choice between 'prev' and 'next'
|
||
matches a given <b>rounding direction</b>, it is <b>correctly rounded</b>.<br>
|
||
All exact representations are correctly rounded, but not all inexact representations are. In particular,
|
||
C++ requires numeric conversions (described below) and the result of arithmetic operations
|
||
(not covered by this document) to be correctly rounded, but batch operations propagate roundoff, thus
|
||
final results are usually incorrectly rounded, that is, the numeric value 'r' which is the computed
|
||
result is neither of the adjacents of the abstract value 'R' which is the theoretical result.<br>
|
||
Because a correctly rounded representation is always one of adjacents of the abstract value being
|
||
represented, the roundoff is guaranteed to be at most 1ulp.</p>
|
||
<P>The following examples summarize the given definitions. Consider:</p>
|
||
<li>A numeric type 'Int' representing integer numbers with a <i>numeric set</i>: {-2,-1,0,1,2}
|
||
and <i>range</i>: [-2,2]</li>.
|
||
<li>A numeric type 'Cardinal' representing integer numbers with a <i>numeric set</i>:
|
||
{0,1,2,3,4,5,6,7,8,9} and <i>range</i>: [0,9] (no modulo-arithmetic here)</li>.
|
||
<li>A numeric type 'Real' representing real numbers with a <i>numeric set</i>:
|
||
{-2.0,-1.5,-1.0,-0.5,-0.0,+0.0,+0.5,+1.0,+1.5,+2.0} and <i>range</i>: [-2.0,+2.0]</li>
|
||
<li>A numeric type 'Whole' representing real numbers with a <i>numeric set</i>:
|
||
{-2.0,-1.0,0.0,+1.0,+2.0} and <i>range</i>: [-2.0,+2.0]</li>
|
||
<p>First, notice that the types 'Real' and 'Whole' both represent real numbers, have the
|
||
same range, but different precision.</p>
|
||
<p>The integer number 1 (an abstract value) can be exactly represented in any of these types.<br>
|
||
The integer number -1 can be exactly represented in 'Int', 'Real' and 'Whole', but cannot
|
||
be represented in 'Cardinal', yielding negative overflow.<br>
|
||
The real number 1.5 can be exactly represented in 'Real', and inexactly represented in the
|
||
other types.<br>
|
||
If 1.5 is represented as either 1 or 2 in any of the types (except Real), the
|
||
representation is correctly rounded.<br>
|
||
If 0.5 is represented as +1.5 in the type 'Real', it is incorrectly rounded.<br>
|
||
(-2.0,-1.5) are the 'Real' adjacents of any real number in the interval [-2.0,-1.5],
|
||
yet there are no 'Real' adjacents for x < -2.0, nor for x > +2.0.
|
||
</p>
|
||
|
||
|
||
|
||
|
||
<hr>
|
||
<h2><A NAME="stdconv">Standard (numeric) Conversions</A></h2>
|
||
<P>The C++ language defines <u>Standard Conversions</u> (§4) some of which are
|
||
conversions between arithmetic types.<br>
|
||
These are <u>Integral promotions</u> (§4.5), <u>Integral conversions</u> (§4.7),
|
||
<u>Floating point promotions</u> (§4.6), <u>Floating point conversions</u> (§4.8)
|
||
and <u>Floating-integral conversions</u> (§4.9).<br>
|
||
In the sequel, integral and floating point promotions are called <b>arithmetic promotions</b>,
|
||
and these plus integral, floating-point and floating-integral conversions are called
|
||
<b>arithmetic conversions</b> (i.e, promotions are conversions).
|
||
</P>
|
||
<P>Promotions, both Integral and Floating point, are <i>value-preserving</i>, which means
|
||
that the typed value is not changed with the conversion.</p>
|
||
<p>In the sequel, consider a source typed value 's' of type 'S', the source abstract value 'N=abt(s)',
|
||
a destination type 'T'; and whenever possible, a result typed value 't' of type 'T'.</p>
|
||
<p>Integer to integer conversions are always defined:<br>
|
||
If 'T' is unsigned, the abstract value which is effectively represented is not 'N' but
|
||
'M=[ N % ( abt(h) + 1 ) ]', where 'h' is the highest unsigned typed value of type 'T'.<br>
|
||
If 'T' is signed and 'N' is not directly representable, the result 't' is
|
||
<u>implementation-defined</u>, which means that the C++ implementation is required to produce
|
||
a value 't' even if it is totally unrelated to 's'.</p>
|
||
<p>Floating to Floating conversions are defined only if 'N' is representable;
|
||
if it is not, the conversion has <u>undefined behavior.</u><br>
|
||
If 'N' is exactly representable, 't' is required to be the exact representation.<br>
|
||
If 'N' is inexactly representable, 't' is required to be one of the two adjacents, with
|
||
an implementation-defined choice of rounding direction; that is, the conversion is required
|
||
to be correctly rounded.</p>
|
||
<p>Floating to Integer conversions represent not 'N' but 'M=trunc(N)', were trunc() is to truncate: i.e.
|
||
to remove the fractional part, if any.<br>
|
||
If 'M' is not representable in 'T', the conversion has <u>undefined behavior</u>
|
||
(unless 'T' is bool, see §4.12).</p>
|
||
<p>Integer to Floating conversions are always defined.<br>
|
||
If 'N' is exactly representable, 't' is required to be the exact representation.<br>
|
||
If 'N' is inexactly representable, 't' is required to be one of the two adjacents, with
|
||
an implementation-defined choice of rounding direction; that is, the conversion is required
|
||
to be correctly rounded.</p>
|
||
|
||
|
||
|
||
|
||
|
||
|
||
<hr>
|
||
<h2><A NAME="subranged">Subranged Conversion Direction, Subtype and Supertype</A></h2>
|
||
<P>Given a source type 'S' and a destination type 'T', there is a <b>conversion direction</b>
|
||
denoted: <code>'S->T'</code>.<br>
|
||
For any two ranges the following <i>range relation</i> can be defined: A range
|
||
'X' can be <i>entirely contained</i> in a range 'Y', in which case it is said that
|
||
'X' is enclosed by 'Y'.<br>
|
||
Formally: R(S) is <b>enclosed</b> by R(T) iif (R(S) intersection R(T)) == R(S).</P>
|
||
<P>If the source type range, R(S), is <i>not enclosed</i> in the target type range, R(T);
|
||
that is, if (R(S) & R(T)) != R(S), the conversion direction is said to be <b>subranged</b>,
|
||
which means that R(S) is not entirely contained in R(T) and therefore there is
|
||
some portion of the source range which falls outside the target range. In other words,
|
||
if a conversion direction S->T is subranged, there are values in S which cannot be represented
|
||
in T because they are out of range.<br>
|
||
Notice that for S->T, the adjective subranged applies to 'T'.</p>
|
||
<p>Examples:<br>
|
||
Given the following numeric types all representing real numbers:<br>
|
||
<br>
|
||
X with numeric set {-2.0,-1.0,0.0,+1.0,+2.0} and range [-2.0,+2.0]<br>
|
||
Y with numeric set {-2.0,-1.5,-1.0,-0.5,0.0,+0.5,+1.0,+1.5,+2.0} and range [-2.0,+2.0]<br>
|
||
Z with numeric set {-1.0,0.0,+1.0} and range [-1.0,+1.0]<br>
|
||
<br>
|
||
For:<br>
|
||
<br>
|
||
(a) X->Y:
|
||
<blockquote>
|
||
R(X) & R(Y) == R(X), then X->Y is not subranged.
|
||
Thus, all values of type X are representable in the type Y.
|
||
</blockquote>
|
||
(b) Y->X:
|
||
<blockquote>
|
||
R(Y) & R(X) == R(Y), then Y->X is not subranged.
|
||
Thus, all values of type Y are representable in the type X, but in this case, some values
|
||
are <i>inexactly</i> representable (all the halves).<br>
|
||
(note: it is to permit this case that a range is an interval of abstract values
|
||
and not an interval of typed values)
|
||
</blockquote>
|
||
(b) X->Z:
|
||
<blockquote>
|
||
R(X) & R(Z) != R(X), then X->Z is subranged.
|
||
Thus, some values of type X are not representable in the type Z, they fall out of range
|
||
(-2.0 and +2.0)
|
||
</blockquote>
|
||
<p></p>
|
||
<p>It is possible that R(S) is not enclosed by R(T), while neither is R(T) enclosed
|
||
by R(S); for example, UNSIG=[0,255] is not enclosed by SIG=[-128,127]; neither is SIG
|
||
enclosed by UNSIG.<br>
|
||
This implies that is possible that a conversion direction is subranged both
|
||
ways. This occurs when a mixture of signed/unsigned types are involved and indicates
|
||
that in both directions there are values which can fall out of range.</P>
|
||
<P>Given the range relation (subranged or not) of a conversion direction S->T,
|
||
it is possible to classify 'S' and 'T' as <b>supertype</b> and <b>subtype</b>:<br>
|
||
If the conversion is subranged, which means that 'T' cannot represent all possible values of type 'S',
|
||
'S' is the supertype and 'T' the subtype; otherwise, 'T' is the supertype and 'S' the subtype.<br>
|
||
<br>
|
||
For example:<br>
|
||
R(float)=[-FLT_MAX,FLT_MAX] and R(double)=[-DBL_MAX,DBL_MAX].<br>
|
||
If FLT_MAX < DBL_MAX:<br>
|
||
'double->float' is subranged and supertype=double, subtype=float.<br>
|
||
'float->double' is not subranged and supertype=double, subtype=float.<br>
|
||
Notice that while 'double->float' is subranged, 'float->double' is not,
|
||
which yields the same supertype,subtype for both directions.<br>
|
||
<br>
|
||
Now consider:<br>
|
||
R(int)=[INT_MIN,INT_MAX] and R(unsigned int)=[0,UINT_MAX].<br>
|
||
A C++ implementation is required to have UINT_MAX > INT_MAX (§3.9/3), so:<br>
|
||
'int->unsigned' is subranged (negative values fall out of range) and supertype=int, subtype=unsigned.<br>
|
||
'unsigned->int' is <em>also</em> subranged (high positive values fall out of range)
|
||
and supertype=unsigned, subtype=int.<br>
|
||
In this case, the conversion is subranged in both directions and the supertype,subtype pairs
|
||
are not invariant (under inversion of direction). This indicates that none of the types can
|
||
represent all the values of the other.</p>
|
||
<p>When the supertype is the same for both 'S->T' and 'T->S', it is effectively indicating
|
||
a type which can represent all the values of the subtype.<br>
|
||
Consequently, if a conversion X->Y is not subranged, but the opposite (Y->X)
|
||
is, so that the supertype is always 'Y', it is said that the direction X->Y
|
||
is <b>correctly rounded value preserving</b>, meaning that all such conversions
|
||
are guaranteed to produce results in range and correctly rounded (even if inexact).<br>
|
||
For example, all integer to floating conversions are correctly rounded value preserving.
|
||
</p>
|
||
<HR>
|
||
<P>Back to <A HREF="index.html">Numeric Conversion library index</A></P>
|
||
<HR>
|
||
<P>Revised 23 June 2004</P>
|
||
<p><EFBFBD> Copyright Fernando Luis Cacciola Carballal, 2004</p>
|
||
<p> Use, modification, and distribution are subject to the Boost Software
|
||
License, Version 1.0. (See accompanying file <a href="../../../../LICENSE_1_0.txt">
|
||
LICENSE_1_0.txt</a> or copy at <a href="http://www.boost.org/LICENSE_1_0.txt">
|
||
www.boost.org/LICENSE_1_0.txt</a>)</p>
|
||
</body>
|
||
</HTML> |