mirror of
https://github.com/boostorg/math.git
synced 2026-01-27 19:12:08 +00:00
390 lines
17 KiB
Plaintext
390 lines
17 KiB
Plaintext
[section:implementation Additional Implementation Notes]
|
|
|
|
The majority of the implementation notes are included with the documentation
|
|
of each function or distribution. The notes here are of a more general nature,
|
|
and reflect more the general implementation philosophy used.
|
|
|
|
[h4 Implemention philosophy]
|
|
|
|
"First be right, then be fast."
|
|
|
|
There will always be potential compromises
|
|
to be made between speed and accuracy.
|
|
It may be possible to find faster methods,
|
|
particularly for certain limited ranges of arguments,
|
|
but for most applications of math functions and distributions,
|
|
we judge that speed is rarely as important as accuracy.
|
|
|
|
So our priority is accuracy.
|
|
|
|
To permit evaluation of accuracy of the special functions,
|
|
production of extremely accurate tables of test values
|
|
has received considerable effort.
|
|
|
|
(It also required much CPU effort -
|
|
there was some danger of molten plastic dripping from the bottom of JM's laptop,
|
|
so instead, PAB's Dual-core desktop was kept 50% busy for *days*
|
|
calculating some tables of test values!)
|
|
|
|
For a specific RealType, say float or double,
|
|
it may be possible to find approximations for some functions
|
|
that are simpler and thus faster, but less accurate
|
|
(perhaps because there are no refining iterations,
|
|
for example, when calculating inverse functions).
|
|
|
|
If these prove accurate enough to be "fit for his purpose",
|
|
then a user may substitute his custom specialization.
|
|
|
|
For example, there are approximations dating back from times
|
|
when computation was a *lot* more expensive:
|
|
|
|
H Goldberg and H Levine, Approximate formulas for
|
|
percentage points and normalisation of t and chi squared,
|
|
Ann. Math. Stat., 17(4), 216 - 225 (Dec 1946).
|
|
|
|
A H Carter, Approximations to percentage points of the z-distribution,
|
|
Biometrika 34(2), 352 - 358 (Dec 1947).
|
|
|
|
These could still provide sufficient accuracy for some speed-critical applications.
|
|
|
|
[h4 Accuracy and Representation of Test Values]
|
|
|
|
In order to be accurate enough for as many as possible real types,
|
|
constant values are given to 50 decimal digits if available
|
|
(though many sources proved only accurate near to 64-bit double precision).
|
|
Values are specified as long double types by appending L,
|
|
unless they are exactly representable, for example integers, or binary fractions like 0.125.
|
|
This avoids the risk of loss of accuracy converting from double, the default type.
|
|
Values are used after static_cast<RealType>(1.2345L)
|
|
to provide the appropriate RealType for spot tests.
|
|
|
|
Functions that return constants values, like kurtosis for example, are written as
|
|
|
|
`static_cast<RealType>(-3) / 5;`
|
|
|
|
to provide the most accurate value
|
|
that the compiler can compute for the real type.
|
|
(The denominator is an integer and so will be promoted exactly).
|
|
|
|
So tests for one third, *not* exactly representable with radix two floating-point,
|
|
(should) use, for example:
|
|
|
|
`static_cast<RealType>(1) / 3;`
|
|
|
|
If a function is very sensitive to changes in input,
|
|
specifying an inexact value as input (such as 0.1) can throw
|
|
the result off by a noticeable amount: 0.1f is "wrong"
|
|
by ~1e-7 for example (because 0.1 has no exact binary representation).
|
|
That is why exact binary values - halves, quarters, and eighths etc -
|
|
are used in test code along with the occasional fraction `a/b` with `b`
|
|
a power of two (in order to ensure that the result is an exactly
|
|
representable binary value).
|
|
|
|
[h4 Tolerance of Tests]
|
|
|
|
The tolerances need to be set to the maximum of:
|
|
|
|
* Some epsilon value.
|
|
* The accuracy of the data (often only near 64-bit double).
|
|
|
|
Otherwise when long double has more digits than the test data, then no
|
|
amount of tweaking an epsilon based tolerance will work.
|
|
|
|
A common problem is when tolerances that are suitable for implementations
|
|
like Microsoft VS.NET where double and long double are the same size:
|
|
tests fail on other systems where long double is more accurate than double.
|
|
Check first that the suffix L is present, and then that the tolerance is big enough.
|
|
|
|
[h4 Handling Unsuitable Arguments]
|
|
|
|
In
|
|
[@http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1665.pdf Errors in Mathematical Special Functions], J. Marraffino & M. Paterno
|
|
it is proposed that signalling a domain error is mandatory
|
|
when the argument would give an mathematically undefined result.
|
|
|
|
*Guideline 1
|
|
|
|
[:A mathematical function is said to be defined at a point a = (a1, a2, . . .)
|
|
if the limits as x = (x1, x2, . . .) 'approaches a from all directions agree'.
|
|
The defined value may be any number, or +infinity, or -infinity.]
|
|
|
|
Put crudely, if the function goes to + infinity
|
|
and then emerges 'round-the-back' with - infinity,
|
|
it is NOT defined.
|
|
|
|
[:The library function which approximates a mathematical function shall signal a domain error
|
|
whenever evaluated with argument values for which the mathematical function is undefined.]
|
|
|
|
*Guideline 2
|
|
|
|
[:The library function which approximates a mathematical function
|
|
shall signal a domain error whenever evaluated with argument values
|
|
for which the mathematical function obtains a non-real value.]
|
|
|
|
This implementation is believed to follow these proposals and to assist compatibility with
|
|
['ISO/IEC 9899:1999 Programming languages - C]
|
|
and with the
|
|
[@www.open-std.org/jtc1/sc22/wg21/docs/papers/2005/n1836.pdf Draft Technical Report on C++ Library Extensions, 2005-06-24, section 5.2.1, paragraph 5].
|
|
[link math_toolkit.special.error_handling See also domain_error].
|
|
|
|
See __policy_ref for details of the error handling policies that should allow
|
|
a user to comply with any of these recommendations, as well as other behaviour.
|
|
|
|
See [link math_toolkit.special.error_handling error handling]
|
|
for a detailed explanation of the mechanism, and
|
|
[link math_toolkit.dist.stat_tut.weg.error_eg error_handling example]
|
|
and
|
|
[@/../../example/error_handling_example.cpp error_handling_example.cpp]
|
|
|
|
[caution If you enable throw but do NOT have try & catch block,
|
|
then the program will terminate with an uncaught exception and probably abort.
|
|
Therefore to get the benefit of helpful error messages, enabling *all* exceptions
|
|
*and* using try&catch is recommended for all applications.
|
|
However, for simplicity, this is not done for most examples.]
|
|
|
|
[h4 Handling of Functions that are Not Mathematically defined]
|
|
|
|
Functions that are not mathematically defined,
|
|
like the Cauchy mean, fail to compile by default .
|
|
[__math_undefined A policy] allows control of this.
|
|
|
|
If the policy is to permit undefined functions, then calling them
|
|
throws a domain error, by default. But the error policy can be set
|
|
to not throw, and to return NaN instead. For example,
|
|
|
|
`#define BOOST_MATH_DOMAIN_ERROR_POLICY ignore_error`
|
|
|
|
appears before the first Boost include,
|
|
then if the un-implemented function is called,
|
|
mean(cauchy<>()) will return std::numeric_limits<T>::quiet_NaN().
|
|
|
|
[warning If `std::numeric_limits<T>::has_quiet_NaN` is false
|
|
(for example T is a User-defined type),
|
|
then an exception will always be thrown when a domain error occurs.
|
|
Catching exceptions is therefore strongly recommended.]
|
|
|
|
[h4 Median of distributions]
|
|
|
|
There are many distributions for which we have been unable to find an analytic formula,
|
|
and this has deterred us from implementing
|
|
[@http://en.wikipedia.org/wiki/Median median functions], the mid-point in a list of values.
|
|
|
|
However a useful median approximation for distribution `dist` may be available from
|
|
|
|
`quantile(dist, 0.5)`.
|
|
|
|
[@http://www.amstat.org/publications/jse/v13n2/vonhippel.html Mean, Median, and Skew, Paul T von Hippel]
|
|
|
|
[@http://documents.wolfram.co.jp/teachersedition/MathematicaBook/24.5.html Descriptive Statistics,]
|
|
|
|
[@http://documents.wolfram.co.jp/v5/Add-onsLinks/StandardPackages/Statistics/DescriptiveStatistics.html and ]
|
|
|
|
[@http://documents.wolfram.com/v5/TheMathematicaBook/AdvancedMathematicsInMathematica/NumericalOperationsOnData/3.8.1.html
|
|
Mathematica Basic Statistics.] give more detail, in particular for discrete distributions.
|
|
|
|
|
|
[h4 Handling of Floating-Point Infinity]
|
|
|
|
Some functions and distributions are well defined with + or - infinity as
|
|
argument(s), but after some experiments with handling infinite arguments
|
|
as special cases, we concluded that it was more useful to forbid this,
|
|
and instead to return the result of __domain_error.
|
|
|
|
Handling infinity as special cases was additionally complicated
|
|
because, unlike built-in types on most - but not all - platforms,
|
|
not all User-Defined Types are
|
|
specialized to provide `std::numeric_limits<RealType>::infinity()`
|
|
and would return zero rather than any representation of infinity.
|
|
|
|
The rationale is that non-finiteness may happen because of error
|
|
or overflow in the users code, and it will be more helpful for this
|
|
to be diagnosed promptly rather than just continuing.
|
|
The code also became much more complicated, more error-prone,
|
|
much more work to test, and much less readable.
|
|
|
|
Overflow, underflow, denorm can be handled using __error_policy.
|
|
|
|
We have also tried to catch boundary cases where the mathematical specification
|
|
would result in divide by zero or overflow and signalling these similarly.
|
|
What happens at (and near), poles can be controlled through __error_policy.
|
|
|
|
[h4 Scale, Shape and Location]
|
|
|
|
We considered adding location and scale to the list of functions, for example:
|
|
|
|
template <class RealType>
|
|
inline RealType scale(const triangular_distribution<RealType>& dist)
|
|
{
|
|
RealType lower = dist.lower();
|
|
RealType mode = dist.mode();
|
|
RealType upper = dist.upper();
|
|
RealType result; // of checks.
|
|
if(false == detail::check_triangular(BOOST_CURRENT_FUNCTION, lower, mode, upper, &result))
|
|
{
|
|
return result;
|
|
}
|
|
return (upper - lower);
|
|
}
|
|
|
|
but found that these concepts are not defined (or their definition too contentious)
|
|
for too many distributions to be generally applicable.
|
|
Because they are non-member functions, they can be added if required.
|
|
|
|
[h4 Notes on Implementation of Specific Functions & Distributions]
|
|
|
|
* Default parameters for the Triangular Distribution.
|
|
We are uncertain about the best default parameters.
|
|
Some sources suggest that the Standard Triangular Distribution has
|
|
lower = 0, mode = half and upper = 1.
|
|
However as a approximation for the normal distribution,
|
|
the most common usage, lower = -1, mode = 0 and upper = 1 would be more suitable.
|
|
|
|
[h4 Rational Approximations Used]
|
|
|
|
Some of the special functions in this library are implemented via
|
|
rational approximations. These are either taken from the literature,
|
|
or devised by John Maddock using
|
|
[link math_toolkit.toolkit.internals2.minimax our Remez code].
|
|
|
|
Rational rather than Polynomial approximations are used to ensure
|
|
accuracy: polynomial approximations are often wonderful up to
|
|
a certain level of accuracy, but then fail to provide much greater
|
|
accuracy no matter how many more terms are added.
|
|
|
|
Our own approximations were devised either for added accuracy
|
|
(to support 128-bit long doubles for example), or because
|
|
literature methods were unavailable or under non-BSL
|
|
compatible license. Our Remez code is known to produce good
|
|
agreement with literature results in fairly simple "toy" cases,
|
|
for more complex cases. All approximations were checked
|
|
for convergence and to ensure that
|
|
they were not ill-conditioned (the coefficients can give a
|
|
theoretically good solution, but the resulting rational function
|
|
may be un-computable at fixed precision).
|
|
|
|
Recomputing using different
|
|
Remez implementations may well produce differing coefficients: the
|
|
problem is well known to be ill conditioned in general, and our Remez implementation
|
|
often found a broad and ill-defined minima for many of these approximations
|
|
(of course for simple "toy" examples like approximating `exp` the minima
|
|
is well defined, and the coeffiecents should agree no matter whose Remez
|
|
implementation is used). This should not in general effect the validity
|
|
of the approximations: there's good literature supporting the idea that
|
|
coefficients can be "in error" without necessarily adversely effecting
|
|
the result. Note that "in error" has a special meaning in this context,
|
|
see [@http://front.math.ucdavis.edu/0101.5042
|
|
"Approximate construction of rational approximations and the effect
|
|
of error autocorrection.", Grigori Litvinov, eprint arXiv:math/0101042].
|
|
Therefore the coefficients still need to be accurately calculated, even if they can
|
|
be in error compared to the "true" minimax solution.
|
|
|
|
[h4 Representation of Mathematical Constants]
|
|
|
|
A macro BOOST_DEFINE_MATH_CONSTANT in constants.hpp is used
|
|
to provide high accuracy constants to mathematical functions and distributions,
|
|
since it is important to provide values uniformly for both built-in
|
|
float, double and long double types,
|
|
and for User Defined types like NTL::quad_float and NTL::RR.
|
|
|
|
To permit calculations in this Math ToolKit and its tests, (and elsewhere)
|
|
at about 100 decimal digits with NTL::RR type,
|
|
it is obviously necessary to define constants to this accuracy.
|
|
|
|
However, some compilers do not accept decimal digits strings as long as this.
|
|
So the constant is split into two parts, with the 1st containing at least
|
|
long double precision, and the 2nd zero if not needed or known.
|
|
The 3rd part permits an exponent to be provided if necessary (use zero if none) -
|
|
the other two parameters may only contain decimal digits (and sign and decimal point),
|
|
and may NOT include an exponent like 1.234E99 (nor a trailing F or L).
|
|
The second digit string is only used if T is a User-Defined Type,
|
|
when the constant is converted to a long string literal and lexical_casted to type T.
|
|
(This is necessary because you can't use a numeric constant
|
|
since even a long double might not have enough digits).
|
|
|
|
For example, pi is defined:
|
|
|
|
BOOST_DEFINE_MATH_CONSTANT(pi,
|
|
3.141592653589793238462643383279502884197169399375105820974944,
|
|
5923078164062862089986280348253421170679821480865132823066470938446095505,
|
|
0)
|
|
|
|
And used thus:
|
|
|
|
using namespace boost::math::constants;
|
|
|
|
double diameter = 1.;
|
|
double radius = diameter * pi<double>();
|
|
|
|
or boost::math::constants::pi<NTL::RR>()
|
|
|
|
Note that it is necessary (if inconvenient) to specify the type explicitly.
|
|
|
|
So you cannot write
|
|
|
|
double p = boost::math::constants::pi<>(); // could not deduce template argument for 'T'
|
|
|
|
Neither can you write:
|
|
|
|
double p = boost::math::constants::pi; // Context does not allow for disambiguation of overloaded function
|
|
double p = boost::math::constants::pi(); // Context does not allow for disambiguation of overloaded function
|
|
|
|
[h4 Thread safety]
|
|
|
|
Reporting of error by setting errno should be thread safe already
|
|
(otherwise none of the std lib math functions would be thread safe?).
|
|
If you turn on reporting of errors via exceptions, errno gets left unused anyway.
|
|
|
|
Other than that, the code is intended to be thread safe *for built in
|
|
real-number types* : so float, double and long double are all thread safe.
|
|
|
|
For non-built-in types - NTL::RR for example - initialisation of the various
|
|
constants used in the implementation is potentially *not* thread safe.
|
|
This most undesiable, but it would be a signficant challenge to fix it.
|
|
Some compilers may offer the option of having
|
|
static-constants initialised in a thread safe manner (Commeau, and maybe
|
|
others?), if that's the case then the problem is solved. This is a topic of
|
|
hot debate for the next C++ std revision, so hopefully all compilers
|
|
will be required to do the right thing here at some point.
|
|
|
|
[h4 Sources of Test Data]
|
|
|
|
We found a large number of sources of test data.
|
|
We have assumed that these are /"known good"/
|
|
if they agree with the results from our test
|
|
and only consulted other sources for their /'vote'/
|
|
in the case of serious disagreement.
|
|
The accuracy, actual and claimed, vary very widely.
|
|
Only [@http://functions.wolfram.com/ Wolfram Mathematica functions]
|
|
provided a higher accuracy than
|
|
C++ double (64-bit floating-point) and was regarded as
|
|
the most-trusted source by far.
|
|
|
|
A useful index of sources is:
|
|
[@http://www.sal.hut.fi/Teaching/Resources/ProbStat/table.html
|
|
Web-oriented Teaching Resources in Probability and Statistics]
|
|
|
|
[@http://espse.ed.psu.edu/edpsych/faculty/rhale/hale/507Mat/statlets/free/pdist.htm Statlet]:
|
|
Is a Javascript application that calculates and plots probability distributions,
|
|
and provides the most complete range of distributions:
|
|
|
|
[:Bernoulli, Binomial, discrete uniform, geometric, hypergeometric,
|
|
negative binomial, Poisson, beta, Cauchy-Lorentz, chi-sequared, Erlang,
|
|
exponential, extreme value, Fisher, gamma, Laplace, logistic,
|
|
lognormal, normal, Parteo, Student's t, triangular, uniform, and Weibull.]
|
|
|
|
It calculates pdf, cdf, survivor, log survivor, hazard, tail areas,
|
|
& critical values for 5 tail values.
|
|
|
|
It is also the only independent source found for the Weibull distribution;
|
|
unfortunately it appears to suffer from very poor accuracy in areas where
|
|
the underlying special function is known to be difficult to implement.
|
|
|
|
[endsect][/section:implementation Implementation Notes]
|
|
|
|
[/
|
|
Copyright 2006 John Maddock and Paul A. Bristow.
|
|
Distributed under the Boost Software License, Version 1.0.
|
|
(See accompanying file LICENSE_1_0.txt or copy at
|
|
http://www.boost.org/LICENSE_1_0.txt).
|
|
]
|