mirror of https://github.com/boostorg/math.git synced 2026-01-26 18:52:10 +00:00
math/doc/dist_reference.qbk
2006-09-21 14:24:57 +00:00


[section:dist_ref Statistical Distributions and Functions Reference]
[def __cdf [link math.dist.cdf Cumulative Distribution Function]]
[def __pdf [link math.dist.pdf Probability Density Function]]
[def __ccdf [link math.dist.ccdf Complement of the Cumulative Distribution Function]]
[def __quantile [link math.dist.quantile Quantile]]
[def __quantile_c [link math.dist.quantile_c Quantile from the complement of the probability]]
[def __mean [link math.dist.mean mean]]
[def __variance [link math.dist.variance variance]]
[def __sd [link math.dist.sd standard deviation]]
[def __hazard [link math.dist.hazard Hazard Function]]
[def __chf [link math.dist.chf Cumulative Hazard Function]]
[/ def names end in distrib to avoid clashes]
[def __binomial_distrib [link math_toolkit.dist.dist_ref.dists.binomial_dist Binomial Distribution]]
[def __chi_squared_distrib [link math_toolkit.dist.dist_ref.dists.chi_squared_dist Chi Squared Distribution]]
[def __normal_distrib [link math_toolkit.dist.dist_ref.dists.normal_dist Normal Distribution]]
[def __F_distrib [link math_toolkit.dist.dist_ref.dists.f_dist Fisher F Distribution]]
[def __students_t_distrib [link math_toolkit.dist.dist_ref.dists.students_t_dist Students t Distribution]]
[def __usual_accessors __cdf, __pdf, __quantile, __hazard,
__chf, __mean, __variance and __sd]
[section:nmp Non-Member Properties]
Properties that are common to all distributions are accessed via non-member
getter functions. This allows more of these functions to be added over time
as the need arises. Unfortunately, the literature uses many different and
confusing names to refer to a rather small number of actual concepts: refer
to the [link concept_index concept index] to find the property you
want by the name you are most familiar with.
Alternatively, use the [link function_index function index]
to go straight to the function you want if you already know its name.
[h4 [#function_index]Function Index]
* [link math.dist.cdf cdf].
* [link math.dist.ccdf cdf complement].
* [link math.dist.chf chf].
* [link math.dist.hazard hazard].
* __mean.
* [link math.dist.pdf pdf].
* [link math.dist.quantile quantile].
* [link math.dist.quantile_c quantile from the complement].
* [link math.dist.sd standard_deviation].
* __variance.
[h4 [#concept_index]Conceptual Index]
* __ccdf.
* __cdf.
* __chf.
* [link survival_inv Inverse Survival Function].
* __hazard.
* [link lower_critical Lower Critical Value].
* __mean.
* [link cdfPQ P].
* [link percent Percent Point Function].
* __pdf.
* [link pmf Probability Mass Function].
* [link cdfPQ Q].
* __quantile.
* [link math.dist.quantile_c Quantile from the complement of the probability].
* __sd.
* [link survival Survival Function].
* [link upper_critical Upper Critical Value].
* __variance.
[h4 [#math.dist.cdf]Cumulative Distribution Function]
template <class RealType>
RealType cdf(const ``['Distribution-Type]``<RealType>& dist, const RealType& x);
The __cdf is the probability that
the variable takes a value less than or equal to x. It is equivalent
to the integral from -infinity to x of the __pdf.
For example the following graph shows the cdf for the
normal distribution:
[$../graphs/cdf.png]
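As a stand-alone illustration of the definition above, the cdf of a normal
distribution can be written with `std::erfc` from the C++ standard library.
This is a minimal sketch and not the library's implementation; the name
`normal_cdf_sketch` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Probability that a normal variate with the given mean and standard
// deviation takes a value less than or equal to x.
// normal_cdf_sketch is a hypothetical helper, not part of the library.
double normal_cdf_sketch(double x, double mean = 0.0, double sd = 1.0)
{
    assert(sd > 0.0);                    // the standard deviation must be positive
    const double z = (x - mean) / sd;    // standardise the argument
    return 0.5 * std::erfc(-z / std::sqrt(2.0));
}
```

With the default arguments, `normal_cdf_sketch(0.0)` is one half, because the
standard normal distribution is symmetric about zero.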
[h4 [#math.dist.ccdf]Complement of the Cumulative Distribution Function]
template <class Distribution, class RealType>
RealType cdf(const ``['Unspecified-Complement-Type]``<Distribution, RealType>& comp);
The complement of the __cdf
is the probability that
the variable takes a value greater than x. It is equivalent
to the integral from x to infinity of the __pdf, or 1 minus the __cdf of x.
This is also known as the survival function.
In this library, it is obtained by wrapping the arguments to the `cdf`
function in a call to `complement`, for example:
   // standard normal distribution object:
   boost::math::normal norm;
   // print survival function for x=2.0:
   std::cout << cdf(complement(norm, 2.0)) << std::endl;
For example the following graph shows the complement of the cdf for the
normal distribution:
[$../graphs/survival.png]
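One practical reason to ask for the complement directly, rather than
subtracting the cdf from one, is loss of precision far out in the upper tail.
The sketch below shows this with standard-library functions only; the helper
names are hypothetical and the code is not the library's implementation.

```cpp
#include <cassert>
#include <cmath>

// Normal cdf written with std::erfc (hypothetical helper).
double normal_cdf_sketch(double x, double mean = 0.0, double sd = 1.0)
{
    assert(sd > 0.0);
    return 0.5 * std::erfc(-(x - mean) / (sd * std::sqrt(2.0)));
}

// Survival function computed directly: no subtraction from one is involved.
double normal_survival_sketch(double x, double mean = 0.0, double sd = 1.0)
{
    assert(sd > 0.0);
    return 0.5 * std::erfc((x - mean) / (sd * std::sqrt(2.0)));
}

// Survival function computed as 1 - cdf: cancels to zero once cdf rounds to one.
double naive_survival_sketch(double x)
{
    return 1.0 - normal_cdf_sketch(x);
}
```

For the standard normal distribution at /x/ = 10, the direct form returns a
tiny positive value, while the subtraction returns exactly zero in double
precision.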
[h4 [#math.dist.hazard]Hazard Function]
template <class RealType>
RealType hazard(const ``['Distribution-Type]``<RealType>& dist, const RealType& x);
Returns the __hazard of /x/ and distribution /dist/.
[$../equations/hazard.png]
[caution
Some authors refer to this as the conditional failure
density function rather than the hazard function.]
[h4 [#math.dist.chf]Cumulative Hazard Function]
template <class RealType>
RealType chf(const ``['Distribution-Type]``<RealType>& dist, const RealType& x);
Returns the __chf of /x/ and distribution /dist/.
[$../equations/chf.png]
[caution
Some authors refer to this as simply the "Hazard Function".]
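The two hazard functions can be illustrated with an exponential distribution,
chosen only because the results have simple closed forms. The sketch assumes
the usual textbook definitions: the hazard is the pdf divided by the
complement of the cdf, and the cumulative hazard is minus the logarithm of
the complement of the cdf. All names are hypothetical and the code is not
the library's implementation.

```cpp
#include <cassert>
#include <cmath>

// Exponential distribution with rate lambda (hypothetical stand-in type).
struct exponential_sketch
{
    double lambda;
    double pdf(double x) const      { return lambda * std::exp(-lambda * x); }
    double survival(double x) const { return std::exp(-lambda * x); }  // 1 - cdf(x)
};

// Hazard: density at x divided by the probability of exceeding x.
double hazard_sketch(const exponential_sketch& d, double x)
{
    assert(d.lambda > 0.0 && x >= 0.0);
    return d.pdf(x) / d.survival(x);
}

// Cumulative hazard: minus the logarithm of the probability of exceeding x.
double chf_sketch(const exponential_sketch& d, double x)
{
    assert(d.lambda > 0.0 && x >= 0.0);
    return -std::log(d.survival(x));
}
```

For this distribution the hazard is the constant `lambda` and the cumulative
hazard is `lambda * x`, which makes the sketch easy to check by hand.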
[h4 [#math.dist.mean]mean]
template<class RealType>
RealType mean(const ``['Distribution-Type]``<RealType>& dist);
Returns the mean of the distribution /dist/.
[h4 [#math.dist.pdf]Probability Density Function]
template <class RealType>
RealType pdf(const ``['Distribution-Type]``<RealType>& dist, const RealType& x);
For a continuous distribution, the probability density function (pdf) returns
the probability density of the variate at the value x.
Since for continuous distributions the probability at a single point is actually zero,
probabilities are obtained by integrating the pdf between two points:
see the __cdf.
For a discrete distribution, the pdf is the probability that the
variate takes the value x.
For example for a standard normal distribution the pdf looks like this:
[$../graphs/pdf.png]
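To make the relationship between density and probability concrete, the sketch
below integrates the standard normal pdf between two points with Simpson's
rule. It is an illustration only: the helper names and the choice of
integration rule are assumptions, not how the library evaluates probabilities.

```cpp
#include <cassert>
#include <cmath>

// Density of the standard normal distribution (hypothetical helper).
double normal_pdf_sketch(double x)
{
    const double pi = 3.14159265358979323846;
    return std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi);
}

// Probability that the variate lies between a and b, obtained by
// integrating the pdf with the composite Simpson rule.
double probability_between_sketch(double a, double b, int steps = 1000)
{
    assert(a <= b && steps > 0 && steps % 2 == 0);  // Simpson needs an even step count
    const double h = (b - a) / steps;
    double sum = normal_pdf_sketch(a) + normal_pdf_sketch(b);
    for (int i = 1; i < steps; ++i)
        sum += (i % 2 ? 4.0 : 2.0) * normal_pdf_sketch(a + i * h);
    return sum * h / 3.0;
}
```

Integrating over an interval of zero width gives zero, which is the
single-point probability mentioned above.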
[h4 [#math.dist.quantile]quantile]
template <class RealType>
RealType quantile(const ``['Distribution-Type]``<RealType>& dist, const RealType& p);
The quantile is best viewed as the inverse of the __cdf: it returns
a value /x/ such that `cdf(dist, x) == p`.
This is also known as the /percent point function/, or a /percentile/.
The following graph shows the quantile function for a standard normal
distribution:
[$../graphs/quantile.png]
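Because the quantile is the inverse of the cdf, it can be approximated by
root finding on the cdf. The sketch below uses plain bisection on a standard
normal cdf; the helper names, the bracket and the iteration count are
assumptions, and the library is free to use a different method.

```cpp
#include <cassert>
#include <cmath>

// Standard normal cdf written with std::erfc (hypothetical helper).
double normal_cdf_sketch(double x)
{
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Finds x such that normal_cdf_sketch(x) is approximately p, by bisection.
double normal_quantile_sketch(double p)
{
    assert(p > 0.0 && p < 1.0);
    double lo = -40.0, hi = 40.0;      // wide bracket around the root
    for (int i = 0; i < 200; ++i)      // each step halves the bracket
    {
        const double mid = 0.5 * (lo + hi);
        if (normal_cdf_sketch(mid) < p)
            lo = mid;                  // root lies to the right of mid
        else
            hi = mid;                  // root lies at or to the left of mid
    }
    return 0.5 * (lo + hi);
}
```

Feeding the result back into `normal_cdf_sketch` recovers /p/ to within
rounding error, which is the defining property of the quantile.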
[h4 [#math.dist.quantile_c]Quantile from the complement of the probability]
template <class Distribution, class RealType>
RealType quantile(const ``['Unspecified-Complement-Type]``<Distribution, RealType>& comp);
This is the inverse of the __ccdf. It is calculated by wrapping
the arguments to the `quantile` function in a call to
`complement`. For example:
   // define a standard normal distribution:
   boost::math::normal norm;
   // print the value of x for which the complement
   // of the probability is 0.05:
   std::cout << quantile(complement(norm, 0.05)) << std::endl;
The function computes a value /x/ such that
`cdf(complement(dist, x)) == q`, where /q/ is the complement of the
probability.
This function is also called the inverse survival function.
The following graph shows the inverse survival function for the normal
distribution:
[$../graphs/survival_inv.png]
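Passing the complement /q/ directly matters when /q/ is very small, because
the value 1 - /q/ cannot then be represented in floating point. The sketch
below uses the closed-form quantiles of an exponential distribution to show
the effect; the function names are hypothetical and the code is not the
library's implementation.

```cpp
#include <cassert>
#include <cmath>

// Exponential distribution with rate lambda: cdf(x) = 1 - exp(-lambda * x).

// Quantile from the probability p: solves cdf(x) == p.
double exp_quantile_sketch(double p, double lambda)
{
    assert(lambda > 0.0 && p >= 0.0 && p <= 1.0);
    return -std::log(1.0 - p) / lambda;
}

// Quantile from the complement q: solves 1 - cdf(x) == q.
double exp_quantile_complement_sketch(double q, double lambda)
{
    assert(lambda > 0.0 && q >= 0.0 && q <= 1.0);
    return -std::log(q) / lambda;
}
```

With /q/ = 1e-20 the expression `1.0 - q` rounds to exactly one, so
`exp_quantile_sketch(1.0 - q, 1.0)` is infinite, whereas
`exp_quantile_complement_sketch(q, 1.0)` returns a finite value close to 46.05.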
[h4 [#math.dist.sd]Standard Deviation]
template <class RealType>
RealType standard_deviation(const ``['Distribution-Type]``<RealType>& dist);
Returns the standard deviation of distribution /dist/.
[h4 [#math.dist.variance]variance]
template <class RealType>
RealType variance(const ``['Distribution-Type]``<RealType>& dist);
Returns the variance of the distribution /dist/.
[h4 [#cdfPQ]P and Q]
The terms P and Q are sometimes used to refer to the __cdf
and its [link math.dist.ccdf complement] respectively.
Lowercase p and q are sometimes used to refer to the values returned
by these functions.
[h4 [#percent]Percent Point Function]
The percent point function, also known as the percentile, is the same as
the __quantile.
[h4 [#survival_inv]Inverse Survival Function]
The inverse of the survival function is the same as computing the
[link math.dist.quantile_c quantile
from the complement of the probability].
[h4 [#pmf]Probability Mass Function]
The Probability Mass Function is the same as the __pdf.
The term Mass Function is usually applied to discrete distributions,
while the term __pdf applies to continuous distributions.
[h4 [#lower_critical]Lower Critical Value]
The lower critical value calculates the value of the random variable
given the area under the left tail of the distribution.
It is equivalent to calculating the __quantile.
[h4 [#upper_critical]Upper Critical Value]
The upper critical value calculates the value of the random variable
given the area under the right tail of the distribution. It is equivalent to
calculating the [link math.dist.quantile_c quantile from the complement of the
probability].
[h4 [#survival]Survival Function]
Refer to the __ccdf.
[endsect][/section:nmp Non-Member Properties]
[section:dists Distributions]
[section:binomial_dist Binomial]
``#include <boost/math/distributions/binomial.hpp>``
   namespace boost{ namespace math{
   template <class RealType>
   class binomial_distribution;
   typedef binomial_distribution<double> binomial;
   template <class RealType>
   class binomial_distribution
   {
   public:
      // construct:
      binomial_distribution(RealType n, RealType p);
      // parameter access:
      RealType success_fraction() const;
      RealType trials() const;
      // Bounds on success fraction:
      static RealType estimate_lower_bound_on_p(
         RealType trials,
         RealType successes,
         RealType alpha);
      static RealType estimate_upper_bound_on_p(
         RealType trials,
         RealType successes,
         RealType alpha);
      // estimate min/max number of trials:
      static RealType estimate_number_of_trials(
         RealType k,     // number of events
         RealType p,     // success fraction
         RealType alpha); // probability threshold
      template <class P1, class P2, class P3>
      static RealType estimate_number_of_trials(
         const ``['unspecified-complemented-type]``<P1, P2, P3>& c);
   };
   }} // namespaces
The class type `binomial_distribution` represents a binomial distribution:
it is used when there are exactly two mutually
exclusive outcomes of a trial. These outcomes are labelled
"success" and "failure". The binomial distribution is used to obtain
the probability of observing x successes in N trials, with the
probability of success on a single trial denoted by p. The
binomial distribution assumes that p is fixed for all trials.
[h4 Member Functions]
binomial_distribution(RealType n, RealType p);
Constructor: /n/ is the total number of trials, /p/ is the
probability of success of a single trial.
RealType success_fraction() const;
Returns the parameter /p/ from which this distribution was constructed.
RealType trials() const;
Returns the parameter /n/ from which this distribution was constructed.
static RealType estimate_lower_bound_on_p(
RealType trials,
RealType successes,
RealType alpha);
Returns a lower bound on the success fraction:
[variablelist
[[trials][The total number of trials conducted.]]
[[successes][The number of successes that occurred.]]
[[alpha][The largest acceptable probability that the true value of
the success fraction is [*less than] the value returned.]]
]
For example, if you observe /k/ successes from /n/ trials the
best estimate for the success fraction is simply ['k/n], but if you
want to be 95% sure that the true value is [*greater than] some value,
['p[sub min]], then:
p``[sub min]`` = binomial_distribution<RealType>::estimate_lower_bound_on_p(
n, k, 0.05);
[link binom_conf See worked example.]
static RealType estimate_upper_bound_on_p(
RealType trials,
RealType successes,
RealType alpha);
Returns an upper bound on the success fraction:
[variablelist
[[trials][The total number of trials conducted.]]
[[successes][The number of successes that occurred.]]
[[alpha][The largest acceptable probability that the true value of
the success fraction is [*greater than] the value returned.]]
]
For example, if you observe /k/ successes from /n/ trials the
best estimate for the success fraction is simply ['k/n], but if you
want to be 95% sure that the true value is [*less than] some value,
['p[sub max]], then:
p``[sub max]`` = binomial_distribution<RealType>::estimate_upper_bound_on_p(
n, k, 0.05);
[link binom_conf See worked example.]
static RealType estimate_number_of_trials(
RealType k, // number of events
RealType p, // success fraction
RealType alpha); // probability threshold
template <class P1, class P2, class P3>
static RealType estimate_number_of_trials(
const ``['unspecified-complemented-type]``<P1, P2, P3>& c);
These functions estimate the number of trials required to achieve a certain
probability that [*k events or fewer will be observed].
[variablelist
[[k][The number of successes observed.]]
[[p][The probability of success for each trial.]]
[[alpha][The maximum acceptable probability that k events or fewer will be observed.]]
]
For example:
binomial_distribution<RealType>::estimate_number_of_trials(10, 0.5, 0.05);
Returns the smallest number of trials we must conduct to be 95% sure
of seeing 10 events that occur with frequency one half.
While:
binomial_distribution<RealType>::estimate_number_of_trials(
complement(0, 1.0/1000000, 0.05));
Returns the largest number of trials we can conduct and still be 95% certain
of not observing any events that occur with one in a million frequency.
This is typically used in failure analysis.
[link binom_size_eg See Worked Example.]
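For the special case of zero events, the failure-analysis question above has
a closed form that can be checked by hand: the probability of seeing no
events in /n/ trials is (1 - /p/)[super n], and requiring this to be at least
1 - /alpha/ bounds /n/. The sketch below evaluates that bound. It is a
derivation under stated assumptions only: the function name is hypothetical,
and whether the library's estimator returns exactly this value is not
confirmed here.

```cpp
#include <cassert>
#include <cmath>

// Largest real-valued number of trials n for which the chance of observing
// no events is still at least 1 - alpha, when each trial produces an event
// with probability p. Solves (1 - p)^n == 1 - alpha for n.
// max_trials_without_event_sketch is a hypothetical name.
double max_trials_without_event_sketch(double p, double alpha)
{
    assert(p > 0.0 && p < 1.0 && alpha > 0.0 && alpha < 1.0);
    // log1p(-t) evaluates log(1 - t) accurately when t is small.
    return std::log1p(-alpha) / std::log1p(-p);
}
```

For a one-in-a-million event and /alpha/ = 0.05 this gives a little over
51293 trials.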
[h4 Non-member Accessors]
All the [link math_toolkit.dist.dist_ref.nmp usual non-member accessor functions]
that are generic to all distributions are supported: __usual_accessors.
However it's worth taking a moment to define what these actually mean in
the context of this distribution:
[table Meaning of the non-member accessors
[[Function][Example Code][Meaning]]
[[__pdf][``pdf(binomial(n, p), k)``]
[The probability of obtaining [*exactly k successes] from n trials
with success fraction p.]]
[[__cdf][``cdf(binomial(n, p), k)``]
[The probability of obtaining [*k successes or fewer] from n trials
with success fraction p.]]
[[__ccdf][``cdf(complement(binomial(n, p), k))``]
[The probability of obtaining [*more than k successes] from n trials
with success fraction p.]]
[[__quantile][``quantile(binomial(n, p), P)``]
[The [*greatest] number of successes that may be observed from n trials
with success fraction p, at probability P. Note that the value returned
is a real-number, and not an integer. Depending on the use case you may
want to take either the floor or ceiling of the result.]]
[[__quantile_c][``quantile(complement(binomial(n, p), P))``]
[The [*smallest] number of successes that may be observed from n trials
with success fraction p, at probability P. Note that the value returned
is a real-number, and not an integer. Depending on the use case you may
want to take either the floor or ceiling of the result.]]
]
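The first three rows of the table can be reproduced from the binomial
formula itself. The sketch below does so with standard-library functions and
integer counts; the names are hypothetical, the direct summation is only
sensible for moderate /n/, and none of this is the library's implementation.

```cpp
#include <cassert>
#include <cmath>

// Probability of exactly k successes from n trials with success fraction p.
double binomial_pmf_sketch(int n, double p, int k)
{
    assert(n >= 0 && k >= 0 && k <= n && p >= 0.0 && p <= 1.0);
    // lgamma gives the log of the binomial coefficient without factorials.
    const double log_choose = std::lgamma(n + 1.0)
                            - std::lgamma(k + 1.0)
                            - std::lgamma(n - k + 1.0);
    return std::exp(log_choose) * std::pow(p, k) * std::pow(1.0 - p, n - k);
}

// Probability of k successes or fewer: sum of the point probabilities.
double binomial_cdf_sketch(int n, double p, int k)
{
    double sum = 0.0;
    for (int i = 0; i <= k; ++i)
        sum += binomial_pmf_sketch(n, p, i);
    return sum;
}

// Probability of more than k successes: the complement of the cdf.
double binomial_ccdf_sketch(int n, double p, int k)
{
    return 1.0 - binomial_cdf_sketch(n, p, k);
}
```

For ten trials of a fair coin, the probability of exactly five successes is
252/1024, and the probabilities of "five or fewer" and "more than five"
sum to one.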
[endsect][/section:binomial_dist Binomial]
[section:chi_squared_dist Chi Squared]
The chi-square distribution results when /v/ independent variables with
standard normal distributions are squared and summed.
``#include <boost/math/distributions/chi_squared.hpp>``
   namespace boost{ namespace math{
   template <class RealType>
   class chi_squared_distribution;
   typedef chi_squared_distribution<double> chi_squared;
   template <class RealType>
   class chi_squared_distribution
   {
   public:
      typedef RealType value_type;
      // Construct:
      chi_squared_distribution(RealType v);
      // Access parameter:
      RealType degrees_of_freedom()const;
      // Parameter estimation:
      static RealType estimate_degrees_of_freedom(
         RealType difference_from_mean,
         RealType alpha,
         RealType beta,
         RealType sd,
         RealType hint = 100);
   };
   }} // namespaces
[h4 Member Functions]
chi_squared_distribution(RealType v);
Constructs a Chi Squared distribution with /v/ degrees of freedom.
RealType degrees_of_freedom()const;
Returns the parameter /v/ from which this object was constructed.
static RealType estimate_degrees_of_freedom(
RealType difference_from_mean,
RealType alpha,
RealType beta,
RealType sd,
RealType hint = 100);
Under construction.
[h4 Non-member Accessors]
All the [link math_toolkit.dist.dist_ref.nmp usual non-member accessor functions]
that are generic to all distributions are supported: __usual_accessors.
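The definition given at the start of this section, a sum of /v/ squared
standard normal variables, can be checked by simulation. The sketch below
draws such sums with the standard `<random>` header and averages them; since
the mean of a chi-squared distribution equals its degrees of freedom, the
average should be close to /v/. The function names, the seed and the sample
count are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Draws one chi-squared variate with v degrees of freedom by summing
// v squared standard normal variates (hypothetical helper).
double chi_squared_sample_sketch(int v, std::mt19937& gen)
{
    assert(v > 0);
    std::normal_distribution<double> standard_normal(0.0, 1.0);
    double sum = 0.0;
    for (int i = 0; i < v; ++i)
    {
        const double z = standard_normal(gen);
        sum += z * z;
    }
    return sum;
}

// Averages many such draws; the result should be close to v.
double chi_squared_sample_mean_sketch(int v, int samples, unsigned seed)
{
    assert(samples > 0);
    std::mt19937 gen(seed);
    double total = 0.0;
    for (int i = 0; i < samples; ++i)
        total += chi_squared_sample_sketch(v, gen);
    return total / samples;
}
```

The sample mean only approaches /v/ statistically, so any comparison needs a
tolerance that reflects the number of samples drawn.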
[endsect][/section:chi_squared_dist Chi Squared]
[section:f_dist F distribution]
The F distribution is the ratio of two chi-squared distributions with
degrees of freedom df1 and df2, respectively, where each chi-squared has
first been divided by its degrees of freedom.
``#include <boost/math/distributions/fisher_f.hpp>``
   namespace boost{ namespace math{
   template <class RealType>
   class fisher_f_distribution;
   typedef fisher_f_distribution<double> fisher_f;
   template <class RealType>
   class fisher_f_distribution
   {
   public:
      typedef RealType value_type;
      // Construct:
      fisher_f_distribution(const RealType& df1, const RealType& df2);
      // Accessors:
      RealType degrees_of_freedom1()const;
      RealType degrees_of_freedom2()const;
   };
   }} //namespaces
[h4 Member Functions]
fisher_f_distribution(const RealType& df1, const RealType& df2);
Constructs an F-distribution with numerator degrees of freedom /df1/
and denominator degrees of freedom /df2/.
RealType degrees_of_freedom1()const;
Returns the numerator degrees of freedom parameter of the distribution.
RealType degrees_of_freedom2()const;
Returns the denominator degrees of freedom parameter of the distribution.
[h4 Non-member Accessors]
All the [link math_toolkit.dist.dist_ref.nmp usual non-member accessor functions]
that are generic to all distributions are supported: __usual_accessors.
[endsect][/section:f_dist F distribution]
[section:normal_dist Normal]
The normal distribution is probably the best-known statistical
distribution; it is also known as the Gaussian distribution.
A normal distribution with mean zero and standard deviation one
is known as the ['Standard Normal Distribution].
``#include <boost/math/distributions/normal.hpp>``
   namespace boost{ namespace math{
   template <class RealType>
   class normal_distribution;
   typedef normal_distribution<double> normal;
   template <class RealType>
   class normal_distribution
   {
   public:
      typedef RealType value_type;
      // Construct:
      normal_distribution(RealType mean = 0, RealType sd = 1);
      // Accessors:
      RealType mean()const;
      RealType standard_deviation()const;
   };
   }} // namespaces
[h4 Member Functions]
normal_distribution(RealType mean = 0, RealType sd = 1);
Constructs a normal distribution with mean /mean/ and
standard deviation /sd/.
RealType mean()const;
Returns the /mean/ of this distribution.
RealType standard_deviation()const;
Returns the /standard deviation/ of this distribution.
[h4 Non-member Accessors]
All the [link math_toolkit.dist.dist_ref.nmp usual non-member accessor functions] that are generic to all
distributions are supported: __usual_accessors.
[endsect][/section:normal_dist Normal]
[section:students_t_dist Students t]
A statistical distribution published by William Gosset in 1908.
His employer, Guinness Breweries, required him to publish under a
pseudonym, so he chose "Student". Given N independent measurements, let
[$../equations/students_t_dist.png]
where /M/ is the population mean, [' ''' &#x3BC; '''] is the sample mean, and /s/ is the
sample standard deviation.
Student's t-distribution is defined as the distribution of the random
variable t which is - very loosely - the "best" that we can do not
knowing the true standard deviation of the sample.
The Student's t-distribution takes a single parameter: the number of
degrees of freedom of the sample. When there is
/one/ degree of freedom, this distribution is the same as the Cauchy distribution.
As the number of degrees of freedom tends towards infinity, this
distribution approaches the normal distribution.
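As an illustration of the statistic described above, the sketch below
computes a t value from a set of measurements. It assumes the usual textbook
form: the difference between the sample mean and the population mean,
divided by the sample standard deviation over the square root of /N/. The
sign convention and the function name are assumptions, since the equation
itself is only available here as an image.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// t statistic for N measurements against a hypothesised population mean.
// t_statistic_sketch is a hypothetical helper, not part of the library.
double t_statistic_sketch(const std::vector<double>& data, double population_mean)
{
    assert(data.size() >= 2);           // need at least one degree of freedom
    const double n = static_cast<double>(data.size());
    double sum = 0.0;
    for (double x : data)
        sum += x;
    const double sample_mean = sum / n;
    double squares = 0.0;
    for (double x : data)
        squares += (x - sample_mean) * (x - sample_mean);
    // Sample standard deviation, with N - 1 in the denominator.
    const double s = std::sqrt(squares / (n - 1.0));
    return (sample_mean - population_mean) / (s / std::sqrt(n));
}
```

A statistic computed this way from /N/ measurements would be compared
against a Student's t-distribution with /N/ - 1 degrees of freedom.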
``#include <boost/math/distributions/students_t.hpp>``
   namespace boost{ namespace math{
   template <class RealType>
   class students_t_distribution;
   typedef students_t_distribution<double> students_t;
   template <class RealType>
   class students_t_distribution
   {
   public:
      typedef RealType value_type;
      // Construct:
      students_t_distribution(const RealType& v);
      // Accessor:
      RealType degrees_of_freedom()const;
      // degrees of freedom estimation:
      static RealType estimate_degrees_of_freedom(
         RealType difference_from_mean,
         RealType alpha,
         RealType beta,
         RealType sd,
         RealType hint = 100);
   };
   }} // namespaces
[h4 Member Functions]
students_t_distribution(const RealType& v);
Constructs a Student's t-distribution with /v/ degrees of freedom.
RealType degrees_of_freedom()const;
Returns the number of degrees of freedom of this distribution.
static RealType estimate_degrees_of_freedom(
RealType difference_from_mean,
RealType alpha,
RealType beta,
RealType sd,
RealType hint = 100);
Returns the number of degrees of freedom required to observe a significant
result when the mean differs from the "true" mean by /difference_from_mean/.
[variablelist
[[difference_from_mean][The difference between the true mean and the sample mean
that we wish to show is significant.]]
[[alpha][The maximum acceptable probability of rejecting the null hypothesis
when it is in fact true.]]
[[beta][The maximum acceptable probability of accepting the null hypothesis
when it is in fact false.]]
[[sd][The sample standard deviation.]]
[[hint][A hint for the location to start looking for the result.]]
]
[note
Remember that for a two-sided test, you must divide alpha by two
before calling this function.]
For more information on this function see the
[@http://www.itl.nist.gov/div898/handbook/prc/section2/prc222.htm
NIST Engineering Statistics Handbook].
[h4 Non-member Accessors]
All the [link math_toolkit.dist.dist_ref.nmp usual non-member accessor functions] that are generic to all
distributions are supported: __usual_accessors.
[endsect][/section:students_t_dist Students t]
[endsect][/section:dists Distributions]
[endsect][/section:dist_ref Statistical Distributions and Functions Reference]
[section:future Extras/Future Directions]
I'm not anticipating any of the following being present in the initial
release: we've got enough to do figuring out the math!
[h4 Adding Additional Location and Scale Parameters]
In some modelling applications we require a distribution with a specific
location and scale:
often this equates to a specific mean and standard deviation, although for many
distributions the relationship between these properties and the location and
scale parameters is non-trivial.
See [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda364.htm http://www.itl.nist.gov/div898/handbook/eda/section3/eda364.htm] for more
information.
The obvious way to handle this is via an adapter template:
   template <class Dist>
   class scaled_distribution
   {
   public:
      scaled_distribution(
         const Dist dist,
         typename Dist::value_type location,
         typename Dist::value_type scale = 1);
   };
This adapter would then have its own set of overloads for the non-member accessor functions.
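One way such an adapter and a matching accessor overload might look is
sketched below. Everything here is hypothetical: the stand-in distribution
type, the `_sketch` names, the accessor members, and the choice to default
the scale to one are assumptions made for illustration, not a proposed
interface.

```cpp
#include <cassert>
#include <cmath>

// Minimal stand-in for a library distribution (hypothetical).
struct standard_normal_sketch
{
    typedef double value_type;
};

// Non-member cdf for the stand-in distribution.
inline double cdf(const standard_normal_sketch&, double x)
{
    return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Adapter that shifts a distribution by location and stretches it by scale.
template <class Dist>
class scaled_distribution_sketch
{
public:
    typedef typename Dist::value_type value_type;
    scaled_distribution_sketch(const Dist& dist, value_type location, value_type scale = 1)
        : m_dist(dist), m_location(location), m_scale(scale)
    {
        assert(scale > 0);   // a zero or negative scale is not meaningful here
    }
    const Dist& base() const    { return m_dist; }
    value_type location() const { return m_location; }
    value_type scale() const    { return m_scale; }
private:
    Dist m_dist;
    value_type m_location;
    value_type m_scale;
};

// Overload of the non-member accessor: map x back onto the base distribution.
template <class Dist>
typename Dist::value_type cdf(const scaled_distribution_sketch<Dist>& d,
                              typename Dist::value_type x)
{
    return cdf(d.base(), (x - d.location()) / d.scale());
}
```

With a location of 10 and a scale of 2, evaluating the adapted cdf at 12
gives the same value as the base cdf at 1.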
[h4 Higher Level Hypothesis Tests]
Higher-level tests roughly corresponding to the
[@http://documents.wolfram.com/mathematica/Add-onsLinks/StandardPackages/Statistics/HypothesisTests.html Mathematica Hypothesis Tests]
package could be added reasonably easily, for example:
   template <class InputIterator>
   typename std::iterator_traits<InputIterator>::value_type
      test_equal_mean(
         InputIterator a,
         InputIterator b,
         typename std::iterator_traits<InputIterator>::value_type expected_mean);
Returns the probability that the data in the sequence [a,b) has the mean
/expected_mean/.
[h4 Integration With Statistical Accumulators]
[@http://boost-sandbox.sourceforge.net/libs/accumulators/doc/html/index.html
Eric Niebler's accumulator framework] - also a work in progress - provides the means
to calculate various statistical properties from experimental data. There is an
opportunity to integrate the statistical tests with this framework at some later date:
   // define an accumulator, all required statistics to calculate the test
   // are calculated automatically:
   accumulator_set<double, features<tag::test_expected_mean> > acc(expected_mean=4);
   // pass our data to the accumulator:
   acc = std::for_each(mydata.begin(), mydata.end(), acc);
   // extract the result:
   double p = probability(acc);
[endsect][/section:future Extras Future Directions]
[/ dist_reference.qbk
Copyright 2006 John Maddock and Paul A. Bristow.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt).
]