
[section:dist_ref Statistical Distributions and Functions Reference]

[def __cdf [link math.dist.cdf Cumulative Distribution Function]]
[def __pdf [link math.dist.pdf Probability Density Function]]
[def __ccdf [link math.dist.ccdf Complement of the Cumulative Distribution Function]]
[def __quantile [link math.dist.quantile Quantile]]
[def __quantile_c [link math.dist.quantile_c Quantile from the complement of the probability]]
[def __mean [link math.dist.mean mean]]
[def __variance [link math.dist.variance variance]]
[def __sd [link math.dist.sd standard deviation]]
[def __hazard [link math.dist.hazard Hazard Function]]
[def __chf [link math.dist.chf Cumulative Hazard Function]]

[/ def names end in distrib to avoid clashes]
[def __binomial_distrib [link math_toolkit.dist.dist_ref.dists.binomial_dist Binomial Distribution]]
[def __chi_squared_distrib [link math_toolkit.dist.dist_ref.dists.chi_squared_dist Chi Squared Distribution]]
[def __normal_distrib [link math_toolkit.dist.dist_ref.dists.normal_dist Normal Distribution]]
[def __F_distrib [link math_toolkit.dist.dist_ref.dists.f_dist Fisher F Distribution]]
[def __students_t_distrib [link math_toolkit.dist.dist_ref.dists.students_t_dist Students t Distribution]]

[def __usual_accessors __cdf, __pdf, __quantile, __hazard,
   __chf, __mean, __variance and __sd]

[section:nmp Non-Member Properties]

Properties that are common to all distributions are accessed via non-member
getter functions.  This allows more of these functions to be added over time
as the need arises.  Unfortunately the literature uses many different and
confusing names to refer to a rather small number of actual concepts; refer
to the [link concept_index concept index] to find the property you
want by the name you are most familiar with.
Alternatively, use the [link function_index function index]
to go straight to the function you want if you already know its name.

[h4 [#function_index]Function Index]

* [link math.dist.cdf cdf].
* [link math.dist.ccdf cdf complement].
* [link math.dist.chf chf].
* [link math.dist.hazard hazard].
* __mean.
* [link math.dist.pdf pdf].
* [link math.dist.quantile quantile].
* [link math.dist.quantile_c quantile from the complement].
* [link math.dist.sd standard_deviation].
* __variance.

[h4 [#concept_index]Conceptual Index]

* __ccdf.
* __cdf.
* __chf.
* [link survival_inv Inverse Survival Function].
* __hazard.
* [link lower_critical Lower Critical Value].
* __mean.
* [link cdfPQ P].
* [link percent Percent Point Function].
* __pdf.
* [link pmf Probability Mass Function].
* [link cdfPQ Q].
* __quantile.
* [link math.dist.quantile_c Quantile from the complement of the probability].
* __sd.
* [link survival Survival Function].
* [link upper_critical Upper Critical Value].
* __variance.

[h4 [#math.dist.cdf]Cumulative Distribution Function]

   template <class RealType>
   RealType cdf(const ``['Distribution-Type]``<RealType>& dist, const RealType& x);

The __cdf is the probability that
the variable takes a value less than or equal to /x/.  It is equivalent
to the integral from -infinity to /x/ of the __pdf.
For example, the following graph shows the cdf for the
normal distribution:

[$../graphs/cdf.png]

[h4 [#math.dist.ccdf]Complement of the Cumulative Distribution Function]

   template <class Distribution, class RealType>
   RealType cdf(const ``['Unspecified-Complement-Type]``<Distribution, RealType>& comp);

The complement of the __cdf
is the probability that
the variable takes a value greater than x.  It is equivalent
to the integral from x to infinity of the __pdf, or 1 minus the __cdf of x.

This is also known as the survival function.

In this library, it is obtained by wrapping the arguments to the `cdf`
function in a call to `complement`, for example:

   // standard normal distribution object:
   boost::math::normal norm;
   // print survival function for x=2.0:
   std::cout << cdf(complement(norm, 2.0)) << std::endl;

The following graph shows the complement of the cdf for the
normal distribution:

[$../graphs/survival.png]
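
One reason to ask for the complement directly, rather than computing 1 minus the cdf,
is accuracy in the far tail: once the cdf rounds to 1, the subtraction returns 0.
The following is a minimal sketch of that effect using only the standard library.
The names `std_normal_cdf` and `std_normal_cdf_complement` are illustrative and are
not part of this library, whose own implementation may differ.

```cpp
#include <cmath>

// Standard normal cdf, P(X <= x), written in terms of the
// complementary error function from <cmath>.
double std_normal_cdf(double x)
{
   return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Complement of the cdf, P(X > x), evaluated directly
// rather than as 1 - cdf(x).
double std_normal_cdf_complement(double x)
{
   return 0.5 * std::erfc(x / std::sqrt(2.0));
}
```

With these definitions `1.0 - std_normal_cdf(10.0)` evaluates to zero in double
precision, while `std_normal_cdf_complement(10.0)` is a small positive number
of roughly 7.6e-24.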

[h4 [#math.dist.hazard]Hazard Function]

   template <class RealType>
   RealType hazard(const ``['Distribution-Type]``<RealType>& dist, const RealType& x);

Returns the __hazard of /x/ and distribution /dist/.

[$../equations/hazard.png]

[caution
Some authors refer to this as the conditional failure
density function rather than the hazard function.]

[h4 [#math.dist.chf]Cumulative Hazard Function]

   template <class RealType>
   RealType chf(const ``['Distribution-Type]``<RealType>& dist, const RealType& x);

Returns the __chf of /x/ and distribution /dist/.

[$../equations/chf.png]

[caution
Some authors refer to this as simply the "Hazard Function".]
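
Assuming the usual definitions, in which the hazard function is
`pdf(x) / (1 - cdf(x))` and the cumulative hazard function is `-ln(1 - cdf(x))`,
both can be sketched generically in terms of the pdf and the complement of the cdf.
The type `exponential_sketch` and the functions `cdf_complement`, `hazard_sketch`
and `chf_sketch` below are hypothetical names chosen for illustration; this is not
the library's implementation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for a distribution type: an exponential
// distribution with rate lambda.
struct exponential_sketch
{
   double lambda;
};

// Probability density of the exponential distribution.
double pdf(const exponential_sketch& d, double x)
{
   return d.lambda * std::exp(-d.lambda * x);
}

// Complement of the cdf, P(X > x).
double cdf_complement(const exponential_sketch& d, double x)
{
   return std::exp(-d.lambda * x);
}

// Hazard function: pdf(x) / (1 - cdf(x)).
template <class Dist>
double hazard_sketch(const Dist& d, double x)
{
   double q = cdf_complement(d, x);
   assert(q > 0); // undefined once no probability remains above x
   return pdf(d, x) / q;
}

// Cumulative hazard function: -ln(1 - cdf(x)).
template <class Dist>
double chf_sketch(const Dist& d, double x)
{
   return -std::log(cdf_complement(d, x));
}
```

For an exponential distribution with rate 2 the hazard is the constant 2, and the
cumulative hazard at 1.5 is 3, which makes a convenient check of both functions.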

[h4 [#math.dist.mean]mean]

   template<class RealType>
   RealType mean(const ``['Distribution-Type]``<RealType>& dist);

Returns the mean of the distribution /dist/.

[h4 [#math.dist.pdf]Probability Density Function]

   template <class RealType>
   RealType pdf(const ``['Distribution-Type]``<RealType>& dist, const RealType& x);

For a continuous distribution, the probability density function (pdf) returns
the probability density at /x/, which is not itself a probability.
For continuous distributions the probability of any single point is zero,
so probabilities are obtained by integrating the pdf between two points:
see the __cdf.

For a discrete distribution, the pdf is the probability that the
variate takes the value x.

For example, for a standard normal distribution the pdf looks like this:

[$../graphs/pdf.png]
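
The relationship between the pdf and the cdf for a continuous distribution can be
checked numerically: integrating the pdf between two points should reproduce the
difference of the cdf at those points. The sketch below does this for the standard
normal distribution using only the standard library. The function names, and the
choice of Simpson's rule, are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>

// Standard normal probability density.
double std_normal_pdf(double x)
{
   const double pi = 3.14159265358979323846;
   return std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi);
}

// Standard normal cdf via the complementary error function.
double std_normal_cdf(double x)
{
   return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// P(a < X <= b) obtained by composite Simpson integration of the pdf.
double probability_between(double a, double b, int intervals = 1000)
{
   assert(intervals > 0 && intervals % 2 == 0); // Simpson needs an even count
   const double h = (b - a) / intervals;
   double sum = std_normal_pdf(a) + std_normal_pdf(b);
   for (int i = 1; i < intervals; ++i)
      sum += std_normal_pdf(a + i * h) * (i % 2 ? 4.0 : 2.0);
   return sum * h / 3.0;
}
```

For example, `probability_between(-1.0, 1.0)` agrees closely with
`std_normal_cdf(1.0) - std_normal_cdf(-1.0)`, the familiar figure of about 68%.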

[h4 [#math.dist.quantile]quantile]

   template <class RealType>
   RealType quantile(const ``['Distribution-Type]``<RealType>& dist, const RealType& p);

The quantile is best viewed as the inverse of the __cdf: it returns
a value /x/ such that `cdf(dist, x) == p`.

This is also known as the /percent point function/, or a /percentile/.

The following graph shows the quantile function for a standard normal
distribution:

[$../graphs/quantile.png]
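
To make the inverse relationship concrete, the following sketch recovers the quantile
of the standard normal distribution by bisecting on its cdf. It uses only the
standard library, and the name `std_normal_quantile_sketch` is illustrative.
Bisection is chosen here purely for clarity; it should not be read as the method
this library uses.

```cpp
#include <cassert>
#include <cmath>

// Standard normal cdf via the complementary error function.
double std_normal_cdf(double x)
{
   return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Quantile by bisection: narrow an interval around the x at which
// cdf(x) crosses p.
double std_normal_quantile_sketch(double p)
{
   assert(p > 0 && p < 1);
   double lo = -40, hi = 40; // the cdf is 0 and 1 to double precision here
   for (int i = 0; i < 200; ++i)
   {
      double mid = 0.5 * (lo + hi);
      if (std_normal_cdf(mid) < p)
         lo = mid;   // the quantile lies above mid
      else
         hi = mid;   // the quantile lies at or below mid
   }
   return 0.5 * (lo + hi);
}
```

Feeding the result back into `std_normal_cdf` should return the original
probability to within rounding error.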

[h4 [#math.dist.quantile_c]Quantile from the complement of the probability]

   template <class Distribution, class RealType>
   RealType quantile(const ``['Unspecified-Complement-Type]``<Distribution, RealType>& comp);

This is the inverse of the __ccdf.  It is calculated by wrapping
the arguments to the `quantile` function in a call to
`complement`.  For example:

   // define a standard normal distribution:
   boost::math::normal norm;
   // print the value of x for which the complement
   // of the probability is 0.05:
   std::cout << quantile(complement(norm, 0.05)) << std::endl;

The function computes a value /x/ such that
`cdf(complement(dist, x)) == q` where /q/ is the complement of the
probability.

This function is also called the inverse survival function.

The following graph shows the inverse survival function for the normal
distribution:

[$../graphs/survival_inv.png]

[h4 [#math.dist.sd]Standard Deviation]

   template <class RealType>
   RealType standard_deviation(const ``['Distribution-Type]``<RealType>& dist);

Returns the standard deviation of distribution /dist/.

[h4 [#math.dist.variance]variance]

   template <class RealType>
   RealType variance(const ``['Distribution-Type]``<RealType>& dist);

Returns the variance of the distribution /dist/.

[h4 [#cdfPQ]P and Q]

The terms P and Q are sometimes used to refer to the __cdf
and its [link math.dist.ccdf complement] respectively.
Lowercase p and q are sometimes used to refer to the values returned
by these functions.

[h4 [#percent]Percent Point Function]

The percent point function, also known as the percentile, is the same as
the __quantile.

[h4 [#survival_inv]Inverse Survival Function]

The inverse of the survival function is the same as computing the
[link math.dist.quantile_c quantile
from the complement of the probability].

[h4 [#pmf]Probability Mass Function]

The Probability Mass Function is the same as the __pdf.

The term Mass Function is usually applied to discrete distributions,
while the term __pdf applies to continuous distributions.

[h4 [#lower_critical]Lower Critical Value]

The lower critical value calculates the value of the random variable
given the area under the left tail of the distribution.
It is equivalent to calculating the __quantile.

[h4 [#upper_critical]Upper Critical Value]

The upper critical value calculates the value of the random variable
given the area under the right tail of the distribution.  It is equivalent to
calculating the [link math.dist.quantile_c quantile from the complement of the
probability].

[h4 [#survival]Survival Function]

Refer to the __ccdf.

[endsect][/section:nmp Non-Member Properties]

[section:dists Distributions]

[section:binomial_dist Binomial]

``#include <boost/math/distributions/binomial.hpp>``

   namespace boost{ namespace math{

   template <class RealType>
   class binomial_distribution;

   typedef binomial_distribution<double> binomial;

   template <class RealType>
   class binomial_distribution
   {
   public:
      // construct:
      binomial_distribution(RealType n, RealType p);

      // parameter access:
      RealType success_fraction() const;
      RealType trials() const;

      // Bounds on success fraction:
      static RealType estimate_lower_bound_on_p(
         RealType trials,
         RealType successes,
         RealType alpha);
      static RealType estimate_upper_bound_on_p(
         RealType trials,
         RealType successes,
         RealType alpha);

      // estimate min/max number of trials:
      static RealType estimate_number_of_trials(
         RealType k,     // number of events
         RealType p,     // success fraction
         RealType alpha); // probability threshold

      template <class P1, class P2, class P3>
      static RealType estimate_number_of_trials(
         const ``['unspecified-complemented-type]``<P1, P2, P3>& c);
   };

   }} // namespaces

The class type `binomial_distribution` represents a binomial distribution:
it is used when there are exactly two mutually
exclusive outcomes of a trial. These outcomes are labelled
"success" and "failure". The binomial distribution is used to obtain
the probability of observing x successes in N trials, with the
probability of success on a single trial denoted by p. The
binomial distribution assumes that p is fixed for all trials.

[h4 Member Functions]

   binomial_distribution(RealType n, RealType p);

Constructor: /n/ is the total number of trials, /p/ is the
probability of success of a single trial.

   RealType success_fraction() const;

Returns the parameter /p/ from which this distribution was constructed.

   RealType trials() const;

Returns the parameter /n/ from which this distribution was constructed.

   static RealType estimate_lower_bound_on_p(
      RealType trials,
      RealType successes,
      RealType alpha);

Returns a lower bound on the success fraction:

[variablelist
[[trials][The total number of trials conducted.]]
[[successes][The number of successes that occurred.]]
[[alpha][The largest acceptable probability that the true value of
         the success fraction is [*less than] the value returned.]]
]

For example, if you observe /k/ successes from /n/ trials the
best estimate for the success fraction is simply ['k/n], but if you
want to be 95% sure that the true value is [*greater than] some value,
['p[sub min]], then:

   p``[sub min]`` = binomial_distribution<RealType>::estimate_lower_bound_on_p(
                       n, k, 0.05);

[link binom_conf See worked example.]

   static RealType estimate_upper_bound_on_p(
      RealType trials,
      RealType successes,
      RealType alpha);

Returns an upper bound on the success fraction:

[variablelist
[[trials][The total number of trials conducted.]]
[[successes][The number of successes that occurred.]]
[[alpha][The largest acceptable probability that the true value of
         the success fraction is [*greater than] the value returned.]]
]

For example, if you observe /k/ successes from /n/ trials the
best estimate for the success fraction is simply ['k/n], but if you
want to be 95% sure that the true value is [*less than] some value,
['p[sub max]], then:

   p``[sub max]`` = binomial_distribution<RealType>::estimate_upper_bound_on_p(
                       n, k, 0.05);

[link binom_conf See worked example.]
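
The reference above does not say how these bounds are computed. One conventional
choice is the Clopper-Pearson construction, in which the lower bound is the value of
/p/ at which the probability of seeing /k/ or more successes equals /alpha/, and the
upper bound is the value of /p/ at which the probability of seeing /k/ or fewer
successes equals /alpha/. The sketch below assumes that construction and solves for
the bounds by bisection using only the standard library. The names `binomial_cdf`,
`lower_bound_on_p_sketch` and `upper_bound_on_p_sketch` are illustrative, and
integer counts are assumed.

```cpp
#include <cassert>
#include <cmath>

// P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, by direct summation
// of the mass function (computed in logs to avoid overflow).
double binomial_cdf(int n, double p, int k)
{
   double sum = 0;
   for (int i = 0; i <= k; ++i)
   {
      double log_choose = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                          - std::lgamma(n - i + 1.0);
      sum += std::exp(log_choose + i * std::log(p) + (n - i) * std::log1p(-p));
   }
   return sum;
}

// Value of p at which P(X >= k) == alpha (requires k > 0).
double lower_bound_on_p_sketch(int n, int k, double alpha)
{
   assert(k > 0 && k <= n && alpha > 0 && alpha < 1);
   double lo = 0, hi = 1;
   for (int i = 0; i < 100; ++i)
   {
      double mid = 0.5 * (lo + hi);
      // P(X >= k) = 1 - P(X <= k - 1) grows as p grows.
      if (1 - binomial_cdf(n, mid, k - 1) < alpha)
         lo = mid;
      else
         hi = mid;
   }
   return 0.5 * (lo + hi);
}

// Value of p at which P(X <= k) == alpha (requires k < n).
double upper_bound_on_p_sketch(int n, int k, double alpha)
{
   assert(k >= 0 && k < n && alpha > 0 && alpha < 1);
   double lo = 0, hi = 1;
   for (int i = 0; i < 100; ++i)
   {
      double mid = 0.5 * (lo + hi);
      // P(X <= k) shrinks as p grows.
      if (binomial_cdf(n, mid, k) > alpha)
         lo = mid;
      else
         hi = mid;
   }
   return 0.5 * (lo + hi);
}
```

For 4 successes in 20 trials, the two bounds computed this way bracket the point
estimate of 0.2, as one would expect.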

   static RealType estimate_number_of_trials(
      RealType k,     // number of events
      RealType p,     // success fraction
      RealType alpha); // probability threshold

   template <class P1, class P2, class P3>
   static RealType estimate_number_of_trials(
      const ``['unspecified-complemented-type]``<P1, P2, P3>& c);

These functions estimate the number of trials required to achieve a certain
probability that [*k events or fewer will be observed].

[variablelist
[[k][The number of successes observed.]]
[[p][The probability of success for each trial.]]
[[alpha][The maximum acceptable probability that k events or fewer will be observed.]]
]

For example:

   binomial_distribution<RealType>::estimate_number_of_trials(10, 0.5, 0.05);

Returns the smallest number of trials we must conduct to be 95% sure
of seeing 10 events that occur with frequency one half.

While:

   binomial_distribution<RealType>::estimate_number_of_trials(
      complement(0, 1.0/1000000, 0.05));

Returns the largest number of trials we can conduct and still be 95% certain
of not observing any events that occur with one in a million frequency.
This is typically used in failure analysis.

[link binom_size_eg See Worked Example.]
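
Reading /alpha/ as defined in the list above, the first form can be pictured as a
search for the smallest number of trials at which the probability of seeing /k/
events or fewer has dropped to /alpha/ or below, and the complemented form as the
largest number of trials at which that probability is still at least 1 minus /alpha/.
The sketch below makes that reading concrete with a brute-force search over integer
trial counts, using only the standard library. The function names and the integer
search are assumptions for illustration; the library functions return a `RealType`
and may well solve for the answer more directly.

```cpp
#include <cassert>
#include <cmath>

// P(X <= k) for X ~ Binomial(n, p) with 0 < p < 1, by direct summation
// of the mass function (computed in logs to avoid overflow).
double binomial_cdf(int n, double p, int k)
{
   double sum = 0;
   for (int i = 0; i <= k; ++i)
   {
      double log_choose = std::lgamma(n + 1.0) - std::lgamma(i + 1.0)
                          - std::lgamma(n - i + 1.0);
      sum += std::exp(log_choose + i * std::log(p) + (n - i) * std::log1p(-p));
   }
   return sum;
}

// Smallest n for which P(X <= k) <= alpha.
int min_trials_sketch(int k, double p, double alpha)
{
   assert(k >= 0 && p > 0 && p < 1 && alpha > 0 && alpha < 1);
   int n = k + 1; // with k trials or fewer, P(X <= k) is 1
   while (binomial_cdf(n, p, k) > alpha)
      ++n;
   return n;
}

// Largest n for which P(X <= k) >= 1 - alpha.
int max_trials_sketch(int k, double p, double alpha)
{
   assert(k >= 0 && p > 0 && p < 1 && alpha > 0 && alpha < 1);
   int n = k; // with k trials, P(X <= k) is 1
   while (binomial_cdf(n + 1, p, k) >= 1 - alpha)
      ++n;
   return n;
}
```

Under these assumptions `min_trials_sketch(10, 0.5, 0.05)` returns 30 and
`max_trials_sketch(0, 1.0/1000000, 0.05)` returns 51293.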

[h4 Non-member Accessors]

All the [link math_toolkit.dist.dist_ref.nmp usual non-member accessor functions]
that are generic to all distributions are supported: __usual_accessors.

However, it's worth taking a moment to define what these actually mean in
the context of this distribution:

[table Meaning of the non-member accessors
[[Function][Example Code][Meaning]]
[[__pdf][``pdf(binomial(n, p), k)``]
   [The probability of obtaining [*exactly k successes] from n trials
   with success fraction p.]]
[[__cdf][``cdf(binomial(n, p), k)``]
   [The probability of obtaining [*k successes or fewer] from n trials
   with success fraction p.]]
[[__ccdf][``cdf(complement(binomial(n, p), k))``]
   [The probability of obtaining [*more than k successes] from n trials
   with success fraction p.]]
[[__quantile][``quantile(binomial(n, p), P)``]
   [The [*greatest] number of successes that may be observed from n trials
   with success fraction p, at probability P.  Note that the value returned
   is a real number, and not an integer.  Depending on the use case you may
   want to take either the floor or ceiling of the result.]]
[[__quantile_c][``quantile(complement(binomial(n, p), P))``]
   [The [*smallest] number of successes that may be observed from n trials
   with success fraction p, at probability P.  Note that the value returned
   is a real number, and not an integer.  Depending on the use case you may
   want to take either the floor or ceiling of the result.]]
]

[endsect][/section:binomial_dist Binomial]

[section:chi_squared_dist Chi Squared]

The chi-square distribution results when /v/ independent variables with
standard normal distributions are squared and summed.
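
This construction can be illustrated by simulation. The sketch below draws
chi-squared variates by squaring and summing standard normal variates from the
standard library's `<random>` header, then averages them; since the mean of a
chi-squared distribution with /v/ degrees of freedom is /v/, the average should
land close to /v/. The function names are illustrative and nothing here uses or
reflects this library's implementation.

```cpp
#include <cassert>
#include <random>

// Draw one chi-squared variate with v degrees of freedom by squaring
// and summing v independent standard normal variates.
template <class Engine>
double chi_squared_sample(int v, Engine& eng)
{
   assert(v > 0);
   std::normal_distribution<double> std_normal(0.0, 1.0);
   double sum = 0;
   for (int i = 0; i < v; ++i)
   {
      double z = std_normal(eng);
      sum += z * z;
   }
   return sum;
}

// Average of many such variates, using a seeded Mersenne Twister.
double chi_squared_sample_mean(int v, int samples, unsigned seed)
{
   assert(samples > 0);
   std::mt19937 eng(seed);
   double total = 0;
   for (int i = 0; i < samples; ++i)
      total += chi_squared_sample(v, eng);
   return total / samples;
}
```

For example, `chi_squared_sample_mean(3, 200000, 42u)` should be close to 3, with
the exact figure depending on the standard library's random number implementation.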

``#include <boost/math/distributions/chi_squared.hpp>``

   namespace boost{ namespace math{

   template <class RealType>
   class chi_squared_distribution;

   typedef chi_squared_distribution<double> chi_squared;

   template <class RealType>
   class chi_squared_distribution
   {
   public:
      typedef RealType value_type;

      // Construct:
      chi_squared_distribution(RealType v);

      // Access parameter:
      RealType degrees_of_freedom()const;

      // Parameter estimation:
      static RealType estimate_degrees_of_freedom(
         RealType difference_from_mean,
         RealType alpha,
         RealType beta,
         RealType sd,
         RealType hint = 100);
   };

   }} // namespaces

[h4 Member Functions]

      chi_squared_distribution(RealType v);

Constructs a Chi Squared distribution with /v/ degrees of freedom.

      RealType degrees_of_freedom()const;

Returns the parameter /v/ from which this object was constructed.

      static RealType estimate_degrees_of_freedom(
         RealType difference_from_mean,
         RealType alpha,
         RealType beta,
         RealType sd,
         RealType hint = 100);

Under construction.

[h4 Non-member Accessors]

All the [link math_toolkit.dist.dist_ref.nmp usual non-member accessor functions]
that are generic to all distributions are supported: __usual_accessors.

[endsect][/section:chi_squared_dist Chi Squared]

[section:f_dist F distribution]

The F distribution is the distribution of the ratio of two independent
chi-squared variates with degrees of freedom /df1/ and /df2/, respectively,
where each variate has first been divided by its degrees of freedom.

``#include <boost/math/distributions/fisher_f.hpp>``

   namespace boost{ namespace math{

   template <class RealType>
   class fisher_f_distribution;

   typedef fisher_f_distribution<double> fisher_f;

   template <class RealType>
   class fisher_f_distribution
   {
   public:
      typedef RealType value_type;

      // Construct:
      fisher_f_distribution(const RealType& df1, const RealType& df2);

      // Accessors:
      RealType degrees_of_freedom1()const;
      RealType degrees_of_freedom2()const;
   };

   }} //namespaces

[h4 Member Functions]

      fisher_f_distribution(const RealType& df1, const RealType& df2);

Constructs an F-distribution with numerator degrees of freedom /df1/
and denominator degrees of freedom /df2/.

      RealType degrees_of_freedom1()const;

Returns the numerator degrees of freedom parameter of the distribution.

      RealType degrees_of_freedom2()const;

Returns the denominator degrees of freedom parameter of the distribution.

[h4 Non-member Accessors]

All the [link math_toolkit.dist.dist_ref.nmp usual non-member accessor functions]
that are generic to all distributions are supported: __usual_accessors.

[endsect][/section:f_dist F distribution]

[section:normal_dist Normal]

The normal distribution is probably the best known statistical
distribution: it is also known as the Gaussian Distribution.
A normal distribution with mean zero and standard deviation one
is known as the ['Standard Normal Distribution].

``#include <boost/math/distributions/normal.hpp>``

   namespace boost{ namespace math{

   template <class RealType>
   class normal_distribution;

   typedef normal_distribution<double> normal;

   template <class RealType>
   class normal_distribution
   {
   public:
      typedef RealType value_type;
      // Construct:
      normal_distribution(RealType mean = 0, RealType sd = 1);
      // Accessors:
      RealType mean()const;
      RealType standard_deviation()const;
   };

   }} // namespaces

[h4 Member Functions]

   normal_distribution(RealType mean = 0, RealType sd = 1);

Constructs a normal distribution with mean /mean/ and
standard deviation /sd/.

   RealType mean()const;

Returns the /mean/ of this distribution.

   RealType standard_deviation()const;

Returns the /standard deviation/ of this distribution.

[h4 Non-member Accessors]

All the [link math_toolkit.dist.dist_ref.nmp usual non-member accessor functions] that are generic to all
distributions are supported: __usual_accessors.

[endsect][/section:normal_dist Normal]

[section:students_t_dist Students t]

A statistical distribution published by William Gosset in 1908.
His employer, Guinness Breweries, required him to publish under a
pseudonym, so he chose "Student". Given N independent measurements, let

[$../equations/students_t_dist.png]

where /M/ is the population mean,[' ''' &#x3BC; '''] is the sample mean, and /s/ is the
sample variance.

Student's t-distribution is defined as the distribution of the random
variable /t/, which is (very loosely) the "best" that we can do when the
true standard deviation of the sample is not known.

The Student's t-distribution takes a single parameter: the number of
degrees of freedom of the sample.  When the number of degrees of freedom is
/one/, this distribution is the same as the Cauchy distribution.
As the number of degrees of freedom tends towards infinity, this
distribution approaches the normal distribution.
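
The two limiting cases can be checked numerically from the standard formula for the
Student's t density. The sketch below uses only the standard library;
`students_t_pdf`, `cauchy_pdf`, `std_normal_pdf` and `pi_sketch` are illustrative
names rather than anything provided by this library.

```cpp
#include <cassert>
#include <cmath>

const double pi_sketch = 3.14159265358979323846;

// Density of Student's t with v degrees of freedom, evaluated in logs
// so that large v does not overflow the gamma functions.
double students_t_pdf(double t, double v)
{
   assert(v > 0);
   double log_norm = std::lgamma(0.5 * (v + 1)) - std::lgamma(0.5 * v)
                     - 0.5 * std::log(v * pi_sketch);
   return std::exp(log_norm - 0.5 * (v + 1) * std::log1p(t * t / v));
}

// Density of the standard Cauchy distribution.
double cauchy_pdf(double x)
{
   return 1.0 / (pi_sketch * (1.0 + x * x));
}

// Density of the standard normal distribution.
double std_normal_pdf(double x)
{
   return std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi_sketch);
}
```

At one degree of freedom `students_t_pdf(x, 1)` agrees with `cauchy_pdf(x)` to
rounding error, and at ten thousand degrees of freedom it is within about 1e-4 of
`std_normal_pdf(x)`.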

``#include <boost/math/distributions/students_t.hpp>``

   namespace boost{ namespace math{

   template <class RealType>
   class students_t_distribution;

   typedef students_t_distribution<double> students_t;

   template <class RealType>
   class students_t_distribution
   {
   public:
      typedef RealType value_type;

      // Construct:
      students_t_distribution(const RealType& v);

      // Accessor:
      RealType degrees_of_freedom()const;

      // degrees of freedom estimation:
      static RealType estimate_degrees_of_freedom(
         RealType difference_from_mean,
         RealType alpha,
         RealType beta,
         RealType sd,
         RealType hint = 100);
   };

   }} // namespaces

[h4 Member Functions]

   students_t_distribution(const RealType& v);

Constructs a Student's t-distribution with /v/ degrees of freedom.

   RealType degrees_of_freedom()const;

Returns the number of degrees of freedom of this distribution.

   static RealType estimate_degrees_of_freedom(
      RealType difference_from_mean,
      RealType alpha,
      RealType beta,
      RealType sd,
      RealType hint = 100);

Returns the number of degrees of freedom required to observe a significant
result when the mean differs from the "true" mean by /difference_from_mean/.

[variablelist
[[difference_from_mean][The difference between the true mean and the sample mean
                        that we wish to show is significant.]]
[[alpha][The maximum acceptable probability of rejecting the null hypothesis
        when it is in fact true.]]
[[beta][The maximum acceptable probability of accepting the null hypothesis
        when it is in fact false.]]
[[sd][The sample standard deviation.]]
[[hint][A hint for the location to start looking for the result.]]
]

[note
Remember that for a two-sided test, you must divide alpha by two
before calling this function.]

For more information on this function see the
[@http://www.itl.nist.gov/div898/handbook/prc/section2/prc222.htm
NIST Engineering Statistics Handbook].

[h4 Non-member Accessors]

All the [link math_toolkit.dist.dist_ref.nmp usual non-member accessor functions] that are generic to all
distributions are supported: __usual_accessors.

[endsect][/section:students_t_dist Students t]

[endsect][/section:dists Distributions]

[endsect][/section:dist_ref Statistical Distributions and Functions Reference]

[section:future Extras/Future Directions]

I'm not anticipating any of the following being present in the initial
release: we've got enough to do figuring out the math!

[h4 Adding Additional Location and Scale Parameters]

In some modelling applications we require a distribution with a specific
location and scale:
often this equates to a specific mean and standard deviation, although for many
distributions the relationship between these properties and the location and
scale parameters is non-trivial.
See [@http://www.itl.nist.gov/div898/handbook/eda/section3/eda364.htm http://www.itl.nist.gov/div898/handbook/eda/section3/eda364.htm] for more
information.

The obvious way to handle this is via an adapter template:

	template <class Dist>
	class scaled_distribution
	{
	public:
	   scaled_distribution(
	     const Dist& dist,
	     typename Dist::value_type location,
	     typename Dist::value_type scale = 1);
	};

This adapter would then have its own set of overloads for the non-member accessor functions.
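
As a rough sketch of how such an adapter and its overloads might look, the following
wraps a toy base distribution and forwards `cdf` and `pdf` after standardising the
argument. Everything here is hypothetical: `std_normal_sketch` stands in for a real
distribution type, the member names `base`, `location` and `scale` are invented for
the example, and the default scale of 1 is an assumption made because a scale of
zero would be degenerate.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical base distribution with non-member accessors.
struct std_normal_sketch
{
   typedef double value_type;
};

double pdf(const std_normal_sketch&, double x)
{
   const double pi = 3.14159265358979323846;
   return std::exp(-0.5 * x * x) / std::sqrt(2.0 * pi);
}

double cdf(const std_normal_sketch&, double x)
{
   return 0.5 * std::erfc(-x / std::sqrt(2.0));
}

// Adapter that shifts a distribution by location and stretches it by scale.
template <class Dist>
class scaled_distribution
{
public:
   typedef typename Dist::value_type value_type;

   scaled_distribution(const Dist& dist, value_type location, value_type scale = 1)
      : m_dist(dist), m_location(location), m_scale(scale)
   {
      assert(scale > 0);
   }
   const Dist& base() const { return m_dist; }
   value_type location() const { return m_location; }
   value_type scale() const { return m_scale; }

private:
   Dist m_dist;
   value_type m_location;
   value_type m_scale;
};

// Accessor overloads: standardise x, then defer to the wrapped distribution.
template <class Dist>
typename Dist::value_type cdf(const scaled_distribution<Dist>& d,
                              typename Dist::value_type x)
{
   return cdf(d.base(), (x - d.location()) / d.scale());
}

template <class Dist>
typename Dist::value_type pdf(const scaled_distribution<Dist>& d,
                              typename Dist::value_type x)
{
   // The density is divided by the scale so that it still integrates to 1.
   return pdf(d.base(), (x - d.location()) / d.scale()) / d.scale();
}
```

With location 3 and scale 2, the adapted `cdf` at 3 is one half, and at 5 it matches
the base distribution's `cdf` at 1.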

[h4 Higher Level Hypothesis Tests]

Higher-level tests roughly corresponding to the
[@http://documents.wolfram.com/mathematica/Add-onsLinks/StandardPackages/Statistics/HypothesisTests.html Mathematica Hypothesis Tests]
package could be added reasonably easily, for example:

	template <class InputIterator>
	typename std::iterator_traits<InputIterator>::value_type
	   test_equal_mean(
	     InputIterator a,
	     InputIterator b,
	     typename std::iterator_traits<InputIterator>::value_type expected_mean);

Returns the probability that the data in the sequence [a,b) has the mean
/expected_mean/.

[h4 Integration With Statistical Accumulators]

[@http://boost-sandbox.sourceforge.net/libs/accumulators/doc/html/index.html
Eric Niebler's accumulator framework] (also a work in progress) provides the means
to calculate various statistical properties from experimental data.  There is an
opportunity to integrate the statistical tests with this framework at some later date:

	// define an accumulator, all required statistics to calculate the test
	// are calculated automatically:
	accumulator_set<double, features<tag::test_expected_mean> > acc(expected_mean=4);
	// pass our data to the accumulator:
	acc = std::for_each(mydata.begin(), mydata.end(), acc);
	// extract the result:
	double p = probability(acc);

[endsect][/section:future Extras Future Directions]

[/ dist_reference.qbk
  Copyright 2006 John Maddock and Paul A. Bristow.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]