mirror of
https://github.com/boostorg/math.git
synced 2026-01-26 18:52:10 +00:00
* Kolmogorov-Smirnov distribution #421 Add a new distribution, kolmogorov_smirnov_distribution, which takes a parameter that represents the number of observations used in a Kolmogorov-Smirnov test. (The K-S test is a popular test for comparing two CDFs, but the test statistic is not implemented here.) This implementation includes Kolmogorov's original 1st order Taylor expansion. There is a literature on the distribution's other mathematical properties (higher order terms and exact version); this literature is summarized in the main header file for anyone who may want to expand the implementation later. The CDF is implemented using a Jacobi theta function, and the PDF is a hand-rolled derivative of that function. Quantiles plug the CDF and PDF into a Newton-Raphson iteration. The mean and variance have nice closed-form expressions, and the mode uses a dumb run-time maximizer. This commit includes graphs, a ULP plotter for the PDF, and the usual compilation and numerical tests. The test file is on the small side, but it integrates the distribution from zero to infinity, and covers the quantiles pretty well. As of now the numerical tests only verify self-consistency (e.g. distribution moments and CDF-quantile relations), so there's room to add some external checks. * Implement skewness for K-S distribution [CI SKIP] The third moment integrates nicely with the help of Apery's constant (zeta_three). Verify the result via quadrature. * Implement kurtosis for the K-S distribution Verify the result via quadrature.
142 lines
4.5 KiB
Plaintext
142 lines
4.5 KiB
Plaintext
[section:dist_ref Statistical Distributions Reference]
|
|
|
|
[include non_members.qbk]
|
|
|
|
[section:dists Distributions]
|
|
|
|
[include arcsine.qbk]
|
|
[include bernoulli.qbk]
|
|
[include beta.qbk]
|
|
[include binomial.qbk]
|
|
[include cauchy.qbk]
|
|
[include chi_squared.qbk]
|
|
[include empirical_cdf.qbk]
|
|
[include exponential.qbk]
|
|
[include extreme_value.qbk]
|
|
[include fisher.qbk]
|
|
[include gamma.qbk]
|
|
[include geometric.qbk]
|
|
[include hyperexponential.qbk]
|
|
[include hypergeometric.qbk]
|
|
[include inverse_chi_squared.qbk]
|
|
[include inverse_gamma.qbk]
|
|
[include inverse_gaussian.qbk]
|
|
[include kolmogorov_smirnov.qbk]
|
|
[include laplace.qbk]
|
|
[include logistic.qbk]
|
|
[include lognormal.qbk]
|
|
[include negative_binomial.qbk]
|
|
[include nc_beta.qbk]
|
|
[include nc_chi_squared.qbk]
|
|
[include nc_f.qbk]
|
|
[include nc_t.qbk]
|
|
[include normal.qbk]
|
|
[include pareto.qbk]
|
|
[include poisson.qbk]
|
|
[include rayleigh.qbk]
|
|
[include skew_normal.qbk]
|
|
[include students_t.qbk]
|
|
[include triangular.qbk]
|
|
[include uniform.qbk]
|
|
[include weibull.qbk]
|
|
[endsect] [/section:dists Distributions]
|
|
|
|
[include dist_algorithms.qbk]
|
|
|
|
[endsect] [/section:dist_ref Statistical Distributions and Functions Reference]
|
|
|
|
|
|
[section:future Extras/Future Directions]
|
|
|
|
[h4 Adding Additional Location and Scale Parameters]
|
|
|
|
In some modelling applications we require a distribution
|
|
with a specific location and scale:
|
|
often this equates to a specific mean and standard deviation, although for many
|
|
distributions the relationship between these properties and the location and
|
|
scale parameters are non-trivial. See
|
|
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda364.htm http://www.itl.nist.gov/div898/handbook/eda/section3/eda364.htm]
|
|
for more information.
|
|
|
|
The obvious way to handle this is via an adapter template:
|
|
|
|
template <class Dist>
|
|
class scaled_distribution
|
|
{
|
|
scaled_distribution(
|
|
const Dist dist,
|
|
typename Dist::value_type location,
|
|
typename Dist::value_type scale = 0);
|
|
};
|
|
|
|
Which would then have its own set of overloads for the non-member accessor functions.
|
|
|
|
[h4 An "any_distribution" class]
|
|
|
|
It is easy to add a distribution object that virtualises
|
|
the actual type of the distribution, and can therefore hold "any" object
|
|
that conforms to the conceptual requirements of a distribution:
|
|
|
|
template <class RealType>
|
|
class any_distribution
|
|
{
|
|
public:
|
|
template <class Distribution>
|
|
any_distribution(const Distribution& d);
|
|
};
|
|
|
|
// Get the cdf of the underlying distribution:
|
|
template <class RealType>
|
|
RealType cdf(const any_distribution<RealType>& d, RealType x);
|
|
// etc....
|
|
|
|
Such a class would facilitate the writing of non-template code that can
|
|
function with any distribution type.
|
|
|
|
The [@http://sourceforge.net/projects/distexplorer/ Statistical Distribution Explorer]
|
|
utility for Windows is a usage example.
|
|
|
|
It's not clear yet whether there is a compelling use case though.
|
|
Possibly tests for goodness of fit might
|
|
provide such a use case: this needs more investigation.
|
|
|
|
[h4 Higher Level Hypothesis Tests]
|
|
|
|
Higher-level tests roughly corresponding to the
|
|
[@http://documents.wolfram.com/mathematica/Add-onsLinks/StandardPackages/Statistics/HypothesisTests.html Mathematica Hypothesis Tests]
|
|
package could be added reasonably easily, for example:
|
|
|
|
template <class InputIterator>
|
|
typename std::iterator_traits<InputIterator>::value_type
|
|
test_equal_mean(
|
|
InputIterator a,
|
|
InputIterator b,
|
|
typename std::iterator_traits<InputIterator>::value_type expected_mean);
|
|
|
|
Returns the probability that the data in the sequence \[a,b) has the mean
|
|
/expected_mean/.
|
|
|
|
[h4 Integration With Statistical Accumulators]
|
|
|
|
[@http://boost-sandbox.sourceforge.net/libs/accumulators/doc/html/index.html
|
|
Eric Niebler's accumulator framework] - also work in progress - provides the means
|
|
to calculate various statistical properties from experimental data. There is an
|
|
opportunity to integrate the statistical tests with this framework at some later date:
|
|
|
|
// Define an accumulator, all required statistics to calculate the test
|
|
// are calculated automatically:
|
|
accumulator_set<double, features<tag::test_expected_mean> > acc(expected_mean=4);
|
|
// Pass our data to the accumulator:
|
|
acc = std::for_each(mydata.begin(), mydata.end(), acc);
|
|
// Extract the result:
|
|
double p = probability(acc);
|
|
|
|
[endsect] [/section:future Extras Future Directions]
|
|
|
|
[/ dist_reference.qbk
|
|
Copyright 2006, 2010 John Maddock and Paul A. Bristow.
|
|
Distributed under the Boost Software License, Version 1.0.
|
|
(See accompanying file LICENSE_1_0.txt or copy at
|
|
http://www.boost.org/LICENSE_1_0.txt).
|
|
]
|