[def __alpha '''α''']
[/ def names end in distrib to avoid clashes]
[def __binomial_distrib [link math_toolkit.dist.dist_ref.dists.binomial_dist Binomial Distribution]]
[def __chi_squared_distrib [link math_toolkit.dist.dist_ref.dists.chi_squared_dist Chi Squared Distribution]]
[def __normal_distrib [link math_toolkit.dist.dist_ref.dists.normal_dist Normal Distribution]]
[def __F_distrib [link math_toolkit.dist.dist_ref.dists.f_dist Fisher F Distribution]]
[def __students_t_distrib [link math_toolkit.dist.dist_ref.dists.students_t_dist Students t Distribution]]
[section:stat_tut Statistical Functions Tutorial]
This library is centred around statistical distributions. This tutorial
will give you an overview of what they are and how they can be used, and
provide a few worked examples of applying the library to statistical tests.
[section:overview Overview]
[h4 Headers and Namespaces]
All the code in this library is inside namespace boost::math.
In order to use a distribution /my_distribution/ you will need to include
the header <boost/math/distributions/my_distribution.hpp>.
For example, to use the Students-t distribution include
<boost/math/distributions/students_t.hpp>
[h4 Distributions are Objects]
Each kind of distribution in this library is a class type: this does two
things:
* It encapsulates the kind of distribution in the C++ type system;
so, for example, Students-t distributions are always a different C++ type from
Chi-Squared distributions.
* The distribution objects store any parameters associated with the
distribution: for example, the Students-t distribution has a
['degrees of freedom] parameter that controls the shape of the distribution.
This ['degrees of freedom] parameter has to be provided
to the Students-t object when it is constructed.
Although the distribution classes in this library are templates, there
are typedefs on type /double/ that mostly take the usual name of the
distribution. Probably 95% of uses are covered by these typedefs:
using namespace boost::math;
// Construct a students_t distribution with 4 degrees of freedom:
students_t d1(4);
// Construct a binomial distribution
// with probability of success 0.3
// and 20 trials in total:
binomial d2(20, 0.3);
If you need to use the distributions with a type other than `double`,
then you can instantiate the template directly: the names of the
templates are the same as the `double` typedef but with `_distribution`
appended, for example: __students_t_distrib or __binomial_distrib:
// Construct a students_t distribution, of float type,
// with 4 degrees of freedom:
students_t_distribution<float> d3(4);
// Construct a binomial distribution, of long double type,
// with probability of success 0.3
// and 20 trials in total:
binomial_distribution<long double> d4(20, 0.3);
The parameters passed to the distributions can be accessed via getter member
functions:
d1.degrees_of_freedom(); // returns 4.0
This is all well and good, but not very useful so far. What we often want
is to be able to calculate the /cumulative distribution functions/ and
/quantiles/ etc for these distributions.
[h4 Generic operations common to all distributions are non-member functions]
Want to calculate the PDF (Probability Density Function) of a distribution?
No problem, just use:
pdf(my_dist, x); // Returns PDF (density) at point x of distribution my_dist.
Or how about the CDF (Cumulative Distribution Function):
cdf(my_dist, x); // Returns CDF (integral from -infinity to point x)
// of distribution my_dist.
And quantiles are just the same:
quantile(my_dist, p); // Returns the value of the random variable x
// such that cdf(my_dist, x) == p.
If you're wondering why these aren't member functions, it's to
make the library more easily extensible: if you want to add additional
generic operations - let's say the /n'th moment/ - then all you have to
do is add the appropriate non-member functions, overloaded for each
supported distribution type.
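For example, a hypothetical user-supplied extension (not part of the library) could
provide the variance of the Students-t distribution as a free function overloaded
on the distribution type; the name and formula below are purely illustrative:

    // Hypothetical non-member extension, not part of the library:
    // the variance of a Students-t distribution is v / (v - 2), defined for v > 2.
    template <class RealType>
    RealType variance(const students_t_distribution<RealType>& dist)
    {
       RealType v = dist.degrees_of_freedom();
       return v / (v - 2); // no handling of v <= 2 in this sketch
    }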
[h4 Complements are supported too]
Often you don't want the value of the CDF, but its complement, which is
to say `1-p` rather than `p`. You could calculate the CDF and subtract
it from `1`, but if `p` is very close to `1` then cancellation error
will cause you to lose significant digits. In extreme cases, `p` may
actually be equal to `1`, even though the true value of the complement is non-zero.
In this library, whenever you want to receive a complement, just wrap
all the function arguments in a call to `complement(...)`, for example:
students_t dist(5);
cout << "CDF at t = 1 is " << cdf(dist, 1.0) << endl;
cout << "Complement of CDF at t = 1 is " << cdf(complement(dist, 1.0)) << endl;
This works in the other direction too: any function that accepts a probability
as an argument can also accept the complement of that probability,
again by wrapping all of its arguments in a call to `complement(...)`, for example:
students_t dist(5);
for(double i = 10; i < 1e10; i *= 10)
{
// Calculate the quantile for a 1 in i chance:
double t = quantile(complement(dist, 1/i));
// Print it out:
cout << "Quantile of students-t with 5 degrees of freedom\n"
"for a 1 in " << i << " chance is " << t << endl;
}
[h4 Parameters can be estimated]
Sometimes it's the parameters that define the distribution that you
need to find. Suppose, for example, you have conducted a Students-t test
for equal means and the result is borderline. Maybe your two samples
differ from each other, or maybe they don't; based on the result
of the test you can't be sure. A legitimate question to ask then is
"How many more measurements would I have to take before I would get
an X% probability that the difference is real?" Parameter estimators
can answer questions like this, and are necessarily different for
each distribution. They are implemented as static member functions
of the distributions, for example:
students_t::estimate_degrees_of_freedom(
5.0, // true mean
6.3, // sample mean
0.13, // sample standard deviation
0.95); // probability
Returns the number of degrees of freedom required to obtain a 95%
probability that the observed difference in means is not down to
chance alone. In the case that a borderline Students-t test result
was previously obtained, this can be used to estimate how large the sample size
would have to become before the observed difference was considered
significant. It assumes, of course, that the sample mean and standard
deviation are invariant with sample size.
[h4 Summary]
* Distributions are objects, which are constructed from whatever
parameters the distribution may have.
* Member functions allow you to retrieve the parameters of a distribution.
* Generic non-member functions provide access to the properties that
are common to all the distributions (PDF, CDF, quantile etc).
* Complements of probabilities are calculated by wrapping the function's
arguments in a call to `complement(...)`.
* Functions that accept a probability can accept a complement of the
probability as well, by wrapping the function's
arguments in a call to `complement(...)`.
* Static member functions allow the parameters of a distribution
to be estimated from other information.
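Putting those pieces together, a minimal sketch (using the `double` typedef and
4 degrees of freedom purely for illustration) might look like this:

    #include <boost/math/distributions/students_t.hpp>
    #include <iostream>

    int main()
    {
       using namespace boost::math;
       students_t dist(4);                                      // distribution object
       std::cout << dist.degrees_of_freedom() << "\n";          // getter member function
       std::cout << pdf(dist, 1.0) << "\n";                     // non-member PDF
       std::cout << cdf(dist, 1.0) << "\n";                     // non-member CDF
       std::cout << cdf(complement(dist, 1.0)) << "\n";         // complement of the CDF
       std::cout << quantile(dist, 0.95) << "\n";               // quantile from a probability
       std::cout << quantile(complement(dist, 0.05)) << "\n";   // quantile from a complement
       return 0;
    }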
Now that you have the basics, the next section looks at some worked examples.
[endsect][/section:stat_tut Statistical Functions Tutorial]
[section:weg Worked Examples]
[section:st_eg Student's t]
* [link tut_mean_intervals Calculating confidence intervals on the mean with the Students-t distribution.]
* [link tut_mean_test Testing a sample mean for systematic difference, by comparison to a "true" mean.]
* [link tut_mean_size Estimating how large a sample size would have to become
in order to give a significant Students-t test result with a single sample test.]
* [link two_sample_students_t Comparing the means of two samples with the Students-t test.]
* [link paired_st Comparing two paired samples with the Student's t distribution.]
[h4 [#tut_mean_intervals]Calculating confidence intervals on the mean with the Students-t distribution]
Let's say you have a sample mean; you may wish to know what confidence interval
you can place on that mean. Colloquially: "I want an interval that I can be
P% sure contains the true mean". (On a technical point, note that
the interval either contains the true mean or it does not: the
meaning of the confidence level is subtly
different from this colloquialism. More background information can be found
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm on the NIST site]).
The formula for the interval can be expressed as:
[$../equations/dist_tutorial4.png]
Where, ['Y[sub s]] is the sample mean, /s/ is the sample standard deviation,
/N/ is the sample size, /__alpha/ is the desired significance level and
['t[sub (__alpha/2,N-1)]] is the upper critical value of the Students-t
distribution with /N-1/ degrees of freedom.
Confidence level (value or coefficient) is defined as 1 - __alpha (often as a percent)
[@http://www.itl.nist.gov/div898/handbook/pmd/section5/pmd511.htm on the NIST site].
From the formula it should be clear that:
* The width of the confidence interval decreases as the sample size increases.
* The width increases as the standard deviation increases.
* The width increases as the confidence level increases (from 0.5 towards 0.99999).
The following example code is taken from the example program
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].
We'll begin by defining a procedure to calculate intervals for
various confidence levels; the procedure will print these out
as a table:
// Needed includes:
#include <boost/math/distributions/students_t.hpp>
#include <iostream>
#include <iomanip>
// Bring everything into global namespace for ease of use:
using namespace boost::math;
using namespace std;
void confidence_limits_on_mean(
double Sm, // Sm = Sample Mean.
double Sd, // Sd = Sample Standard Deviation.
unsigned Sn) // Sn = Sample Size.
{
using namespace std;
using namespace boost::math;
// Print out general info:
cout <<
"__________________________________\n"
"2-Sided Confidence Limits For Mean\n"
"__________________________________\n\n";
cout << setprecision(7);
cout << setw(40) << left << "Number of Observations" << "= " << Sn << "\n";
cout << setw(40) << left << "Mean" << "= " << Sm << "\n";
cout << setw(40) << left << "Standard Deviation" << "= " << Sd << "\n";
We'll define a table of significance levels (the /alpha/ values) at which we'll compute intervals:
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
Note that these are the complements of the confidence levels 0.5, 0.75, 0.9, ..., 0.99999.
Next we'll declare the distribution object we'll need, note that
the /degrees of freedom/ parameter is the sample size less one:
students_t dist(Sn - 1);
Most of what follows in the program is pretty printing, so let's focus
on the calculation of the interval. First we need the t-statistic,
computed using the /quantile/ function and our confidence level. Note
that since the confidence levels are the complement of the probability,
we have to wrap the arguments in a call to /complement(...)/:
double T = quantile(complement(dist, alpha[i] / 2));
Note that alpha was divided by two, since we'll be calculating
both the upper and lower bounds: had we been interested in a single
sided interval then we would have omitted this step.
Now complete the picture, and get the (one-sided) width of the
interval from the t-statistic
by multiplying by the standard deviation, and dividing by the square
root of the sample size:
double w = T * Sd / sqrt(double(Sn));
The two-sided interval is then the sample mean plus and minus this width.
And apart from some more pretty-printing that completes the procedure.
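As a rough sketch of that final step (using the variable names above; the
formatting in the actual example program differs):

    // Lower and upper limits of the two-sided interval (sketch only):
    double lower_limit = Sm - w;
    double upper_limit = Sm + w;
    cout << setw(15) << right << lower_limit
         << setw(15) << right << upper_limit << endl;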
Let's take a look at some sample output, first using the
[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
Heat flow data] from the NIST site. The data set was collected
by Bob Zarr of NIST in January, 1990 from a heat flow meter
calibration and stability analysis.
[pre'''
__________________________________
2-Sided Confidence Limits For Mean
__________________________________
Number of Observations = 195
Mean = 9.26146
Standard Deviation = 0.02278881
___________________________________________________________________
Confidence T Interval Lower Upper
Value (%) Value Width Limit Limit
___________________________________________________________________
50.000 0.676 1.103e-003 9.26036 9.26256
75.000 1.154 1.883e-003 9.25958 9.26334
90.000 1.653 2.697e-003 9.25876 9.26416
95.000 1.972 3.219e-003 9.25824 9.26468
99.000 2.601 4.245e-003 9.25721 9.26571
99.900 3.341 5.453e-003 9.25601 9.26691
99.990 3.973 6.484e-003 9.25498 9.26794
99.999 4.537 7.404e-003 9.25406 9.26886
''']
As you can see, the large sample size (195) and small standard deviation (0.023)
have combined to give very narrow intervals; indeed, we can be
very confident that the true mean is close to 9.26.
For comparison the next example data output is taken from
['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.
[pre'''
__________________________________
2-Sided Confidence Limits For Mean
__________________________________
Number of Observations = 3
Mean = 37.8000000
Standard Deviation = 0.9643650
___________________________________________________________________
Confidence T Interval Lower Upper
Value (%) Value Width Limit Limit
___________________________________________________________________
50.000 0.816 0.455 37.34539 38.25461
75.000 1.604 0.893 36.90717 38.69283
90.000 2.920 1.626 36.17422 39.42578
95.000 4.303 2.396 35.40438 40.19562
99.000 9.925 5.526 32.27408 43.32592
99.900 31.599 17.594 20.20639 55.39361
99.990 99.992 55.673 -17.87346 93.47346
99.999 316.225 176.067 -138.26683 213.86683
''']
This time the fact that there are only three measurements leads to
much wider intervals; indeed, the intervals are so wide that it's hard
to be very confident in the location of the mean.
[h4 [#tut_mean_test]Testing a sample mean for difference from a "true" mean]
When calibrating or comparing a scientific instrument or measurement method of some kind,
we want to answer the question "Does an observed sample mean differ from the
'true' mean in any significant way?" If it does, then we have evidence of
a systematic difference. This question can be answered with a Students-t test:
more information can be found
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda352.htm
on the NIST site].
(Of course, the assignment of "true" to one mean may be quite arbitrary;
often it is simply the result of a "traditional" method of measurement.)
The following example code is taken from the example program
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp].
We'll begin by defining a procedure to determine which of the
possible hypotheses are accepted or rejected at a given significance level:
// Needed includes:
#include <boost/math/distributions/students_t.hpp>
#include <iostream>
#include <iomanip>
// Bring everything into global namespace for ease of use:
using namespace boost::math;
using namespace std;
void single_sample_t_test(double M, double Sm, double Sd, unsigned Sn, double alpha)
{
//
// M = true mean.
// Sm = Sample Mean.
// Sd = Sample Standard Deviation.
// Sn = Sample Size.
// alpha = Significance Level.
Most of the procedure is pretty-printing, so let's just focus on the
calculation; we begin by calculating the t-statistic:
// Difference in means:
double diff = Sm - M;
// Degrees of freedom:
unsigned v = Sn - 1;
// t-statistic:
double t_stat = diff * sqrt(double(Sn)) / Sd;
Finally calculate the probability from the t-statistic. If we're interested
in simply whether there is a difference (either less or greater) or not,
we don't care about the sign of the t-statistic,
and we take the complement of the probability for comparison
to the significance level:
students_t dist(v);
double q = cdf(complement(dist, fabs(t_stat)));
The procedure then prints out the results of the various tests
that can be done; these
can be summarised in the following table:
[table
[[Hypothesis][Test][Code]]
[[The Null-hypothesis: there is
*no difference* in means] [complement of CDF for |t| > significance level / 2]
[``cdf(complement(dist, fabs(t))) > alpha / 2``]]
[[The Alternative-hypothesis: there is a
*difference* in means] [complement of CDF for |t| < significance level / 2]
[``cdf(complement(dist, fabs(t))) < alpha / 2``]]
[[The Alternative-hypothesis: the sample mean is *less* than
the true mean.] [CDF of t < significance level]
[``cdf(dist, t) < alpha``]]
[[The Alternative-hypothesis: the sample mean is *greater* than
the true mean.] [Complement of CDF of t < significance level]
[``cdf(complement(dist, t)) < alpha``]]
]
[note
Notice that the comparisons are against `alpha / 2` for a two-sided test
and against `alpha` for a one-sided test]
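As a sketch of how those comparisons might be turned into the printed
conclusions shown below (the actual example program's pretty-printing differs):

    // Sketch: print accept/reject conclusions using the tests from the table above.
    cout << "Mean != " << M << "  "
         << (cdf(complement(dist, fabs(t_stat))) < alpha / 2 ? "ACCEPTED" : "REJECTED") << "\n";
    cout << "Mean  < " << M << "  "
         << (cdf(dist, t_stat) < alpha ? "ACCEPTED" : "REJECTED") << "\n";
    cout << "Mean  > " << M << "  "
         << (cdf(complement(dist, t_stat)) < alpha ? "ACCEPTED" : "REJECTED") << "\n";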
Now that we have all the parts in place let's take a look at some
sample output, first using the
[@http://www.itl.nist.gov/div898/handbook/eda/section4/eda428.htm
Heat flow data] from the NIST site. The data set was collected
by Bob Zarr of NIST in January, 1990 from a heat flow meter
calibration and stability analysis.
[pre'''
__________________________________
Student t test for a single sample
__________________________________
Number of Observations = 195
Sample Mean = 9.26146
Sample Standard Deviation = 0.02279
Expected True Mean = 5.00000
Sample Mean - Expected Test Mean = 4.26146
Degrees of Freedom = 194
T Statistic = 2611.28380
Probability that difference is due to chance = 0.000e+000
Results for Alternative Hypothesis and alpha = 0.0500'''
Alternative Hypothesis Conclusion
Mean != 5.000 ACCEPTED
Mean < 5.000 REJECTED
Mean > 5.000 ACCEPTED
]
You will note the line that says the probability that the difference is
due to chance is zero. From a philosophical point of view, of course,
the probability can never reach zero. However, in this case the calculated
probability is smaller than the smallest representable double precision number,
hence the appearance of a zero here. Whatever its "true" value is, we know it
must be extraordinarily small, so the alternative hypothesis - that there is
a difference in means - is accepted.
For comparison the next example data output is taken from
['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.
[pre'''
__________________________________
Student t test for a single sample
__________________________________
Number of Observations = 3
Sample Mean = 37.80000
Sample Standard Deviation = 0.96437
Expected True Mean = 38.90000
Sample Mean - Expected Test Mean = -1.10000
Degrees of Freedom = 2
T Statistic = -1.97566
Probability that difference is due to chance = 9.343e-002
Results for Alternative Hypothesis and alpha = 0.0500'''
Alternative Hypothesis Conclusion
Mean != 38.900 REJECTED
Mean < 38.900 REJECTED
Mean > 38.900 REJECTED
]
As you can see the small number of measurements (3) has led to a large uncertainty
in the location of the true mean. So even though there appears to be a difference
between the sample mean and the expected true mean, we conclude that there
is no significant difference, and accept the null hypothesis. However, if we
were to lower the bar for acceptance down to alpha = 0.1 (90% confidence level)
we see a different output:
[pre'''
__________________________________
Student t test for a single sample
__________________________________
Number of Observations = 3
Sample Mean = 37.80000
Sample Standard Deviation = 0.96437
Expected True Mean = 38.90000
Sample Mean - Expected Test Mean = -1.10000
Degrees of Freedom = 2
T Statistic = -1.97566
Probability that difference is due to chance = 9.343e-002
Results for Alternative Hypothesis and alpha = 0.1000'''
Alternative Hypothesis Conclusion
Mean != 38.900 REJECTED
Mean < 38.900 ACCEPTED
Mean > 38.900 REJECTED
]
In this case we really have a borderline result, and more data should
be collected.
[h4 [#tut_mean_size]Estimating how large a sample size would have to become
in order to give a significant Students-t test result with a single sample test]
Imagine you have conducted a Students-t test on a single sample in order
to check for systematic errors in your measurements. Imagine that the
result is borderline. At this point one might go off and collect more data,
but it might be prudent to first ask the question "How much more?".
The parameter estimators of the students_t_distribution class
can provide this information.
This section is based on the example code in
[@../../example/students_t_single_sample.cpp students_t_single_sample.cpp]
and we begin by defining a procedure that will print out a table of
estimated sample sizes for various confidence levels:
// Needed includes:
#include <boost/math/distributions/students_t.hpp>
#include <iostream>
#include <iomanip>
// Bring everything into global namespace for ease of use:
using namespace boost::math;
using namespace std;
void single_sample_estimate_df(
double M, // M = true mean.
double Sm, // Sm = Sample Mean.
double Sd) // Sd = Sample Standard Deviation.
{
Next we define a table of significance levels:
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
Printing out the table of sample sizes required for various confidence levels
begins with the table header:
cout << "\n\n"
"_______________________________________________________________\n"
"Confidence Estimated Estimated\n"
" Value (%) Sample Size Sample Size\n"
" (one sided test) (two sided test)\n"
"_______________________________________________________________\n";
And now the important part: the sample sizes required. Class
`students_t_distribution` has a static member function
`estimate_degrees_of_freedom` that will calculate how large
a sample size needs to be in order to give a definitive result.
The first argument is the difference between the means that you
wish to be able to detect; here it's the absolute value of the
difference between the sample mean and the true mean.
Then come two probability values: alpha and beta. Alpha is the
maximum acceptable risk of rejecting the null-hypothesis when it is
in fact true. Beta is the maximum acceptable risk of accepting
the null-hypothesis when in fact it is false.
Also note that for a two-sided test, alpha must be divided by 2.
The final parameter of the function is the standard deviation of the sample.
In this example, we assume that alpha and beta are the same, and call
`estimate_degrees_of_freedom` twice: once with alpha for a one-sided test,
and once with alpha/2 for a two-sided test.
for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
{
// Confidence value:
cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
// calculate df for single sided test:
double df = students_t::estimate_degrees_of_freedom(
fabs(M - Sm), alpha[i], alpha[i], Sd);
// convert to sample size:
double size = ceil(df) + 1;
// Print size:
cout << fixed << setprecision(0) << setw(16) << right << size;
// calculate df for two sided test:
df = students_t::estimate_degrees_of_freedom(
fabs(M - Sm), alpha[i]/2, alpha[i], Sd);
// convert to sample size:
size = ceil(df) + 1;
// Print size:
cout << fixed << setprecision(0) << setw(16) << right << size << endl;
}
cout << endl;
}
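A call to the procedure for the data discussed below might look like this
(a sketch; the true mean, sample mean and standard deviation are the values
shown in the program output):

    single_sample_estimate_df(38.9, 37.8, 0.96437);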
Let's now look at some sample output using data taken from
['P.K.Hou, O. W. Lau & M.C. Wong, Analyst (1983) vol. 108, p 64.
and from Statistics for Analytical Chemistry, 3rd ed. (1994), pp 54-55
J. C. Miller and J. N. Miller, Ellis Horwood ISBN 0 13 0309907.]
The values result from the determination of mercury by cold-vapour
atomic absorption.
Only three measurements were made, and the Students-t test above
gave a borderline result, so this example
will show us how many samples would need to be collected:
[pre'''
_____________________________________________________________
Estimated sample sizes required for various confidence levels
_____________________________________________________________
True Mean = 38.90000
Sample Mean = 37.80000
Sample Standard Deviation = 0.96437
_______________________________________________________________
Confidence Estimated Estimated
Value (%) Sample Size Sample Size
(one sided test) (two sided test)
_______________________________________________________________
50.000 2 3
75.000 4 5
90.000 8 10
95.000 12 14
99.000 21 23
99.900 36 38
99.990 51 54
99.999 67 69
''']
So in this case, many more measurements would have to be made;
for example, at the 95% level, 14 measurements in total would be needed for a two-sided test.
[h4 [#two_sample_students_t]Comparing the means of two samples with the Students-t test]
Imagine that we have two samples, and we wish to determine whether
their means are different or not. This situation often arises when
determining whether a new process or treatment is better than an old one.
In this example, we'll be using the
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3531.htm
Car Mileage sample data] from the
[@http://www.itl.nist.gov NIST website]. The data compares
miles per gallon of US cars with miles per gallon of Japanese cars.
The sample code is in
[@../../example/students_t_two_samples.cpp students_t_two_samples.cpp].
There are two ways in which this test can be conducted: we can assume
that the true standard deviations of the two samples are equal or not.
If the standard deviations are assumed to be equal, then the calculation
of the t-statistic is greatly simplified, so we'll examine that case first.
In real life we should verify whether this assumption is valid, for example
with an F test for equality of the two variances.
We begin by defining a procedure that will conduct our test assuming equal
variances:
// Needed headers:
#include <boost/math/distributions/students_t.hpp>
#include <iostream>
#include <iomanip>
// Simplify usage:
using namespace boost::math;
using namespace std;
void two_samples_t_test_equal_sd(
double Sm1, // Sm1 = Sample 1 Mean.
double Sd1, // Sd1 = Sample 1 Standard Deviation.
unsigned Sn1, // Sn1 = Sample 1 Size.
double Sm2, // Sm2 = Sample 2 Mean.
double Sd2, // Sd2 = Sample 2 Standard Deviation.
unsigned Sn2, // Sn2 = Sample 2 Size.
double alpha) // alpha = Significance Level.
{
Our procedure will begin by calculating the t-statistic; assuming
equal variances, the needed formulas are:
[$../equations/dist_tutorial1.png]
where Sp is the "pooled" standard deviation of the two samples,
and /v/ is the number of degrees of freedom of the two combined
samples. We can now write the code to calculate the t-statistic:
// Degrees of freedom:
double v = Sn1 + Sn2 - 2;
cout << setw(55) << left << "Degrees of Freedom" << "= " << v << "\n";
// Pooled standard deviation:
double sp = sqrt(((Sn1-1) * Sd1 * Sd1 + (Sn2-1) * Sd2 * Sd2) / v);
cout << setw(55) << left << "Pooled Standard Deviation" << "= " << sp << "\n";
// t-statistic:
double t_stat = (Sm1 - Sm2) / (sp * sqrt(1.0 / Sn1 + 1.0 / Sn2));
cout << setw(55) << left << "T Statistic" << "= " << t_stat << "\n";
The next step is to define our distribution object, and calculate the
complement of the probability:
students_t dist(v);
double q = cdf(complement(dist, fabs(t_stat)));
cout << setw(55) << left << "Probability that difference is due to chance" << "= "
<< setprecision(3) << scientific << q << "\n\n";
Here we've used the absolute value of the t-statistic, because we initially
want to know simply whether there is a difference or not (a two-sided test).
However, we can also test whether the mean of the second sample is greater
or less than that of the first: all the possible tests are summed up
in the following table:
[table
[[Hypothesis][Test][C++ Code]]
[[The Null-hypothesis: there is
*no difference* in means] [complement of CDF for |t| > significance level / 2]
[``cdf(complement(dist, fabs(t))) > alpha / 2``]]
[[The Alternative-hypothesis: there is a
*difference* in means] [complement of CDF for |t| < significance level / 2]
[``cdf(complement(dist, fabs(t))) < alpha / 2``]]
[[The Alternative-hypothesis: Sample 1 Mean is *less* than
Sample 2 Mean.] [CDF of t < significance level]
[``cdf(dist, t) < alpha``]]
[[The Alternative-hypothesis: Sample 1 Mean is *greater* than
Sample 2 Mean.] [Complement of CDF of t < significance level]
[``cdf(complement(dist, t)) < alpha``]]
]
[note
For a two-sided test we must compare against alpha / 2 and not alpha.]
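With the procedure in place, a call for the car mileage data discussed below
might look like this (a sketch; the means and standard deviations are the
values shown in the program output):

    // US cars are sample 1, Japanese cars are sample 2, alpha = 0.05:
    two_samples_t_test_equal_sd(20.14458, 6.41470, 249, 30.48101, 6.10771, 79, 0.05);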
Most of the rest of the sample program is pretty-printing, so we'll
skip over that, and take a look at the sample output for alpha=0.05
(a 95% confidence level).
[pre'''
________________________________________________
Student t test for two samples (equal variances)
________________________________________________
Number of Observations (Sample 1) = 249
Sample 1 Mean = 20.14458
Sample 1 Standard Deviation = 6.41470
Number of Observations (Sample 2) = 79
Sample 2 Mean = 30.48101
Sample 2 Standard Deviation = 6.10771
Degrees of Freedom = 326.00000
Pooled Standard Deviation = 6.34261
T Statistic = -12.62059
Probability that difference is due to chance = 2.637e-030
Results for Alternative Hypothesis and alpha = 0.0500'''
Alternative Hypothesis Conclusion
Sample 1 Mean != Sample 2 Mean ACCEPTED
Sample 1 Mean < Sample 2 Mean ACCEPTED
Sample 1 Mean > Sample 2 Mean REJECTED
]
So with a probability that the difference is due to chance of just
2.637e-030, we can safely conclude that there is indeed a difference.
The tests on the alternative hypothesis show that the Sample 1 Mean is
less than that for Sample 2: in this case Sample 1 represents the
miles per gallon for US cars, and Sample 2 the miles per gallon for
Japanese cars, so we conclude that Japanese cars are on average more
fuel efficient.
Now that we have the simple case out of the way, let's look for a moment
at the more complex one: that the standard deviations of the two samples
are not equal. In this case the formula for the t-statistic becomes:
[$../equations/dist_tutorial2.png]
And for the combined degrees of freedom we use the
[@http://en.wikipedia.org/wiki/Welch-Satterthwaite_equation Welch-Satterthwaite]
approximation:
[$../equations/dist_tutorial3.png]
Note that this is one of the rare situations where the degrees-of-freedom
parameter to the Student's t distribution is a real number, and not an
integer value.
[note
Some statistical packages truncate the effective degrees of freedom to
an integer value: this may be necessary if you are relying on lookup tables,
but since our code fully supports non-integer degrees of freedom there is no
need to truncate in this case. Also note that when the degrees of freedom
is small then the Welch-Satterthwaite approximation may be a significant
source of error.]
Putting these formulae into code we get:
// Degrees of freedom:
double v = Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2;
v *= v;
double t1 = Sd1 * Sd1 / Sn1;
t1 *= t1;
t1 /= (Sn1 - 1);
double t2 = Sd2 * Sd2 / Sn2;
t2 *= t2;
t2 /= (Sn2 - 1);
v /= (t1 + t2);
cout << setw(55) << left << "Degrees of Freedom" << "= " << v << "\n";
// t-statistic:
double t_stat = (Sm1 - Sm2) / sqrt(Sd1 * Sd1 / Sn1 + Sd2 * Sd2 / Sn2);
cout << setw(55) << left << "T Statistic" << "= " << t_stat << "\n";
Thereafter the code and the tests are performed in the same way as before. Using
our car mileage data again, here's what the output looks like:
[pre'''
__________________________________________________
Student t test for two samples (unequal variances)
__________________________________________________
Number of Observations (Sample 1) = 249
Sample 1 Mean = 20.145
Sample 1 Standard Deviation = 6.4147
Number of Observations (Sample 2) = 79
Sample 2 Mean = 30.481
Sample 2 Standard Deviation = 6.1077
Degrees of Freedom = 136.87
T Statistic = -12.946
Probability that difference is due to chance = 7.855e-026
Results for Alternative Hypothesis and alpha = 0.0500'''
Alternative Hypothesis Conclusion
Sample 1 Mean != Sample 2 Mean ACCEPTED
Sample 1 Mean < Sample 2 Mean ACCEPTED
Sample 1 Mean > Sample 2 Mean REJECTED
]
This time allowing the variances in the two samples to differ has yielded
a higher likelihood that the observed difference is down to chance alone
(7.855e-026 compared to 2.637e-030 when equal variances were assumed).
However, the conclusion remains the same: US cars are less fuel efficient
than Japanese models.
[h4 [#paired_st]Comparing two paired samples with the Student's t distribution]
Imagine that we have a before and after reading for each item in the sample:
for example we might have measured blood pressure before and after administration
of a new drug. We can't pool the results and compare the means before and after
the change, because each patient will have a different baseline reading.
Instead we calculate the difference between before and after measurements
in each patient, and calculate the mean and standard deviation of the differences.
To test whether a significant change has taken place, we can then test
the null-hypothesis that the true mean is zero using the same procedure
we used in the single sample cases previously discussed.
That means we can:
* [link tut_mean_intervals Calculate confidence intervals of the mean].
If the endpoints of the interval differ in sign then we must accept
the null-hypothesis that there is no change.
* [link tut_mean_test Test whether the true mean is zero]. If the
result is consistent with a true mean of zero, then we must accept the
null-hypothesis that there is no change.
* [link tut_mean_size Calculate how many pairs of readings we would need
in order to obtain a significant result].
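A sketch of that workflow, reusing the `single_sample_t_test` procedure defined
earlier (the before/after readings here are purely illustrative), might look like this:

    // Illustrative paired readings (made-up values):
    double before[] = { 120, 124, 130, 118, 140 };
    double after[]  = { 118, 120, 125, 119, 132 };
    unsigned n = sizeof(before) / sizeof(before[0]);
    // Mean and standard deviation of the per-pair differences:
    double sum = 0, sum_sq = 0;
    for(unsigned i = 0; i < n; ++i)
    {
       double d = after[i] - before[i];
       sum += d;
       sum_sq += d * d;
    }
    double mean = sum / n;
    double sd = sqrt((sum_sq - n * mean * mean) / (n - 1));
    // Test the null-hypothesis that the true mean of the differences is zero:
    single_sample_t_test(0.0, mean, sd, n, 0.05);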
[endsect]
[section:cs_eg Chi Squared Distribution]
TODO
[endsect]
[section:norm_eg Normal Distribution]
TODO
[endsect]
[section:binom_eg Binomial Distribution]
[/ link binomial.binomial_distribution in dist_references.qbk is __binomial_distrib ]
(See __binomial_distrib for reference info.)
* [link binom_conf Calculating Confidence Limits on the Frequency of Occurrence]
* [link binom_size_eg Estimating Sample Sizes]
[h4 [#binom_conf]Calculating Confidence Limits on the Frequency of Occurrence]
Imagine you have a process that follows a binomial distribution: for each
trial conducted, an event either occurs or it does not; the two outcomes are referred
to as "successes" and "failures". If, by experiment, you want to measure the
frequency with which successes occur, the best estimate is given simply
by /k/ \/ /N/, for /k/ successes out of /N/ trials. However, our confidence in that
estimate will be shaped by how many trials were conducted, and how many successes
were observed. The static member functions
`binomial_distribution<>::estimate_lower_bound_on_p` and
`binomial_distribution<>::estimate_upper_bound_on_p` allow you to calculate
the confidence intervals for your estimate of the occurrence frequency.
The sample program [@../../example/binomial_confidence_limits.cpp
binomial_confidence_limits.cpp] illustrates their use. It begins by defining
a procedure that will print a table of confidence limits for various degrees
of certainty:
#include <iostream>
#include <iomanip>
#include <boost/math/distributions/binomial.hpp>
void confidence_limits_on_frequency(unsigned trials, unsigned successes)
{
//
// trials = Total number of trials.
// successes = Total number of observed successes.
//
// Calculate confidence limits for an observed
// frequency of occurrence that follows a binomial
// distribution.
//
using namespace std;
using namespace boost::math;
// Print out general info:
cout <<
"___________________________________________\n"
"2-Sided Confidence Limits For Success Ratio\n"
"___________________________________________\n\n";
cout << setprecision(7);
cout << setw(40) << left << "Number of Observations" << "= " << trials << "\n";
cout << setw(40) << left << "Number of successes" << "= " << successes << "\n";
cout << setw(40) << left << "Sample frequency of occurance" << "= " << double(successes) / trials << "\n";
The procedure now defines a table of confidence values: these are the
probabilities that the true occurrence frequency lies outside the calculated
interval:
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
Some pretty printing of the table header follows:
cout << "\n\n"
"___________________________________________\n"
"Confidence Lower Upper\n"
" Value (%) Limit Limit\n"
"___________________________________________\n";
And now for the important part - the intervals themselves - for each
value of /alpha/, we call `estimate_lower_bound_on_p` and
`estimate_upper_bound_on_p` to obtain lower and upper bounds
respectively. Note that since we are calculating a two-sided interval,
we must divide the value of alpha in two. Had we been calculating a
single-sided interval, for example: ['"Calculate a lower bound so that we are P%
sure that the true occurrence frequency is greater than some value"]
then we would *not* have divided by two.
for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
{
// Confidence value:
cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
// calculate bounds:
double l = binomial::estimate_lower_bound_on_p(trials, successes, alpha[i]/2);
double u = binomial::estimate_upper_bound_on_p(trials, successes, alpha[i]/2);
// Print Limits:
cout << fixed << setprecision(5) << setw(15) << right << l;
cout << fixed << setprecision(5) << setw(15) << right << u << endl;
}
cout << endl;
}
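A `main` that drives the procedure for the two cases shown below might look
like this (a sketch):

    int main()
    {
       // A 1 in 10 success ratio, first with 20 trials, then with 2000:
       confidence_limits_on_frequency(20, 2);
       confidence_limits_on_frequency(2000, 200);
       return 0;
    }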
And that's all there is to it. Let's see some sample output for a 1 in 10
success ratio, first for 20 trials:
[pre'''___________________________________________
2-Sided Confidence Limits For Success Ratio
___________________________________________
Number of Observations = 20
Number of successes = 2
Sample frequency of occurrence = 0.1
___________________________________________
Confidence Lower Upper
Value (%) Limit Limit
___________________________________________
50.000 0.08701 0.18675
75.000 0.06229 0.23163
90.000 0.04217 0.28262
95.000 0.03207 0.31698
99.000 0.01764 0.38713
99.900 0.00786 0.47093
99.990 0.00358 0.54084
99.999 0.00165 0.60020
''']
As you can see, even at the 95% confidence level the bounds are
really quite wide. Compare that with the program output for
2000 trials:
[pre'''___________________________________________
2-Sided Confidence Limits For Success Ratio
___________________________________________
Number of Observations = 2000
Number of successes = 200
Sample frequency of occurrence = 0.1000000
___________________________________________
Confidence Lower Upper
Value (%) Limit Limit
___________________________________________
50.000 0.09585 0.10491
75.000 0.09277 0.10822
90.000 0.08963 0.11172
95.000 0.08767 0.11399
99.000 0.08390 0.11850
99.900 0.07966 0.12385
99.990 0.07621 0.12845
99.999 0.07325 0.13256
''']
Now even when the confidence level is very high, the limits are really
quite close to the experimentally calculated value of 0.1.
[h4 [#binom_size_eg]Estimating Sample Sizes]
Imagine you have a critical component that you know will fail in 1 in
N "uses" (for some suitable definition of "use"). You may want to schedule
routine replacement of the component so that its chance of failure between
routine replacements is less than P%. If the failures follow a binomial
distribution (each time the component is "used" it either fails or does not)
then the static member function `binomial_distribution<>::estimate_number_of_trials`
can be used to estimate the maximum number of "uses" of that component for some
probability P.
The example program
[@../../example/binomial_sample_sizes.cpp binomial_sample_sizes.cpp]
demonstrates its usage. It centres around a routine that prints out
a table of maximum sample sizes for various probability thresholds:
void estimate_max_sample_size(
double p, // success ratio.
unsigned successes) // Total number of observed successes permitted.
{
The routine then declares a table of probability thresholds: these are the
maximum acceptable probabilities of observing [*more than] /successes/ events.
In our example, /successes/ will always be zero, since we want
no component failures, but in other situations non-zero values may well
make sense.
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
Much of the rest of the program is pretty-printing, the important part
is in the calculation of maximum number of permitted trials for each
value of alpha:
for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
{
// Confidence value:
cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
// calculate trials:
double t = binomial::estimate_number_of_trials(complement(successes, p, alpha[i]));
t = floor(t);
// Print Trials:
cout << fixed << setprecision(5) << setw(15) << right << t << endl;
}
Note that since the value of alpha is the probability of observing
[*more than /successes/ events], we have to wrap the arguments to
`estimate_number_of_trials` in a call to complement: remember the binomial
distribution deals in [*/successes/ or fewer events]. Finally, since we're
calculating the maximum number of trials permitted, we'll err on the safe
side and take the floor of the result. Had we been calculating the
/minimum/ number of trials required to observe a certain number of /successes/
then we would have taken the ceiling instead.
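A driver for the two cases shown below might look like this (a sketch):

    // No failures permitted, first with a 1 in 1000 chance of failure per use,
    // then with a 1 in 1000000 chance:
    estimate_max_sample_size(0.001, 0);
    estimate_max_sample_size(0.000001, 0);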
We'll finish off by looking at some sample output, firstly for
a 1 in 1000 chance of component failure with each use:
[pre
'''________________________
Maximum Number of Trials
________________________
Success ratio = 0.001
Maximum Number of "successes" permitted = 0
____________________________
Confidence Max Number
Value (%) Of Trials
____________________________
50.000 692
75.000 287
90.000 105
95.000 51
99.000 10
99.900 0
99.990 0
99.999 0'''
]
So 51 "uses" of the component would yield a 95% chance that no
component failures would be observed.
Compare that with a 1 in 1 million chance of component failure:
[pre'''
________________________
Maximum Number of Trials
________________________
Success ratio = 0.0000010
Maximum Number of "successes" permitted = 0
____________________________
Confidence Max Number
Value (%) Of Trials
____________________________
50.000 693146
75.000 287681
90.000 105360
95.000 51293
99.000 10050
99.900 1000
99.990 100
99.999 10'''
]
In this case, even 1000 uses of the component would still yield a
less than 1 in 1000 chance of observing a component failure
(99.9% chance of no failure).
[endsect]
[endsect]
[endsect]
[/ dist_tutorial.qbk
Copyright 2006 John Maddock and Paul A. Bristow.
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt).
]