2
0
mirror of https://github.com/boostorg/math.git synced 2026-02-21 15:12:28 +00:00
Files
math/doc/distributions/chi_squared_examples.qbk
John Maddock e602f59025 Added typedef to binomial: there is no longer any name clash with the "binomial_coefficient" function.
Updated distribution docs to bring them into synch with the policy based code.  Still a few "TODO" sections at present.

[SVN r7545]
2007-07-26 12:50:29 +00:00

430 lines
18 KiB
Plaintext

[section:cs_eg Chi Squared Distribution Examples]
[section:chi_sq_intervals Confidence Intervals on the Standard Deviation]
Once you have calculated the standard deviation for your data, a legitimate
question to ask is "How reliable is the calculated standard deviation?".
For this situation the Chi Squared distribution can be used to calculate
confidence intervals for the standard deviation.
The full example code is in [@../../example/chi_square_std_dev_test.cpp
chi_square_std_deviation_test.cpp].
We'll begin by defining the procedure that will calculate and print out the
confidence intervals:
void confidence_limits_on_std_deviation(
double Sd, // Sample Standard Deviation
unsigned N) // Sample size
{
We'll begin by printing out some general information:
cout <<
"________________________________________________\n"
"2-Sided Confidence Limits For Standard Deviation\n"
"________________________________________________\n\n";
cout << setprecision(7);
cout << setw(40) << left << "Number of Observations" << "= " << N << "\n";
cout << setw(40) << left << "Standard Deviation" << "= " << Sd << "\n";
and then define a table of significance levels for which we'll calculate
intervals:
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
The distribution we'll need to calculate the confidence intervals is a
Chi Squared distribution, with N-1 degrees of freedom:
chi_squared dist(N - 1);
For each value of alpha, the formula for the confidence interval is given by:
[$../equations/chi_squ_tut1.png]
Where [$../equations/chi_squ_tut2.png] is the upper critical value, and
[$../equations/chi_squ_tut3.png] is the lower critical value of the
Chi Squared distribution.
In code we begin by printing out a table header:
cout << "\n\n"
"_____________________________________________\n"
"Confidence Lower Upper\n"
" Value (%) Limit Limit\n"
"_____________________________________________\n";
and then loop over the values of alpha and calculate the intervals
for each: remember that the lower critical value is the same as the
quantile, and the upper critical value is the same as the quantile
from the complement of the probability:
for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
{
// Confidence value:
cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
// Calculate limits:
double lower_limit = sqrt((N - 1) * Sd * Sd / quantile(complement(dist, alpha[i] / 2)));
double upper_limit = sqrt((N - 1) * Sd * Sd / quantile(dist, alpha[i] / 2));
// Print Limits:
cout << fixed << setprecision(5) << setw(15) << right << lower_limit;
cout << fixed << setprecision(5) << setw(15) << right << upper_limit << endl;
}
cout << endl;
To see some example output we'll use the
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3581.htm
gear data] from the __handbook.
The data represents measurements of gear diameter from a manufacturing
process.
[pre'''
________________________________________________
2-Sided Confidence Limits For Standard Deviation
________________________________________________
Number of Observations = 100
Standard Deviation = 0.006278908
_____________________________________________
Confidence Lower Upper
Value (%) Limit Limit
_____________________________________________
50.000 0.00601 0.00662
75.000 0.00582 0.00685
90.000 0.00563 0.00712
95.000 0.00551 0.00729
99.000 0.00530 0.00766
99.900 0.00507 0.00812
99.990 0.00489 0.00855
99.999 0.00474 0.00895
''']
So at the 95% confidence level we conclude that the standard deviation
is between 0.00551 and 0.00729.
[endsect][/section:chi_sq_intervals Confidence Intervals on the Standard Deviation]
[section:chi_sq_test Chi-Square Test for the Standard Deviation]
We use this test to determine whether the standard deviation of a sample
differs from a specified value. Typically this occurs in process change
situations where we wish to compare the standard deviation of a new
process to an established one.
The code for this example is contained in
[@../../chi_square_std_dev_test.cpp chi_square_std_dev_test.cpp], and
we'll begin by defining the procedure that will print out the test
statistics:
void chi_squared_test(
double Sd, // Sample std deviation
double D, // True std deviation
unsigned N, // Sample size
double alpha) // Significance level
{
The procedure begins by printing a summary of the input data:
using namespace std;
using namespace boost::math;
// Print header:
cout <<
"______________________________________________\n"
"Chi Squared test for sample standard deviation\n"
"______________________________________________\n\n";
cout << setprecision(5);
cout << setw(55) << left << "Number of Observations" << "= " << N << "\n";
cout << setw(55) << left << "Sample Standard Deviation" << "= " << Sd << "\n";
cout << setw(55) << left << "Expected True Standard Deviation" << "= " << D << "\n\n";
The test statistic (T) is simply the ratio of the sample and "true" standard
deviations squared, multiplied by the number of degrees of freedom (the
sample size less one):
double t_stat = (N - 1) * (Sd / D) * (Sd / D);
cout << setw(55) << left << "Test Statistic" << "= " << t_stat << "\n";
The distribution we need to use, is a Chi Squared distribution with N-1
degrees of freedom:
chi_squared dist(N - 1);
The various hypothesis that can be tested are summarised in the following table:
[table
[[Hypothesis][Test]]
[[The null-hypothesis: there is no difference in standard deviation from the specified value]
[ Reject if T < [chi][super 2][sub (1-alpha/2; N-1)] or T > [chi][super 2][sub (alpha/2; N-1)] ]]
[[The alternative hypothesis: there is a difference in standard deviation from the specified value]
[ Reject if [chi][super 2][sub (1-alpha/2; N-1)] >= T >= [chi][super 2][sub (alpha/2; N-1)] ]]
[[The alternative hypothesis: the standard deviation is less than the specified value]
[ Reject if [chi][super 2][sub (1-alpha; N-1)] <= T ]]
[[The alternative hypothesis: the standard deviation is greater than the specified value]
[ Reject if [chi][super 2][sub (alpha; N-1)] >= T ]]
]
Where [chi][super 2][sub (alpha; N-1)] is the upper critical value of the
Chi Squared distribution, and [chi][super 2][sub (1-alpha; N-1)] is the
lower critical value.
Recall that the lower critical value is the same
as the quantile, and the upper critical value is the same as the quantile
from the complement of the probability, that gives us the following code
to calculate the critical values:
double ucv = quantile(complement(dist, alpha));
double ucv2 = quantile(complement(dist, alpha / 2));
double lcv = quantile(dist, alpha);
double lcv2 = quantile(dist, alpha / 2);
cout << setw(55) << left << "Upper Critical Value at alpha: " << "= "
<< setprecision(3) << scientific << ucv << "\n";
cout << setw(55) << left << "Upper Critical Value at alpha/2: " << "= "
<< setprecision(3) << scientific << ucv2 << "\n";
cout << setw(55) << left << "Lower Critical Value at alpha: " << "= "
<< setprecision(3) << scientific << lcv << "\n";
cout << setw(55) << left << "Lower Critical Value at alpha/2: " << "= "
<< setprecision(3) << scientific << lcv2 << "\n\n";
Now that we have the critical values, we can compare these to our test
statistic, and print out the result of each hypothesis and test:
cout << setw(55) << left <<
"Results for Alternative Hypothesis and alpha" << "= "
<< setprecision(4) << fixed << alpha << "\n\n";
cout << "Alternative Hypothesis Conclusion\n";
cout << "Standard Deviation != " << setprecision(3) << fixed << D << " ";
if((ucv2 < t_stat) || (lcv2 > t_stat))
cout << "ACCEPTED\n";
else
cout << "REJECTED\n";
cout << "Standard Deviation < " << setprecision(3) << fixed << D << " ";
if(lcv > t_stat)
cout << "ACCEPTED\n";
else
cout << "REJECTED\n";
cout << "Standard Deviation > " << setprecision(3) << fixed << D << " ";
if(ucv < t_stat)
cout << "ACCEPTED\n";
else
cout << "REJECTED\n";
cout << endl << endl;
To see some example output we'll use the
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda3581.htm
gear data] from the __handbook.
The data represents measurements of gear diameter from a manufacturing
process. The program output is deliberately designed to mirror
the DATAPLOT output shown in the
[@http://www.itl.nist.gov/div898/handbook/eda/section3/eda358.htm
NIST Handbook Example].
[pre'''
______________________________________________
Chi Squared test for sample standard deviation
______________________________________________
Number of Observations = 100
Sample Standard Deviation = 0.00628
Expected True Standard Deviation = 0.10000
Test Statistic = 0.39030
CDF of test statistic: = 1.438e-099
Upper Critical Value at alpha: = 1.232e+002
Upper Critical Value at alpha/2: = 1.284e+002
Lower Critical Value at alpha: = 7.705e+001
Lower Critical Value at alpha/2: = 7.336e+001
Results for Alternative Hypothesis and alpha = 0.0500
Alternative Hypothesis Conclusion'''
Standard Deviation != 0.100 ACCEPTED
Standard Deviation < 0.100 ACCEPTED
Standard Deviation > 0.100 REJECTED
]
In this case we are testing whether the sample standard deviation is 0.1,
and the null-hypothesis is rejected, so we conclude that the standard
deviation ['is not] 0.1.
For an alternative example, consider the
[@http://www.itl.nist.gov/div898/handbook/prc/section2/prc23.htm
silicon wafer data] again from the __handbook.
In this scenario a supplier of 100 ohm.cm silicon wafers claims
that his fabrication process can produce wafers with sufficient
consistency so that the standard deviation of resistivity for
the lot does not exceed 10 ohm.cm. A sample of N = 10 wafers taken
from the lot has a standard deviation of 13.97 ohm.cm, and the question
we ask ourselves is "Is the suppliers claim correct?".
The program output now looks like this:
[pre'''
______________________________________________
Chi Squared test for sample standard deviation
______________________________________________
Number of Observations = 10
Sample Standard Deviation = 13.97000
Expected True Standard Deviation = 10.00000
Test Statistic = 17.56448
CDF of test statistic: = 9.594e-001
Upper Critical Value at alpha: = 1.692e+001
Upper Critical Value at alpha/2: = 1.902e+001
Lower Critical Value at alpha: = 3.325e+000
Lower Critical Value at alpha/2: = 2.700e+000
Results for Alternative Hypothesis and alpha = 0.0500
Alternative Hypothesis Conclusion'''
Standard Deviation != 10.000 REJECTED
Standard Deviation < 10.000 REJECTED
Standard Deviation > 10.000 ACCEPTED
]
In this case, our null-hypothesis is that the standard deviation of
the sample is less than 10: this hypothesis is rejected in the analysis
above, and so we reject the manufacturers claim.
[endsect][/section:chi_sq_test Chi-Square Test for the Standard Deviation]
[section:chi_sq_size Estimating the Required Sample Sizes for a Chi-Square Test for the Standard Deviation]
Suppose we conduct a Chi Squared test for standard deviation and the result
is borderline, a legitimate question to ask is "How large would the sample size
have to be in order to produce a definitive result?"
The class template [link math_toolkit.dist.dist_ref.dists.chi_squared_dist
chi_squared_distribution] has a static method
`estimate_degrees_of_freedom` that will calculate this value for
some acceptable risk of type I failure /alpha/, type II failure
/beta/, and difference from the standard deviation /diff/. Please
note that the method used works on variance, and not standard deviation
as is usual for the Chi Squared Test.
The code for this example is located in [@../../chi_square_std_dev_test.cpp
chi_square_std_dev_test.cpp].
We begin by defining a procedure to print out the sample sizes required
for various risk levels:
void chi_squared_sample_sized(
double diff, // difference from variance to detect
double variance) // true variance
{
The procedure begins by printing out the input data:
using namespace std;
using namespace boost::math;
// Print out general info:
cout <<
"_____________________________________________________________\n"
"Estimated sample sizes required for various confidence levels\n"
"_____________________________________________________________\n\n";
cout << setprecision(5);
cout << setw(40) << left << "True Variance" << "= " << variance << "\n";
cout << setw(40) << left << "Difference to detect" << "= " << diff << "\n";
And defines a table of significance levels for which we'll calculate sample sizes:
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
For each value of alpha we can calculate two sample sizes: one where the
sample variance is less than the true value by /diff/ and one
where it is greater than the true value by /diff/. Thanks to the
asymmetric nature of the Chi Squared distribution these two values will
not be the same, the difference in their calculation differs only in the
sign of /diff/ that's passed to `estimate_degrees_of_freedom`. Finally
in this example we'll simply things, and let risk level /beta/ be the
same as /alpha/:
cout << "\n\n"
"_______________________________________________________________\n"
"Confidence Estimated Estimated\n"
" Value (%) Sample Size Sample Size\n"
" (lower one (upper one\n"
" sided test) sided test)\n"
"_______________________________________________________________\n";
//
// Now print out the data for the table rows.
//
for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
{
// Confidence value:
cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
// calculate df for a lower single sided test:
double df = chi_squared::estimate_degrees_of_freedom(
-diff, alpha[i], alpha[i], variance);
// convert to sample size:
double size = ceil(df) + 1;
// Print size:
cout << fixed << setprecision(0) << setw(16) << right << size;
// calculate df for an upper single sided test:
df = chi_squared::estimate_degrees_of_freedom(
diff, alpha[i], alpha[i], variance);
// convert to sample size:
size = ceil(df) + 1;
// Print size:
cout << fixed << setprecision(0) << setw(16) << right << size << endl;
}
cout << endl;
For some example output, consider the
[@http://www.itl.nist.gov/div898/handbook/prc/section2/prc23.htm
silicon wafer data] from the __handbook.
In this scenario a supplier of 100 ohm.cm silicon wafers claims
that his fabrication process can produce wafers with sufficient
consistency so that the standard deviation of resistivity for
the lot does not exceed 10 ohm.cm. A sample of N = 10 wafers taken
from the lot has a standard deviation of 13.97 ohm.cm, and the question
we ask ourselves is "How large would our sample have to be to reliably
detect this difference?".
To use our procedure above, we have to convert the
standard deviations to variance (square them),
after which the program output looks like this:
[pre'''
_____________________________________________________________
Estimated sample sizes required for various confidence levels
_____________________________________________________________
True Variance = 100.00000
Difference to detect = 95.16090
_______________________________________________________________
Confidence Estimated Estimated
Value (%) Sample Size Sample Size
(lower one (upper one
sided test) sided test)
_______________________________________________________________
50.000 2 2
75.000 2 10
90.000 4 32
95.000 5 51
99.000 7 99
99.900 11 174
99.990 15 251
99.999 20 330'''
]
In this case we are interested in a upper single sided test.
So for example, if the maximum acceptable risk of falsely rejecting
the null-hypothesis is 0.05 (Type I error), and the maximum acceptable
risk of failing to reject the null-hypothesis is also 0.05
(Type II error), we estimate that we would need a sample size of 51.
[endsect][/section:chi_sq_size Estimating the Required Sample Sizes for a Chi-Square Test for the Standard Deviation]
[endsect][/section:cs_eg Chi Squared Distribution]