2
0
mirror of https://github.com/boostorg/math.git synced 2026-02-22 03:22:28 +00:00
Files
math/doc/distributions/negative_binomial_example.qbk
John Maddock ccfb864daf Added Jeffreys prior method to the binomial confidence limits.
Fixed a few typos in the neg binomial.


[SVN r3482]
2006-11-30 10:47:26 +00:00

264 lines
10 KiB
Plaintext

[section:neg_binom_eg Negative Binomial Distribution Examples]
(See also the reference documentation for the __negative_binomial_distrib.)
[section:neg_binom_conf Calculating Confidence Limits on the Frequency of Occurrence for the Negative Binomial Distribution]
Imagine you have a process that follows a negative binomial distribution:
for each trial conducted, an event either occurs or does it does not, referred
to as "successes" and "failures". If, by experiment, you want to measure the
frequency with which successes occur, the best estimate of success fraction is given simply
by /k/ \/ /N/, for /k/ successes out of /N/ trials. However our confidence in that
estimate will be shaped by how many trials were conducted, and how many successes
were observed. The static member functions
`negative_binomial_distribution<>::estimate_lower_bound_on_p` and
`negative_binomial_distribution<>::estimate_upper_bound_on_p` allow you to calculate
the confidence intervals for your estimate of the occurrence frequency.
The sample program [@../../example/neg_binomial_confidence_limits.cpp
neg_binomial_confidence_limits.cpp] illustrates their use. It begins by defining
a procedure that will print a table of confidence limits for various degrees
of certainty:
#include <iostream>
#include <iomanip>
#include <boost/math/distributions/negative_binomial.hpp>
void confidence_limits_on_frequency(unsigned trials, unsigned successes)
{
//
// trials = Total number of trials.
// successes = Total number of observed successes.
//
// Calculate confidence limits for an observed
// frequency of occurrence that follows a negative binomial
// distribution.
//
using namespace std;
using namespace boost::math;
// Print out general info:
cout <<
"___________________________________________\n"
"2-Sided Confidence Limits For Success Ratio\n"
"___________________________________________\n\n";
cout << setprecision(7);
cout << setw(40) << left << "Number of Observations" << "= " << trials << "\n";
cout << setw(40) << left << "Number of successes" << "= " << successes << "\n";
cout << setw(40) << left << "Sample frequency of occurance" << "= " << double(successes) / trials << "\n";
The procedure now defines a table of significance levels: these are the
probabilities that the true occurrence frequency lies outside the calculated
interval:
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
Some pretty printing of the table header follows:
cout << "\n\n"
"___________________________________________\n"
"Confidence Lower Upper\n"
" Value (%) Limit Limit\n"
"___________________________________________\n";
And now for the important part - the intervals themselves - for each
value of /alpha/, we call `estimate_lower_bound_on_p` and
`estimate_upper_bound_on_p` to obtain lower and upper bounds
respectively. Note that since we are calculating a two-sided interval,
we must divide the value of alpha in two. Had we been calculating a
single-sided interval, for example: ['"Calculate a lower bound so that we are P%
sure that the true occurrence frequency is greater than some value"]
then we would *not* have divided by two.
for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
{
// Confidence value:
cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
// calculate bounds:
double l = binomial::estimate_lower_bound_on_p(trials, successes, alpha[i]/2);
double u = binomial::estimate_upper_bound_on_p(trials, successes, alpha[i]/2);
// Print Limits:
cout << fixed << setprecision(5) << setw(15) << right << l;
cout << fixed << setprecision(5) << setw(15) << right << u << endl;
}
cout << endl;
}
And that's all there is to it. Let's see some sample output for a 1 in 10
success ratio, first for 20 trials:
[pre'''___________________________________________
2-Sided Confidence Limits For Success Ratio
___________________________________________
Number of Observations = 20
Number of successes = 2
Sample frequency of occurrence = 0.1
___________________________________________
Confidence Lower Upper
Value (%) Limit Limit
___________________________________________
50.000 0.08701 0.18675
75.000 0.06229 0.23163
90.000 0.04217 0.28262
95.000 0.03207 0.31698
99.000 0.01764 0.38713
99.900 0.00786 0.47093
99.990 0.00358 0.54084
99.999 0.00165 0.60020
''']
As you can see, even at the 95% confidence level the bounds are
really quite wide. Compare that with the program output for
2000 trials:
[pre'''___________________________________________
2-Sided Confidence Limits For Success Ratio
___________________________________________
Number of Observations = 2000
Number of successes = 200
Sample frequency of occurrence = 0.1000000
___________________________________________
Confidence Lower Upper
Value (%) Limit Limit
___________________________________________
50.000 0.09585 0.10491
75.000 0.09277 0.10822
90.000 0.08963 0.11172
95.000 0.08767 0.11399
99.000 0.08390 0.11850
99.900 0.07966 0.12385
99.990 0.07621 0.12845
99.999 0.07325 0.13256
''']
Now even when the confidence level is very high, the limits are really
quite close to the experimentally calculated value of 0.1.
[endsect][/section:neg_binom_conf Calculating Confidence Limits on the Frequency of Occurrence]
[section:neg_binom_size_eg Estimating Sample Sizes for the Negative Binomial.]
Imagine you have an event (let's call it a "failure") that you know will
occur in 1 in N trials. You may want to know how many trials you need to
conduct to be 100P% sure of observing at least k such failures.
If the failure events follow a negative binomial
distribution (each trial either succeeds or does not)
then the static member function `negative_binomial_distibution<>::estimate_number_of_trials`
can be used to estimate the minimum number of trials required to be 100P% sure
of observing the desired number of failures.
The example program
[@../../example/neg_binomial_sample_sizes.cpp neg_binomial_sample_sizes.cpp]
demonstrates its usage. It centres around a routine that prints out
a table of minimum sample sizes for various probability thresholds:
void estimate_max_sample_size(
double p, // success fraction.
unsigned failures) // Total number of observed failures required.
{
The routine then declares a table of probability thresholds: these are the
maximum acceptable probability that /failure/ or fewer events will be
observed.
double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };
Much of the rest of the program is pretty-printing, the important part
is in the calculation of minimum number of trials required for each
value of alpha:
cout << "\n""Target number of failures = " << failures;
cout << ", Success fraction = " << 100 * p << "%" << endl;
// Print table header:
cout << "\n\n"
"____________________________\n"
"Confidence Min Number\n"
" Value (%) Of Trials \n"
"____________________________\n";
// Now print out the data for the table rows.
for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
{
// Confidence values %:
cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]) << " "
// estimate_number_of_trials
<< setw(6) << right
<< ceil(negative_binomial::estimate_number_of_trials(
failures, p, alpha[i])) << endl;
}
cout << endl;
Note that since we're
calculating the /minimum/ number of trials required, we'll err on the safe
side and take the ceiling of the result. Had we been calculating the
/maximum/ number of trials permitted to observe less than a certain
number of /failures/ then we would have taken the floor instead. We
would also have wrapped the arguments to `estimate_number_of_trials` in
a call to complement like this:
floor(negative_binomial::estimate_number_of_trials(
complement(failures, p, alpha[i])))
which would give us the largest number of trials we could conduct and
still be 100P% sure of observing /failures or less/ failure events, when the
probability of success is /p/.
We'll finish off by looking at some sample output, firstly suppose
we wish to observe at least 5 "failures" with a 50/50 chance of
success or failure:
[pre
'''Target number of failures = 5, Success fraction = 50%
____________________________
Confidence Min Number
Value (%) Of Trials
____________________________
50.000 11
75.000 14
90.000 17
95.000 18
99.000 22
99.900 27
99.990 31
99.999 36
'''
]
So 18 trials or more would yield a 95% chance that at least our 5
required failures would be observed.
Compare that to what happens if the success ratio is 90%:
[pre'''Target number of failures = 5.000, Success fraction = 90.000%
____________________________
Confidence Min Number
Value (%) Of Trials
____________________________
50.000 57
75.000 73
90.000 91
95.000 103
99.000 127
99.900 159
99.990 189
99.999 217
''']
So now 103 trials are required to observe at least 5 failures with
95% certainty.
[endsect][/section:neg_binom_size_eg Estimating Sample Sizes.]
[endsect][/section:neg_binom_eg Negative Binomial Distribution Examples]