[section:neg_binom_eg Negative Binomial Distribution Examples]

(See also the reference documentation for the __negative_binomial_distrib.)

[section:neg_binom_conf Calculating Confidence Limits on the Frequency of Occurrence for the Negative Binomial Distribution]

Imagine you have a process that follows a negative binomial distribution:
for each trial conducted, an event either occurs or it does not, referred
to as "successes" and "failures". If, by experiment, you want to measure the
frequency with which successes occur, the best estimate of the success fraction
is given simply by /k/ \/ /N/, for /k/ successes out of /N/ trials. However, our
confidence in that estimate will be shaped by how many trials were conducted,
and how many successes were observed. The static member functions
`negative_binomial_distribution<>::estimate_lower_bound_on_p` and
`negative_binomial_distribution<>::estimate_upper_bound_on_p` allow you to calculate
the confidence intervals for your estimate of the occurrence frequency.

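For a quick feel for how these functions are called, here is a minimal,
self-contained sketch; the trial and success counts are hypothetical values
chosen purely for illustration (they happen to match the first sample output
further below):

    #include <iostream>
    #include <boost/math/distributions/negative_binomial.hpp>

    int main()
    {
       unsigned trials = 20;     // Hypothetical number of trials.
       unsigned successes = 2;   // Hypothetical number of observed successes.
       double alpha = 0.05;      // Risk level for a 95% two-sided interval.
       double lower = boost::math::negative_binomial_distribution<>::estimate_lower_bound_on_p(
          trials, successes, alpha / 2);
       double upper = boost::math::negative_binomial_distribution<>::estimate_upper_bound_on_p(
          trials, successes, alpha / 2);
       std::cout << "95% interval: [" << lower << ", " << upper << "]\n";
       return 0;
    }
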
The sample program [@../../example/neg_binomial_confidence_limits.cpp
neg_binomial_confidence_limits.cpp] illustrates their use. It begins by defining
a procedure that will print a table of confidence limits for various degrees
of certainty:

    #include <iostream>
    #include <iomanip>
    #include <boost/math/distributions/negative_binomial.hpp>

    void confidence_limits_on_frequency(unsigned trials, unsigned successes)
    {
       //
       // trials = Total number of trials.
       // successes = Total number of observed successes.
       //
       // Calculate confidence limits for an observed
       // frequency of occurrence that follows a negative binomial
       // distribution.
       //
       using namespace std;
       using namespace boost::math;

       // Print out general info:
       cout <<
          "___________________________________________\n"
          "2-Sided Confidence Limits For Success Ratio\n"
          "___________________________________________\n\n";
       cout << setprecision(7);
       cout << setw(40) << left << "Number of Observations" << "= " << trials << "\n";
       cout << setw(40) << left << "Number of successes" << "= " << successes << "\n";
       cout << setw(40) << left << "Sample frequency of occurrence" << "= " << double(successes) / trials << "\n";

The procedure now defines a table of significance levels: these are the
probabilities that the true occurrence frequency lies outside the calculated
interval, so a significance level of /alpha/ corresponds to a confidence
level of 100 * (1 - alpha)%:

       double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

Some pretty printing of the table header follows:

       cout << "\n\n"
               "___________________________________________\n"
               "Confidence          Lower          Upper\n"
               " Value (%)          Limit          Limit\n"
               "___________________________________________\n";

And now for the important part - the intervals themselves - for each
value of /alpha/, we call `estimate_lower_bound_on_p` and
`estimate_upper_bound_on_p` to obtain lower and upper bounds
respectively. Note that since we are calculating a two-sided interval,
we must divide the value of alpha by two. Had we been calculating a
single-sided interval, for example: ['"Calculate a lower bound so that we are P%
sure that the true occurrence frequency is greater than some value"]
then we would *not* have divided by two.

       for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
       {
          // Confidence value:
          cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
          // calculate bounds:
          double l = negative_binomial::estimate_lower_bound_on_p(trials, successes, alpha[i]/2);
          double u = negative_binomial::estimate_upper_bound_on_p(trials, successes, alpha[i]/2);
          // Print Limits:
          cout << fixed << setprecision(5) << setw(15) << right << l;
          cout << fixed << setprecision(5) << setw(15) << right << u << endl;
       }
       cout << endl;
    }

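Had we wanted a single-sided limit then, as noted above, the full value of
alpha is passed rather than alpha/2. Inside the same procedure that might
look like this (a sketch only, not part of the example program):

       // Sketch: a one-sided lower limit that we are 95% sure the
       // true occurrence frequency exceeds - alpha is *not* halved.
       double lower_95 = negative_binomial::estimate_lower_bound_on_p(trials, successes, 0.05);
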
And that's all there is to it. Let's see some sample output for a 1 in 10
success ratio, first for 20 trials:

[pre'''___________________________________________
2-Sided Confidence Limits For Success Ratio
___________________________________________

Number of Observations                  = 20
Number of successes                     = 2
Sample frequency of occurrence          = 0.1


___________________________________________
Confidence          Lower          Upper
 Value (%)          Limit          Limit
___________________________________________
    50.000        0.08701        0.18675
    75.000        0.06229        0.23163
    90.000        0.04217        0.28262
    95.000        0.03207        0.31698
    99.000        0.01764        0.38713
    99.900        0.00786        0.47093
    99.990        0.00358        0.54084
    99.999        0.00165        0.60020
''']

As you can see, even at the 95% confidence level the bounds are
really quite wide. Compare that with the program output for
2000 trials:

[pre'''___________________________________________
2-Sided Confidence Limits For Success Ratio
___________________________________________

Number of Observations                  = 2000
Number of successes                     = 200
Sample frequency of occurrence          = 0.1000000


___________________________________________
Confidence          Lower          Upper
 Value (%)          Limit          Limit
___________________________________________
    50.000        0.09585        0.10491
    75.000        0.09277        0.10822
    90.000        0.08963        0.11172
    95.000        0.08767        0.11399
    99.000        0.08390        0.11850
    99.900        0.07966        0.12385
    99.990        0.07621        0.12845
    99.999        0.07325        0.13256
''']

Now even when the confidence level is very high, the limits are really
quite close to the experimentally calculated value of 0.1.

[endsect][/section:neg_binom_conf Calculating Confidence Limits on the Frequency of Occurrence]

[section:neg_binom_size_eg Estimating Sample Sizes for the Negative Binomial.]

Imagine you have an event (let's call it a "failure") that you know will
occur in 1 in N trials. You may want to know how many trials you need to
conduct to be 100P% sure of observing at least k such failures.
If the failure events follow a negative binomial
distribution (each trial either succeeds or it does not)
then the static member function `negative_binomial_distribution<>::estimate_minimum_number_of_trials`
can be used to estimate the minimum number of trials required to be 100P% sure
of observing the desired number of failures.

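Before walking through the full example, here is a minimal sketch of the call
itself; the inputs below are hypothetical illustrative values (they match the
first sample output further below, which reports 18 trials for the 95% row):

    #include <cmath>
    #include <iostream>
    #include <boost/math/distributions/negative_binomial.hpp>

    int main()
    {
       double p = 0.5;          // Hypothetical success fraction.
       unsigned failures = 5;   // Hypothetical number of failures we wish to observe.
       double alpha = 0.05;     // Maximum acceptable risk of seeing too few failures.
       // Round up - a fractional trial cannot be conducted:
       double trials = std::ceil(
          boost::math::negative_binomial::estimate_minimum_number_of_trials(failures, p, alpha));
       std::cout << "Minimum number of trials: " << trials << std::endl;
       return 0;
    }
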
The example program
[@../../example/neg_binomial_sample_sizes.cpp neg_binomial_sample_sizes.cpp]
demonstrates its usage. It centres around a routine that prints out
a table of minimum sample sizes for various probability thresholds:

    void estimate_max_sample_size(
       double p,             // success fraction.
       unsigned failures)    // Total number of observed failures required.
    {

The routine then declares a table of probability thresholds: these are the
maximum acceptable probabilities that /failures/ or fewer events will be
observed:

       double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

Much of the rest of the program is pretty-printing; the important part
is the calculation of the minimum number of trials required for each
value of alpha:

       cout << "\n""Target number of failures = " << failures;
       cout << ", Success fraction = " << 100 * p << "%" << endl;

       // Print table header:
       cout << "\n\n"
               "____________________________\n"
               "Confidence        Min Number\n"
               " Value (%)        Of Trials \n"
               "____________________________\n";

       // Now print out the data for the table rows.
       for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
       {
          // Confidence values %:
          cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]) << "      "
               // estimate_minimum_number_of_trials
               << setw(6) << right
               << ceil(negative_binomial::estimate_minimum_number_of_trials(
                     failures, p, alpha[i])) << endl;
       }
       cout << endl;

Note that since we're
calculating the /minimum/ number of trials required, we'll err on the safe
side and take the ceiling of the result. Had we been calculating the
/maximum/ number of trials permitted to observe fewer than a certain
number of /failures/ then we would have taken the floor instead. We
would also have called `estimate_maximum_number_of_trials` like this:

    floor(negative_binomial::estimate_maximum_number_of_trials(
       failures, p, alpha[i]))

which would give us the largest number of trials we could conduct and
still be 100P% sure of observing /failures/ or fewer failure events, when the
probability of success is /p/.

We'll finish off by looking at some sample output. First, suppose
we wish to observe at least 5 "failures" with a 50/50 chance of
success or failure:

[pre
'''Target number of failures = 5, Success fraction = 50%

____________________________
Confidence        Min Number
 Value (%)        Of Trials
____________________________
    50.000          11
    75.000          14
    90.000          17
    95.000          18
    99.000          22
    99.900          27
    99.990          31
    99.999          36
'''
]

So 18 trials or more would yield a 95% chance that at least our 5
required failures would be observed.

Compare that to what happens if the success ratio is 90%:

[pre'''Target number of failures = 5.000, Success fraction = 90.000%

____________________________
Confidence        Min Number
 Value (%)        Of Trials
____________________________
    50.000          57
    75.000          73
    90.000          91
    95.000         103
    99.000         127
    99.900         159
    99.990         189
    99.999         217
''']

So now 103 trials are required to observe at least 5 failures with
95% certainty.

[endsect][/section:neg_binom_size_eg Estimating Sample Sizes.]

[endsect][/section:neg_binom_eg Negative Binomial Distribution Examples]