math/doc/distributions/negative_binomial_example.qbk

[section:neg_binom_eg Negative Binomial Distribution Examples]

(See also the reference documentation for the __negative_binomial_distrib.)

[section:neg_binom_conf Calculating Confidence Limits on the Frequency of Occurrence for the Negative Binomial Distribution]

Imagine you have a process that follows a negative binomial distribution:
for each trial conducted, an event either occurs or does it does not, referred
to as "successes" and "failures".  If, by experiment, you want to measure the
frequency with which successes occur, the best estimate of success fraction is given simply
by /k/ \/ /N/, for /k/ successes out of /N/ trials.  However our confidence in that
estimate will be shaped by how many trials were conducted, and how many successes
were observed.  The static member functions
`negative_binomial_distribution<>::estimate_lower_bound_on_p` and
`negative_binomial_distribution<>::estimate_upper_bound_on_p` allow you to calculate
the confidence intervals for your estimate of the occurrence frequency.

The sample program [@../../example/neg_binomial_confidence_limits.cpp
neg_binomial_confidence_limits.cpp] illustrates their use.  It begins by defining
a procedure that will print a table of confidence limits for various degrees
of certainty:

   #include <iostream>
   #include <iomanip>
   #include <boost/math/distributions/negative_binomial.hpp>

   void confidence_limits_on_frequency(unsigned trials, unsigned successes)
   {
      //
      // trials = Total number of trials.
      // successes = Total number of observed successes.
      //
      // Calculate confidence limits for an observed
      // frequency of occurrence that follows a negative binomial
      // distribution.
      //
      using namespace std;
      using namespace boost::math;

      // Print out general info:
      cout <<
         "___________________________________________\n"
         "2-Sided Confidence Limits For Success Ratio\n"
         "___________________________________________\n\n";
      cout << setprecision(7);
      cout << setw(40) << left << "Number of Observations" << "=  " << trials << "\n";
      cout << setw(40) << left << "Number of successes" << "=  " << successes << "\n";
      cout << setw(40) << left << "Sample frequency of occurance" << "=  " << double(successes) / trials << "\n";

The procedure now defines a table of significance levels: these are the
probabilities that the true occurrence frequency lies outside the calculated
interval:

      double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

Some pretty printing of the table header follows:

      cout << "\n\n"
              "___________________________________________\n"
              "Confidence        Lower          Upper\n"
              " Value (%)        Limit          Limit\n"
              "___________________________________________\n";


And now for the important part - the intervals themselves - for each
value of /alpha/, we call `estimate_lower_bound_on_p` and
`estimate_upper_bound_on_p` to obtain lower and upper bounds
respectively.  Note that since we are calculating a two-sided interval,
we must divide the value of alpha in two.  Had we been calculating a
single-sided interval, for example: ['"Calculate a lower bound so that we are P%
sure that the true occurrence frequency is greater than some value"]
then we would *not* have divided by two.

      for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
      {
         // Confidence value:
         cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]);
         // calculate bounds:
         double l = binomial::estimate_lower_bound_on_p(trials, successes, alpha[i]/2);
         double u = binomial::estimate_upper_bound_on_p(trials, successes, alpha[i]/2);
         // Print Limits:
         cout << fixed << setprecision(5) << setw(15) << right << l;
         cout << fixed << setprecision(5) << setw(15) << right << u << endl;
      }
      cout << endl;
   }

And that's all there is to it.  Let's see some sample output for a 1 in 10
success ratio, first for 20 trials:

[pre'''___________________________________________
2-Sided Confidence Limits For Success Ratio
___________________________________________

Number of Observations                  =  20
Number of successes                     =  2
Sample frequency of occurrence          =  0.1


___________________________________________
Confidence        Lower          Upper
 Value (%)        Limit          Limit
___________________________________________
    50.000        0.08701        0.18675
    75.000        0.06229        0.23163
    90.000        0.04217        0.28262
    95.000        0.03207        0.31698
    99.000        0.01764        0.38713
    99.900        0.00786        0.47093
    99.990        0.00358        0.54084
    99.999        0.00165        0.60020
''']

As you can see, even at the 95% confidence level the bounds are
really quite wide.  Compare that with the program output for
2000 trials:

[pre'''___________________________________________
2-Sided Confidence Limits For Success Ratio
___________________________________________

Number of Observations                  =  2000
Number of successes                     =  200
Sample frequency of occurrence          =  0.1000000


___________________________________________
Confidence        Lower          Upper
 Value (%)        Limit          Limit
___________________________________________
    50.000        0.09585        0.10491
    75.000        0.09277        0.10822
    90.000        0.08963        0.11172
    95.000        0.08767        0.11399
    99.000        0.08390        0.11850
    99.900        0.07966        0.12385
    99.990        0.07621        0.12845
    99.999        0.07325        0.13256
''']

Now even when the confidence level is very high, the limits are really
quite close to the experimentally calculated value of 0.1.

[endsect][/section:neg_binom_conf Calculating Confidence Limits on the Frequency of Occurrence]

[section:neg_binom_size_eg Estimating Sample Sizes for the Negative Binomial.]

Imagine you have an event (let's call it a "failure") that you know will
occur in 1 in N trials.  You may want to know how many trials you need to
conduct to be 100P% sure of observing at least k such failures.
If the failure events follow a negative binomial
distribution (each trial either succeeds or does not)
then the static member function `negative_binomial_distibution<>::estimate_number_of_trials`
can be used to estimate the minimum number of trials required to be 100P% sure
of observing the desired number of failures.

The example program
[@../../example/neg_binomial_sample_sizes.cpp neg_binomial_sample_sizes.cpp]
demonstrates its usage.  It centres around a routine that prints out
a table of minimum sample sizes for various probability thresholds:

   void estimate_max_sample_size(
      double p,              // success fraction.
      unsigned failures)    // Total number of observed failures required.
   {

The routine then declares a table of probability thresholds: these are the
maximum acceptable probability that /failure/ or fewer events will be
observed.

   double alpha[] = { 0.5, 0.25, 0.1, 0.05, 0.01, 0.001, 0.0001, 0.00001 };

Much of the rest of the program is pretty-printing, the important part
is in the calculation of minimum number of trials required for each
value of alpha:

   cout << "\n""Target number of failures = " << failures;
   cout << ",   Success fraction = " << 100 * p << "%" << endl;

   // Print table header:
   cout << "\n\n"
        "____________________________\n"
        "Confidence        Min Number\n"
        " Value (%)        Of Trials \n"
        "____________________________\n";

   // Now print out the data for the table rows.
   for(unsigned i = 0; i < sizeof(alpha)/sizeof(alpha[0]); ++i)
   {
      // Confidence values %:
      cout << fixed << setprecision(3) << setw(10) << right << 100 * (1-alpha[i]) << "      "
         // estimate_number_of_trials
         << setw(6) << right
         << ceil(negative_binomial::estimate_number_of_trials(
                     failures, p, alpha[i])) << endl;
   }
   cout << endl;

Note that since we're
calculating the /minimum/ number of trials required, we'll err on the safe
side and take the ceiling of the result.  Had we been calculating the
/maximum/ number of trials permitted to observe less than a certain
number of /failures/ then we would have taken the floor instead.  We
would also have wrapped the arguments to `estimate_number_of_trials` in
a call to complement like this:

   floor(negative_binomial::estimate_number_of_trials(
            complement(failures, p, alpha[i])))

which would give us the largest number of trials we could conduct and
still be 100P% sure of observing /failures or less/ failure events, when the
probability of success is /p/.

We'll finish off by looking at some sample output, firstly suppose
we wish to observe at least 5 "failures" with a 50/50 chance of
success or failure:

[pre
'''Target number of failures = 5,   Success fraction = 50%

____________________________
Confidence        Min Number
 Value (%)        Of Trials
____________________________
    50.000          11
    75.000          14
    90.000          17
    95.000          18
    99.000          22
    99.900          27
    99.990          31
    99.999          36
'''
]

So 18 trials or more would yield a 95% chance that at least our 5
required failures would be observed.

Compare that to what happens if the success ratio is 90%:

[pre'''Target number of failures = 5.000,   Success fraction = 90.000%

____________________________
Confidence        Min Number
 Value (%)        Of Trials
____________________________
    50.000          57
    75.000          73
    90.000          91
    95.000         103
    99.000         127
    99.900         159
    99.990         189
    99.999         217
''']

So now 103 trials are required to observe at least 5 failures with
95% certainty.


[endsect][/section:neg_binom_size_eg Estimating Sample Sizes.]

[endsect][/section:neg_binom_eg Negative Binomial Distribution Examples]