[section:neg_binom_eg Negative Binomial Distribution Examples]
(See also the reference documentation for the __negative_binomial_distrib.)
[section:neg_binom_conf Calculating Confidence Limits on the Frequency of Occurrence for the Negative Binomial Distribution]
Imagine you have a process that follows a negative binomial distribution:
for each trial conducted, an event either occurs or it does not, the two outcomes
being referred to as "successes" and "failures". The frequency with which successes occur
is variously referred to as the
success fraction, success ratio, success percentage, occurrence frequency, or probability of occurrence.
If, by experiment, you want to measure the success fraction,
the best estimate is given simply
by /k/ \/ /N/, for /k/ successes out of /N/ trials.
However our confidence in that estimate will be shaped by how many trials were conducted,
and how many successes were observed. The static member functions
`negative_binomial_distribution<>::find_lower_bound_on_p` and
`negative_binomial_distribution<>::find_upper_bound_on_p`
allow you to calculate the confidence intervals for your estimate of the success fraction.
The sample program [@../../example/neg_binomial_confidence_limits.cpp
neg_binomial_confidence_limits.cpp] illustrates their use.
[import ../../example/neg_binomial_confidence_limits.cpp]
[neg_binomial_confidence_limits]
Let's see some sample output for a 1 in 10
success ratio, first for a mere 20 trials:
[pre'''______________________________________________
2-Sided Confidence Limits For Success Fraction
______________________________________________
Number of trials = 20
Number of successes = 2
Number of failures = 18
Observed frequency of occurrence = 0.1
___________________________________________
Confidence Lower Upper
Value (%) Limit Limit
___________________________________________
50.000 0.04812 0.13554
75.000 0.03078 0.17727
90.000 0.01807 0.22637
95.000 0.01235 0.26028
99.000 0.00530 0.33111
99.900 0.00164 0.41802
99.990 0.00051 0.49202
99.999 0.00016 0.55574
''']
As you can see, even at the 95% confidence level the bounds (0.012 to 0.26) are
really very wide, and very asymmetric about the observed value 0.1.
Compare that with the program output for a massive
2000 trials:
[pre'''______________________________________________
2-Sided Confidence Limits For Success Fraction
______________________________________________
Number of trials = 2000
Number of successes = 200
Number of failures = 1800
Observed frequency of occurrence = 0.1
___________________________________________
Confidence Lower Upper
Value (%) Limit Limit
___________________________________________
50.000 0.09536 0.10445
75.000 0.09228 0.10776
90.000 0.08916 0.11125
95.000 0.08720 0.11352
99.000 0.08344 0.11802
99.900 0.07921 0.12336
99.990 0.07577 0.12795
99.999 0.07282 0.13206
''']
Now even when the confidence level is very high, the limits (at 99.999%, 0.07 to 0.13) are really
quite close, and nearly symmetric about the observed value of 0.1.
[endsect][/section:neg_binom_conf Calculating Confidence Limits on the Frequency of Occurrence]
[section:neg_binom_size_eg Estimating Sample Sizes for the Negative Binomial.]
Imagine you have an event
(let's call it a "failure" - though we could equally well call it a success if we felt it was a 'good' event)
that you know will occur in 1 in N trials. You may want to know how many trials you need to
conduct to be P% sure of observing at least k such failures.
If the failure events follow a negative binomial
distribution (each trial either succeeds or fails)
then the static member function `negative_binomial_distribution<>::find_minimum_number_of_trials`
can be used to estimate the minimum number of trials required to be P% sure
of observing the desired number of failures.
The example program
[@../../example/neg_binomial_sample_sizes.cpp neg_binomial_sample_sizes.cpp]
demonstrates its usage.
[import ../../example/neg_binomial_sample_sizes.cpp]
[neg_binomial_sample_sizes]
[note Since we're calculating the /minimum/ number of trials required,
we'll err on the safe side and take the ceiling of the result.
Had we been calculating the
/maximum/ number of trials permitted to observe less than a certain
number of /failures/ then we would have taken the floor instead. We
would have called `find_maximum_number_of_trials` like this:
``
floor(negative_binomial::find_maximum_number_of_trials(failures, p, alpha[i]))
``
which would give us the largest number of trials we could conduct and
still be P% sure of observing /failures or less/ failure events, when the
probability of success is /p/.]
We'll finish off by looking at some sample output. First, suppose
we wish to observe at least 5 "failures" with a 50/50 (0.5) chance of
success or failure:
[pre
'''Target number of failures = 5, Success fraction = 50%
____________________________
Confidence Min Number
Value (%) Of Trials
____________________________
50.000 11
75.000 14
90.000 17
95.000 18
99.000 22
99.900 27
99.990 31
99.999 36
'''
]
So 18 trials or more would yield a 95% chance that at least our 5
required failures would be observed.
Compare that to what happens if the success ratio is 90%:
[pre'''Target number of failures = 5.000, Success fraction = 90.000%
____________________________
Confidence Min Number
Value (%) Of Trials
____________________________
50.000 57
75.000 73
90.000 91
95.000 103
99.000 127
99.900 159
99.990 189
99.999 217
''']
So now 103 trials are required to observe at least 5 failures with
95% certainty.
[endsect] [/section:neg_binom_size_eg Estimating Sample Sizes.]
[section:negative_binomial_example1 Negative Binomial example 1.]
The example program
[@../../example/negative_binomial_example1.cpp negative_binomial_example1.cpp (full source code)]
demonstrates a simple use: finding the probability of meeting a sales quota.
Based on [@http://en.wikipedia.org/wiki/Negative_binomial_distribution
a problem by Dr. Diane Evans,
Professor of Mathematics at Rose-Hulman Institute of Technology].
Pat is required to sell candy bars to raise money for the 6th grade field trip.
There are thirty houses in the neighborhood,
and Pat is not supposed to return home until five candy bars have been sold.
So the child goes door to door, selling candy bars.
At each house, there is a 0.4 probability (40%) of selling one candy bar
and a 0.6 probability (60%) of selling nothing.
What is the probability mass (density) function for selling the last (fifth)
candy bar at the nth house?
The Negative Binomial(r, p) distribution describes the probability of k failures
and r successes in k+r Bernoulli(p) trials with success on the last trial.
(A [@http://en.wikipedia.org/wiki/Bernoulli_distribution Bernoulli trial]
is one with only two possible outcomes, success or failure,
and p is the probability of success.)
Selling five candy bars means getting five successes, so successes r = 5.
The total number of trials n (in this case, the number of houses visited)
is the sum of successes and failures, n = k + r = k + 5.
The random variable we are interested in is the number of houses (n)
that must be visited to sell five candy bars,
so we substitute k = n - 5 into a negative_binomial(5, 0.4) mass (density) function
and obtain the mass (density) function of the distribution of houses (for n >= 5).
Obviously, the best case is that Pat makes sales on all the first five houses.
What is the probability that Pat finishes /on the tenth house/?
f(10) = 0.1003290624, or about 1 in 10
What is the probability that Pat finishes /on or before/ reaching the eighth house?
To finish on or before the eighth house,
Pat must finish at the fifth, sixth, seventh, or eighth house.
Sum those probabilities:
f(5) = 0.01024
f(6) = 0.03072
f(7) = 0.055296
f(8) = 0.0774144
sum {j=5 to 8} f(j) = 0.17367
What is the probability that Pat exhausts all 30 houses in the neighborhood,
and still doesn't sell the required 5 candy bars?
1 - sum{j=5 to 30} f(j) = 1 - incomplete beta (p = 0.4)(5, 30-5+1) =~ 1 - 0.99849 = 0.00151 = 0.15%.
See also [@http://en.wikipedia.org/wiki/Bernoulli_distribution Bernoulli distribution]
and [@http://www.math.uah.edu/stat/bernoulli/Introduction.xhtml Bernoulli applications].
In this example, we will deliberately produce a variety of calculations
and outputs to demonstrate the ways that the negative binomial distribution
can be used with this library,
and the source is also deliberately over-commented.
[import ../../example/negative_binomial_example1.cpp]
[negative_binomial_eg1_1]
[endsect] [/section:negative_binomial_example1]
[section:negative_binomial_example2 Negative Binomial example 2.]
An example program showing how to output a table of values of the cdf and pdf for various numbers of failures /k/.
[import ../../example/negative_binomial_example2.cpp]
[neg_binomial_example2]
[neg_binomial_example2_1]
[endsect] [/section:negative_binomial_example2 Negative Binomial example 2.]
[section:negative_binomial_example3 Negative Binomial example 3.]
The example program
[@../../example/negative_binomial_example3.cpp negative_binomial_example3.cpp (full source code)]
demonstrates an example from K. Krishnamoorthy.
[import ../../example/negative_binomial_example3.cpp]
[neg_binomial_example3]
[neg_binomial_example3_1]
[endsect] [/section:negative_binomial_example3 Negative Binomial example 3.]
[endsect] [/section:neg_binom_eg Negative Binomial Distribution Examples]