2
0
mirror of https://github.com/boostorg/math.git synced 2026-01-23 17:52:09 +00:00
Files
math/doc/html/math_toolkit/stat_tut/overview/complements.html
2024-08-06 13:18:32 +01:00

200 lines
14 KiB
HTML

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Complements are supported too - and when to use them</title>
<link rel="stylesheet" href="../../../math.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../../../index.html" title="Math Toolkit 4.2.1">
<link rel="up" href="../overview.html" title="Overview of Statistical Distributions">
<link rel="prev" href="generic.html" title="Generic operations common to all distributions are non-member functions">
<link rel="next" href="parameters.html" title="Parameters can be calculated">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../boost.png"></td>
<td align="center"><a href="../../../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="generic.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../overview.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="parameters.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h4 class="title">
<a name="math_toolkit.stat_tut.overview.complements"></a><a class="link" href="complements.html" title="Complements are supported too - and when to use them">Complements
are supported too - and when to use them</a>
</h4></div></div></div>
<p>
Often you don't want the value of the CDF, but its complement, which is
to say <code class="computeroutput"><span class="number">1</span><span class="special">-</span><span class="identifier">p</span></code> rather than <code class="computeroutput"><span class="identifier">p</span></code>.
It is tempting to calculate the CDF and subtract it from <code class="computeroutput"><span class="number">1</span></code>, but if <code class="computeroutput"><span class="identifier">p</span></code>
is very close to <code class="computeroutput"><span class="number">1</span></code> then cancellation
error will cause you to lose accuracy, perhaps totally.
</p>
<p>
<a class="link" href="complements.html#why_complements">See below <span class="emphasis"><em>"Why and when
to use complements?"</em></span></a>
</p>
<p>
In this library, whenever you want to receive a complement, just wrap all
the function arguments in a call to <code class="computeroutput"><span class="identifier">complement</span><span class="special">(...)</span></code>, for example:
</p>
<pre class="programlisting"><span class="identifier">students_t</span> <span class="identifier">dist</span><span class="special">(</span><span class="number">5</span><span class="special">);</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"CDF at t = 1 is "</span> <span class="special">&lt;&lt;</span> <span class="identifier">cdf</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span> <span class="number">1.0</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">endl</span><span class="special">;</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"Complement of CDF at t = 1 is "</span> <span class="special">&lt;&lt;</span> <span class="identifier">cdf</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span> <span class="number">1.0</span><span class="special">))</span> <span class="special">&lt;&lt;</span> <span class="identifier">endl</span><span class="special">;</span>
</pre>
<p>
But wait, now that we have a complement, we have to be able to use it as
well. Any function that accepts a probability as an argument can also accept
a complement by wrapping all of its arguments in a call to <code class="computeroutput"><span class="identifier">complement</span><span class="special">(...)</span></code>,
for example:
</p>
<pre class="programlisting"><span class="identifier">students_t</span> <span class="identifier">dist</span><span class="special">(</span><span class="number">5</span><span class="special">);</span>
<span class="keyword">for</span><span class="special">(</span><span class="keyword">double</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">10</span><span class="special">;</span> <span class="identifier">i</span> <span class="special">&lt;</span> <span class="number">1e10</span><span class="special">;</span> <span class="identifier">i</span> <span class="special">*=</span> <span class="number">10</span><span class="special">)</span>
<span class="special">{</span>
<span class="comment">// Calculate the quantile for a 1 in i chance:</span>
<span class="keyword">double</span> <span class="identifier">t</span> <span class="special">=</span> <span class="identifier">quantile</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">dist</span><span class="special">,</span> <span class="number">1</span><span class="special">/</span><span class="identifier">i</span><span class="special">));</span>
<span class="comment">// Print it out:</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"Quantile of students-t with 5 degrees of freedom\n"</span>
<span class="string">"for a 1 in "</span> <span class="special">&lt;&lt;</span> <span class="identifier">i</span> <span class="special">&lt;&lt;</span> <span class="string">" chance is "</span> <span class="special">&lt;&lt;</span> <span class="identifier">t</span> <span class="special">&lt;&lt;</span> <span class="identifier">endl</span><span class="special">;</span>
<span class="special">}</span>
</pre>
<div class="tip"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../../../../../doc/src/images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top">
<p>
<span class="bold"><strong>Critical values are just quantiles</strong></span>
</p>
<p>
Some texts talk about quantiles, or percentiles or fractiles, others
about critical values, the basic rule is:
</p>
<p>
<span class="emphasis"><em>Lower critical values</em></span> are the same as the quantile.
</p>
<p>
<span class="emphasis"><em>Upper critical values</em></span> are the same as the quantile
from the complement of the probability.
</p>
<p>
For example, suppose we have a Bernoulli process, giving rise to a binomial
distribution with success ratio 0.1 and 100 trials in total. The <span class="emphasis"><em>lower
critical value</em></span> for a probability of 0.05 is given by:
</p>
<p>
<code class="computeroutput"><span class="identifier">quantile</span><span class="special">(</span><span class="identifier">binomial</span><span class="special">(</span><span class="number">100</span><span class="special">,</span> <span class="number">0.1</span><span class="special">),</span> <span class="number">0.05</span><span class="special">)</span></code>
</p>
<p>
and the <span class="emphasis"><em>upper critical value</em></span> is given by:
</p>
<p>
<code class="computeroutput"><span class="identifier">quantile</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">binomial</span><span class="special">(</span><span class="number">100</span><span class="special">,</span> <span class="number">0.1</span><span class="special">),</span> <span class="number">0.05</span><span class="special">))</span></code>
</p>
<p>
which return 4.82 and 14.63 respectively.
</p>
</td></tr>
</table></div>
<a name="why_complements"></a><div class="tip"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../../../../../doc/src/images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top">
<p>
<span class="bold"><strong>Why bother with complements anyway?</strong></span>
</p>
<p>
It's very tempting to dispense with complements, and simply subtract
the probability from 1 when required. However, consider what happens
when the probability is very close to 1: let's say the probability expressed
at float precision is <code class="computeroutput"><span class="number">0.999999940f</span></code>,
then <code class="computeroutput"><span class="number">1</span> <span class="special">-</span>
<span class="number">0.999999940f</span> <span class="special">=</span>
<span class="number">5.96046448e-008</span></code>, but the result
is actually accurate to just <span class="emphasis"><em>one single bit</em></span>: the
only bit that didn't cancel out!
</p>
<p>
Or to look at this another way: consider that we want the risk of falsely
rejecting the null-hypothesis in the Student's t test to be 1 in 1 billion,
for a sample size of 10,000. This gives a probability of 1 - 10<sup>-9</sup>, which
is exactly 1 when calculated at float precision. In this case calculating
the quantile from the complement neatly solves the problem, so for example:
</p>
<p>
<code class="computeroutput"><span class="identifier">quantile</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">students_t</span><span class="special">(</span><span class="number">10000</span><span class="special">),</span> <span class="number">1e-9</span><span class="special">))</span></code>
</p>
<p>
returns the expected t-statistic <code class="computeroutput"><span class="number">6.00336</span></code>,
where as:
</p>
<p>
<code class="computeroutput"><span class="identifier">quantile</span><span class="special">(</span><span class="identifier">students_t</span><span class="special">(</span><span class="number">10000</span><span class="special">),</span> <span class="number">1</span><span class="special">-</span><span class="number">1e-9f</span><span class="special">)</span></code>
</p>
<p>
raises an overflow error, since it is the same as:
</p>
<p>
<code class="computeroutput"><span class="identifier">quantile</span><span class="special">(</span><span class="identifier">students_t</span><span class="special">(</span><span class="number">10000</span><span class="special">),</span> <span class="number">1</span><span class="special">)</span></code>
</p>
<p>
Which has no finite result.
</p>
<p>
With all distributions, even for more reasonable probability (unless
the value of p can be represented exactly in the floating-point type)
the loss of accuracy quickly becomes significant if you simply calculate
probability from 1 - p (because it will be mostly garbage digits for
p ~ 1).
</p>
<p>
So always avoid, for example, using a probability near to unity like
0.99999
</p>
<p>
<code class="computeroutput"><span class="identifier">quantile</span><span class="special">(</span><span class="identifier">my_distribution</span><span class="special">,</span>
<span class="number">0.99999</span><span class="special">)</span></code>
</p>
<p>
and instead use
</p>
<p>
<code class="computeroutput"><span class="identifier">quantile</span><span class="special">(</span><span class="identifier">complement</span><span class="special">(</span><span class="identifier">my_distribution</span><span class="special">,</span>
<span class="number">0.00001</span><span class="special">))</span></code>
</p>
<p>
since 1 - 0.99999 is not exactly equal to 0.00001 when using floating-point
arithmetic.
</p>
<p>
This assumes that the 0.00001 value is either a constant, or can be computed
by some manner other than subtracting 0.99999 from 1.
</p>
</td></tr>
</table></div>
</div>
<div class="copyright-footer">Copyright © 2006-2021 Nikhar Agrawal, Anton Bikineev, Matthew Borland,
Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno
Lalande, John Maddock, Evan Miller, Jeremy Murphy, Matthew Pulver, Johan Råde,
Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg, Daryle
Walker and Xiaogang Zhang<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="generic.html"><img src="../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../overview.html"><img src="../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../index.html"><img src="../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="parameters.html"><img src="../../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>