2
0
mirror of https://github.com/boostorg/math.git synced 2026-01-19 04:22:09 +00:00
Files
math/doc/html/math_toolkit/chatterjee_correlation.html
2024-08-06 13:18:32 +01:00

133 lines
11 KiB
HTML
Raw Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Chatterjee Correlation</title>
<link rel="stylesheet" href="../math.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="Math Toolkit 4.2.1">
<link rel="up" href="../statistics.html" title="Chapter 6. Statistics">
<link rel="prev" href="linear_regression.html" title="Linear Regression">
<link rel="next" href="../vector_functionals.html" title="Chapter 7. Vector Functionals - Norms">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
<td align="center"><a href="../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="linear_regression.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../statistics.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../vector_functionals.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="math_toolkit.chatterjee_correlation"></a><a class="link" href="chatterjee_correlation.html" title="Chatterjee Correlation">Chatterjee Correlation</a>
</h2></div></div></div>
<h4>
<a name="math_toolkit.chatterjee_correlation.h0"></a>
<span class="phrase"><a name="math_toolkit.chatterjee_correlation.synopsis"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.synopsis">Synopsis</a>
</h4>
<pre class="programlisting"><span class="preprocessor">#include</span> <span class="special">&lt;</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">math</span><span class="special">/</span><span class="identifier">statistics</span><span class="special">/</span><span class="identifier">chatterjee_correlation</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">&gt;</span>
<span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">statistics</span> <span class="special">{</span>
<span class="identifier">C</span><span class="special">++</span><span class="number">17</span><span class="special">:</span>
<span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">ExecutionPolicy</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Container</span><span class="special">&gt;</span>
<span class="keyword">auto</span> <span class="identifier">chatterjee_correlation</span><span class="special">(</span><span class="identifier">ExecutionPolicy</span><span class="special">&amp;&amp;</span> <span class="identifier">exec</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&amp;</span> <span class="identifier">u</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&amp;</span> <span class="identifier">v</span><span class="special">);</span>
<span class="identifier">C</span><span class="special">++</span><span class="number">11</span><span class="special">:</span>
<span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">Container</span><span class="special">&gt;</span>
<span class="keyword">auto</span> <span class="identifier">chatterjee_correlation</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&amp;</span> <span class="identifier">u</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&amp;</span> <span class="identifier">v</span><span class="special">);</span>
<span class="special">}</span>
</pre>
<h4>
<a name="math_toolkit.chatterjee_correlation.h1"></a>
<span class="phrase"><a name="math_toolkit.chatterjee_correlation.description"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.description">Description</a>
</h4>
<p>
The classical correlation coefficients like the Pearson's correlation are useful
primarily for distinguishing when one dataset depends linearly on another.
However, Pearson's correlation coefficient has a known weakness in that when
the dependent variable has an obvious functional relationship with the independent
variable, the value of the correlation coefficient can take on any value. As
Chatterjee says:
</p>
<p>
&gt; Ideally, one would like a coefficient that approaches its maximum value
if and only if one variable looks more and more like a noiseless function of
the other, just as Pearson correlation is close to its maximum value if and
only if one variable is close to being a noiseless linear function of the other.
</p>
<p>
This is the problem Chatterjee's coefficient solves. Let X and Y be random
variables, where Y is not constant, and let (X_i, Y_i) be samples from this
distribution. Rearrange these samples so that X_(0) &lt; X_{(1)} &lt; ... X_{(n-1)}
and create (R(X_{(i)}), R(Y_{(i)})). The Chatterjee correlation is then given
by
</p>
<p>
<span class="inlinemediaobject"><object type="image/svg+xml" data="../../equations/chatterjee_correlation.svg"></object></span>
</p>
<p>
In the limit of an infinite amount of i.i.d data, the statistic lies in [0,
1]. However, if the data is not infinite, the statistic may be negative. If
X and Y are independent, the value is zero, and if Y is a measurable function
of X, then the statistic is unity. The complexity is O(n log n).
</p>
<p>
An example is given below:
</p>
<pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span><span class="special">&lt;</span><span class="keyword">double</span><span class="special">&gt;</span> <span class="identifier">X</span><span class="special">{</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">,</span><span class="number">3</span><span class="special">,</span><span class="number">4</span><span class="special">,</span><span class="number">5</span><span class="special">};</span>
<span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span><span class="special">&lt;</span><span class="keyword">double</span><span class="special">&gt;</span> <span class="identifier">Y</span><span class="special">{</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">,</span><span class="number">3</span><span class="special">,</span><span class="number">4</span><span class="special">,</span><span class="number">5</span><span class="special">};</span>
<span class="keyword">using</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">statistics</span><span class="special">::</span><span class="identifier">chatterjee_correlation</span><span class="special">;</span>
<span class="keyword">double</span> <span class="identifier">coeff</span> <span class="special">=</span> <span class="identifier">chatterjee_correlation</span><span class="special">(</span><span class="identifier">X</span><span class="special">,</span> <span class="identifier">Y</span><span class="special">);</span>
</pre>
<p>
The implementation follows <a href="https://arxiv.org/pdf/1909.10140.pdf" target="_top">Chatterjee's
paper</a>.
</p>
<p>
<span class="emphasis"><em>Nota bene:</em></span> If the input is an integer type the output
will be a double precision type.
</p>
<h4>
<a name="math_toolkit.chatterjee_correlation.h2"></a>
<span class="phrase"><a name="math_toolkit.chatterjee_correlation.invariants"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.invariants">Invariants</a>
</h4>
<p>
The function expects at least two samples, a non-constant vector Y, and the
same number of X's as Y's. If Y is constant, the result is a quiet NaN. The
data set must be sorted by X values. If there are ties in the values of X,
then the statistic is random due to the random breaking of ties. Of course,
random numbers are not used internally, but the result is not guaranteed to
be identical on different systems.
</p>
<h4>
<a name="math_toolkit.chatterjee_correlation.h3"></a>
<span class="phrase"><a name="math_toolkit.chatterjee_correlation.references"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.references">References</a>
</h4>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
Chatterjee, Sourav. "A new coefficient of correlation." Journal
of the American Statistical Association 116.536 (2021): 2009-2022.
</li></ul></div>
</div>
<div class="copyright-footer">Copyright © 2006-2021 Nikhar Agrawal, Anton Bikineev, Matthew Borland,
Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno
Lalande, John Maddock, Evan Miller, Jeremy Murphy, Matthew Pulver, Johan Råde,
Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg, Daryle
Walker and Xiaogang Zhang<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="linear_regression.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../statistics.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../vector_functionals.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>