mirror of
https://github.com/boostorg/math.git
synced 2026-01-19 04:22:09 +00:00
133 lines
11 KiB
HTML
133 lines
11 KiB
HTML
<html>
|
||
<head>
|
||
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
|
||
<title>Chatterjee Correlation</title>
|
||
<link rel="stylesheet" href="../math.css" type="text/css">
|
||
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
|
||
<link rel="home" href="../index.html" title="Math Toolkit 4.2.1">
|
||
<link rel="up" href="../statistics.html" title="Chapter 6. Statistics">
|
||
<link rel="prev" href="linear_regression.html" title="Linear Regression">
|
||
<link rel="next" href="../vector_functionals.html" title="Chapter 7. Vector Functionals - Norms">
|
||
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||
</head>
|
||
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
|
||
<table cellpadding="2" width="100%"><tr>
|
||
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../boost.png"></td>
|
||
<td align="center"><a href="../../../../../index.html">Home</a></td>
|
||
<td align="center"><a href="../../../../../libs/libraries.htm">Libraries</a></td>
|
||
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
|
||
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
|
||
<td align="center"><a href="../../../../../more/index.htm">More</a></td>
|
||
</tr></table>
|
||
<hr>
|
||
<div class="spirit-nav">
|
||
<a accesskey="p" href="linear_regression.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../statistics.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../vector_functionals.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
|
||
</div>
|
||
<div class="section">
|
||
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
|
||
<a name="math_toolkit.chatterjee_correlation"></a><a class="link" href="chatterjee_correlation.html" title="Chatterjee Correlation">Chatterjee Correlation</a>
|
||
</h2></div></div></div>
|
||
<h4>
|
||
<a name="math_toolkit.chatterjee_correlation.h0"></a>
|
||
<span class="phrase"><a name="math_toolkit.chatterjee_correlation.synopsis"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.synopsis">Synopsis</a>
|
||
</h4>
|
||
<pre class="programlisting"><span class="preprocessor">#include</span> <span class="special"><</span><span class="identifier">boost</span><span class="special">/</span><span class="identifier">math</span><span class="special">/</span><span class="identifier">statistics</span><span class="special">/</span><span class="identifier">chatterjee_correlation</span><span class="special">.</span><span class="identifier">hpp</span><span class="special">></span>
|
||
|
||
<span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">statistics</span> <span class="special">{</span>
|
||
|
||
<span class="identifier">C</span><span class="special">++</span><span class="number">17</span><span class="special">:</span>
|
||
<span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">ExecutionPolicy</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">Container</span><span class="special">></span>
|
||
<span class="keyword">auto</span> <span class="identifier">chatterjee_correlation</span><span class="special">(</span><span class="identifier">ExecutionPolicy</span><span class="special">&&</span> <span class="identifier">exec</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&</span> <span class="identifier">u</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&</span> <span class="identifier">v</span><span class="special">);</span>
|
||
|
||
<span class="identifier">C</span><span class="special">++</span><span class="number">11</span><span class="special">:</span>
|
||
<span class="keyword">template</span> <span class="special"><</span><span class="keyword">typename</span> <span class="identifier">Container</span><span class="special">></span>
|
||
<span class="keyword">auto</span> <span class="identifier">chatterjee_correlation</span><span class="special">(</span><span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&</span> <span class="identifier">u</span><span class="special">,</span> <span class="keyword">const</span> <span class="identifier">Container</span><span class="special">&</span> <span class="identifier">v</span><span class="special">);</span>
|
||
<span class="special">}</span>
|
||
</pre>
|
||
<h4>
|
||
<a name="math_toolkit.chatterjee_correlation.h1"></a>
|
||
<span class="phrase"><a name="math_toolkit.chatterjee_correlation.description"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.description">Description</a>
|
||
</h4>
|
||
<p>
|
||
The classical correlation coefficients like the Pearson's correlation are useful
|
||
primarily for distinguishing when one dataset depends linearly on another.
|
||
However, Pearson's correlation coefficient has a known weakness in that when
|
||
the dependent variable has an obvious functional relationship with the independent
|
||
variable, the value of the correlation coefficient can take on any value. As
|
||
Chatterjee says:
|
||
</p>
|
||
<p>
|
||
> Ideally, one would like a coefficient that approaches its maximum value
|
||
if and only if one variable looks more and more like a noiseless function of
|
||
the other, just as Pearson correlation is close to its maximum value if and
|
||
only if one variable is close to being a noiseless linear function of the other.
|
||
</p>
|
||
<p>
|
||
This is the problem Chatterjee's coefficient solves. Let X and Y be random
|
||
variables, where Y is not constant, and let (X_i, Y_i) be samples from this
|
||
distribution. Rearrange these samples so that X_(0) < X_{(1)} < ... X_{(n-1)}
|
||
and create (R(X_{(i)}), R(Y_{(i)})). The Chatterjee correlation is then given
|
||
by
|
||
</p>
|
||
<p>
|
||
<span class="inlinemediaobject"><object type="image/svg+xml" data="../../equations/chatterjee_correlation.svg"></object></span>
|
||
</p>
|
||
<p>
|
||
In the limit of an infinite amount of i.i.d data, the statistic lies in [0,
|
||
1]. However, if the data is not infinite, the statistic may be negative. If
|
||
X and Y are independent, the value is zero, and if Y is a measurable function
|
||
of X, then the statistic is unity. The complexity is O(n log n).
|
||
</p>
|
||
<p>
|
||
An example is given below:
|
||
</p>
|
||
<pre class="programlisting"><span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span><span class="special"><</span><span class="keyword">double</span><span class="special">></span> <span class="identifier">X</span><span class="special">{</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">,</span><span class="number">3</span><span class="special">,</span><span class="number">4</span><span class="special">,</span><span class="number">5</span><span class="special">};</span>
|
||
<span class="identifier">std</span><span class="special">::</span><span class="identifier">vector</span><span class="special"><</span><span class="keyword">double</span><span class="special">></span> <span class="identifier">Y</span><span class="special">{</span><span class="number">1</span><span class="special">,</span><span class="number">2</span><span class="special">,</span><span class="number">3</span><span class="special">,</span><span class="number">4</span><span class="special">,</span><span class="number">5</span><span class="special">};</span>
|
||
<span class="keyword">using</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">statistics</span><span class="special">::</span><span class="identifier">chatterjee_correlation</span><span class="special">;</span>
|
||
<span class="keyword">double</span> <span class="identifier">coeff</span> <span class="special">=</span> <span class="identifier">chatterjee_correlation</span><span class="special">(</span><span class="identifier">X</span><span class="special">,</span> <span class="identifier">Y</span><span class="special">);</span>
|
||
</pre>
|
||
<p>
|
||
The implementation follows <a href="https://arxiv.org/pdf/1909.10140.pdf" target="_top">Chatterjee's
|
||
paper</a>.
|
||
</p>
|
||
<p>
|
||
<span class="emphasis"><em>Nota bene:</em></span> If the input is an integer type the output
|
||
will be a double precision type.
|
||
</p>
|
||
<h4>
|
||
<a name="math_toolkit.chatterjee_correlation.h2"></a>
|
||
<span class="phrase"><a name="math_toolkit.chatterjee_correlation.invariants"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.invariants">Invariants</a>
|
||
</h4>
|
||
<p>
|
||
The function expects at least two samples, a non-constant vector Y, and the
|
||
same number of X's as Y's. If Y is constant, the result is a quiet NaN. The
|
||
data set must be sorted by X values. If there are ties in the values of X,
|
||
then the statistic is random due to the random breaking of ties. Of course,
|
||
random numbers are not used internally, but the result is not guaranteed to
|
||
be identical on different systems.
|
||
</p>
|
||
<h4>
|
||
<a name="math_toolkit.chatterjee_correlation.h3"></a>
|
||
<span class="phrase"><a name="math_toolkit.chatterjee_correlation.references"></a></span><a class="link" href="chatterjee_correlation.html#math_toolkit.chatterjee_correlation.references">References</a>
|
||
</h4>
|
||
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; "><li class="listitem">
|
||
Chatterjee, Sourav. "A new coefficient of correlation." Journal
|
||
of the American Statistical Association 116.536 (2021): 2009-2022.
|
||
</li></ul></div>
|
||
</div>
|
||
<div class="copyright-footer">Copyright © 2006-2021 Nikhar Agrawal, Anton Bikineev, Matthew Borland,
|
||
Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno
|
||
Lalande, John Maddock, Evan Miller, Jeremy Murphy, Matthew Pulver, Johan Råde,
|
||
Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg, Daryle
|
||
Walker and Xiaogang Zhang<p>
|
||
Distributed under the Boost Software License, Version 1.0. (See accompanying
|
||
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
|
||
</p>
|
||
</div>
|
||
<hr>
|
||
<div class="spirit-nav">
|
||
<a accesskey="p" href="linear_regression.html"><img src="../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../statistics.html"><img src="../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../vector_functionals.html"><img src="../../../../../doc/src/images/next.png" alt="Next"></a>
|
||
</div>
|
||
</body>
|
||
</html>
|