2
0
mirror of https://github.com/boostorg/math.git synced 2026-01-19 04:22:09 +00:00

Chatterjee Correlation Coefficient (#770)

* Implement rank vector

[ci skip]

* Add documentation. Admittedly terrible.

* Add unit tests.

* Cleanup method of detecting if execution policies are valid or not

[ci skip]

* Implement and test chatterjee correlation

[ci skip]

* Add spot checks and special handling for constant Y

[ci skip]

* Add performance file

[ci skip]

* Add execution policy support to rank

[ci skip]

* Remove duplicates from v when generating the order vector

[ci skip]

* Fix macro error for use of <execution>

[ci skip]

* Use explicit types instead of auto to avoid warnings 

[ci skip]

* Add execution policy testing to rank

[ci skip]

* Add threaded implementation

[ci skip]

* Added threaded testing

* Fix formatting and ASCII issues in test

* Fix more ASCII issues

* refactoring

* Fix threaded impl

* Remove non-ASCII apostrophe

[ci skip]

* Doc fixes and add test comparing generally to paper values

* Significantly tighten tolerance around expected values from paper

* Change tolerance for sin comparison

Co-authored-by: Nick Thompson <nathompson7@protonmail.com>
This commit is contained in:
Matt Borland
2022-05-25 08:13:24 -07:00
committed by GitHub
parent 4d5cc6972e
commit e5eae18f14
13 changed files with 719 additions and 15 deletions

View File

@@ -0,0 +1,74 @@
[/
Copyright 2022 Matt Borland
Distributed under the Boost Software License, Version 1.0.
(See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt).
]
[section:chatterjee_correlation Chatterjee Correlation]
[heading Synopsis]
``
#include <boost/math/statistics/chatterjee_correlation.hpp>
namespace boost::math::statistics {
C++17:
template <typename ExecutionPolicy, typename Container>
auto chatterjee_correlation(ExecutionPolicy&& exec, const Container& u, const Container& v);
C++11:
template <typename Container>
auto chatterjee_correlation(const Container& u, const Container& v);
}
``
[heading Description]
The classical correlation coefficients like the Pearson's correlation are useful primarily for distinguishing when one dataset depends linearly on another.
However, Pearson's correlation coefficient has a known weakness in that when the dependent variable has an obvious functional relationship with the independent variable, the value of the correlation coefficient can take on any value.
As Chatterjee says:
> Ideally, one would like a coefficient that approaches
its maximum value if and only if one variable looks more and more like a
noiseless function of the other, just as Pearson correlation is close to its maximum value if and only if one variable is close to being a noiseless linear function of the other.
This is the problem Chatterjee's coefficient solves.
Let X and Y be random variables, where Y is not constant, and let (X_i, Y_i) be samples from this distribution.
Rearrange these samples so that X_(0) < X_{(1)} < ... X_{(n-1)} and create (R(X_{(i)}), R(Y_{(i)})).
The Chatterjee correlation is then given by
[$../equations/chatterjee_correlation.svg]
In the limit of an infinite amount of i.i.d data, the statistic lies in [0, 1].
However, if the data is not infinite, the statistic may be negative.
If X and Y are independent, the value is zero, and if Y is a measurable function of X, then the statistic is unity.
The complexity is O(n log n).
An example is given below:
std::vector<double> X{1,2,3,4,5};
std::vector<double> Y{1,2,3,4,5};
using boost::math::statistics::chatterjee_correlation;
double coeff = chatterjee_correlation(X, Y);
The implementation follows [@https://arxiv.org/pdf/1909.10140.pdf Chatterjee's paper].
/Nota bene:/ If the input is an integer type the output will be a double precision type.
[heading Invariants]
The function expects at least two samples, a non-constant vector Y, and the same number of X's as Y's.
If Y is constant, the result is a quiet NaN.
The data set must be sorted by X values.
If there are ties in the values of X, then the statistic is random due to the random breaking of ties.
Of course, random numbers are not used internally, but the result is not guaranteed to be identical on different systems.
[heading References]
* Chatterjee, Sourav. "A new coefficient of correlation." Journal of the American Statistical Association 116.536 (2021): 2009-2022.
[endsect]
[/section:chatterjee_correlation Chatterjee Correlation]

View File

@@ -18,13 +18,10 @@
#include <boost/math/tools/assert.hpp>
#include <boost/math/tools/config.hpp>
// Support compilers with P0024R2 implemented without linking TBB
// https://en.cppreference.com/w/cpp/compiler_support
#if !defined(BOOST_NO_CXX17_HDR_EXECUTION) && defined(BOOST_HAS_THREADS)
#ifdef BOOST_MATH_EXEC_COMPATIBLE
#include <execution>
#include <future>
#include <thread>
#define EXEC_COMPATIBLE
#endif
namespace boost{ namespace math{ namespace statistics { namespace detail {
@@ -60,7 +57,7 @@ ReturnType means_and_covariance_seq_impl(ForwardIterator u_begin, ForwardIterato
return std::make_tuple(mu_u, mu_v, cov/i, Real(i));
}
#ifdef EXEC_COMPATIBLE
#ifdef BOOST_MATH_EXEC_COMPATIBLE
// Numerically stable parallel computation of (co-)variance
// https://dl.acm.org/doi/10.1145/3221269.3223036
@@ -154,7 +151,7 @@ ReturnType means_and_covariance_parallel_impl(ForwardIterator u_begin, ForwardIt
return std::make_tuple(mu_u_a, mu_v_a, cov_a, n_a);
}
#endif // EXEC_COMPATIBLE
#endif // BOOST_MATH_EXEC_COMPATIBLE
template<typename ReturnType, typename ForwardIterator>
ReturnType correlation_coefficient_seq_impl(ForwardIterator u_begin, ForwardIterator u_end, ForwardIterator v_begin, ForwardIterator v_end)
@@ -204,7 +201,7 @@ ReturnType correlation_coefficient_seq_impl(ForwardIterator u_begin, ForwardIter
return std::make_tuple(mu_u, Qu, mu_v, Qv, cov, rho, Real(i));
}
#ifdef EXEC_COMPATIBLE
#ifdef BOOST_MATH_EXEC_COMPATIBLE
// Numerically stable parallel computation of (co-)variance:
// https://dl.acm.org/doi/10.1145/3221269.3223036
@@ -324,11 +321,11 @@ ReturnType correlation_coefficient_parallel_impl(ForwardIterator u_begin, Forwar
return std::make_tuple(mu_u_a, Qu_a, mu_v_a, Qv_a, cov_a, rho, n_a);
}
#endif // EXEC_COMPATIBLE
#endif // BOOST_MATH_EXEC_COMPATIBLE
} // namespace detail
#ifdef EXEC_COMPATIBLE
#ifdef BOOST_MATH_EXEC_COMPATIBLE
template<typename ExecutionPolicy, typename Container, typename Real = typename Container::value_type>
inline auto means_and_covariance(ExecutionPolicy&& exec, Container const & u, Container const & v)

View File

@@ -0,0 +1,159 @@
// (C) Copyright Matt Borland 2022.
// Use, modification and distribution are subject to the
// Boost Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#ifndef BOOST_MATH_STATISTICS_CHATTERJEE_CORRELATION_HPP
#define BOOST_MATH_STATISTICS_CHATTERJEE_CORRELATION_HPP
#include <cstdint>
#include <cmath>
#include <algorithm>
#include <iterator>
#include <vector>
#include <limits>
#include <utility>
#include <type_traits>
#include <boost/math/tools/assert.hpp>
#include <boost/math/tools/config.hpp>
#include <boost/math/statistics/detail/rank.hpp>
#ifdef BOOST_MATH_EXEC_COMPATIBLE
#include <execution>
#include <future>
#include <thread>
#endif
namespace boost { namespace math { namespace statistics {
namespace detail {
template <typename BDIter>
std::size_t chatterjee_transform(BDIter begin, BDIter end)
{
std::size_t sum = 0;
while(++begin != end)
{
if(*begin > *std::prev(begin))
{
sum += *begin - *std::prev(begin);
}
else
{
sum += *std::prev(begin) - *begin;
}
}
return sum;
}
template <typename ReturnType, typename ForwardIterator>
ReturnType chatterjee_correlation_seq_impl(ForwardIterator u_begin, ForwardIterator u_end, ForwardIterator v_begin, ForwardIterator v_end)
{
using std::abs;
BOOST_MATH_ASSERT_MSG(std::is_sorted(u_begin, u_end), "The x values must be sorted in order to use this functionality");
const std::vector<std::size_t> rank_vector = rank(v_begin, v_end);
std::size_t sum = chatterjee_transform(rank_vector.begin(), rank_vector.end());
ReturnType result = static_cast<ReturnType>(1) - (static_cast<ReturnType>(3 * sum) / static_cast<ReturnType>(rank_vector.size() * rank_vector.size() - 1));
// If the result is 1 then Y is constant and all the elements must be ties
if (abs(result - static_cast<ReturnType>(1)) < std::numeric_limits<ReturnType>::epsilon())
{
return std::numeric_limits<ReturnType>::quiet_NaN();
}
return result;
}
} // Namespace detail
template <typename Container, typename Real = typename Container::value_type,
typename ReturnType = typename std::conditional<std::is_integral<Real>::value, double, Real>::type>
inline ReturnType chatterjee_correlation(const Container& u, const Container& v)
{
return detail::chatterjee_correlation_seq_impl<ReturnType>(std::begin(u), std::end(u), std::begin(v), std::end(v));
}
}}} // Namespace boost::math::statistics
#ifdef BOOST_MATH_EXEC_COMPATIBLE
namespace boost::math::statistics {
namespace detail {
template <typename ReturnType, typename ExecutionPolicy, typename ForwardIterator>
ReturnType chatterjee_correlation_par_impl(ExecutionPolicy&& exec, ForwardIterator u_begin, ForwardIterator u_end,
ForwardIterator v_begin, ForwardIterator v_end)
{
using std::abs;
BOOST_MATH_ASSERT_MSG(std::is_sorted(std::forward<ExecutionPolicy>(exec), u_begin, u_end), "The x values must be sorted in order to use this functionality");
auto rank_vector = rank(std::forward<ExecutionPolicy>(exec), v_begin, v_end);
const auto num_threads = std::thread::hardware_concurrency() == 0 ? 2u : std::thread::hardware_concurrency();
std::vector<std::future<std::size_t>> future_manager {};
const auto elements_per_thread = std::ceil(static_cast<double>(rank_vector.size()) / num_threads);
auto it = rank_vector.begin();
auto end = rank_vector.end();
for(std::size_t i {}; i < num_threads - 1; ++i)
{
future_manager.emplace_back(std::async(std::launch::async | std::launch::deferred, [it, elements_per_thread]() -> std::size_t
{
return chatterjee_transform(it, std::next(it, elements_per_thread));
}));
it = std::next(it, elements_per_thread - 1);
}
future_manager.emplace_back(std::async(std::launch::async | std::launch::deferred, [it, end]() -> std::size_t
{
return chatterjee_transform(it, end);
}));
std::size_t sum {};
for(std::size_t i {}; i < future_manager.size(); ++i)
{
sum += future_manager[i].get();
}
ReturnType result = static_cast<ReturnType>(1) - (static_cast<ReturnType>(3 * sum) / static_cast<ReturnType>(rank_vector.size() * rank_vector.size() - 1));
// If the result is 1 then Y is constant and all the elements must be ties
if (abs(result - static_cast<ReturnType>(1)) < std::numeric_limits<ReturnType>::epsilon())
{
return std::numeric_limits<ReturnType>::quiet_NaN();
}
return result;
}
} // Namespace detail
template <typename ExecutionPolicy, typename Container, typename Real = typename Container::value_type,
typename ReturnType = std::conditional_t<std::is_integral_v<Real>, double, Real>>
inline ReturnType chatterjee_correlation(ExecutionPolicy&& exec, const Container& u, const Container& v)
{
if constexpr (std::is_same_v<std::remove_reference_t<decltype(exec)>, decltype(std::execution::seq)>)
{
return detail::chatterjee_correlation_seq_impl<ReturnType>(std::cbegin(u), std::cend(u),
std::cbegin(v), std::cend(v));
}
else
{
return detail::chatterjee_correlation_par_impl<ReturnType>(std::forward<ExecutionPolicy>(exec),
std::cbegin(u), std::cend(u),
std::cbegin(v), std::cend(v));
}
}
} // Namespace boost::math::statistics
#endif
#endif // BOOST_MATH_STATISTICS_CHATTERJEE_CORRELATION_HPP

View File

@@ -0,0 +1,140 @@
// (C) Copyright Matt Borland 2022
// Use, modification and distribution are subject to the
// Boost Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#ifndef BOOST_MATH_STATISTICS_DETAIL_RANK_HPP
#define BOOST_MATH_STATISTICS_DETAIL_RANK_HPP
#include <cstdint>
#include <vector>
#include <numeric>
#include <utility>
#include <iterator>
#include <algorithm>
#include <boost/math/tools/config.hpp>
#ifdef BOOST_MATH_EXEC_COMPATIBLE
#include <execution>
#endif
namespace boost { namespace math { namespace statistics { namespace detail {
struct pair_equal
{
template <typename T1, typename T2>
bool operator()(const std::pair<T1, T2>& a, const std::pair<T1, T2>& b) const
{
return a.first == b.first;
}
};
}}}} // Namespaces
#ifndef BOOST_MATH_EXEC_COMPATIBLE
namespace boost { namespace math { namespace statistics { namespace detail {
template <typename ForwardIterator, typename T = typename std::iterator_traits<ForwardIterator>::value_type>
auto rank(ForwardIterator first, ForwardIterator last) -> std::vector<std::size_t>
{
std::size_t elements = std::distance(first, last);
std::vector<std::pair<T, std::size_t>> rank_vector(elements);
std::size_t i = 0;
while (first != last)
{
rank_vector[i] = std::make_pair(*first, i);
++i;
++first;
}
std::sort(rank_vector.begin(), rank_vector.end());
// Remove duplicates
rank_vector.erase(std::unique(rank_vector.begin(), rank_vector.end(), pair_equal()), rank_vector.end());
elements = rank_vector.size();
std::pair<T, std::size_t> rank;
std::vector<std::size_t> result(elements);
for (i = 0; i < elements; ++i)
{
if (rank_vector[i].first != rank.first)
{
rank = std::make_pair(rank_vector[i].first, i);
}
result[rank_vector[i].second] = rank.second;
}
return result;
}
template <typename Container>
inline auto rank(const Container& c) -> std::vector<std::size_t>
{
return rank(std::begin(c), std::end(c));
}
}}}} // Namespaces
#else
namespace boost::math::statistics::detail {
template <typename ExecutionPolicy, typename ForwardIterator, typename T = typename std::iterator_traits<ForwardIterator>::value_type>
auto rank(ExecutionPolicy&& exec, ForwardIterator first, ForwardIterator last)
{
std::size_t elements = std::distance(first, last);
std::vector<std::pair<T, std::size_t>> rank_vector(elements);
std::size_t i = 0;
while (first != last)
{
rank_vector[i] = std::make_pair(*first, i);
++i;
++first;
}
std::sort(exec, rank_vector.begin(), rank_vector.end());
// Remove duplicates
rank_vector.erase(std::unique(exec, rank_vector.begin(), rank_vector.end(), pair_equal()), rank_vector.end());
elements = rank_vector.size();
std::pair<T, std::size_t> rank;
std::vector<std::size_t> result(elements);
for (i = 0; i < elements; ++i)
{
if (rank_vector[i].first != rank.first)
{
rank = std::make_pair(rank_vector[i].first, i);
}
result[rank_vector[i].second] = rank.second;
}
return result;
}
template <typename ExecutionPolicy, typename Container>
inline auto rank(ExecutionPolicy&& exec, const Container& c)
{
return rank(exec, std::cbegin(c), std::cend(c));
}
template <typename ForwardIterator, typename T = typename std::iterator_traits<ForwardIterator>::value_type>
inline auto rank(ForwardIterator first, ForwardIterator last)
{
return rank(std::execution::seq, first, last);
}
template <typename Container>
inline auto rank(const Container& c)
{
return rank(std::execution::seq, std::cbegin(c), std::cend(c));
}
} // Namespaces
#endif // BOOST_MATH_EXEC_COMPATIBLE
#endif // BOOST_MATH_STATISTICS_DETAIL_RANK_HPP

View File

@@ -20,9 +20,7 @@
#include <numeric>
#include <list>
// Support compilers with P0024R2 implemented without linking TBB
// https://en.cppreference.com/w/cpp/compiler_support
#if !defined(BOOST_NO_CXX17_HDR_EXECUTION) && defined(BOOST_HAS_THREADS)
#ifdef BOOST_MATH_EXEC_COMPATIBLE
#include <execution>
namespace boost::math::statistics {
@@ -672,7 +670,7 @@ inline auto mode(Container & v)
} // Namespace boost::math::statistics
#else // Backwards compatible bindings for C++11
#else // Backwards compatible bindings for C++11 or execution is not implemented
namespace boost { namespace math { namespace statistics {

View File

@@ -79,6 +79,12 @@
#endif // BOOST_MATH_STANDALONE
// Support compilers with P0024R2 implemented without linking TBB
// https://en.cppreference.com/w/cpp/compiler_support
#if !defined(BOOST_NO_CXX17_HDR_EXECUTION) && defined(BOOST_HAS_THREADS)
# define BOOST_MATH_EXEC_COMPATIBLE
#endif
#include <algorithm> // for min and max
#include <limits>
#include <cmath>

View File

@@ -12,8 +12,8 @@
namespace boost { namespace math {
// To stress test, set global_seed = 0, global_size = huge.
static const std::size_t global_seed = 0;
static const std::size_t global_size = 128;
static constexpr std::size_t global_seed = 0;
static constexpr std::size_t global_size = 128;
template<typename T, typename std::enable_if<std::is_floating_point<T>::value, bool>::type = true>
std::vector<T> generate_random_vector(std::size_t size, std::size_t seed)
@@ -35,6 +35,28 @@ std::vector<T> generate_random_vector(std::size_t size, std::size_t seed)
return v;
}
template<typename T, typename std::enable_if<std::is_floating_point<T>::value, bool>::type = true>
std::vector<T> generate_random_uniform_vector(std::size_t size, std::size_t seed, T lower_bound = T(0), T upper_bound = T(1))
{
if (seed == 0)
{
std::random_device rd;
seed = rd();
}
std::vector<T> v(size);
std::mt19937 gen(seed);
std::uniform_real_distribution<T> dis(lower_bound, upper_bound);
for (auto& i : v)
{
i = dis(gen);
}
return v;
}
template<typename T, typename std::enable_if<std::is_floating_point<T>::value, bool>::type = true>
std::vector<T> generate_random_vector(std::size_t size, std::size_t seed, T mean, T stddev)
{

View File

@@ -0,0 +1,34 @@
// (C) Copyright Matt Borland 2022.
// Use, modification and distribution are subject to the
// Boost Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#include <vector>
#include <algorithm>
#include <boost/math/tools/random_vector.hpp>
#include <boost/math/statistics/chatterjee_correlation.hpp>
#include <benchmark/benchmark.h>
using boost::math::generate_random_vector;
template <typename T>
void chatterjee_correlation(benchmark::State& state)
{
constexpr std::size_t seed {};
const std::size_t size = state.range(0);
std::vector<T> u = generate_random_vector<T>(size, seed);
std::vector<T> v = generate_random_vector<T>(size, seed);
std::sort(u.begin(), u.end());
for (auto _ : state)
{
benchmark::DoNotOptimize(boost::math::statistics::chatterjee_correlation(u, v));
}
state.SetComplexityN(state.range(0));
}
BENCHMARK_TEMPLATE(chatterjee_correlation, float)->RangeMultiplier(2)->Range(1 << 6, 1 << 20)->Complexity()->UseRealTime();
BENCHMARK_TEMPLATE(chatterjee_correlation, double)->RangeMultiplier(2)->Range(1 << 6, 1 << 20)->Complexity()->UseRealTime();
BENCHMARK_MAIN();

View File

@@ -1012,6 +1012,8 @@ test-suite misc :
[ run bivariate_statistics_test.cpp : : : [ requires cxx11_hdr_forward_list cxx11_hdr_atomic cxx11_hdr_thread cxx11_hdr_tuple cxx11_hdr_future cxx11_sfinae_expr ] [ check-target-builds ../config//is_cygwin_run "Cygwin CI run" : <build>no ] ]
[ run linear_regression_test.cpp : : : [ requires cxx11_hdr_forward_list cxx11_hdr_atomic cxx11_hdr_thread cxx11_hdr_tuple cxx11_hdr_future cxx11_sfinae_expr ] ]
[ run test_runs_test.cpp : : : [ requires cxx17_if_constexpr cxx17_std_apply ] ]
[ run test_chatterjee_correlation.cpp ../../test/build//boost_unit_test_framework ]
[ run test_rank.cpp ../../test/build//boost_unit_test_framework ]
[ run lanczos_smoothing_test.cpp ../../test/build//boost_unit_test_framework : : : [ requires cxx17_if_constexpr cxx17_std_apply ] ]
[ run condition_number_test.cpp ../../test/build//boost_unit_test_framework : : : [ check-target-builds ../config//has_float128 "GCC libquadmath and __float128 support" : <linkflags>"-Bstatic -lquadmath -Bdynamic" ] [ requires cxx17_if_constexpr cxx17_std_apply ] ]
[ run test_real_concept.cpp ../../test/build//boost_unit_test_framework ]

View File

@@ -0,0 +1,9 @@
// Copyright Matt Borland 2021.
// Use, modification and distribution are subject to the
// Boost Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
//
// Basic sanity check that header
// #includes all the files that it needs to.
//
#include <boost/math/statistics/chatterjee_correlation.hpp>

View File

@@ -384,6 +384,8 @@ int report_errors()
#define CHECK_ULP_CLOSE(X, Y, Z) boost::math::test::check_ulp_close((X), (Y), (Z), __FILE__, __func__, __LINE__)
#define CHECK_GE(X, Y) boost::math::test::check_le((Y), (X), __FILE__, __func__, __LINE__)
#define CHECK_LE(X, Y) boost::math::test::check_le((X), (Y), __FILE__, __func__, __LINE__)
#define CHECK_NAN(X) boost::math::test::check_nan((X), __FILE__, __func__, __LINE__)

View File

@@ -0,0 +1,159 @@
// (C) Copyright Matt Borland 2022.
// Use, modification and distribution are subject to the
// Boost Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#include <cstdint>
#include <cmath>
#include <vector>
#include <random>
#include <algorithm>
#include <utility>
#include <boost/math/statistics/chatterjee_correlation.hpp>
#include <boost/math/tools/random_vector.hpp>
#include <boost/math/constants/constants.hpp>
#include "math_unit_test.hpp"
// The Chatterjee correlation is invariant under:
// - Shuffles. (X_i, Y_i) -> (X_sigma(i), Y_sigma(i)), where sigma is a permutation.
// - Strictly monotone transformations: (X_i, Y_i) -> (f(X_i), g(Y_i)) where f' > 0 and g' > 0.
//
using boost::math::statistics::chatterjee_correlation;
template <typename Real>
void properties()
{
std::size_t vector_size = 256;
std::mt19937_64 mt(123521);
std::uniform_real_distribution<Real> unif(-1, 1);
std::vector<Real> X(vector_size);
std::vector<Real> Y(vector_size);
for (std::size_t i = 0; i < vector_size; ++i)
{
X[i] = unif(mt);
Y[i] = unif(mt);
}
std::sort(X.begin(), X.end());
Real coeff1 = chatterjee_correlation(X, Y);
// The minimum possible value of En(X, Y) is -1/2 + O(1/n)
CHECK_GE(coeff1, Real(-0.5));
CHECK_LE(coeff1, Real(1));
// Now apply a monotone function to the data
for (std::size_t i = 0; i < vector_size; ++i)
{
X[i] = Real(2.3)*X[i] - Real(7.3);
Y[i] = Real(7.6)*Y[i] - Real(8.6);
}
auto coeff3 = chatterjee_correlation(X, Y);
CHECK_EQUAL(coeff1, coeff3);
// If there are no ties among the Yis, the maximum possible value of Xi(X, Y) is (n - 2)/(n + 1), which is attained if Yi = Xi for all i
auto coeff = chatterjee_correlation(X, X);
// These floating point numbers are computed by two different methods, so we can expect some floating point error:
const auto n = X.size();
CHECK_ULP_CLOSE(coeff, Real(n-2)/Real(n+1), 1);
std::sort(Y.begin(), Y.end());
coeff = chatterjee_correlation(Y, Y);
CHECK_ULP_CLOSE(coeff, Real(n-2)/Real(n+1), 1);
}
template <typename Real>
void test_spots()
{
// Rank Order: Result will be 1 - 3*3 / (4^2 - 1) = 1 - 9/15 = 0.6
std::vector<Real> x = {1, 2, 3, 4};
std::vector<Real> y = {1, 2, 3, 4};
CHECK_ULP_CLOSE(chatterjee_correlation(x, y), 1 - Real(9)/15, 1);
// Reverse rank order should be the same as above
y = {4, 3, 2, 1};
CHECK_ULP_CLOSE(chatterjee_correlation(x, y), 1 - Real(9)/15, 1);
// Alternating order: 1 - 3*5 / (4^2 - 1) = 1 - 15/15 = 0
y = {1, 3, 2, 4};
CHECK_ULP_CLOSE(chatterjee_correlation(x, y), Real(0), 1);
// All ties will yield quiet NaN
y = {1, 1, 1, 1};
CHECK_NAN(chatterjee_correlation(x, y));
}
#ifdef BOOST_MATH_EXEC_COMPATIBLE
template <typename Real, typename ExecutionPolicy>
void test_threaded(ExecutionPolicy&& exec)
{
std::vector<Real> x = boost::math::generate_random_vector<Real>(1024, 0);
std::vector<Real> y = boost::math::generate_random_vector<Real>(1024, 1);
std::sort(std::forward<ExecutionPolicy>(exec), x.begin(), x.end());
auto seq_ans = chatterjee_correlation(x, y);
auto par_ans = chatterjee_correlation(exec, x, y);
CHECK_ULP_CLOSE(seq_ans, par_ans, 1);
};
#endif // BOOST_MATH_EXEC_COMPATIBLE
template <typename Real>
void test_paper()
{
constexpr Real two_pi = boost::math::constants::two_pi<Real>();
// Page 9 figure (a) y = x
std::vector<Real> x = boost::math::generate_random_uniform_vector<Real>(100, 0, -two_pi, two_pi);
std::sort(x.begin(), x.end());
auto result = chatterjee_correlation(x, x);
CHECK_MOLLIFIED_CLOSE(result, Real(0.970), 0.005);
// Page 9 figure (d) y = x^2
std::vector<Real> y = x;
for (auto& i : y)
{
i *= i;
}
result = chatterjee_correlation(x, y);
CHECK_MOLLIFIED_CLOSE(result, Real(0.941), 0.005);
// Page 9 figure (g) y = sin(x)
for (std::size_t i {}; i < x.size(); ++i)
{
y[i] = std::sin(x[i]);
}
result = chatterjee_correlation(x, y);
CHECK_MOLLIFIED_CLOSE(result, Real(0.885), 0.01);
}
int main(void)
{
properties<float>();
properties<double>();
properties<long double>();
test_spots<float>();
test_spots<double>();
test_spots<long double>();
#ifdef BOOST_MATH_EXEC_COMPATIBLE
test_threaded<float>(std::execution::par);
test_threaded<double>(std::execution::par);
test_threaded<long double>(std::execution::par);
test_threaded<float>(std::execution::par_unseq);
test_threaded<double>(std::execution::par_unseq);
test_threaded<long double>(std::execution::par_unseq);
#endif // BOOST_MATH_EXEC_COMPATIBLE
test_paper<float>();
test_paper<double>();
test_paper<long double>();
return boost::math::test::report_errors();
}

102
test/test_rank.cpp Normal file
View File

@@ -0,0 +1,102 @@
// (C) Copyright Matt Borland 2022.
// Use, modification and distribution are subject to the
// Boost Software License, Version 1.0. (See accompanying file
// LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
#include <cstdint>
#include <vector>
#include <boost/math/statistics/detail/rank.hpp>
#include <boost/math/tools/config.hpp>
#include "math_unit_test.hpp"
template <typename T>
void test()
{
std::vector<T> test_vals {T(1.0), T(3.2), T(2.4), T(5.6), T(4.1)};
auto rank_vector = boost::math::statistics::detail::rank(test_vals.begin(), test_vals.end());
CHECK_EQUAL(static_cast<std::size_t>(0), rank_vector[0]);
CHECK_EQUAL(static_cast<std::size_t>(2), rank_vector[1]);
CHECK_EQUAL(static_cast<std::size_t>(1), rank_vector[2]);
CHECK_EQUAL(static_cast<std::size_t>(4), rank_vector[3]);
CHECK_EQUAL(static_cast<std::size_t>(3), rank_vector[4]);
// Remove duplicates
test_vals.push_back(T(4.1));
test_vals.push_back(T(2.4));
rank_vector = boost::math::statistics::detail::rank(test_vals.begin(), test_vals.end());
// Check the size is correct and the ordering is not disrupted
CHECK_EQUAL(static_cast<std::size_t>(5), rank_vector.size());
CHECK_EQUAL(static_cast<std::size_t>(0), rank_vector[0]);
CHECK_EQUAL(static_cast<std::size_t>(2), rank_vector[1]);
CHECK_EQUAL(static_cast<std::size_t>(1), rank_vector[2]);
CHECK_EQUAL(static_cast<std::size_t>(4), rank_vector[3]);
CHECK_EQUAL(static_cast<std::size_t>(3), rank_vector[4]);
}
template <typename T>
void container_test()
{
std::vector<T> test_vals {T(1.0), T(3.2), T(2.4), T(5.6), T(4.1)};
auto rank_vector = boost::math::statistics::detail::rank(test_vals);
CHECK_EQUAL(static_cast<std::size_t>(0), rank_vector[0]);
CHECK_EQUAL(static_cast<std::size_t>(2), rank_vector[1]);
CHECK_EQUAL(static_cast<std::size_t>(1), rank_vector[2]);
CHECK_EQUAL(static_cast<std::size_t>(4), rank_vector[3]);
CHECK_EQUAL(static_cast<std::size_t>(3), rank_vector[4]);
}
#ifdef BOOST_MATH_EXEC_COMPATIBLE
#include <execution>
template<typename T, typename ExecutionPolicy>
void execution_test(ExecutionPolicy&& exec)
{
std::vector<T> test_vals {T(1.0), T(3.2), T(2.4), T(5.6), T(4.1)};
auto rank_vector = boost::math::statistics::detail::rank(exec, test_vals.begin(), test_vals.end());
CHECK_EQUAL(static_cast<std::size_t>(0), rank_vector[0]);
CHECK_EQUAL(static_cast<std::size_t>(2), rank_vector[1]);
CHECK_EQUAL(static_cast<std::size_t>(1), rank_vector[2]);
CHECK_EQUAL(static_cast<std::size_t>(4), rank_vector[3]);
CHECK_EQUAL(static_cast<std::size_t>(3), rank_vector[4]);
// Remove duplicates
test_vals.push_back(T(4.1));
test_vals.push_back(T(2.4));
rank_vector = boost::math::statistics::detail::rank(exec, test_vals.begin(), test_vals.end());
// Check the size is correct and the ordering is not disrupted
CHECK_EQUAL(static_cast<std::size_t>(5), rank_vector.size());
CHECK_EQUAL(static_cast<std::size_t>(0), rank_vector[0]);
CHECK_EQUAL(static_cast<std::size_t>(2), rank_vector[1]);
CHECK_EQUAL(static_cast<std::size_t>(1), rank_vector[2]);
CHECK_EQUAL(static_cast<std::size_t>(4), rank_vector[3]);
CHECK_EQUAL(static_cast<std::size_t>(3), rank_vector[4]);
}
#endif // BOOST_MATH_EXEC_COMPATIBLE
int main(void)
{
test<float>();
test<double>();
test<long double>();
container_test<float>();
container_test<double>();
container_test<long double>();
#ifdef BOOST_MATH_EXEC_COMPATIBLE
execution_test<float>(std::execution::par);
execution_test<double>(std::execution::par);
execution_test<long double>(std::execution::par);
#endif // BOOST_MATH_EXEC_COMPATIBLE
return boost::math::test::report_errors();
}