#include <boost/math/optimization/gradient_descent.hpp>

template<typename ArgumentContainer, typename RealType, class Objective,
         class InitializationPolicy, class ObjectiveEvalPolicy, class GradEvalPolicy>
class gradient_descent {
public:
    void step();
};

/* Convenience overloads */

/* make gradient descent by providing
 *  - objective function
 *  - variables to optimize over
 *  - optionally, learning rate
 *
 * requires that code is written using boost::math::differentiation::rvar
 */
template<class Objective, typename ArgumentContainer, typename RealType>
auto make_gradient_descent(Objective&& obj, ArgumentContainer& x,
                           RealType lr = RealType{0.01});

/* make gradient descent by providing
 *  - objective function
 *  - variables to optimize over
 *  - learning rate (not optional)
 *  - initialization policy
 *
 * requires that code is written using boost::math::differentiation::rvar
 */
template<class Objective, typename ArgumentContainer, typename RealType,
         class InitializationPolicy>
auto make_gradient_descent(Objective&& obj, ArgumentContainer& x, RealType lr,
                           InitializationPolicy&& ip);

/* make gradient descent by providing
 *  - objective function
 *  - variables to optimize over
 *  - learning rate (not optional)
 *  - variable initialization policy
 *  - objective evaluation policy
 *  - gradient evaluation policy
 *
 * code does not have to use boost::math::differentiation::rvar
 */
template<typename ArgumentContainer, typename RealType, class Objective,
         class InitializationPolicy, class ObjectiveEvalPolicy, class GradEvalPolicy>
auto make_gradient_descent(Objective&& obj, ArgumentContainer& x, RealType& lr,
                           InitializationPolicy&& ip, ObjectiveEvalPolicy&& oep,
                           GradEvalPolicy&& gep);
Gradient descent iteratively updates the parameters x
in the direction opposite to the gradient of the objective function, thereby
minimizing the objective:

x[i] -= lr * g[i]

where lr is a user-defined learning rate. For a more complete description of
the underlying theory, see the Wikipedia page.
The implementation delegates:

- the initialization of differentiable variables to an initialization policy
- objective evaluation to an objective evaluation policy
- the gradient computation to a gradient evaluation policy
- the parameter updates to an update policy
The interface is intended to be PyTorch-like: an optimizer object is
constructed and advanced with a step() method. A helper minimize
function is also provided.
Objective&& obj : the objective function to minimize.

ArgumentContainer& x : the variables to optimize over.

RealType& lr : the learning rate. A larger value takes larger steps during
descent, leading to faster but less stable convergence; conversely, smaller
values are more stable but take longer to converge.

InitializationPolicy&& ip : initialization policy for optimizer state and
variables. Users may supply a custom initialization policy to control how the
argument container and any AD-specific runtime state (e.g. reverse-mode tape
attachment/reset) are initialized. By default, the optimizer uses the
user-provided initial values in x and performs the standard reverse-mode AD
initialization required for gradient evaluation. Custom initialization
policies are useful for randomized starts, non-rvar AD types, or when
gradients are supplied externally. See the reverse-mode autodiff policy
documentation for the required initialization policy interface when writing
custom policies.

ObjectiveEvalPolicy&& oep : tells the optimizer how to evaluate the objective
function. Defaults to reverse_mode_function_eval_policy<RealType>.

GradEvalPolicy&& gep : tells the optimizer how to evaluate the gradient of
the objective function. Defaults to
reverse_mode_gradient_evaluation_policy<RealType>.
In this section we present an example of finding optimal configurations of
electrically charged particles confined to a sphere of radius R = 1. This
problem is known as the Thomson problem. In summary, we are looking for the
minimum-energy configuration of an N-electron system subject to the Coulomb
potential, confined to the $S^2$ sphere. The Coulomb potential is given by:

$E = \sum_{i < j} \frac{1}{\lVert \mathbf{r}_i - \mathbf{r}_j \rVert}$
The code below manually minimizes the above potential energy function for N particles over their two angular positions.
#include <boost/math/differentiation/autodiff_reverse.hpp>
#include <boost/math/optimization/gradient_descent.hpp>
#include <boost/math/optimization/minimizer.hpp>
#include <cmath>
#include <fstream>
#include <iostream>
#include <random>
#include <string>

namespace rdiff = boost::math::differentiation::reverse_mode;
namespace bopt  = boost::math::optimization;

double random_double(double min = 0.0, double max = 1.0)
{
    static thread_local std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> dist(min, max);
    return dist(rng);
}

template<typename S>
struct vec3
{
    /**
     * @brief R^3 coordinates of particle on Thomson sphere
     */
    S x, y, z;
};

template<class S>
static inline vec3<S> sph_to_xyz(const S& theta, const S& phi)
{
    /**
     * convenience overload to convert from [theta, phi] -> (x, y, z)
     */
    return {sin(theta) * cos(phi), sin(theta) * sin(phi), cos(theta)};
}

template<typename T>
T thomson_energy(std::vector<T>& r)
{
    const size_t N = r.size() / 2;
    const T tiny = T(1e-12);
    T E = 0;
    for (size_t i = 0; i < N; ++i)
    {
        const T& theta_i = r[2 * i + 0];
        const T& phi_i   = r[2 * i + 1];
        auto ri = sph_to_xyz(theta_i, phi_i);
        for (size_t j = i + 1; j < N; ++j)
        {
            const T& theta_j = r[2 * j + 0];
            const T& phi_j   = r[2 * j + 1];
            auto rj = sph_to_xyz(theta_j, phi_j);
            T dx = ri.x - rj.x;
            T dy = ri.y - rj.y;
            T dz = ri.z - rj.z;
            T d2 = dx * dx + dy * dy + dz * dz + tiny;
            E += 1.0 / sqrt(d2);
        }
    }
    return E;
}

template<class T>
std::vector<rdiff::rvar<T, 1>> init_theta_phi_uniform(size_t N, unsigned seed = 12345)
{
    const T pi = T(3.1415926535897932384626433832795);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<T> unif01(T(0), T(1));
    std::uniform_real_distribution<T> unifm11(T(-1), T(1));
    std::vector<rdiff::rvar<T, 1>> u;
    u.reserve(2 * N);
    for (size_t i = 0; i < N; ++i)
    {
        T z     = unifm11(rng);
        T phi   = (T(2) * pi) * unif01(rng) - pi;
        T theta = std::acos(z);
        u.emplace_back(theta);
        u.emplace_back(phi);
    }
    return u;
}

int main(int argc, char* argv[])
{
    if (argc != 2)
    {
        std::cerr << "Usage: " << argv[0] << " <N>\n";
        return 1;
    }
    const int N      = std::stoi(argv[1]);
    const int NSTEPS = 100000;
    const double lr  = 1e-3;

    auto u_ad  = init_theta_phi_uniform<double>(N);
    auto gdopt = bopt::make_gradient_descent(&thomson_energy<rdiff::rvar<double, 1>>, u_ad, lr);

    // filenames
    std::string pos_filename    = "thomson_" + std::to_string(N) + ".csv";
    std::string energy_filename = "energy_" + std::to_string(N) + ".csv";
    std::ofstream pos_out(pos_filename);
    std::ofstream energy_out(energy_filename);
    pos_out << "step,particle,x,y,z\n";
    energy_out << "step,energy\n";

    for (int step = 0; step < NSTEPS; ++step)
    {
        gdopt.step();
        for (int pi = 0; pi < N; ++pi)
        {
            double theta = u_ad[2 * pi + 0].item();
            double phi   = u_ad[2 * pi + 1].item();
            auto r = sph_to_xyz(theta, phi);
            pos_out << step << "," << pi << "," << r.x << "," << r.y << "," << r.z << "\n";
        }
        auto E = gdopt.objective_value();
        energy_out << step << "," << E << "\n";
    }
    pos_out.close();
    energy_out.close();
    return 0;
}
The variable

const int N = std::stoi(argv[1]);

is the number of particles, read from the command line;

const int NSTEPS = 100000;

is the number of optimization steps; and

const double lr = 1e-3;

is the optimizer learning rate. As the code is written, the optimizer runs
for 100000 steps. Running the program with
./thomson_sphere N
optimizes the N-particle system. Below is a plot of several optimal configurations for N = 2, ..., 8 particles.
Below is a plot of the final energy of the system and its deviation from the theoretically predicted values. The table of theoretical energy values for the problem is from Wikipedia.
[Figure: thomson_energy_error_gradient_descent.svg]
Often, we don't want to implement our own stepping loop; instead, we want the optimizer to stop once a convergence criterion is met. In the above example, we need to include the minimizer.hpp header:
#include <boost/math/optimization/minimizer.hpp>
and replace the optimization loop:
for (int step = 0; step < NSTEPS; ++step)
{
    gdopt.step();
    for (int pi = 0; pi < N; ++pi)
    {
        double theta = u_ad[2 * pi + 0].item();
        double phi   = u_ad[2 * pi + 1].item();
        auto r = sph_to_xyz(theta, phi);
        pos_out << step << "," << pi << "," << r.x << "," << r.y << "," << r.z << "\n";
    }
    auto E = gdopt.objective_value();
    energy_out << step << "," << E << "\n";
}
with
auto result = minimize(gdopt);
minimize returns an
optimization_result<typename Optimizer::real_type_t>
, a struct with the following fields:
size_t num_iter;
RealType objective_value;
std::vector<RealType> objective_history;
bool converged;
where num_iter is the number of iterations the optimizer performed,
objective_value is the final objective value, objective_history holds the
intermediate objective values, and converged indicates whether the
convergence criterion was satisfied. By default, minimize(optimizer)
uses a gradient-norm convergence criterion: the criterion is satisfied when
norm(gradient_vector) < 1e-3. The maximum number of iterations is set at
100000. For more information on how to use minimize,
check the minimize docs. With default parameters, gradient descent solves
the N=2 problem in 93799 steps.