<html>
<head>
<meta charset="UTF-8">
<title>Introduction</title>
<link rel="stylesheet" href="../../math.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets Vsnapshot">
<link rel="home" href="../../index.html" title="Math Toolkit 4.2.1">
<link rel="up" href="../gd_opt.html" title="Gradient Based Optimizers">
<link rel="prev" href="../gd_opt.html" title="Gradient Based Optimizers">
<link rel="next" href="gradient_descent.html" title="Gradient Descent">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
<td align="center"><a href="../../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../gd_opt.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../gd_opt.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="gradient_descent.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="math_toolkit.gd_opt.introduction"></a><a class="link" href="introduction.html" title="Introduction">Introduction</a>
</h3></div></div></div>
<p>
Gradient-based optimizers are algorithms that use the gradient of a function
to iteratively locate local extrema of that function over a set of parameters.
This section describes the gradient-based optimizers provided by the library. The
optimizers are written with <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">differentiation</span><span class="special">::</span><span class="identifier">reverse_mode</span><span class="special">::</span><span class="identifier">rvar</span></code>
in mind; however, if a way to evaluate the function and its gradient is provided,
the optimizers should work in exactly the same way.
</p>
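<p>
As a minimal illustration of the point that only the objective's gradient needs to be
evaluable, the sketch below runs fixed-step gradient descent,
<code class="computeroutput">x &lt;- x - lr * grad f(x)</code>, on plain
<code class="computeroutput">std::vector&lt;double&gt;</code> with a hand-written gradient.
The function name <code class="computeroutput">simple_gradient_descent</code> and its
parameters are hypothetical and are not the Boost.Math API; no automatic differentiation
is involved.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Illustrative only: these names are hypothetical, not part of Boost.Math.
using vec = std::vector<double>;
using gradient_fn = std::function<vec(const vec&)>;

// Runs `iterations` steps of x <- x - lr * grad(x) and returns the final x.
vec simple_gradient_descent(const gradient_fn& grad, vec x,
                            double lr, std::size_t iterations)
{
    assert(lr > 0.0);
    for (std::size_t k = 0; k < iterations; ++k) {
        vec g = grad(x);
        assert(g.size() == x.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] -= lr * g[i];  // step against the gradient
        }
    }
    return x;
}
```

<p>
For example, on f(x) = (x0 - 3)^2 + 2(x1 + 1)^2 with lr = 0.1, each step multiplies the
error in x0 by 0.8 and the error in x1 by 0.6, so the iterate approaches (3, -1).
</p>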
<p>
The tables below summarize the intended usage patterns of the provided
optimizers and policies. They are meant as a practical guide rather than
a strict prescription.
</p>
<h2>
<a name="math_toolkit.gd_opt.introduction.h0"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.table-optimizers"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.table-optimizers">List
of Optimizers</a>
</h2>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Optimizer
</p>
</th>
<th>
<p>
Order
</p>
</th>
<th>
<p>
Uses Curvature
</p>
</th>
<th>
<p>
Memory Cost
</p>
</th>
<th>
<p>
Intended Problem Class
</p>
</th>
<th>
<p>
When to Use
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
Gradient descent
</p>
</td>
<td>
<p>
first
</p>
</td>
<td>
<p>
no
</p>
</td>
<td>
<p>
low
</p>
</td>
<td>
<p>
Smooth, well-scaled objectives
</p>
</td>
<td>
<p>
Baseline method; debugging; when transparent, predictable behavior matters
</p>
</td>
</tr>
<tr>
<td>
<p>
Nesterov accelerated gradient
</p>
</td>
<td>
<p>
first
</p>
</td>
<td>
<p>
no
</p>
</td>
<td>
<p>
low
</p>
</td>
<td>
<p>
Ill-conditioned or narrow-valley problems
</p>
</td>
<td>
<p>
When plain gradient descent converges slowly or oscillates
</p>
</td>
</tr>
<tr>
<td>
<p>
L-BFGS
</p>
</td>
<td>
<p>
quasi-second order
</p>
</td>
<td>
<p>
approximate
</p>
</td>
<td>
<p>
medium
</p>
</td>
<td>
<p>
Smooth, deterministic objectives
</p>
</td>
<td>
<p>
When gradients are reliable and faster convergence is needed
</p>
</td>
</tr>
</tbody>
</table></div>
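<p>
The look-ahead idea behind Nesterov accelerated gradient can be sketched in its common
momentum form: the gradient is evaluated at x + mu v rather than at x, then
v &lt;- mu v - lr grad f(x + mu v) and x &lt;- x + v. This stand-alone sketch assumes a
hand-written gradient and uses a hypothetical function name and parameters
(<code class="computeroutput">lr</code>, <code class="computeroutput">mu</code>); the
Boost.Math implementation may parameterize the method differently.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Illustrative only: these names are hypothetical, not part of Boost.Math.
using vec = std::vector<double>;
using gradient_fn = std::function<vec(const vec&)>;

// Nesterov accelerated gradient in momentum form:
//   v <- mu * v - lr * grad(x + mu * v)
//   x <- x + v
vec nesterov_sketch(const gradient_fn& grad, vec x, double lr, double mu,
                    std::size_t iterations)
{
    assert(lr > 0.0 && mu >= 0.0 && mu < 1.0);
    vec v(x.size(), 0.0);      // velocity, starts at rest
    vec lookahead(x.size());
    for (std::size_t k = 0; k < iterations; ++k) {
        for (std::size_t i = 0; i < x.size(); ++i)
            lookahead[i] = x[i] + mu * v[i];
        vec g = grad(lookahead);  // gradient at the look-ahead point
        for (std::size_t i = 0; i < x.size(); ++i) {
            v[i] = mu * v[i] - lr * g[i];
            x[i] += v[i];
        }
    }
    return x;
}
```

<p>
On f(x) = (x0^2 + 100 x1^2) / 2 with lr = 0.009, plain gradient descent shrinks the error
in x0 by only a factor of 0.991 per step; this slow, ill-conditioned direction is the kind
of case momentum is meant to help with.
</p>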
<h2>
<a name="math_toolkit.gd_opt.introduction.h1"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.table-optimizer-policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.table-optimizer-policies">Optimizer
Policies</a>
</h2>
<h5>
<a name="math_toolkit.gd_opt.introduction.h2"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.initialization_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.initialization_policies">Initialization
Policies</a>
</h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Use case
</p>
</th>
<th>
<p>
Responsibilities
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
tape_initializer_rvar
</p>
</td>
<td>
<p>
User initializes all variables manually
</p>
</td>
<td>
<p>
Initializes tape.
</p>
</td>
</tr>
<tr>
<td>
<p>
random_uniform_initializer_rvar
</p>
</td>
<td>
<p>
Initializes each variable with a uniformly distributed random number between
a minimum and a maximum value
</p>
</td>
<td>
<p>
Initializes variables. Initializes tape.
</p>
</td>
</tr>
<tr>
<td>
<p>
costant_initializer_rvar
</p>
</td>
<td>
<p>
Initializes all variables with a constant value
</p>
</td>
<td>
<p>
Initializes variables. Initializes tape.
</p>
</td>
</tr>
</tbody>
</table></div>
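<p>
The variable-filling part of the constant and random-uniform initialization policies
amounts to something like the following sketch. It works on plain doubles under stated
assumptions: the helper names are hypothetical, the seed parameter exists only to make runs
reproducible, and the tape initialization that the <code class="computeroutput">rvar</code>
policies also perform is not modelled. <code class="computeroutput">std::uniform_real_distribution</code>
draws from the half-open range [lo, hi).
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: every parameter starts at the same value.
std::vector<double> constant_init(std::size_t n, double value)
{
    return std::vector<double>(n, value);
}

// Hypothetical helper: every parameter is drawn uniformly from [lo, hi).
// A fixed seed makes the starting point reproducible.
std::vector<double> uniform_init(std::size_t n, double lo, double hi,
                                 unsigned seed)
{
    assert(lo < hi);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(lo, hi);
    std::vector<double> x(n);
    for (double& xi : x) xi = dist(gen);
    return x;
}
```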
<h5>
<a name="math_toolkit.gd_opt.introduction.h3"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.evaluation_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.evaluation_policies">Evaluation
Policies</a>
</h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Use case
</p>
</th>
<th>
<p>
Responsibilities
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
reverse_mode_function_eval_policy
</p>
</td>
<td>
<p>
Default. Use with Boost reverse-mode autodiff.
</p>
</td>
<td>
<p>
Tells the optimizer how to evaluate the objective.
</p>
</td>
</tr>
<tr>
<td>
<p>
reverse_mode_gradient_evaluation_policy
</p>
</td>
<td>
<p>
Default. Use with Boost reverse-mode autodiff.
</p>
</td>
<td>
<p>
Tells the optimizer how to evaluate the gradient of the objective.
</p>
</td>
</tr>
</tbody>
</table></div>
<p>
These policies are intended for use with Boost reverse-mode autodiff. To
use the optimizers with a custom AD variable type, or with a manually
supplied gradient of the objective, see the documentation on policies
for how these policies are implemented.
</p>
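<p>
Without an AD type, a custom gradient evaluation has to produce the gradient vector some
other way. One simple option, shown below as a hypothetical helper rather than the policy
interface described in the policies documentation, is a central finite difference,
g<sub>i</sub> &asymp; (f(x + h e<sub>i</sub>) - f(x - h e<sub>i</sub>)) / (2h). Its accuracy
depends on the step h, so an analytic gradient is preferable when one is available.
</p>

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using vec = std::vector<double>;

// Hypothetical helper, not the Boost.Math policy interface.
// Central-difference gradient: g_i ~= (f(x + h e_i) - f(x - h e_i)) / (2h).
vec central_difference_gradient(const std::function<double(const vec&)>& f,
                                vec x, double h = 1e-6)
{
    assert(h > 0.0);
    vec g(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double xi = x[i];
        x[i] = xi + h;
        const double f_plus = f(x);
        x[i] = xi - h;
        const double f_minus = f(x);
        x[i] = xi;  // restore the coordinate before moving on
        g[i] = (f_plus - f_minus) / (2.0 * h);
    }
    return g;
}
```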
<h2>
<a name="math_toolkit.gd_opt.introduction.h4"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.line-search-policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.line-search-policies">L-BFGS Line
Search Policies</a>
</h2>
<p>
The table below summarizes the two line search policies provided for use
with L-BFGS.
</p>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Enforced Conditions
</p>
</th>
<th>
<p>
Per-iteration cost
</p>
</th>
<th>
<p>
Convergence
</p>
</th>
<th>
<p>
Use case
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
Strong Wolfe
</p>
</td>
<td>
<p>
Sufficient decrease and curvature conditions
</p>
</td>
<td>
<p>
higher
</p>
</td>
<td>
<p>
faster
</p>
</td>
<td>
<p>
Most problems
</p>
</td>
</tr>
<tr>
<td>
<p>
Armijo
</p>
</td>
<td>
<p>
Sufficient decrease only
</p>
</td>
<td>
<p>
lower
</p>
</td>
<td>
<p>
slower
</p>
</td>
<td>
<p>
Expert use, when cheaper iterations are worth giving up the curvature condition
</p>
</td>
</tr>
</tbody>
</table></div>
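<p>
The two rows of the table differ in which conditions an accepted step length a must
satisfy. Writing phi(a) = f(x + a p) for a descent direction p, the Armijo (sufficient
decrease) condition is phi(a) &le; phi(0) + c1 a phi'(0), and the strong Wolfe conditions
add the curvature condition |phi'(a)| &le; c2 |phi'(0)|, with 0 &lt; c1 &lt; c2 &lt; 1.
The sketch below checks both conditions and implements a simple Armijo backtracking loop.
The function names and default constants (c1 = 1e-4, halving the step) are illustrative
assumptions, and a full strong Wolfe search (bracketing plus zoom) is omitted for brevity.
</p>

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// phi(a) = f(x + a p) and dphi(a) = grad f(x + a p) . p restrict the
// objective to the search direction p. Hypothetical helpers for illustration.

// Sufficient decrease (Armijo) condition.
bool armijo_holds(double phi0, double dphi0, double phi_a, double a, double c1)
{
    return phi_a <= phi0 + c1 * a * dphi0;
}

// Strong Wolfe = Armijo plus |dphi(a)| <= c2 * |dphi(0)|.
bool strong_wolfe_holds(double phi0, double dphi0, double phi_a,
                        double dphi_a, double a, double c1, double c2)
{
    return armijo_holds(phi0, dphi0, phi_a, a, c1)
        && std::abs(dphi_a) <= c2 * std::abs(dphi0);
}

// Backtracking line search enforcing only the Armijo condition:
// shrink the step by `rho` until sufficient decrease is achieved.
double armijo_backtracking(const std::function<double(double)>& phi,
                           double dphi0, double a0 = 1.0,
                           double c1 = 1e-4, double rho = 0.5,
                           int max_shrinks = 60)
{
    assert(dphi0 < 0.0 && "p must be a descent direction");
    const double phi0 = phi(0.0);
    double a = a0;
    for (int k = 0; k < max_shrinks; ++k) {
        if (armijo_holds(phi0, dphi0, phi(a), a, c1)) return a;
        a *= rho;
    }
    return a;  // give up and return the smallest step tried
}
```

<p>
For f(x) = x^2 at x = 1 with p = -2, the unit step overshoots and backtracking returns
a = 0.5, which lands on the minimizer. A short step such as a = 0.01 passes the Armijo test
but fails the strong Wolfe curvature test with c2 = 0.9, which is how Wolfe-based searches
reject steps that make too little progress.
</p>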
<h2>
<a name="math_toolkit.gd_opt.introduction.h5"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.minimizer-policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.minimizer-policies">Minimizer
Policies</a>
</h2>
<h5>
<a name="math_toolkit.gd_opt.introduction.h6"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.convergence_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.convergence_policies">Convergence
Policies</a>
</h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Criterion
</p>
</th>
<th>
<p>
When to Use
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
gradient_norm_convergence_policy
</p>
</td>
<td>
<p>
gradient norm &lt; tol
</p>
</td>
<td>
<p>
Default. Stationarity-based condition
</p>
</td>
</tr>
<tr>
<td>
<p>
objective_tol_convergence_policy
</p>
</td>
<td>
<p>
Absolute change in the objective between successive iterations is below tol
</p>
</td>
<td>
<p>
Well-scaled objectives
</p>
</td>
</tr>
<tr>
<td>
<p>
relative_objective_tol_policy
</p>
</td>
<td>
<p>
Relative change in the objective between successive iterations is below tol
</p>
</td>
<td>
<p>
Scale-invariant convergence
</p>
</td>
</tr>
<tr>
<td>
<p>
combined_convergence_policy
</p>
</td>
<td>
<p>
Logical OR of several convergence policies
</p>
</td>
<td>
<p>
When several convergence conditions should apply at once
</p>
</td>
</tr>
</tbody>
</table></div>
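<p>
The criteria in the convergence table reduce to simple comparisons. The stand-alone
functions below are hypothetical equivalents written for illustration; in particular, the
relative test here is scaled by max(|f_prev|, |f_curr|), which is one common choice and may
differ from the library's exact definition. The OR-combination uses a C++17 fold expression.
</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-alone versions of the convergence criteria.

double l2_norm(const std::vector<double>& g)
{
    double s = 0.0;
    for (double gi : g) s += gi * gi;
    return std::sqrt(s);
}

// Stationarity: ||grad f|| < tol.
bool gradient_norm_converged(const std::vector<double>& g, double tol)
{
    assert(tol > 0.0);
    return l2_norm(g) < tol;
}

// Absolute change in the objective between two iterations.
bool objective_abs_converged(double f_prev, double f_curr, double tol)
{
    return std::abs(f_curr - f_prev) < tol;
}

// Relative change; multiplying instead of dividing avoids a division
// by zero when both objective values are 0.
bool objective_rel_converged(double f_prev, double f_curr, double tol)
{
    return std::abs(f_curr - f_prev)
        <= tol * std::max(std::abs(f_prev), std::abs(f_curr));
}

// Logical OR of any number of convergence checks.
template <class... Checks>
bool any_converged(Checks... checks)
{
    return (false || ... || checks);
}
```

<p>
For example, a change of 1 in an objective of magnitude 1e12 fails an absolute tolerance of
1e-6 but passes a relative tolerance of 1e-6, which is why the relative test is described as
scale-invariant.
</p>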
<h5>
<a name="math_toolkit.gd_opt.introduction.h7"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.termination_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.termination_policies">Termination
Policies</a>
</h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Controls
</p>
</th>
<th>
<p>
When to Use
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
max_iter_termination_policy
</p>
</td>
<td>
<p>
iteration count
</p>
</td>
<td>
<p>
Hard safety bound (almost always recommended)
</p>
</td>
</tr>
<tr>
<td>
<p>
wallclock_termination_policy
</p>
</td>
<td>
<p>
Wall-clock time
</p>
</td>
<td>
<p>
Benchmarking; real-time constraints
</p>
</td>
</tr>
</tbody>
</table></div>
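<p>
Combining an iteration cap with a wall-clock budget can be sketched with
<code class="computeroutput">std::chrono::steady_clock</code>, which is monotonic and
therefore suitable for measuring elapsed time. The
<code class="computeroutput">termination_guard</code> class is hypothetical and only mirrors
the two rows of the table; it is not the interface of the Boost.Math termination policies.
</p>

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical termination guard: stop after `max_iter` iterations or
// once the wall-clock budget measured from construction has elapsed.
class termination_guard {
public:
    termination_guard(std::size_t max_iter, std::chrono::milliseconds budget)
        : max_iter_(max_iter),
          deadline_(std::chrono::steady_clock::now() + budget)
    {
        assert(max_iter > 0);
    }

    // Returns true when the optimizer loop should stop.
    bool should_stop(std::size_t iter) const
    {
        return iter >= max_iter_
            || std::chrono::steady_clock::now() >= deadline_;
    }

private:
    std::size_t max_iter_;
    std::chrono::steady_clock::time_point deadline_;
};
```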
<h5>
<a name="math_toolkit.gd_opt.introduction.h8"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.constraint_and_projection_polici"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.constraint_and_projection_polici">Constraint
and Projection Policies</a>
</h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Constraint Type
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
unconstrained_policy
</p>
</td>
<td>
<p>
No constraint
</p>
</td>
</tr>
<tr>
<td>
<p>
box_constraints
</p>
</td>
<td>
<p>
Clip each component to its lower/upper bounds
</p>
</td>
</tr>
<tr>
<td>
<p>
nonnegativity_constraint
</p>
</td>
<td>
<p>
Set every negative component to 0
</p>
</td>
</tr>
<tr>
<td>
<p>
l2_ball_constraint
</p>
</td>
<td>
<p>
2-norm(x) &le; r
</p>
</td>
</tr>
<tr>
<td>
<p>
l1_ball_constraint
</p>
</td>
<td>
<p>
1-norm(x) &le; r
</p>
</td>
</tr>
<tr>
<td>
<p>
simplex_constraint
</p>
</td>
<td>
<p>
Probability simplex: all x<sub>i</sub> &ge; 0 and sum of x<sub>i</sub> = 1
</p>
</td>
</tr>
<tr>
<td>
<p>
function_constraint
</p>
</td>
<td>
<p>
Wrapper around a custom, user-provided function
</p>
</td>
</tr>
<tr>
<td>
<p>
unit_sphere_constraint
</p>
</td>
<td>
<p>
2-norm(x) = 1
</p>
</td>
</tr>
</tbody>
</table></div>
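<p>
Projection-based constraints replace each iterate by the nearest feasible point. The sketch
below shows hypothetical stand-alone projections for several rows of the table: clipping for
box constraints (and, with bounds [0, +inf), nonnegativity), radial rescaling for the l2 ball
and the unit sphere, and the standard sort-based projection onto the probability simplex.
The function names are illustrative, not the library's, and the code assumes C++17 for
<code class="computeroutput">std::clamp</code>.
</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using vec = std::vector<double>;

// Box: clip each coordinate to [lo, hi]. lo = 0, hi = +inf gives nonnegativity.
void project_box(vec& x, double lo, double hi)
{
    assert(lo <= hi);
    for (double& xi : x) xi = std::clamp(xi, lo, hi);
}

double norm2(const vec& x)
{
    double s = 0.0;
    for (double xi : x) s += xi * xi;
    return std::sqrt(s);
}

// l2 ball of radius r: rescale onto the sphere if x lies outside.
void project_l2_ball(vec& x, double r)
{
    assert(r > 0.0);
    const double n = norm2(x);
    if (n > r)
        for (double& xi : x) xi *= r / n;
}

// Unit sphere: normalize (the projection is undefined for x = 0).
void project_unit_sphere(vec& x)
{
    const double n = norm2(x);
    assert(n > 0.0);
    for (double& xi : x) xi /= n;
}

// Probability simplex {x : x_i >= 0, sum x_i = 1}, sort-based algorithm:
// find the threshold theta and set x_i <- max(x_i - theta, 0).
void project_simplex(vec& x)
{
    assert(!x.empty());
    vec u = x;
    std::sort(u.begin(), u.end(), std::greater<double>());
    double cumsum = 0.0;
    double theta = 0.0;
    for (std::size_t j = 0; j < u.size(); ++j) {
        cumsum += u[j];
        const double t = (cumsum - 1.0) / static_cast<double>(j + 1);
        if (u[j] - t > 0.0) theta = t;  // keep the last index passing the test
    }
    for (double& xi : x) xi = std::max(xi - theta, 0.0);
}
```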
</div>
<div class="copyright-footer">Copyright © 2006-2021 Nikhar Agrawal, Anton Bikineev, Matthew Borland,
Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno
Lalande, John Maddock, Evan Miller, Jeremy Murphy, Matthew Pulver, Johan Råde,
Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg, Daryle
Walker, Xiaogang Zhang, and Maksym Zhelyeznyakov<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../gd_opt.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../gd_opt.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="gradient_descent.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>