<html>
<head>
<meta charset="UTF-8">
<title>Introduction</title>
<link rel="stylesheet" href="../../math.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets Vsnapshot">
<link rel="home" href="../../index.html" title="Math Toolkit 4.2.1">
<link rel="up" href="../gd_opt.html" title="Gradient Based Optimizers">
<link rel="prev" href="../gd_opt.html" title="Gradient Based Optimizers">
<link rel="next" href="gradient_descent.html" title="Gradient Descent">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
<td align="center"><a href="../../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../gd_opt.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../gd_opt.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="gradient_descent.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="math_toolkit.gd_opt.introduction"></a><a class="link" href="introduction.html" title="Introduction">Introduction</a>
</h3></div></div></div>
<p>
Gradient-based optimizers are algorithms that use the gradient of a function
to iteratively find local extrema of that function over a set of parameters.
This section describes a set of gradient-based optimizers. The
optimizers are written with <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">differentiation</span><span class="special">::</span><span class="identifier">reverse_mode</span><span class="special">::</span><span class="identifier">rvar</span></code>
in mind; however, if a way to evaluate the function and its gradient is provided,
the optimizers should work in exactly the same way.
</p>
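<p>
To make the idea of iterating on a gradient concrete, here is a minimal,
library-independent sketch of fixed-step gradient descent. It does not use
the optimizer classes documented in this section: the names
<code class="computeroutput">Point</code>, <code class="computeroutput">objective</code>,
<code class="computeroutput">gradient</code> and <code class="computeroutput">gradient_descent</code>,
the test objective and the hand-written gradient are all assumptions made
for illustration.
</p>
<pre class="programlisting">
```cpp
// Illustrative sketch only; none of these names belong to Boost.Math.
// Objective f(x, y) = (x - 1)^2 + 2 (y + 2)^2, minimized at (1, -2).
struct Point { double x; double y; };

double objective(Point p) {
    return (p.x - 1.0) * (p.x - 1.0) + 2.0 * (p.y + 2.0) * (p.y + 2.0);
}

// Analytic gradient of the objective above.
Point gradient(Point p) {
    return Point{2.0 * (p.x - 1.0), 4.0 * (p.y + 2.0)};
}

// Fixed-step gradient descent: repeat p = p - lr * grad f(p) until the
// squared gradient norm drops below tol^2 or max_iter steps were taken.
Point gradient_descent(Point start, double lr, double tol, int max_iter) {
    Point p = start;
    for (int i = 0; i < max_iter; ++i) {
        Point g = gradient(p);
        if (g.x * g.x + g.y * g.y < tol * tol) {
            break;  // stationary to within tol
        }
        p.x -= lr * g.x;
        p.y -= lr * g.y;
    }
    return p;
}
```
</pre>
<p>
Calling <code class="computeroutput">gradient_descent(Point{0.0, 0.0}, 0.1, 1e-8, 1000)</code>
moves the iterate from the origin towards the minimizer at (1, -2). The
gradient test and the iteration cap play the roles that the convergence and
termination policies play in the tables further down.
</p>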
<p>
The tables below summarize the intended usage patterns of the provided
optimizers and policies. They are meant as a practical guide rather than a
strict prescription:
</p>
<h2>
<a name="math_toolkit.gd_opt.introduction.h0"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.table-optimizers"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.table-optimizers">List
of Optimizers</a>
</h2>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Optimizer
</p>
</th>
<th>
<p>
Order
</p>
</th>
<th>
<p>
Uses Curvature
</p>
</th>
<th>
<p>
Memory Cost
</p>
</th>
<th>
<p>
Intended Problem Class
</p>
</th>
<th>
<p>
When to Use
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
gradient descent
</p>
</td>
<td>
<p>
first
</p>
</td>
<td>
<p>
no
</p>
</td>
<td>
<p>
low
</p>
</td>
<td>
<p>
Smooth, well-scaled objectives
</p>
</td>
<td>
<p>
Baseline method; debugging; when behavior transparency matters
</p>
</td>
</tr>
<tr>
<td>
<p>
Nesterov accelerated gradient
</p>
</td>
<td>
<p>
first
</p>
</td>
<td>
<p>
no
</p>
</td>
<td>
<p>
low
</p>
</td>
<td>
<p>
Ill-conditioned or narrow-valley problems
</p>
</td>
<td>
<p>
When plain gradient descent converges slowly or oscillates
</p>
</td>
</tr>
<tr>
<td>
<p>
L-BFGS
</p>
</td>
<td>
<p>
quasi second order
</p>
</td>
<td>
<p>
approximate
</p>
</td>
<td>
<p>
medium
</p>
</td>
<td>
<p>
Smooth, deterministic objectives
</p>
</td>
<td>
<p>
When gradients are reliable and faster convergence is needed
</p>
</td>
</tr>
</tbody>
</table></div>
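<p>
The difference between the first two rows can be illustrated with a small
standalone sketch that compares plain gradient descent with a Nesterov-style
look-ahead update on an ill-conditioned quadratic. This is not the library's
implementation: the names <code class="computeroutput">Vec2</code>,
<code class="computeroutput">grad</code>, <code class="computeroutput">run_gd</code>
and <code class="computeroutput">run_nag</code>, the objective, and the step
size and momentum values are assumptions chosen for illustration.
</p>
<pre class="programlisting">
```cpp
// Illustrative sketch only; none of these names belong to Boost.Math.
struct Vec2 { double x; double y; };

// Gradient of the ill-conditioned quadratic f(x, y) = 0.5 * (x^2 + 100 y^2).
Vec2 grad(Vec2 p) { return Vec2{p.x, 100.0 * p.y}; }

// Plain gradient descent with a fixed step size lr.
Vec2 run_gd(Vec2 p, double lr, int iters) {
    for (int i = 0; i < iters; ++i) {
        Vec2 g = grad(p);
        p.x -= lr * g.x;
        p.y -= lr * g.y;
    }
    return p;
}

// Nesterov accelerated gradient: the gradient is evaluated at the
// look-ahead point p + mu * v instead of at p itself, and the velocity v
// carries momentum from one step to the next.
Vec2 run_nag(Vec2 p, double lr, double mu, int iters) {
    Vec2 v{0.0, 0.0};
    for (int i = 0; i < iters; ++i) {
        Vec2 g = grad(Vec2{p.x + mu * v.x, p.y + mu * v.y});
        v.x = mu * v.x - lr * g.x;
        v.y = mu * v.y - lr * g.y;
        p.x += v.x;
        p.y += v.y;
    }
    return p;
}
```
</pre>
<p>
With a step size of 0.01 and a start at (1, 1), the plain iteration shrinks
the slow coordinate by a factor of 0.99 per step, so after 500 steps it is
still at roughly 0.0066. With a momentum of 0.9, the look-ahead iteration is
several orders of magnitude closer to the minimizer after the same number of
steps.
</p>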
<h2>
<a name="math_toolkit.gd_opt.introduction.h1"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.table-optimizer-policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.table-optimizer-policies">Optimizer
Policies</a>
</h2>
<h5>
<a name="math_toolkit.gd_opt.introduction.h2"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.initialization_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.initialization_policies">Initialization
Policies</a>
</h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Use case
</p>
</th>
<th>
<p>
Responsibilities
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
tape_initializer_rvar
</p>
</td>
<td>
<p>
User initializes all variables manually
</p>
</td>
<td>
<p>
Initializes tape.
</p>
</td>
</tr>
<tr>
<td>
<p>
random_uniform_initializer_rvar
</p>
</td>
<td>
<p>
Initializes all variables with a random number between a min and
max value
</p>
</td>
<td>
<p>
Initializes variables. Initializes tape.
</p>
</td>
</tr>
<tr>
<td>
<p>
costant_initializer_rvar
</p>
</td>
<td>
<p>
Initializes all variables with a constant
</p>
</td>
<td>
<p>
Initializes variables. Initializes tape.
</p>
</td>
</tr>
</tbody>
</table></div>
<h5>
<a name="math_toolkit.gd_opt.introduction.h3"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.evaluation_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.evaluation_policies">Evaluation
Policies</a>
</h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Use case
</p>
</th>
<th>
<p>
Responsibilities
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
reverse_mode_function_eval_policy
</p>
</td>
<td>
<p>
Default. Use with Boost reverse mode autodiff
</p>
</td>
<td>
<p>
Tells the optimizer how to evaluate the objective
</p>
</td>
</tr>
<tr>
<td>
<p>
reverse_mode_gradient_evaluation_policy
</p>
</td>
<td>
<p>
Default. Use with Boost reverse mode autodiff
</p>
</td>
<td>
<p>
Tells the optimizer how to evaluate the gradients of an objective
</p>
</td>
</tr>
</tbody>
</table></div>
<p>
These policies are intended for use with Boost reverse mode autodiff. If you
need to use the optimizers with a custom AD variable, or by providing the
gradient of an objective manually, check the policy documentation to see how
the policies are implemented.
</p>
<h2>
<a name="math_toolkit.gd_opt.introduction.h4"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.line-search-policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.line-search-policies">L-BFGS Line
Search Policies</a>
</h2>
<p>
The table below summarizes the two line search policies provided for use
with L-BFGS.
</p>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Enforced Conditions
</p>
</th>
<th>
<p>
Per-iteration cost
</p>
</th>
<th>
<p>
Convergence
</p>
</th>
<th>
<p>
Use case
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
Strong Wolfe
</p>
</td>
<td>
<p>
Sufficient decrease and curvature condition
</p>
</td>
<td>
<p>
Higher
</p>
</td>
<td>
<p>
Faster
</p>
</td>
<td>
<p>
Suitable for most problems
</p>
</td>
</tr>
<tr>
<td>
<p>
Armijo
</p>
</td>
<td>
<p>
Sufficient decrease only
</p>
</td>
<td>
<p>
Lower
</p>
</td>
<td>
<p>
Slower
</p>
</td>
<td>
<p>
Advanced use, when you know the curvature condition is not needed
</p>
</td>
</tr>
</tbody>
</table></div>
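<p>
The two sets of conditions can be written down in a few lines. The sketch
below works on a one-dimensional slice phi(a) = f(x + a * d) of the objective
along a descent direction d and uses the textbook form of the conditions. It
is not the library's line search: the names
<code class="computeroutput">armijo_ok</code>, <code class="computeroutput">strong_wolfe_ok</code>
and <code class="computeroutput">backtrack_armijo</code>, the constants c1 and
c2, and the example slice are assumptions made for illustration.
</p>
<pre class="programlisting">
```cpp
// Illustrative sketch only; none of these names belong to Boost.Math.
// phi(a) is the objective along the search direction, dphi(a) its derivative.
using scalar_fn = double (*)(double);

double magnitude(double v) { return v < 0.0 ? -v : v; }

// Armijo (sufficient decrease): phi(a) <= phi(0) + c1 * a * phi'(0).
bool armijo_ok(scalar_fn phi, scalar_fn dphi, double a, double c1) {
    return phi(a) <= phi(0.0) + c1 * a * dphi(0.0);
}

// Strong Wolfe: Armijo plus the curvature condition
// |phi'(a)| <= c2 * |phi'(0)|, which costs one extra derivative per trial.
bool strong_wolfe_ok(scalar_fn phi, scalar_fn dphi, double a,
                     double c1, double c2) {
    if (!armijo_ok(phi, dphi, a, c1)) {
        return false;
    }
    return magnitude(dphi(a)) <= c2 * magnitude(dphi(0.0));
}

// Backtracking: shrink the trial step until the Armijo condition holds,
// giving up after a fixed number of reductions.
double backtrack_armijo(scalar_fn phi, scalar_fn dphi, double a0,
                        double shrink, double c1) {
    double a = a0;
    for (int i = 0; i < 60; ++i) {
        if (armijo_ok(phi, dphi, a, c1)) {
            break;
        }
        a *= shrink;
    }
    return a;
}

// Example slice: phi(a) = (a - 1)^2, so phi(0) = 1 and phi'(0) = -2.
double example_phi(double a) { return (a - 1.0) * (a - 1.0); }
double example_dphi(double a) { return 2.0 * (a - 1.0); }
```
</pre>
<p>
For the example slice with c1 = 1e-4 and c2 = 0.9, a very short step such as
a = 0.01 passes the Armijo test but fails the curvature test, while a = 1
passes both. This is the sense in which the Armijo policy enforces less per
step than the Strong Wolfe policy.
</p>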
<h2>
<a name="math_toolkit.gd_opt.introduction.h5"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.minimizer-policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.minimizer-policies">Minimizer
Policies</a>
</h2>
<h5>
<a name="math_toolkit.gd_opt.introduction.h6"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.convergence_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.convergence_policies">Convergence
Policies</a>
</h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Criterion
</p>
</th>
<th>
<p>
When to Use
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
gradient_norm_convergence_policy
</p>
</td>
<td>
<p>
gradient norm &lt; tol
</p>
</td>
<td>
<p>
Default. Stationarity-based condition
</p>
</td>
</tr>
<tr>
<td>
<p>
objective_tol_convergence_policy
</p>
</td>
<td>
<p>
Absolute difference between successive objective values is small
</p>
</td>
<td>
<p>
Well-scaled objectives
</p>
</td>
</tr>
<tr>
<td>
<p>
relative_objective_tol_policy
</p>
</td>
<td>
<p>
Relative difference between successive objective values is small
</p>
</td>
<td>
<p>
Scale-invariant convergence
</p>
</td>
</tr>
<tr>
<td>
<p>
combined_convergence_policy
</p>
</td>
<td>
<p>
Logical OR combination of policies
</p>
</td>
<td>
<p>
When you need a combination of convergence conditions
</p>
</td>
</tr>
</tbody>
</table></div>
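<p>
As a rough illustration of what the criteria in the table test, the
standalone sketch below writes each one as a small predicate. These are not
the library's policy classes: the function names are hypothetical, and the
scaling used for the relative criterion (dividing by the magnitude of the
previous objective value) is one common choice assumed here rather than the
documented behavior.
</p>
<pre class="programlisting">
```cpp
// Illustrative sketch only; none of these names belong to Boost.Math.
double magnitude(double v) { return v < 0.0 ? -v : v; }

// Gradient norm criterion: ||g||_2 < tol, tested on squared values so
// that no square root is needed.
bool gradient_norm_converged(const double* g, int n, double tol) {
    double sum_sq = 0.0;
    for (int i = 0; i < n; ++i) {
        sum_sq += g[i] * g[i];
    }
    return sum_sq < tol * tol;
}

// Absolute criterion: |f_curr - f_prev| < tol.
bool absolute_objective_converged(double f_prev, double f_curr, double tol) {
    return magnitude(f_curr - f_prev) < tol;
}

// Relative criterion (assumed scaling): |f_curr - f_prev| < tol * |f_prev|.
bool relative_objective_converged(double f_prev, double f_curr, double tol) {
    return magnitude(f_curr - f_prev) < tol * magnitude(f_prev);
}

// OR combination: stop as soon as either criterion is satisfied.
bool any_converged(bool first, bool second) {
    return first || second;
}
```
</pre>
<p>
For an objective that moves from 1000.0 to 999.9 with tol = 1e-3, the
absolute test is not yet satisfied while the relative test is. This is why
the absolute criterion is listed for well-scaled objectives and the relative
one for scale-invariant convergence.
</p>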
<h5>
<a name="math_toolkit.gd_opt.introduction.h7"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.termination_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.termination_policies">Termination
Policies</a>
</h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Controls
</p>
</th>
<th>
<p>
When to Use
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
max_iter_termination_policy
</p>
</td>
<td>
<p>
Iteration count
</p>
</td>
<td>
<p>
Hard safety bound (almost always recommended)
</p>
</td>
</tr>
<tr>
<td>
<p>
wallclock_termination_policy
</p>
</td>
<td>
<p>
Wall clock time
</p>
</td>
<td>
<p>
Benchmarking; real-time constraints
</p>
</td>
</tr>
</tbody>
</table></div>
<h5>
<a name="math_toolkit.gd_opt.introduction.h8"></a>
<span class="phrase"><a name="math_toolkit.gd_opt.introduction.constraint_and_projection_polici"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.constraint_and_projection_polici">Constraint
and Projection Policies</a>
</h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>
<p>
Policy
</p>
</th>
<th>
<p>
Constraint Type
</p>
</th>
</tr></thead>
<tbody>
<tr>
<td>
<p>
unconstrained_policy
</p>
</td>
<td>
<p>
No constraint
</p>
</td>
</tr>
<tr>
<td>
<p>
box_constraints
</p>
</td>
<td>
<p>
Clips each variable to its lower/upper bound
</p>
</td>
</tr>
<tr>
<td>
<p>
nonnegativity_constraint
</p>
</td>
<td>
<p>
Sets every negative component to 0
</p>
</td>
</tr>
<tr>
<td>
<p>
l2_ball_constraint
</p>
</td>
<td>
<p>
2-norm(x) &lt; r
</p>
</td>
</tr>
<tr>
<td>
<p>
l1_ball_constraint
</p>
</td>
<td>
<p>
1-norm(x) &lt; r
</p>
</td>
</tr>
<tr>
<td>
<p>
simplex_constraint
</p>
</td>
<td>
<p>
Probability simplex
</p>
</td>
</tr>
<tr>
<td>
<p>
function_constraint
</p>
</td>
<td>
<p>
Wrapper for a custom user-provided function
</p>
</td>
</tr>
<tr>
<td>
<p>
unit_sphere_constraint
</p>
</td>
<td>
<p>
2-norm(x) = 1
</p>
</td>
</tr>
</tbody>
</table></div>
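<p>
The two simplest entries in the table, clipping to a box and clipping to
nonnegative values, can be sketched as projections applied after each
gradient step. The code below is a standalone illustration and not the
library's policy classes: the names <code class="computeroutput">project_box</code>,
<code class="computeroutput">project_nonnegative</code> and
<code class="computeroutput">projected_descent</code>, and the separable
quadratic objective, are assumptions.
</p>
<pre class="programlisting">
```cpp
// Illustrative sketch only; none of these names belong to Boost.Math.
// Box constraint: clip every component to the interval [lower, upper].
void project_box(double* x, int n, double lower, double upper) {
    for (int i = 0; i < n; ++i) {
        if (x[i] < lower) {
            x[i] = lower;
        }
        if (x[i] > upper) {
            x[i] = upper;
        }
    }
}

// Nonnegativity constraint: set every negative component to 0.
void project_nonnegative(double* x, int n) {
    for (int i = 0; i < n; ++i) {
        if (x[i] < 0.0) {
            x[i] = 0.0;
        }
    }
}

// Projected gradient descent for f(x) = sum_i (x[i] - target[i])^2 subject
// to lower <= x[i] <= upper: take a gradient step, then project back.
void projected_descent(double* x, const double* target, int n,
                       double lower, double upper, double lr, int iters) {
    for (int k = 0; k < iters; ++k) {
        for (int i = 0; i < n; ++i) {
            x[i] -= lr * 2.0 * (x[i] - target[i]);
        }
        project_box(x, n, lower, upper);
    }
}
```
</pre>
<p>
Starting from (1, 1) with target (3, -1) and the box [0, 2], the
unconstrained minimizer lies outside the box, so the projected iteration
settles on the boundary point (2, 0).
</p>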
</div>
<div class="copyright-footer">Copyright © 2006-2021 Nikhar Agrawal, Anton Bikineev, Matthew Borland,
Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno
Lalande, John Maddock, Evan Miller, Jeremy Murphy, Matthew Pulver, Johan Råde,
Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg, Daryle
Walker, Xiaogang Zhang, and Maksym Zhelyeznyakov<p>
Distributed under the Boost Software License, Version 1.0. (See accompanying
file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
</p>
</div>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../gd_opt.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../gd_opt.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="gradient_descent.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>