<html>
<head>
<meta charset="UTF-8">
<title>Introduction</title>
<link rel="stylesheet" href="../../math.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets Vsnapshot">
<link rel="home" href="../../index.html" title="Math Toolkit 4.2.1">
<link rel="up" href="../gd_opt.html" title="Gradient Based Optimizers">
<link rel="prev" href="../gd_opt.html" title="Gradient Based Optimizers">
<link rel="next" href="gradient_descent.html" title="Gradient Descent">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../boost.png"></td>
<td align="center"><a href="../../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../gd_opt.html"><img src="../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../gd_opt.html"><img src="../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../index.html"><img src="../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="gradient_descent.html"><img src="../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="math_toolkit.gd_opt.introduction"></a><a class="link" href="introduction.html" title="Introduction">Introduction</a>
</h3></div></div></div>
<p>
        Gradient-based optimizers are algorithms that use the gradient of a function
        to iteratively find local extrema of that function over a set of parameters.
        This section describes a set of gradient-based optimizers. The
        optimizers are written with <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">::</span><span class="identifier">differentiation</span><span class="special">::</span><span class="identifier">reverse_mode</span><span class="special">::</span><span class="identifier">rvar</span></code>
        in mind; however, if another way to evaluate the function and its gradient is provided,
        the optimizers should work in the same way.
      </p>
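<p>
        As an illustration of the "function and gradient" contract described above, the
        following is a minimal, standalone sketch of two first-order update rules, plain
        gradient descent and Nesterov's accelerated gradient, operating on
        <code class="computeroutput">std::vector&lt;double&gt;</code> with a hand-written
        gradient. It does not use <code class="computeroutput">rvar</code> or the Boost.Math
        optimizer interface; the names <code class="computeroutput">gradient_descent</code>,
        <code class="computeroutput">nesterov</code> and <code class="computeroutput">grad_fn</code>
        are hypothetical and chosen for this example only.
      </p>

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper types for this sketch; they are not part of Boost.Math.
using vec = std::vector<double>;
using grad_fn = std::function<void(const vec&, vec&)>; // writes grad f(x) into g

double l2_norm(const vec& v)
{
    double s = 0.0;
    for (double vi : v) s += vi * vi;
    return std::sqrt(s);
}

// Plain gradient descent: x <- x - lr * grad f(x).
// Returns the number of steps taken before ||grad f(x)|| < tol (or max_iter).
std::size_t gradient_descent(vec& x, const grad_fn& grad, double lr,
                             double tol, std::size_t max_iter)
{
    vec g(x.size());
    for (std::size_t k = 0; k < max_iter; ++k) {
        grad(x, g);
        if (l2_norm(g) < tol) return k;
        for (std::size_t i = 0; i < x.size(); ++i) x[i] -= lr * g[i];
    }
    return max_iter;
}

// Nesterov accelerated gradient, momentum form:
//   v <- mu * v - lr * grad f(x + mu * v),   x <- x + v
// The gradient at x itself is used only for the stopping test, which keeps
// the sketch simple at the cost of a second gradient call per step.
std::size_t nesterov(vec& x, const grad_fn& grad, double lr, double mu,
                     double tol, std::size_t max_iter)
{
    const std::size_t n = x.size();
    vec v(n, 0.0), look(n), g(n);
    for (std::size_t k = 0; k < max_iter; ++k) {
        grad(x, g);
        if (l2_norm(g) < tol) return k;
        for (std::size_t i = 0; i < n; ++i) look[i] = x[i] + mu * v[i];
        grad(look, g); // gradient at the look-ahead point
        for (std::size_t i = 0; i < n; ++i) {
            v[i] = mu * v[i] - lr * g[i];
            x[i] += v[i];
        }
    }
    return max_iter;
}
```

<p>
        On an ill-conditioned quadratic such as f(x) = (x<sub>0</sub><sup>2</sup> + 100 x<sub>1</sub><sup>2</sup>)/2,
        with a step size of 0.01 and momentum 0.9, the accelerated variant should reach a given
        gradient-norm tolerance in noticeably fewer iterations than plain gradient descent.
      </p>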
<p>
        The tables below summarize the intended usage patterns of the provided
        optimizers and policies. They are meant as a practical guide rather than a
        strict prescription.
      </p>
<h2>
<a name="math_toolkit.gd_opt.introduction.h0"></a>
        <span class="phrase"><a name="math_toolkit.gd_opt.introduction.table-optimizers"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.table-optimizers">List
        of Optimizers</a>
      </h2>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
                <p>
                  Optimizer
                </p>
              </th>
<th>
                <p>
                  Order
                </p>
              </th>
<th>
                <p>
                  Uses Curvature
                </p>
              </th>
<th>
                <p>
                  Memory Cost
                </p>
              </th>
<th>
                <p>
                  Intended Problem Class
                </p>
              </th>
<th>
                <p>
                  When to Use
                </p>
              </th>
</tr></thead>
<tbody>
<tr>
<td>
                <p>
                  gradient descent
                </p>
              </td>
<td>
                <p>
                  first
                </p>
              </td>
<td>
                <p>
                  no
                </p>
              </td>
<td>
                <p>
                  low
                </p>
              </td>
<td>
                <p>
                  Smooth, well-scaled objectives
                </p>
              </td>
<td>
                <p>
                  Baseline method; debugging; when transparent behavior matters
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  Nesterov accelerated gradient
                </p>
              </td>
<td>
                <p>
                  first
                </p>
              </td>
<td>
                <p>
                  no
                </p>
              </td>
<td>
                <p>
                  low
                </p>
              </td>
<td>
                <p>
                  Ill-conditioned or narrow-valley problems
                </p>
              </td>
<td>
                <p>
                  When plain gradient descent converges slowly or oscillates
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  L-BFGS
                </p>
              </td>
<td>
                <p>
                  quasi-second order
                </p>
              </td>
<td>
                <p>
                  approximate
                </p>
              </td>
<td>
                <p>
                  medium
                </p>
              </td>
<td>
                <p>
                  Smooth, deterministic objectives
                </p>
              </td>
<td>
                <p>
                  When gradients are reliable and faster convergence is needed
                </p>
              </td>
</tr>
</tbody>
</table></div>
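<p>
        The "approximate" curvature and "medium" memory entries for L-BFGS reflect the fact that
        it never forms a Hessian; it keeps only the last m correction pairs
        s<sub>k</sub> = x<sub>k+1</sub> - x<sub>k</sub> and y<sub>k</sub> = g<sub>k+1</sub> - g<sub>k</sub>.
        The sketch below shows the standard two-loop recursion that turns those pairs into a
        search direction. It is a standalone illustration on plain doubles, not the library's
        implementation, and <code class="computeroutput">lbfgs_direction</code> is a hypothetical name.
      </p>

```cpp
#include <cstddef>
#include <deque>
#include <vector>

using vec = std::vector<double>;

double dot(const vec& a, const vec& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Hypothetical helper (not the Boost.Math API): L-BFGS two-loop recursion.
// Given the current gradient g and the m most recent correction pairs
// s[j] = x_{j+1} - x_j and y[j] = g_{j+1} - g_j (oldest first), returns the
// search direction d = -H g, where H is the implicit inverse-Hessian
// approximation. Requires dot(y[j], s[j]) > 0 for every stored pair.
vec lbfgs_direction(const vec& g, const std::deque<vec>& s, const std::deque<vec>& y)
{
    const std::size_t m = s.size();
    vec q = g;
    std::vector<double> alpha(m), rho(m);

    // First loop: newest pair to oldest.
    for (std::size_t j = m; j-- > 0;) {
        rho[j] = 1.0 / dot(y[j], s[j]);
        alpha[j] = rho[j] * dot(s[j], q);
        for (std::size_t i = 0; i < q.size(); ++i) q[i] -= alpha[j] * y[j][i];
    }

    // Initial approximation H0 = gamma * I, scaled with the newest pair.
    double gamma = 1.0;
    if (m > 0) gamma = dot(s[m - 1], y[m - 1]) / dot(y[m - 1], y[m - 1]);
    for (double& qi : q) qi *= gamma;

    // Second loop: oldest pair to newest.
    for (std::size_t j = 0; j < m; ++j) {
        const double beta = rho[j] * dot(y[j], q);
        for (std::size_t i = 0; i < q.size(); ++i) q[i] += (alpha[j] - beta) * s[j][i];
    }

    for (double& qi : q) qi = -qi; // descent direction
    return q;
}
```

<p>
        With no stored pairs the recursion reduces to steepest descent (d = -g); memory grows
        only linearly with m and the problem dimension.
      </p>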
<h2>
<a name="math_toolkit.gd_opt.introduction.h1"></a>
        <span class="phrase"><a name="math_toolkit.gd_opt.introduction.table-optimizer-policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.table-optimizer-policies">Optimizer
        Policies</a>
      </h2>
<h5>
<a name="math_toolkit.gd_opt.introduction.h2"></a>
        <span class="phrase"><a name="math_toolkit.gd_opt.introduction.initialization_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.initialization_policies">Initialization
        Policies</a>
      </h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
                <p>
                  Policy
                </p>
              </th>
<th>
                <p>
                  Use case
                </p>
              </th>
<th>
                <p>
                  Responsibilities
                </p>
              </th>
</tr></thead>
<tbody>
<tr>
<td>
                <p>
                  tape_initializer_rvar
                </p>
              </td>
<td>
                <p>
                  User initializes all variables manually
                </p>
              </td>
<td>
                <p>
                  Initializes the tape
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  random_uniform_initializer_rvar
                </p>
              </td>
<td>
                <p>
                  Initializes all variables with uniformly distributed random numbers
                  between a minimum and a maximum value
                </p>
              </td>
<td>
                <p>
                  Initializes the variables and the tape.
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  costant_initializer_rvar
                </p>
              </td>
<td>
                <p>
                  Initializes all variables with a constant
                </p>
              </td>
<td>
                <p>
                  Initializes the variables and the tape.
                </p>
              </td>
</tr>
</tbody>
</table></div>
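<p>
        For intuition, the constant and random-uniform strategies amount to the following when
        applied to plain <code class="computeroutput">double</code> parameters. This is a
        hypothetical sketch: it does not touch the autodiff tape, which the
        <code class="computeroutput">*_rvar</code> policies also initialize, and the function
        names are illustrative only.
      </p>

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical initializers on plain doubles; the *_rvar policies also set up the tape.

// Every parameter starts at the same constant value c.
std::vector<double> constant_init(std::size_t n, double c)
{
    return std::vector<double>(n, c);
}

// Every parameter is drawn independently from a uniform distribution on [lo, hi).
// A fixed seed makes the starting point reproducible for a given standard
// library implementation (the distribution's algorithm is implementation-defined).
std::vector<double> random_uniform_init(std::size_t n, double lo, double hi, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(lo, hi);
    std::vector<double> x(n);
    for (double& xi : x) xi = dist(gen);
    return x;
}
```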
<h5>
<a name="math_toolkit.gd_opt.introduction.h3"></a>
        <span class="phrase"><a name="math_toolkit.gd_opt.introduction.evaluation_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.evaluation_policies">Evaluation
        Policies</a>
      </h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
                <p>
                  Policy
                </p>
              </th>
<th>
                <p>
                  Use case
                </p>
              </th>
<th>
                <p>
                  Responsibilities
                </p>
              </th>
</tr></thead>
<tbody>
<tr>
<td>
                <p>
                  reverse_mode_function_eval_policy
                </p>
              </td>
<td>
                <p>
                  Default. Use with Boost reverse-mode autodiff
                </p>
              </td>
<td>
                <p>
                  tells the optimizer how to evaluate the objective
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  reverse_mode_gradient_evaluation_policy
                </p>
              </td>
<td>
                <p>
                  Default. Use with Boost reverse-mode autodiff
                </p>
              </td>
<td>
                <p>
                  tells the optimizer how to evaluate the gradients of an objective
                </p>
              </td>
</tr>
</tbody>
</table></div>
<p>
        These policies are intended for use with Boost reverse-mode autodiff. To use
        the optimizers with a custom AD variable type, or with a manually provided
        gradient of the objective, see the policy documentation for how the policies
        are implemented.
      </p>
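<p>
        When supplying a gradient manually, it is worth checking it numerically before wiring it
        into a custom evaluation policy. The following standalone sketch (hypothetical helper
        names, plain <code class="computeroutput">double</code>, no Boost.Math types) pairs the
        Rosenbrock function with its hand-derived gradient and compares it against central
        finite differences.
      </p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2, minimum at (1, 1).
double rosenbrock(const std::vector<double>& p)
{
    const double a = 1.0 - p[0];
    const double b = p[1] - p[0] * p[0];
    return a * a + 100.0 * b * b;
}

// Hand-derived gradient:
//   df/dx = -2 (1 - x) - 400 x (y - x^2),   df/dy = 200 (y - x^2)
std::vector<double> rosenbrock_grad(const std::vector<double>& p)
{
    const double x = p[0], y = p[1];
    return {-2.0 * (1.0 - x) - 400.0 * x * (y - x * x), 200.0 * (y - x * x)};
}

// Central-difference gradient with a step scaled to each coordinate's magnitude.
template <class F>
std::vector<double> central_diff_grad(F f, std::vector<double> p, double h = 1e-6)
{
    std::vector<double> g(p.size());
    for (std::size_t i = 0; i < p.size(); ++i) {
        const double xi = p[i];
        const double step = h * std::max(1.0, std::abs(xi));
        p[i] = xi + step;
        const double f_plus = f(p);
        p[i] = xi - step;
        const double f_minus = f(p);
        p[i] = xi; // restore the coordinate
        g[i] = (f_plus - f_minus) / (2.0 * step);
    }
    return g;
}
```

<p>
        A large relative discrepancy between the two gradients at a generic point usually
        indicates a mistake in the hand-derived formula.
      </p>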
<h2>
<a name="math_toolkit.gd_opt.introduction.h4"></a>
        <span class="phrase"><a name="math_toolkit.gd_opt.introduction.line-search-policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.line-search-policies">L-BFGS Line
        Search Policies</a>
      </h2>
<p>
        The table below summarizes the two line search policies provided for use
        with L-BFGS.
      </p>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
                <p>
                  Policy
                </p>
              </th>
<th>
                <p>
                  Enforced Conditions
                </p>
              </th>
<th>
                <p>
                  Per-iteration cost
                </p>
              </th>
<th>
                <p>
                  Convergence
                </p>
              </th>
<th>
                <p>
                  Use case
                </p>
              </th>
</tr></thead>
<tbody>
<tr>
<td>
                <p>
                  Strong Wolfe
                </p>
              </td>
<td>
                <p>
                  Sufficient decrease and curvature condition
                </p>
              </td>
<td>
                <p>
                  higher
                </p>
              </td>
<td>
                <p>
                  faster
                </p>
              </td>
<td>
                <p>
                  Recommended default for most problems
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  Armijo
                </p>
              </td>
<td>
                <p>
                  Sufficient decrease only
                </p>
              </td>
<td>
                <p>
                  lower
                </p>
              </td>
<td>
                <p>
                  slower
                </p>
              </td>
<td>
                <p>
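                  Advanced use only: cheaper steps, but without the curvature condition
                  the L-BFGS update is not guaranteed to stay positive definite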
                </p>
              </td>
</tr>
</tbody>
</table></div>
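<p>
        Writing phi(alpha) = f(x + alpha d) for a descent direction d, the Armijo condition is
        phi(alpha) ≤ phi(0) + c<sub>1</sub> alpha phi'(0), and the strong Wolfe conditions add
        |phi'(alpha)| ≤ c<sub>2</sub> |phi'(0)| with 0 &lt; c<sub>1</sub> &lt; c<sub>2</sub> &lt; 1.
        The sketch below implements Armijo backtracking and a strong Wolfe check on a scalar phi.
        It is a hypothetical illustration: a practical strong Wolfe line search also needs a
        bracketing and zoom phase, which is omitted here, and the defaults
        c<sub>1</sub> = 1e-4 and c<sub>2</sub> = 0.9 are common textbook choices rather than
        the library's defaults.
      </p>

```cpp
#include <cmath>
#include <functional>

// phi(alpha) = f(x + alpha * d) restricted to the search direction d;
// dphi0 = phi'(0) = grad f(x) . d must be negative (d is a descent direction).

// Backtracking line search enforcing only the Armijo (sufficient decrease) condition.
double armijo_backtracking(const std::function<double(double)>& phi,
                           double phi0, double dphi0,
                           double alpha = 1.0, double c1 = 1e-4,
                           double shrink = 0.5, int max_steps = 50)
{
    for (int k = 0; k < max_steps; ++k) {
        if (phi(alpha) <= phi0 + c1 * alpha * dphi0) return alpha;
        alpha *= shrink; // step too long: shrink and retry
    }
    return alpha; // no accepted step within max_steps; caller should treat as failure
}

// Checks (does not search for) the strong Wolfe conditions at a trial step alpha.
bool satisfies_strong_wolfe(double phi0, double dphi0,
                            double phi_a, double dphi_a, double alpha,
                            double c1 = 1e-4, double c2 = 0.9)
{
    const bool sufficient_decrease = phi_a <= phi0 + c1 * alpha * dphi0;
    const bool curvature = std::abs(dphi_a) <= c2 * std::abs(dphi0);
    return sufficient_decrease && curvature;
}
```

<p>
        For phi(alpha) = (1 - 2 alpha)<sup>2</sup>, a very short step such as alpha = 0.01 passes
        the Armijo test but fails the curvature test, which is the kind of step the strong
        Wolfe conditions are designed to reject.
      </p>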
<h2>
<a name="math_toolkit.gd_opt.introduction.h5"></a>
        <span class="phrase"><a name="math_toolkit.gd_opt.introduction.minimizer-policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.minimizer-policies">Minimizer
        Policies</a>
      </h2>
<h5>
<a name="math_toolkit.gd_opt.introduction.h6"></a>
        <span class="phrase"><a name="math_toolkit.gd_opt.introduction.convergence_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.convergence_policies">Convergence
        Policies</a>
      </h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
                <p>
                  Policy
                </p>
              </th>
<th>
                <p>
                  Criterion
                </p>
              </th>
<th>
                <p>
                  When to Use
                </p>
              </th>
</tr></thead>
<tbody>
<tr>
<td>
                <p>
                  gradient_norm_convergence_policy
                </p>
              </td>
<td>
                <p>
                  gradient norm &lt; tol
                </p>
              </td>
<td>
                <p>
                  Default. Stationarity-based condition
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  objective_tol_convergence_policy
                </p>
              </td>
<td>
                <p>
                  absolute change in the objective between iterations is below a tolerance
                </p>
              </td>
<td>
                <p>
                  Well-scaled objectives
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  relative_objective_tol_policy
                </p>
              </td>
<td>
                <p>
                  relative change in the objective between iterations is below a tolerance
                </p>
              </td>
<td>
                <p>
                  Scale-invariant convergence
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  combined_convergence_policy
                </p>
              </td>
<td>
                <p>
                  logical OR of the combined criteria
                </p>
              </td>
<td>
                <p>
                  When several convergence conditions should be checked together
                </p>
              </td>
</tr>
</tbody>
</table></div>
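<p>
        The criteria in the table reduce to simple predicates. The following hypothetical sketch
        (free functions on plain doubles, not the library's policy classes) shows one way to
        write them, including an OR-combination. The relative test scales the tolerance by the
        larger of the two objective magnitudes, which is one common choice among several.
      </p>

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical predicates mirroring the convergence criteria; not the library's policy classes.

// Stationarity: ||g||_2 < tol.
bool gradient_norm_converged(const std::vector<double>& g, double tol)
{
    double s = 0.0;
    for (double gi : g) s += gi * gi;
    return std::sqrt(s) < tol;
}

// Absolute change in the objective between successive iterations.
bool objective_abs_converged(double f_prev, double f_curr, double tol)
{
    return std::abs(f_curr - f_prev) < tol;
}

// Relative change, scaled by the larger magnitude, so rescaling f does not affect it.
// If both values are 0 the test passes, since the objective did not change.
bool objective_rel_converged(double f_prev, double f_curr, double tol)
{
    const double scale = std::max(std::abs(f_prev), std::abs(f_curr));
    return std::abs(f_curr - f_prev) <= tol * scale;
}

// Logical OR: stop as soon as any one criterion is satisfied.
bool combined_converged(const std::vector<double>& g, double f_prev, double f_curr,
                        double grad_tol, double abs_tol, double rel_tol)
{
    return gradient_norm_converged(g, grad_tol)
        || objective_abs_converged(f_prev, f_curr, abs_tol)
        || objective_rel_converged(f_prev, f_curr, rel_tol);
}
```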
<h5>
<a name="math_toolkit.gd_opt.introduction.h7"></a>
        <span class="phrase"><a name="math_toolkit.gd_opt.introduction.termination_policies"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.termination_policies">Termination
        Policies</a>
      </h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
<col>
</colgroup>
<thead><tr>
<th>
                <p>
                  Policy
                </p>
              </th>
<th>
                <p>
                  Controls
                </p>
              </th>
<th>
                <p>
                  When to Use
                </p>
              </th>
</tr></thead>
<tbody>
<tr>
<td>
                <p>
                  max_iter_termination_policy
                </p>
              </td>
<td>
                <p>
                  iteration count
                </p>
              </td>
<td>
                <p>
                  Hard safety bound (almost always recommended)
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  wallclock_termination_policy
                </p>
              </td>
<td>
                <p>
                  wall-clock time
                </p>
              </td>
<td>
                <p>
                  benchmarking, real-time constraints
                </p>
              </td>
</tr>
</tbody>
</table></div>
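<p>
        A minimal sketch of the two termination checks, using hypothetical names and
        <code class="computeroutput">std::chrono::steady_clock</code>, a monotonic clock, so the
        wall-clock budget is not affected by changes to the system time. The driver stops at
        whichever bound is reached first.
      </p>

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical termination checks, not the library's policy classes.

// Stop once the iteration count reaches a hard limit.
struct max_iter_check {
    std::size_t max_iter;
    bool operator()(std::size_t iter) const { return iter >= max_iter; }
};

// Stop once a wall-clock budget, measured from construction, is used up.
struct wallclock_check {
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
    std::chrono::steady_clock::duration budget;

    explicit wallclock_check(std::chrono::steady_clock::duration b) : budget(b) {}
    bool operator()() const { return std::chrono::steady_clock::now() - start >= budget; }
};

// Calls step() until either bound is reached; returns the number of steps taken.
template <class Step>
std::size_t run_bounded(Step step, std::size_t max_iter,
                        std::chrono::steady_clock::duration budget)
{
    const max_iter_check stop_iter{max_iter};
    const wallclock_check stop_time(budget);
    std::size_t k = 0;
    while (!stop_iter(k) && !stop_time()) {
        step();
        ++k;
    }
    return k;
}
```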
<h5>
<a name="math_toolkit.gd_opt.introduction.h8"></a>
        <span class="phrase"><a name="math_toolkit.gd_opt.introduction.constraint_and_projection_polici"></a></span><a class="link" href="introduction.html#math_toolkit.gd_opt.introduction.constraint_and_projection_polici">Constraint
        and Projection Policies</a>
      </h5>
<div class="informaltable"><table class="table">
<colgroup>
<col>
<col>
</colgroup>
<thead><tr>
<th>
                <p>
                  Policy
                </p>
              </th>
<th>
                <p>
                  Constraint Type
                </p>
              </th>
</tr></thead>
<tbody>
<tr>
<td>
                <p>
                  unconstrained_policy
                </p>
              </td>
<td>
                <p>
                  No constraint
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  box_constraints
                </p>
              </td>
<td>
                <p>
                  Clips each component to its lower/upper bound
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  nonnegativity_constraint
                </p>
              </td>
<td>
                <p>
                  Sets every negative component to 0
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  l2_ball_constraint
                </p>
              </td>
<td>
                <p>
                  2-norm(x) ≤ r
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  l1_ball_constraint
                </p>
              </td>
<td>
                <p>
                  1-norm(x) ≤ r
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  simplex_constraint
                </p>
              </td>
<td>
                <p>
                  Probability simplex: x ≥ 0 componentwise, sum(x) = 1
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  function_constraint
                </p>
              </td>
<td>
                <p>
                  Custom constraint via a user-provided function wrapper
                </p>
              </td>
</tr>
<tr>
<td>
                <p>
                  unit_sphere_constraint
                </p>
              </td>
<td>
                <p>
                  2-norm(x) = 1
                </p>
              </td>
</tr>
</tbody>
</table></div>
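<p>
        Projection-style constraints are typically applied after each update step by mapping the
        iterate back onto the feasible set. The following standalone sketch gives a Euclidean
        projection for several rows of the table; the function names are hypothetical, and the
        simplex projection uses the well-known sort-based algorithm, which is not necessarily
        the one the library implements. The L1-ball projection reuses it on the absolute values
        and restores the signs. A custom constraint in the spirit of
        <code class="computeroutput">function_constraint</code> could be any callable with the
        same <code class="computeroutput">void(std::vector&lt;double&gt;&amp;)</code> shape.
        <code class="computeroutput">std::clamp</code> requires C++17.
      </p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

using vec = std::vector<double>;

// Hypothetical Euclidean projections onto the feasible sets in the table.

// Box: clip each component to [lo, hi].
void project_box(vec& x, double lo, double hi)
{
    for (double& xi : x) xi = std::clamp(xi, lo, hi);
}

// Nonnegativity: every negative component becomes 0.
void project_nonneg(vec& x)
{
    for (double& xi : x) xi = std::max(xi, 0.0);
}

double l2_norm(const vec& x)
{
    return std::sqrt(std::inner_product(x.begin(), x.end(), x.begin(), 0.0));
}

// L2 ball {x : ||x||_2 <= r}: rescale only when x lies outside.
void project_l2_ball(vec& x, double r)
{
    const double n = l2_norm(x);
    if (n > r)
        for (double& xi : x) xi *= r / n;
}

// Unit sphere {x : ||x||_2 = 1}: normalize. x must be nonzero.
void project_unit_sphere(vec& x)
{
    const double n = l2_norm(x);
    for (double& xi : x) xi /= n;
}

// Simplex {x : x_i >= 0, sum x_i = z} (z = 1 gives the probability simplex).
// Sort-based method: find the threshold theta, then x_i <- max(x_i - theta, 0).
void project_simplex(vec& x, double z = 1.0)
{
    vec u = x;
    std::sort(u.begin(), u.end(), std::greater<double>());
    double cumsum = 0.0, theta = 0.0;
    for (std::size_t j = 0; j < u.size(); ++j) {
        cumsum += u[j];
        const double t = (cumsum - z) / static_cast<double>(j + 1);
        if (u[j] > t) theta = t; // keep the threshold for the largest such j
    }
    for (double& xi : x) xi = std::max(xi - theta, 0.0);
}

// L1 ball {x : ||x||_1 <= r}: if outside, project |x| onto the r-simplex
// and restore the original signs.
void project_l1_ball(vec& x, double r)
{
    double l1 = 0.0;
    for (double xi : x) l1 += std::abs(xi);
    if (l1 <= r) return;
    vec a(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) a[i] = std::abs(x[i]);
    project_simplex(a, r);
    for (std::size_t i = 0; i < x.size(); ++i) x[i] = std::copysign(a[i], x[i]);
}
```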
</div>
<div class="copyright-footer">Copyright © 2006-2021 Nikhar Agrawal, Anton Bikineev, Matthew Borland,
      Paul A. Bristow, Marco Guazzone, Christopher Kormanyos, Hubert Holin, Bruno
      Lalande, John Maddock, Evan Miller, Jeremy Murphy, Matthew Pulver, Johan Råde,
      Gautam Sewani, Benjamin Sobotta, Nicholas Thompson, Thijs van den Berg, Daryle
      Walker, Xiaogang Zhang, and Maksym Zhelyeznyakov<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div>
</body>
</html>