From 2a63c305bbbd8ec146b18eac6c9206d4f14ac430 Mon Sep 17 00:00:00 2001
From: Maksym Zhelyeznyakov
where
- The code below manually minimizes the abover potential energy function for
- N particles over their two angular pozitions.
+ The code below manually minimizes the above potential energy function for
+ N particles over their two angular positions.
is the optimizer learning rate. Using the code the way it's written, the optimizer
- runs for 100000 steps. Running tthe program with
+ runs for 100000 steps. Running the program with
-
+
-
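For context, the potential energy minimized in this example is the Thomson-problem Coulomb energy of N unit charges on the unit sphere. A self-contained sketch of evaluating it from the two angular positions per particle (the function name is illustrative, not the example's actual code):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Coulomb potential energy of N unit charges on the unit sphere,
// parameterized by spherical angles (theta_i, phi_i):
//   E = sum_{i<j} 1 / |p_i - p_j|,
//   p = (sin t cos p, sin t sin p, cos t).
double thomson_energy(const std::vector<double>& theta,
                      const std::vector<double>& phi) {
    const std::size_t n = theta.size();
    double e = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            const double dx = std::sin(theta[i]) * std::cos(phi[i]) -
                              std::sin(theta[j]) * std::cos(phi[j]);
            const double dy = std::sin(theta[i]) * std::sin(phi[i]) -
                              std::sin(theta[j]) * std::sin(phi[j]);
            const double dz = std::cos(theta[i]) - std::cos(theta[j]);
            e += 1.0 / std::sqrt(dx * dx + dy * dy + dz * dz);
        }
    }
    return e;
}
```

Two antipodal charges sit at distance 2, so their energy is 1/2; this gives a quick sanity check before running the full descent.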
@@ -101,7 +101,7 @@
lr is a user-defined
- learning rate. For a more complete decription of the theoretical principle
+ learning rate. For a more complete description of the theoretical principle
check the Wikipedia
page
Objective&&
- obj : objective funciton to
+ obj : objective function to
minimize
RealType&
lr : learning rate. A larger
value takes larger steps during descent, leading to faster, but more
- unstable convergence. Conversely, small vaues are more stable but take
+ unstable convergence. Conversely, small values are more stable but take
longer to converge.
InitializationPolicy&& ip
- : Initialization policy for ArgumentContainer,
- or the initial guess. By default it is set to tape_initializer_rvar<RealType> which lets the user provide the "initial
- guess" by setting the values of x
- manually. For more info check the Policies section.
+ : Initialization policy for optimizer state and variables. Users may
+ supply a custom initialization policy to control how the argument container
+ and any AD-specific runtime state (e.g. reverse-mode tape attachment/reset)
+ are initialized. By default, the optimizer uses the user-provided initial
+ values in x and performs the standard reverse-mode AD initialization
+ required for gradient evaluation. Custom initialization policies are
+ useful for randomized starts, non-rvar AD types, or when gradients are
+ supplied externally. See the reverse-mode autodiff policy documentation
+ for the required initialization policy interface when writing custom
+ policies.
ObjectiveEvalPolicy&&
@@ -151,7 +157,7 @@
GradEvalPolicy&&
- gep : tells the optimzier how
+ gep : tells the optimizer how
to evaluate the gradient of the objective function. By default reverse_mode_gradient_evaluation_policy<RealType>
#include <boost/math/differentiation/autodiff_reverse.hpp>
#include <boost/math/optimization/gradient_descent.hpp>
@@ -319,7 +325,7 @@
const double lr = 1e-3;
./thomson_sphere N
@@ -332,7 +338,7 @@
Below is a plot of the final energy of the system, and its deviation from
- the theoretically predicted values. The table of theorical energy values
+ the theoretically predicted values. The table of theoretical energy values
for the problem is from Wikipedia.
@@ -346,7 +352,7 @@
Often, we don't want to actually implement our own stepping function, i.e.
we care about certain convergence criteria. In the above example, we need
- to include the minimier.hpp header:
+ to include the minimizer.hpp header:
#include <boost/math/optimization/minimizer.hpp>
diff --git a/doc/html/math_toolkit/gd_opt/introduction.html b/doc/html/math_toolkit/gd_opt/introduction.html
index a3a301e17..dd3794f43 100644
--- a/doc/html/math_toolkit/gd_opt/introduction.html
+++ b/doc/html/math_toolkit/gd_opt/introduction.html
@@ -7,7 +7,7 @@
@@ -28,13 +28,713 @@
Introduction
- Gradient based optimizers are algorithms that use the gradient of a funciton
+ Gradient based optimizers are algorithms that use the gradient of a function
to iteratively find locally extreme points of functions over a set of parameters.
This section provides a description of a set of gradient optimizers. The
optimizers are written with boost::math::differentiation::reverse_mode::rvar
in mind; however, if a way to evaluate the function and its gradient is provided,
the optimizers should work in exactly the same way.
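The shared idea behind all of these optimizers can be shown as a minimal sketch assuming nothing from the library (the function name and signature below are illustrative, not Boost.Math's API; the real optimizers obtain the gradient from reverse-mode autodiff rather than a hand-written closure):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Plain fixed-step gradient descent: x <- x - lr * grad_f(x), repeated.
std::vector<double> minimize_gd(
    std::vector<double> x,
    const std::function<std::vector<double>(const std::vector<double>&)>& grad_f,
    double lr, std::size_t max_iters)
{
    for (std::size_t it = 0; it < max_iters; ++it) {
        const std::vector<double> g = grad_f(x);
        for (std::size_t i = 0; i < x.size(); ++i)
            x[i] -= lr * g[i];
    }
    return x;
}
```

For example, minimizing f(x, y) = (x - 1)^2 + 4(y + 2)^2 from the origin with lr = 0.1 converges to the minimum at (1, -2) within a couple hundred steps.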
+ Below is a table that summarizes the intended usage patterns of the provided
+ optimizers and policies, and is meant as a practical guide rather than a
+ strict prescription:
+| Optimizer | Order | Uses Curvature | Memory Cost | Intended Problem Class | When to Use |
+|---|---|---|---|---|---|
+| gradient descent | first | no | low | Smooth, well-scaled objectives | Baseline method; debugging; when behavior transparency matters |
+| nesterov accelerated gradient | first | no | low | Ill-conditioned or narrow-valley problems | When plain gradient descent converges slowly or oscillates |
+| L-BFGS | quasi second order | approximate | medium | Smooth, deterministic objectives | When gradients are reliable and faster convergence is needed |
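The look-ahead gradient evaluation that distinguishes Nesterov accelerated gradient from plain gradient descent can be sketched as follows (a hedged sketch with illustrative names, not the library's internals; mu is the momentum coefficient):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// One classical NAG step:
//   v <- mu * v - lr * grad_f(x + mu * v)   // gradient at the look-ahead point
//   x <- x + v
void nag_step(std::vector<double>& x, std::vector<double>& v,
              const std::function<std::vector<double>(const std::vector<double>&)>& grad_f,
              double lr, double mu)
{
    std::vector<double> lookahead(x.size());
    for (std::size_t i = 0; i < x.size(); ++i)
        lookahead[i] = x[i] + mu * v[i];
    const std::vector<double> g = grad_f(lookahead);
    for (std::size_t i = 0; i < x.size(); ++i) {
        v[i] = mu * v[i] - lr * g[i];
        x[i] += v[i];
    }
}
```

Evaluating the gradient at the look-ahead point rather than at x is what damps the oscillation that plain momentum exhibits in narrow valleys.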
+| Policy | Use case | Responsibilities |
+|---|---|---|
+| tape_initializer_rvar | User initializes all variables manually | Initializes tape |
+| random_uniform_initializer_rvar | Initializes all variables with a random number between a min and max value | Initializes variables. Initializes tape. |
+| constant_initializer_rvar | Initializes all variables with a constant | Initializes variables. Initializes tape. |
+| Policy | Use case | Responsibilities |
+|---|---|---|
+| reverse_mode_function_eval_policy | Default; for use with Boost reverse-mode autodiff | Tells the optimizer how to evaluate the objective |
+| reverse_mode_gradient_evaluation_policy | Default; for use with Boost reverse-mode autodiff | Tells the optimizer how to evaluate the gradients of an objective |
+ These policies are intended to be used with Boost reverse-mode autodiff.
+ If you need to use the optimizers with a custom AD variable, or to provide
+ the gradient of an objective manually, check the docs for policies to see
+ how the policies are implemented.
+ The table below summarizes the two line search policies provided for use
+ with L-BFGS.
+| Policy | Enforced Conditions | Per iteration cost | Convergence | Use case |
+|---|---|---|---|---|
+| Strong Wolfe | function decrease and curvature condition | higher | faster | Most of the time |
+| Armijo | function decrease only | lower | slower | When you know what you're doing |
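The Armijo (sufficient decrease) condition from the table can be sketched as a simple backtracking loop; this is a hedged illustration, and the library's actual line-search interface may differ:

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Backtracking line search enforcing the Armijo condition:
//   f(x + t*d) <= f(x) + c1 * t * slope,
// where d is a descent direction and slope = f'(x) * d < 0.
// f is one-dimensional here for brevity.
double armijo_step(const std::function<double(double)>& f,
                   double x, double d, double slope,
                   double t0 = 1.0, double c1 = 1e-4, double shrink = 0.5)
{
    double t = t0;
    const double fx = f(x);
    // Halve the trial step until sufficient decrease holds
    // (or the step underflows).
    while (t > 1e-16 && f(x + t * d) > fx + c1 * t * slope)
        t *= shrink;
    return t;
}
```

The Strong Wolfe variant additionally checks a curvature condition on f'(x + t*d), which costs extra gradient evaluations per iteration but typically yields better-scaled steps.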
+| Policy | Criterion | When to Use |
+|---|---|---|
+| gradient_norm_convergence_policy | gradient norm < tol | Default; stationarity-based condition |
+| objective_tol_convergence_policy | absolute difference between objective steps is small | Well-scaled objectives |
+| relative_objective_tol_policy | relative difference between objective steps is small | Scale-invariant convergence |
+| combined_convergence_policy | logical OR of several criteria | When you need a combination of convergence conditions |
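The convergence criteria above reduce to a few one-line checks; a sketch with illustrative names (not the library's policy classes, whose interfaces are documented separately):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Stationarity-based check: ||g||_2 < tol.
bool gradient_norm_converged(const std::vector<double>& g, double tol) {
    double sq = 0.0;
    for (double gi : g) sq += gi * gi;
    return std::sqrt(sq) < tol;
}

// Scale-invariant check: |f_prev - f_curr| relative to |f_prev|,
// guarded so small objectives near zero do not divide by zero.
bool relative_objective_converged(double f_prev, double f_curr, double tol) {
    return std::fabs(f_prev - f_curr) <=
           tol * std::max(std::fabs(f_prev), 1.0);
}

// A combined policy ORs individual criteria, as in the table above.
bool combined_converged(const std::vector<double>& g, double f_prev,
                        double f_curr, double g_tol, double f_tol) {
    return gradient_norm_converged(g, g_tol) ||
           relative_objective_converged(f_prev, f_curr, f_tol);
}
```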
+| Policy | Controls | When to Use |
+|---|---|---|
+| max_iter_termination_policy | iteration count | Hard safety bound (almost always recommended) |
+| wallclock_termination_policy | wall clock time | Benchmarking, real-time constraints |
+| Policy | Constraint Type |
+|---|---|
+| unconstrained_policy | No constraint |
+| box_constraints | Upper/lower bound clip |
+| nonnegativity_constraint | Clamp values below 0 to 0 |
+| l2_ball_constraint | 2-norm(x) < r |
+| l1_ball_constraint | 1-norm(x) < r |
+| simplex_constraint | Probability simplex |
+| function_constraint | Custom user-provided function wrapper |
+| unit_sphere_constraint | 2-norm(x) = 1 |
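Constraint policies of this kind are typically applied as a projection after each step. Two of the projections from the table can be sketched as follows (function names are illustrative, not the library's):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// box_constraints: clip each coordinate into [lo, hi].
void project_box(std::vector<double>& x, double lo, double hi) {
    for (double& xi : x)
        xi = std::min(std::max(xi, lo), hi);
}

// l2_ball_constraint: if ||x||_2 > r, rescale x back onto
// the ball of radius r; points already inside are untouched.
void project_l2_ball(std::vector<double>& x, double r) {
    double norm = 0.0;
    for (double xi : x) norm += xi * xi;
    norm = std::sqrt(norm);
    if (norm > r)
        for (double& xi : x) xi *= r / norm;
}
```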
RealType lr
: learning rate. Larger values take larger steps (faster but potentially
- unsable). Smaller values are more stable but converge more slowly.
+ unstable). Smaller values are more stable but converge more slowly.
RealType mu
@@ -184,10 +184,15 @@
InitializationPolicy&& ip
- : initialization policy for the optimizer state and variables. For NAG,
- this also initializes the internal momentum/velocity state. By default
- the optimizer uses the same initializer as gradient descent and initializes
- velocity to zero.
+ : Initialization policy for optimizer state and variables. Users may
+ supply a custom initialization policy to control how the argument container
+ and any AD-specific runtime state (e.g. reverse-mode tape attachment/reset)
+ are initialized. By default, the optimizer uses the same initialization
+ as gradient descent, taking the user-provided initial values in x and
+ initializing the internal momentum/velocity state to zero. Custom
+ initialization policies are useful for randomized starts, non-rvar AD
+ types, or when gradients are supplied externally. See the reverse-mode
+ autodiff policy documentation for the required initialization policy
+ interface when writing custom policies.
ObjectiveEvalPolicy&&
diff --git a/doc/html/optimization.html b/doc/html/optimization.html
index b7f7ef96e..56db63645 100644
--- a/doc/html/optimization.html
+++ b/doc/html/optimization.html
@@ -37,9 +37,9 @@
Gradient Based Optimizers