Boost C++ Libraries


Introduction

Gradient-based optimizers are algorithms that use the gradient of a function to iteratively find locally extreme points of that function over a set of parameters. This section describes the set of gradient optimizers provided. The optimizers are written with boost::math::differentiation::reverse_mode::rvar in mind; however, if a way to evaluate the function and its gradient is provided, the optimizers work in exactly the same way.

Below is a table that summarizes the intended usage patterns of the provided optimizers and policies, and is meant as a practical guide rather than a strict prescription:

List of Optimizers

| Optimizer | Order | Uses Curvature | Memory Cost | Intended Problem Class | When to Use |
| --- | --- | --- | --- | --- | --- |
| gradient descent | first | no | low | Smooth, well-scaled objectives | Baseline method; debugging; when behavior transparency matters |
| nesterov accelerated gradient | first | no | low | Ill-conditioned or narrow-valley problems | When plain gradient descent converges slowly or oscillates |
| L-BFGS | quasi second order | approximate | medium | Smooth, deterministic objectives | When gradients are reliable and faster convergence is needed |

Optimizer Policies

Initialization Policies

| Policy | Use case | Responsibilities |
| --- | --- | --- |
| tape_initializer_rvar | User initializes all variables manually | Initializes tape |
| random_uniform_initializer_rvar | Initializes all variables with a random number between a min and max value | Initializes variables. Initializes tape. |
| constant_initializer_rvar | Initializes all variables with a constant | Initializes variables. Initializes tape. |

Evaluation Policies

| Policy | Use case | Responsibilities |
| --- | --- | --- |
| reverse_mode_function_eval_policy | Default. Use with boost reverse mode autodiff | Tells the optimizer how to evaluate the objective |
| reverse_mode_gradient_evaluation_policy | Default. Use with boost reverse mode autodiff | Tells the optimizer how to evaluate the gradients of an objective |

These policies are intended for use with boost reverse mode autodiff. If you need to use the optimizers with a custom AD variable, or to provide the gradient of an objective manually, check the policy documentation to see how the policies are implemented.

L-BFGS Line Search Policies

The table below summarizes the two line search policies provided for use with L-BFGS.

| Policy | Enforced Conditions | Per-iteration cost | Convergence | Use case |
| --- | --- | --- | --- | --- |
| Strong Wolfe | Sufficient function decrease and curvature condition | higher | faster | Recommended default for most problems |
| Armijo | Sufficient function decrease only | lower | slower | When cheaper iterations matter more than convergence speed |

Minimizer Policies

Convergence Policies

| Policy | Criterion | When to Use |
| --- | --- | --- |
| gradient_norm_convergence_policy | gradient norm < tol | Default. Stationarity-based condition |
| objective_tol_convergence_policy | Absolute difference between successive objective values is small | Well-scaled objectives |
| relative_objective_tol_policy | Relative difference between successive objective values is small | Scale-invariant convergence |
| combined_convergence_policy | Logical OR of multiple criteria | When you need a combination of convergence conditions |

Termination Policies

| Policy | Controls | When to Use |
| --- | --- | --- |
| max_iter_termination_policy | Iteration count | Hard safety bound (almost always recommended) |
| wallclock_termination_policy | Wall-clock time | Benchmarking, real-time constraints |

Constraint and Projection Policies

| Policy | Constraint Type |
| --- | --- |
| unconstrained_policy | No constraint |
| box_constraints | Clips each component to an upper/lower bound |
| nonnegativity_constraint | Sets every component below 0 to 0 |
| l2_ball_constraint | 2-norm(x) ≤ r |
| l1_ball_constraint | 1-norm(x) ≤ r |
| simplex_constraint | Probability simplex |
| function_constraint | Custom user-provided function wrapper |
| unit_sphere_constraint | 2-norm(x) = 1 |

