Solvers

Available optimization algorithms and how to use them

General description

Solvers (that is, optimization algorithms) are functions in Manopt. Built-in solvers are located in /manopt/solvers. In principle, all solvers accept the following basic call format:

x = mysolver(problem)

The returned value x is a point on the manifold problem.M. Depending on the properties of your problem and on the guarantees of the solver, x is more or less close to a good minimizer of the cost function described in the problem structure.

Bear in mind that we are dealing with usually nonconvex, and possibly nonsmooth or derivative-free optimization, so that it is in general not guaranteed that x is a global minimizer of the cost. For smooth problems with gradient information though, most decent algorithms guarantee that x is (approximately) a critical point. Typically, we even expect an approximate local minimizer, but even that is usually not guaranteed in all cases: this is a fundamental limitation of continuous optimization.

Min or max?

All provided solvers are minimization algorithms. If you want to maximize your objective function, multiply it by -1 (and likewise for the derivatives of the objective function), as we did in the first example.

All solvers also admit a more complete call format:

[x, xcost, info, options] = mysolver(problem, x0, options)

Inputs:

  • The problem structure specifies the manifold and describes the cost function.
  • Optionally, x0 is an initial guess, or initial iterate, for the solver. It is typically a point on the manifold problem.M, but may be something else depending on the solver. It can be omitted by passing the empty matrix [] instead.
  • Optionally, the options structure is used to fine tune the behavior of the optimization algorithm. On top of hosting the algorithmic parameters, it manages the stopping criteria as well as what information needs to be displayed and / or logged during execution.

Outputs:

  • x is the returned point on the manifold.
  • xcost is the value of the cost function at x.
  • The info struct-array is described below, and contains information collected at each iteration of the solver’s progress.
  • The options structure is returned too, so you can see what default values the solver used on top of the options you (possibly) specified.

Available solvers

The toolbox comes with a handful of solvers.

The most trust-worthy is the trust-regions algorithm. Originally, it is a modification of the code of GenRTR.

The toolbox was designed to accommodate many more solvers though: we have since then added BFGS-style solvers, stochastic gradient descent and more. In particular, we look forward to proposing algorithms for nonsmooth cost functions (which notably arise when L1 penalties are at play). You can also propose your own solvers.

Table of minimization algorithms available in Manopt. The symbol ★ means the algorithm uses that part of \(f\). If it is not available, Manopt automatically tries its best to make up for it (e.g., by calling a provided approximation, or with finite differences). The symbol ☆ means the algorithm can benefit from that feature if it is available, but it is not required. \(P\) means preconditioner. The ★’ for SGD indicates it only needs a partial gradient.
Name \(f\) \(\nabla f\) \(\nabla^2 f\) \(P\) Comment
Trust-regions (RTR) #1 choice for smooth optimization. Still good without true Hessian.
Adaptive regularization by cubics (ARC) Alternative to RTR. Still good without true Hessian.
Steepest descent Gradient descent (GD) with line-search (default is backtracking).
Conjugate gradient Often better than steepest descent. Requires M.transp.
Barzilai-Borwein Gradient descent with BB step-size heuristic. Requires M.transp.
BFGS Limited-memory version of BFGS. Requires M.transp.
SGD ★’ Stochastic gradient algorithm for optimization of large sums.
Particle swarm (PSO) Derivative-free optimizer (DFO) based on a population of points.
Nelder-Mead DFO based on a simplex; requires M.pairmean.

The options structure

In Manopt, all options are optional. Standard options are assigned a default value at the toolbox level in /manopt/core/getGlobalDefaults.m (it’s a core tool, best not to edit it). Solvers then overwrite and complement these options with solver-specific fields. These options are in turn overwritten by the user-specified options, if any.

The subsections below list commonly used options. See each solver’s documentation for specific information (e.g., type help trustregions).

Generic options

The following table lists options to control text output during solver execution, whether additional debugging checks are run, and a (deprecated) option to control memory usage from caching.

Table of generic options used by most solvers.
Field name (options."...") Value type Description
verbosity integer Controls how much information a solver outputs during execution; 0: no output; 1 : output at init and at exit; 2: light output at each iteration; 3+: all you can read.
debug integer If larger than 0, the solver may perform additional computations for debugging purposes.
storedepth integer Maximum number of store structures that may be kept in memory (see the cost description page). As of Manopt 5.0, this is mostly irrelevant because the main solvers do a much better job of discarding stale information on the go.

Options for stopping criterion

Solvers stop running when one of several stopping criteria triggers. The following table lists options for some standard criteria.

Table of generic stopping criterion options used by most solvers.
Field name (options."...") Value type Description
maxiter integer Limits the number of iterations of the solver.
maxtime double Limits the execution time of the solver, in seconds.
tolcost double Stop as soon as the cost drops below this tolerance.
tolgradnorm double Stop as soon as the norm of the gradient drops below this tolerance.
stopfun fun. handle Custom stopping criterion: see below.

Some solvers offer additional stopping criteria: type help solvername to learn more.

Users can also define custom stopping criteria with options.stopfun. The general usage pattern is as follows:

% define your problem structure, then:
options.stopfun = @mystopfun;
x = mysolver(problem, [], options);

function [stopnow, reason] = mystopfun(problem, x, info, last)
    stopnow = false;
    reason = '';
    if xyz % decide if solver should stop based on inputs
        stopnow = true;
        reason = 'This optional message explains why we stopped.';
    end
end

The function handle options.stopfun is called after each iteration completes. As input, it receives:

  • the problem structure,
  • the current point x,
  • the whole info struct-array built so far, and
  • an index last such that info(last) is the structure pertaining to the current iteration (this is because info is pre-allocated, so that info(end) typically does not refer to the current iteration).

The outputs are:

  • a Boolean stopnow: the solver terminates if this is true;
  • optionally, a string reason which explains why the criterion triggered. This is displayed by the solver if options.verbosity > 0.

Consider the following example:

options.stopfun = @mystopfun;
function stopnow = mystopfun(problem, x, info, last)
    stopnow = (last >= 3 && info(last-2).cost - info(last).cost < 1e-3);
end

This tells the solver to exit as soon as two successive iterations combined have decreased the cost by less than \(10^{-3}\).

See also the tools page for a list of built-in ways to use stopfun to good effect. For example:

  • Stop a solver on demand by closing a figure or by deleting a file. These allow to gracefully interrupt a solver interactively without breaking program execution. The second one is especially useful when running code remotely without GUI.
  • Stop a solver using Manopt counters. This makes it easy to stop after a certain budget of function calls, matrix-vector products etc. has been exceeded.

Statsfun option for recording info at each iteration

The function handle options.statsfun is called after each iteration completes. It provides an opportunity to process information about each iteration. The main purpose is to store information, but this option can also be used to print, save, plot, etc.

Here is an example:

% define your problem structure, then:
options.statsfun = @mystatsfun;
[x, fx, info] = mysolver(problem, [], options);

function stats = mystatsfun(problem, x, stats, store)
    stats.current_point = x;
end

This logs all the points visited during the optimization process in the info struct-array returned by the solver. One could also write x to disk during this call, or update a plot (if that is useful).

As input, statsfun receives:

  • the problem structure,
  • the current point x,
  • the stats structure for the current iteration as recorded in the info struct-array, and
  • (optionally) the store structure with the cache for x (and the common cache in store.shared).

The output is the stats structure. This function gives you a chance to modify the stats structure, hence to add fields if you want to.

Bear in mind that structures in a struct-array must all have the same fields: if statsfun adds a certain field to a stats structure, it must do so for all iterations. (In some instances, it is even important that the fields be created in the same order.)

Time spent in statsfun is discounted from execution time recorded in [info.time], as this is typically only used for prototyping, debugging, plotting etc.

If you include store as an input for the statsfun, this lets you access the data you cached, both for that particular iterate. You also get access to the cache in store.shared, common to all points x visited by the solver so far. Note that statsfuncan read but it cannot write to store (modifications are lost).

An alternative to creating a statsfun by hand is to use the statsfunhelper tool. This is sometimes simpler (and allows to pass inline functions). The example above simplifies to:

options.statsfun = statsfunhelper('current_point', @(x) x);

The helper can also be used to log more than one metric, by passing it a structure. In the example below, x_reference is a certain point on the manifold problem.M. The stats structures will include fields current_point and dist_to_ref. Notice how the function handles can take different inputs. See the help of that tool for more info.

metrics.current_point = @(x) x;
metrics.dist_to_ref = @(problem, x) problem.M.dist(x, x_reference);
options.statsfun = statsfunhelper(metrics);

See also how to use Manopt counters to keep track of things such as the number of calls to cost, gradient and Hessian, or other special operations such as matrix-vector products. These counters are registered at every iteration and available in the returned stats structure. They can also be used in a stopping criterion.

statsfun is called after each iteration completes

Upon completion of an iteration, options.statsfun is called with the point x that was reached last, even if that x did not change compared to the previous iteration. For example, with trustregions, this triggers after the accept/reject decision. If the step was accepted, we get that new x, optionally with its store. If the step was rejected, we get the same x as previously, likewise with its store. The nuance is this: there is a stats structure for each iteration, but there is a store structure for each point.

The info struct-array

The various solvers log information at each iteration about their progress. This information is returned in the output info.

This is a struct-array, that is, an array of structures. Read this MathWorks blog post for more on this data container in Matlab.

For example, to extract a vector containing the cost value at each iteration, call [info.cost] with the brackets. To plot the cost function value against computation time, call

plot([info.time], [info.cost]);

The following table lists some indicators that might be present in the info output.

Table of typical information tracked at each iteration and returned in the info struct-array. Specific solvers typically include additional fields: check out their documentation.
Field name ([info."..."]) Value type Description
iter integer Iteration number (0 corresponds to the initial guess).
time double Elapsed execution time until completion of the iterate, in seconds.
cost double Attained value of the cost function.
gradnorm double Attained value for the norm of the Riemannian gradient.

A specific solver may not populate all of these fields and may provide additional fields, which would then be described in the solver’s documentation.

Need to log problem-specific information at each iteration?

Include a statsfun in your options structure: read more here. Also read about Manopt counters to keep track of things such as function calls, Hessian calls, matrix products, etc.

About execution time recorded in info.time

The execution time is logged without incorporating time spent in statsfun, as it usually performs computations that are not needed to solve the optimization problem. If, however, your statsfun influences the final output of the solver (e.g., because it logs information that you then use in your stopfun criterion), then you should add the relevant computation time to the stats.time field.