# Solvers

Available optimization algorithms and how to use them

## General description

Solvers (that is, optimization algorithms) are functions in Manopt. Built-in solvers are located in `/manopt/solvers`

. In principle, all solvers accept the following basic call format:

`x = mysolver(problem)`

The returned value `x`

is a point on the manifold `problem.M`

. Depending on the properties of your problem and on the guarantees of the solver, `x`

is more or less close to a good minimizer of the cost function described in the `problem`

structure.

Bear in mind that we are dealing with usually nonconvex, and possibly nonsmooth or derivative-free optimization, so that it is in general not guaranteed that `x`

is a global minimizer of the cost. For smooth problems with gradient information though, most decent algorithms guarantee that `x`

is (approximately) a critical point. Typically, we even expect an approximate local minimizer, but even that is usually not guaranteed in all cases: this is a fundamental limitation of continuous optimization.

All provided solvers are *minimization* algorithms. If you want to *maximize* your objective function, multiply it by -1 (and likewise for the derivatives of the objective function), as we did in the first example.

All solvers also admit a more complete call format:

`x, xcost, info, options] = mysolver(problem, x0, options) [`

**Inputs:**

- The
`problem`

structure specifies the manifold and describes the cost function. - Optionally,
`x0`

is an initial guess, or initial iterate, for the solver. It is typically a point on the manifold`problem.M`

, but may be something else depending on the solver. It can be omitted by passing the empty matrix`[]`

instead. - Optionally, the
`options`

structure is used to fine tune the behavior of the optimization algorithm. On top of hosting the algorithmic parameters, it manages the stopping criteria as well as what information needs to be displayed and / or logged during execution.

**Outputs:**

`x`

is the returned point on the manifold.`xcost`

is the value of the cost function at`x`

.- The
`info`

struct-array is described below, and contains information collected at each iteration of the solver’s progress. - The
`options`

structure is returned too, so you can see what default values the solver used on top of the options you (possibly) specified.

## Available solvers

The toolbox comes with a handful of solvers.

The most trust-worthy is the trust-regions algorithm. Originally, it is a modification of the code of GenRTR.

The toolbox was designed to accommodate many more solvers though: we have since then added BFGS-style solvers, stochastic gradient descent and more. In particular, we look forward to proposing algorithms for nonsmooth cost functions (which notably arise when L1 penalties are at play). You can also propose your own solvers.

Name | Requires (benefits of) | Comment | Call |
---|---|---|---|

Trust-regions (RTR) | Cost, gradient (Hessian, approximate Hessian, preconditioner) | #1 choice for smooth optimization; uses finite differences (FD) of the gradient in the absence of Hessian. | `trustregions` |

Adaptive regularization by cubics (ARC) | Cost, gradient (Hessian, approximate Hessian) | Alternative to RTR; uses FD of the gradient in the absence of Hessian. | `arc` |

Steepest descent | Cost, gradient | Gradient descent (GD) with line-search (default is backtracking-based). | `steepestdescent` |

Conjugate gradient | Cost, gradient (preconditioner) | Often performs better than steepest descent. | `conjugategradient` |

Barzilai-Borwein | Cost, gradient | Gradient descent with BB step size heuristic. | `barzilaiborwein` |

BFGS | Cost, gradient | Limited-memory version of BFGS. | `rlbfgs` |

SGD | Partial gradient (preconditioner) [Cost not needed] | Stochastic gradient algorithm for optimization of large sums. | `stochasticgradient` |

Particle swarm (PSO) | Cost | Derivative-free optimizer (DFO) based on a population of points. | `pso` |

Nelder-Mead | Cost | DFO based on a simplex; requires `M.pairmean` ; limited to (very) low-dimensional problems. |
`neldermead` |

## The options structure

In Manopt, *all* options are optional. Standard options are assigned a default value at the toolbox level in `/manopt/core/getGlobalDefaults.m`

(it’s a core tool, best not to edit it). Solvers then overwrite and complement these options with solver-specific fields. These options are in turn overwritten by the user-specified options, if any.

The subsections below list commonly used options. See each solver’s documentation for specific information (e.g., type `help trustregions`

).

### Generic options

The following table lists options to control text output during solver execution, whether additional debugging checks are run, and a (deprecated) option to control memory usage from caching.

Field name (`options."..."` ) |
Value type | Description |
---|---|---|

`verbosity` |
integer | Controls how much information a solver outputs during execution; 0: no output; 1 : output at init and at exit; 2: light output at each iteration; 3+: all you can read. |

`debug` |
integer | If larger than 0, the solver may perform additional computations for debugging purposes. |

`storedepth` |
integer | Maximum number of `store` structures that may be kept in memory (see the cost description page). As of Manopt 5.0, this is mostly irrelevant because the main solvers do a much better job of discarding stale information on the go. |

### Options for stopping criterion

Solvers stop running when one of several stopping criteria triggers. The following table lists options for some standard criteria.

Field name (`options."..."` ) |
Value type | Description |
---|---|---|

`maxiter` |
integer | Limits the number of iterations of the solver. |

`maxtime` |
double | Limits the execution time of the solver, in seconds. |

`tolcost` |
double | Stop as soon as the cost drops below this tolerance. |

`tolgradnorm` |
double | Stop as soon as the norm of the gradient drops below this tolerance. |

`stopfun` |
fun. handle | Custom stopping criterion: see below. |

Some solvers offer additional stopping criteria.

Users can also define **custom stopping criteria** with `options.stopfun`

. The general usage pattern is as follows:

```
% define your problem structure, then:
options.stopfun = @mystopfun;
x = mysolver(problem, [], options);
function [stopnow, reason] = mystopfun(problem, x, info, last)
if xyz % decide if should stop based on inputs
stopnow = true;
reason = 'This optional message explains why we stopped.';
else
stopnow = false;
reason = '';
end
end
```

The function handle `options.stopfun`

is called after each iteration completes. As input, it receives:

- the
`problem`

structure, - the current point
`x`

, - the whole
`info`

struct-array built so far, and - an index
`last`

such that`info(last)`

is the structure pertaining to the current iteration (this is because`info`

is pre-allocated, so that`info(end)`

typically does*not*refer to the current iteration).

The outputs are:

- a Boolean
`stopnow`

: the solver terminates if this is true; - optionally, a string
`reason`

which explains why the criterion triggered. This is displayed by the solver if`options.verbosity > 0`

.

Consider the following example:

```
options.stopfun = @mystopfun;
function stopnow = mystopfun(problem, x, info, last)
stopnow = (last >= 3 && info(last-2).cost - info(last).cost < 1e-3);
end
```

This tells the solver to exit as soon as two successive iterations combined have decreased the cost by less than \(10^{-3}\).

See also the tools page for a list of built-in ways to use `stopfun`

to good effect. For example:

- Stop a solver on demand by closing a figure or by deleting a file. These allow to gracefully interrupt a solver interactively without breaking program execution. The second one is especially useful when running code remotely without GUI.
- Stop a solver using Manopt counters. This makes it easy to stop after a certain budget of function calls, matrix-vector products etc. has been exceeded.

### Statsfun option for recording info at each iteration

The function handle `options.statsfun`

is called after each iteration completes. It provides an opportunity to process information about each iteration. The main purpose is to store information, but this option can also be used to print, save, plot, etc.

Here is an example:

```
% define your problem structure, then:
options.statsfun = @mystatsfun;
x, fx, info] = mysolver(problem, [], options);
[
function stats = mystatsfun(problem, x, stats, store)
stats.current_point = x;
end
```

This logs all the points visited during the optimization process in the `info`

struct-array returned by the solver. One could also write `x`

to disk during this call, or update a plot (if that is useful).

As input, `statsfun`

receives:

- the
`problem`

structure, - the current point
`x`

, - the
`stats`

structure for the current iteration as recorded in the`info`

struct-array, and - (optionally) the
`store`

structure with the cache for`x`

(and the common cache in`store.shared`

).

The output is the `stats`

structure. This function gives you a chance to modify the `stats`

structure, hence to add fields if you want to.

Bear in mind that structures in a struct-array must *all* have the same fields: if `statsfun`

adds a certain field to a `stats`

structure, it must do so for *all* iterations. (In some instances, it is even important that the fields be created in the same order.)

Time spent in `statsfun`

is discounted from execution time recorded in `[info.time]`

, as this is typically only used for prototyping, debugging, plotting etc.

If you include `store`

as an input for the `statsfun`

, this lets you access the data you cached, both for that particular iterate. You also get access to the cache in `store.shared`

, common to *all* points `x`

visited by the solver so far. Note that `statsfun`

can read but it cannot write to `store`

(modifications are lost).

An alternative to creating a `statsfun`

by hand is to use the `statsfunhelper`

tool. This is sometimes simpler (and allows to pass inline functions). The example above simplifies to:

`options.statsfun = statsfunhelper('current_point', @(x) x);`

The helper can also be used to log more than one metric, by passing it a structure. In the example below, `x_reference`

is a certain point on the manifold `problem.M`

. The stats structures will include fields `current_point`

and `dist_to_ref`

. Notice how the function handles can take different inputs. See the help of that tool for more info.

```
metrics.current_point = @(x) x;
metrics.dist_to_ref = @(problem, x) problem.M.dist(x, x_reference);
options.statsfun = statsfunhelper(metrics);
```

See also how to use Manopt counters to keep track of things such as the number of calls to cost, gradient and Hessian, or other special operations such as matrix-vector products. These counters are registered at every iteration and available in the returned stats structure. They can also be used in a stopping criterion.

`statsfun`

is called after *each*iteration

*completes*

Upon completion of an iteration, `options.statsfun`

is called with the point `x`

that was reached last, *even if* that `x`

did not change compared to the previous iteration. For example, with `trustregions`

, this triggers after the accept/reject decision. If the step was accepted, we get that new `x`

, optionally with its `store`

. If the step was rejected, we get the same `x`

as previously, likewise with its `store`

. The nuance is this: there is a `stats`

structure for each *iteration*, but there is a `store`

structure for each *point*.

### Option to specify a line-search

Some solvers, such as `steepestdescent`

and `conjugategradient`

, need to solve a line-search problem at each iteration. That is, they need to (approximately) solve the one-dimensional optimization problem:

\[
\min_{t\geq 0} \phi(t) = f(\Retr_x(td)),
\]

where \(x\) is the current point on the manifold, \(d\) is a tangent vector at \(x\) (the search direction), \(\Retr\) is the retraction on the manifold and \(f\) is the cost function. Assuming \(d\) is a descent direction, there exists \(t > 0\) such that \(\phi(t) < \phi(0) = f(x)\). The purpose of a line-search algorithm is to find such a real number \(t\).

Manopt includes certain generic purpose line-search algorithms. To force the use of one of them or of your own, specify this in the options structure (not in the problem structure), for example as follows:

`options.linesearch = @linesearch_adaptive;`

Each line-search algorithm accepts its own options which can be added in this same `options`

structure passed to the main solver. See each line-search’s help for details.

For certain problems, you may want to implement your own line-search, typically in order to exploit structure specific to the problem at hand. To this end, it is best to start from an existing line-search function and to adapt it. Alternatively (and perhaps more easily), you may specify a `linesearch`

function in the `problem`

structure (see the cost description page) and use a line-search that uses it, to incorporate the additional information you supply there. Do not hesitate to ask for help on the forum if you have questions on this feature.

## The info struct-array

The various solvers log information at each iteration about their progress. This information is returned in the output `info`

.

This is a struct-array, that is, an array of structures. Read this MathWorks blog post for more on this data container in Matlab.

For example, to extract a vector containing the cost value at each iteration, call `[info.cost]`

*with the brackets*. To plot the cost function value against computation time, call

`plot([info.time], [info.cost]);`

The following table lists some indicators that might be present in the `info`

output.

Field name (`[info."..."]` ) |
Value type | Description |
---|---|---|

`iter` |
integer | Iteration number (0 corresponds to the initial guess). |

`time` |
double | Elapsed execution time until completion of the iterate, in seconds. |

`cost` |
double | Attained value of the cost function. |

`gradnorm` |
double | Attained value for the norm of the Riemannian gradient. |

A specific solver may not populate all of these fields and may provide additional fields, which would then be described in the solver’s documentation.

Include a `statsfun`

in your `options`

structure: read more here. Also read about Manopt counters to keep track of things such as function calls, Hessian calls, matrix products, etc.

`info.time`

The execution time is logged *without* incorporating time spent in `statsfun`

, as it usually performs computations that are not needed to solve the optimization problem. If, however, your `statsfun`

influences the final output of the solver (e.g., because it logs information that you then use in your `stopfun`

criterion), then you should add the relevant computation time to the `stats.time`

field.