Foreword

With Manopt, you can solve optimization problems on manifolds and on linear spaces (e.g., matrix spaces) using state-of-the-art algorithms, with minimal effort. The toolbox targets great flexibility in the problem description and comes with advanced features, such as caching.

The toolbox architecture is based on a separation of the manifolds, the solvers and the problem descriptions. For basic use, one only needs to pick a manifold from the library, describe the cost function (and possibly its derivatives) on this manifold and pass it on to a solver. Accompanying tools help the user with common tasks, such as numerically checking whether the cost function agrees with its derivatives up to the appropriate order.

This is a prototyping toolbox, designed based on the idea that the costly part of solving an optimization problem is querying the cost function, and not the inner machinery of the solver. It is also work in progress: feedback and contributions are welcome!

Examples are available.

A short blog post gives an informal overview of optimization on manifolds. It may be a good start to get a general feeling. There is also a 5 minute video giving an overview of the general concept.

The about page links to two books to pick up the mathematical foundations of this topic.

Here are a one-hour video and a two-hour video introducing the basics of differential geometry and Riemannian geometry for optimization on smooth manifolds.

It helps us to know our users. If you'd like to, please follow this link to be on our user list. Thanks!

We'd love to hear about your papers! Please fill in this quick form to add yours to the list. The list appears here.

Download

The current version is 7.1 and was packaged on Sep. 30, 2022. The file is about 750 KB.

Also on GitHub (that is where the latest version is).

Install

  1. Unzip and copy the whole manopt directory you just downloaded to a location of your choice, say, /my/directory/.
  2. Go to /my/directory/manopt/ at the Matlab prompt and execute importmanopt.
  3. You may save this path for your next Matlab sessions (via savepath).
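
For example, if you unzipped to /my/directory/, the whole installation might look like this at the Matlab prompt (a sketch; adapt the path to your own system):

cd /my/directory/manopt;    % go to the manopt folder you just unzipped
importmanopt;               % add Manopt to the Matlab path
savepath;                   % optional: keep it on the path for future sessions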

Check

Go to /my/directory/manopt/checkinstall/ and run the script basicexample.m. If there are no errors, you are done! Otherwise, feel free to contact us.

The math

In this first example, we compute a dominant eigenvector of a symmetric matrix $A \in \mathbb{R}^{n\times n}$. Let $\lambda_1 \geq \cdots \geq \lambda_n$ be its eigenvalues. The largest eigenvalue, $\lambda_1$, is known to be the optimal value for the following optimization problem:

$$\max\limits_{x\in\mathbb{R}^n, x \neq 0} \frac{x^\top A x}{x^\top x}.$$

This can be rewritten as follows:

$$\min\limits_{x\in\mathbb{R}^n, \|x\| = 1} -x^\top A x.$$

The cost function and its gradient in $\mathbb{R}^n$ read:

$$
\begin{align}
    f(x) & = -x^\top A x,\\
    \nabla f(x) & = -2Ax.
\end{align}
$$

The constraint on the vector $x$ requires that $x$ be of unit 2-norm, that is, $x$ is a point on the sphere (one of the nicest manifolds):

$$\mathbb{S}^{n-1} = \{x \in \mathbb{R}^n : x^\top x = 1\}.$$

This is all the information we need to apply Manopt to our problem.

For users curious about how optimization on manifolds works, here is a bit more detail: the cost function is smooth on $\mathbb{S}^{n-1}$. Its Riemannian gradient on $\mathbb{S}^{n-1}$ at $x$ is a tangent vector to the sphere at $x$. It can be computed by projecting the usual gradient $\nabla f(x)$ onto that tangent space using the orthogonal projector $\mathrm{Proj}_x u = (I-xx^\top)u$:

$$\mathrm{grad}\,f(x) = \mathrm{Proj}_x \nabla f(x) = -2(I-xx^\top)Ax.$$

This is an example of a mathematical relationship between the Euclidean gradient $\nabla f$, which we often already know how to compute from calculus courses, and the Riemannian gradient $\mathrm{grad}\,f$, which is needed for the optimization. Fortunately, in Manopt the conversion happens behind the scenes via a function called egrad2rgrad and we only need to compute $\nabla f$. This website can help in figuring out a formula for $\nabla f$.

We solve this simple optimization problem using Manopt to illustrate the most basic usage of the toolbox. For additional theory, see the two books linked on the about page.

The code

Solving this optimization problem using Manopt requires little Matlab code:

% Generate random problem data.
n = 1000;
A = randn(n);
A = .5*(A+A.');

% Create the problem structure.
manifold = spherefactory(n);
problem.M = manifold;

% Define the problem cost function and its Euclidean gradient.
problem.cost  = @(x) -x'*(A*x);
problem.egrad = @(x) -2*A*x;      % notice the 'e' in 'egrad' for Euclidean

% Numerically check gradient consistency (optional).
checkgradient(problem);

% Solve.
[x, xcost, info, options] = trustregions(problem);

% Display some statistics.
figure;
semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Iteration number');
ylabel('Norm of the gradient of f');

Let us look at the code bit by bit. First, we generate some data for our problem and execute these two lines:

manifold = spherefactory(n);
problem.M = manifold;

The call to spherefactory returns a structure describing the manifold $\mathbb{S}^{n-1}$, i.e., the sphere. This manifold corresponds to the constraint appearing in our optimization problem. For other constraints, take a look at the various supported manifolds. The second instruction creates a structure named problem and sets the field problem.M to contain the manifold structure. The problem structure is populated with everything a solver might need to know about the problem in order to solve it, such as the cost function and its gradient:

problem.cost = @(x) -x'*(A*x);
problem.egrad = @(x) -2*A*x;

The cost function (to be minimized: Manopt always minimizes) and its derivatives are specified as function handles. Notice how the gradient is specified as the Euclidean gradient of $f$, i.e., $\nabla f(x) = -2Ax$, in the field egrad (mind the "e"). The conversion to the Riemannian gradient happens behind the scenes. This is particularly useful when working with a more complicated manifold.

An alternative to the definition of the gradient is to specify the Riemannian gradient directly, possibly calling Manopt's egrad2rgrad conversion tool explicitly. This would replace the line problem.egrad = ... with:

problem.grad = @(x) manifold.egrad2rgrad(x, -2*A*x);

This is useful, for example, if an expression for the Riemannian gradient is known and it is natural to use it explicitly. Mind the names: problem.grad specifies the Riemannian gradient; if you want to specify the Euclidean gradient, the correct name is problem.egrad, with an "e". For day-to-day use, egrad is often the preferred way to go.
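
For the sphere, the projector is $\mathrm{Proj}_x u = u - x(x^\top u)$, so one could also spell out the Riemannian gradient directly (a small sketch, equivalent to the egrad2rgrad call above, used instead of problem.egrad):

% Riemannian gradient on the sphere: project the Euclidean gradient -2*A*x
% onto the tangent space at x, that is, grad f(x) = -2*(I - x*x')*A*x.
problem.grad = @(x) -2*(A*x - x*(x'*(A*x)));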

With Manopt 7.0 and Matlab R2021a or later, if you have the Deep Learning Toolbox, then you can also use automatic differentiation (AD) instead of defining the gradient (and even the Hessian) yourself:

problem = manoptAD(problem);

See manoptAD and manoptADhelp for more information about how this works, and about what to do when it does not work. Keep in mind that, while AD is convenient and saves a lot of human time, it is somewhat inefficient in terms of computation time: it is typical for AD to slow down optimization by about a factor of five. Still, for prototyping, this is often a comfortable option.

Tip! This website can be helpful in figuring out a formula for the Euclidean gradient of your cost function. You can also learn the math to do these computations in Section 4.7 of this book.
Tip! Notice that the functions cost and egrad both compute the product $Ax$, which is likely to be the most expensive operation for large scale problems. This is perfectly fine for prototyping, but less so for a final version of the implementation. See the many ways of describing the cost function for alternatives that reduce redundant computations.
Tip! If you do not specify the gradient, then Manopt approximates it with finite differences. This is slow and can make it difficult to reach accurate solutions, hence this feature should only be used for quick prototyping on low-dimensional manifolds. Since the solver may then have a hard time reaching points with a small gradient, you can pass an options structure with options.tolgradnorm set to a larger value so that it stops earlier, as sketched below.
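
For instance, if the gradient is missing from the problem structure, one might loosen the tolerance like this (a sketch; the value 1e-4 is only an illustration):

% Loosen the stopping tolerance on the norm of the (finite-difference) gradient.
options.tolgradnorm = 1e-4;               % larger than the default tolerance
x = trustregions(problem, [], options);   % [] means: no initial guess provided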

The next instruction is not needed to solve the problem but often helps at the prototyping stage:

checkgradient(problem);

The checkgradient tool verifies numerically that the cost function and its gradient agree up to the appropriate order. See the tools section for more details and more helpful tools offered by Manopt. This tool generates the following figure:

checkgradient figure

The blue curve seems to have the same slope as the dashed line over a decent segment (highlighted in orange): that's what we want to see (also check the textual output). We now call a solver for our problem:

[x, xcost, info, options] = trustregions(problem);

This instruction calls trustregions on our problem, without initial guess and without options structure. As a result, the solver generates a random initial guess automatically and resorts to the default values for all options. As a general feature in Manopt, all options are, well, optional. The returned values are x (usually an approximate local minimizer of the cost function), xcost (the cost value attained by x), info (a struct-array containing information about the successive iterations performed by the solver) and options (a structure containing all options used and their values: take a peek to find out what you can parameterize). For more details and more solvers, see the solvers section.

This call issues a warning because the trust-regions algorithm normally requires the Hessian of the cost function, or an approximation of it, to be provided in the problem structure. When the Hessian is not provided, Manopt approximates it using a finite-differencing scheme on the gradient function and warns you about it. You may disable this warning by calling warning('off', 'manopt:getHessian:approx');.
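
Alternatively, since the Euclidean Hessian of our cost is simply the constant operator $\nabla^2 f(x)[u] = -2Au$, we could provide it so that trustregions uses an exact Hessian (a sketch; the field problem.ehess is documented in the cost description section below):

% Euclidean Hessian of f(x) = -x'*A*x along a tangent vector u.
% Manopt converts it to the Riemannian Hessian via M.ehess2rhess.
problem.ehess = @(x, u) -2*A*u;
checkhessian(problem);                        % optional numerical check
[x, xcost, info] = trustregions(problem);     % no warning: the Hessian is now available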

Finally, we access the contents of the struct-array info to display the convergence plot of our solver:

semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Iteration number');
ylabel('Norm of the gradient of f');

This generates the following figure:

Gradient norm converging to zero

For more information on what data is stored in info, see the solvers section.

Heads up! Notice that we write [info.xxx] and not simply info.xxx, because info is a struct-array. Read this MathWorks blog post for further information.

General description

Manifolds in Manopt are represented as structures and are obtained by calling a factory. Built-in factories are located in /manopt/manifolds. Picking a manifold corresponds to specifying a search space for the decision variables. For the special (but common) case of a submanifold, the manifold represents a constraint on the decision variables (such as the sphere, which constrains vectors to have unit norm). In the case of a quotient manifold, the manifold captures an invariance in the cost function (such as the Grassmann manifold). Typically, points on the manifold as well as tangent vectors are represented by matrices, but they could be represented by structures, cells, etc. They could even be represented by data on a GPU.

Available manifolds

Manopt comes with a number of implementations for generically useful manifolds. Of course, manifolds can also be user-defined. The best way to build your own is probably to read the code of some of the standard factories and to adapt what needs to be changed. If you develop an interesting manifold factory and would like to share it, be sure to let us know: we would love to add it to Manopt if it can be of interest to other users!

Name Set Factory
Euclidean space (complex) $\mathbb{R}^{m\times n}$, $\mathbb{C}^{m\times n}$ euclideanfactory(m, n)
euclideancomplexfactory(m, n)
Symmetric matrices $\{ X \in \mathbb{R}^{n\times n} : X = X^\top\}^k$ symmetricfactory(n, k)
Skew-symmetric matrices $\{ X \in \mathbb{R}^{n\times n} : X + X^\top = 0\}^k$ skewsymmetricfactory(n, k)
Centered matrices $\{ X \in \mathbb{R}^{m\times n} : X\mathbf{1}_n = 0_m \}$ centeredmatrixfactory(m, n)
Linear subspaces of linear spaces $\{ x \in E : x = \mathrm{proj}(x) \}$ where $E$ is a linear space and $\mathrm{proj}$ is an orthogonal projector to a subspace. euclideansubspacefactory(E, proj, dim)
Sphere $\{X\in\mathbb{R}^{n\times m} : \|X\|_\mathrm{F} = 1\}$ spherefactory(n, m)
Symmetric sphere $\{X\in\mathbb{R}^{n\times n} : \|X\|_\mathrm{F} = 1, X = X^\top\}$ spheresymmetricfactory(n)
Complex sphere $\{X\in\mathbb{C}^{n\times m} : \|X\|_\mathrm{F} = 1\}$ spherecomplexfactory(n, m)
Oblique manifold $\{X\in\mathbb{R}^{n\times m} : \|X_{:1}\| = \cdots = \|X_{:m}\| = 1\}$ obliquefactory(n, m) (To work with $X\in\mathbb{R}^{m \times n}$ with $m$ unit-norm rows instead of columns: obliquefactory(n, m, true).)
Complex oblique manifold $\{X\in\mathbb{C}^{n\times m} : \|X_{:1}\| = \cdots = \|X_{:m}\| = 1\}$ obliquecomplexfactory(n, m) (To work with unit-norm rows instead of columns: obliquecomplexfactory(n, m, true).)
Complex circle $\{z\in\mathbb{C}^n : |z_1| = \cdots = |z_n| = 1\}$ complexcirclefactory(n)
Phases of real DFT $\{z\in\mathbb{C}^n : |z_k| = 1, z_{1+\operatorname{mod}(k, n)} = \bar{z}_{1+\operatorname{mod}(n-k, n)} \ \forall k\}$ realphasefactory(n)
Stiefel manifold $\{X \in \mathbb{R}^{n \times p} : X^\top X = I_p\}^k$ stiefelfactory(n, p, k)
Complex Stiefel manifold $\{X \in \mathbb{C}^{n \times p} : X^*X = I_p\}^k$ stiefelcomplexfactory(n, p, k)
Generalized Stiefel manifold $\{X \in \mathbb{R}^{n \times p} : X^\top BX = I_p\}$ for some $B \succ 0$ stiefelgeneralizedfactory(n, p, B)
Stiefel manifold, stacked $\{X \in \mathbb{R}^{md \times k} : (XX^\top)_{ii} = I_d\}$ stiefelstackedfactory(m, d, k)
Grassmann manifold $\{\operatorname{span}(X) : X \in \mathbb{R}^{n \times p}, X^\top X = I_p\}^k$ grassmannfactory(n, p, k)
Complex Grassmann manifold $\{\operatorname{span}(X) : X \in \mathbb{C}^{n \times p}, X^*X = I_p\}^k$ grassmanncomplexfactory(n, p, k)
Generalized Grassmann manifold $\{\operatorname{span}(X) : X \in \mathbb{R}^{n \times p}, X^\top BX = I_p\}$ for some $B \succ 0$ grassmanngeneralizedfactory(n, p, B)
Rotation group $\{R \in \mathbb{R}^{n \times n} : R^\top R = I_n, \det(R) = 1\}^k$ rotationsfactory(n, k)
Special Euclidean group $\{ (R, t) \in \mathbb{R}^{n \times n} \times \mathbb{R}^n : R^\top R = I_n, \det(R) = 1 \}^k$ specialeuclideanfactory(n, k)
Unitary matrices $\{ U \in \mathbb{C}^{n \times n} : U^*U = I_n \}^k$ unitaryfactory(n, k)
Hyperbolic manifold $\{ x \in \mathbb{R}^{n+1} : x_0^2 = x_1^2 + \cdots + x_n^2 + 1 \}^m$ with Minkowski metric hyperbolicfactory(n, m)
Essential manifold Epipolar constraint between projected points in two perspective views, see Roberto Tron's page essentialfactory(k, '(un)signed')
Fixed-rank $\{X \in \mathbb{R}^{m \times n} : \operatorname{rank}(X) = k\}$ fixedrankembeddedfactory(m, n, k) (ref)
fixedrankfactory_2factors(m, n, k) (doc)
fixedrankfactory_2factors_preconditioned(m, n, k) (ref)
fixedrankfactory_2factors_subspace_projection(m, n, k) (ref)
fixedrankfactory_3factors(m, n, k) (ref)
fixedrankMNquotientfactory(m, n, k) (ref)
Fixed-rank tensor, Tucker Tensors of fixed multilinear rank in Tucker format fixedranktensorembeddedfactory (ref)
fixedrankfactory_tucker_preconditioned (ref)
Matrices with strictly positive entries $\{ X \in \mathbb{R}^{m\times n} : X_{ij} > 0 \ \forall i, j\}$ positivefactory(m, n)
Symmetric, positive definite matrices $\{ X \in \mathbb{R}^{n\times n} : X = X^\top, X \succ 0\}^k$ sympositivedefinitefactory(n)
Symmetric positive semidefinite, fixed-rank (complex) $\{X \in \mathbb{R}^{n \times n} : X = X^\top \succeq 0, \operatorname{rank}(X) = k\}$ symfixedrankYYfactory(n, k)
symfixedrankYYcomplexfactory(n, k)
Symmetric positive semidefinite, fixed-rank with unit diagonal $\{X \in \mathbb{R}^{n \times n} : X = X^\top \succeq 0, \operatorname{rank}(X) = k, \operatorname{diag}(X) = 1\}$ elliptopefactory(n, k)
Symmetric positive semidefinite, fixed-rank with unit trace $\{X \in \mathbb{R}^{n \times n} : X = X^\top \succeq 0, \operatorname{rank}(X) = k, \operatorname{trace}(X) = 1\}$ spectrahedronfactory(n, k)
Multinomial manifold (strict simplex elements) $\{ X \in \mathbb{R}^{n\times m} : X_{ij} > 0 \forall i,j \textrm{ and } X^\top \mathbf{1}_n = \mathbf{1}_m \}$ multinomialfactory(n, m)
Multinomial doubly stochastic manifold $\{ X \in \mathbb{R}^{n\times n} : X_{ij} > 0 \forall i,j \textrm{ and } X \mathbf{1}_n = \mathbf{1}_n, X^\top \mathbf{1}_n = \mathbf{1}_n \}$ multinomialdoublystochasticfactory(n)
Multinomial symmetric and stochastic manifold $\{ X \in \mathbb{R}^{n\times n} : X_{ij} > 0 \forall i,j \textrm{ and } X \mathbf{1}_n = \mathbf{1}_n, X = X^\top \}$ multinomialsymmetricfactory(n)
Positive definite simplex $\{ (X_1, \ldots, X_k) \in (\mathbb{R}^{n \times n})^k : X_i \succ 0 \forall i \textrm{ and } X_1 + \cdots + X_k = I_n \}$ sympositivedefinitesimplexfactory(n, k)
Positive definite simplex, complex $\{ (X_1, \ldots, X_k) \in (\mathbb{C}^{n \times n})^k : X_i \succ 0 \forall i \textrm{ and } X_1 + \cdots + X_k = I_n \}$ sympositivedefinitesimplexcomplexfactory(n, k)
Sparse matrices with fixed sparsity pattern $\{ X \in \mathbb{R}^{m \times n} : X_{ij} = 0 \Leftrightarrow A_{ij} = 0 \}$ euclideansparsefactory(A) (inefficient implementation)
Constant manifold (singleton) $\{ A \}$ constantfactory(A)

Bear in mind that a set can often be turned into a Riemannian manifold in many different ways, by choosing one metric or another. Which metric is best may vary from application to application. This is particularly true for fixed-rank matrices: the choice of geometry there is still a research topic, and there is no better approach yet than experimenting with the various geometries.

Good to know! Need to work on a product of manifolds? For example, are you minimizing a function $f(X, Y)$ where $X$ has unit norm and $Y$ is orthonormal? Or a function $f(X_1, \ldots, X_n)$ where each $X_i$ lives on a same manifold? Then make sure to check out productmanifold and powermanifold in the tools section.
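
For example, for a cost $f(X, Y)$ where $X$ has unit norm and $Y$ is orthonormal, a product manifold could be set up as follows (a sketch; n, p and k are placeholder sizes):

% Product of a sphere and a Stiefel manifold: points are structures
% with the same field names as 'elements' (here, x.X and x.Y).
elements.X = spherefactory(n);
elements.Y = stiefelfactory(n, p);
M = productmanifold(elements);

% k copies of the same manifold: points are cells {x1, ..., xk}.
Mk = powermanifold(spherefactory(n), k);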

Manifold structure fields

A manifold structure has a number of fields, most of which contain function handles. Here is a list of things you might find in a structure M returned by a manifold factory:

Name Field usage Functionality
Name M.name() Returns a name for the manifold as a string.
Dimension M.dim() Returns the dimension of the manifold.
Metric M.inner(x, u, v) Computes the Riemannian metric $\langle u, v \rangle_x$.
Norm M.norm(x, u) Computes the Riemannian norm $\|u\|_x = \sqrt{\langle u, u \rangle_x}$.
Distance M.dist(x, y) Computes the Riemannian distance $\operatorname{dist}(x, y)$.
Typical distance M.typicaldist() Returns the "scale" of the manifold. This is used by the trust-regions solver for example, to determine default initial and maximal trust-region radii.
Tangent space projector M.proj(x, u) Computes $\operatorname{Proj}_x u$, the orthogonal projection of the vector $u$ from the ambient or total space to the tangent space at $x$ or to the horizontal space at $x$.
Euclidean to Riemannian gradient M.egrad2rgrad(x, egrad) For manifolds embedded in a Euclidean space, converts the gradient of $f$ at $x$ seen as a function in that Euclidean space to the Riemannian gradient of $f$ on the manifold. Allowed to take (storedb, key) as input for caching, but must also allow to be called without it.
Euclidean to Riemannian Hessian M.ehess2rhess(x, egrad, ehess, u) Similarly to egrad2rgrad, converts the Euclidean gradient and Hessian of $f$ at $x$ along a tangent vector $u$ to the Riemannian Hessian of $f$ at $x$ along $u$ on the manifold. Allowed to take (storedb, key) as input for caching, but must also allow to be called without it.
Tangentialize M.tangent(x, u) Re-tangentializes a vector. The input is a vector in the tangent vector representation, which possibly (for example because of error accumulations) is no longer tangent. The output is the "closest" tangent vector to the input. If tangent vectors are represented in the ambient space, this is equivalent to proj.
Tangent to ambient representation M.tangent2ambient(x, u) Tangent vectors are sometimes represented differently from their counterpart in the ambient space. This function returns the ambient space representation of a tangent vector $u$. Useful when defining the Euclidean Hessian ehess for example.
Exponential map M.exp(x, u, t) Computes $\operatorname{Exp}_x(tu)$, the point you reach by following the vector $u$ scaled by $t$ starting at $x$. As of 5.0, this field should only exist if the manifold provides a genuine exponential map. Otherwise, manually fall back to M.retr.
Retraction M.retr(x, u, t) Computes $\operatorname{Retr}_x(tu)$, where $\operatorname{Retr}$ is a retraction: a cheaper proxy for the exponential map.
Logarithmic map M.log(x, y) Computes $\operatorname{Log}_x(y)$, a tangent vector at $x$ pointing toward $y$. This is meant to be the inverse of $\operatorname{Exp}$.
Inverse retraction M.invretr(x, y) Computes the inverse of the retraction: a tangent vector at $x$ pointing toward $y$. Only a few manifolds implement this right now.
Random point M.rand() Computes a random point on the manifold.
Random vector M.randvec(x) Computes a random, unit-norm tangent vector in the tangent space at $x$.
Zero vector M.zerovec(x) Returns the zero tangent vector at $x$.
Linear combination M.lincomb(x, a1, u1, a2, u2) Computes the tangent vector at $x$: $v = a_1 u_1 + a_2 u_2$, where $a_1, a_2$ are scalars and $u_1, u_2$ are tangent vectors at $x$. The inputs $a_2, u_2$ are optional.
Vector transport M.transp(x, y, u) Computes a tangent vector at $y$ that "looks like" the tangent vector $u$ at $x$. This is not necessarily a parallel transport.
Isometric transport M.isotransp(x, y, u) An isometric vector transport (few manifold implementations offer this, though for some M.transp is isometric: see their documentation).
Pair mean M.pairmean(x, y) Computes the intrinsic mean of $x$ and $y$, that is, a point that lies mid-way between $x$ and $y$ on the geodesic arc joining them.
Hashing function M.hash(x) Computes a string that (almost) uniquely identifies the point $x$ and that can serve as a field name for a structure. (No longer used as of Manopt 2.0.)
Vector representation M.vec(x, u) Returns a real column-vector representation of the tangent vector $u$. The length of the output is always the same and at least M.dim(). This function is linear and invertible on the tangent space at $x$.
Normal representation M.mat(x, u_vec) The inverse of the vec function: returns a tangent vector representation from a column vector such that M.mat(x, M.vec(x, u)) = u.
vec and mat isometry check M.vecmatareisometries() Returns true if M.vec is a linear isometry, i.e., if for all tangent vectors $u,v$, M.inner(x, u, v) == M.vec(x, u).'*M.vec(x, v). Then, M.mat is both the adjoint and the inverse of M.vec (on the tangent space).
Lie group identity M.lie_identity() If the manifold is a Lie group, this function returns the identity element (e.g., for the rotation group in $\mathbb{R}^d$, that would be the identity matrix of size $d$).

Not all manifold factories populate all of these fields, but that's okay: for many purposes, only a subset of these functions are necessary. Notice that it is also very easy to add or replace fields in a manifold structure returned by a factory, which can be desirable to experiment with various retractions, vector transports, etc. If you find ways to improve the built-in geometries, let us know.
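
To make this concrete, here is a small sketch of how these fields can be called directly (using the sphere as an example):

M = spherefactory(5);
x = M.rand();              % random point on the sphere
u = M.randvec(x);          % random unit-norm tangent vector at x
M.inner(x, u, u)           % should be 1, since randvec returns unit-norm vectors
y = M.retr(x, u, 0.1);     % move away from x along u, with "step" 0.1
M.dist(x, y)               % Riemannian distance between x and y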

General description

Solvers, or optimization algorithms, are functions in Manopt. Built-in solvers are located in /manopt/solvers. In principle, all solvers admit the basic call format x = mysolver(problem). The returned value x is a point on the manifold problem.M. Depending on the properties of your problem and on the guarantees of the solver, x is more or less close to a good minimizer of the cost function described in the problem structure. Bear in mind that we are dealing with usually nonconvex, and possibly nonsmooth or derivative-free, optimization, so that it is in general not guaranteed that x is a global minimizer of the cost. For smooth problems with gradient information though, most decent algorithms guarantee that x is (approximately) a critical point. Typically, we even expect an approximate local minimizer, but even that is not guaranteed in all cases: this is a fundamental limitation of nonlinear optimization.

Min or max? All provided solvers are minimization algorithms. If you want to maximize your objective function, multiply it by -1 (and accordingly for the derivatives of the objective function if needed), as we did in the first example.

In principle, all solvers also admit a more complete call format: [x, xcost, info, options] = mysolver(problem, x0, options). The output xcost is the value of the cost function at the returned point x. The info struct-array is described below, and contains information collected at each iteration of the solver's progress. The options structure is returned too, so you can see what default values the solver used on top of the options you (possibly) specified. The input x0 is an initial guess, or initial iterate, for the solver. It is typically a point on the manifold problem.M, but may be something else depending on the solver. It can be omitted by passing the empty matrix [] instead. The options structure is used to fine-tune the behavior of the optimization algorithm. On top of hosting the algorithmic parameters, it manages the stopping criteria as well as what information needs to be displayed and/or logged during execution.
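
For example (a sketch; the option values are arbitrary):

% Full calling pattern: initial guess + options structure.
x0 = problem.M.rand();          % or [] to let the solver pick a random one
options.maxiter = 500;          % cap the number of iterations
options.verbosity = 1;          % output at init and at exit only
[x, xcost, info, options] = trustregions(problem, x0, options);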

Available solvers

The toolbox comes with a handful of solvers. The most trustworthy is the trust-regions algorithm, which originated as a modification of the GenRTR code. The toolbox was designed to accommodate many more solvers though: we have since added BFGS-style solvers, stochastic gradient descent and more. In particular, we look forward to proposing algorithms for nonsmooth cost functions (which notably arise when L1 penalties are at play). You can also propose your own solvers.

Name Requires (benefits of) Comment Call
Trust-regions (RTR) Cost, gradient (Hessian, approximate Hessian, preconditioner) #1 choice for smooth optimization; uses FD of the gradient in the absence of Hessian. trustregions(...)
Adaptive regularization by cubics (ARC) Cost, gradient (Hessian, approximate Hessian) Alternative to RTR; uses FD of the gradient in the absence of Hessian. arc(...)
Steepest-descent Cost, gradient Simple implementation of GD; the built-in line-search is backtracking based. steepestdescent(...)
Conjugate-gradient Cost, gradient (preconditioner) Often performs better than steepest-descent. conjugategradient(...)
Barzilai-Borwein Cost, gradient Gradient descent with BB step size heuristic. barzilaiborwein(...)
BFGS Cost, gradient Limited-memory version of BFGS. rlbfgs(...)
SGD Partial gradient (preconditioner) [no cost needed] Stochastic gradient algorithm for optimization of large sums. stochasticgradient(...)
Particle swarm (PSO) Cost DFO based on a population of points. pso(...)
Nelder-Mead Cost DFO based on a simplex; requires M.pairmean; limited to (very) low-dimensional problems. neldermead(...)
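
The same problem structure can be passed to any of these. For instance, sticking with the first example (a sketch):

x1 = conjugategradient(problem);                  % gradient-based, no Hessian needed
x2 = rlbfgs(problem);                             % limited-memory BFGS
[x3, x3cost, info3] = steepestdescent(problem);   % same output pattern as trustregions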

The options structure

In Manopt, all options are optional. Standard options are assigned a default value at the toolbox level in /manopt/core/getGlobalDefaults.m (it's a core tool, best not to edit it). Solvers then overwrite and complement these options with solver-specific fields. These options are in turn overwritten by the user-specified options, if any. Here is a list of commonly used options (see each solver's documentation for specific information):

Field name (options."...") Value type Description
Output and information logging
verbosity integer Controls how much information a solver outputs during execution; 0: no output; 1: output at init and at exit; 2: light output at each iteration; more: all you can read.
debug integer If larger than 0, the solver may perform additional computations for debugging purposes.
statsfun fun. handle

If you specify a function handle with prototype stats = statsfun(problem, x, stats), it is called after each iteration completes. It receives the problem structure, the current point x and the statistics structure stats that is logged in the info struct-array at the corresponding iteration number. This function gives you a chance to modify the stats structure, hence to add fields if you want to. Bear in mind that structures in a struct-array must all have the same fields, so that if statsfun adds a field to a stats structure, it must do so for all iterations. Time spent in statsfun is discounted from execution time, as this is typically only used for prototyping / debugging.

Example:

options.statsfun = @mystatsfun;
function stats = mystatsfun(problem, x, stats)
    stats.current_point = x;
end

This logs all the points visited during the optimization process in the info struct-array returned by the solver. One could also write x to disk during this call (if that is useful).

You may also provide a function handle with this calling pattern: stats = statsfun(problem, x, stats, store). This additionally lets you access the data stored for that particular iterate in the store structure. As of Manopt 1.0.8, this memory has an additional field: store.shared (it can be read, but not edited here). This field contains "permanent" memory, shared by all points x visited so far.

An alternative is to use the statsfunhelper tool, which is sometimes simpler (and allows passing anonymous functions). The example above simplifies to:

options.statsfun = statsfunhelper('current_point', @(x) x);

The helper can also be used to log more than one metric, by passing it a structure. In the example below, x_reference is a certain point on the manifold problem.M. The stats structures will include fields current_point and dist_to_ref. Notice how the function handles can take different inputs. See the help of that tool for more info.

metrics.current_point = @(x) x;
metrics.dist_to_ref = @(problem, x) problem.M.dist(x, x_reference);
options.statsfun = statsfunhelper(metrics);

See also the example on how to use Manopt counters to keep track of things such as cost / gradient / Hessian calls or other special operations such as matrix-vector products. These counters are registered at every iteration and available in the returned stats structure. They can also be used as stopping criterion.

Stopping criteria
maxiter integer Limits the number of iterations of the solver.
maxtime double Limits the execution time of the solver, in seconds.
tolcost double Stop as soon as the cost drops below this tolerance.
tolgradnorm double Stop as soon as the norm of the gradient drops below this tolerance.
stopfun fun. handle

If you specify a function handle with prototype stopnow = stopfun(problem, x, info, last), it is called after each iteration completes with the problem structure, the current point x, the whole info struct-array built so far and an index last such that info(last) is the structure pertaining to the current iteration (this is because info is pre-allocated, so that info(end) typically does not refer to the current iteration). The return value is a Boolean. If stopnow is returned as true, the solver terminates.

Example:

options.stopfun = @mystopfun;
function stopnow = mystopfun(problem, x, info, last)
    stopnow = (last >= 3 && info(last-2).cost - info(last).cost < 1e-3);
end

This tells the solver to exit as soon as two successive iterations combined have decreased the cost by less than $10^{-3}$. It is also possible to return a second output, reason: a string that is displayed (if options.verbosity is large enough) to inform the user of why the solver stopped (if it did because of this criterion).

See also the two interactive stopping criteria: by closing a figure, and by deleting a file. These make it possible to gracefully interrupt a solver that is taking too much time.

See also the example on how to use Manopt counters in a stopping criterion, which makes it easy to stop after a certain budget of function calls, matrix-vector products etc. has been exceeded.

Line-search 
linesearch fun. handle

Some solvers, such as steepestdescent and conjugategradient, need to solve a line-search problem at each iteration. That is, they need to (approximately) solve the one-dimensional optimization problem:
$$\min_{t\geq 0} \phi(t) = f(\operatorname{Retr}_x(td)),$$
where $x$ is the current point on the manifold, $d$ is a tangent vector at $x$ (the search direction), $\operatorname{Retr}$ is the retraction on the manifold and $f$ is the cost function. Assuming $d$ is a descent direction, there exists $t > 0$ such that $\phi(t) < \phi(0) = f(x)$. The purpose of a line-search algorithm is to find such a real number $t$.

Manopt includes a few general-purpose line-search algorithms. To force the use of one of them, or of your own, specify it in the options structure (not in the problem structure) as follows: options.linesearch = @linesearch_adaptive; (for example). Each line-search algorithm accepts its own options, which can be added to this same options structure passed to the master solver. See each line-search's help for details.

For certain problems, you may want to implement your own line-search, typically in order to exploit structure specific to the problem at hand. To this end, it is best to start from an existing line-search function and to adapt it. Alternatively (and perhaps more easily), you may specify a linesearch function in the problem structure (see the cost description section) and use a line-search that uses it, to incorporate the additional information you supply there. Do not hesitate to ask for help on the forum if you run into trouble here :).

Miscellaneous
storedepth integer Maximum number of store structures that may be kept in memory (see the cost description section). As of Manopt 5.0, this is mostly irrelevant because main solvers do a much better job of discarding stale information on the go.

Keep in mind that a specific solver may not use all of these options and may use additional options, which would then be described on the solver's documentation page or, more commonly, in the help section of the solver's code (e.g.: help trustregions).

Good to know! Need a problem-specific stopping criterion? Include a stopfun in your options structure. See above for details.

The info struct-array

The various solvers log information at each iteration about their progress. This information is returned in the output info, a struct-array, that is, an array of structures. Read this MathWorks blog post for help on dealing with this data container in Matlab. For example, to extract a vector containing the cost value at each iteration, call [info.cost] with the brackets. Here are the typical indicators that might be present in the info output:

Field name ([info."..."]) Value type Description
iter integer Iteration number (0 corresponds to the initial guess).
time double Elapsed execution time until completion of that iteration, in seconds.
cost double Attained value of the cost function.
gradnorm double Attained value for the norm of the gradient.

A specific solver may not populate all of these fields and may provide additional fields, which would then be described in the solver's documentation.

Good to know! Need to log problem-specific information at each iteration? Include a statsfun in your options structure. See above for details. See also an example on Manopt counters to keep track of things such as function calls, Hessian calls, etc.
Heads up! The execution time is logged without incorporating time spent in statsfun, as it usually performs computations that are not needed to solve the optimization problem. If, however, you use information logged by statsfun for your stopfun criterion, and if this is important for your method (i.e., it is not just for convenience during prototyping), you should time the execution time of statsfun and add it to the stats.time field.

General philosophy

An optimization problem in Manopt is represented as a problem structure. The latter must include a field problem.M which contains a structure describing a manifold, as obtained from a factory. On top of this, the problem structure must include some fields that describe the cost function $f$ to be minimized and, possibly, its derivatives.

The solvers do not query these function handles directly. Instead, they call core (internal) tools such as getCost, getGradient, getHessian, etc. These tools consider the available fields in the problem structure and "do their best" to return the required object.

As a result, we gain great flexibility in the cost function description. Indeed, as the needs grow during the life-cycle of the toolbox and new ways of describing the cost function become necessary, it suffices to update the core get* tools to take these new ways into account. This has also made it much easier over time to incorporate (and improve) caching. We seldom have to modify the solvers.

Cost describing fields

You may specify as many of the following fields as you wish in the problem structure. If the same quantity can be obtained in more than one way (for example, if you define both diff and grad, either of which can be used to compute directional derivatives), the toolbox does not specify which one is called. It is likely to assume that the code for diff is more efficient than the code for grad when only a directional derivative is needed, but there is no guarantee. Bottom line: the descriptions you provide should be consistent with one another (profile if need be).

In the table below, each function admits three different calling patterns. The first one is the simplest and is perfectly fine for prototyping. The other calling patterns give explicit access to Manopt's caching system, which is documented below.

Good to know! All function handles admit a store structure as extra argument for caching purposes, as explained in the next section. This is an optional feature. For prototyping, it is often easier to write a first version of the code without caching. In any case, Manopt includes some automatic caching.
Field name (problem."...") Prototype Description
cost f = cost(x)
[f, store] = cost(x, store)
f = cost(x, storedb, key)
$f = f(x)$
grad g = grad(x)
[g, store] = grad(x, store)
g = grad(x, storedb, key)
$g = \operatorname{grad} f(x)$
costgrad [f, g] = costgrad(x)
[f, g, store] = costgrad(x, store)
[f, g] = costgrad(x, storedb, key)
Computes both $f = f(x)$ and $g = \operatorname{grad} f(x)$.
egrad eg = egrad(x)
[eg, store] = egrad(x, store)
eg = egrad(x, storedb, key)

For submanifolds of a Euclidean space or quotient spaces with a Euclidean total space, computes $eg = \nabla f(x)$, the gradient of $f$ "as if" it were defined in that Euclidean space. This is passed to M.egrad2rgrad.

Function egrad involves automatic caching for use with ehess.

partialgrad pg = partialgrad(x, I)
[pg, store] = partialgrad(x, I, store)
pg = partialgrad(x, I, storedb, key)

Assume the cost function problem.cost is a sum of many terms, as $f(x) = \sum_{i=1}^{d} f_i(x)$ where $d$ is specified as problem.ncostterms = d. For a subset $I$ of $1\ldots d$, partialgrad(x, I) returns the Riemannian gradient of the partial cost function $f_I(x) = \sum_{i \in I} f_i(x)$.

partialegrad peg = partialegrad(x, I)
[peg, store] = partialegrad(x, I, store)
peg = partialegrad(x, I, storedb, key)

Same as partialgrad but returns the Euclidean partial gradient. This is automatically transformed into a Riemannian partial gradient by Manopt.

approxgrad g = approxgrad(x)
[g, store] = approxgrad(x, store)
g = approxgrad(x, storedb, key)

Approximation for the gradient of the cost at $x$. Solvers asking for the gradient when one is not provided automatically fall back to this approximation. If it is not provided either, a standard finite-difference approximation of the gradient based on the cost is built-in.

This is slow because it involves generating an orthonormal basis of the tangent space at $x$ and computing a finite difference of the cost along each basis vector. This is useful almost exclusively for prototyping. Because of the limited accuracy, it may be necessary to increase options.tolgradnorm when using this feature.

See /solvers/gradientapproximations.

subgrad g = subgrad(x, tol)
[g, store] = subgrad(x, tol, store)
g = subgrad(x, tol, storedb, key)
Returns a Riemannian subgradient of the cost function at $x$, with a tolerance tol which is a nonnegative real number. If you wish to return the minimal norm subgradient (which may help solvers), see the smallestinconvexhull tool.
diff d = diff(x, u)
[d, store] = diff(x, u, store)
d = diff(x, u, storedb, key)
$d = \operatorname{D}\! f(x)[u]$ defines directional derivatives. If the gradient exists, it can be computed from this (slowly).
hess h = hess(x, u)
[h, store] = hess(x, u, store)
h = hess(x, u, storedb, key)
$h = \operatorname{Hess} f(x)[u]$, where $u$ represents a tangent vector.
ehess eh = ehess(x, u)
[eh, store] = ehess(x, u, store)
eh = ehess(x, u, storedb, key)

For submanifolds of a Euclidean space, or for quotient spaces with a Euclidean total space, this computes $eh = \nabla^2 f(x)[u]$: the Hessian of $f$ along $u$ "as if" it were defined in that Euclidean space. This is passed to M.ehess2rhess and thus requires the Euclidean gradient to be accessible (egrad). Input $u$ is a representation of the tangent vector. For most manifolds, this is a vector in the ambient space. In general, you may need to call M.tangent2ambient(x, u) to obtain the ambient space equivalent of $u$. This is the case for rotationsfactory for example. The output eh should be a vector in the ambient space.

Function egrad involves automatic caching for use with ehess.

approxhess h = approxhess(x, u)
[h, store] = approxhess(x, u, store)
h = approxhess(x, u, storedb, key)

This can be any mapping from the tangent space at $x$ to itself. Often, one would like for it to be a linear, symmetric operator. Solvers asking for the Hessian when one is not provided automatically fall back to this approximate Hessian. If it is not provided either, a standard finite-difference approximation of the Hessian based on the gradient is built-in.

See /solvers/hessianapproximations.

precon v = precon(x, u)
[v, store] = precon(x, u, store)
v = precon(x, u, storedb, key)

$v = \operatorname{Prec}(x)[u]$, where $\operatorname{Prec}(x)$ is a preconditioner for the Hessian $\operatorname{Hess} f(x)$, that is, $\operatorname{Prec}(x)$ is a symmetric, positive-definite linear operator (w.r.t. the Riemannian metric) on the tangent space at $x$. Ideally, it is cheap to compute and the preconditioned operator $\operatorname{Prec}^{1/2}(x) \circ \operatorname{Hess} f(x) \circ \operatorname{Prec}^{1/2}(x)$ is better conditioned than $\operatorname{Hess} f(x)$ alone, i.e., $\operatorname{Prec}(x)$ should approximate the inverse of the Hessian.

See /solvers/preconditioners.

sqrtprecon v = sqrtprecon(x, u)
[v, store] = sqrtprecon(x, u, store)
v = sqrtprecon(x, u, storedb, key)
$v = \operatorname{Prec}^{1/2}(x)[u]$, where $\operatorname{Prec}^{1/2}(x)$ is an (operator) square root of a preconditioner for the Hessian $\operatorname{Hess} f(x)$, that is, $\operatorname{Prec}^{1/2}(x)$ is a symmetric, positive-definite linear operator (w.r.t. the Riemannian metric) on the tangent space at $x$, and applying it twice should amount to applying $\operatorname{Prec}(x)$ once. Solvers typically use precon rather than sqrtprecon, but some tools (such as hessianspectrum) can use sqrtprecon to speed up computations.
linesearch t = linesearch(x, u)
[t, store] = linesearch(x, u, store)
t = linesearch(x, u, storedb, key)

Given a point $x$ and a tangent vector $u$ at $x$, assume $u$ is a descent direction. This means there exists $t > 0$ such that $\phi(t) < \phi(0)$ with
$$\phi(t) = f(\operatorname{Retr}_x(tu)).$$
Line-search algorithms, which are used by some solvers such as steepestdescent and conjugategradient, are designed to (approximately) minimize $\phi$ at each iteration.

There are built-in, generic ways of doing this. If you have additional structure in your problem that enables you to take a good guess at what $t$ should be, then you can specify it here, in this function handle. This (very much optional) function should return a positive $t > 0$ such that $t$ is a good guess of where to look for a minimizer of $\phi$. The line-search algorithm (if it decides to use this information) starts by looking at the step $tu$, and decides to accept it or not based on its internal rules. See the linesearch option in the solver section (options table) for details on available line-search algorithms and how to pick one.

See low_rank_matrix_completion for an example from the literature.

Heads up! StoreDB is a handle class, which means its instances are passed by reference. This means that when a storedb object is passed as input to a function, and that function modifies the storedb object, the calling function sees the changes too (without the need to explicitly return the storedb object). Thus, each storedb object exists only once in memory. This makes for cleaner calling patterns and avoids unnecessary copies. This is not the case for the store structures though, which are passed by copy and thus must be returned if the changes are to be permanent.

Here is one way to address the redundant computation of $Ax$ that appeared in the first example. Replace the two lines defining problem.cost and problem.egrad with the following code (we chose to spell out the gradient projection, but that is not necessary: you could also use M.egrad2rgrad).

problem.costgrad = @(x) mycostgrad(A, x);
function [f, g] = mycostgrad(A, x)
    Ax = A*x;
    f = -x'*Ax;
    if nargout == 2
        g = -2*(Ax + f*x);   % or: g = M.egrad2rgrad(x, -2*Ax);
    end
end

Solvers that subsequently query the cost and the gradient at the same point can avoid most redundant computations this way (e.g., steepestdescent and conjugategradient are good at this). This is not perfect though: when the Hessian is requested, for example, we cannot access our hard work (so trustregions would not gain much). In the next section, we cover a more sophisticated way of sharing data between the components of the cost description.

Caching: how to use the store structure

As demonstrated in the first example, it is often the case that computing $f(x)$ produces intermediate results (such as the product $Ax$) that can be reused in order to compute $\operatorname{grad} f(x)$. More generally, computing anything at a point $x$ may produce intermediate results that could be reused for other computations at $x$. Furthermore, it may happen that a solver calls cost-related functions more than once at the same point $x$. For those cases, it may be beneficial to cache (to store) some of the previously computed objects, or intermediate calculations.

For that purpose, Manopt manages a database of store structures, with a class called StoreDB. For each visited point $x$, a store structure is stored in the database. Only the structures pertaining to the most recently used points are kept in memory (see the options.storedepth option). StoreDB manages a counter to number visited points on the manifold. This way, each point $x$ receives a unique key. This key can be used to interact with the store associated to $x$.

Whenever a solver calls, say, the cost function at some point $x$, the toolbox searches for a store structure pertaining to that $x$ in the database (using its key). If there is one and if problem.cost (for example) admits store as an input and as an output, the store is passed to the cost function. The cost function then performs its duty and gets to modify the store structure at will: it is your structure, do whatever you fancy with it. Next time a function is called at the same point $x$ (say, problem.grad), the same store structure is passed along, modified, and stored again. As soon as the solver goes on to explore a new point $x'$, a different store structure is created and maintained in the same way. If the solver then decides to return to the previous $x$ (and options.storedepth is larger than 2), we still benefit from the previously stored work as the previous store structure is still available.

As of Manopt 5.0, by default, the cost value $f(x)$ is cached at every visited point (for as long as the memory associated to that point is retained). This means that calling getCost(problem, x, storedb, key) multiple times with the same inputs only actually calls the cost function the first time. In practice, this provides good speed-ups for line-search algorithms. Similarly, the gradient and Euclidean gradient are cached by default, which provides speed-ups for a number of solvers. This is made practical by the new store management system, which allows solvers to more quickly discard irrelevant stores, thus minimizing memory usage.

As of Manopt 1.0.8, the store structure also includes a field store.shared. The contents of that field are shared among all visited points $x$. This memory is also readable from options.statsfun (see statsfun documentation and the maxcut example). (Note: this mechanism was originally used to keep track of function calls; as of Manopt 5.0, it is much better to use Manopt counters.)

The information in this paragraph is aimed at solver developers; it may also help users understand what happens under the hood: When given access to storedb and a key associated to $x$ rather than to a specific store, the store of $x$ can be obtained as store = storedb.getStore(key). Put the modified store back in the database with storedb.set(store, key). Access the shared memory directly as storedb.shared, not via store.shared. This is important: store might have a store.shared field, but when storedb and key are explicitly used, store.shared will not be populated or read on get/set. Each point $x$ should be associated to a key, which is obtained by calling storedb.getNewKey(). From time to time, call storedb.purge() to reduce memory usage. Even better, as soon as you know that the store associated to a certain point is no longer useful, call storedb.remove(key) or storedb.removefirstifdifferent(key1, key2).
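
For solver developers, a minimal sketch of that workflow might look like this (problem and x are assumed to be in scope):

% Hedged sketch of the storedb workflow inside a solver.
storedb = StoreDB(options.storedepth);                  % caching database
key = storedb.getNewKey();                              % unique key for the point x
[cost, grad] = getCostGrad(problem, x, storedb, key);   % queries go through the caching layer
% ... work with cost and grad ...
storedb.remove(key);                                    % discard the cache for x once it is no longer useful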

Here is an example of how we can modify the first example to avoid redundant computations, using the caching mechanism:

problem.cost = @mycost;       % Cost function
function [f, store] = mycost(x, store)

    if ~isfield(store, 'Ax')
        store.Ax = A*x;       % The store memory is associated to a specific x
    end
    Ax = store.Ax;
    
    f = -x'*Ax;               % No need to cache f: cost values are cached 'under the hood'
    
end

problem.egrad = @myegrad;     % Euclidean gradient of the cost function
function [g, store] = myegrad(x, store)

    % This could be placed in a separate function
    % to avoid code duplication.
    if ~isfield(store, 'Ax')
        store.Ax = A*x;
    end
    Ax = store.Ax;

    % Euclidean gradient; this is also cached 'under the hood'.
    g = -2*Ax;

end

It is instructive to execute such code with the profiler activated and to look at how many times each instruction gets executed. You should find that the matrix-vector products $Ax$, where all the work happens, are executed exactly as often as they should be, and not more. You can also use Manopt counters to track these products.

Heads up! You should never assume that the gradient function, for example, will be called after the cost function (even though this is usually the case). Always check that the fields you use in the store structure are populated; and if they are not, call the appropriate functions to make up for it, as in the example above.
Good to know! Which variables should I store? As a rule of thumb, store the intermediate computation results which constitute the bottleneck in your computation. This can usually be determined by considering the asymptotic time complexity of each operation. Typically, matrix products of large size are involved in the slowest parts. When in doubt, the Matlab profiler is a tremendous tool to identify the code bits that need special attention. In any case, remember that caching is optional: at the prototyping stage, it is best to keep things simple.
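
A quick way to do that (standard Matlab, not specific to Manopt):

profile on;             % start the Matlab profiler
trustregions(problem);  % run the solver as usual
profile viewer;         % inspect where the time goes and how often each line runs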

Generic Hessian approximations and preconditioners

If the Hessian is complicated or costly to compute, it may be advantageous to resort to an approximation of it. Likewise, if the Hessian is poorly conditioned, it may be advantageous to provide a preconditioner for it (a cheap, approximate and positive definite inverse of the Hessian). Manopt allows for the definition of generic Hessian approximations and generic preconditioners. The feature is working, but as this is work in progress we do not have many options to show yet. Check out the folders /solvers/hessianapproximations and /solvers/preconditioners if you are interested. In any case, by default the trust-regions solver works with a finite-difference approximation of the Hessian based on the gradient, which has proven effective and robust over the years; see this paper for a proof of global convergence with this approximation.

Automatic differentiation (AD)

See the help of manoptAD and manoptADhelp for information about how to use automatic differentiation starting with Manopt 7.0. Here is a simple example:

n = 5;
A = randn(n);
A = A+A';
problem.M = spherefactory(n);
problem.cost = @(x) x'*A*x;
problem = manoptAD(problem);
x = trustregions(problem);

AD is slower than (good) hand-written code for gradients and Hessians, but it is great for saving human time when prototyping.

Tools

A number of generically useful tools for working with Manopt are available in /manopt/tools. The multitrace tool is a wrapper around diagsum, which is code by Wynton Moore; multiprod is a wrapper around pagemtimes.
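
For example, multiprod multiplies slices of 3D arrays pairwise (a small sketch):

A3 = randn(3, 4, 10);
B3 = randn(4, 5, 10);
C3 = multiprod(A3, B3);            % C3(:, :, i) = A3(:, :, i) * B3(:, :, i)
t  = multitrace(randn(4, 4, 10));  % vector of traces, one per 4x4 slice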


Call Description

Diagnostics tools
  checkdiff(problem, x, u) Numerical check of the directional derivatives of the cost function. From a truncated Taylor expansion, we know that the following holds: $$f(\operatorname{Exp}_x(tu)) - \left[f(x) + t\cdot\operatorname{D}\!f(x)[u]\right] = \mathcal{O}(t^2).$$ Hence, in a log-log plot with $\log(t)$ on the abscissa, the error should behave as $\log(t^2) = 2\log(t)$, i.e., we should observe a slope of 2. This tool produces such a plot and tries to compute the slope of it (tries to, because numerical errors prevent the curve from having a slope of 2 everywhere even if the directional derivatives are correct; so you should really just inspect the plot visually). If x and u are omitted, they are picked at random.
  checkgradient(problem, x, u) Numerical check of the gradient of the cost function. Based on the statement that if the gradient exists, then it is the only tangent vector field that satisfies $$\langle \operatorname{grad} f(x), u\rangle_x = \operatorname{D}\!f(x)[u],$$ this tool calls checkdiff first, and it also verifies that the gradient is indeed a tangent vector, by computing the norm of the difference between the gradient and its projection to the tangent space (if a projector is available). Of course, this should be zero.
  checkhessian(problem, x, u)

Numerical check of the Hessian of the cost function. From a truncated Taylor expansion, we know that the following holds: $$f(\operatorname{Exp}_x(tu)) - \left[f(x) + t\cdot\operatorname{D}\!f(x)[u] + \frac{t^2}{2} \cdot \langle \operatorname{Hess} f(x)[u], u \rangle_x\right] = \mathcal{O}(t^3).$$ Hence, in a log-log plot with $\log(t)$ on the abscissa, the error should behave as $\log(t^3) = 3\log(t)$, i.e., we should observe a slope of 3. This tool produces such a plot and tries to compute the slope of it (tries to, because numerical errors prevent the curve from having a slope of 3 everywhere even if the derivatives are correct; so you should really just inspect the plot visually). If x and u are omitted, they are picked at random. The tool also verifies that the Hessian indeed returns a tangent vector, by computing the norm of the difference between $\operatorname{Hess} f(x)[u]$ and its projection to the tangent space (if a projector is available). Of course, this should be zero.

The Hessian is a linear, symmetric operator from the tangent space at $x$ to itself. To verify symmetry, this tool generates two random tangent vectors $u_1$ and $u_2$ and computes the difference $$\langle \operatorname{Hess} f(x)[u_1], u_2 \rangle_x - \langle u_1,  \operatorname{Hess} f(x)[u_2]\rangle_x,$$ which should be zero.

See also the heads-up notes below.

  checkretraction(M, x, v) For manifolds M which have a correct exponential map M.exp implemented, this tool lets you check the order of agreement of the retraction M.retr with the exponential. A slope of 2 indicates the retraction is a first-order approximation of the exponential (which is necessary for most, if not all, convergence theorems to hold). A slope of 3 indicates the retraction is second order, which may be necessary theoretically to prove convergence to second-order KKT points; in practice, this may have little impact. The check is conducted at the point x along the direction v; these are generated at random if omitted.
  checkmanifold(M) Runs a collection of tests on a manifold structure produced by a factory.
  plotprofile(problem, x, d, t) Plots the cost function along a geodesic or a retraction path starting at $x$, along direction $d$. See help plotprofile for more information.
  surfprofile(problem, x, d1, d2, t1, t2) Plots the cost function, lifted and restricted to a 2-dimensional subspace of the tangent space at $x$. See help surfprofile for more information.
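As an illustration, here is a short sketch of checkgradient and checkhessian in action. It reuses the problem structure built in the automatic differentiation example above (so a gradient and a Hessian are available); each call opens the log-log plot described in the table, with the check point picked at random since it is omitted.

checkgradient(problem);   % expect a slope of 2 over a range of t
checkhessian(problem);    % expect a slope of 3 and a near-zero symmetry residual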

Cost utilities
  problem = manoptAD(problem)
Automatic differentiation tool: given a problem which contains a manifold structure in problem.M and a cost function handle in problem.cost, the tool manoptAD attempts to add function handles for the gradient and possibly also the Hessian of the cost function. There are limitations: type help manoptAD and help manoptADhelp for more information.
  lambdas = hessianspectrum(problem, x, useprecon, storedb, key)

Computes the eigenvalues of the Hessian $H$ at $x$. If a preconditioner $P$ is specified in the problem structure and useprecon is set to 'precon', the eigenvalues of the preconditioned Hessian $HP$ are computed.

This function relies on problem.M.vec and problem.M.mat to pass the computation to the built-in eigs function. For the eigenvalue problem to remain symmetric in the column-vector representation domain, we need M.vec and M.mat to be orthonormal, i.e., isometries (see matvecareisometries in the manifold section). If they are not isometries, computations may take longer. Indeed, let $G$ denote the M.vec operator and let $G^{-1}$ represent the M.mat operator (on the appropriate domain). Then, eigs computes the spectrum of $GHG^{-1}$ or $GHPG^{-1}$, which are identical to, respectively, the spectra of $H$ and $HP$. This is only symmetric if there is no preconditioner and $G^\top = G^{-1}$.

If a preconditioner is used, the symmetry of the eigenvalue problem is lost: $H$ and $P$ are symmetric, but $HP$ is not. If M.vec and M.mat are isometries and the dimension of the manifold is large, it may be useful to restore symmetry by giving this tool a function handle for the square root of the preconditioner, $P^{1/2}$ (optional). Then, eigs is given the problem of computing the spectrum of $GP^{1/2}HP^{1/2}G^\top$ (symmetric), which is equal to the spectrum of $HP$. Typically, the square root of the preconditioner is given via problem.sqrtprecon (see cost description).

storedb and key are optional (see caching system).

  [u, lambda] = hessianextreme(problem, x, side, u0, options, storedb, key)

Computes an eigenvector / eigenvalue pair associated with either a largest or a smallest eigenvalue of the Hessian of the cost at x, for the cost described in the problem structure. Choose an extreme side of the spectrum by setting side to either 'min' or 'max'. The (optional) parameters u0 (initial guess for u) and options are passed on to manoptsolve, then on to the Manopt solver that ultimately computes the eigenpair by means of Rayleigh quotient optimization over the sphere in the tangent space at x. (A usage sketch follows this list.)

storedb and key are optional (see caching system).

  [H, basis] = hessianmatrix(problem, x, basis)

Given a problem structure, a point x on problem.M and a cell basis containing an orthonormal basis of (a subspace of) the tangent space to M at x, returns a matrix H which represents the Hessian of problem.cost at x in the given basis (possibly restricted to the given subspace). If no basis is given, a random orthonormal basis is generated for the full tangent space. In all cases, the basis that was used is returned as the second output.

  cp_problem = criticalpointfinder(problem) Given a problem structure for a twice continuously differentiable cost function $f(x)$, returns a new problem structure for the cost function $g(x) = \frac{1}{2} \| \operatorname{grad} f(x) \|^2_x$, whose optima are all the critical points of the original problem. Thus, running solvers on the new problem from various initial points can help understand the critical points of the original problem. The gradient of $g$ is computed via $\operatorname{grad} g(x) = \operatorname{Hess} f(x)[\operatorname{grad} f(x)]$, and an approximate Hessian can also be generated.
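To give an idea of how these tools fit together, here is a hedged sketch. It assumes a problem structure with gradient and Hessian (such as the one from the automatic differentiation example above) and a manifold that provides M.vec and M.mat, as spherefactory does.

% Sketch: inspect the Riemannian Hessian of the cost at a random point.
x = problem.M.rand();                             % random point on the manifold
lambdas = hessianspectrum(problem, x);            % all eigenvalues of Hess f(x)
[u, lambda] = hessianextreme(problem, x, 'min');  % smallest eigenvalue and an eigenvector
[H, basis] = hessianmatrix(problem, x);           % matrix of Hess f(x) in a random orthonormal basis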

Matrix utilities
  B = multiscale(scale, A) For a 3D matrix A of size nxmxN and a vector scale of length N, returns B, a 3D matrix of the same size as A such that B(:, :, k) = scale(k) * A(:, :, k) for each k.
  tr = multitrace(A) For a 3D matrix A of size nxnxN, returns a column vector tr of length N such that tr(k) = trace(A(:, :, k)) for each k.
  sq = multisqnorm(A) For a 3D matrix A of size nxmxN, returns a column vector sq of length N such that sq(k) = norm(A(:, :, k), 'fro')^2 for each k.
  B = multitransp(A) For a 3D matrix A of size nxmxN, returns B, a 3D matrix of size mxnxN such that B(:, :, k) = A(:, :, k).' for each k.
  B = multihconj(A) For a complex 3D matrix A of size nxmxN, returns B, a complex 3D matrix of size mxnxN such that B(:, :, k) = A(:, :, k)' for each k.
  C = multiprod(A, B) For 3D matrices A of size nxpxN and B of size pxmxN, returns C, a 3D matrix of size nxmxN such that C(:, :, k) = A(:, :, k) * B(:, :, k) for each k.
  B = multiskew(A) For a 3D matrix A of size nxnxN, returns a 3D matrix B the same size as A such that each slice B(:, :, i) is the skew-symmetric part of the slice A(:, :, i), that is, (A(:, :, i)-A(:, :, i).')/2.
  B = multiskewh(A) For a 3D matrix A of size nxnxN, returns a 3D matrix B the same size as A such that each slice B(:, :, i) is the Hermitian skew-symmetric part of the slice A(:, :, i), that is, (A(:, :, i)-A(:, :, i)')/2.
  B = multisym(A) For a 3D matrix A of size nxnxN, returns a 3D matrix B the same size as A such that each slice B(:, :, i) is the symmetric part of the slice A(:, :, i), that is, (A(:, :, i)+A(:, :, i).')/2.
  B = multiherm(A) For a complex 3D matrix A of size nxnxN, returns a complex 3D matrix B the same size as A such that each slice B(:, :, i) is the Hermitian part of the slice A(:, :, i), that is, (A(:, :, i)+A(:, :, i)')/2.
  dfunm, dlogm, dexpm, dsqrtm Fréchet derivatives of the (built-in) matrix functions logm, expm and sqrtm. They return both $\mathrm{fun}(A)$ and $\mathrm{D}\mathrm{fun}(A)[\dot A]$.
  lyapunov_symmetric Tool to solve the Lyapunov matrix equation $AX + XA = C$ when $A = A^*$ (real symmetric or Hermitian), as a pseudo-inverse. Can solve for more than one right-hand side at a time.
  lyapunov_symmetric_eig Same as lyapunov_symmetric but the user supplies the eigenvalue decomposition of $A$ instead of $A$.
  sylvester_nochecks Solves the Sylvester equation $AX + XB = C$, where $A$ is an m-by-m matrix, $B$ is an n-by-n matrix, and $X$ and $C$ are two m-by-n matrices. This is a stripped-down version of Matlab's own sylvester function that bypasses any input checks. This is significantly faster for small m and n, which is often useful in Manopt.
  Q = qr_unique(A) Given $A$ with full column rank, this computes $Q$ of the same size as $A$ such that $A = QR$, where $Q$ has orthonormal columns and $R$ is upper triangular with positive diagonal entries. This fully specifies $Q$. (Matlab's [Q, ~] = qr(A, 0) does not enforce positive diagonal entries of $R$ by default, so the uniqueness property is lost.) This Q-factor is exactly what one would compute through Gram-Schmidt orthonormalization of the columns of $A$, but it is computed differently. Works with 3D arrays (on each slice separately) and with both real and complex matrices. (See the sketch after this table.)
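Here is a short sketch combining a few of these slice-wise tools on a random 3D array; the quantities computed are only meant to illustrate the calling conventions.

% Sketch: slice-wise operations on a 3D array A of size n x m x N.
n = 4; m = 3; N = 10;
A = randn(n, m, N);
G  = multiprod(multitransp(A), A);   % G(:, :, k) = A(:, :, k).' * A(:, :, k)
sq = multitrace(G);                  % sq(k) = trace(G(:, :, k)) = norm(A(:, :, k), 'fro')^2
sq2 = multisqnorm(A);                % same quantity, computed directly
Q = qr_unique(A);                    % slice-wise Q-factors with orthonormal columns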

Manifold utilities
  Mn = powermanifold(M, n) Given M, a structure representing a manifold $\mathcal{M}$, and n, an integer, returns Mn, a structure representing the manifold $\mathcal{M}^n$. The geometry is obtained by element-wise extension. Points and vectors on Mn are represented as cells of length n.
  M = productmanifold(elements) Given elements, a structure with fields A, B, C... containing structures Ma, Mb, Mc... such that Ma is a structure representing a manifold $\mathcal{M}_A$ etc., returns M, a structure representing the manifold $\mathcal{M}_A \times \mathcal{M}_B \times \mathcal{M}_C \times \cdots$. The geometry is obtained by element-wise extension. Points and vectors are represented as structures with the same field names as elements.
  N = tangentspherefactory(M, x) Given a manifold structure M and a point on that manifold x, returns a manifold structure N representing the unit sphere on the tangent space to M at x. This is notably used by the hessianextreme tool.
  N = tangentspacefactory(M, x) Given a manifold structure M and a point on that manifold x, returns a manifold structure N representing the tangent space to M at x. This is notably used by the preconhessiansolve preconditioner.
  vec = lincomb(M, x, vecs, coeffs) Given a cell vecs of $n$ tangent vectors to the manifold M at x and a vector coeffs of $n$ real coefficients, returns the linear combination of the given vectors with the given coefficients. The empty linear combination is the zero vector at x.
  vec = tangent2vec(M, x, B, u) Given a tangent vector u and an orthogonal basis B of the corresponding tangent space, returns the coordinates vec of the vector in that basis. The inverse operation is lincomb, see above. (A usage sketch follows this list.)
  G = grammatrix(M, x, vectors) Given $n$ tangent vectors $v_1, \ldots, v_n$ in a cell vectors to the manifold M at point x, returns a symmetric, positive semidefinite matrix G of size $n\times n$ such that $G_{ij} = \langle v_i, v_j \rangle_x$.
  [orthobasis, L] = orthogonalize(M, x, basis) Given a cell basis which contains linearly independent tangent vectors to the manifold M at x, returns an orthogonal basis of the subspace spanned by the given basis. L is an upper triangular matrix containing the coefficients of the linear combinations needed to transform basis into orthobasis. This is essentially a QR factorization, via modified Gram-Schmidt.
  [orthobasis, L] = orthogonalizetwice(M, x, basis) Same as orthogonalize, but calls it twice in sequence for (much) improved numerical stability (at twice the computational cost).
  obasis = tangentorthobasis(M, x, n) Given a point x on the manifold M, generates n unit-norm, pairwise orthogonal vectors in the tangent space at x to M, in a cell. See help for more advanced call patterns.
  [u_norm, coeffs, u] = smallestinconvexhull(M, x, U) Computes u, a tangent vector to M at x contained in the convex hull spanned by the $n$ vectors in the cell U, with minimal norm (according to the Riemannian metric on M). This is obtained by solving a convex quadratic program involving the Gram matrix of the given tangent vectors. The quadratic program is solved using Matlab's built-in quadprog, which requires the optimization toolbox.
  [A, B1, B2] = operator2matrix(M1, x, y, F, B1, B2, M2) Given manifold structures M1 and M2, two points x and y on these manifolds, and a function F encoding a linear operator from the tangent space $T_x M_1$ to the tangent space $T_y M_2$, this tool uses two orthonormal bases B1 and B2 (one for $T_x M_1$, and one for $T_y M_2$; generated at random if omitted), and forms the matrix A which represents the operator F in those bases. In particular, the singular values of A are equal to the singular values of F. If M2 is omitted, then M2 = M1. See the code for more usage modes.
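The following sketch shows how tangentorthobasis, tangent2vec and lincomb interact: a tangent vector is expanded in an orthonormal basis and then reconstructed from its coordinates. The sphere factory is used only to have a concrete manifold at hand.

% Sketch: coordinates of a tangent vector in an orthonormal basis, and back.
M = spherefactory(5);
x = M.rand();
B = tangentorthobasis(M, x, M.dim());          % orthonormal basis of the tangent space at x
u = M.randvec(x);                              % random unit-norm tangent vector at x
c = tangent2vec(M, x, B, u);                   % coordinates of u in the basis B
u2 = lincomb(M, x, B, c);                      % reconstruct u from its coordinates
err = M.norm(x, M.lincomb(x, 1, u, -1, u2));   % should be near machine precision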

Solver utilities
  [x, cost, info, options] = manoptsolve(problem, x0, options) Gateway function to call a Manopt solver. You may specify which solver to call by setting options.solver to a function handle corresponding to a solver. Otherwise, a solver is picked automatically. This is mainly useful when programming meta algorithms which need to solve a Manopt problem at some point, but one wants to leave the decision of which solver to use up to the final user.
  statsfun = statsfunhelper(name, fun)
statsfun = statsfunhelper(S)
Helper function to place a function handle in the field options.statsfun. See the help about the statsfun option earlier in this tutorial, and/or the help for statsfunhelper from the command line.
  S = statscounters(names) Tool to register Manopt counters: see the example file. This tool can be used in conjunction with the tool incrementcounter to track all sorts of metrics, including function calls, time spent in specific parts of them, particular operations, etc.
  store = incrementcounter(store, countername, increment) Tool to increment a Manopt counter: see the example file. This tool is used in conjunction with the tool statscounters to track all sorts of metrics.
  stopfun = stopifclosedfigure() Interactive stopping criterion to place in options.stopfun. Upon running the solver with this options structure, a special figure opens. If at any point during the solver's execution the figure is closed, the solver gracefully terminates and returns the latest iterate produced so far. Termination may not be immediate, as the solver has to finish the current iteration first. (A usage sketch follows this list.)
  y = stopifdeletedfile(filename) Interactive stopping criterion to place in options.stopfun. Upon running the solver with this options structure, a special file is created. If at any point during the solver's execution the file is deleted, the solver gracefully terminates and returns the latest iterate produced so far. Termination may not be immediate as the solver has to finish the current iteration first.
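As an example, here is a hedged sketch combining statsfunhelper and stopifclosedfigure. The extra statistic recorded (the first entry of the current iterate, stored under the arbitrary field name 'xfirst') is chosen only for illustration; the problem structure is assumed to be defined as above, and statsfunhelper is used here with a function of the current point alone (see help statsfunhelper for the accepted call patterns).

% Sketch: log an extra statistic at each iteration and allow the run to be
% stopped interactively by closing a figure.
options.statsfun = statsfunhelper('xfirst', @(x) x(1));   % record x(1) at each iteration
options.stopfun  = stopifclosedfigure();                  % close the figure to stop the solver
[x, xcost, info] = trustregions(problem, [], options);
plot([info.iter], [info.xfirst], '.-');                   % the extra field appears in the info struct array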

Other utilities
  y = sinxoverx(x) Computes $y = \sin(x)/x$, with $\sin(0)/0 = 1$.
  s = getsize(x) Estimates the memory usage of the input variable.
Heads up! When using the checkhessian tool, it is important both to obtain a slope of 3 and to pass the symmetry test. Indeed, the slope test ignores the skew-symmetric part of the Hessian, since $x^\top A x = x^\top \frac{A+A^\top}{2} x$ for any square matrix $A$. As a result, if your code for the Hessian has a spurious skew-symmetric part, the slope test is oblivious to it.
Heads up! Still regarding checkhessian: if the exponential map is not available for your manifold, the test may use a retraction instead. If the retraction is only a first-order approximation of the exponential, then the slope test is only expected to succeed at critical points of the cost function (at other points, we can only hope for a slope of 2, in which case the test is inconclusive).

Internally, Manopt uses a number of tools to manipulate problem structures, solvers and manifolds. These tools are listed here. One central tool was already documented in the caching system description: the StoreDB class. Because the toolbox targets flexibility in the problem description, the cost, gradient, Hessian etc. can be specified in a number of different ways in a problem structure. Thus, to evaluate cost-related quantities, it is best to use the functions below, rather than to use fields in the problem structure directly. For example, call getCost rather than problem.cost.

These tools are mostly useful for solver and tool developers.

The inputs storedb and key are usually optional. It is a good idea to pass them if they are available, as this allows for caching to be used.

Functions called canGet*** return true if the problem structure provides sufficient information for Manopt to compute *** exactly; they return false otherwise. If false is returned, that does not imply a call to get*** will fail. For example, if the problem structure specifies the gradient via problem.grad but does not provide the Hessian, there is not enough information to compute the exact Hessian. Hence, canGetHessian returns false. Yet, a call to getHessian does return something: namely, a finite difference approximation of the Hessian for the provided inputs. Typically, solver and tool developers call the canGet*** functions to assess what can be done with the given problem structure and issue appropriate warnings as needed, then proceed to call the get*** functions anyway. The general philosophy is that Manopt tries its best to answer the question asked (with the caveat that the answer may be slow to compute or inaccurate).
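For instance, a solver or tool might proceed along these lines. This is only a hedged sketch; problem, x and a tangent vector d are assumed to be available in the calling context, and the warning message is illustrative.

% Sketch: defensive use of the can/get functions inside a solver or tool.
if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
    warning('No gradient was provided: a numerical approximation will be used.');
end
[cost, grad] = getCostGrad(problem, x);   % cost and Riemannian gradient at x
hess = getHessian(problem, x, d);         % exact Hessian if available, otherwise an approximation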


Call

Cost evaluations (and cost-related quantities)
  cost = getCost(problem, x, storedb, key)
  [cost, grad] = getCostGrad(problem, x, storedb, key)
  grad = getGradient(problem, x, storedb, key)
  agrad = getApproxGradient(problem, x, storedb, key)
  pgrad = getPartialGradient(problem, x, I, storedb, key)
  egrad = getEuclideanGradient(problem, x, storedb, key)
  pgrad = getPartialEuclideanGradient(problem, x, I, storedb, key)
  subgrad = getSubgradient(problem, x, tol, storedb, key)
  diff = getDirectionalDerivative(problem, x, d, storedb, key)
  hess = getHessian(problem, x, d, storedb, key)
  hessfd = getHessianFD(problem, x, d, storedb, key)
  approxhess = getApproxHessian(problem, x, d, storedb, key)
  t = getLinesearch(problem, x, d, storedb, key)
  Pd = getPrecon(problem, x, d, storedb, key)
  sqrtPd = getSqrtPrecon(problem, x, d, storedb, key)

Cost-related availability checks (checks whether the user explicitly specified these)
  candoit = canGetCost(problem)
  candoit = canGetDirectionalDerivative(problem)
  candoit = canGetGradient(problem)
  candoit = canGetApproxGradient(problem)
  candoit = canGetPartialGradient(problem)
  candoit = canGetEuclideanGradient(problem)
  candoit = canGetPartialEuclideanGradient(problem)
  candoit = canGetSubgradient(problem)
  candoit = canGetHessian(problem)
  candoit = canGetApproxHessian(problem)
  candoit = canGetPrecon(problem)
  candoit = canGetSqrtPrecon(problem)
  candoit = canGetLinesearch(problem)

Solver helpers
  stats = applyStatsfun(problem, x, storedb, key, options, stats)
  [stop, reason] = stoppingcriterion(problem, x, options, info, last)
  opts = getGlobalDefaults()
  opts = mergeOptions(opts1, opts2)

A reference is available here, to help navigate the source code of the toolbox. It is generated with m2html.

This reference is updated with each release. In between releases, the most up-to-date code can be browsed and downloaded on GitHub.