Trust-region methods form a widely used class of nonlinear optimization algorithms. They were generalized to the realm of optimization on Riemannian manifolds in [ABG07], see also [AMS08], specifically chapter 7. See [ABG07] for a listing and explanation of the algorithm.

The standard convergence guarantees from the Euclidean setting essentially carry over to the Riemannian setting: the method is globally convergent, that is, regardless of the initial guess, it will converge toward critical points. Furthermore, when the true Hessian is used, we get quadratic local convergence; otherwise, we still get linear local convergence (superlinear if the approximate Hessian is "good enough"). See [ABG07] for precise theorem statements and the assumptions that need to be verified (in particular, sufficient smoothness of both the manifold and the cost).

This trust-region method uses a truncated conjugate-gradient (tCG) method to solve the inner minimization problems. This inner solve can be preconditioned: simply provide a preconditioner in the cost function description.

The implementation in Manopt is an adapted and improved version of GenRTR, the original code by the authors of [ABG07]. By default, if no Hessian and no approximate Hessian are provided, Manopt will approximate the Hessian using finite differences of the gradient. Empirically, this is seen to perform excellently. The global convergence proof of the Riemannian trust-region method can be extended to guarantee global convergence with this nonlinear approximation of the Hessian too, see [Bou15], but there is currently no proof of superlinear convergence for this algorithm. You may disable this by providing an approximate Hessian (for example, the identity operator), or the Hessian of course. If you want more control over the approximation algorithm, you may specify an approximate Hessian in the cost description and perhaps start from the base code in manopt/solvers/hessianapproximations/approxhessianFD.m.

What should the problem structure specify about the cost function $f$?

• Required: the manifold $\mathcal{M}$, the cost $f(x)$ and the gradient $\operatorname{grad}f(x)$.
• Recommended: the Hessian $\operatorname{Hess}f(x)[u]$, or approximate Hessian.
• Can help: a preconditioner $\operatorname{Prec}f(x)[u]$ (a symmetric, positive definite operator that should approximate the inverse of the Hessian).

How to call this solver?

The solver trustregions is defined in the file manopt/solvers/trustregions/trustregions.m. It may be called as follows:

• [x_opt cost_opt info] = trustregions(problem)
• [x_opt cost_opt info] = trustregions(problem, x0)
• [x_opt cost_opt info] = trustregions(problem, [], options)
• [x_opt cost_opt info] = trustregions(problem, x0, options)

The first input is a Manopt problem structure, describing the manifold and the cost. The second input (optional) is a point on the manifold from where the algorithm will start (initial guess). The third input (optional) is a structure specifying option values (see below).

For an in-depth description of the algorithm, with convergence theory, see [ABG07] or [AMS08].

All options have default values, so that you seldom have to tweak them. Nevertheless, from time to time, good tuning may bring about nice performance boosts or better control. The tutorial page documents generic options. Among those, the present solver supports options."...":

• Output and logging: verbosity, debug, statsfun
• Stopping criteria: maxiter, maxtime, tolcost, tolgradnorm, stopfun
• Misc.: storedepth
In addition, you may add the following specific options to the options structure:
 Field name (options."...") Value type Description Solver specific Delta0 double Initial radius $\Delta_0$ of the trust-region. You should not set this parameter without setting Delta_bar. Delta_bar double Maximum radius $\bar\Delta$ the trust-region can have. If you specify this parameter but not Delta0, then Delta0 will be set to 1/8 times this parameter. useRand Boolean Use randomization to try to escape saddle points (false by default; you should have a good reason to activate this). If activated, the preconditioner is not used. kappa double Parameter $\kappa$ in the stopping criterion for the inner iterations (truncated CG). See [ABG07]. theta double Parameter $0 < \theta \leq 1$ in the stopping criterion for the inner iterations (truncated CG). Set to 1.0 (default) to reach for a quadratic convergence (if the Hessian is provided). See [ABG07]. rho_prime double Steps (at the outer iteration level) are only accepted if $\rho$ (see the Outputs section) is at least $\rho' \in [0, 1/4)$. Defaults to 0.1. Make it smaller to accept steps more often, and larger to reject steps more often. See [ABG07]. rho_regularization double Close to convergence, evaluating the performance ratio $\rho$ is numerically challenging. Meanwhile, close to convergence, the quadratic model should be a good fit and the steps should be accepted. Regularization lets $\rho$ go to 1 as the model decrease and actual decrease go to zero. Set this option to zero to disable regularization (not recommended). See in-code for the specifics. This heuristic is documented in the book by Conn Gould and Toint on trust-region methods, section 17.4.2. debug Integer If set to 1 or more, will display the value of all options used and extra information at each iteration (the larger the number, the more information is displayed). Stopping criteria maxiter integer Stop after this number of outer iterations. miniter integer If useRand is true, do not stop before at least this many outer iterations were executed. maxinner integer Maximum number of inner iterations (usually, the number of Hessian evaluations) per outer iteration. This option may be quite critical for the overall performance of the algorithm. Setting it too low can jeopardize the quadratic convergence of the algorithm, but setting too high can sometimes slow down the algorithm at unimportant stages of the optimization. mininner integer Minimum number of inner iterations per outer iteration, unless negative curvature is detected or the trust-region is exceeded during the inner iterations.
Heads up! options.statsfun is called with the point x that was reached last, after the accept/reject decision. Hence: if the step was accepted, we get that new x, with a store which only saw the call for the cost and for the gradient. If the step was rejected, we get the same x as previously, with the store structure containing everything that was computed at that point (possibly including previous rejects at that same point). Hence, statsfun should not be used in conjunction with the store to count operations for example. Instead, you could use a global variable and increment that variable directly from the cost related functions. It is however possible to use statsfun with the store to compute, for example, alternate merit functions on the point x.

The first output is the best point $x$ found on the manifold. The second output is the value of the cost function at that point. The third output is a structure array containing information about the iterations performed by the solver. The tutorial page documents generic fields that may appear in the info structure array. Among those, the present solver will populate:

• iter, time, cost, gradnorm,
• and whatever options.statsfun logged, if it was provided.

In addition,you will find this information:

 Field name ([info."..."]) Value type Description rho double Performance ratio $\rho_k$ of the iterate $k$: ratio between the actual cost improvement (as indicated by the cost function) and the expected cost improvement (as indicated by the quadratic model). rhonum, rhoden double Regularized numerator and denominator of the performance ratio $\rho$, such that $\rho = \rho_{num}/\rho_{den}$. The numerator corresponds to the actual decrease and the denominator corresponds to the model decrease. accepted Boolean Whether the iterate was accepted or rejected. stepsize double The (Riemannian) norm of the vector returned by the inner solver tCG and which is retracted to obtain the proposed next iterate. If accepted is true for the corresponding iterate, this is the size of the step from the previous to the new iterate. If accepted is false, the step was not executed and this is the size of the rejected step. numinner integer The number of inner iterations used to compute the next iterate. Delta double The trust-region radius $\Delta_k$ at the outer iteration $k$. cauchy Boolean (if useRand is true) indicates whether the Cauchy point was used at this iterate or not.

Recall the structure array syntax in Matlab: [info.time] will return a vector of all the times logged (mind the brackets).

Heads up! The model decrease rhoden is theoretically always positive, but finite numerical accuracy or a nonlinear / non-symmetric approximate Hessian operator (or its approximation) may jeopardize that. When this happens, the step is rejected and the trust-region radius is reduced. If this happens too often, this is the sign that something is wrong with the Hessian or its approximation. You may detect this either by inspecting the text output (look for the message "model did not decrease") or by looking for true values in [info.rhoden] < 0, and of course using the checkhessian tool. If you did not provide a Hessian or approximation, then the generic finite-difference approximation of it is in use, getHessianFD. It is nonlinear and may at times lead to negative rhoden. If this happens, you may either try to provide an alternative Hessian operator, or try the conjugategradient solver for example. if you decide to copy getHessianFD to a new file to modify it, tuning the epsilon parameter is the first thing to try.

See the examples folder. Most examples use (or could use) this solver.