Trustregion methods form a widely used class of nonlinear optimization algorithms. They were generalized to the realm of optimization on Riemannian manifolds in [ABG07], see also [AMS08], specifically chapter 7. See [ABG07] for a listing and explanation of the algorithm.
The standard convergence guarantees from the Euclidean setting essentially carry over to the Riemannian setting: the method is globally convergent, that is, regardless of the initial guess, it will converge toward critical points. Furthermore, when the true Hessian is used, we get quadratic local convergence; otherwise, we still get linear local convergence (superlinear if the approximate Hessian is "good enough"). See [ABG07] for precise theorem statements and the assumptions that need to be verified (in particular, sufficient smoothness of both the manifold and the cost).
This trustregion method uses a truncated conjugategradient (tCG) method to solve the inner minimization problems. This inner solve can be preconditioned: simply provide a preconditioner in the cost function description.
The implementation in Manopt is an adapted and improved version of GenRTR, the original code by the authors of [ABG07]. By default, if no Hessian and no approximate Hessian are provided, Manopt will approximate the Hessian using finite differences of the gradient. Empirically, this is seen to perform excellently. The global convergence proof of the Riemannian trustregion method can be extended to guarantee global convergence with this nonlinear approximation of the Hessian too, see [Bou15], but there is currently no proof of superlinear convergence for this algorithm. You may disable this by providing an approximate Hessian (for example, the identity operator), or the Hessian of course. If you want more control over the approximation algorithm, you may specify an approximate Hessian in the cost description and perhaps start from the base code in manopt/solvers/hessianapproximations/approxhessianFD.m.
What should the problem structure specify about the cost function $f$?
How to call this solver?
The solver trustregions
is defined in the file manopt/solvers/trustregions/trustregions.m.
It may be called as follows:
[x_opt cost_opt info] = trustregions(problem)
[x_opt cost_opt info] = trustregions(problem, x0)
[x_opt cost_opt info] = trustregions(problem, [],
options)
[x_opt cost_opt info] = trustregions(problem, x0,
options)
The first input is a Manopt problem structure, describing the manifold and the cost. The second input (optional) is a point on the manifold from where the algorithm will start (initial guess). The third input (optional) is a structure specifying option values (see below).
For an indepth description of the algorithm, with convergence theory, see [ABG07] or [AMS08].
All options have default values, so that you seldom have to tweak
them. Nevertheless, from time to time, good tuning may bring about
nice performance boosts or better control. The tutorial
page documents generic options. Among those, the present
solver supports options."..."
:
verbosity
, debug
,
statsfun
maxiter
, maxtime
,
tolcost
,
tolgradnorm
, stopfun
storedepth
options
structure:
Field name (options."..." ) 
Value type  Description 
Solver specific  
Delta0 
double  Initial radius $\Delta_0$ of the trustregion. You should
not set this parameter without setting Delta_bar .

Delta_bar 
double  Maximum radius $\bar\Delta$ the trustregion can have. If
you specify this parameter but not Delta0 , then
Delta0 will be set to 1/8 times this parameter. 
useRand 
Boolean  Use randomization to try to escape saddle points (false by default; you should have a good reason to activate this). If activated, the preconditioner is not used. 
kappa 
double  Parameter $\kappa$ in the stopping criterion for the inner iterations (truncated CG). See [ABG07]. 
theta 
double  Parameter $0 < \theta \leq 1$ in the stopping criterion for the inner iterations (truncated CG). Set to 1.0 (default) to reach for a quadratic convergence (if the Hessian is provided). See [ABG07]. 
rho_prime 
double  Steps (at the outer iteration level) are only accepted if $\rho$ (see the Outputs section) is at least $\rho' \in [0, 1/4)$. Defaults to 0.1. Make it smaller to accept steps more often, and larger to reject steps more often. See [ABG07]. 
rho_regularization 
double  Close to convergence, evaluating the performance ratio $\rho$ is numerically challenging. Meanwhile, close to convergence, the quadratic model should be a good fit and the steps should be accepted. Regularization lets $\rho$ go to 1 as the model decrease and actual decrease go to zero. Set this option to zero to disable regularization (not recommended). See incode for the specifics. This heuristic is documented in the book by Conn Gould and Toint on trustregion methods, section 17.4.2. 
debug 
Integer  If set to 1 or more, will display the value of all options used and extra information at each iteration (the larger the number, the more information is displayed). 
Stopping criteria  
maxiter 
integer  Stop after this number of outer iterations. 
miniter 
integer  If useRand is true, do not stop before at least this many outer iterations were executed. 
maxinner 
integer  Maximum number of inner iterations (usually, the number of Hessian evaluations) per outer iteration. This option may be quite critical for the overall performance of the algorithm. Setting it too low can jeopardize the quadratic convergence of the algorithm, but setting too high can sometimes slow down the algorithm at unimportant stages of the optimization. 
mininner 
integer  Minimum number of inner iterations per outer iteration, unless negative curvature is detected or the trustregion is exceeded during the inner iterations. 
options.statsfun
is called with the point x
that was reached last,
after the accept/reject decision. Hence: if the step was accepted,
we get that new x
, with a store
which
only saw the call for the cost and for the gradient. If the step was
rejected, we get the same x
as previously, with the store
structure containing everything that was computed at that point
(possibly including previous rejects at that same point). Hence, statsfun
should not be used in conjunction with the store
to
count operations for example. Instead, you could use a global
variable and increment that variable directly from the cost related
functions. It is however possible to use statsfun
with
the store
to compute, for example, alternate merit
functions on the point x
.The first output is the best point $x$ found on the manifold. The
second output is the value of the cost function at that point. The
third output is a structure array containing information about the
iterations performed by the solver. The tutorial
page documents generic fields that may appear in the info
structure array. Among those, the present solver will populate:
iter
, time
, cost
, gradnorm
,options.statsfun
logged, if it was
provided.In addition,you will find this information:
Field name ([info."..."] ) 
Value type  Description 
rho 
double  Performance ratio $\rho_k$ of the iterate $k$: ratio between the actual cost improvement (as indicated by the cost function) and the expected cost improvement (as indicated by the quadratic model). 
rhonum , rhoden 
double  Regularized numerator and denominator of the performance ratio $\rho$, such that $\rho = \rho_{num}/\rho_{den}$. The numerator corresponds to the actual decrease and the denominator corresponds to the model decrease. 
accepted 
Boolean  Whether the iterate was accepted or rejected. 
stepsize 
double  The (Riemannian) norm of the vector returned by the inner
solver tCG and which is retracted to obtain the proposed
next iterate. If accepted is true for the
corresponding iterate, this is the size of the step from the
previous to the new iterate. If accepted is
false, the step was not executed and this is the size of the
rejected step. 
numinner 
integer  The number of inner iterations used to compute the next iterate. 
Delta 
double  The trustregion radius $\Delta_k$ at the outer iteration $k$. 
cauchy 
Boolean  (if useRand is true) indicates whether the Cauchy point was used at this iterate or not. 
Recall the structure array syntax in Matlab: [info.time]
will return a vector of all the times logged (mind the brackets).
rhoden
is theoretically always positive,
but finite numerical accuracy or a nonlinear / nonsymmetric
approximate Hessian operator (or its approximation) may jeopardize
that. When this happens, the step is rejected and the trustregion
radius is reduced. If this happens too often, this is the sign
that something is wrong with the Hessian or its approximation. You
may detect this either by inspecting the text output (look for the
message "model did not
decrease") or by looking for true values in [info.rhoden]
< 0
, and of course using the checkhessian
tool. If you did not provide a Hessian or approximation, then the
generic finitedifference approximation of it is in use, getHessianFD
.
It is nonlinear and may at times lead to negative rhoden
.
If this happens, you may either try to provide an alternative
Hessian operator, or try the conjugategradient
solver for example. if you decide to copy getHessianFD
to a new file to modify it, tuning the epsilon
parameter is the first thing to try.See the examples folder. Most examples use (or could use) this solver.