Home > manopt > solvers > trustregions > trustregions.m

# trustregions

## PURPOSE

Riemannian trust-regions solver for optimization on manifolds.

## SYNOPSIS

```
function [x, cost, info, options] = trustregions(problem, x, options)
```

## DESCRIPTION

Riemannian trust-regions solver for optimization on manifolds.

```
function [x, cost, info, options] = trustregions(problem)
function [x, cost, info, options] = trustregions(problem, x0)
function [x, cost, info, options] = trustregions(problem, x0, options)
function [x, cost, info, options] = trustregions(problem, [], options)
```

This is the Riemannian Trust-Region solver (with tCG inner solve), named
RTR. This solver attempts to minimize the cost function described in the
problem structure. It requires the cost function and its gradient, and it
issues calls for the Hessian. If neither a Hessian nor an approximate
Hessian is provided, a standard approximation of the Hessian based on
finite differences of the gradient is computed. If a preconditioner for
the Hessian is provided, it is used.

If no gradient is provided, an approximation of the gradient is computed,
but this can be slow for manifolds of high dimension.

For a description of the algorithm and theorems offering convergence
guarantees, see the references below. Documentation for this solver is
available online at:

http://www.manopt.org/solver_documentation_trustregions.html

The initial iterate is x0 if it is provided. Otherwise, a random point on
the manifold is picked. To specify options whilst not specifying an
initial iterate, give x0 as [] (the empty matrix).
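As a concrete illustration, here is a minimal usage sketch. The matrix A, the Rayleigh-quotient cost, and the dimensions are illustrative choices, not part of this solver's interface; spherefactory is Manopt's sphere manifold factory.

```matlab
% Minimal sketch: minimize f(x) = -x'*A*x over the unit sphere, i.e.,
% compute a dominant eigenvector of the symmetric matrix A.
n = 100;
A = randn(n); A = (A + A')/2;

problem.M = spherefactory(n);         % unit sphere in R^n
problem.cost  = @(x) -x'*(A*x);       % cost function
problem.egrad = @(x) -2*A*x;          % Euclidean gradient
problem.ehess = @(x, u) -2*A*u;       % Euclidean Hessian along direction u
% (Manopt converts egrad/ehess to their Riemannian counterparts.)

% Pass x0 as [] to get a random initial point while still giving options.
options.maxiter = 500;
[x, cost, info, options] = trustregions(problem, [], options);
```

If problem.ehess were omitted, the solver would fall back on the finite-difference Hessian approximation mentioned above.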

The two outputs 'x' and 'cost' are the last reached point on the manifold
and its cost. Notice that x is not necessarily the best reached point,
because this solver is not forced to be a descent method. In particular,
very close to convergence, it is sometimes preferable to accept very
slight increases in the cost value (on the order of the machine epsilon)
in the process of reaching fine convergence.
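Because of this, if the best reached cost matters, it can be recovered after the fact from the logged statistics; a sketch:

```matlab
% [info.cost] concatenates the cost logged at every outer iteration.
% Since the solver is not strictly a descent method, the final cost need
% not be the smallest one logged:
[bestcost, bestidx] = min([info.cost]);
fprintf('best cost %g at outer iteration %d (final cost %g)\n', ...
        bestcost, info(bestidx).iter, info(end).cost);
```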

The output 'info' is a struct-array which contains information about the
iterations:
• iter (integer)
  The (outer) iteration number, or number of steps considered
  (whether accepted or rejected). The initial guess is 0.
• cost (double)
  The corresponding cost value.
• gradnorm (double)
  The (Riemannian) norm of the gradient.
• numinner (integer)
  The number of inner iterations executed to compute this iterate.
  Inner iterations are truncated-CG steps. Each one requires a
  Hessian (or approximate Hessian) evaluation.
• time (double)
  The total elapsed time in seconds to reach the corresponding cost.
• rho (double)
  The performance ratio for the iterate.
• rhonum, rhoden (double)
  Regularized numerator and denominator of the performance ratio:
  rho = rhonum/rhoden. See options.rho_regularization.
• accepted (boolean)
  Whether the proposed iterate was accepted or not.
• stepsize (double)
  The (Riemannian) norm of the vector returned by the inner solver
  tCG, which is retracted to obtain the proposed next iterate. If
  accepted is true for the corresponding iterate, this is the size of
  the step from the previous to the new iterate. If accepted is
  false, the step was not executed and this is the size of the
  rejected step.
• Delta (double)
  The trust-region radius at the outer iteration.
• cauchy (boolean)
  Whether the Cauchy point was used or not (if useRand is true).

And possibly additional information logged by options.statsfun. For
example, type [info.gradnorm] to obtain a vector of the successive
gradient norms reached at each (outer) iteration.
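For instance, after a run, fields of the info struct-array can be concatenated and plotted; a sketch:

```matlab
% Concatenating a field over the struct-array yields a vector with one
% entry per outer iteration.
iters     = [info.iter];
gradnorms = [info.gradnorm];

% Typical convergence plot: gradient norm versus outer iteration.
semilogy(iters, gradnorms, '.-');
xlabel('Outer iteration'); ylabel('Gradient norm');
```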

The options structure is used to override the default values. All
options have a default value and are hence optional. To force an option
value, pass an options structure with a field options.optionname, where
optionname is one of the following (the default value is indicated
between parentheses):

• tolgradnorm (1e-6)
  The algorithm terminates if the norm of the gradient drops below
  this. For well-scaled problems, a rule of thumb is that you can
  expect to reduce the gradient norm by 8 orders of magnitude
  (sqrt(eps)) compared to the gradient norm at a "typical" point (a
  rough initial iterate for example). Further decrease is sometimes
  possible, but inexact floating point arithmetic will eventually
  limit the final accuracy. If tolgradnorm is set too low, the
  algorithm may end up iterating forever (or at least until another
  stopping criterion triggers).
• maxiter (1000)
  The algorithm terminates if maxiter (outer) iterations were executed.
• maxtime (Inf)
  The algorithm terminates if maxtime seconds elapsed.
• miniter (3)
  Minimum number of outer iterations (used only if useRand is true).
• mininner (1)
  Minimum number of inner iterations (for tCG).
• maxinner (problem.M.dim(), the manifold's dimension)
  Maximum number of inner iterations (for tCG).
• Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim()))
  Maximum trust-region radius. If you specify this parameter but not
  Delta0, then Delta0 will be set to 1/8 times this parameter.
• Delta0 (Delta_bar/8)
  Initial trust-region radius. If you observe a long plateau at the
  beginning of the convergence plot (gradient norm vs. iteration), it
  may pay off to tune this parameter to shorten the plateau. You
  should not set this parameter without also setting Delta_bar (at a
  larger value).
• useRand (false)
  Set to true if the trust-region solve is to be initiated with a
  random tangent vector. If set to true, no preconditioner will be
  used. This option is set to true in some scenarios to escape saddle
  points, but is otherwise seldom activated.
• kappa (0.1)
  tCG inner kappa convergence tolerance.
  kappa > 0 is the linear convergence target rate: tCG terminates
  early if the residual was reduced by a factor of kappa.
• theta (1.0)
  tCG inner theta convergence tolerance.
  1+theta (theta between 0 and 1) is the superlinear convergence
  target rate: tCG terminates early if the residual was reduced by a
  power of 1+theta.
• rho_prime (0.1)
  Accept/reject threshold: if rho is at least rho_prime, the outer
  iteration is accepted; otherwise, it is rejected. When an iteration
  is rejected, the trust-region radius is decreased. To guarantee
  this, rho_prime must be nonnegative and strictly smaller than 1/4.
  If rho_prime is negative, the algorithm is not guaranteed to
  produce monotonically decreasing cost values. It is strongly
  recommended to set rho_prime > 0, to aid convergence.
• rho_regularization (1e3)
  Close to convergence, evaluating the performance ratio rho is
  numerically challenging. Meanwhile, close to convergence, the
  quadratic model should be a good fit and the steps should be
  accepted. Regularization lets rho go to 1 as the model decrease and
  the actual decrease go to zero. Set this option to zero to disable
  regularization (not recommended). See in-code comments for the
  specifics. When this is not zero, the iterates produced may fail to
  improve the cost monotonically when very close to convergence,
  because the corrected cost improvement could change sign if it is
  negative but very small.
• statsfun (none)
  Function handle to a function that is called after each iteration
  to provide the opportunity to log additional statistics. They are
  returned in the info struct. See the generic Manopt documentation
  about solvers for further information. statsfun is called with the
  point x that was reached last, after the accept/reject decision.
  See the comment below.
• stopfun (none)
  Function handle to a function that is called at each iteration to
  provide the opportunity to specify additional stopping criteria.
  See the generic Manopt documentation about solvers for further
  information.
• verbosity (2)
  Integer used to tune the amount of output the algorithm generates
  during execution (mostly as text in the command window). The
  higher, the more output. 0 means silent. 3 and above includes a
  display of the options structure at the beginning of the execution.
• debug (false)
  Set to true to allow the algorithm to perform additional
  computations for debugging purposes. If a debugging test fails, you
  will be informed of it, usually via the command window. Be aware
  that these additional computations appear in the algorithm timings
  too, and may interfere with operations such as counting the number
  of cost evaluations (the debug calls get access to storedb too).
• storedepth (2)
  Maximum number of different points x of the manifold for which a
  store structure is kept in memory in the storedb for caching. If
  memory usage is an issue, you may try to lower this number.
  Profiling or Manopt counters may then help to investigate whether a
  performance hit was incurred as a result.
• hook (none)
  A function handle which allows the user to change the current point
  x at the beginning of each iteration, before the stopping criterion
  is evaluated. See applyHook for help on how to use this option.
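A sketch of overriding a few of these defaults (the specific values are illustrative; unspecified options keep their default values):

```matlab
% Override selected options; the rest keep their defaults.
options = struct();
options.tolgradnorm = 1e-8;   % stop once the gradient norm drops below this
options.maxiter     = 200;    % at most 200 outer iterations
options.Delta_bar   = 10;     % max radius; Delta0 then defaults to Delta_bar/8
options.verbosity   = 1;      % less text output

[x, cost, info] = trustregions(problem, [], options);
```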

Notice that statsfun is called with the point x that was reached last,
after the accept/reject decision. Hence: if the step was accepted, we get
that new x, with a store which only saw the calls for the cost and for
the gradient. If the step was rejected, we get the same x as previously,
with the store structure containing everything that was computed at that
point (possibly including previous rejects at that same point). Hence,
statsfun should not be used in conjunction with the store to count
operations, for example. Instead, use Manopt counters: see statscounters.
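A sketch of a statsfun that logs an extra quantity at each iteration; here, the distance to a point of interest xtarget, which is a hypothetical stand-in (the three-argument statsfun signature is the one described in the Manopt solver documentation):

```matlab
% The stats structure returned by statsfun is appended to the info
% struct-array, so the extra field appears there alongside the others.
xtarget = problem.M.rand();   % illustrative point of interest
options.statsfun = @(problem, x, stats) setfield(stats, ...
                        'dist_to_target', problem.M.dist(x, xtarget));

[x, cost, info] = trustregions(problem, [], options);
% Then [info.dist_to_target] gives the logged distances per iteration.
```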

Please cite the Manopt paper as well as the research paper:
    @Article{genrtr,
      Title    = {Trust-region methods on {Riemannian} manifolds},
      Author   = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.},
      Journal  = {Foundations of Computational Mathematics},
      Year     = {2007},
      Number   = {3},
      Pages    = {303--330},
      Volume   = {7},
      Doi      = {10.1007/s10208-005-0179-9}
    }

## CROSS-REFERENCE INFORMATION

This function calls:
• StoreDB
• applyHook: Apply the hook function to possibly replace the current x (for solvers).
• applyStatsfun: Apply the statsfun function to a stats structure (for solvers).
• canGetApproxGradient: Checks whether an approximate gradient can be computed for this problem.
• canGetApproxHessian: Checks whether an approximate Hessian can be computed for this problem.
• canGetCost: Checks whether the cost function can be computed for a problem structure.
• canGetGradient: Checks whether the gradient can be computed for a problem structure.
• canGetHessian: Checks whether the Hessian can be computed for a problem structure.
• getCost: Computes the cost function at x.
• getCostGrad: Computes the cost function and the gradient at x in one call if possible.
• getDirectionalDerivative: Computes the directional derivative of the cost function at x along d.
• getGlobalDefaults: Returns a structure with default option values for Manopt.
• getHessian: Computes the Hessian of the cost function at x along d.
• mergeOptions: Merges two options structures with one having precedence over the other.
• stoppingcriterion: Checks for standard stopping criteria, as a helper to solvers.
• approxgradientFD: Gradient approximation function handle based on finite differences of the cost.
• approxhessianFD: Hessian approximation function handle based on finite differences of the gradient.
• tCG: Truncated (Steihaug-Toint) Conjugate-Gradient method.
This function is called by:

## SOURCE CODE

```
0001 function [x, cost, info, options] = trustregions(problem, x, options)
0002 % Riemannian trust-regions solver for optimization on manifolds.
0003 %
0004 % function [x, cost, info, options] = trustregions(problem)
0005 % function [x, cost, info, options] = trustregions(problem, x0)
0006 % function [x, cost, info, options] = trustregions(problem, x0, options)
0007 % function [x, cost, info, options] = trustregions(problem, [], options)
0008 %
0009 % This is the Riemannian Trust-Region solver (with tCG inner solve), named
0010 % RTR. This solver will attempt to minimize the cost function described in
0011 % the problem structure. It requires the availability of the cost function
0012 % and of its gradient. It will issue calls for the Hessian. If no Hessian
0013 % nor approximate Hessian is provided, a standard approximation of the
0014 % Hessian based on the gradient will be computed. If a preconditioner for
0015 % the Hessian is provided, it will be used.
0016 %
0017 % If no gradient is provided, an approximation of the gradient is computed,
0018 % but this can be slow for manifolds of high dimension.
0019 %
0020 % For a description of the algorithm and theorems offering convergence
0021 % guarantees, see the references below. Documentation for this solver is
0022 % available online at:
0023 %
0024 % http://www.manopt.org/solver_documentation_trustregions.html
0025 %
0026 %
0027 % The initial iterate is x0 if it is provided. Otherwise, a random point on
0028 % the manifold is picked. To specify options whilst not specifying an
0029 % initial iterate, give x0 as [] (the empty matrix).
0030 %
0031 % The two outputs 'x' and 'cost' are the last reached point on the manifold
0032 % and its cost. Notice that x is not necessarily the best reached point,
0033 % because this solver is not forced to be a descent method. In particular,
0034 % very close to convergence, it is sometimes preferable to accept very
0035 % slight increases in the cost value (on the order of the machine epsilon)
0036 % in the process of reaching fine convergence.
0037 %
0038 % The output 'info' is a struct-array which contains information about the
0039 % iterations:
0040 %   iter (integer)
0041 %       The (outer) iteration number, or number of steps considered
0042 %       (whether accepted or rejected). The initial guess is 0.
0043 %   cost (double)
0044 %       The corresponding cost value.
0045 %   gradnorm (double)
0046 %       The (Riemannian) norm of the gradient.
0047 %   numinner (integer)
0048 %       The number of inner iterations executed to compute this iterate.
0049 %       Inner iterations are truncated-CG steps. Each one requires a
0050 %       Hessian (or approximate Hessian) evaluation.
0051 %   time (double)
0052 %       The total elapsed time in seconds to reach the corresponding cost.
0053 %   rho (double)
0054 %       The performance ratio for the iterate.
0055 %   rhonum, rhoden (double)
0056 %       Regularized numerator and denominator of the performance ratio:
0057 %       rho = rhonum/rhoden. See options.rho_regularization.
0058 %   accepted (boolean)
0059 %       Whether the proposed iterate was accepted or not.
0060 %   stepsize (double)
0061 %       The (Riemannian) norm of the vector returned by the inner solver
0062 %       tCG and which is retracted to obtain the proposed next iterate. If
0063 %       accepted = true for the corresponding iterate, this is the size of
0064 %       the step from the previous to the new iterate. If accepted is
0065 %       false, the step was not executed and this is the size of the
0066 %       rejected step.
0067 %   Delta (double)
0068 %       The trust-region radius at the outer iteration.
0069 %   cauchy (boolean)
0070 %       Whether the Cauchy point was used or not (if useRand is true).
0071 %   And possibly additional information logged by options.statsfun.
0072 % For example, type [info.gradnorm] to obtain a vector of the successive
0073 % gradient norms reached at each (outer) iteration.
0074 %
0075 % The options structure is used to overwrite the default values. All
0076 % options have a default value and are hence optional. To force an option
0077 % value, pass an options structure with a field options.optionname, where
0078 % optionname is one of the following and the default value is indicated
0079 % between parentheses:
0080 %
0081 %   tolgradnorm (1e-6)
0082 %       The algorithm terminates if the norm of the gradient drops below
0083 %       this. For well-scaled problems, a rule of thumb is that you can
0084 %       expect to reduce the gradient norm by 8 orders of magnitude
0085 %       (sqrt(eps)) compared to the gradient norm at a "typical" point (a
0086 %       rough initial iterate for example). Further decrease is sometimes
0087 %       possible, but inexact floating point arithmetic will eventually
0088 %       limit the final accuracy. If tolgradnorm is set too low, the
0089 %       algorithm may end up iterating forever (or at least until another
0090 %       stopping criterion triggers).
0091 %   maxiter (1000)
0092 %       The algorithm terminates if maxiter (outer) iterations were executed.
0093 %   maxtime (Inf)
0094 %       The algorithm terminates if maxtime seconds elapsed.
0095 %   miniter (3)
0096 %       Minimum number of outer iterations (used only if useRand is true).
0097 %   mininner (1)
0098 %       Minimum number of inner iterations (for tCG).
0099 %   maxinner (problem.M.dim() : the manifold's dimension)
0100 %       Maximum number of inner iterations (for tCG).
0101 %   Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim()))
0102 %       Maximum trust-region radius. If you specify this parameter but not
0103 %       Delta0, then Delta0 will be set to 1/8 times this parameter.
0104 %   Delta0 (Delta_bar/8)
0105 %       Initial trust-region radius. If you observe a long plateau at the
0106 %       beginning of the convergence plot (gradient norm VS iteration), it
0107 %       may pay off to try to tune this parameter to shorten the plateau.
0108 %       You should not set this parameter without setting Delta_bar too (at
0109 %       a larger value).
0110 %   useRand (false)
0111 %       Set to true if the trust-region solve is to be initiated with a
0112 %       random tangent vector. If set to true, no preconditioner will be
0113 %       used. This option is set to true in some scenarios to escape saddle
0114 %       points, but is otherwise seldom activated.
0115 %   kappa (0.1)
0116 %       tCG inner kappa convergence tolerance.
0117 %       kappa > 0 is the linear convergence target rate: tCG will terminate
0118 %       early if the residual was reduced by a factor of kappa.
0119 %   theta (1.0)
0120 %       tCG inner theta convergence tolerance.
0121 %       1+theta (theta between 0 and 1) is the superlinear convergence
0122 %       target rate. tCG will terminate early if the residual was reduced
0123 %       by a power of 1+theta.
0124 %   rho_prime (0.1)
0125 %       Accept/reject threshold : if rho is at least rho_prime, the outer
0126 %       iteration is accepted. Otherwise, it is rejected. In case it is
0127 %       rejected, the trust-region radius will have been decreased.
0128 %       To ensure this, rho_prime >= 0 must be strictly smaller than 1/4.
0129 %       If rho_prime is negative, the algorithm is not guaranteed to
0130 %       produce monotonically decreasing cost values. It is strongly
0131 %       recommended to set rho_prime > 0, to aid convergence.
0132 %   rho_regularization (1e3)
0133 %       Close to convergence, evaluating the performance ratio rho is
0134 %       numerically challenging. Meanwhile, close to convergence, the
0135 %       quadratic model should be a good fit and the steps should be
0136 %       accepted. Regularization lets rho go to 1 as the model decrease and
0137 %       the actual decrease go to zero. Set this option to zero to disable
0138 %       regularization (not recommended). See in-code for the specifics.
0139 %       When this is not zero, it may happen that the iterates produced are
0140 %       not monotonically improving the cost when very close to
0141 %       convergence. This is because the corrected cost improvement could
0142 %       change sign if it is negative but very small.
0143 %   statsfun (none)
0144 %       Function handle to a function that will be called after each
0145 %       iteration to provide the opportunity to log additional statistics.
0146 %       They will be returned in the info struct. See the generic Manopt
0147 %       documentation about solvers for further information. statsfun is
0148 %       called with the point x that was reached last, after the
0149 %       accept/reject decision. See comment below.
0150 %   stopfun (none)
0151 %       Function handle to a function that will be called at each iteration
0152 %       to provide the opportunity to specify additional stopping criteria.
0153 %       See the generic Manopt documentation about solvers for further
0154 %       information.
0155 %   verbosity (2)
0156 %       Integer number used to tune the amount of output the algorithm
0157 %       generates during execution (mostly as text in the command window).
0158 %       The higher, the more output. 0 means silent. 3 and above includes a
0159 %       display of the options structure at the beginning of the execution.
0160 %   debug (false)
0161 %       Set to true to allow the algorithm to perform additional
0162 %       computations for debugging purposes. If a debugging test fails, you
0163 %       will be informed of it, usually via the command window. Be aware
0164 %       that these additional computations appear in the algorithm timings
0165 %       too, and may interfere with operations such as counting the number
0166 %       of cost evaluations, etc. (the debug calls get storedb too).
0167 %   storedepth (2)
0168 %       Maximum number of different points x of the manifold for which a
0169 %       store structure will be kept in memory in the storedb for caching.
0170 %       If memory usage is an issue, you may try to lower this number.
0171 %       Profiling or manopt counters may then help to investigate if a
0172 %       performance hit was incurred as a result.
0173 %   hook (none)
0174 %       A function handle which allows the user to change the current point
0175 %       x at the beginning of each iteration, before the stopping criterion
0176 %       is evaluated. See applyHook for help on how to use this option.
0177 %
0178 % Notice that statsfun is called with the point x that was reached last,
0179 % after the accept/reject decision. Hence: if the step was accepted, we get
0180 % that new x, with a store which only saw the call for the cost and for the
0181 % gradient. If the step was rejected, we get the same x as previously, with
0182 % the store structure containing everything that was computed at that point
0183 % (possibly including previous rejects at that same point). Hence, statsfun
0184 % should not be used in conjunction with the store to count operations for
0185 % example. Instead, you should use manopt counters: see statscounters.
0186 %
0187 %
0188 % Please cite the Manopt paper as well as the research paper:
0189 %     @Article{genrtr,
0190 %       Title    = {Trust-region methods on {Riemannian} manifolds},
0191 %       Author   = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.},
0192 %       Journal  = {Foundations of Computational Mathematics},
0193 %       Year     = {2007},
0194 %       Number   = {3},
0195 %       Pages    = {303--330},
0196 %       Volume   = {7},
0197 %       Doi      = {10.1007/s10208-005-0179-9}
0198 %     }
0199 %
0201
0202 % An explicit, general listing of this algorithm, with preconditioning,
0203 % can be found in the following paper:
0204 %     @Article{boumal2015lowrank,
0205 %       Title   = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
0206 %       Author  = {Boumal, N. and Absil, P.-A.},
0207 %       Journal = {Linear Algebra and its Applications},
0208 %       Year    = {2015},
0209 %       Pages   = {200--239},
0210 %       Volume  = {475},
0211 %       Doi     = {10.1016/j.laa.2015.02.027},
0212 %     }
0213
0214 % When the Hessian is not specified, it is approximated with
0215 % finite-differences of the gradient. The resulting method is called
0216 % RTR-FD. Some convergence theory for it is available in this paper:
0217 % @incollection{boumal2015rtrfd
0218 %    author={Boumal, N.},
0219 %    title={Riemannian trust regions with finite-difference Hessian approximations are globally convergent},
0220 %    year={2015},
0221 %    booktitle={Geometric Science of Information}
0222 % }
0223
0224
0225 % This file is part of Manopt: www.manopt.org.
0226 % This code is an adaptation to Manopt of the original GenRTR code:
0227 % RTR - Riemannian Trust-Region
0228 % (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
0229 % Florida State University
0230 % School of Computational Science
0232 % See accompanying license file.
0233 % The adaptation was executed by Nicolas Boumal.
0234 %
0235 %
0236 % Change log:
0237 %
0238 %   NB April 3, 2013:
0239 %       tCG now returns the Hessian along the returned direction eta, so
0240 %       that we do not compute that Hessian redundantly: some savings at
0241 %       each iteration. Similarly, if the useRand flag is on, we spare an
0242 %       extra Hessian computation at each outer iteration too, owing to
0243 %       some modifications in the Cauchy point section of the code specific
0244 %       to useRand = true.
0245 %
0246 %   NB Aug. 22, 2013:
0247 %       This function is now Octave compatible. The transition called for
0248 %       two changes which would otherwise not be advisable. (1) tic/toc is
0249 %       now used as is, as opposed to the safer way:
0250 %       t = tic(); elapsed = toc(t);
0251 %       And (2), the (formerly inner) function savestats was moved outside
0252 %       the main function to not be nested anymore. This is arguably less
0253 %       elegant, but Octave does not (and likely will not) support nested
0254 %       functions.
0255 %
0256 %   NB Dec. 2, 2013:
0257 %       The in-code documentation was largely revised and expanded.
0258 %
0259 %   NB Dec. 2, 2013:
0260 %       The former heuristic which triggered when rhonum was very small and
0261 %       forced rho = 1 has been replaced by a smoother heuristic which
0262 %       consists in regularizing rhonum and rhoden before computing their
0263 %       ratio. It is tunable via options.rho_regularization. Furthermore,
0264 %       the solver now detects if tCG did not obtain a model decrease
0265 %       (which is theoretically impossible but may happen because of
0266 %       numerical errors and/or because of a nonlinear/nonsymmetric Hessian
0267 %       operator, which is the case for finite difference approximations).
0268 %       When such an anomaly is detected, the step is rejected and the
0269 %       trust region radius is decreased.
0270 %       Feb. 18, 2015 note: this is less useful now, as tCG now guarantees
0271 %       model decrease even for the finite difference approximation of the
0272 %       Hessian. It is still useful in case of numerical errors, but this
0273 %       is less stringent.
0274 %
0275 %   NB Dec. 3, 2013:
0276 %       The stepsize is now registered at each iteration, at a small
0277 %       additional cost. The defaults for Delta_bar and Delta0 are better
0278 %       defined. Setting Delta_bar in the options will automatically set
0279 %       Delta0 accordingly. In Manopt 1.0.4, the defaults for these options
0280 %       were not treated appropriately because of an incorrect use of the
0281 %       isfield() built-in function.
0282 %
0283 %   NB Feb. 18, 2015:
0284 %       Added some comments. Also, Octave now supports safe tic/toc usage,
0285 %       so we reverted the changes to use that again (see Aug. 22, 2013 log
0286 %       entry).
0287 %
0288 %   NB April 3, 2015:
0289 %       Works with the new StoreDB class system.
0290 %
0291 %   NB April 8, 2015:
0292 %       No Hessian warning if approximate Hessian explicitly available.
0293 %
0294 %   NB Nov. 1, 2016:
0295 %       Now uses approximate gradient via finite differences if need be.
0296 %
0297 %   NB Aug. 2, 2018:
0298 %       Using storedb.remove() to keep the cache lean, which allowed to
0299 %       reduce storedepth to 2 from 20 (by default).
0300 %
0301 %   NB July 19, 2020:
0302 %       Added support for options.hook.
0303
0304 M = problem.M;
0305
0306 % Verify that the problem description is sufficient for the solver.
0307 if ~canGetCost(problem)
0308     warning('manopt:getCost', ...
0309             'No cost provided. The algorithm will likely abort.');
0310 end
0311 if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
0312     % Note: we do not give a warning if an approximate gradient is
0313     % explicitly given in the problem description, as in that case the user
0314     % seems to be aware of the issue.
0315     warning('manopt:getGradient:approx', ...
0316            ['No gradient provided. Using an FD approximation instead (slow).\n' ...
0317             'It may be necessary to increase options.tolgradnorm.\n' ...
0318             'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
0319     problem.approxgrad = approxgradientFD(problem);
0320 end
0321 if ~canGetHessian(problem) && ~canGetApproxHessian(problem)
0322     % Note: we do not give a warning if an approximate Hessian is
0323     % explicitly given in the problem description, as in that case the user
0324     % seems to be aware of the issue.
0325     warning('manopt:getHessian:approx', ...
0326            ['No Hessian provided. Using an FD approximation instead.\n' ...
0327             'To disable this warning: warning(''off'', ''manopt:getHessian:approx'')']);
0328     problem.approxhess = approxhessianFD(problem);
0329 end
0330
0331 % Define some strings for display
0332 tcg_stop_reason = {'negative curvature',...
0333                    'exceeded trust region',...
0334                    'reached target residual-kappa (linear)',...
0335                    'reached target residual-theta (superlinear)',...
0336                    'maximum inner iterations',...
0337                    'model increased'};
0338
0339 % Set local defaults here
0340 localdefaults.verbosity = 2;
0341 localdefaults.maxtime = inf;
0342 localdefaults.miniter = 3;
0343 localdefaults.maxiter = 1000;
0344 localdefaults.mininner = 1;
0345 localdefaults.maxinner = M.dim();
0346 localdefaults.tolgradnorm = 1e-6;
0347 localdefaults.kappa = 0.1;
0348 localdefaults.theta = 1.0;
0349 localdefaults.rho_prime = 0.1;
0350 localdefaults.useRand = false;
0351 localdefaults.rho_regularization = 1e3;
0352
0353 % Merge global and local defaults, then merge w/ user options, if any.
0354 localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
0355 if ~exist('options', 'var') || isempty(options)
0356     options = struct();
0357 end
0358 options = mergeOptions(localdefaults, options);
0359
0360 % Set default Delta_bar and Delta0 separately to deal with additional
0361 % logic: if Delta_bar is provided but not Delta0, let Delta0 automatically
0362 % be some fraction of the provided Delta_bar.
0363 if ~isfield(options, 'Delta_bar')
0364     if isfield(M, 'typicaldist')
0365         options.Delta_bar = M.typicaldist();
0366     else
0367         options.Delta_bar = sqrt(M.dim());
0368     end
0369 end
0370 if ~isfield(options,'Delta0')
0371     options.Delta0 = options.Delta_bar / 8;
0372 end
0373
0374 % Check some option values
0375 assert(options.rho_prime < 1/4, ...
0376         'options.rho_prime must be strictly smaller than 1/4.');
0377 assert(options.Delta_bar > 0, ...
0378         'options.Delta_bar must be positive.');
0379 assert(options.Delta0 > 0 && options.Delta0 <= options.Delta_bar, ...
0380         'options.Delta0 must be positive and smaller than Delta_bar.');
0381
0382 % It is sometimes useful to check what the actual option values are.
0383 if options.verbosity >= 3
0384     disp(options);
0385 end
0386
0387 ticstart = tic();
0388
0389 % If no initial point x is given by the user, generate one at random.
0390 if ~exist('x', 'var') || isempty(x)
0391     x = M.rand();
0392 end
0393
0394 % Create a store database and get a key for the current x
0395 storedb = StoreDB(options.storedepth);
0396 key = storedb.getNewKey();
0397
0398 %% Initializations
0399
0400 % k counts the outer (TR) iterations. The semantic is that k counts the
0401 % number of iterations fully executed so far.
0402 k = 0;
0403
0404 % Initialize solution and companion measures: f(x), fgrad(x)
0405 [fx, fgrad] = getCostGrad(problem, x, storedb, key);
0406 norm_grad = M.norm(x, fgrad);
0407
0408 % Initialize trust-region radius
0409 Delta = options.Delta0;
0410
0411 % Save stats in a struct array info, and preallocate.
0412 if ~exist('used_cauchy', 'var')
0413     used_cauchy = [];
0414 end
0415 stats = savestats(problem, x, storedb, key, options, k, fx, norm_grad, Delta, ticstart);
0416 info(1) = stats;
0417 info(min(10000, options.maxiter+1)).iter = [];
0418
0419 % ** Display:
0420 if options.verbosity == 2
0421    fprintf(['%3s %3s      %5s                %5s     ', ...
0422             'f: %+e   |grad|: %e\n'], ...
0423            '   ','   ','     ','     ', fx, norm_grad);
0424 elseif options.verbosity > 2
0425    fprintf('************************************************************************\n');
0426    fprintf('%3s %3s    k: %5s     num_inner: %5s     %s\n',...
0427            '','','______','______','');
0428    fprintf('       f(x) : %+e       |grad| : %e\n', fx, norm_grad);
0429    fprintf('      Delta : %f\n', Delta);
0430 end
0431
0432 % To keep track of consecutive radius changes, so that we can warn the
0433 % user if it appears necessary.
0434 consecutive_TRplus = 0;
0435 consecutive_TRminus = 0;
0436
0437
0438 % **********************
0439 % ** Start of TR loop **
0440 % **********************
0441 while true
0442
0443     % Start clock for this outer iteration
0444     ticstart = tic();
0445
0446     % Apply the hook function if there is one: this allows external code to
0447     % move x to another point. If the point is changed (indicated by a true
0448     % value for the boolean 'hooked'), we update our knowledge about x.
0449     [x, key, info, hooked] = applyHook(problem, x, storedb, key, options, info, k+1);
0450     if hooked
0451         [fx, fgradx] = getCostGrad(problem, x, storedb, key);
0452         norm_grad = M.norm(x, fgradx);
0453     end
0454
0455     % Run standard stopping criterion checks
0456     [stop, reason] = stoppingcriterion(problem, x, options, info, k+1);
0457
0458     % If the stopping criterion that triggered is the tolerance on the
0459     % gradient norm but we are using randomization, make sure we make at
0460     % least miniter iterations to give randomization a chance at escaping
0461     % saddle points.
0462     if stop == 2 && options.useRand && k < options.miniter
0463         stop = 0;
0464     end
0465
0466     if stop
0467         if options.verbosity >= 1
0468             fprintf([reason '\n']);
0469         end
0470         break;
0471     end
0472
0473     if options.verbosity > 2 || options.debug > 0
0474         fprintf('************************************************************************\n');
0475     end
0476
0477     % *************************
0478     % ** Begin TR Subproblem **
0479     % *************************
0480
0481     % Determine eta0
0482     if ~options.useRand
0483         % Pick the zero vector
0484         eta = M.zerovec(x);
0485     else
0486         % Random vector in T_x M (this has to be very small)
0487         eta = M.lincomb(x, 1e-6, M.randvec(x));
0488         % Must be inside trust-region
0489         while M.norm(x, eta) > Delta
0490             eta = M.lincomb(x, sqrt(sqrt(eps)), eta);
0491         end
0492     end
0493
0494     % Solve TR subproblem approximately
0495     [eta, Heta, numit, stop_inner] = ...
0496                 tCG(problem, x, fgradx, eta, Delta, options, storedb, key);
0497     srstr = tcg_stop_reason{stop_inner};
0498
0499     % If using randomized approach, compare result with the Cauchy point.
0500     % Convergence proofs assume that we achieve at least (a fraction of)
0501     % the reduction of the Cauchy point. After this if-block, either all
0502     % eta-related quantities have been changed consistently, or none of
0503     % them have changed.
0504     if options.useRand
0505         used_cauchy = false;
0506         % Check the curvature,
0507         Hg = getHessian(problem, x, fgradx, storedb, key);
0508         g_Hg = M.inner(x, fgradx, Hg);
0509         if g_Hg <= 0
0510             tau_c = 1;
0511         else
0512             tau_c = min( norm_grad^3/(Delta*g_Hg) , 1);
0513         end
0514         % and generate the Cauchy point.
0515         eta_c  = M.lincomb(x, -tau_c * Delta / norm_grad, fgradx);
0516         Heta_c = M.lincomb(x, -tau_c * Delta / norm_grad, Hg);
0517
0518         % Now that we have computed the Cauchy point in addition to the
0519         % returned eta, we might as well keep the best of them.
0520         mdle  = fx + M.inner(x, fgradx, eta) + .5*M.inner(x, Heta,   eta);
0521         mdlec = fx + M.inner(x, fgradx, eta_c) + .5*M.inner(x, Heta_c, eta_c);
0522         if mdlec < mdle
0523             eta = eta_c;
0524             Heta = Heta_c; % added April 11, 2012
0525             used_cauchy = true;
0526         end
0527     end
0528
0529
0530     % This is computed for logging purposes and may be useful for some
0531     % user-defined stopping criteria.
0532     norm_eta = M.norm(x, eta);
0533
0534     if options.debug > 0
0536     end
0537
0538
0539     % Compute the tentative next iterate (the proposal)
0540     x_prop  = M.retr(x, eta);
0541     key_prop = storedb.getNewKey();
0542
0543     % Compute the function value of the proposal
0544     fx_prop = getCost(problem, x_prop, storedb, key_prop);
0545
0546     % Will we accept the proposal or not?
0547     % Check the performance of the quadratic model against the actual cost.
0548     rhonum = fx - fx_prop;
0549     vecrho = M.lincomb(x, 1, fgradx, .5, Heta);
0550     rhoden = -M.inner(x, eta, vecrho);
0551     % rhonum could be anything.
0552     % rhoden should be nonnegative, as guaranteed by tCG, barring numerical
0553     % errors.
0554
0555     % Heuristic -- added Dec. 2, 2013 (NB) to replace the former heuristic.
0556     % This heuristic is documented in the book by Conn, Gould and Toint on
0557     % trust-region methods, section 17.4.2.
0558     % rhonum measures the difference between two numbers. Close to
0559     % convergence, these two numbers are very close to each other, so
0560     % that computing their difference is numerically challenging: there may
0561     % be a significant loss in accuracy. Since the acceptance or rejection
0562     % of the step is conditioned on the ratio between rhonum and rhoden,
0563     % large errors in rhonum result in a very large error in rho, hence in
0564     % erratic acceptance / rejection. Meanwhile, close to convergence,
0565     % steps are usually trustworthy and we should transition to a Newton-
0566     % like method, with rho=1 consistently. The heuristic thus shifts both
0567     % rhonum and rhoden by a small amount such that far from convergence,
0568     % the shift is irrelevant and close to convergence, the ratio rho goes
0569     % to 1, effectively promoting acceptance of the step.
0570     % The rationale is that close to convergence, both rhonum and rhoden
0571     % are quadratic in the distance between x and x_prop. Thus, when this
0572     % distance is on the order of sqrt(eps), the value of rhonum and rhoden
0573     % is on the order of eps, which is indistinguishable from the numerical
0574     % error, resulting in badly estimated rho's.
0575     % For abs(fx) < 1, this heuristic is invariant under offsets of f but
0576     % not under scaling of f. For abs(fx) > 1, the opposite holds. This
0577     % should not alarm us, as this heuristic only triggers at the very last
0578     % iterations if very fine convergence is demanded.
0579     rho_reg = max(1, abs(fx)) * eps * options.rho_regularization;
0580     rhonum = rhonum + rho_reg;
0581     rhoden = rhoden + rho_reg;
0582
0583     if options.debug > 0
0584         fprintf('DBG:     rhonum : %e\n', rhonum);
0585         fprintf('DBG:     rhoden : %e\n', rhoden);
0586     end
0587
0588     % This is always true if a linear, symmetric operator is used for the
0589     % Hessian (approximation) and if we had infinite numerical precision.
0590     % In practice, nonlinear approximations of the Hessian such as the
0591     % built-in finite difference approximation and finite numerical
0592     % accuracy can cause the model to increase. In such scenarios, we
0593     % decide to force a rejection of the step and a reduction of the
0594     % trust-region radius. We test the sign of the regularized rhoden since
0595     % the regularization is supposed to capture the accuracy to which
0596     % rhoden is computed: if rhoden were negative before regularization but
0597     % not after, that should not be (and is not) detected as a failure.
0598     %
0599     % Note: the tCG inner solver used here already
0600     % includes a mechanism to ensure model decrease if the Cauchy step
0601     % attained a decrease (which is theoretically the case under very lax
0602     % assumptions). This being said, it is always possible that numerical
0603     % errors will prevent this, so that it is good to keep a safeguard.
0604     %
0605     % The current strategy is that, if this should happen, then we reject
0606     % the step and reduce the trust region radius. This also ensures that
0607     % the actual cost values are monotonically decreasing.
0608     %
0609     % [This bit of code seems not to trigger because tCG already ensures
0610     %  the model decreases even in the presence of non-linearities; but as
0611     %  a result the radius is not necessarily decreased. Perhaps we should
0612     %  change this with the proposed commented line below; needs testing.]
0613     %
0614     model_decreased = (rhoden >= 0);
0615     % model_decreased = (rhoden >= 0) && (stop_inner ~= 6);
0616
0617     if ~model_decreased
0618         srstr = [srstr ', model did not decrease']; %#ok<AGROW>
0619     end
0620
0621     rho = rhonum / rhoden;
0622
0623     % Added June 30, 2015 following observation by BM.
0624     % With this modification, it is guaranteed that a step rejection is
0625     % always accompanied by a TR reduction. This prevents stagnation in
0626     % this "corner case" (NaN's really aren't supposed to occur, but it's
0627     % nice if we can handle them nonetheless).
0628     if isnan(rho)
0629         fprintf('rho is NaN! Forcing a radius decrease. This should not happen.\n');
0630         if isnan(fx_prop)
0631             fprintf('The cost function returned NaN (perhaps the retraction returned a bad point?)\n');
0632         else
0633             fprintf('The cost function did not return a NaN value.\n');
0634         end
0635     end
0636
0637     if options.debug > 0
0638         m = @(x, eta) ...
0639           getCost(problem, x, storedb, key) + ...
0640           getDirectionalDerivative(problem, x, eta, storedb, key) + ...
0641              .5*M.inner(x, getHessian(problem, x, eta, storedb, key), eta);
0642         zerovec = M.zerovec(x);
0643         actrho = (fx - fx_prop) / (m(x, zerovec) - m(x, eta));
0644         fprintf('DBG:   new f(x) : %+e\n', fx_prop);
0645         fprintf('DBG: actual rho : %e\n', actrho);
0646         fprintf('DBG:   used rho : %e\n', rho);
0647     end
0648
0649     % Choose the new TR radius based on the model performance
0650     trstr = '   ';
0651     % If the actual decrease is smaller than 1/4 of the predicted decrease,
0652     % then reduce the TR radius.
0653     if rho < 1/4 || ~model_decreased || isnan(rho)
0654         trstr = 'TR-';
0655         Delta = Delta/4;
0656         consecutive_TRplus = 0;
0657         consecutive_TRminus = consecutive_TRminus + 1;
0658         if consecutive_TRminus >= 5 && options.verbosity >= 2
0659             consecutive_TRminus = -inf;
0660             fprintf(' +++ Detected many consecutive TR- (radius decreases).\n');
0661             fprintf(' +++ Consider decreasing options.Delta_bar by an order of magnitude.\n');
0662             fprintf(' +++ Current values: options.Delta_bar = %g and options.Delta0 = %g.\n', options.Delta_bar, options.Delta0);
0663         end
0664     % If the actual decrease is at least 3/4 of the predicted decrease and
0665     % the tCG (inner solve) hit the TR boundary, increase the TR radius.
0666     % We also keep track of the number of consecutive trust-region radius
0667     % increases. If there are many, this may indicate the need to adapt the
0668     % initial and maximum radii.
0669     elseif rho > 3/4 && (stop_inner == 1 || stop_inner == 2)
0670         trstr = 'TR+';
0671         Delta = min(2*Delta, options.Delta_bar);
0672         consecutive_TRminus = 0;
0673         consecutive_TRplus = consecutive_TRplus + 1;
0674         if consecutive_TRplus >= 5 && options.verbosity >= 1
0675             consecutive_TRplus = -inf;
0676             fprintf(' +++ Detected many consecutive TR+ (radius increases).\n');
0677             fprintf(' +++ Consider increasing options.Delta_bar by an order of magnitude.\n');
0678             fprintf(' +++ Current values: options.Delta_bar = %g and options.Delta0 = %g.\n', options.Delta_bar, options.Delta0);
0679         end
0680     else
0681         % Otherwise, keep the TR radius constant.
0682         consecutive_TRplus = 0;
0683         consecutive_TRminus = 0;
0684     end
0685
0686     % Choose to accept or reject the proposed step based on the model
0687     % performance. Note the strict inequality.
0688     if model_decreased && rho > options.rho_prime
0689
0690         % April 17, 2018: a side effect of rho_regularization > 0 is that
0691         % it can happen that the cost function appears to go up (although
0692         % only by a small amount) for some accepted steps. We decide to
0693         % accept this because, numerically, computing the difference
0694         % between fx_prop and fx is more difficult than computing the
0695         % improvement in the model, because fx_prop and fx are on the same
0696         % order of magnitude yet are separated by a very small gap near
0697         % convergence, whereas the model improvement is computed as a sum
0698         % of two small terms. As a result, the step which seems bad may
0699         % turn out to be good, in that it may help reduce the gradient norm
0700         % for example. This update merely informs the user of this event.
0701         % In further updates, we could also introduce this as a stopping
0702         % criterion. It is then important to choose wisely which of x or
0703         % x_prop should be returned (perhaps the one with smallest
0704     % cost).
0705         if fx_prop > fx && options.verbosity >= 2
0706             fprintf(['Between line above and below, cost function ' ...
0707                      'increased by %g (step size: %g)\n'], ...
0708                      fx_prop - fx, norm_eta);
0709         end
0710
0711         accept = true;
0712         accstr = 'acc';
0713         % We accept the step: no need to keep the old cache.
0714         storedb.removefirstifdifferent(key, key_prop);
0715         x = x_prop;
0716         key = key_prop;
0717         fx = fx_prop;
0718         fgradx = getGradient(problem, x, storedb, key);
0719         norm_grad = M.norm(x, fgradx);
0720     else
0721         % We reject the step: no need to keep cache related to the
0722         % tentative step.
0723         storedb.removefirstifdifferent(key_prop, key);
0724         accept = false;
0725         accstr = 'REJ';
0726     end
0727
0728     % k is the number of iterations we have accomplished.
0729     k = k + 1;
0730
0731     % Make sure we don't use too much memory for the store database
0732     storedb.purge();
0733
0734
0735     % Log statistics for freshly executed iteration.
0736     % Everything after this in the loop is not accounted for in the timing.
0737     stats = savestats(problem, x, storedb, key, options, k, fx, ...
0738                       norm_grad, Delta, ticstart, info, rho, rhonum, ...
0739                       rhoden, accept, numit, norm_eta, used_cauchy);
0740     info(k+1) = stats;
0741
0742
0743     % ** Display:
0744     if options.verbosity == 2
0745         fprintf(['%3s %3s   k: %5d     num_inner: %5d     ', ...
0746                  'f: %+e   |grad|: %e   %s\n'], ...
0747                 accstr, trstr, k, numit, fx, norm_grad, srstr);
0748     elseif options.verbosity > 2
0749         if options.useRand && used_cauchy
0750             fprintf('USED CAUCHY POINT\n');
0751         end
0752         fprintf('%3s %3s    k: %5d     num_inner: %5d     %s\n', ...
0753                 accstr, trstr, k, numit, srstr);
0754         fprintf('       f(x) : %+e     |grad| : %e\n', fx, norm_grad);
0755         if options.debug > 0
0756             fprintf('      Delta : %f          |eta| : %e\n',Delta,norm_eta);
0757         end
0758         fprintf('        rho : %e\n',rho);
0759     end
0760     if options.debug > 0
0762         if rho == 0
0763             fprintf('DBG: rho = 0, this will likely hinder further convergence.\n');
0764         end
0765     end
0766
0767 end  % of TR loop (counter: k)
0768
0769 % Restrict info struct-array to useful part
0770 info = info(1:k+1);
0771
0772
0773 if (options.verbosity > 2) || (options.debug > 0)
0774    fprintf('************************************************************************\n');
0775 end
0776 if (options.verbosity > 0) || (options.debug > 0)
0777     fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
0778 end
0779
0780 % Return the best cost reached
0781 cost = fx;
0782
0783 end
0784
0785
0786
0787
0788
0789 % Routine in charge of collecting the current iteration stats
0790 function stats = savestats(problem, x, storedb, key, options, k, fx, ...
0791                            norm_grad, Delta, ticstart, info, rho, rhonum, ...
0792                            rhoden, accept, numit, norm_eta, used_cauchy)
0793     stats.iter = k;
0794     stats.cost = fx;
0795     stats.gradnorm = norm_grad;
0796     stats.Delta = Delta;
0797     if k == 0
0798         stats.time = toc(ticstart);
0799         stats.rho = inf;
0800         stats.rhonum = NaN;
0801         stats.rhoden = NaN;
0802         stats.accepted = true;
0803         stats.numinner = 0;
0804         stats.stepsize = NaN;
0805         if options.useRand
0806             stats.cauchy = false;
0807         end
0808     else
0809         stats.time = info(k).time + toc(ticstart);
0810         stats.rho = rho;
0811         stats.rhonum = rhonum;
0812         stats.rhoden = rhoden;
0813         stats.accepted = accept;
0814         stats.numinner = numit;
0815         stats.stepsize = norm_eta;
0816         if options.useRand
0817           stats.cauchy = used_cauchy;
0818         end
0819     end
0820
0821     % See comment about statsfun above: the x and store passed to statsfun
0822     % are that of the most recently accepted point after the iteration
0823     % fully executed.
0824     stats = applyStatsfun(problem, x, storedb, key, options, stats);
0825
0826 end```
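The randomized variant above (lines 0504-0527) compares the tCG output against the Cauchy point, whose step length `tau_c` is 1 under nonpositive curvature and `min(|grad|^3 / (Delta * <grad, H grad>), 1)` otherwise. As a sanity check, here is a short Python sketch of that computation in the Euclidean case, where `M.inner` is the dot product and `M.lincomb` a plain linear combination; `cauchy_step` and `model_value` are illustrative names, not part of Manopt.

```python
import math

def cauchy_step(grad, hess_matvec, Delta):
    """Cauchy point of the trust-region subproblem, i.e. the minimizer of
    the quadratic model along -grad, clipped to the trust-region boundary
    (Euclidean sketch of lines 0506-0516 of the listing above)."""
    norm_grad = math.sqrt(sum(g * g for g in grad))
    Hg = hess_matvec(grad)
    g_Hg = sum(g * h for g, h in zip(grad, Hg))
    if g_Hg <= 0:
        tau_c = 1.0  # nonpositive curvature: step all the way to the boundary
    else:
        tau_c = min(norm_grad**3 / (Delta * g_Hg), 1.0)
    scale = -tau_c * Delta / norm_grad
    eta_c = [scale * g for g in grad]    # the Cauchy step
    Heta_c = [scale * h for h in Hg]     # H applied to it, kept for the model
    return eta_c, Heta_c

def model_value(fx, grad, eta, Heta):
    # Quadratic model m(eta) = f(x) + <grad, eta> + 0.5 * <eta, H eta>,
    # as used to compare eta against eta_c in the listing.
    return (fx + sum(g * e for g, e in zip(grad, eta))
               + 0.5 * sum(e * h for e, h in zip(eta, Heta)))
```

For example, with H = 2I, grad = (1, 0) and Delta = 10, the trust region is inactive: the Cauchy step is -0.5*grad (the exact minimizer along the gradient) and the model decreases by 0.25; with negative curvature the step always has norm Delta.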

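The rho regularization heuristic (lines 0555-0581, after the book by Conn, Gould and Toint, section 17.4.2) can also be checked numerically. Below is a minimal Python sketch, assuming Manopt's default `options.rho_regularization = 1e3`; `regularized_rho` is an illustrative name.

```python
import sys

def regularized_rho(fx, fx_prop, rhoden, rho_regularization=1e3):
    """Acceptance ratio with numerator and denominator shifted by rho_reg,
    as in the listing: far from convergence the shift is negligible; near
    convergence, where the true decrease is of order eps, rho is pulled
    toward 1, promoting acceptance of the step."""
    eps = sys.float_info.epsilon
    rho_reg = max(1.0, abs(fx)) * eps * rho_regularization
    rhonum = (fx - fx_prop) + rho_reg
    rhoden = rhoden + rho_reg
    return rhonum / rhoden

# Far from convergence: the shift (about 2.2e-13 here) is irrelevant.
rho_far = regularized_rho(fx=1.0, fx_prop=0.5, rhoden=0.6)

# Near convergence: the decrease drowns in round-off (fx - fx_prop
# evaluates to exactly 0.0 in double precision), yet rho stays close to 1.
rho_near = regularized_rho(fx=1.0, fx_prop=1.0 - 1e-17, rhoden=1e-17)
```

Setting `rho_regularization` to zero removes the shift entirely; the sketch only illustrates why the default keeps rho near 1 once the per-step decrease falls below numerical precision.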
Generated on Sun 05-Sep-2021 17:57:00 by m2html © 2005