
trustregions

PURPOSE

Riemannian trust-regions solver for optimization on manifolds.

SYNOPSIS

function [x, cost, info, options] = trustregions(problem, x, options)

DESCRIPTION

 Riemannian trust-regions solver for optimization on manifolds.

 function [x, cost, info, options] = trustregions(problem)
 function [x, cost, info, options] = trustregions(problem, x0)
 function [x, cost, info, options] = trustregions(problem, x0, options)
 function [x, cost, info, options] = trustregions(problem, [], options)

 This is the Riemannian Trust-Region solver for Manopt, named RTR.
 This solver tries to minimize the cost function described in the problem
 structure. It requires the availability of the cost function and of its
 gradient. It issues calls for the Hessian.

 If no Hessian nor approximation for it is provided, an approximation of
 Hessian-vector products is computed with finite differences of gradients.

 If no gradient is provided, an approximation of the gradient is computed,
 but this can be slow for manifolds of high dimension.
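
 If you do supply the gradient and the Hessian yourself, it is good practice
 to check them numerically before running the solver, for instance with the
 Manopt tools checkgradient and checkhessian (a suggestion, not a requirement
 of this solver):

     checkgradient(problem);  % compares the supplied gradient to finite differences
     checkhessian(problem);   % same idea for the Hessian, if one is supplied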

 At each iteration a subproblem is solved using a trust-region subproblem
 (TRS) solver. The default is @trs_tCG_cached. This solver, like some
 others, uses the preconditioner if one is supplied.
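
 For illustration, a preconditioner can be supplied through the field
 problem.precon: a function mapping a point x and a tangent vector u to a
 tangent vector approximating the inverse Hessian applied to u. A minimal
 sketch, where hessian_scale is a hypothetical, problem-dependent constant:

     M = problem.M;       % assumes the manifold is already set in the problem
     hessian_scale = 10;  % hypothetical estimate of the Hessian's typical scale
     problem.precon = @(x, u) M.lincomb(x, 1/hessian_scale, u);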

 For a description of the algorithm and theorems offering convergence
 guarantees, see the references below. Documentation for this solver is
 available online, but may be outdated.
 http://www.manopt.org/solver_documentation_trustregions.html


 The initial iterate is x0 if it is provided. Otherwise, a random point on
 the manifold is picked. To specify options whilst not specifying an
 initial iterate, give x0 as [] (the empty matrix).
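
 As a minimal usage sketch, here is how one might compute a dominant
 eigenvector of a symmetric matrix A by minimizing the Rayleigh quotient
 -x'*A*x over the unit sphere, using Manopt's spherefactory manifold
 (assumptions: A is symmetric; derivatives are given in Euclidean form):

     n = 100;
     A = randn(n);  A = .5*(A + A');
     problem.M = spherefactory(n);
     problem.cost  = @(x) -x'*A*x;
     problem.egrad = @(x) -2*A*x;           % Euclidean gradient
     problem.ehess = @(x, xdot) -2*A*xdot;  % Euclidean Hessian along xdot
     [x, xcost, info, options] = trustregions(problem);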

 The two outputs 'x' and 'cost' are the last reached point on the manifold
 and its cost. Notice that x is not necessarily the best reached point,
 because this solver is not forced to be a descent method. In particular,
 very close to convergence, it is sometimes preferable to accept very
 slight increases in the cost value (on the order of the machine epsilon)
 in the process of reaching fine convergence. Other than that, the cost
 function value does decrease monotonically with iterations.
 
 The output 'info' is a struct-array which contains information about the
 iterations:
   iter (integer)
       The (outer) iteration number, or number of steps considered
       (whether accepted or rejected). The initial guess is 0.
   cost (double)
       The corresponding cost value.
   gradnorm (double)
       The (Riemannian) norm of the gradient.
   time (double)
       The total elapsed time in seconds to reach the corresponding cost.
   rho (double)
       The performance ratio for the iterate.
   rhonum, rhoden (double)
       Regularized numerator and denominator of the performance ratio:
       rho = rhonum/rhoden. See options.rho_regularization.
   accepted (boolean)
       Whether the proposed iterate was accepted or not.
   stepsize (double)
       The (Riemannian) norm of the vector returned by the inner solver
       and which is retracted to obtain the proposed next iterate. If
       accepted = true for the corresponding iterate, this is the size of
       the step from the previous to the new iterate. If accepted is
       false, the step was not executed and this is the size of the
       rejected step.
   Delta (double)
       The trust-region radius at the outer iteration.
   limitedbyTR (boolean)
       true if the subproblemsolver was limited by the trust-region
       radius (a boundary solution was returned).
   And possibly additional information logged by the subproblemsolver or
   by options.statsfun.
 For example, type [info.gradnorm] to obtain a vector of the successive
 gradient norms reached at each (outer) iteration.
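
 For instance, after a run, the convergence of the gradient norm can be
 visualized against the outer iteration counter:

     semilogy([info.iter], [info.gradnorm], '.-');
     xlabel('Outer iteration number');
     ylabel('Norm of the gradient of f');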

 The options structure is used to overwrite the default values. All
 options have a default value and are hence optional. To force an option
 value, pass an options structure with a field options.optionname, where
 optionname is one of the following and the default value is indicated
 between parentheses:

   tolgradnorm (1e-6)
       The algorithm terminates if the norm of the gradient drops below
       this. For well-scaled problems, a rule of thumb is that you can
       expect to reduce the gradient norm by 8 orders of magnitude
       (sqrt(eps)) compared to the gradient norm at a "typical" point (a
       rough initial iterate for example). Further decrease is sometimes
       possible, but inexact floating point arithmetic limits the final
       accuracy. If tolgradnorm is set too low, the algorithm may end up
       iterating forever (or until another stopping criterion triggers).
   maxiter (1000)
       The algorithm terminates after at most maxiter (outer) iterations.
   maxtime (Inf)
       The algorithm terminates if maxtime seconds elapsed.
   miniter (0)
       Minimum number of outer iterations: this overrides all other
       stopping criteria. Can be helpful to escape saddle points.
   Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim()))
       Maximum trust-region radius. If you specify this parameter but not
       Delta0, then Delta0 is set to 1/8 times this parameter.
   Delta0 (Delta_bar/8)
       Initial trust-region radius. If you observe a long plateau at the
       beginning of the convergence plot (gradient norm vs iteration), it
       may pay off to try to tune this parameter to shorten the plateau.
       You should not set this parameter without setting Delta_bar too (at
       a larger value).
   subproblemsolver (@trs_tCG_cached)
       Function handle to a subproblem solver. The subproblem solver also
       sees this options structure, so that parameters can be passed to it
       through here as well. Built-in solvers include:
           trs_tCG_cached
           trs_tCG
           trs_gep
       Note that trs_gep solves the subproblem exactly which may be slow.
       It is included mainly for prototyping or for solving the subproblem
       exactly in low dimensional subspaces.
   rho_prime (0.1)
       Accept/reject threshold: if rho is at least rho_prime, the outer
       iteration is accepted. Otherwise, it is rejected. In case it is
       rejected, the trust-region radius will have been decreased.
       To ensure this, rho_prime must satisfy 0 <= rho_prime < 1/4.
       If rho_prime is negative, the algorithm is not guaranteed to
       produce monotonically decreasing cost values. It is strongly
       recommended to set rho_prime > 0, to aid convergence.
   rho_regularization (1e3)
       Close to convergence, evaluating the performance ratio rho is
       numerically challenging. Meanwhile, close to convergence, the
       quadratic model should be a good fit and the steps should be
       accepted. Regularization lets rho go to 1 as the model decrease and
       the actual decrease go to zero. Set this option to zero to disable
       regularization (not recommended). See in-code for the specifics.
       When this is not zero, it may happen that the iterates produced are
       not monotonically improving the cost when very close to
       convergence. This is because the corrected cost improvement could
       change sign if it is negative but very small.
   statsfun (none)
       Function handle to a function that is called after each iteration
       to provide the opportunity to log additional statistics.
       They are returned in the info struct. See the generic Manopt
       documentation about solvers for further information. statsfun is
       called with the point x that was reached last, after the
       accept/reject decision. See comment below.
   stopfun (none)
       Function handle to a function that is called at each iteration to
       provide the opportunity to specify additional stopping criteria.
       See the generic Manopt documentation about solvers for further
       information.
   verbosity (2)
       Integer number used to tune the amount of output the algorithm logs
       during execution (mostly as text in the command window).
       The higher, the more output. 0 means silent. 3 and above includes a
       display of the options structure at the beginning of the execution.
   debug (false)
       Set to true to allow the algorithm to perform additional
       computations for debugging purposes. If a debugging test fails, you
       will be informed of it, usually via the command window. Be aware
       that these additional computations appear in the algorithm timings
       too, and may interfere with operations such as counting the number
       of cost evaluations, etc. The debug calls get storedb too.
   storedepth (2)
       Maximum number of different points x of the manifold for which a
       store structure may be kept in memory in the storedb for caching.
       If memory usage is an issue, you may try to lower this number.
       Profiling or manopt counters may then help to investigate if a
       performance hit was incurred as a result.
   hook (none)
       A function handle which allows the user to change the current point
       x at the beginning of each iteration, before the stopping criterion
       is evaluated. See applyHook for help on how to use this option.
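
 To illustrate a few of the options above, here is a sketch of a call that
 tightens the gradient tolerance, caps the number of outer iterations,
 adjusts the trust-region radii and selects another subproblem solver (the
 specific values are arbitrary and problem dependent):

     options.tolgradnorm = 1e-8;
     options.maxiter = 500;
     options.Delta_bar = 10;                % Delta0 defaults to Delta_bar/8 ...
     options.Delta0 = 1;                    % ... unless set explicitly, as here
     options.verbosity = 2;
     options.subproblemsolver = @trs_tCG;   % instead of the default trs_tCG_cached
     [x, xcost, info] = trustregions(problem, [], options);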

 Notice that statsfun is called with the point x that was reached last,
 after the accept/reject decision. Hence: if the step was accepted, we get
 that new x, with a store which only saw the call for the cost and for the
 gradient. If the step was rejected, we get the same x as previously, with
 the store structure containing everything that was computed at that point
 (possibly including previous rejects at that same point). Hence, statsfun
 should not be used in conjunction with the store to count operations for
 example. Instead, you should use manopt counters: see statscounters.
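
 As an example of logging extra statistics, the sketch below records the
 distance from the current point to a reference point x_ref at each
 iteration (x_ref is a hypothetical variable assumed to exist; most
 manifold factories provide a dist function):

     options.statsfun = @(problem, x, stats) setfield(stats, ...
                               'dist_to_ref', problem.M.dist(x, x_ref));
     [x, xcost, info] = trustregions(problem, [], options);
     % Collect the logged values afterwards with [info.dist_to_ref].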


 Please cite the Manopt paper as well as the research paper:
     @Article{genrtr,
       Title    = {Trust-region methods on {Riemannian} manifolds},
       Author   = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.},
       Journal  = {Foundations of Computational Mathematics},
       Year     = {2007},
       Number   = {3},
       Pages    = {303--330},
       Volume   = {7},
       Doi      = {10.1007/s10208-005-0179-9}
     }

 See also: steepestdescent conjugategradient manopt/examples

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SUBFUNCTIONS

function stats = savestats(problem, x, storedb, key, options, k, fx, norm_grad, Delta, ticstart, trsstats, info, rho, rhonum, rhoden, accept, norm_eta, limitedbyTR)

SOURCE CODE

0001 function [x, cost, info, options] = trustregions(problem, x, options)
0002 % Riemannian trust-regions solver for optimization on manifolds.
0003 %
0004 % function [x, cost, info, options] = trustregions(problem)
0005 % function [x, cost, info, options] = trustregions(problem, x0)
0006 % function [x, cost, info, options] = trustregions(problem, x0, options)
0007 % function [x, cost, info, options] = trustregions(problem, [], options)
0008 %
0009 % This is the Riemannian Trust-Region solver for Manopt, named RTR.
0010 % This solver tries to minimize the cost function described in the problem
0011 % structure. It requires the availability of the cost function and of its
0012 % gradient. It issues calls for the Hessian.
0013 %
0014 % If no Hessian nor approximation for it is provided, an approximation of
0015 % Hessian-vector products is computed with finite differences of gradients.
0016 %
0017 % If no gradient is provided, an approximation of the gradient is computed,
0018 % but this can be slow for manifolds of high dimension.
0019 %
0020 % At each iteration a subproblem is solved using a trust-region subproblem
0021 % (TRS) solver. The default is @trs_tCG_cached. This solver, like some
0022 % others, uses the preconditioner if one is supplied.
0023 %
0024 % For a description of the algorithm and theorems offering convergence
0025 % guarantees, see the references below. Documentation for this solver is
0026 % available online, but may be outdated.
0027 % http://www.manopt.org/solver_documentation_trustregions.html
0028 %
0029 %
0030 % The initial iterate is x0 if it is provided. Otherwise, a random point on
0031 % the manifold is picked. To specify options whilst not specifying an
0032 % initial iterate, give x0 as [] (the empty matrix).
0033 %
0034 % The two outputs 'x' and 'cost' are the last reached point on the manifold
0035 % and its cost. Notice that x is not necessarily the best reached point,
0036 % because this solver is not forced to be a descent method. In particular,
0037 % very close to convergence, it is sometimes preferable to accept very
0038 % slight increases in the cost value (on the order of the machine epsilon)
0039 % in the process of reaching fine convergence. Other than that, the cost
0040 % function value does decrease monotonically with iterations.
0041 %
0042 % The output 'info' is a struct-array which contains information about the
0043 % iterations:
0044 %   iter (integer)
0045 %       The (outer) iteration number, or number of steps considered
0046 %       (whether accepted or rejected). The initial guess is 0.
0047 %   cost (double)
0048 %       The corresponding cost value.
0049 %   gradnorm (double)
0050 %       The (Riemannian) norm of the gradient.
0051 %   time (double)
0052 %       The total elapsed time in seconds to reach the corresponding cost.
0053 %   rho (double)
0054 %       The performance ratio for the iterate.
0055 %   rhonum, rhoden (double)
0056 %       Regularized numerator and denominator of the performance ratio:
0057 %       rho = rhonum/rhoden. See options.rho_regularization.
0058 %   accepted (boolean)
0059 %       Whether the proposed iterate was accepted or not.
0060 %   stepsize (double)
0061 %       The (Riemannian) norm of the vector returned by the inner solver
0062 %       and which is retracted to obtain the proposed next iterate. If
0063 %       accepted = true for the corresponding iterate, this is the size of
0064 %       the step from the previous to the new iterate. If accepted is
0065 %       false, the step was not executed and this is the size of the
0066 %       rejected step.
0067 %   Delta (double)
0068 %       The trust-region radius at the outer iteration.
0069 %   limitedbyTR (boolean)
0070 %       true if the subproblemsolver was limited by the trust-region
0071 %       radius (a boundary solution was returned).
0072 %   And possibly additional information logged by the subproblemsolver or
0073 %   by options.statsfun.
0074 % For example, type [info.gradnorm] to obtain a vector of the successive
0075 % gradient norms reached at each (outer) iteration.
0076 %
0077 % The options structure is used to overwrite the default values. All
0078 % options have a default value and are hence optional. To force an option
0079 % value, pass an options structure with a field options.optionname, where
0080 % optionname is one of the following and the default value is indicated
0081 % between parentheses:
0082 %
0083 %   tolgradnorm (1e-6)
0084 %       The algorithm terminates if the norm of the gradient drops below
0085 %       this. For well-scaled problems, a rule of thumb is that you can
0086 %       expect to reduce the gradient norm by 8 orders of magnitude
0087 %       (sqrt(eps)) compared to the gradient norm at a "typical" point (a
0088 %       rough initial iterate for example). Further decrease is sometimes
0089 %       possible, but inexact floating point arithmetic limits the final
0090 %       accuracy. If tolgradnorm is set too low, the algorithm may end up
0091 %       iterating forever (or until another stopping criterion triggers).
0092 %   maxiter (1000)
0093 %       The algorithm terminates after at most maxiter (outer) iterations.
0094 %   maxtime (Inf)
0095 %       The algorithm terminates if maxtime seconds elapsed.
0096 %   miniter (0)
0097 %       Minimum number of outer iterations: this overrides all other
0098 %       stopping criteria. Can be helpful to escape saddle points.
0099 %   Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim()))
0100 %       Maximum trust-region radius. If you specify this parameter but not
0101 %       Delta0, then Delta0 is set to 1/8 times this parameter.
0102 %   Delta0 (Delta_bar/8)
0103 %       Initial trust-region radius. If you observe a long plateau at the
0104 %       beginning of the convergence plot (gradient norm vs iteration), it
0105 %       may pay off to try to tune this parameter to shorten the plateau.
0106 %       You should not set this parameter without setting Delta_bar too (at
0107 %       a larger value).
0108 %   subproblemsolver (@trs_tCG_cached)
0109 %       Function handle to a subproblem solver. The subproblem solver also
0110 %       sees this options structure, so that parameters can be passed to it
0111 %       through here as well. Built-in solvers include:
0112 %           trs_tCG_cached
0113 %           trs_tCG
0114 %           trs_gep
0115 %       Note that trs_gep solves the subproblem exactly which may be slow.
0116 %       It is included mainly for prototyping or for solving the subproblem
0117 %       exactly in low dimensional subspaces.
0118 %   rho_prime (0.1)
0119 %       Accept/reject threshold: if rho is at least rho_prime, the outer
0120 %       iteration is accepted. Otherwise, it is rejected. In case it is
0121 %       rejected, the trust-region radius will have been decreased.
0122 %       To ensure this, rho_prime must satisfy 0 <= rho_prime < 1/4.
0123 %       If rho_prime is negative, the algorithm is not guaranteed to
0124 %       produce monotonically decreasing cost values. It is strongly
0125 %       recommended to set rho_prime > 0, to aid convergence.
0126 %   rho_regularization (1e3)
0127 %       Close to convergence, evaluating the performance ratio rho is
0128 %       numerically challenging. Meanwhile, close to convergence, the
0129 %       quadratic model should be a good fit and the steps should be
0130 %       accepted. Regularization lets rho go to 1 as the model decrease and
0131 %       the actual decrease go to zero. Set this option to zero to disable
0132 %       regularization (not recommended). See in-code for the specifics.
0133 %       When this is not zero, it may happen that the iterates produced are
0134 %       not monotonically improving the cost when very close to
0135 %       convergence. This is because the corrected cost improvement could
0136 %       change sign if it is negative but very small.
0137 %   statsfun (none)
0138 %       Function handle to a function that is called after each iteration
0139 %       to provide the opportunity to log additional statistics.
0140 %       They are returned in the info struct. See the generic Manopt
0141 %       documentation about solvers for further information. statsfun is
0142 %       called with the point x that was reached last, after the
0143 %       accept/reject decision. See comment below.
0144 %   stopfun (none)
0145 %       Function handle to a function that is called at each iteration to
0146 %       provide the opportunity to specify additional stopping criteria.
0147 %       See the generic Manopt documentation about solvers for further
0148 %       information.
0149 %   verbosity (2)
0150 %       Integer number used to tune the amount of output the algorithm logs
0151 %       during execution (mostly as text in the command window).
0152 %       The higher, the more output. 0 means silent. 3 and above includes a
0153 %       display of the options structure at the beginning of the execution.
0154 %   debug (false)
0155 %       Set to true to allow the algorithm to perform additional
0156 %       computations for debugging purposes. If a debugging test fails, you
0157 %       will be informed of it, usually via the command window. Be aware
0158 %       that these additional computations appear in the algorithm timings
0159 %       too, and may interfere with operations such as counting the number
0160 %       of cost evaluations, etc. The debug calls get storedb too.
0161 %   storedepth (2)
0162 %       Maximum number of different points x of the manifold for which a
0163 %       store structure may be kept in memory in the storedb for caching.
0164 %       If memory usage is an issue, you may try to lower this number.
0165 %       Profiling or manopt counters may then help to investigate if a
0166 %       performance hit was incurred as a result.
0167 %   hook (none)
0168 %       A function handle which allows the user to change the current point
0169 %       x at the beginning of each iteration, before the stopping criterion
0170 %       is evaluated. See applyHook for help on how to use this option.
0171 %
0172 % Notice that statsfun is called with the point x that was reached last,
0173 % after the accept/reject decision. Hence: if the step was accepted, we get
0174 % that new x, with a store which only saw the call for the cost and for the
0175 % gradient. If the step was rejected, we get the same x as previously, with
0176 % the store structure containing everything that was computed at that point
0177 % (possibly including previous rejects at that same point). Hence, statsfun
0178 % should not be used in conjunction with the store to count operations for
0179 % example. Instead, you should use manopt counters: see statscounters.
0180 %
0181 %
0182 % Please cite the Manopt paper as well as the research paper:
0183 %     @Article{genrtr,
0184 %       Title    = {Trust-region methods on {Riemannian} manifolds},
0185 %       Author   = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.},
0186 %       Journal  = {Foundations of Computational Mathematics},
0187 %       Year     = {2007},
0188 %       Number   = {3},
0189 %       Pages    = {303--330},
0190 %       Volume   = {7},
0191 %       Doi      = {10.1007/s10208-005-0179-9}
0192 %     }
0193 %
0194 % See also: steepestdescent conjugategradient manopt/examples
0195 
0196 % An explicit, general listing of this algorithm, with preconditioning,
0197 % can be found in the following paper:
0198 %     @Article{boumal2015lowrank,
0199 %       Title   = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
0200 %       Author  = {Boumal, N. and Absil, P.-A.},
0201 %       Journal = {Linear Algebra and its Applications},
0202 %       Year    = {2015},
0203 %       Pages   = {200--239},
0204 %       Volume  = {475},
0205 %       Doi     = {10.1016/j.laa.2015.02.027},
0206 %     }
0207 
0208 % When the Hessian is not specified, it is approximated with
0209 % finite-differences of the gradient. The resulting method is called
0210 % RTR-FD. Some convergence theory for it is available in this paper:
0211 % @incollection{boumal2015rtrfd
0212 %    author={Boumal, N.},
0213 %    title={Riemannian trust regions with finite-difference Hessian approximations are globally convergent},
0214 %    year={2015},
0215 %    booktitle={Geometric Science of Information}
0216 % }
0217 
0218 
0219 % This file is part of Manopt: www.manopt.org.
0220 % This code is an adaptation to Manopt of the original GenRTR code:
0221 % RTR - Riemannian Trust-Region
0222 % (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
0223 % Florida State University
0224 % School of Computational Science
0225 % (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
0226 % See accompanying license file.
0227 % The adaptation was executed by Nicolas Boumal.
0228 %
0229 %
0230 % Change log:
0231 %
0232 %   NB April 3, 2013:
0233 %       tCG now returns the Hessian along the returned direction eta, so
0234 %       that we do not compute that Hessian redundantly: some savings at
0235 %       each iteration. Similarly, if the useRand flag is on, we spare an
0236 %       extra Hessian computation at each outer iteration too, owing to
0237 %       some modifications in the Cauchy point section of the code specific
0238 %       to useRand = true.
0239 %
0240 %   NB Aug. 22, 2013:
0241 %       This function is now Octave compatible. The transition called for
0242 %       two changes which would otherwise not be advisable. (1) tic/toc is
0243 %       now used as is, as opposed to the safer way:
0244 %       t = tic(); elapsed = toc(t);
0245 %       And (2), the (formerly inner) function savestats was moved outside
0246 %       the main function to not be nested anymore. This is arguably less
0247 %       elegant, but Octave does not (and likely will not) support nested
0248 %       functions.
0249 %
0250 %   NB Dec. 2, 2013:
0251 %       The in-code documentation was largely revised and expanded.
0252 %
0253 %   NB Dec. 2, 2013:
0254 %       The former heuristic which triggered when rhonum was very small and
0255 %       forced rho = 1 has been replaced by a smoother heuristic which
0256 %       consists in regularizing rhonum and rhoden before computing their
0257 %       ratio. It is tunable via options.rho_regularization. Furthermore,
0258 %       the solver now detects if tCG did not obtain a model decrease
0259 %       (which is theoretically impossible but may happen because of
0260 %       numerical errors and/or because of a nonlinear/nonsymmetric Hessian
0261 %       operator, which is the case for finite difference approximations).
0262 %       When such an anomaly is detected, the step is rejected and the
0263 %       trust region radius is decreased.
0264 %       Feb. 18, 2015 note: this is less useful now, as tCG now guarantees
0265 %       model decrease even for the finite difference approximation of the
0266 %       Hessian. It is still useful in case of numerical errors, but this
0267 %       is less stringent.
0268 %
0269 %   NB Dec. 3, 2013:
0270 %       The stepsize is now registered at each iteration, at a small
0271 %       additional cost. The defaults for Delta_bar and Delta0 are better
0272 %       defined. Setting Delta_bar in the options will automatically set
0273 %       Delta0 accordingly. In Manopt 1.0.4, the defaults for these options
0274 %       were not treated appropriately because of an incorrect use of the
0275 %       isfield() built-in function.
0276 %
0277 %   NB Feb. 18, 2015:
0278 %       Added some comments. Also, Octave now supports safe tic/toc usage,
0279 %       so we reverted the changes to use that again (see Aug. 22, 2013 log
0280 %       entry).
0281 %
0282 %   NB April 3, 2015:
0283 %       Works with the new StoreDB class system.
0284 %
0285 %   NB April 8, 2015:
0286 %       No Hessian warning if approximate Hessian explicitly available.
0287 %
0288 %   NB Nov. 1, 2016:
0289 %       Now uses approximate gradient via finite differences if need be.
0290 %
0291 %   NB Aug. 2, 2018:
0292 %       Using storedb.remove() to keep the cache lean, which allowed to
0293 %       reduce storedepth to 2 from 20 (by default).
0294 %
0295 %   NB July 19, 2020:
0296 %       Added support for options.hook.
0297 %
0298 %   VL Aug. 17, 2022:
0299 %       Refactored code to use various subproblem solvers with a new input
0300 %       output pattern. Modified how information about iterations is
0301 % printed to accommodate new subproblem solvers. Moved all useRand and
0302 %       cauchy logic to trs_tCG. Options pertaining to tCG are still
0303 %       available but have moved to that file. Made trs_tCG_cached default.
0304 
0305 
0306 % Verify that the problem description is sufficient for the solver.
0307 
0308 if ~canGetCost(problem)
0309     warning('manopt:getCost', ...
0310             'No cost provided. The algorithm will likely abort.');  
0311 end
0312 if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
0313     % Note: we do not give a warning if an approximate gradient is
0314     % explicitly given in the problem description, as in that case the user
0315     % seems to be aware of the issue.
0316     warning('manopt:getGradient:approx', ...
0317            ['No gradient provided. Using FD approximation (slow).\n' ...
0318             'It may be necessary to increase options.tolgradnorm.\n' ...
0319             'To disable this warning: ' ...
0320             'warning(''off'', ''manopt:getGradient:approx'')']);
0321     problem.approxgrad = approxgradientFD(problem);
0322 end
0323 if ~canGetHessian(problem) && ~canGetApproxHessian(problem)
0324     % Note: we do not give a warning if an approximate Hessian is
0325     % explicitly given in the problem description, as in that case the user
0326     % seems to be aware of the issue.
0327     warning('manopt:getHessian:approx', ...
0328            ['No Hessian provided. Using FD approximation.\n' ...
0329             'To disable this warning: ' ...
0330             'warning(''off'', ''manopt:getHessian:approx'')']);
0331     problem.approxhess = approxhessianFD(problem);
0332 end
0333 
0334 % Set local defaults here
0335 localdefaults.verbosity = 2;
0336 localdefaults.maxtime = inf;
0337 localdefaults.miniter = 0;
0338 localdefaults.maxiter = 1000;
0339 localdefaults.rho_prime = 0.1;
0340 localdefaults.rho_regularization = 1e3;
0341 localdefaults.subproblemsolver = @trs_tCG_cached;
0342 localdefaults.tolgradnorm = 1e-6;
0343 
0344 % Merge global and local defaults, then merge w/ user options, if any.
0345 localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
0346 if ~exist('options', 'var') || isempty(options)
0347     options = struct();
0348 end
0349 options = mergeOptions(localdefaults, options);
0350 
0351 M = problem.M;
0352 
0353 % If no initial point x is given by the user, generate one at random.
0354 if ~exist('x', 'var') || isempty(x)
0355     x = M.rand();
0356 end
0357 
0358 % Set default Delta_bar and Delta0 separately to deal with additional
0359 % logic: if Delta_bar is provided but not Delta0, let Delta0 automatically
0360 % be some fraction of the provided Delta_bar.
0361 if ~isfield(options, 'Delta_bar')
0362     if isfield(M, 'typicaldist')
0363         options.Delta_bar = M.typicaldist();
0364     else
0365         options.Delta_bar = sqrt(M.dim());
0366     end 
0367 end
0368 if ~isfield(options, 'Delta0')
0369     options.Delta0 = options.Delta_bar / 8;
0370 end
0371 
0372 % Check some option values
0373 assert(options.rho_prime < 1/4, ...
0374         'options.rho_prime must be strictly smaller than 1/4.');
0375 assert(options.Delta_bar > 0, ...
0376         'options.Delta_bar must be positive.');
0377 assert(options.Delta0 > 0 && options.Delta0 <= options.Delta_bar, ...
0378         'options.Delta0 must be positive and smaller than Delta_bar.');
0379 
0380 % It is sometimes useful to check what the actual option values are.
0381 if options.verbosity >= 3
0382     disp(options);
0383 end
0384 
0385 % Create a store database and get a key for the current x
0386 storedb = StoreDB(options.storedepth);
0387 key = storedb.getNewKey();
0388 
0389 ticstart = tic();
0390 
0391 %% Initializations
0392 
0393 % k counts the outer (TR) iterations. The semantic is that k counts the
0394 % number of iterations fully executed so far.
0395 k = 0;
0396 
0397 % accept tracks if the proposed step is accepted (true) or declined (false)
0398 accept = true;
0399 
0400 % Initialize solution and companion measures: f(x), fgrad(x)
0401 [fx, fgradx] = getCostGrad(problem, x, storedb, key);
0402 norm_grad = M.norm(x, fgradx);
0403 
0404 % Initialize trust-region radius
0405 Delta = options.Delta0;
0406 
0407 % Depending on the subproblem solver, different kinds of statistics are
0408 % logged and displayed. This initial call to the solver tells us ahead of
0409 % time what to write in the column headers for displayed information, and
0410 % how to initialize the info struct-array.
0411 trsinfo = options.subproblemsolver([], [], options);
0412 
0413 % printheader is a string that contains the header for the subproblem
0414 % solver's printed output.
0415 printheader = trsinfo.printheader;
0416 
0417 % initstats is a struct of initial values for the stats that the subproblem
0418 % solver wishes to store.
0419 initstats = trsinfo.initstats;
0420 
0421 stats = savestats(problem, x, storedb, key, options, k, fx, norm_grad, ...
0422                                          Delta, ticstart, initstats);
0423 
0424 info(1) = stats;
0425 info(min(10000, options.maxiter+1)).iter = [];
0426 
0427 % Display headers, depending on verbosity level, then also the initial row.
0428 if options.verbosity == 2
0429     fprintf(['%3s %3s    iter   ', ...
0430              '%15scost val   %2sgrad. norm   %s\n'], ...
0431              '   ', '   ', '        ', '  ', printheader);
0432     fprintf(['%3s %3s   %5d   ', ...
0433              '%+.16e   %12e\n'], ...
0434              '   ', '   ', k, fx, norm_grad);
0435 elseif options.verbosity > 2
0436     fprintf(['%3s %3s    iter   ', ...
0437              '%15scost val   %2sgrad. norm   %10srho   %4srho_noreg   ' ...
0438              '%7sDelta   %s\n'], ...
0439              '   ', '   ', '        ', '  ', '         ', '   ', ...
0440              '       ', printheader);
0441     fprintf(['%3s %3s   %5d   ', ...
0442              '%+.16e   %12e\n'], ...
0443              '   ','   ',k, fx, norm_grad);
0444 end
0445 
0446 % To keep track of consecutive radius changes, so that we can warn the
0447 % user if it appears necessary.
0448 consecutive_TRplus = 0;
0449 consecutive_TRminus = 0;
0450 
0451 
0452 % **********************
0453 % ** Start of TR loop **
0454 % **********************
0455 while true
0456     
0457     % Start clock for this outer iteration
0458     ticstart = tic();
0459 
0460     % Apply the hook function if there is one: this allows external code to
0461     % move x to another point. If the point is changed (indicated by a true
0462     % value for the boolean 'hooked'), we update our knowledge about x.
0463     [x, key, info, hooked] = applyHook(problem, x, storedb, key, ...
0464                                                        options, info, k+1);
0465     if hooked
0466         [fx, fgradx] = getCostGrad(problem, x, storedb, key);
0467         norm_grad = M.norm(x, fgradx);
0468     end
0469     
0470     % Run standard stopping criterion checks
0471     [stop, reason] = stoppingcriterion(problem, x, options, info, k+1);
0472     
0473     % Ensure trustregions runs at least options.miniter iterations
0474     if k < options.miniter
0475         stop = 0;
0476     end
0477     
0478     if stop
0479         if options.verbosity >= 1
0480             fprintf([reason '\n']);
0481         end
0482         break;
0483     end
0484 
0485     if options.debug > 0
0486         fprintf([repmat('*', 1, 98) '\n']);
0487     end
0488 
0489     % *************************
0490     % ** Begin TR Subproblem **
0491     % *************************
0492   
0493     % Solve TR subproblem with solver specified by options.subproblemsolver
0494     trsinput = struct('x', x, 'fgradx', fgradx, 'Delta', Delta, ...
0495                       'accept', accept);
0496 
0497     trsoutput = options.subproblemsolver(problem, trsinput, options, ...
0498                                          storedb, key);
0499     
0500     eta = trsoutput.eta;
0501     Heta = trsoutput.Heta;
0502     limitedbyTR = trsoutput.limitedbyTR;
0503     trsprintstr = trsoutput.printstr;
0504     trsstats = trsoutput.stats;
0505 
0506         
0507     % This is computed for logging purposes and may be useful for some
0508     % user-defined stopping criteria.
0509     norm_eta = M.norm(x, eta);
0510     
0511     if options.debug > 0
0512         testangle = M.inner(x, eta, fgradx) / (norm_eta*norm_grad);
0513     end
0514     
0515 
0516     % Compute the tentative next iterate (the proposal)
0517     x_prop = M.retr(x, eta);
0518     key_prop = storedb.getNewKey();
0519 
0520     % Compute the function value of the proposal
0521     fx_prop = getCost(problem, x_prop, storedb, key_prop);
0522 
0523     % Will we accept the proposal or not?
0524     % Check the performance of the quadratic model against the actual cost.
0525     rhonum = fx - fx_prop;
0526     vecrho = M.lincomb(x, 1, fgradx, .5, Heta);
0527     rhoden = -M.inner(x, eta, vecrho);
0528     rho_noreg = rhonum/rhoden;
0529     % rhonum could be anything.
0530     % rhoden should be nonnegative, as guaranteed by tCG, barring
0531     % numerical errors.
0532     
0533     % Heuristic -- added Dec. 2, 2013 (NB) to replace the former heuristic.
0534     % This heuristic is documented in the book by Conn Gould and Toint on
0535     % trust-region methods, section 17.4.2.
0536     % rhonum measures the difference between two numbers. Close to
0537     % convergence, these two numbers are very close to each other, so
0538     % that computing their difference is numerically challenging: there may
0539     % be a significant loss in accuracy. Since the acceptance or rejection
0540     % of the step is conditioned on the ratio between rhonum and rhoden,
0541     % large errors in rhonum result in a very large error in rho, hence in
0542     % erratic acceptance / rejection. Meanwhile, close to convergence,
0543     % steps are usually trustworthy and we should transition to a Newton-
0544     % like method, with rho=1 consistently. The heuristic thus shifts both
0545     % rhonum and rhoden by a small amount such that far from convergence,
0546     % the shift is irrelevant and close to convergence, the ratio rho goes
0547     % to 1, effectively promoting acceptance of the step.
0548     % The rationale is that close to convergence, both rhonum and rhoden
0549     % are quadratic in the distance between x and x_prop. Thus, when this
0550     % distance is on the order of sqrt(eps), the value of rhonum and rhoden
0551     % is on the order of eps, which is indistinguishable from the numerical
0552     % error, resulting in badly estimated rho's.
0553     % For abs(fx) < 1, this heuristic is invariant under offsets of f but
0554     % not under scaling of f. For abs(fx) > 1, the opposite holds. This
0555     % should not alarm us, as this heuristic only triggers at the very last
0556     % iterations if very fine convergence is demanded.
0557     rho_reg_offset = max(1, abs(fx)) * eps * options.rho_regularization;
0558     rhonum = rhonum + rho_reg_offset;
0559     rhoden = rhoden + rho_reg_offset;
0560    
0561     if options.debug > 0
0562         fprintf('DBG:     rhonum : %e\n', rhonum);
0563         fprintf('DBG:     rhoden : %e\n', rhoden);
0564     end
0565     
0566     % This is always true if a linear, symmetric operator is used for the
0567     % Hessian (approximation) and if we had infinite numerical precision.
0568     % In practice, nonlinear approximations of the Hessian such as the
0569     % built-in finite difference approximation and finite numerical
0570     % accuracy can cause the model to increase. In such scenarios, we
0571     % decide to force a rejection of the step and a reduction of the
0572     % trust-region radius. We test the sign of the regularized rhoden since
0573     % the regularization is supposed to capture the accuracy to which
0574     % rhoden is computed: if rhoden were negative before regularization but
0575     % not after, that should not be (and is not) detected as a failure.
0576     %
0577     % Note (Feb. 17, 2015, NB): the most recent version of trs_tCG already
0578     % includes a mechanism to ensure model decrease if the Cauchy step
0579     % attained a decrease (which is theoretically the case under very lax
0580     % assumptions). This being said, it is always possible that numerical
0581     % errors will prevent this, so that it is good to keep a safeguard.
0582     %
0583     % The current strategy is that, if this should happen, then we reject
0584     % the step and reduce the trust region radius. This also ensures that
0585     % the actual cost values are monotonically decreasing.
0586     %
0587     % [This bit of code seems not to trigger since trs_tCG already ensures
0588     %  the model decreases even in the presence of non-linearities; but as
0589     %  a result the radius is not necessarily decreased. Perhaps we should
0590     %  change this with the proposed commented line below; needs testing.]
0591     %
0592     model_decreased = (rhoden >= 0);
0593     
0594     if ~model_decreased
0595         trsprintstr = [trsprintstr ', model did not decrease']; %#ok<AGROW>
0596     end
0597     rho = rhonum / rhoden;
0598     
0599     % Added June 30, 2015 following observation by BM.
0600     % With this modification, it is guaranteed that a step rejection is
0601     % always accompanied by a TR reduction. This prevents stagnation in
0602     % this "corner case" (NaN's really aren't supposed to occur, but it's
0603     % nice if we can handle them nonetheless).
0604     if isnan(rho)
0605         fprintf(['rho is NaN! Forcing a radius decrease. ' ...
0606                  'This should not happen.\n']);
0607         if isnan(fx_prop)
0608             fprintf(['The cost function returned NaN (perhaps the ' ...
0609                      'retraction returned a bad point?)\n']);
0610         else
0611             fprintf('The cost function did not return a NaN value.\n');
0612         end
0613     end
0614    
0615     if options.debug > 0
0616         m = @(x, eta) ...
0617           getCost(problem, x, storedb, key) + ...
0618           getDirectionalDerivative(problem, x, eta, storedb, key) + ...
0619              .5*M.inner(x, getHessian(problem, x, eta, storedb, key), eta);
0620         zerovec = M.zerovec(x);
0621         actrho = (fx - fx_prop) / (m(x, zerovec) - m(x, eta));
0622         fprintf('DBG:   new f(x) : %+e\n', fx_prop);
0623         fprintf('DBG: actual rho : %e\n', actrho);
0624         fprintf('DBG:   used rho : %e\n', rho);
0625     end
0626 
0627     % Choose the new TR radius based on the model performance
0628     trstr = '   ';
0629     % If the actual decrease is smaller than 1/4 of the predicted decrease,
0630     % then reduce the TR radius.
0631     if rho < 1/4 || ~model_decreased || isnan(rho)
0632         trstr = 'TR-';
0633         Delta = Delta/4;
0634         consecutive_TRplus = 0;
0635         consecutive_TRminus = consecutive_TRminus + 1;
0636         if consecutive_TRminus >= 5 && options.verbosity >= 2
0637             consecutive_TRminus = -inf;
0638             fprintf([' +++ Detected many consecutive TR- (radius ' ...
0639                      'decreases).\n' ...
0640                      ' +++ Consider dividing options.Delta_bar by 10.\n' ...
0641                      ' +++ Current values: options.Delta_bar = %g and ' ...
0642                      'options.Delta0 = %g.\n'], options.Delta_bar, ...
0643                                                 options.Delta0);
0644         end
0645     % If the actual decrease is at least 3/4 of the predicted decrease and
0646     % the trs_tCG (inner solve) hit the TR boundary, increase TR radius.
0647     % We also keep track of the number of consecutive trust-region radius
0648     % increases. If there are many, this may indicate the need to adapt the
0649     % initial and maximum radii.
0650     elseif rho > 3/4 && limitedbyTR
0651         trstr = 'TR+';
0652         Delta = min(2*Delta, options.Delta_bar);
0653         consecutive_TRminus = 0;
0654         consecutive_TRplus = consecutive_TRplus + 1;
0655         if consecutive_TRplus >= 5 && options.verbosity >= 1
0656             consecutive_TRplus = -inf;
0657             fprintf([' +++ Detected many consecutive TR+ (radius ' ...
0658                      'increases).\n' ...
0659                      ' +++ Consider multiplying options.Delta_bar by 10.\n' ...
0660                      ' +++ Current values: options.Delta_bar = %g and ' ...
0661                      'options.Delta0 = %g.\n'], options.Delta_bar, ...
0662                                                 options.Delta0);
0663         end
0664     else
0665         % Otherwise, keep the TR radius constant.
0666         consecutive_TRplus = 0;
0667         consecutive_TRminus = 0;
0668     end
0669 
0670     % Choose to accept or reject the proposed step based on the model
0671     % performance. Note the strict inequality.
0672     if model_decreased && rho > options.rho_prime
0673         
0674         % April 17, 2018: a side effect of rho_regularization > 0 is that
0675         % it can happen that the cost function appears to go up (although
0676         % only by a small amount) for some accepted steps. We decide to
0677         % accept this because, numerically, computing the difference
0678         % between fx_prop and fx is more difficult than computing the
0679         % improvement in the model, because fx_prop and fx are on the same
0680         % order of magnitude yet are separated by a very small gap near
0681         % convergence, whereas the model improvement is computed as a sum
0682         % of two small terms. As a result, the step which seems bad may
0683         % turn out to be good, in that it may help reduce the gradient norm
0684         % for example. This update merely informs the user of this event.
0685         % In further updates, we could also introduce this as a stopping
0686         % criterion. It is then important to choose wisely which of x or
0687         % x_prop should be returned (perhaps the one with smallest
0688         % gradient?)
0689         if fx_prop > fx && options.verbosity >= 2
0690             fprintf(['Between line above and below, cost function ' ...
0691                      'increased by %.2g (step size: %.2g)\n'], ...
0692                      fx_prop - fx, norm_eta);
0693         end
0694         
0695         accept = true;
0696         accstr = 'acc';
0697         % We accept the step: no need to keep the old cache.
0698         storedb.removefirstifdifferent(key, key_prop);
0699         x = x_prop;
0700         key = key_prop;
0701         fx = fx_prop;
0702         fgradx = getGradient(problem, x, storedb, key);
0703         norm_grad = M.norm(x, fgradx);
0704     else
0705         % We reject the step: no need to keep cache related to the
0706         % tentative step.
0707         storedb.removefirstifdifferent(key_prop, key);
0708         accept = false;
0709         accstr = 'REJ';
0710     end
0711     
0712     % k is the number of iterations we have accomplished.
0713     k = k + 1;
0714     
0715     % Make sure we don't use too much memory for the store database.
0716     storedb.purge();
0717 
0718     % Log statistics for freshly executed iteration.
0719     % Everything after this in the loop is not accounted for in the timing.
0720     stats = savestats(problem, x, storedb, key, options, k, fx, ...
0721                       norm_grad, Delta, ticstart, trsstats, ...
0722                       info, rho, rhonum, rhoden, accept, norm_eta, ...
0723                       limitedbyTR);
0724     info(k+1) = stats;
0725 
0726     % Display
0727     if options.verbosity == 2
0728         fprintf('%3s %3s   %5d   %+.16e   %12e   %s\n', ...
0729                 accstr, trstr, k, fx, norm_grad, trsprintstr);
0730     elseif options.verbosity > 2
0731         fprintf(['%3s %3s   %5d   %+.16e   %.6e   %+.6e   ' ...
0732                  '%+.6e   %.6e   %s\n'], ...
0733                 accstr, trstr, k, fx, norm_grad, rho, ...
0734                 rho_noreg, Delta, trsprintstr);
0735         if options.debug > 0
0736             fprintf('      Delta : %f          |eta| : %e\n', ...
0737                     Delta, norm_eta);
0738         end
0739     end
0740     if options.debug > 0
0741         fprintf('DBG: cos ang(eta, gradf): %d\n', testangle);
0742         if rho == 0
0743             fprintf('DBG: rho = 0: likely to hinder convergence.\n');
0744         end
0745     end
0746 
0747 end  % of TR loop (counter: k)
0748 
0749 % Restrict info struct-array to useful part
0750 info = info(1:k+1);
0751 
0752 
0753 if options.debug > 0
0754    fprintf([repmat('*', 1, 98) '\n']);
0755 end
0756 if options.verbosity > 0
0757     fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
0758 end
0759 
0760 % Return the best cost reached
0761 cost = fx;
0762 
0763 end
0764 
0765 
0766 
0767     
0768 
0769 % Routine in charge of collecting the current iteration stats
0770 function stats = savestats(problem, x, storedb, key, options, k, fx, ...
0771                            norm_grad, Delta, ticstart, trsstats, info, rho, ...
0772                            rhonum, rhoden, accept, norm_eta, limitedbyTR)
0773     stats.iter = k;
0774     stats.cost = fx;
0775     stats.gradnorm = norm_grad;
0776     stats.Delta = Delta;
0777     if k == 0
0778         stats.time = toc(ticstart);
0779         stats.rho = inf;
0780         stats.rhonum = NaN;
0781         stats.rhoden = NaN;
0782         stats.accepted = true;
0783         stats.stepsize = NaN;
0784         stats.limitedbyTR = false;
0785         fields = fieldnames(trsstats);
0786         for i = 1 : length(fields)
0787             stats.(fields{i}) = trsstats.(fields{i});
0788         end
0789     else
0790         stats.time = info(k).time + toc(ticstart);
0791         stats.rho = rho;
0792         stats.rhonum = rhonum;
0793         stats.rhoden = rhoden;
0794         stats.accepted = accept;
0795         stats.stepsize = norm_eta;
0796         stats.limitedbyTR = limitedbyTR;
0797         fields = fieldnames(trsstats);
0798         for i = 1 : length(fields)
0799             stats.(fields{i}) = trsstats.(fields{i});
0800         end
0801     end
0802     
0803     % See comment about statsfun above: the x and store passed to statsfun
0804     % are that of the most recently accepted point after the iteration
0805     % fully executed.
0806     stats = applyStatsfun(problem, x, storedb, key, options, stats);
0807     
0808 end

Generated on Fri 30-Sep-2022 13:18:25 by m2html © 2005