Riemannian trust-regions solver for optimization on manifolds. function [x, cost, info, options] = trustregions(problem) function [x, cost, info, options] = trustregions(problem, x0) function [x, cost, info, options] = trustregions(problem, x0, options) function [x, cost, info, options] = trustregions(problem, [], options) This is the Riemannian Trust-Region solver for Manopt, named RTR. This solver tries to minimize the cost function described in the problem structure. It requires the availability of the cost function and of its gradient. It issues calls for the Hessian. If no Hessian nor approximation for it is provided, an approximation of Hessian-vector products is computed with finite differences of gradients. If no gradient is provided, an approximation of the gradient is computed, but this can be slow for manifolds of high dimension. At each iteration a subproblem is solved using a trust-region subproblem (TRS) solver. The default is @trs_tCG_cached. This one (and some others) use the preconditioner if one is supplied. For a description of the algorithm and theorems offering convergence guarantees, see the references below. Documentation for this solver is available online, but may be outdated. http://www.manopt.org/solver_documentation_trustregions.html The initial iterate is x0 if it is provided. Otherwise, a random point on the manifold is picked. To specify options whilst not specifying an initial iterate, give x0 as [] (the empty matrix). The two outputs 'x' and 'cost' are the last reached point on the manifold and its cost. Notice that x is not necessarily the best reached point, because this solver is not forced to be a descent method. In particular, very close to convergence, it is sometimes preferable to accept very slight increases in the cost value (on the order of the machine epsilon) in the process of reaching fine convergence. Other than that, the cost function value does decrease monotonically with iterations. The output 'info' is a struct-array which contains information about the iterations: iter (integer) The (outer) iteration number, or number of steps considered (whether accepted or rejected). The initial guess is 0. cost (double) The corresponding cost value. gradnorm (double) The (Riemannian) norm of the gradient. time (double) The total elapsed time in seconds to reach the corresponding cost. rho (double) The performance ratio for the iterate. rhonum, rhoden (double) Regularized numerator and denominator of the performance ratio: rho = rhonum/rhoden. See options.rho_regularization. accepted (boolean) Whether the proposed iterate was accepted or not. stepsize (double) The (Riemannian) norm of the vector returned by the inner solver and which is retracted to obtain the proposed next iterate. If accepted = true for the corresponding iterate, this is the size of the step from the previous to the new iterate. If accepted is false, the step was not executed and this is the size of the rejected step. Delta (double) The trust-region radius at the outer iteration. limitedbyTR (boolean) true if the subproblemsolver was limited by the trust-region radius (a boundary solution was returned). And possibly additional information logged by the subproblemsolver or by options.statsfun. For example, type [info.gradnorm] to obtain a vector of the successive gradient norms reached at each (outer) iteration. The options structure is used to overwrite the default values. All options have a default value and are hence optional. To force an option value, pass an options structure with a field options.optionname, where optionname is one of the following and the default value is indicated between parentheses: tolgradnorm (1e-6) The algorithm terminates if the norm of the gradient drops below this. For well-scaled problems, a rule of thumb is that you can expect to reduce the gradient norm by 8 orders of magnitude (sqrt(eps)) compared to the gradient norm at a "typical" point (a rough initial iterate for example). Further decrease is sometimes possible, but inexact floating point arithmetic limits the final accuracy. If tolgradnorm is set too low, the algorithm may end up iterating forever (or until another stopping criterion triggers). maxiter (1000) The algorithm terminates after at most maxiter (outer) iterations. maxtime (Inf) The algorithm terminates if maxtime seconds elapsed. miniter (0) Minimum number of outer iterations: this overrides all other stopping criteria. Can be helpful to escape saddle points. Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim())) Maximum trust-region radius. If you specify this parameter but not Delta0, then Delta0 is set to 1/8 times this parameter. Delta0 (Delta_bar/8) Initial trust-region radius. If you observe a long plateau at the beginning of the convergence plot (gradient norm vs iteration), it may pay off to try to tune this parameter to shorten the plateau. You should not set this parameter without setting Delta_bar too (at a larger value). subproblemsolver (@trs_tCG_cached) Function handle to a subproblem solver. The subproblem solver also sees this options structure, so that parameters can be passed to it through here as well. Built-in solvers include: trs_tCG_cached trs_tCG trs_gep Note that trs_gep solves the subproblem exactly which may be slow. It is included mainly for prototyping or for solving the subproblem exactly in low dimensional subspaces. rho_prime (0.1) Accept/reject threshold : if rho is at least rho_prime, the outer iteration is accepted. Otherwise, it is rejected. In case it is rejected, the trust-region radius will have been decreased. To ensure this, rho_prime >= 0 must be strictly smaller than 1/4. If rho_prime is negative, the algorithm is not guaranteed to produce monotonically decreasing cost values. It is strongly recommended to set rho_prime > 0, to aid convergence. rho_regularization (1e3) Close to convergence, evaluating the performance ratio rho is numerically challenging. Meanwhile, close to convergence, the quadratic model should be a good fit and the steps should be accepted. Regularization lets rho go to 1 as the model decrease and the actual decrease go to zero. Set this option to zero to disable regularization (not recommended). See in-code for the specifics. When this is not zero, it may happen that the iterates produced are not monotonically improving the cost when very close to convergence. This is because the corrected cost improvement could change sign if it is negative but very small. statsfun (none) Function handle to a function that is called after each iteration to provide the opportunity to log additional statistics. They are returned in the info struct. See the generic Manopt documentation about solvers for further information. statsfun is called with the point x that was reached last, after the accept/reject decision. See comment below. stopfun (none) Function handle to a function that is called at each iteration to provide the opportunity to specify additional stopping criteria. See the generic Manopt documentation about solvers for further information. verbosity (2) Integer number used to tune the amount of output the algorithm logs during execution (mostly as text in the command window). The higher, the more output. 0 means silent. 3 and above includes a display of the options structure at the beginning of the execution. debug (false) Set to true to allow the algorithm to perform additional computations for debugging purposes. If a debugging test fails, you will be informed of it, usually via the command window. Be aware that these additional computations appear in the algorithm timings too, and may interfere with operations such as counting the number of cost evaluations, etc. The debug calls get storedb too. storedepth (2) Maximum number of different points x of the manifold for which a store structure may be kept in memory in the storedb for caching. If memory usage is an issue, you may try to lower this number. Profiling or manopt counters may then help to investigate if a performance hit was incurred as a result. hook (none) A function handle which allows the user to change the current point x at the beginning of each iteration, before the stopping criterion is evaluated. See applyHook for help on how to use this option. Notice that statsfun is called with the point x that was reached last, after the accept/reject decision. Hence: if the step was accepted, we get that new x, with a store which only saw the call for the cost and for the gradient. If the step was rejected, we get the same x as previously, with the store structure containing everything that was computed at that point (possibly including previous rejects at that same point). Hence, statsfun should not be used in conjunction with the store to count operations for example. Instead, you should use manopt counters: see statscounters. Please cite the Manopt paper as well as the research paper: @Article{genrtr, Title = {Trust-region methods on {Riemannian} manifolds}, Author = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.}, Journal = {Foundations of Computational Mathematics}, Year = {2007}, Number = {3}, Pages = {303--330}, Volume = {7}, Doi = {10.1007/s10208-005-0179-9} } See also: steepestdescent conjugategradient manopt/examples
0001 function [x, cost, info, options] = trustregions(problem, x, options) 0002 % Riemannian trust-regions solver for optimization on manifolds. 0003 % 0004 % function [x, cost, info, options] = trustregions(problem) 0005 % function [x, cost, info, options] = trustregions(problem, x0) 0006 % function [x, cost, info, options] = trustregions(problem, x0, options) 0007 % function [x, cost, info, options] = trustregions(problem, [], options) 0008 % 0009 % This is the Riemannian Trust-Region solver for Manopt, named RTR. 0010 % This solver tries to minimize the cost function described in the problem 0011 % structure. It requires the availability of the cost function and of its 0012 % gradient. It issues calls for the Hessian. 0013 % 0014 % If no Hessian nor approximation for it is provided, an approximation of 0015 % Hessian-vector products is computed with finite differences of gradients. 0016 % 0017 % If no gradient is provided, an approximation of the gradient is computed, 0018 % but this can be slow for manifolds of high dimension. 0019 % 0020 % At each iteration a subproblem is solved using a trust-region subproblem 0021 % (TRS) solver. The default is @trs_tCG_cached. This one (and some others) 0022 % use the preconditioner if one is supplied. 0023 % 0024 % For a description of the algorithm and theorems offering convergence 0025 % guarantees, see the references below. Documentation for this solver is 0026 % available online, but may be outdated. 0027 % http://www.manopt.org/solver_documentation_trustregions.html 0028 % 0029 % 0030 % The initial iterate is x0 if it is provided. Otherwise, a random point on 0031 % the manifold is picked. To specify options whilst not specifying an 0032 % initial iterate, give x0 as [] (the empty matrix). 0033 % 0034 % The two outputs 'x' and 'cost' are the last reached point on the manifold 0035 % and its cost. Notice that x is not necessarily the best reached point, 0036 % because this solver is not forced to be a descent method. In particular, 0037 % very close to convergence, it is sometimes preferable to accept very 0038 % slight increases in the cost value (on the order of the machine epsilon) 0039 % in the process of reaching fine convergence. Other than that, the cost 0040 % function value does decrease monotonically with iterations. 0041 % 0042 % The output 'info' is a struct-array which contains information about the 0043 % iterations: 0044 % iter (integer) 0045 % The (outer) iteration number, or number of steps considered 0046 % (whether accepted or rejected). The initial guess is 0. 0047 % cost (double) 0048 % The corresponding cost value. 0049 % gradnorm (double) 0050 % The (Riemannian) norm of the gradient. 0051 % time (double) 0052 % The total elapsed time in seconds to reach the corresponding cost. 0053 % rho (double) 0054 % The performance ratio for the iterate. 0055 % rhonum, rhoden (double) 0056 % Regularized numerator and denominator of the performance ratio: 0057 % rho = rhonum/rhoden. See options.rho_regularization. 0058 % accepted (boolean) 0059 % Whether the proposed iterate was accepted or not. 0060 % stepsize (double) 0061 % The (Riemannian) norm of the vector returned by the inner solver 0062 % and which is retracted to obtain the proposed next iterate. If 0063 % accepted = true for the corresponding iterate, this is the size of 0064 % the step from the previous to the new iterate. If accepted is 0065 % false, the step was not executed and this is the size of the 0066 % rejected step. 0067 % Delta (double) 0068 % The trust-region radius at the outer iteration. 0069 % limitedbyTR (boolean) 0070 % true if the subproblemsolver was limited by the trust-region 0071 % radius (a boundary solution was returned). 0072 % And possibly additional information logged by the subproblemsolver or 0073 % by options.statsfun. 0074 % For example, type [info.gradnorm] to obtain a vector of the successive 0075 % gradient norms reached at each (outer) iteration. 0076 % 0077 % The options structure is used to overwrite the default values. All 0078 % options have a default value and are hence optional. To force an option 0079 % value, pass an options structure with a field options.optionname, where 0080 % optionname is one of the following and the default value is indicated 0081 % between parentheses: 0082 % 0083 % tolgradnorm (1e-6) 0084 % The algorithm terminates if the norm of the gradient drops below 0085 % this. For well-scaled problems, a rule of thumb is that you can 0086 % expect to reduce the gradient norm by 8 orders of magnitude 0087 % (sqrt(eps)) compared to the gradient norm at a "typical" point (a 0088 % rough initial iterate for example). Further decrease is sometimes 0089 % possible, but inexact floating point arithmetic limits the final 0090 % accuracy. If tolgradnorm is set too low, the algorithm may end up 0091 % iterating forever (or until another stopping criterion triggers). 0092 % maxiter (1000) 0093 % The algorithm terminates after at most maxiter (outer) iterations. 0094 % maxtime (Inf) 0095 % The algorithm terminates if maxtime seconds elapsed. 0096 % miniter (0) 0097 % Minimum number of outer iterations: this overrides all other 0098 % stopping criteria. Can be helpful to escape saddle points. 0099 % Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim())) 0100 % Maximum trust-region radius. If you specify this parameter but not 0101 % Delta0, then Delta0 is set to 1/8 times this parameter. 0102 % Delta0 (Delta_bar/8) 0103 % Initial trust-region radius. If you observe a long plateau at the 0104 % beginning of the convergence plot (gradient norm vs iteration), it 0105 % may pay off to try to tune this parameter to shorten the plateau. 0106 % You should not set this parameter without setting Delta_bar too (at 0107 % a larger value). 0108 % subproblemsolver (@trs_tCG_cached) 0109 % Function handle to a subproblem solver. The subproblem solver also 0110 % sees this options structure, so that parameters can be passed to it 0111 % through here as well. Built-in solvers include: 0112 % trs_tCG_cached 0113 % trs_tCG 0114 % trs_gep 0115 % Note that trs_gep solves the subproblem exactly which may be slow. 0116 % It is included mainly for prototyping or for solving the subproblem 0117 % exactly in low dimensional subspaces. 0118 % rho_prime (0.1) 0119 % Accept/reject threshold : if rho is at least rho_prime, the outer 0120 % iteration is accepted. Otherwise, it is rejected. In case it is 0121 % rejected, the trust-region radius will have been decreased. 0122 % To ensure this, rho_prime >= 0 must be strictly smaller than 1/4. 0123 % If rho_prime is negative, the algorithm is not guaranteed to 0124 % produce monotonically decreasing cost values. It is strongly 0125 % recommended to set rho_prime > 0, to aid convergence. 0126 % rho_regularization (1e3) 0127 % Close to convergence, evaluating the performance ratio rho is 0128 % numerically challenging. Meanwhile, close to convergence, the 0129 % quadratic model should be a good fit and the steps should be 0130 % accepted. Regularization lets rho go to 1 as the model decrease and 0131 % the actual decrease go to zero. Set this option to zero to disable 0132 % regularization (not recommended). See in-code for the specifics. 0133 % When this is not zero, it may happen that the iterates produced are 0134 % not monotonically improving the cost when very close to 0135 % convergence. This is because the corrected cost improvement could 0136 % change sign if it is negative but very small. 0137 % statsfun (none) 0138 % Function handle to a function that is called after each iteration 0139 % to provide the opportunity to log additional statistics. 0140 % They are returned in the info struct. See the generic Manopt 0141 % documentation about solvers for further information. statsfun is 0142 % called with the point x that was reached last, after the 0143 % accept/reject decision. See comment below. 0144 % stopfun (none) 0145 % Function handle to a function that is called at each iteration to 0146 % provide the opportunity to specify additional stopping criteria. 0147 % See the generic Manopt documentation about solvers for further 0148 % information. 0149 % verbosity (2) 0150 % Integer number used to tune the amount of output the algorithm logs 0151 % during execution (mostly as text in the command window). 0152 % The higher, the more output. 0 means silent. 3 and above includes a 0153 % display of the options structure at the beginning of the execution. 0154 % debug (false) 0155 % Set to true to allow the algorithm to perform additional 0156 % computations for debugging purposes. If a debugging test fails, you 0157 % will be informed of it, usually via the command window. Be aware 0158 % that these additional computations appear in the algorithm timings 0159 % too, and may interfere with operations such as counting the number 0160 % of cost evaluations, etc. The debug calls get storedb too. 0161 % storedepth (2) 0162 % Maximum number of different points x of the manifold for which a 0163 % store structure may be kept in memory in the storedb for caching. 0164 % If memory usage is an issue, you may try to lower this number. 0165 % Profiling or manopt counters may then help to investigate if a 0166 % performance hit was incurred as a result. 0167 % hook (none) 0168 % A function handle which allows the user to change the current point 0169 % x at the beginning of each iteration, before the stopping criterion 0170 % is evaluated. See applyHook for help on how to use this option. 0171 % 0172 % Notice that statsfun is called with the point x that was reached last, 0173 % after the accept/reject decision. Hence: if the step was accepted, we get 0174 % that new x, with a store which only saw the call for the cost and for the 0175 % gradient. If the step was rejected, we get the same x as previously, with 0176 % the store structure containing everything that was computed at that point 0177 % (possibly including previous rejects at that same point). Hence, statsfun 0178 % should not be used in conjunction with the store to count operations for 0179 % example. Instead, you should use manopt counters: see statscounters. 0180 % 0181 % 0182 % Please cite the Manopt paper as well as the research paper: 0183 % @Article{genrtr, 0184 % Title = {Trust-region methods on {Riemannian} manifolds}, 0185 % Author = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.}, 0186 % Journal = {Foundations of Computational Mathematics}, 0187 % Year = {2007}, 0188 % Number = {3}, 0189 % Pages = {303--330}, 0190 % Volume = {7}, 0191 % Doi = {10.1007/s10208-005-0179-9} 0192 % } 0193 % 0194 % See also: steepestdescent conjugategradient manopt/examples 0195 0196 % An explicit, general listing of this algorithm, with preconditioning, 0197 % can be found in the following paper: 0198 % @Article{boumal2015lowrank, 0199 % Title = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold}, 0200 % Author = {Boumal, N. and Absil, P.-A.}, 0201 % Journal = {Linear Algebra and its Applications}, 0202 % Year = {2015}, 0203 % Pages = {200--239}, 0204 % Volume = {475}, 0205 % Doi = {10.1016/j.laa.2015.02.027}, 0206 % } 0207 0208 % When the Hessian is not specified, it is approximated with 0209 % finite-differences of the gradient. The resulting method is called 0210 % RTR-FD. Some convergence theory for it is available in this paper: 0211 % @incollection{boumal2015rtrfd 0212 % author={Boumal, N.}, 0213 % title={Riemannian trust regions with finite-difference Hessian approximations are globally convergent}, 0214 % year={2015}, 0215 % booktitle={Geometric Science of Information} 0216 % } 0217 0218 0219 % This file is part of Manopt: www.manopt.org. 0220 % This code is an adaptation to Manopt of the original GenRTR code: 0221 % RTR - Riemannian Trust-Region 0222 % (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan 0223 % Florida State University 0224 % School of Computational Science 0225 % (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download) 0226 % See accompanying license file. 0227 % The adaptation was executed by Nicolas Boumal. 0228 % 0229 % 0230 % Change log: 0231 % 0232 % NB April 3, 2013: 0233 % tCG now returns the Hessian along the returned direction eta, so 0234 % that we do not compute that Hessian redundantly: some savings at 0235 % each iteration. Similarly, if the useRand flag is on, we spare an 0236 % extra Hessian computation at each outer iteration too, owing to 0237 % some modifications in the Cauchy point section of the code specific 0238 % to useRand = true. 0239 % 0240 % NB Aug. 22, 2013: 0241 % This function is now Octave compatible. The transition called for 0242 % two changes which would otherwise not be advisable. (1) tic/toc is 0243 % now used as is, as opposed to the safer way: 0244 % t = tic(); elapsed = toc(t); 0245 % And (2), the (formerly inner) function savestats was moved outside 0246 % the main function to not be nested anymore. This is arguably less 0247 % elegant, but Octave does not (and likely will not) support nested 0248 % functions. 0249 % 0250 % NB Dec. 2, 2013: 0251 % The in-code documentation was largely revised and expanded. 0252 % 0253 % NB Dec. 2, 2013: 0254 % The former heuristic which triggered when rhonum was very small and 0255 % forced rho = 1 has been replaced by a smoother heuristic which 0256 % consists in regularizing rhonum and rhoden before computing their 0257 % ratio. It is tunable via options.rho_regularization. Furthermore, 0258 % the solver now detects if tCG did not obtain a model decrease 0259 % (which is theoretically impossible but may happen because of 0260 % numerical errors and/or because of a nonlinear/nonsymmetric Hessian 0261 % operator, which is the case for finite difference approximations). 0262 % When such an anomaly is detected, the step is rejected and the 0263 % trust region radius is decreased. 0264 % Feb. 18, 2015 note: this is less useful now, as tCG now guarantees 0265 % model decrease even for the finite difference approximation of the 0266 % Hessian. It is still useful in case of numerical errors, but this 0267 % is less stringent. 0268 % 0269 % NB Dec. 3, 2013: 0270 % The stepsize is now registered at each iteration, at a small 0271 % additional cost. The defaults for Delta_bar and Delta0 are better 0272 % defined. Setting Delta_bar in the options will automatically set 0273 % Delta0 accordingly. In Manopt 1.0.4, the defaults for these options 0274 % were not treated appropriately because of an incorrect use of the 0275 % isfield() built-in function. 0276 % 0277 % NB Feb. 18, 2015: 0278 % Added some comments. Also, Octave now supports safe tic/toc usage, 0279 % so we reverted the changes to use that again (see Aug. 22, 2013 log 0280 % entry). 0281 % 0282 % NB April 3, 2015: 0283 % Works with the new StoreDB class system. 0284 % 0285 % NB April 8, 2015: 0286 % No Hessian warning if approximate Hessian explicitly available. 0287 % 0288 % NB Nov. 1, 2016: 0289 % Now uses approximate gradient via finite differences if need be. 0290 % 0291 % NB Aug. 2, 2018: 0292 % Using storedb.remove() to keep the cache lean, which allowed to 0293 % reduce storedepth to 2 from 20 (by default). 0294 % 0295 % NB July 19, 2020: 0296 % Added support for options.hook. 0297 % 0298 % VL Aug. 17, 2022: 0299 % Refactored code to use various subproblem solvers with a new input 0300 % output pattern. Modified how information about iterations is 0301 % printed to accomodate new subproblem solvers. Moved all useRand and 0302 % cauchy logic to trs_tCG. Options pertaining to tCG are still 0303 % available but have moved to that file. Made trs_tCG_cached default. 0304 0305 0306 % Verify that the problem description is sufficient for the solver. 0307 0308 if ~canGetCost(problem) 0309 warning('manopt:getCost', ... 0310 'No cost provided. The algorithm will likely abort.'); 0311 end 0312 if ~canGetGradient(problem) && ~canGetApproxGradient(problem) 0313 % Note: we do not give a warning if an approximate gradient is 0314 % explicitly given in the problem description, as in that case the user 0315 % seems to be aware of the issue. 0316 warning('manopt:getGradient:approx', ... 0317 ['No gradient provided. Using FD approximation (slow).\n' ... 0318 'It may be necessary to increase options.tolgradnorm.\n' ... 0319 'To disable this warning: ' ... 0320 'warning(''off'', ''manopt:getGradient:approx'')']); 0321 problem.approxgrad = approxgradientFD(problem); 0322 end 0323 if ~canGetHessian(problem) && ~canGetApproxHessian(problem) 0324 % Note: we do not give a warning if an approximate Hessian is 0325 % explicitly given in the problem description, as in that case the user 0326 % seems to be aware of the issue. 0327 warning('manopt:getHessian:approx', ... 0328 ['No Hessian provided. Using FD approximation.\n' ... 0329 'To disable this warning: ' ... 0330 'warning(''off'', ''manopt:getHessian:approx'')']); 0331 problem.approxhess = approxhessianFD(problem); 0332 end 0333 0334 % Set local defaults here 0335 localdefaults.verbosity = 2; 0336 localdefaults.maxtime = inf; 0337 localdefaults.miniter = 0; 0338 localdefaults.maxiter = 1000; 0339 localdefaults.rho_prime = 0.1; 0340 localdefaults.rho_regularization = 1e3; 0341 localdefaults.subproblemsolver = @trs_tCG_cached; 0342 localdefaults.tolgradnorm = 1e-6; 0343 0344 % Merge global and local defaults, then merge w/ user options, if any. 0345 localdefaults = mergeOptions(getGlobalDefaults(), localdefaults); 0346 if ~exist('options', 'var') || isempty(options) 0347 options = struct(); 0348 end 0349 options = mergeOptions(localdefaults, options); 0350 0351 M = problem.M; 0352 0353 % If no initial point x is given by the user, generate one at random. 0354 if ~exist('x', 'var') || isempty(x) 0355 x = M.rand(); 0356 end 0357 0358 % Set default Delta_bar and Delta0 separately to deal with additional 0359 % logic: if Delta_bar is provided but not Delta0, let Delta0 automatically 0360 % be some fraction of the provided Delta_bar. 0361 if ~isfield(options, 'Delta_bar') 0362 if isfield(M, 'typicaldist') 0363 options.Delta_bar = M.typicaldist(); 0364 else 0365 options.Delta_bar = sqrt(M.dim()); 0366 end 0367 end 0368 if ~isfield(options, 'Delta0') 0369 options.Delta0 = options.Delta_bar / 8; 0370 end 0371 0372 % Check some option values 0373 assert(options.rho_prime < 1/4, ... 0374 'options.rho_prime must be strictly smaller than 1/4.'); 0375 assert(options.Delta_bar > 0, ... 0376 'options.Delta_bar must be positive.'); 0377 assert(options.Delta0 > 0 && options.Delta0 <= options.Delta_bar, ... 0378 'options.Delta0 must be positive and smaller than Delta_bar.'); 0379 0380 % It is sometimes useful to check what the actual option values are. 0381 if options.verbosity >= 3 0382 disp(options); 0383 end 0384 0385 % Create a store database and get a key for the current x 0386 storedb = StoreDB(options.storedepth); 0387 key = storedb.getNewKey(); 0388 0389 ticstart = tic(); 0390 0391 %% Initializations 0392 0393 % k counts the outer (TR) iterations. The semantic is that k counts the 0394 % number of iterations fully executed so far. 0395 k = 0; 0396 0397 % accept tracks if the proposed step is accepted (true) or declined (false) 0398 accept = true; 0399 0400 % Initialize solution and companion measures: f(x), fgrad(x) 0401 [fx, fgradx] = getCostGrad(problem, x, storedb, key); 0402 norm_grad = M.norm(x, fgradx); 0403 0404 % Initialize trust-region radius 0405 Delta = options.Delta0; 0406 0407 % Depending on the subproblem solver, different kinds of statistics are 0408 % logged and displayed. This initial call to the solver tells us ahead of 0409 % time what to write in the column headers for displayed information, and 0410 % how to initialize the info struct-array. 0411 trsinfo = options.subproblemsolver([], [], options); 0412 0413 % printheader is a string that contains the header for the subproblem 0414 % solver's printed output. 0415 printheader = trsinfo.printheader; 0416 0417 % initstats is a struct of initial values for the stats that the subproblem 0418 % solver wishes to store. 0419 initstats = trsinfo.initstats; 0420 0421 stats = savestats(problem, x, storedb, key, options, k, fx, norm_grad, ... 0422 Delta, ticstart, initstats); 0423 0424 info(1) = stats; 0425 info(min(10000, options.maxiter+1)).iter = []; 0426 0427 % Display headers, depending on verbosity level, then also the initial row. 0428 if options.verbosity == 2 0429 fprintf(['%3s %3s iter ', ... 0430 '%15scost val %2sgrad. norm %s\n'], ... 0431 ' ', ' ', ' ', ' ', printheader); 0432 fprintf(['%3s %3s %5d ', ... 0433 '%+.16e %12e\n'], ... 0434 ' ', ' ', k, fx, norm_grad); 0435 elseif options.verbosity > 2 0436 fprintf(['%3s %3s iter ', ... 0437 '%15scost val %2sgrad. norm %10srho %4srho_noreg ' ... 0438 '%7sDelta %s\n'], ... 0439 ' ', ' ', ' ', ' ', ' ', ' ', ... 0440 ' ', printheader); 0441 fprintf(['%3s %3s %5d ', ... 0442 '%+.16e %12e\n'], ... 0443 ' ',' ',k, fx, norm_grad); 0444 end 0445 0446 % To keep track of consecutive radius changes, so that we can warn the 0447 % user if it appears necessary. 0448 consecutive_TRplus = 0; 0449 consecutive_TRminus = 0; 0450 0451 0452 % ********************** 0453 % ** Start of TR loop ** 0454 % ********************** 0455 while true 0456 0457 % Start clock for this outer iteration 0458 ticstart = tic(); 0459 0460 % Apply the hook function if there is one: this allows external code to 0461 % move x to another point. If the point is changed (indicated by a true 0462 % value for the boolean 'hooked'), we update our knowledge about x. 0463 [x, key, info, hooked] = applyHook(problem, x, storedb, key, ... 0464 options, info, k+1); 0465 if hooked 0466 [fx, fgradx] = getCostGrad(problem, x, storedb, key); 0467 norm_grad = M.norm(x, fgradx); 0468 end 0469 0470 % Run standard stopping criterion checks 0471 [stop, reason] = stoppingcriterion(problem, x, options, info, k+1); 0472 0473 % Ensure trustregions runs at least options.miniter iterations 0474 if k < options.miniter 0475 stop = 0; 0476 end 0477 0478 if stop 0479 if options.verbosity >= 1 0480 fprintf([reason '\n']); 0481 end 0482 break; 0483 end 0484 0485 if options.debug > 0 0486 fprintf([repmat('*', 1, 98) '\n']); 0487 end 0488 0489 % ************************* 0490 % ** Begin TR Subproblem ** 0491 % ************************* 0492 0493 % Solve TR subproblem with solver specified by options.subproblemsolver 0494 trsinput = struct('x', x, 'fgradx', fgradx, 'Delta', Delta, ... 0495 'accept', accept); 0496 0497 trsoutput = options.subproblemsolver(problem, trsinput, options, ... 0498 storedb, key); 0499 0500 eta = trsoutput.eta; 0501 Heta = trsoutput.Heta; 0502 limitedbyTR = trsoutput.limitedbyTR; 0503 trsprintstr = trsoutput.printstr; 0504 trsstats = trsoutput.stats; 0505 0506 0507 % This is computed for logging purposes and may be useful for some 0508 % user-defined stopping criteria. 0509 norm_eta = M.norm(x, eta); 0510 0511 if options.debug > 0 0512 testangle = M.inner(x, eta, fgradx) / (norm_eta*norm_grad); 0513 end 0514 0515 0516 % Compute the tentative next iterate (the proposal) 0517 x_prop = M.retr(x, eta); 0518 key_prop = storedb.getNewKey(); 0519 0520 % Compute the function value of the proposal 0521 fx_prop = getCost(problem, x_prop, storedb, key_prop); 0522 0523 % Will we accept the proposal or not? 0524 % Check the performance of the quadratic model against the actual cost. 0525 rhonum = fx - fx_prop; 0526 vecrho = M.lincomb(x, 1, fgradx, .5, Heta); 0527 rhoden = -M.inner(x, eta, vecrho); 0528 rho_noreg = rhonum/rhoden; 0529 % rhonum could be anything. 0530 % rhoden should be nonnegative, as guaranteed by tCG, barring 0531 % numerical errors. 0532 0533 % Heuristic -- added Dec. 2, 2013 (NB) to replace the former heuristic. 0534 % This heuristic is documented in the book by Conn Gould and Toint on 0535 % trust-region methods, section 17.4.2. 0536 % rhonum measures the difference between two numbers. Close to 0537 % convergence, these two numbers are very close to each other, so 0538 % that computing their difference is numerically challenging: there may 0539 % be a significant loss in accuracy. Since the acceptance or rejection 0540 % of the step is conditioned on the ratio between rhonum and rhoden, 0541 % large errors in rhonum result in a very large error in rho, hence in 0542 % erratic acceptance / rejection. Meanwhile, close to convergence, 0543 % steps are usually trustworthy and we should transition to a Newton- 0544 % like method, with rho=1 consistently. The heuristic thus shifts both 0545 % rhonum and rhoden by a small amount such that far from convergence, 0546 % the shift is irrelevant and close to convergence, the ratio rho goes 0547 % to 1, effectively promoting acceptance of the step. 0548 % The rationale is that close to convergence, both rhonum and rhoden 0549 % are quadratic in the distance between x and x_prop. Thus, when this 0550 % distance is on the order of sqrt(eps), the value of rhonum and rhoden 0551 % is on the order of eps, which is indistinguishable from the numerical 0552 % error, resulting in badly estimated rho's. 0553 % For abs(fx) < 1, this heuristic is invariant under offsets of f but 0554 % not under scaling of f. For abs(fx) > 1, the opposite holds. This 0555 % should not alarm us, as this heuristic only triggers at the very last 0556 % iterations if very fine convergence is demanded. 0557 rho_reg_offset = max(1, abs(fx)) * eps * options.rho_regularization; 0558 rhonum = rhonum + rho_reg_offset; 0559 rhoden = rhoden + rho_reg_offset; 0560 0561 if options.debug > 0 0562 fprintf('DBG: rhonum : %e\n', rhonum); 0563 fprintf('DBG: rhoden : %e\n', rhoden); 0564 end 0565 0566 % This is always true if a linear, symmetric operator is used for the 0567 % Hessian (approximation) and if we had infinite numerical precision. 0568 % In practice, nonlinear approximations of the Hessian such as the 0569 % built-in finite difference approximation and finite numerical 0570 % accuracy can cause the model to increase. In such scenarios, we 0571 % decide to force a rejection of the step and a reduction of the 0572 % trust-region radius. We test the sign of the regularized rhoden since 0573 % the regularization is supposed to capture the accuracy to which 0574 % rhoden is computed: if rhoden were negative before regularization but 0575 % not after, that should not be (and is not) detected as a failure. 0576 % 0577 % Note (Feb. 17, 2015, NB): the most recent version of trs_tCG already 0578 % includes a mechanism to ensure model decrease if the Cauchy step 0579 % attained a decrease (which is theoretically the case under very lax 0580 % assumptions). This being said, it is always possible that numerical 0581 % errors will prevent this, so that it is good to keep a safeguard. 0582 % 0583 % The current strategy is that, if this should happen, then we reject 0584 % the step and reduce the trust region radius. This also ensures that 0585 % the actual cost values are monotonically decreasing. 0586 % 0587 % [This bit of code seems not to trigger since trs_tCG already ensures 0588 % the model decreases even in the presence of non-linearities; but as 0589 % a result the radius is not necessarily decreased. Perhaps we should 0590 % change this with the proposed commented line below; needs testing.] 0591 % 0592 model_decreased = (rhoden >= 0); 0593 0594 if ~model_decreased 0595 trsprintstr = [trsprintstr ', model did not decrease']; %#ok<AGROW> 0596 end 0597 rho = rhonum / rhoden; 0598 0599 % Added June 30, 2015 following observation by BM. 0600 % With this modification, it is guaranteed that a step rejection is 0601 % always accompanied by a TR reduction. This prevents stagnation in 0602 % this "corner case" (NaN's really aren't supposed to occur, but it's 0603 % nice if we can handle them nonetheless). 0604 if isnan(rho) 0605 fprintf(['rho is NaN! Forcing a radius decrease. ' ... 0606 'This should not happen.\n']); 0607 if isnan(fx_prop) 0608 fprintf(['The cost function returned NaN (perhaps the ' ... 0609 'retraction returned a bad point?)\n']); 0610 else 0611 fprintf('The cost function did not return a NaN value.\n'); 0612 end 0613 end 0614 0615 if options.debug > 0 0616 m = @(x, eta) ... 0617 getCost(problem, x, storedb, key) + ... 0618 getDirectionalDerivative(problem, x, eta, storedb, key) + ... 0619 .5*M.inner(x, getHessian(problem, x, eta, storedb, key), eta); 0620 zerovec = M.zerovec(x); 0621 actrho = (fx - fx_prop) / (m(x, zerovec) - m(x, eta)); 0622 fprintf('DBG: new f(x) : %+e\n', fx_prop); 0623 fprintf('DBG: actual rho : %e\n', actrho); 0624 fprintf('DBG: used rho : %e\n', rho); 0625 end 0626 0627 % Choose the new TR radius based on the model performance 0628 trstr = ' '; 0629 % If the actual decrease is smaller than 1/4 of the predicted decrease, 0630 % then reduce the TR radius. 0631 if rho < 1/4 || ~model_decreased || isnan(rho) 0632 trstr = 'TR-'; 0633 Delta = Delta/4; 0634 consecutive_TRplus = 0; 0635 consecutive_TRminus = consecutive_TRminus + 1; 0636 if consecutive_TRminus >= 5 && options.verbosity >= 2 0637 consecutive_TRminus = -inf; 0638 fprintf([' +++ Detected many consecutive TR- (radius ' ... 0639 'decreases).\n' ... 0640 ' +++ Consider dividing options.Delta_bar by 10.\n' ... 0641 ' +++ Current values: options.Delta_bar = %g and ' ... 0642 'options.Delta0 = %g.\n'], options.Delta_bar, ... 0643 options.Delta0); 0644 end 0645 % If the actual decrease is at least 3/4 of the predicted decrease and 0646 % the trs_tCG (inner solve) hit the TR boundary, increase TR radius. 0647 % We also keep track of the number of consecutive trust-region radius 0648 % increases. If there are many, this may indicate the need to adapt the 0649 % initial and maximum radii. 0650 elseif rho > 3/4 && limitedbyTR 0651 trstr = 'TR+'; 0652 Delta = min(2*Delta, options.Delta_bar); 0653 consecutive_TRminus = 0; 0654 consecutive_TRplus = consecutive_TRplus + 1; 0655 if consecutive_TRplus >= 5 && options.verbosity >= 1 0656 consecutive_TRplus = -inf; 0657 fprintf([' +++ Detected many consecutive TR+ (radius ' ... 0658 'increases).\n' ... 0659 ' +++ Consider multiplying options.Delta_bar by 10.\n' ... 0660 ' +++ Current values: options.Delta_bar = %g and ' ... 0661 'options.Delta0 = %g.\n'], options.Delta_bar, ... 0662 options.Delta0); 0663 end 0664 else 0665 % Otherwise, keep the TR radius constant. 0666 consecutive_TRplus = 0; 0667 consecutive_TRminus = 0; 0668 end 0669 0670 % Choose to accept or reject the proposed step based on the model 0671 % performance. Note the strict inequality. 0672 if model_decreased && rho > options.rho_prime 0673 0674 % April 17, 2018: a side effect of rho_regularization > 0 is that 0675 % it can happen that the cost function appears to go up (although 0676 % only by a small amount) for some accepted steps. We decide to 0677 % accept this because, numerically, computing the difference 0678 % between fx_prop and fx is more difficult than computing the 0679 % improvement in the model, because fx_prop and fx are on the same 0680 % order of magnitude yet are separated by a very small gap near 0681 % convergence, whereas the model improvement is computed as a sum 0682 % of two small terms. As a result, the step which seems bad may 0683 % turn out to be good, in that it may help reduce the gradient norm 0684 % for example. This update merely informs the user of this event. 0685 % In further updates, we could also introduce this as a stopping 0686 % criterion. It is then important to choose wisely which of x or 0687 % x_prop should be returned (perhaps the one with smallest 0688 % gradient?) 0689 if fx_prop > fx && options.verbosity >= 2 0690 fprintf(['Between line above and below, cost function ' ... 0691 'increased by %.2g (step size: %.2g)\n'], ... 0692 fx_prop - fx, norm_eta); 0693 end 0694 0695 accept = true; 0696 accstr = 'acc'; 0697 % We accept the step: no need to keep the old cache. 0698 storedb.removefirstifdifferent(key, key_prop); 0699 x = x_prop; 0700 key = key_prop; 0701 fx = fx_prop; 0702 fgradx = getGradient(problem, x, storedb, key); 0703 norm_grad = M.norm(x, fgradx); 0704 else 0705 % We reject the step: no need to keep cache related to the 0706 % tentative step. 0707 storedb.removefirstifdifferent(key_prop, key); 0708 accept = false; 0709 accstr = 'REJ'; 0710 end 0711 0712 % k is the number of iterations we have accomplished. 0713 k = k + 1; 0714 0715 % Make sure we don't use too much memory for the store database. 0716 storedb.purge(); 0717 0718 % Log statistics for freshly executed iteration. 0719 % Everything after this in the loop is not accounted for in the timing. 0720 stats = savestats(problem, x, storedb, key, options, k, fx, ... 0721 norm_grad, Delta, ticstart, trsstats, ... 0722 info, rho, rhonum, rhoden, accept, norm_eta, ... 0723 limitedbyTR); 0724 info(k+1) = stats; 0725 0726 % Display 0727 if options.verbosity == 2 0728 fprintf('%3s %3s %5d %+.16e %12e %s\n', ... 0729 accstr, trstr, k, fx, norm_grad, trsprintstr); 0730 elseif options.verbosity > 2 0731 fprintf(['%3s %3s %5d %+.16e %.6e %+.6e ' ... 0732 '%+.6e %.6e %s\n'], ... 0733 accstr, trstr, k, fx, norm_grad, rho, ... 0734 rho_noreg, Delta, trsprintstr); 0735 if options.debug > 0 0736 fprintf(' Delta : %f |eta| : %e\n', ... 0737 Delta, norm_eta); 0738 end 0739 end 0740 if options.debug > 0 0741 fprintf('DBG: cos ang(eta, gradf): %d\n', testangle); 0742 if rho == 0 0743 fprintf('DBG: rho = 0: likely to hinder convergence.\n'); 0744 end 0745 end 0746 0747 end % of TR loop (counter: k) 0748 0749 % Restrict info struct-array to useful part 0750 info = info(1:k+1); 0751 0752 0753 if options.debug > 0 0754 fprintf([repmat('*', 1, 98) '\n']); 0755 end 0756 if options.verbosity > 0 0757 fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time); 0758 end 0759 0760 % Return the best cost reached 0761 cost = fx; 0762 0763 end 0764 0765 0766 0767 0768 0769 % Routine in charge of collecting the current iteration stats 0770 function stats = savestats(problem, x, storedb, key, options, k, fx, ... 0771 norm_grad, Delta, ticstart, trsstats, info, rho, ... 0772 rhonum, rhoden, accept, norm_eta, limitedbyTR) 0773 stats.iter = k; 0774 stats.cost = fx; 0775 stats.gradnorm = norm_grad; 0776 stats.Delta = Delta; 0777 if k == 0 0778 stats.time = toc(ticstart); 0779 stats.rho = inf; 0780 stats.rhonum = NaN; 0781 stats.rhoden = NaN; 0782 stats.accepted = true; 0783 stats.stepsize = NaN; 0784 stats.limitedbyTR = false; 0785 fields = fieldnames(trsstats); 0786 for i = 1 : length(fields) 0787 stats.(fields{i}) = trsstats.(fields{i}); 0788 end 0789 else 0790 stats.time = info(k).time + toc(ticstart); 0791 stats.rho = rho; 0792 stats.rhonum = rhonum; 0793 stats.rhoden = rhoden; 0794 stats.accepted = accept; 0795 stats.stepsize = norm_eta; 0796 stats.limitedbyTR = limitedbyTR; 0797 fields = fieldnames(trsstats); 0798 for i = 1 : length(fields) 0799 stats.(fields{i}) = trsstats.(fields{i}); 0800 end 0801 end 0802 0803 % See comment about statsfun above: the x and store passed to statsfun 0804 % are that of the most recently accepted point after the iteration 0805 % fully executed. 0806 stats = applyStatsfun(problem, x, storedb, key, options, stats); 0807 0808 end