Riemannian trust-regions solver for optimization on manifolds.

function [x, cost, info, options] = trustregions(problem)
function [x, cost, info, options] = trustregions(problem, x0)
function [x, cost, info, options] = trustregions(problem, x0, options)
function [x, cost, info, options] = trustregions(problem, [], options)

This is the Riemannian Trust-Region solver (with tCG inner solve), named
RTR. This solver attempts to minimize the cost function described in the
problem structure. It requires the availability of the cost function and
of its gradient, and it issues calls for the Hessian. If neither a
Hessian nor an approximate Hessian is provided, a standard approximation
of the Hessian based on finite differences of the gradient is computed.
If a preconditioner for the Hessian is provided, it is used.

If no gradient is provided, an approximation of the gradient is computed,
but this can be slow for manifolds of high dimension.

For a description of the algorithm and theorems offering convergence
guarantees, see the references below. Documentation for this solver is
available online at:

  http://www.manopt.org/solver_documentation_trustregions.html

The initial iterate is x0 if it is provided. Otherwise, a random point on
the manifold is picked. To specify options whilst not specifying an
initial iterate, give x0 as [] (the empty matrix).

The two outputs 'x' and 'cost' are the last reached point on the manifold
and its cost. Notice that x is not necessarily the best point reached,
because this solver is not forced to be a descent method. In particular,
very close to convergence, it is sometimes preferable to accept very
slight increases in the cost value (on the order of the machine epsilon)
in the process of reaching fine convergence.

The output 'info' is a struct-array which contains information about the
iterations:

  iter (integer)
    The (outer) iteration number, or number of steps considered (whether
    accepted or rejected). The initial guess counts as iteration 0.
  cost (double)
    The corresponding cost value.
  gradnorm (double)
    The (Riemannian) norm of the gradient.
  numinner (integer)
    The number of inner iterations executed to compute this iterate.
    Inner iterations are truncated-CG steps. Each one requires a Hessian
    (or approximate Hessian) evaluation.
  time (double)
    The total elapsed time in seconds to reach the corresponding cost.
  rho (double)
    The performance ratio for the iterate.
  rhonum, rhoden (double)
    Regularized numerator and denominator of the performance ratio:
    rho = rhonum/rhoden. See options.rho_regularization.
  accepted (boolean)
    Whether the proposed iterate was accepted or not.
  stepsize (double)
    The (Riemannian) norm of the vector returned by the inner solver tCG,
    which is retracted to obtain the proposed next iterate. If accepted
    is true for the corresponding iterate, this is the size of the step
    from the previous to the new iterate. If accepted is false, the step
    was not executed and this is the size of the rejected step.
  Delta (double)
    The trust-region radius at the outer iteration.
  cauchy (boolean)
    Whether the Cauchy point was used or not (if useRand is true).

And possibly additional information logged by options.statsfun. For
example, type [info.gradnorm] to obtain a vector of the successive
gradient norms reached at each (outer) iteration.

The options structure is used to overwrite the default values. All
options have a default value and are hence optional. To force an option
value, pass an options structure with a field options.optionname, where
optionname is one of the following. The default value is indicated
between parentheses.

  tolgradnorm (1e-6)
    The algorithm terminates if the norm of the gradient drops below
    this. For well-scaled problems, a rule of thumb is that you can
    expect to reduce the gradient norm by 8 orders of magnitude
    (sqrt(eps)) compared to the gradient norm at a "typical" point (a
    rough initial iterate, for example). Further decrease is sometimes
    possible, but inexact floating point arithmetic will eventually limit
    the final accuracy. If tolgradnorm is set too low, the algorithm may
    end up iterating forever (or at least until another stopping
    criterion triggers).
  maxiter (1000)
    The algorithm terminates if maxiter (outer) iterations were executed.
  maxtime (Inf)
    The algorithm terminates if maxtime seconds elapsed.
  miniter (3)
    Minimum number of outer iterations (used only if useRand is true).
  mininner (1)
    Minimum number of inner iterations (for tCG).
  maxinner (problem.M.dim(), the manifold's dimension)
    Maximum number of inner iterations (for tCG).
  Delta_bar (problem.M.typicaldist() or sqrt(problem.M.dim()))
    Maximum trust-region radius. If you specify this parameter but not
    Delta0, then Delta0 is set to 1/8 times this parameter.
  Delta0 (Delta_bar/8)
    Initial trust-region radius. If you observe a long plateau at the
    beginning of the convergence plot (gradient norm versus iteration),
    it may pay off to tune this parameter to shorten the plateau. You
    should not set this parameter without also setting Delta_bar (at a
    larger value).
  useRand (false)
    Set to true if the trust-region solve is to be initiated with a
    random tangent vector. If set to true, no preconditioner is used.
    This option is set to true in some scenarios to escape saddle points,
    but is otherwise seldom activated.
  kappa (0.1)
    tCG inner kappa convergence tolerance.
    kappa > 0 is the linear convergence target rate: tCG terminates early
    if the residual was reduced by a factor of kappa.
  theta (1.0)
    tCG inner theta convergence tolerance.
    1+theta (theta between 0 and 1) is the superlinear convergence target
    rate: tCG terminates early if the residual was reduced by a power of
    1+theta.
  rho_prime (0.1)
    Accept/reject threshold: if rho is at least rho_prime, the outer
    iteration is accepted; otherwise, it is rejected. In case it is
    rejected, the trust-region radius will have been decreased. To ensure
    this, rho_prime must be nonnegative and strictly smaller than 1/4.
    If rho_prime is negative, the algorithm is not guaranteed to produce
    monotonically decreasing cost values. It is strongly recommended to
    set rho_prime > 0, to aid convergence.
  rho_regularization (1e3)
    Close to convergence, evaluating the performance ratio rho is
    numerically challenging. Meanwhile, close to convergence, the
    quadratic model should be a good fit and the steps should be
    accepted. Regularization lets rho go to 1 as the model decrease and
    the actual decrease go to zero. Set this option to zero to disable
    regularization (not recommended). See the in-code comments for the
    specifics. When this is not zero, it may happen that the iterates
    produced are not monotonically improving the cost when very close to
    convergence. This is because the corrected cost improvement could
    change sign if it is negative but very small.
  statsfun (none)
    Function handle to a function that is called after each iteration to
    provide the opportunity to log additional statistics. They are
    returned in the info struct. See the generic Manopt documentation
    about solvers for further information. statsfun is called with the
    point x that was reached last, after the accept/reject decision. See
    the comment below.
  stopfun (none)
    Function handle to a function that is called at each iteration to
    provide the opportunity to specify additional stopping criteria. See
    the generic Manopt documentation about solvers for further
    information.
  verbosity (2)
    Integer number used to tune the amount of output the algorithm
    generates during execution (mostly as text in the command window).
    The higher, the more output. 0 means silent. 3 and above includes a
    display of the options structure at the beginning of the execution.
  debug (false)
    Set to true to allow the algorithm to perform additional computations
    for debugging purposes. If a debugging test fails, you will be
    informed of it, usually via the command window. Be aware that these
    additional computations appear in the algorithm timings too, and may
    interfere with operations such as counting the number of cost
    evaluations, etc. (the debug calls get storedb too).
  storedepth (2)
    Maximum number of different points x of the manifold for which a
    store structure will be kept in memory in the storedb for caching. If
    memory usage is an issue, you may try to lower this number. Profiling
    or Manopt counters may then help to investigate whether a performance
    hit was incurred as a result.

Notice that statsfun is called with the point x that was reached last,
after the accept/reject decision. Hence: if the step was accepted, we get
that new x, with a store which only saw the calls for the cost and for
the gradient. If the step was rejected, we get the same x as previously,
with the store structure containing everything that was computed at that
point (possibly including previous rejects at that same point). Hence,
statsfun should not be used in conjunction with the store to count
operations, for example. Instead, use Manopt counters: see statscounters.

Please cite the Manopt paper as well as the research paper:

  @Article{genrtr,
    Title   = {Trust-region methods on {Riemannian} manifolds},
    Author  = {Absil, P.-A. and Baker, C. G. and Gallivan, K. A.},
    Journal = {Foundations of Computational Mathematics},
    Year    = {2007},
    Number  = {3},
    Pages   = {303--330},
    Volume  = {7},
    Doi     = {10.1007/s10208-005-0179-9}
  }

See also: steepestdescent conjugategradient manopt/examples
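As a concrete illustration of the calling pattern and of the options and
info structures described above, here is a minimal usage sketch (not part
of this file): it computes a dominant eigenvector of a symmetric matrix,
assuming the standard Manopt problem structure with the spherefactory
manifold and the egrad/ehess fields for Euclidean derivatives.

  % Minimal usage sketch (assumes Manopt is on the path).
  n = 100;
  A = randn(n); A = (A + A')/2;            % symmetric data matrix

  problem.M = spherefactory(n);            % unit sphere in R^n
  problem.cost  = @(x) -x'*(A*x);          % minimizers are dominant eigenvectors
  problem.egrad = @(x) -2*(A*x);           % Euclidean gradient
  problem.ehess = @(x, xdot) -2*(A*xdot);  % Euclidean Hessian along xdot

  options.tolgradnorm = 1e-8;  % stopping tolerance on the gradient norm
  options.maxiter = 500;       % cap on the number of outer iterations
  options.verbosity = 1;       % reduce the amount of text output

  [x, xcost, info] = trustregions(problem, [], options);

  % Inspect the run, e.g., plot the successive gradient norms:
  semilogy([info.iter], [info.gradnorm], '.-');

Since ehess is supplied here, the solver uses that Hessian; omitting it
would exercise the finite-difference fallback mentioned above.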

- StoreDB Class to manage the store database (storedb) used by solvers to cache computations at points.
- applyStatsfun Apply the statsfun function to a stats structure (for solvers).
- canGetApproxGradient Checks whether an approximate gradient can be computed for this problem.
- canGetApproxHessian Checks whether an approximate Hessian can be computed for this problem.
- canGetCost Checks whether the cost function can be computed for a problem structure.
- canGetGradient Checks whether the gradient can be computed for a problem structure.
- canGetHessian Checks whether the Hessian can be computed for a problem structure.
- getCost Computes the cost function at x.
- getCostGrad Computes the cost function and the gradient at x in one call if possible.
- getDirectionalDerivative Computes the directional derivative of the cost function at x along d.
- getGlobalDefaults Returns a structure with default option values for Manopt.
- getGradient Computes the gradient of the cost function at x.
- getHessian Computes the Hessian of the cost function at x along d.
- mergeOptions Merges two options structures with one having precedence over the other.
- stoppingcriterion Checks for standard stopping criteria, as a helper to solvers.
- approxgradientFD Gradient approximation function handle based on finite differences of the cost.
- approxhessianFD Hessian approximation function handle based on finite differences of the gradient.
- tCG Truncated (Steihaug-Toint) Conjugate-Gradient method.
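The fallback wiring that trustregions performs internally (visible in the
source below) can also be reproduced by hand with the helpers in this
list. A small sketch, using only calls listed above, that attaches the
finite-difference approximations explicitly so that the corresponding
warnings never trigger:

  % Sketch: replicate the solver's own fallbacks ahead of time.
  if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
      problem.approxgrad = approxgradientFD(problem);  % FD of the cost (slow)
  end
  if ~canGetHessian(problem) && ~canGetApproxHessian(problem)
      problem.approxhess = approxhessianFD(problem);   % FD of the gradient
  end
  [x, cost, info] = trustregions(problem);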

- dominant_invariant_subspace Returns an orthonormal basis of the dominant invariant p-subspace of A.
- dominant_invariant_subspace_complex Returns a unitary basis of the dominant invariant p-subspace of A.
- doubly_stochastic_denoising Find a doubly stochastic matrix closest to a given matrix, in Frobenius norm.
- elliptope_SDP Solver for semidefinite programs (SDPs) with unit diagonal constraints.
- elliptope_SDP_complex Solver for complex semidefinite programs (SDPs) with unit diagonal.
- essential_svd Sample solution of an optimization problem on the essential manifold.
- generalized_eigenvalue_computation Returns orthonormal basis of the dominant invariant p-subspace of B^-1 A.
- generalized_procrustes Rotationally align clouds of points (generalized Procrustes problem).
- low_rank_dist_completion Perform low-rank distance matrix completion with automatic rank detection.
- low_rank_matrix_completion Given partial observation of a low rank matrix, attempts to complete it.
- low_rank_tensor_completion Given partial observation of a low rank tensor, attempts to complete it.
- maxcut Algorithm to (try to) compute a maximum cut of a graph, via SDP approach.
- radio_interferometric_calibration Returns the gain matrices of N stations with K receivers.
- robust_pca Computes a robust version of PCA (principal component analysis) on data.
- shapefit_smoothed ShapeFit formulation for sensor network localization from pair directions.
- sparse_pca Sparse principal component analysis based on optimization over Stiefel.
- truncated_svd Returns the singular value decomposition of A truncated to rank p.
- using_counters Manopt example on how to use counters during optimization.
- using_gpu Manopt example on how to use GPU with manifold factories that allow it.
- centroid Attempts the computation of a centroid of a set of points on a manifold.
- preconhessiansolve Preconditioner based on the inverse Hessian, by solving linear systems.
- hessianextreme Compute an extreme eigenvector / eigenvalue of the Hessian of a problem.
- manoptsolve Gateway helper function to call a Manopt solver, chosen in the options.
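For instance, the first example above might be invoked as follows. This
is a sketch under the assumption that its signature matches its one-line
description (inputs A and p, output an orthonormal basis X); the
signature has not been verified against the example file itself.

  % Hypothetical call, assuming the signature dominant_invariant_subspace(A, p):
  n = 500;
  p = 5;
  A = randn(n);
  A = (A + A')/2;                         % symmetrize the data matrix
  X = dominant_invariant_subspace(A, p);  % expected: n-by-p, X'*X = eye(p)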

function [x, cost, info, options] = trustregions(problem, x, options)
% Riemannian trust-regions solver for optimization on manifolds.

% An explicit, general listing of this algorithm, with preconditioning,
% can be found in the following paper:
% @Article{boumal2015lowrank,
%   Title   = {Low-rank matrix completion via preconditioned optimization on the {G}rassmann manifold},
%   Author  = {Boumal, N. and Absil, P.-A.},
%   Journal = {Linear Algebra and its Applications},
%   Year    = {2015},
%   Pages   = {200--239},
%   Volume  = {475},
%   Doi     = {10.1016/j.laa.2015.02.027},
% }

% When the Hessian is not specified, it is approximated with finite
% differences of the gradient. The resulting method is called RTR-FD.
% Some convergence theory for it is available in this paper:
% @incollection{boumal2015rtrfd,
%   author    = {Boumal, N.},
%   title     = {Riemannian trust regions with finite-difference Hessian approximations are globally convergent},
%   year      = {2015},
%   booktitle = {Geometric Science of Information}
% }

% This file is part of Manopt: www.manopt.org.
% This code is an adaptation to Manopt of the original GenRTR code:
% RTR - Riemannian Trust-Region
% (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
% Florida State University, School of Computational Science
% (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
% See accompanying license file.
% The adaptation was executed by Nicolas Boumal.
%
% Change log:
%
%   NB April 3, 2013:
%       tCG now returns the Hessian along the returned direction eta, so
%       that we do not compute that Hessian redundantly: some savings at
%       each iteration. Similarly, if the useRand flag is on, we spare an
%       extra Hessian computation at each outer iteration too, owing to
%       some modifications in the Cauchy point section of the code
%       specific to useRand = true.
%
%   NB Aug. 22, 2013:
%       This function is now Octave compatible. The transition called for
%       two changes which would otherwise not be advisable. (1) tic/toc
%       is now used as is, as opposed to the safer way:
%           t = tic(); elapsed = toc(t);
%       And (2), the (formerly inner) function savestats was moved
%       outside the main function so that it is no longer nested. This is
%       arguably less elegant, but Octave does not (and likely will not)
%       support nested functions.
%
%   NB Dec. 2, 2013:
%       The in-code documentation was largely revised and expanded.
%
%   NB Dec. 2, 2013:
%       The former heuristic which triggered when rhonum was very small
%       and forced rho = 1 has been replaced by a smoother heuristic
%       which consists in regularizing rhonum and rhoden before computing
%       their ratio. It is tunable via options.rho_regularization.
%       Furthermore, the solver now detects if tCG did not obtain a model
%       decrease (which is theoretically impossible but may happen
%       because of numerical errors and/or because of a nonlinear or
%       nonsymmetric Hessian operator, which is the case for
%       finite-difference approximations). When such an anomaly is
%       detected, the step is rejected and the trust-region radius is
%       decreased.
%       Feb. 18, 2015 note: this is less useful now, as tCG now
%       guarantees model decrease even for the finite-difference
%       approximation of the Hessian. It is still useful in case of
%       numerical errors, but this is less stringent.
%
%   NB Dec. 3, 2013:
%       The stepsize is now registered at each iteration, at a small
%       additional cost. The defaults for Delta_bar and Delta0 are better
%       defined. Setting Delta_bar in the options will automatically set
%       Delta0 accordingly. In Manopt 1.0.4, the defaults for these
%       options were not treated appropriately because of an incorrect
%       use of the isfield() built-in function.
%
%   NB Feb. 18, 2015:
%       Added some comments. Also, Octave now supports safe tic/toc
%       usage, so we reverted the changes to use that again (see the
%       Aug. 22, 2013 log entry).
%
%   NB April 3, 2015:
%       Works with the new StoreDB class system.
%
%   NB April 8, 2015:
%       No Hessian warning if an approximate Hessian is explicitly
%       available.
%
%   NB Nov. 1, 2016:
%       Now uses an approximate gradient via finite differences if need
%       be.
%
%   NB Aug. 2, 2018:
%       Using storedb.remove() to keep the cache lean, which allowed the
%       default storedepth to be reduced from 20 to 2.

M = problem.M;

% Verify that the problem description is sufficient for the solver.
if ~canGetCost(problem)
    warning('manopt:getCost', ...
            'No cost provided. The algorithm will likely abort.');
end
if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
    % Note: we do not give a warning if an approximate gradient is
    % explicitly given in the problem description, as in that case the
    % user seems to be aware of the issue.
    warning('manopt:getGradient:approx', ...
           ['No gradient provided. Using an FD approximation instead (slow).\n' ...
            'It may be necessary to increase options.tolgradnorm.\n' ...
            'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
    problem.approxgrad = approxgradientFD(problem);
end
if ~canGetHessian(problem) && ~canGetApproxHessian(problem)
    % Note: we do not give a warning if an approximate Hessian is
    % explicitly given in the problem description, as in that case the
    % user seems to be aware of the issue.
    warning('manopt:getHessian:approx', ...
           ['No Hessian provided. Using an FD approximation instead.\n' ...
            'To disable this warning: warning(''off'', ''manopt:getHessian:approx'')']);
    problem.approxhess = approxhessianFD(problem);
end

% Define some strings for display.
tcg_stop_reason = {'negative curvature', ...
                   'exceeded trust region', ...
                   'reached target residual-kappa (linear)', ...
                   'reached target residual-theta (superlinear)', ...
                   'maximum inner iterations', ...
                   'model increased'};

% Set local defaults here.
localdefaults.verbosity = 2;
localdefaults.maxtime = inf;
localdefaults.miniter = 3;
localdefaults.maxiter = 1000;
localdefaults.mininner = 1;
localdefaults.maxinner = M.dim();
localdefaults.tolgradnorm = 1e-6;
localdefaults.kappa = 0.1;
localdefaults.theta = 1.0;
localdefaults.rho_prime = 0.1;
localdefaults.useRand = false;
localdefaults.rho_regularization = 1e3;

% Merge global and local defaults, then merge with user options, if any.
localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
if ~exist('options', 'var') || isempty(options)
    options = struct();
end
options = mergeOptions(localdefaults, options);

% Set default Delta_bar and Delta0 separately to deal with additional
% logic: if Delta_bar is provided but not Delta0, let Delta0 automatically
% be some fraction of the provided Delta_bar.
if ~isfield(options, 'Delta_bar')
    if isfield(M, 'typicaldist')
        options.Delta_bar = M.typicaldist();
    else
        options.Delta_bar = sqrt(M.dim());
    end
end
if ~isfield(options, 'Delta0')
    options.Delta0 = options.Delta_bar / 8;
end

% Check some option values.
assert(options.rho_prime < 1/4, ...
       'options.rho_prime must be strictly smaller than 1/4.');
assert(options.Delta_bar > 0, ...
       'options.Delta_bar must be positive.');
assert(options.Delta0 > 0 && options.Delta0 < options.Delta_bar, ...
       'options.Delta0 must be positive and smaller than Delta_bar.');

% It is sometimes useful to check what the actual option values are.
if options.verbosity >= 3
    disp(options);
end

ticstart = tic();

% If no initial point x is given by the user, generate one at random.
if ~exist('x', 'var') || isempty(x)
    x = M.rand();
end

% Create a store database and get a key for the current x.
storedb = StoreDB(options.storedepth);
key = storedb.getNewKey();

%% Initializations

% k counts the outer (TR) iterations. The semantic is that k counts the
% number of iterations fully executed so far.
k = 0;

% Initialize solution and companion measures: f(x), fgrad(x).
[fx, fgradx] = getCostGrad(problem, x, storedb, key);
norm_grad = M.norm(x, fgradx);

% Initialize the trust-region radius.
Delta = options.Delta0;

% Save stats in a struct-array info, and preallocate.
if ~exist('used_cauchy', 'var')
    used_cauchy = [];
end
stats = savestats(problem, x, storedb, key, options, k, fx, ...
                  norm_grad, Delta, ticstart);
info(1) = stats;
info(min(10000, options.maxiter+1)).iter = [];

% ** Display:
if options.verbosity == 2
    fprintf(['%3s %3s   %5s   %5s   ', ...
             'f: %+e   |grad|: %e\n'], ...
            ' ', ' ', ' ', ' ', fx, norm_grad);
elseif options.verbosity > 2
    fprintf('************************************************************************\n');
    fprintf('%3s %3s   k: %5s   num_inner: %5s   %s\n', ...
            '', '', '______', '______', '');
    fprintf('       f(x) : %+e   |grad| : %e\n', fx, norm_grad);
    fprintf('      Delta : %f\n', Delta);
end

% To keep track of consecutive radius changes, so that we can warn the
% user if it appears necessary.
consecutive_TRplus = 0;
consecutive_TRminus = 0;


% **********************
% ** Start of TR loop **
% **********************
while true

    % Start the clock for this outer iteration.
    ticstart = tic();

    % Run standard stopping criterion checks.
    [stop, reason] = stoppingcriterion(problem, x, options, info, k+1);

    % If the stopping criterion that triggered is the tolerance on the
    % gradient norm but we are using randomization, make sure we make at
    % least miniter iterations to give randomization a chance at escaping
    % saddle points.
    if stop == 2 && options.useRand && k < options.miniter
        stop = 0;
    end

    if stop
        if options.verbosity >= 1
            fprintf([reason '\n']);
        end
        break;
    end

    if options.verbosity > 2 || options.debug > 0
        fprintf('************************************************************************\n');
    end

    % *************************
    % ** Begin TR Subproblem **
    % *************************

    % Determine eta0.
    if ~options.useRand
        % Pick the zero vector.
        eta = M.zerovec(x);
    else
        % Random vector in T_x M (this has to be very small).
        eta = M.lincomb(x, 1e-6, M.randvec(x));
        % Must be inside the trust region.
        while M.norm(x, eta) > Delta
            eta = M.lincomb(x, sqrt(sqrt(eps)), eta);
        end
    end

    % Solve the TR subproblem approximately.
    [eta, Heta, numit, stop_inner] = ...
                 tCG(problem, x, fgradx, eta, Delta, options, storedb, key);
    srstr = tcg_stop_reason{stop_inner};

    % If using the randomized approach, compare the result with the
    % Cauchy point. Convergence proofs assume that we achieve at least
    % (a fraction of) the reduction of the Cauchy point. After this
    % if-block, either all eta-related quantities have been changed
    % consistently, or none of them have changed.
    if options.useRand
        used_cauchy = false;
        % Check the curvature,
        Hg = getHessian(problem, x, fgradx, storedb, key);
        g_Hg = M.inner(x, fgradx, Hg);
        if g_Hg <= 0
            tau_c = 1;
        else
            tau_c = min( norm_grad^3/(Delta*g_Hg) , 1);
        end
        % and generate the Cauchy point.
        eta_c  = M.lincomb(x, -tau_c * Delta / norm_grad, fgradx);
        Heta_c = M.lincomb(x, -tau_c * Delta / norm_grad, Hg);

        % Now that we have computed the Cauchy point in addition to the
        % returned eta, we might as well keep the best of them.
        mdle  = fx + M.inner(x, fgradx, eta)   + .5*M.inner(x, Heta,   eta);
        mdlec = fx + M.inner(x, fgradx, eta_c) + .5*M.inner(x, Heta_c, eta_c);
        if mdlec < mdle
            eta = eta_c;
            Heta = Heta_c; % added April 11, 2012
            used_cauchy = true;
        end
    end


    % This is only computed for logging purposes, because it may be
    % useful for some user-defined stopping criteria. If this is not
    % cheap for specific applications (compared to evaluating the cost),
    % we should reconsider this.
    norm_eta = M.norm(x, eta);

    if options.debug > 0
        testangle = M.inner(x, eta, fgradx) / (norm_eta*norm_grad);
    end


    % Compute the tentative next iterate (the proposal).
    x_prop = M.retr(x, eta);
    key_prop = storedb.getNewKey();

    % Compute the function value of the proposal.
    fx_prop = getCost(problem, x_prop, storedb, key_prop);

    % Will we accept the proposal or not? Check the performance of the
    % quadratic model against the actual cost.
    rhonum = fx - fx_prop;
    vecrho = M.lincomb(x, 1, fgradx, .5, Heta);
    rhoden = -M.inner(x, eta, vecrho);
    % rhonum could be anything.
    % rhoden should be nonnegative, as guaranteed by tCG, barring
    % numerical errors.

    % Heuristic -- added Dec. 2, 2013 (NB) to replace the former
    % heuristic. This heuristic is documented in the book by Conn, Gould
    % and Toint on trust-region methods, section 17.4.2.
    % rhonum measures the difference between two numbers.
    % Close to convergence, these two numbers are very close to each
    % other, so that computing their difference is numerically
    % challenging: there may be a significant loss in accuracy. Since the
    % acceptance or rejection of the step is conditioned on the ratio
    % between rhonum and rhoden, large errors in rhonum result in a very
    % large error in rho, hence in erratic acceptance / rejection.
    % Meanwhile, close to convergence, steps are usually trustworthy and
    % we should transition to a Newton-like method, with rho = 1
    % consistently. The heuristic thus shifts both rhonum and rhoden by a
    % small amount such that far from convergence, the shift is
    % irrelevant and close to convergence, the ratio rho goes to 1,
    % effectively promoting acceptance of the step.
    % The rationale is that close to convergence, both rhonum and rhoden
    % are quadratic in the distance between x and x_prop. Thus, when this
    % distance is on the order of sqrt(eps), the value of rhonum and
    % rhoden is on the order of eps, which is indistinguishable from the
    % numerical error, resulting in badly estimated rho's.
    % For abs(fx) < 1, this heuristic is invariant under offsets of f but
    % not under scaling of f. For abs(fx) > 1, the opposite holds. This
    % should not alarm us, as this heuristic only triggers at the very
    % last iterations if very fine convergence is demanded.
    rho_reg = max(1, abs(fx)) * eps * options.rho_regularization;
    rhonum = rhonum + rho_reg;
    rhoden = rhoden + rho_reg;

    if options.debug > 0
        fprintf('DBG:     rhonum : %e\n', rhonum);
        fprintf('DBG:     rhoden : %e\n', rhoden);
    end

    % This is always true if a linear, symmetric operator is used for the
    % Hessian (approximation) and if we had infinite numerical precision.
    % In practice, nonlinear approximations of the Hessian such as the
    % built-in finite-difference approximation and finite numerical
    % accuracy can cause the model to increase. In such scenarios, we
    % decide to force a rejection of the step and a reduction of the
    % trust-region radius. We test the sign of the regularized rhoden
    % since the regularization is supposed to capture the accuracy to
    % which rhoden is computed: if rhoden were negative before
    % regularization but not after, that should not be (and is not)
    % detected as a failure.
    %
    % Note (Feb. 17, 2015, NB): the most recent version of tCG already
    % includes a mechanism to ensure model decrease if the Cauchy step
    % attained a decrease (which is theoretically the case under very lax
    % assumptions). This being said, it is always possible that numerical
    % errors will prevent this, so that it is good to keep a safeguard.
    %
    % The current strategy is that, if this should happen, then we reject
    % the step and reduce the trust-region radius. This also ensures that
    % the actual cost values are monotonically decreasing.
    %
    % [This bit of code seems not to trigger because tCG already ensures
    % the model decreases even in the presence of nonlinearities; but as
    % a result the radius is not necessarily decreased. Perhaps we should
    % change this with the proposed commented line below; needs testing.]
    %
    model_decreased = (rhoden >= 0);
    % model_decreased = (rhoden >= 0) && (stop_inner ~= 6);

    if ~model_decreased
        srstr = [srstr ', model did not decrease']; %#ok<AGROW>
    end

    rho = rhonum / rhoden;

    % Added June 30, 2015 following an observation by BM.
    % With this modification, it is guaranteed that a step rejection is
    % always accompanied by a TR reduction. This prevents stagnation in
    % this "corner case" (NaNs really aren't supposed to occur, but it's
    % nice if we can handle them nonetheless).
    if isnan(rho)
        fprintf('rho is NaN! Forcing a radius decrease. This should not happen.\n');
        if isnan(fx_prop)
            fprintf('The cost function returned NaN (perhaps the retraction returned a bad point?)\n');
        else
            fprintf('The cost function did not return a NaN value.\n');
        end
    end

    if options.debug > 0
        m = @(x, eta) ...
              getCost(problem, x, storedb, key) + ...
              getDirectionalDerivative(problem, x, eta, storedb, key) + ...
              .5*M.inner(x, getHessian(problem, x, eta, storedb, key), eta);
        zerovec = M.zerovec(x);
        actrho = (fx - fx_prop) / (m(x, zerovec) - m(x, eta));
        fprintf('DBG:   new f(x) : %+e\n', fx_prop);
        fprintf('DBG: actual rho : %e\n', actrho);
        fprintf('DBG:   used rho : %e\n', rho);
    end

    % Choose the new TR radius based on the model performance.
    trstr = '   ';
    % If the actual decrease is smaller than 1/4 of the predicted
    % decrease, then reduce the TR radius.
    if rho < 1/4 || ~model_decreased || isnan(rho)
        trstr = 'TR-';
        Delta = Delta/4;
        consecutive_TRplus = 0;
        consecutive_TRminus = consecutive_TRminus + 1;
        if consecutive_TRminus >= 5 && options.verbosity >= 2
            consecutive_TRminus = -inf;
            fprintf(' +++ Detected many consecutive TR- (radius decreases).\n');
            fprintf(' +++ Consider decreasing options.Delta_bar by an order of magnitude.\n');
            fprintf(' +++ Current values: options.Delta_bar = %g and options.Delta0 = %g.\n', options.Delta_bar, options.Delta0);
        end
    % If the actual decrease is at least 3/4 of the predicted decrease
    % and the tCG (inner solve) hit the TR boundary, increase the TR
    % radius. We also keep track of the number of consecutive
    % trust-region radius increases. If there are many, this may indicate
    % the need to adapt the initial and maximum radii.
    elseif rho > 3/4 && (stop_inner == 1 || stop_inner == 2)
        trstr = 'TR+';
        Delta = min(2*Delta, options.Delta_bar);
        consecutive_TRminus = 0;
        consecutive_TRplus = consecutive_TRplus + 1;
        if consecutive_TRplus >= 5 && options.verbosity >= 1
            consecutive_TRplus = -inf;
            fprintf(' +++ Detected many consecutive TR+ (radius increases).\n');
            fprintf(' +++ Consider increasing options.Delta_bar by an order of magnitude.\n');
            fprintf(' +++ Current values: options.Delta_bar = %g and options.Delta0 = %g.\n', options.Delta_bar, options.Delta0);
        end
    else
        % Otherwise, keep the TR radius constant.
        consecutive_TRplus = 0;
        consecutive_TRminus = 0;
    end

    % Choose to accept or reject the proposed step based on the model
    % performance. Note the strict inequality.
    if model_decreased && rho > options.rho_prime

        % April 17, 2018: a side effect of rho_regularization > 0 is that
        % it can happen that the cost function appears to go up (although
        % only by a small amount) for some accepted steps.
        % We decide to accept this because, numerically, computing the
        % difference between fx_prop and fx is more difficult than
        % computing the improvement in the model, because fx_prop and fx
        % are on the same order of magnitude yet are separated by a very
        % small gap near convergence, whereas the model improvement is
        % computed as a sum of two small terms. As a result, the step
        % which seems bad may turn out to be good, in that it may help
        % reduce the gradient norm, for example. This update merely
        % informs the user of this event. In further updates, we could
        % also introduce this as a stopping criterion. It is then
        % important to choose wisely which of x or x_prop should be
        % returned (perhaps the one with the smallest gradient?)
        if fx_prop > fx && options.verbosity >= 2
            fprintf(['Between line above and below, cost function ' ...
                     'increased by %g (step size: %g)\n'], ...
                    fx_prop - fx, norm_eta);
        end

        accept = true;
        accstr = 'acc';
        % We accept the step: no need to keep the old cache.
        storedb.removefirstifdifferent(key, key_prop);
        x = x_prop;
        key = key_prop;
        fx = fx_prop;
        fgradx = getGradient(problem, x, storedb, key);
        norm_grad = M.norm(x, fgradx);
    else
        % We reject the step: no need to keep the cache related to the
        % tentative step.
        storedb.removefirstifdifferent(key_prop, key);
        accept = false;
        accstr = 'REJ';
    end

    % k is the number of iterations we have accomplished.
    k = k + 1;

    % Make sure we do not use too much memory for the store database.
    storedb.purge();


    % Log statistics for the freshly executed iteration.
    % Everything after this in the loop is not accounted for in the
    % timing.
    stats = savestats(problem, x, storedb, key, options, k, fx, ...
                      norm_grad, Delta, ticstart, info, rho, rhonum, ...
                      rhoden, accept, numit, norm_eta, used_cauchy);
    info(k+1) = stats;


    % ** Display:
    if options.verbosity == 2
        fprintf(['%3s %3s   k: %5d   num_inner: %5d   ', ...
                 'f: %+e   |grad|: %e   %s\n'], ...
                accstr, trstr, k, numit, fx, norm_grad, srstr);
    elseif options.verbosity > 2
        if options.useRand && used_cauchy
            fprintf('USED CAUCHY POINT\n');
        end
        fprintf('%3s %3s   k: %5d   num_inner: %5d   %s\n', ...
                accstr, trstr, k, numit, srstr);
        fprintf('       f(x) : %+e   |grad| : %e\n', fx, norm_grad);
        if options.debug > 0
            fprintf('      Delta : %f   |eta| : %e\n', Delta, norm_eta);
        end
        fprintf('        rho : %e\n', rho);
    end
    if options.debug > 0
        fprintf('DBG: cos ang(eta, gradf): %d\n', testangle);
        if rho == 0
            fprintf('DBG: rho = 0, this will likely hinder further convergence.\n');
        end
    end

end % of TR loop (counter: k)

% Restrict the info struct-array to its useful part.
info = info(1:k+1);


if (options.verbosity > 2) || (options.debug > 0)
    fprintf('************************************************************************\n');
end
if (options.verbosity > 0) || (options.debug > 0)
    fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
end

% Return the cost at the last point reached.
cost = fx;

end


% Routine in charge of collecting the current iteration statistics.
function stats = savestats(problem, x, storedb, key, options, k, fx, ...
                           norm_grad, Delta, ticstart, info, rho, rhonum, ...
                           rhoden, accept, numit, norm_eta, used_cauchy)
    stats.iter = k;
    stats.cost = fx;
    stats.gradnorm = norm_grad;
    stats.Delta = Delta;
    if k == 0
        stats.time = toc(ticstart);
        stats.rho = inf;
        stats.rhonum = NaN;
        stats.rhoden = NaN;
        stats.accepted = true;
        stats.numinner = 0;
        stats.stepsize = NaN;
        if options.useRand
            stats.cauchy = false;
        end
    else
        stats.time = info(k).time + toc(ticstart);
        stats.rho = rho;
        stats.rhonum = rhonum;
        stats.rhoden = rhoden;
        stats.accepted = accept;
        stats.numinner = numit;
        stats.stepsize = norm_eta;
        if options.useRand
            stats.cauchy = used_cauchy;
        end
    end

    % See the comment about statsfun above: the x and store passed to
    % statsfun are those of the most recently accepted point after the
    % iteration fully executed.
    stats = applyStatsfun(problem, x, storedb, key, options, stats);

end
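To complement the listing, here is a minimal sketch of the statsfun and
stopfun hooks documented in the help text above. The three-argument
statsfun and four-argument stopfun signatures follow the generic Manopt
solver documentation; xnorm is a hypothetical extra statistic and assumes
the points x are arrays for which norm(x, 'fro') makes sense.

  % Sketch: log an extra statistic and add a custom stopping criterion.
  options = struct();

  % Record the Frobenius norm of the last-reached point at each iteration
  % (hypothetical statistic; see the caveat above about statsfun and the
  % store).
  options.statsfun = @(problem, x, stats) ...
                            setfield(stats, 'xnorm', norm(x, 'fro'));

  % Stop early if the gradient norm decreased by less than 10% over the
  % last 10 outer iterations.
  options.stopfun = @(problem, x, info, last) ...
         last > 10 && info(last).gradnorm > 0.9*info(last-10).gradnorm;

  [x, cost, info] = trustregions(problem, [], options);
  [info.xnorm]   % one logged value per outer iteration, as with gradnorm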
