function trsoutput = trs_tCG_cached(problem, trsinput, options, storedb, key)
% Truncated (Steihaug-Toint) Conjugate-Gradient method with caching.
%
% minimize <eta,grad> + .5*<eta,Hess(eta)>
% subject to <eta,eta>_[inverse precon] <= Delta^2
%
% function trsoutput = trs_tCG_cached(problem, trsinput, options, storedb, key)
%
% trs_tCG_cached stores information (when options.trscache == true)
% which can help avoid redundant computations (using tCG_rejectedstep)
% upon step rejection by trustregions, compared to trs_tCG, at the cost
% of using extra memory.
%
% Inputs:
%   problem: Manopt optimization problem structure
%   trsinput: structure with the following fields:
%       x: point on the manifold problem.M
%       fgradx: gradient of the cost function of the problem at x
%       Delta: trust-region radius
%   options: structure containing options for the subproblem solver
%   storedb, key: Manopt's caching system for the point x
%
% Options specific to this subproblem solver:
%   kappa (0.1)
%       kappa convergence tolerance.
%       kappa > 0 is the linear convergence target rate: trs_tCG_cached
%       terminates early if the residual was reduced by a factor of
%       kappa.
%   theta (1.0)
%       theta convergence tolerance.
%       1+theta (theta between 0 and 1) is the superlinear convergence
%       target rate. trs_tCG_cached terminates early if the residual
%       was reduced by a power of 1+theta.
%   mininner (1)
%       Minimum number of inner iterations.
%   maxinner (problem.M.dim())
%       Maximum number of inner iterations.
%   trscache (true)
%       Set to false if no caching for trs_tCG_cached is desired. The
%       default is true, to improve computation time when there are many
%       step rejections in trustregions. Setting trscache to false can
%       reduce memory usage.
%   memorytCG_warningtol (1000)
%       Tolerance memory value, in MB, before issuing a warning when
%       trscache = true.
%       The default is 1000 MB (about 1 GB), but this value can be
%       increased depending on the user's machine. To disable the
%       warning completely, use:
%       warning('off', 'manopt:trs_tCG_cached:memory')
%
% Output: the structure trsoutput contains the following fields:
%   eta: approximate solution to the trust-region subproblem at x
%   Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
%       is often naturally available to the subproblem solver at the
%       end of execution, so that it may be cheaper to return it here.
%   limitedbyTR: true if a boundary solution is returned
%   printstr: logged information to be printed by trustregions.
%   stats: structure with the following statistics:
%       numinner: number of inner loops before returning
%       hessvecevals: number of Hessian calls issued
%       memorytCG_MB: memory of store_iters and store_last in MB
%
% Stored information:
%   store_iters: a struct array with enough information to compute the
%       next step upon step rejection, used when the algorithm exits due
%       to negative curvature or trust-region radius violation.
%   store_last: an additional struct, complementing store_iters, to
%       compute the next step upon step rejection when the algorithm
%       exits, but not due to negative curvature or trust-region radius
%       violation.
%
%
% trs_tCG_cached can also be called in the following way (by trustregions)
% to obtain part of the header to print and an initial stats structure:
%
% function trsoutput = trs_tCG_cached([], [], options)
%
% In this case, trsoutput contains the following fields:
%   printheader: subproblem header to be printed before the first pass
%       of trustregions
%   initstats: struct with initial values for stored stats in subsequent
%       calls to trs_tCG_cached.
%       Used in the first call to savestats
%       in trustregions to initialize the info struct properly.
%
% See also: trustregions trs_tCG tCG_rejectedstep

% This file is part of Manopt: www.manopt.org.
% This code is an adaptation to Manopt of the original GenRTR code:
% RTR - Riemannian Trust-Region
% (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
% Florida State University
% School of Computational Science
% (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
% See accompanying license file.
% The adaptation was executed by Nicolas Boumal.
%
% Change log:
%
%   VL June 24, 2022:
%       trs_tCG_cached by default stores information at each iteration,
%       unlike trs_tCG.
%       This can be useful for the next call to trs_tCG_cached, and the
%       work is passed to tCG_rejectedstep rather than the normal tCG
%       loop.


% See trs_tCG for references to relevant equations in
% [CGT2000] Conn, Gould and Toint: Trust-region methods, 2000.

% trustregions only wants the header and default values for the stats.
if nargin == 3 && isempty(problem) && isempty(trsinput)
    trsoutput.printheader = '';
    if options.verbosity == 2
        trsoutput.printheader = sprintf('%9s %9s %9s %s', ...
                                'numinner', 'hessvec', 'numstored', ...
                                'stopreason');
    elseif options.verbosity > 2
        trsoutput.printheader = sprintf('%9s %9s %9s %9s %s', ...
                                'numinner', 'hessvec', 'numstored', ...
                                'memtCG_MB', 'stopreason');
    end
    trsoutput.initstats = struct('numinner', 0, 'hessvecevals', 0, ...
                                 'memorytCG_MB', 0);
    return;
end

if isfield(options, 'useRand') && options.useRand
    warning('manopt:trs_tCG_cached:rand', ...
            ['options.useRand = true but @trs_tCG_cached ignores it.\n' ...
             'You may set options.subproblemsolver = @trs_tCG;\n', ...
             'Alternatively, set options.useRand = false;']);
end

x = trsinput.x;
Delta = trsinput.Delta;
grad = trsinput.fgradx;

inner = @(u, v) problem.M.inner(x, u, v);
lincomb = @(a, u, b, v) problem.M.lincomb(x, a, u, b, v);
tangent = @(u) problem.M.tangent(x, u);

% Set local defaults here
localdefaults.kappa = 0.1;
localdefaults.theta = 1.0;
localdefaults.mininner = 1;
localdefaults.maxinner = problem.M.dim();
localdefaults.trscache = true;
localdefaults.memorytCG_warningtol = 1000;

% Merge local defaults with user options, if any
if ~exist('options', 'var') || isempty(options)
    options = struct();
end
options = mergeOptions(localdefaults, options);

% If the previous step was rejected and we want to use caching,
if ~trsinput.accept && options.trscache
    % then check if there is cached information for the current point.
    store = storedb.get(key);
    if isfield(store, 'store_iters')
        % If so, use that cache to produce the same output as would have
        % been produced by running the code below (after the 'return'),
        % but without issuing Hessian-vector calls that were already
        % issued.
        trsoutput = tCG_rejectedstep(problem, trsinput, options, store);
        return;
    end
end

% Boolean returned to trustregions: true if we are limited by the TR
% boundary (a boundary solution is returned); otherwise false.
limitedbyTR = false;

theta = options.theta;
kappa = options.kappa;

eta = problem.M.zerovec(x);
Heta = problem.M.zerovec(x);
r = grad;
e_Pe = 0;

r_r = inner(r, r);
norm_r = sqrt(r_r);
norm_r0 = norm_r;

% Precondition the residual.
z = getPrecon(problem, x, r, storedb, key);

% Compute z'*r.
z_r = inner(z, r);
d_Pd = z_r;

% Initial search direction (we maintain -delta in memory, called mdelta,
% to avoid a change of sign of the tangent vector).
mdelta = z;
e_Pd = 0;

% If the Hessian or a linear Hessian approximation is in use, it is
% theoretically guaranteed that the model value decreases strictly with
% each iteration of trs_tCG. Hence, there is no need to monitor the
% model value. But when a nonlinear Hessian approximation is used (such
% as the built-in finite-difference approximation, for example), the
% model may increase. It is then important to terminate the trs_tCG
% iterations and return the previous (best-so-far) iterate. The variable
% below holds the model value.
%
% This computation could be further improved based on Section 17.4.1 in
% Conn, Gould, Toint, Trust Region Methods, 2000.
% If we make this change, then also modify trustregions to gather this
% value from trs_tCG rather than recomputing it itself.
model_fun = @(eta, Heta) inner(eta, grad) + .5*inner(eta, Heta);
model_value = 0;

% Pre-assume termination because j == end.
stopreason_str = 'maximum inner iterations';

% Track certain iterations in case the step is rejected.
% store_iters tracks candidate etas with increasing squared norm,
% relevant when limitedbyTR = true, or when <eta, Heta> <= 0.
store_iters = struct('normsq', [], 'numinner', [], 'e_Pe', [], ...
    'd_Pd', [], 'e_Pd', [], 'd_Hd', [], 'eta', [], 'Heta', [], ...
    'mdelta', [], 'Hmdelta', []);

max_normsq = 0;

% We only need to compute the memory of one item in store_iters, in
% megabytes (MB).
peritermemory_MB = 0;

% Total cached memory stored, in MB.
memorytCG_MB = 0;

% Number of iterations where trs_tCG_cached stores information. This
% value will be length(store_iters), plus 1 if store_last is used.
numstored = 0;

% String that is printed by trustregions, for per-iteration information.
printstr = '';

% Begin inner/trs_tCG loop.
for j = 1 : options.maxinner

    % This call is the computationally expensive step.
    Hmdelta = getHessian(problem, x, mdelta, storedb, key);

    % Compute curvature (often called kappa).
    d_Hd = inner(mdelta, Hmdelta);


    % Note that if d_Hd == 0, we will exit at the next "if" anyway.
    alpha = z_r/d_Hd;
    % <neweta,neweta>_P =
    % <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
    e_Pe_new = e_Pe + 2.0*alpha*e_Pd + alpha*alpha*d_Pd;

    if options.debug > 2
        fprintf('DBG: (r,r)  : %e\n', r_r);
        fprintf('DBG: (d,Hd) : %e\n', d_Hd);
        fprintf('DBG: alpha  : %e\n', alpha);
    end

    if options.trscache
        % Selectively store info in store_iters.
        % next_smallest = (1/4^n Delta)^2 with n the smallest integer
        % such that max_normsq <= next_smallest.
        % We use this condition to only store relevant iterations in
        % case of rejection in trustregions.
        if max_normsq > 0
            next_smallest = (1/16)^floor(-(1/4)*(log2(max_normsq) - ...
                                        log2(Delta^2))) * Delta^2;
        else
            next_smallest = 0;
        end

        if d_Hd <= 0 || e_Pe_new >= next_smallest
            numstored = numstored + 1;

            store_iters(numstored) = struct('normsq', e_Pe_new, ...
                'numinner', j, 'e_Pe', e_Pe, 'd_Pd', d_Pd, ...
                'e_Pd', e_Pd, 'd_Hd', d_Hd, 'eta', eta, 'Heta', Heta, ...
                'mdelta', mdelta, 'Hmdelta', Hmdelta);
            max_normsq = e_Pe_new;

            % getsize for one entry in store_iters, which will be the
            % same for all others.
            if peritermemory_MB == 0
                peritermemory_MB = getsize(store_iters(numstored))/1024^2;
            end

            memorytCG_MB = memorytCG_MB + peritermemory_MB;

            if memorytCG_MB > options.memorytCG_warningtol
                warning('manopt:trs_tCG_cached:memory', ...
                [sprintf('trs_tCG_cached will cache %.2f [MB] for at least one iteration of trustregions until a step is accepted.\n', memorytCG_MB) ...
                'If memory is limited, turn off caching with options.trscache = false.\n' ...
                'To disable this warning: warning(''off'', ''manopt:trs_tCG_cached:memory'')']);
            end

        end
    end

    % Check against negative curvature and trust-region radius
    % violation. If either condition triggers, we bail out.
    if d_Hd <= 0 || e_Pe_new >= Delta^2
        % We want tau such that
        %   <eta+tau*delta, eta+tau*delta>_P = Delta^2, with
        %   ee = <eta,eta>_prec,x
        %   ed = <eta,delta>_prec,x
        %   dd = <delta,delta>_prec,x
        % Note (Nov. 26, 2021, NB): numerically, it might be better to
        % call
        %   tau = max(real(roots([d_Pd, 2*e_Pd, e_Pe-Delta^2])));
        % This should be checked.
        % Also, we should safeguard against 0/0: this could happen if
        % grad = 0.
        tau = (-e_Pd + sqrt(e_Pd*e_Pd + d_Pd*(Delta^2-e_Pe))) / d_Pd;
        if options.debug > 2
            fprintf('DBG: tau : %e\n', tau);
        end
        eta = lincomb(1, eta, -tau, mdelta);

        % If only a nonlinear Hessian approximation is available, this
        % is only approximately correct, but it saves an additional
        % Hessian call.
        Heta = lincomb(1, Heta, -tau, Hmdelta);

        % Technically, we may want to verify that this new eta is indeed
        % better than the previous eta before returning it (this is
        % always the case if the Hessian approximation is linear, but I
        % am unsure whether it is the case or not for nonlinear
        % approximations.)
        % At any rate, the impact should be limited, so in the interest
        % of code conciseness (if we can still hope for that), we omit
        % this.

        limitedbyTR = true;

        if d_Hd <= 0
            stopreason_str = 'negative curvature';
        else
            stopreason_str = 'exceeded trust region';
        end
        break;
    end

    % No negative curvature and the proposed eta is inside the TR:
    % accept it.
    e_Pe = e_Pe_new;
    new_eta = lincomb(1, eta, -alpha, mdelta);

    % If only a nonlinear Hessian approximation is available, this is
    % only approximately correct, but it saves an additional Hessian
    % call.
    % TODO: this computation is redundant with the residual update
    % below. Clean up.
    new_Heta = lincomb(1, Heta, -alpha, Hmdelta);

    % Verify that the model cost decreased in going from eta to new_eta.
    % If it did not (which can only occur if the Hessian approximation
    % is nonlinear or because of numerical errors), then we return the
    % previous eta (which necessarily is the best reached so far,
    % according to the model cost). Otherwise, we accept the new eta and
    % go on.
    new_model_value = model_fun(new_eta, new_Heta);
    if new_model_value >= model_value
        stopreason_str = 'model increased';
        break;
    end

    eta = new_eta;
    Heta = new_Heta;
    model_value = new_model_value; %% added Feb. 17, 2015

    % Update the residual.
    r = lincomb(1, r, -alpha, Hmdelta);

    % Compute the new norm of r.
    r_r = inner(r, r);
    norm_r = sqrt(r_r);

    % Check the kappa/theta stopping criterion.
    % Note that it is somewhat arbitrary whether to check this stopping
    % criterion on the r's (the gradients) or on the z's (the
    % preconditioned gradients). [CGT2000], page 206, mentions both as
    % acceptable criteria.
    if j >= options.mininner && norm_r <= norm_r0*min(norm_r0^theta, kappa)
        % Residual is small enough to quit.
        if kappa < norm_r0^theta
            stopreason_str = 'reached target residual-kappa (linear)';
        else
            stopreason_str = 'reached target residual-theta (superlinear)';
        end
        break;
    end

    % Precondition the residual.
    z = getPrecon(problem, x, r, storedb, key);

    % Save the old z'*r.
    zold_rold = z_r;
    % Compute the new z'*r.
    z_r = inner(z, r);

    % Compute the new search direction.
    beta = z_r/zold_rold;
    mdelta = lincomb(1, z, beta, mdelta);

    % Since mdelta is passed to getHessian, which is the part of the
    % code we have least control over from here, we want to make sure
    % mdelta is a tangent vector up to numerical errors that should
    % remain small. For this reason, we re-project mdelta to the
    % tangent space. In limited tests, it was observed that it is a
    % good idea to project at every iteration rather than only every k
    % iterations, the reason being that loss of tangency can lead to
    % more inner iterations being run, which leads to an overall higher
    % computational cost.
    mdelta = tangent(mdelta);

    % Update new P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
    e_Pd = beta*(e_Pd + alpha*d_Pd);
    d_Pd = z_r + beta*beta*d_Pd;

end  % of trs_tCG loop

if options.trscache
    store = storedb.get(key);
    store.store_iters = store_iters;
    if ~limitedbyTR
        % Store extra information, since we did not exit because we
        % were limited by the TR (the model value increased or the
        % kappa/theta stopping criterion was satisfied).
        store_last = struct('numinner', j, 'stopreason_str', ...
            stopreason_str, 'eta', eta, 'Heta', Heta);
        memorytCG_MB = memorytCG_MB + getsize(store_last)/1024^2;

        if memorytCG_MB > options.memorytCG_warningtol
            warning('manopt:trs_tCG_cached:memory', ...
            [sprintf('trs_tCG_cached will cache %.2f [MB] for at least one iteration of trustregions until a step is accepted.\n', memorytCG_MB) ...
            'If memory is limited, turn off caching with options.trscache = false.\n' ...
            'If more memory can be used without problem, increase options.memorytCG_warningtol accordingly.\n' ...
            'To disable this warning: warning(''off'', ''manopt:trs_tCG_cached:memory'')']);
        end
        store.store_last = store_last;

        numstored = numstored + 1;
    end

    storedb.set(store, key);
end

stats = struct('numinner', j, 'hessvecevals', j, ...
               'memorytCG_MB', memorytCG_MB);

if options.verbosity == 2
    printstr = sprintf('%9d %9d %9d %s', j, j, numstored, ...
        stopreason_str);
elseif options.verbosity > 2
    printstr = sprintf('%9d %9d %9d %9.2f %s', j, j, numstored, ...
        memorytCG_MB, stopreason_str);
end

trsoutput.eta = eta;
trsoutput.Heta = Heta;
trsoutput.limitedbyTR = limitedbyTR;
trsoutput.printstr = printstr;
trsoutput.stats = stats;
end
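% For reference, the mathematics of the inner loop above (CG steps with
% three exits: negative curvature, trust-region boundary via the tau
% root, and the kappa/theta residual test) can be sketched outside of
% Manopt. The Python/NumPy sketch below is NOT part of this file or of
% Manopt: it is a hedged, minimal translation for the Euclidean case
% with the identity preconditioner (so <.,.>_P is the plain dot
% product); the function name truncated_cg and its signature are
% illustrative only, and the caching machinery is omitted.

```python
import numpy as np

def truncated_cg(H, grad, Delta, theta=1.0, kappa=0.1, maxinner=None):
    # Minimize <eta, grad> + 0.5 * <eta, H @ eta> s.t. ||eta|| <= Delta.
    # Returns (eta, limitedbyTR). Assumes grad != 0 (no 0/0 safeguard),
    # mirroring the caveat noted in the MATLAB comments.
    n = grad.size
    maxinner = maxinner or n
    eta = np.zeros(n)
    r = grad.copy()               # residual of H @ eta = -grad, at eta = 0
    norm_r0 = np.linalg.norm(r)
    mdelta = r.copy()             # minus the search direction, as in trs_tCG
    for _ in range(maxinner):
        Hmd = H @ mdelta
        d_Hd = mdelta @ Hmd       # curvature along the direction
        z_r = r @ r
        alpha = z_r / d_Hd
        eta_new = eta - alpha * mdelta
        if d_Hd <= 0 or np.linalg.norm(eta_new) >= Delta:
            # Negative curvature or TR violation: take tau >= 0 such that
            # ||eta - tau*mdelta|| = Delta (positive root of a quadratic).
            em, dd, ee = eta @ mdelta, mdelta @ mdelta, eta @ eta
            tau = (em + np.sqrt(em**2 + dd * (Delta**2 - ee))) / dd
            return eta - tau * mdelta, True
        eta = eta_new
        r = r - alpha * Hmd       # update the residual
        if np.linalg.norm(r) <= norm_r0 * min(norm_r0**theta, kappa):
            return eta, False     # kappa/theta stopping criterion met
        beta = (r @ r) / z_r
        mdelta = r + beta * mdelta
    return eta, False
```

% With H = 2*I and grad = (2, 0), the unconstrained minimizer is
% (-1, 0): an interior solution when Delta = 10, and the scaled
% boundary point (-0.5, 0) when Delta = 0.5 (limitedbyTR = true).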