Truncated (Steihaug-Toint) Conjugate-Gradient method. minimize <eta,grad> + .5*<eta,Hess(eta)> subject to <eta,eta>_[inverse precon] <= Delta^2 function trsoutput = trs_tCG(problem, trsinput, options, storedb, key) Inputs: problem: Manopt optimization problem structure trsinput: structure with the following fields: x: point on the manifold problem.M fgradx: gradient of the cost function of the problem at x vector if options.useRand == true. Delta = trust-region radius options: structure containing options for the subproblem solver storedb, key: manopt's caching system for the point x Options specific to this subproblem solver: kappa (0.1) kappa convergence tolerance. kappa > 0 is the linear convergence target rate: trs_tCG will terminate early if the residual was reduced by a factor of kappa. theta (1.0) theta convergence tolerance. 1+theta (theta between 0 and 1) is the superlinear convergence target rate. trs_tCG will terminate early if the residual was reduced by a power of 1+theta. mininner (1) Minimum number of inner iterations for trs_tCG. maxinner (problem.M.dim() : the manifold's dimension) Maximum number of inner iterations for trs_tCG. useRand (false) Set to true if the trust-region solve is to be initiated with a random tangent vector. If set to true, no preconditioner will be used. This option is set to true in some scenarios to escape saddle points, but is otherwise seldom activated. Output: the structure trsoutput contains the following fields: eta: approximate solution to the trust-region subproblem at x Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it is often naturally available to the subproblem solver at the end of execution, so that it may be cheaper to return it here. limitedbyTR: true if a boundary solution is returned printstr: logged information to be printed by trustregions. stats: structure with the following statistics: numinner: number of inner loops before returning hessvecevals: number of Hessian calls issued cauchy: true if cauchy step was used. false by default. Will be present in stats only if options.useRand = true trs_tCG can also be called in the following way (by trustregions) to obtain part of the header to print and an initial stats structure: function trsoutput = trs_tCG([], [], options) In this case trsoutput contains the following fields: printheader: subproblem header to be printed before the first loop of trustregions initstats: struct with initial values for stored stats in subsequent calls to trs_tCG. Used in the first call to savestats in trustregions to initialize the info struct properly. See also: trustregions trs_tCG_cached trs_gep
0001 function trsoutput = trs_tCG(problem, trsinput, options, storedb, key) 0002 % Truncated (Steihaug-Toint) Conjugate-Gradient method. 0003 % 0004 % minimize <eta,grad> + .5*<eta,Hess(eta)> 0005 % subject to <eta,eta>_[inverse precon] <= Delta^2 0006 % 0007 % function trsoutput = trs_tCG(problem, trsinput, options, storedb, key) 0008 % 0009 % Inputs: 0010 % problem: Manopt optimization problem structure 0011 % trsinput: structure with the following fields: 0012 % x: point on the manifold problem.M 0013 % fgradx: gradient of the cost function of the problem at x 0014 % vector if options.useRand == true. 0015 % Delta = trust-region radius 0016 % options: structure containing options for the subproblem solver 0017 % storedb, key: manopt's caching system for the point x 0018 % 0019 % Options specific to this subproblem solver: 0020 % kappa (0.1) 0021 % kappa convergence tolerance. 0022 % kappa > 0 is the linear convergence target rate: trs_tCG will 0023 % terminate early if the residual was reduced by a factor of kappa. 0024 % theta (1.0) 0025 % theta convergence tolerance. 0026 % 1+theta (theta between 0 and 1) is the superlinear convergence 0027 % target rate. trs_tCG will terminate early if the residual was 0028 % reduced by a power of 1+theta. 0029 % mininner (1) 0030 % Minimum number of inner iterations for trs_tCG. 0031 % maxinner (problem.M.dim() : the manifold's dimension) 0032 % Maximum number of inner iterations for trs_tCG. 0033 % useRand (false) 0034 % Set to true if the trust-region solve is to be initiated with a 0035 % random tangent vector. If set to true, no preconditioner will be 0036 % used. This option is set to true in some scenarios to escape saddle 0037 % points, but is otherwise seldom activated. 0038 % 0039 % Output: the structure trsoutput contains the following fields: 0040 % eta: approximate solution to the trust-region subproblem at x 0041 % Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it 0042 % is often naturally available to the subproblem solver at the 0043 % end of execution, so that it may be cheaper to return it here. 0044 % limitedbyTR: true if a boundary solution is returned 0045 % printstr: logged information to be printed by trustregions. 0046 % stats: structure with the following statistics: 0047 % numinner: number of inner loops before returning 0048 % hessvecevals: number of Hessian calls issued 0049 % cauchy: true if cauchy step was used. false by default. Will 0050 % be present in stats only if options.useRand = true 0051 % 0052 % trs_tCG can also be called in the following way (by trustregions) to 0053 % obtain part of the header to print and an initial stats structure: 0054 % 0055 % function trsoutput = trs_tCG([], [], options) 0056 % 0057 % In this case trsoutput contains the following fields: 0058 % printheader: subproblem header to be printed before the first loop of 0059 % trustregions 0060 % initstats: struct with initial values for stored stats in subsequent 0061 % calls to trs_tCG. Used in the first call to savestats 0062 % in trustregions to initialize the info struct properly. 0063 % 0064 % See also: trustregions trs_tCG_cached trs_gep 0065 0066 % This file is part of Manopt: www.manopt.org. 0067 % This code is an adaptation to Manopt of the original GenRTR code: 0068 % RTR - Riemannian Trust-Region 0069 % (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan 0070 % Florida State University 0071 % School of Computational Science 0072 % (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download) 0073 % See accompanying license file. 0074 % The adaptation was executed by Nicolas Boumal. 0075 % 0076 % Change log: 0077 % 0078 % NB Feb. 12, 2013: 0079 % We do not project r back to the tangent space anymore: it was not 0080 % necessary, and as of Manopt 1.0.1, the proj operator does not 0081 % coincide with this notion anymore. 0082 % 0083 % NB April 3, 2013: 0084 % tCG now also returns Heta, the Hessian at x along eta. Additional 0085 % esthetic modifications. 0086 % 0087 % NB Dec. 2, 2013: 0088 % If options.useRand is activated, we now make sure the preconditio- 0089 % ner is not used, as was originally intended in GenRTR. In time, we 0090 % may want to investigate whether useRand can be modified to work well 0091 % with preconditioning too. 0092 % 0093 % NB Jan. 9, 2014: 0094 % Now checking explicitly for model decrease at each iteration. The 0095 % first iteration is a Cauchy point, which necessarily realizes a 0096 % decrease of the model cost. If a model increase is witnessed 0097 % (which is theoretically impossible if a linear operator is used for 0098 % the Hessian approximation), then we return the previous eta. This 0099 % ensures we always achieve at least the Cauchy decrease, which 0100 % should be sufficient for convergence. 0101 % 0102 % NB Feb. 17, 2015: 0103 % The previous update was in effect verifying that the current eta 0104 % performed at least as well as the first eta (the Cauchy step) with 0105 % respect to the model cost. While this is an acceptable strategy, 0106 % the documentation (and the original intent) was to ensure a 0107 % monotonic decrease of the model cost at each new eta. This is now 0108 % the case, with the added line: "model_value = new_model_value;". 0109 % 0110 % NB April 3, 2015: 0111 % Works with the new StoreDB class system. 0112 % 0113 % NB April 17, 2018: 0114 % Two changes: 0115 % (a) Instead of updating delta and Hdelta, we now update -delta and 0116 % -Hdelta, named mdelta and Hmdelta. This allows to spare one 0117 % call to lincomb(x, -1, z). 0118 % (b) We re-project mdelta to the tangent space at every iteration, 0119 % to avoid drifting away from it. The choice to project mdelta 0120 % specifically is motivated by the fact that this is the vector 0121 % passed to getHessian, hence this is where accurate tangency 0122 % may be most important. (All other operations are linear 0123 % combinations of tangent vectors, which should be fairly safe.) 0124 % 0125 % VL June 29, 2022: 0126 % Renamed tCG to trs_tCG to keep consistent naming with new 0127 % subproblem solvers for trustregion. Also modified inputs and 0128 % outputs to be compatible with other subproblem solvers. 0129 0130 % All terms involving the trust-region radius use an inner product 0131 % w.r.t. the preconditioner; this is because the iterates grow in 0132 % length w.r.t. the preconditioner, guaranteeing that we do not 0133 % re-enter the trust-region. 0134 % 0135 % The following recurrences for Prec-based norms and inner 0136 % products come from [CGT2000], pg. 205, first edition. 0137 % Below, P is the preconditioner. 0138 % 0139 % <eta_k,P*delta_k> = 0140 % beta_k-1 * ( <eta_k-1,P*delta_k-1> + alpha_k-1 |delta_k-1|^2_P ) 0141 % |delta_k|^2_P = <r_k,z_k> + beta_k-1^2 |delta_k-1|^2_P 0142 % 0143 % Therefore, we need to keep track of: 0144 % 1) |delta_k|^2_P 0145 % 2) <eta_k,P*delta_k> = <eta_k,delta_k>_P 0146 % 3) |eta_k |^2_P 0147 % 0148 % Initial values are given by 0149 % |delta_0|_P = <r,z> 0150 % |eta_0|_P = 0 0151 % <eta_0,delta_0>_P = 0 0152 % because we take eta_0 = 0 (if useRand = false). 0153 % 0154 % [CGT2000] Conn, Gould and Toint: Trust-region methods, 2000. 0155 0156 % trustregions only wants header and default values for stats. 0157 if nargin == 3 && isempty(problem) && isempty(trsinput) 0158 if isfield(options, 'useRand') && options.useRand 0159 trsoutput.printheader = sprintf('%9s %9s %11s %s', ... 0160 'numinner', 'hessvec', 'used_cauchy', ... 0161 'stopreason'); 0162 trsoutput.initstats = struct('numinner', 0, 'hessvecevals', 0, ... 0163 'cauchy', false); 0164 else 0165 trsoutput.printheader = sprintf('%9s %9s %s', 'numinner', ... 0166 'hessvec', 'stopreason'); 0167 trsoutput.initstats = struct('numinner', 0, 'hessvecevals', 0); 0168 end 0169 return; 0170 end 0171 0172 x = trsinput.x; 0173 Delta = trsinput.Delta; 0174 grad = trsinput.fgradx; 0175 0176 M = problem.M; 0177 0178 inner = @(u, v) M.inner(x, u, v); 0179 lincomb = @(a, u, b, v) M.lincomb(x, a, u, b, v); 0180 tangent = @(u) M.tangent(x, u); 0181 0182 % Set local defaults here 0183 localdefaults.kappa = 0.1; 0184 localdefaults.theta = 1.0; 0185 localdefaults.mininner = 1; 0186 localdefaults.maxinner = M.dim(); 0187 localdefaults.useRand = false; 0188 0189 % Merge local defaults with user options, if any 0190 if ~exist('options', 'var') || isempty(options) 0191 options = struct(); 0192 end 0193 options = mergeOptions(localdefaults, options); 0194 0195 theta = options.theta; 0196 kappa = options.kappa; 0197 0198 % returned boolean to trustregions.m. true if we are limited by the TR 0199 % boundary (returns boundary solution). Otherwise false. 0200 limitedbyTR = false; 0201 0202 % Determine eta0 and other useRand dependent initializations 0203 if ~options.useRand 0204 % Pick the zero vector 0205 eta = M.zerovec(x); 0206 Heta = M.zerovec(x); 0207 r = grad; 0208 e_Pe = 0; 0209 else 0210 % Random vector in T_x M (this has to be very small) 0211 eta = M.lincomb(x, 1e-6, M.randvec(x)); 0212 % Must be inside trust-region 0213 while M.norm(x, eta) > Delta 0214 eta = M.lincomb(x, sqrt(sqrt(eps)), eta); 0215 end 0216 Heta = getHessian(problem, x, eta, storedb, key); 0217 r = lincomb(1, grad, 1, Heta); 0218 e_Pe = inner(eta, eta); 0219 end 0220 0221 r_r = inner(r, r); 0222 norm_r = sqrt(r_r); 0223 norm_r0 = norm_r; 0224 0225 % Precondition the residual. 0226 if ~options.useRand 0227 z = getPrecon(problem, x, r, storedb, key); 0228 else 0229 z = r; 0230 end 0231 0232 % Compute z'*r. 0233 z_r = inner(z, r); 0234 d_Pd = z_r; 0235 0236 % Initial search direction (we maintain -delta in memory, called mdelta, to 0237 % avoid a change of sign of the tangent vector.) 0238 mdelta = z; 0239 if ~options.useRand % and therefore, eta == 0 0240 e_Pd = 0; 0241 else % and therefore, no preconditioner 0242 e_Pd = -inner(eta, mdelta); 0243 end 0244 0245 % If the Hessian or a linear Hessian approximation is in use, it is 0246 % theoretically guaranteed that the model value decreases strictly 0247 % with each iteration of tCG. Hence, there is no need to monitor the model 0248 % value. But, when a nonlinear Hessian approximation is used (such as the 0249 % built-in finite-difference approximation for example), the model may 0250 % increase. It is then important to terminate the tCG iterations and return 0251 % the previous (the best-so-far) iterate. The variable below will hold the 0252 % model value. 0253 % 0254 % This computation could be further improved based on Section 17.4.1 in 0255 % Conn, Gould, Toint, Trust Region Methods, 2000. 0256 % If we make this change, then also modify trustregions to gather this 0257 % value from tCG rather than recomputing it itself. 0258 model_fun = @(eta, Heta) inner(eta, grad) + .5*inner(eta, Heta); 0259 if ~options.useRand 0260 model_value = 0; 0261 else 0262 model_value = model_fun(eta, Heta); 0263 end 0264 0265 % Pre-assume termination because j == end. 0266 stopreason_str = 'maximum inner iterations'; 0267 0268 % Begin inner/tCG loop. 0269 for j = 1 : options.maxinner 0270 0271 % This call is the computationally expensive step. 0272 Hmdelta = getHessian(problem, x, mdelta, storedb, key); 0273 0274 % Compute curvature (often called kappa). 0275 d_Hd = inner(mdelta, Hmdelta); 0276 0277 0278 % Note that if d_Hd == 0, we will exit at the next "if" anyway. 0279 alpha = z_r/d_Hd; 0280 % <neweta,neweta>_P = 0281 % <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P 0282 e_Pe_new = e_Pe + 2.0*alpha*e_Pd + alpha*alpha*d_Pd; 0283 0284 if options.debug > 2 0285 fprintf('DBG: (r,r) : %e\n', r_r); 0286 fprintf('DBG: (d,Hd) : %e\n', d_Hd); 0287 fprintf('DBG: alpha : %e\n', alpha); 0288 end 0289 0290 % Check against negative curvature and trust-region radius violation. 0291 % If either condition triggers, we bail out. 0292 if d_Hd <= 0 || e_Pe_new >= Delta^2 0293 % want 0294 % ee = <eta,eta>_prec,x 0295 % ed = <eta,delta>_prec,x 0296 % dd = <delta,delta>_prec,x 0297 % Note (Nov. 26, 2021, NB): numerically, it might be better to call 0298 % tau = max(real(roots([d_Pd, 2*e_Pd, e_Pe-Delta^2]))); 0299 % This should be checked. 0300 % Also, we should safe-guard against 0/0: could happen if grad = 0. 0301 tau = (-e_Pd + sqrt(e_Pd*e_Pd + d_Pd*(Delta^2-e_Pe))) / d_Pd; 0302 if options.debug > 2 0303 fprintf('DBG: tau : %e\n', tau); 0304 end 0305 eta = lincomb(1, eta, -tau, mdelta); 0306 0307 % If only a nonlinear Hessian approximation is available, this is 0308 % only approximately correct, but saves an additional Hessian call. 0309 Heta = lincomb(1, Heta, -tau, Hmdelta); 0310 0311 % Technically, we may want to verify that this new eta is indeed 0312 % better than the previous eta before returning it (this is always 0313 % the case if the Hessian approximation is linear, but I am unsure 0314 % whether it is the case or not for nonlinear approximations.) 0315 % At any rate, the impact should be limited, so in the interest of 0316 % code conciseness (if we can still hope for that), we omit this. 0317 0318 limitedbyTR = true; 0319 0320 if d_Hd <= 0 0321 stopreason_str = 'negative curvature'; 0322 else 0323 stopreason_str = 'exceeded trust region'; 0324 end 0325 0326 break; 0327 end 0328 0329 % No negative curvature and eta_prop inside TR: accept it. 0330 e_Pe = e_Pe_new; 0331 new_eta = lincomb(1, eta, -alpha, mdelta); 0332 0333 % If only a nonlinear Hessian approximation is available, this is 0334 % only approximately correct, but saves an additional Hessian call. 0335 % TODO: this computation is redundant with that of r, L241. Clean up. 0336 new_Heta = lincomb(1, Heta, -alpha, Hmdelta); 0337 0338 % Verify that the model cost decreased in going from eta to new_eta. If 0339 % it did not (which can only occur if the Hessian approximation is 0340 % nonlinear or because of numerical errors), then we return the 0341 % previous eta (which necessarily is the best reached so far, according 0342 % to the model cost). Otherwise, we accept the new eta and go on. 0343 new_model_value = model_fun(new_eta, new_Heta); 0344 if new_model_value >= model_value 0345 stopreason_str = 'model increased'; 0346 break; 0347 end 0348 0349 eta = new_eta; 0350 Heta = new_Heta; 0351 model_value = new_model_value; %% added Feb. 17, 2015 0352 0353 % Update the residual. 0354 r = lincomb(1, r, -alpha, Hmdelta); 0355 0356 % Compute new norm of r. 0357 r_r = inner(r, r); 0358 norm_r = sqrt(r_r); 0359 0360 % Check kappa/theta stopping criterion. 0361 % Note that it is somewhat arbitrary whether to check this stopping 0362 % criterion on the r's (the gradients) or on the z's (the 0363 % preconditioned gradients). [CGT2000], page 206, mentions both as 0364 % acceptable criteria. 0365 if j >= options.mininner && norm_r <= norm_r0*min(norm_r0^theta, kappa) 0366 % Residual is small enough to quit 0367 if kappa < norm_r0^theta 0368 stopreason_str = 'reached target residual-kappa (linear)'; 0369 else 0370 stopreason_str = 'reached target residual-theta (superlinear)'; 0371 end 0372 break; 0373 end 0374 0375 % Precondition the residual. 0376 if ~options.useRand 0377 z = getPrecon(problem, x, r, storedb, key); 0378 else 0379 z = r; 0380 end 0381 0382 % Save the old z'*r. 0383 zold_rold = z_r; 0384 % Compute new z'*r. 0385 z_r = inner(z, r); 0386 0387 % Compute new search direction. 0388 beta = z_r/zold_rold; 0389 mdelta = lincomb(1, z, beta, mdelta); 0390 0391 % Since mdelta is passed to getHessian, which is the part of the code 0392 % we have least control over from here, we want to make sure mdelta is 0393 % a tangent vector up to numerical errors that should remain small. 0394 % For this reason, we re-project mdelta to the tangent space. 0395 % In limited tests, it was observed that it is a good idea to project 0396 % at every iteration rather than only every k iterations, the reason 0397 % being that loss of tangency can lead to more inner iterations being 0398 % run, which leads to an overall higher computational cost. 0399 mdelta = tangent(mdelta); 0400 0401 % Update new P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7]. 0402 e_Pd = beta*(e_Pd + alpha*d_Pd); 0403 d_Pd = z_r + beta*beta*d_Pd; 0404 0405 end % of tCG loop 0406 0407 % If using randomized approach, compare result with the Cauchy point. 0408 % Convergence proofs assume that we achieve at least (a fraction of) 0409 % the reduction of the Cauchy point. After this if-block, either all 0410 % eta-related quantities have been changed consistently, or none of 0411 % them have changed. 0412 if options.useRand 0413 used_cauchy = false; 0414 0415 norm_grad = M.norm(x, grad); 0416 0417 % Check the curvature, 0418 Hg = getHessian(problem, x, grad, storedb, key); 0419 g_Hg = M.inner(x, grad, Hg); 0420 if g_Hg <= 0 0421 tau_c = 1; 0422 else 0423 tau_c = min( norm_grad^3/(Delta*g_Hg) , 1); 0424 end 0425 % and generate the Cauchy point. 0426 eta_c = M.lincomb(x, -tau_c * Delta / norm_grad, grad); 0427 Heta_c = M.lincomb(x, -tau_c * Delta / norm_grad, Hg); 0428 0429 % Now that we have computed the Cauchy point in addition to the 0430 % returned eta, we might as well keep the best of them. 0431 mdle = M.inner(x, grad, eta) + .5*M.inner(x, Heta, eta); 0432 mdlec = M.inner(x, grad, eta_c) + .5*M.inner(x, Heta_c, eta_c); 0433 0434 if mdlec < mdle 0435 eta = eta_c; 0436 Heta = Heta_c; % added April 11, 2012 0437 used_cauchy = true; 0438 end 0439 end 0440 0441 printstr = sprintf('%9d %9d %s', j, j, stopreason_str); 0442 stats = struct('numinner', j, 'hessvecevals', j); 0443 0444 if options.useRand 0445 stats.cauchy = used_cauchy; 0446 printstr = sprintf('%9d %9d %11s %s', j, j, ... 0447 string(used_cauchy), stopreason_str); 0448 end 0449 0450 trsoutput.eta = eta; 0451 trsoutput.Heta = Heta; 0452 trsoutput.limitedbyTR = limitedbyTR; 0453 trsoutput.printstr = printstr; 0454 trsoutput.stats = stats; 0455 end