function [eta, Heta, inner_it, stop_tCG] ...
                  = tCG(problem, x, grad, eta, Delta, options, storedb, key)
% tCG - Truncated (Steihaug-Toint) Conjugate-Gradient method
% minimize <eta,grad> + .5*<eta,Hess(eta)>
% subject to <eta,eta>_[inverse precon] <= Delta^2
%
% See also: trustregions

% This file is part of Manopt: www.manopt.org.
% This code is an adaptation to Manopt of the original GenRTR code:
% RTR - Riemannian Trust-Region
% (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
% Florida State University
% School of Computational Science
% (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
% See accompanying license file.
% The adaptation was executed by Nicolas Boumal.
%
% Change log:
%
%   NB Feb. 12, 2013:
%       We no longer project r back to the tangent space: it was not
%       necessary, and as of Manopt 1.0.1, the proj operator no longer
%       coincides with this notion.
%
%   NB April 3, 2013:
%       tCG now also returns Heta, the Hessian at x along eta. Additional
%       aesthetic modifications.
%
%   NB Dec. 2, 2013:
%       If options.useRand is activated, we now make sure the
%       preconditioner is not used, as was originally intended in GenRTR.
%       In time, we may want to investigate whether useRand can be
%       modified to work well with preconditioning too.
%
%   NB Jan. 9, 2014:
%       Now checking explicitly for model decrease at each iteration. The
%       first iterate is a Cauchy point, which necessarily realizes a
%       decrease of the model cost. If a model increase is witnessed
%       (which is theoretically impossible if a linear operator is used
%       for the Hessian approximation), then we return the previous eta.
%       This ensures we always achieve at least the Cauchy decrease,
%       which should be sufficient for convergence.
%
%   NB Feb. 17, 2015:
%       The previous update was in effect verifying that the current eta
%       performed at least as well as the first eta (the Cauchy step)
%       with respect to the model cost. While this is an acceptable
%       strategy, the documentation (and the original intent) was to
%       ensure a monotonic decrease of the model cost at each new eta.
%       This is now the case, with the added line:
%       "model_value = new_model_value;".
%
%   NB April 3, 2015:
%       Works with the new StoreDB class system.
%
%   NB April 17, 2018:
%       Two changes:
%        (a) Instead of updating delta and Hdelta, we now update -delta
%            and -Hdelta, named mdelta and Hmdelta. This saves one call
%            to lincomb(x, -1, z).
%        (b) We re-project mdelta to the tangent space at every
%            iteration, to avoid drifting away from it. The choice to
%            project mdelta specifically is motivated by the fact that
%            this is the vector passed to getHessian, hence this is
%            where accurate tangency may matter most. (All other
%            operations are linear combinations of tangent vectors,
%            which should be fairly safe.)


% All terms involving the trust-region radius use an inner product
% w.r.t. the preconditioner; this is because the iterates grow in
% length w.r.t. the preconditioner, guaranteeing that we do not
% re-enter the trust region.
%
% The following recurrences for Prec-based norms and inner
% products come from [CGT2000], pg. 205, first edition.
% Below, P is the preconditioner.
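% (More precisely, getPrecon applies the inverse of P, so that the
% preconditioned residual below is z = P\r, and the constraint in the
% header reads <eta, P*eta> <= Delta^2. As a sanity check on the seed
% values further down: with delta_0 = z, indeed
% |delta_0|^2_P = <z, P*z> = <z, r>.)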
%
%  <eta_k,P*delta_k> =
%    beta_{k-1} * ( <eta_{k-1},P*delta_{k-1}> + alpha_{k-1} |delta_{k-1}|^2_P )
%  |delta_k|^2_P = <r_k,z_k> + beta_{k-1}^2 |delta_{k-1}|^2_P
%
% Therefore, we need to keep track of:
%  1)  |delta_k|^2_P
%  2)  <eta_k,P*delta_k> = <eta_k,delta_k>_P
%  3)  |eta_k|^2_P
%
% Initial values are given by
%  |delta_0|^2_P = <r,z>
%  |eta_0|^2_P = 0
%  <eta_0,delta_0>_P = 0
% because we take eta_0 = 0 (if useRand = false).
%
% [CGT2000] Conn, Gould and Toint: Trust-region methods, 2000.

inner   = @(u, v) problem.M.inner(x, u, v);
lincomb = @(a, u, b, v) problem.M.lincomb(x, a, u, b, v);
tangent = @(u) problem.M.tangent(x, u);

theta = options.theta;
kappa = options.kappa;

if ~options.useRand % and therefore, eta == 0
    Heta = problem.M.zerovec(x);
    r = grad;
    e_Pe = 0;
else % and therefore, no preconditioner
    % eta (presumably) ~= 0 was provided by the caller.
    Heta = getHessian(problem, x, eta, storedb, key);
    r = lincomb(1, grad, 1, Heta);
    e_Pe = inner(eta, eta);
end
r_r = inner(r, r);
norm_r = sqrt(r_r);
norm_r0 = norm_r;

% Precondition the residual.
if ~options.useRand
    z = getPrecon(problem, x, r, storedb, key);
else
    z = r;
end

% Compute z'*r.
z_r = inner(z, r);
d_Pd = z_r;

% Initial search direction (we maintain -delta in memory, called mdelta,
% to avoid a change of sign of the tangent vector).
mdelta = z;
if ~options.useRand % and therefore, eta == 0
    e_Pd = 0;
else % and therefore, no preconditioner
    e_Pd = -inner(eta, mdelta);
end

% If the Hessian or a linear Hessian approximation is in use, it is
% theoretically guaranteed that the model value decreases strictly with
% each iteration of tCG. Hence, there is no need to monitor the model
% value. But, when a nonlinear Hessian approximation is used (such as
% the built-in finite-difference approximation), the model may increase.
% It is then important to terminate the tCG iterations and return the
% previous (the best-so-far) iterate. The variable below will hold the
% model value.
%
% This computation could be further improved based on Section 17.4.1 in
% Conn, Gould, Toint, Trust Region Methods, 2000.
% If we make this change, then also modify trustregions to gather this
% value from tCG rather than recomputing it itself.
model_fun = @(eta, Heta) inner(eta, grad) + .5*inner(eta, Heta);
if ~options.useRand
    model_value = 0;
else
    model_value = model_fun(eta, Heta);
end

% Pre-assume termination because the loop will have reached
% j == options.maxinner.
stop_tCG = 5;

% Begin inner/tCG loop.
for j = 1 : options.maxinner

    % This call is the computationally expensive step.
    Hmdelta = getHessian(problem, x, mdelta, storedb, key);

    % Compute curvature (often called kappa).
    d_Hd = inner(mdelta, Hmdelta);

    % Note that if d_Hd == 0, we will exit at the next "if" anyway.
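    % The step size is the standard (preconditioned) CG choice: in exact
    % arithmetic, alpha = <r,z>/<d,H*d> minimizes the model along the
    % current search direction (here d = -mdelta; the two sign flips
    % cancel in the quotient).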
    alpha = z_r/d_Hd;
    % <neweta,neweta>_P
    %   = <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
    e_Pe_new = e_Pe + 2.0*alpha*e_Pd + alpha*alpha*d_Pd;

    if options.debug > 2
        fprintf('DBG:   (r,r)  : %e\n', r_r);
        fprintf('DBG:   (d,Hd) : %e\n', d_Hd);
        fprintf('DBG:   alpha  : %e\n', alpha);
    end

    % Check against negative curvature and trust-region radius violation.
    % If either condition triggers, we bail out.
    if d_Hd <= 0 || e_Pe_new >= Delta^2
        % In that case, we step to the boundary of the trust region: tau
        % is the positive root of the quadratic
        %    dd*tau^2 + 2*ed*tau + (ee - Delta^2) = 0,
        % which expresses <eta+tau*delta, eta+tau*delta>_P = Delta^2, with
        %    ee = <eta,eta>_prec,x
        %    ed = <eta,delta>_prec,x
        %    dd = <delta,delta>_prec,x
        tau = (-e_Pd + sqrt(e_Pd*e_Pd + d_Pd*(Delta^2-e_Pe))) / d_Pd;
        if options.debug > 2
            fprintf('DBG:     tau  : %e\n', tau);
        end
        eta = lincomb(1, eta, -tau, mdelta);

        % If only a nonlinear Hessian approximation is available, this is
        % only approximately correct, but saves an additional Hessian call.
        Heta = lincomb(1, Heta, -tau, Hmdelta);

        % Technically, we may want to verify that this new eta is indeed
        % better than the previous eta before returning it (this is
        % always the case if the Hessian approximation is linear, but I
        % am unsure whether it is the case or not for nonlinear
        % approximations). At any rate, the impact should be limited, so
        % in the interest of code conciseness (if we can still hope for
        % that), we omit this.

        if d_Hd <= 0
            stop_tCG = 1;     % negative curvature
        else
            stop_tCG = 2;     % exceeded trust region
        end
        break;
    end

    % No negative curvature and the proposed eta stays inside the trust
    % region: accept it.
    e_Pe = e_Pe_new;
    new_eta = lincomb(1, eta, -alpha, mdelta);

    % If only a nonlinear Hessian approximation is available, this is
    % only approximately correct, but saves an additional Hessian call.
    % TODO: this computation is redundant with the update of r below.
    % Clean up.
    new_Heta = lincomb(1, Heta, -alpha, Hmdelta);

    % Verify that the model cost decreased in going from eta to new_eta.
    % If it did not (which can only occur if the Hessian approximation
    % is nonlinear or because of numerical errors), then we return the
    % previous eta (which necessarily is the best reached so far,
    % according to the model cost). Otherwise, we accept the new eta and
    % go on.
    new_model_value = model_fun(new_eta, new_Heta);
    if new_model_value >= model_value
        stop_tCG = 6;
        break;
    end

    eta = new_eta;
    Heta = new_Heta;
    model_value = new_model_value; %% added Feb. 17, 2015

    % Update the residual.
    r = lincomb(1, r, -alpha, Hmdelta);

    % Compute the new norm of r.
    r_r = inner(r, r);
    norm_r = sqrt(r_r);

    % Check the kappa/theta stopping criterion.
    % Note that it is somewhat arbitrary whether to check this stopping
    % criterion on the r's (the gradients) or on the z's (the
    % preconditioned gradients). [CGT2000], page 206, mentions both as
    % acceptable criteria.
    if j >= options.mininner && norm_r <= norm_r0*min(norm_r0^theta, kappa)
        % Residual is small enough to quit.
        if kappa < norm_r0^theta
            stop_tCG = 3;     % linear convergence
        else
            stop_tCG = 4;     % superlinear convergence
        end
        break;
    end

    % Precondition the residual.
    if ~options.useRand
        z = getPrecon(problem, x, r, storedb, key);
    else
        z = r;
    end

    % Save the old z'*r.
    zold_rold = z_r;
    % Compute the new z'*r.
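    % (This inner product is strictly positive as long as r ~= 0: either
    % z = r, or z is the preconditioned residual and the preconditioner
    % is assumed positive definite. Hence the quotient defining beta
    % below is well defined.)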
    z_r = inner(z, r);

    % Compute the new search direction.
    beta = z_r/zold_rold;
    mdelta = lincomb(1, z, beta, mdelta);

    % Since mdelta is passed to getHessian, which is the part of the
    % code we have least control over from here, we want to make sure
    % mdelta is a tangent vector up to numerical errors that should
    % remain small. For this reason, we re-project mdelta to the tangent
    % space. In limited tests, it was observed that it is a good idea to
    % project at every iteration rather than only every k iterations,
    % the reason being that loss of tangency can lead to more inner
    % iterations being run, which leads to an overall higher
    % computational cost.
    mdelta = tangent(mdelta);

    % Update the P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
    e_Pd = beta*(e_Pd + alpha*d_Pd);
    d_Pd = z_r + beta*beta*d_Pd;

end  % of tCG loop
inner_it = j;

end
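% For reference, tCG is meant to be called from within trustregions, not
% directly. Still, the sketch below illustrates one way to set up valid
% inputs by hand. It is a hypothetical driver (the matrix A and all
% option values are illustrative choices, not defaults taken from this
% file), and it assumes Manopt is on the path and that tCG itself is
% reachable (it ships alongside trustregions):
%
%     n = 100;
%     A = randn(n); A = .5*(A + A');      % symmetric test matrix
%     problem.M = spherefactory(n);       % unit sphere in R^n
%     problem.cost  = @(x) .5*x'*(A*x);
%     problem.egrad = @(x) A*x;
%     problem.ehess = @(x, u) A*u;
%     x = problem.M.rand();               % random point on the sphere
%     storedb = StoreDB();
%     key = storedb.getNewKey();
%     grad = getGradient(problem, x, storedb, key);
%     options = struct('theta', 1, 'kappa', .1, 'useRand', false, ...
%                      'mininner', 1, 'maxinner', problem.M.dim(), ...
%                      'debug', 0);
%     eta0 = problem.M.zerovec(x);
%     Delta = problem.M.typicaldist()/8;  % some trust-region radius
%     [eta, Heta, inner_it, stop_tCG] = ...
%                 tCG(problem, x, grad, eta0, Delta, options, storedb, key);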