function [eta, Heta, inner_it, stop_tCG] ...
                  = tCG(problem, x, grad, eta, Delta, options, storedb, key)
% tCG - Truncated (Steihaug-Toint) Conjugate-Gradient method
% minimize <eta,grad> + .5*<eta,Hess(eta)>
% subject to <eta,eta>_[inverse precon] <= Delta^2
%
% See also: trustregions

% This file is part of Manopt: www.manopt.org.
% This code is an adaptation to Manopt of the original GenRTR code:
% RTR - Riemannian Trust-Region
% (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
% Florida State University
% School of Computational Science
% (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
% See accompanying license file.
% The adaptation was executed by Nicolas Boumal.
%
% Change log:
%
%   NB Feb. 12, 2013:
%       We no longer project r back to the tangent space: it was not
%       necessary, and as of Manopt 1.0.1, the proj operator no longer
%       coincides with this notion.
%
%   NB April 3, 2013:
%       tCG now also returns Heta, the Hessian at x along eta. Additional
%       aesthetic modifications.
%
%   NB Dec. 2, 2013:
%       If options.useRand is activated, we now make sure the
%       preconditioner is not used, as was originally intended in GenRTR.
%       In time, we may want to investigate whether useRand can be
%       modified to work well with preconditioning too.
%
%   NB Jan. 9, 2014:
%       Now checking explicitly for model decrease at each iteration. The
%       first iterate is a Cauchy point, which necessarily realizes a
%       decrease of the model cost. If a model increase is witnessed
%       (which is theoretically impossible if a linear operator is used
%       for the Hessian approximation), then we return the previous eta.
%       This ensures we always achieve at least the Cauchy decrease,
%       which should be sufficient for convergence.
%
%   NB Feb. 17, 2015:
%       The previous update was in effect verifying that the current eta
%       performed at least as well as the first eta (the Cauchy step)
%       with respect to the model cost. While this is an acceptable
%       strategy, the documentation (and the original intent) was to
%       ensure a monotonic decrease of the model cost at each new eta.
%       This is now the case, with the added line:
%       "model_value = new_model_value;".
%
%   NB April 3, 2015:
%       Works with the new StoreDB class system.
%
%   NB April 17, 2018:
%       Two changes:
%        (a) Instead of updating delta and Hdelta, we now update -delta
%            and -Hdelta, named mdelta and Hmdelta. This saves one call
%            to lincomb(x, -1, z).
%        (b) We re-project mdelta to the tangent space at every
%            iteration, to avoid drifting away from it. The choice to
%            project mdelta specifically is motivated by the fact that
%            this is the vector passed to getHessian, hence this is
%            where accurate tangency may matter most. (All other
%            operations are linear combinations of tangent vectors,
%            which should be fairly safe.)


% All terms involving the trust-region radius use an inner product
% w.r.t. the preconditioner; this is because the iterates grow in
% length w.r.t. the preconditioner, guaranteeing that we do not
% re-enter the trust region.
%
% The following recurrences for Prec-based norms and inner
% products come from [CGT2000], pg. 205, first edition.
% Below, P is the preconditioner.
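% (More precisely, getPrecon applies the inverse of P, so that the
% preconditioned residual below is z = P\r, and the constraint in the
% header reads <eta, P*eta> <= Delta^2. As a sanity check on the seed
% values further down: with delta_0 = z, indeed
% |delta_0|^2_P = <z, P*z> = <z, r>.)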
%
%  <eta_k,P*delta_k> =
%    beta_{k-1} * ( <eta_{k-1},P*delta_{k-1}> + alpha_{k-1} |delta_{k-1}|^2_P )
%  |delta_k|^2_P = <r_k,z_k> + beta_{k-1}^2 |delta_{k-1}|^2_P
%
% Therefore, we need to keep track of:
%  1)  |delta_k|^2_P
%  2)  <eta_k,P*delta_k> = <eta_k,delta_k>_P
%  3)  |eta_k|^2_P
%
% Initial values are given by
%  |delta_0|^2_P = <r,z>
%  |eta_0|^2_P = 0
%  <eta_0,delta_0>_P = 0
% because we take eta_0 = 0 (if useRand = false).
%
% [CGT2000] Conn, Gould and Toint: Trust-region methods, 2000.

inner   = @(u, v) problem.M.inner(x, u, v);
lincomb = @(a, u, b, v) problem.M.lincomb(x, a, u, b, v);
tangent = @(u) problem.M.tangent(x, u);

theta = options.theta;
kappa = options.kappa;

if ~options.useRand % and therefore, eta == 0
    Heta = problem.M.zerovec(x);
    r = grad;
    e_Pe = 0;
else % and therefore, no preconditioner
    % eta (presumably) ~= 0 was provided by the caller.
    Heta = getHessian(problem, x, eta, storedb, key);
    r = lincomb(1, grad, 1, Heta);
    e_Pe = inner(eta, eta);
end
r_r = inner(r, r);
norm_r = sqrt(r_r);
norm_r0 = norm_r;

% Precondition the residual.
if ~options.useRand
    z = getPrecon(problem, x, r, storedb, key);
else
    z = r;
end

% Compute z'*r.
z_r = inner(z, r);
d_Pd = z_r;

% Initial search direction (we maintain -delta in memory, called mdelta,
% to avoid a change of sign of the tangent vector).
mdelta = z;
if ~options.useRand % and therefore, eta == 0
    e_Pd = 0;
else % and therefore, no preconditioner
    e_Pd = -inner(eta, mdelta);
end

% If the Hessian or a linear Hessian approximation is in use, it is
% theoretically guaranteed that the model value decreases strictly with
% each iteration of tCG. Hence, there is no need to monitor the model
% value. But, when a nonlinear Hessian approximation is used (such as
% the built-in finite-difference approximation), the model may increase.
% It is then important to terminate the tCG iterations and return the
% previous (the best-so-far) iterate. The variable below will hold the
% model value.
%
% This computation could be further improved based on Section 17.4.1 in
% Conn, Gould, Toint, Trust Region Methods, 2000.
% If we make this change, then also modify trustregions to gather this
% value from tCG rather than recomputing it itself.
model_fun = @(eta, Heta) inner(eta, grad) + .5*inner(eta, Heta);
if ~options.useRand
    model_value = 0;
else
    model_value = model_fun(eta, Heta);
end

% Pre-assume termination because the loop will have reached
% j == options.maxinner.
stop_tCG = 5;

% Begin inner/tCG loop.
for j = 1 : options.maxinner

    % This call is the computationally expensive step.
    Hmdelta = getHessian(problem, x, mdelta, storedb, key);

    % Compute curvature (often called kappa).
    d_Hd = inner(mdelta, Hmdelta);

    % Note that if d_Hd == 0, we will exit at the next "if" anyway.
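    % The step size is the standard (preconditioned) CG choice: in exact
    % arithmetic, alpha = <r,z>/<d,H*d> minimizes the model along the
    % current search direction (here d = -mdelta; the two sign flips
    % cancel in the quotient).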
    alpha = z_r/d_Hd;
    % <neweta,neweta>_P
    %   = <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
    e_Pe_new = e_Pe + 2.0*alpha*e_Pd + alpha*alpha*d_Pd;

    if options.debug > 2
        fprintf('DBG:   (r,r)  : %e\n', r_r);
        fprintf('DBG:   (d,Hd) : %e\n', d_Hd);
        fprintf('DBG:   alpha  : %e\n', alpha);
    end

    % Check against negative curvature and trust-region radius violation.
    % If either condition triggers, we bail out.
    if d_Hd <= 0 || e_Pe_new >= Delta^2
        % In that case, we step to the boundary of the trust region: tau
        % is the positive root of the quadratic
        %    dd*tau^2 + 2*ed*tau + (ee - Delta^2) = 0,
        % which expresses <eta+tau*delta, eta+tau*delta>_P = Delta^2, with
        %    ee = <eta,eta>_prec,x
        %    ed = <eta,delta>_prec,x
        %    dd = <delta,delta>_prec,x
        tau = (-e_Pd + sqrt(e_Pd*e_Pd + d_Pd*(Delta^2-e_Pe))) / d_Pd;
        if options.debug > 2
            fprintf('DBG:     tau  : %e\n', tau);
        end
        eta = lincomb(1, eta, -tau, mdelta);

        % If only a nonlinear Hessian approximation is available, this is
        % only approximately correct, but saves an additional Hessian call.
        Heta = lincomb(1, Heta, -tau, Hmdelta);

        % Technically, we may want to verify that this new eta is indeed
        % better than the previous eta before returning it (this is
        % always the case if the Hessian approximation is linear, but I
        % am unsure whether it is the case or not for nonlinear
        % approximations). At any rate, the impact should be limited, so
        % in the interest of code conciseness (if we can still hope for
        % that), we omit this.

        if d_Hd <= 0
            stop_tCG = 1;     % negative curvature
        else
            stop_tCG = 2;     % exceeded trust region
        end
        break;
    end

    % No negative curvature and the proposed eta stays inside the trust
    % region: accept it.
    e_Pe = e_Pe_new;
    new_eta = lincomb(1, eta, -alpha, mdelta);

    % If only a nonlinear Hessian approximation is available, this is
    % only approximately correct, but saves an additional Hessian call.
    % TODO: this computation is redundant with the update of r below.
    % Clean up.
    new_Heta = lincomb(1, Heta, -alpha, Hmdelta);

    % Verify that the model cost decreased in going from eta to new_eta.
    % If it did not (which can only occur if the Hessian approximation
    % is nonlinear or because of numerical errors), then we return the
    % previous eta (which necessarily is the best reached so far,
    % according to the model cost). Otherwise, we accept the new eta and
    % go on.
    new_model_value = model_fun(new_eta, new_Heta);
    if new_model_value >= model_value
        stop_tCG = 6;
        break;
    end

    eta = new_eta;
    Heta = new_Heta;
    model_value = new_model_value; %% added Feb. 17, 2015

    % Update the residual.
    r = lincomb(1, r, -alpha, Hmdelta);

    % Compute the new norm of r.
    r_r = inner(r, r);
    norm_r = sqrt(r_r);

    % Check the kappa/theta stopping criterion.
    % Note that it is somewhat arbitrary whether to check this stopping
    % criterion on the r's (the gradients) or on the z's (the
    % preconditioned gradients). [CGT2000], page 206, mentions both as
    % acceptable criteria.
    if j >= options.mininner && norm_r <= norm_r0*min(norm_r0^theta, kappa)
        % Residual is small enough to quit.
        if kappa < norm_r0^theta
            stop_tCG = 3;     % linear convergence
        else
            stop_tCG = 4;     % superlinear convergence
        end
        break;
    end

    % Precondition the residual.
    if ~options.useRand
        z = getPrecon(problem, x, r, storedb, key);
    else
        z = r;
    end

    % Save the old z'*r.
    zold_rold = z_r;
    % Compute the new z'*r.
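    % (This inner product is strictly positive as long as r ~= 0: either
    % z = r, or z is the preconditioned residual and the preconditioner
    % is assumed positive definite. Hence the quotient defining beta
    % below is well defined.)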
    z_r = inner(z, r);

    % Compute the new search direction.
    beta = z_r/zold_rold;
    mdelta = lincomb(1, z, beta, mdelta);

    % Since mdelta is passed to getHessian, which is the part of the
    % code we have least control over from here, we want to make sure
    % mdelta is a tangent vector up to numerical errors that should
    % remain small. For this reason, we re-project mdelta to the tangent
    % space. In limited tests, it was observed that it is a good idea to
    % project at every iteration rather than only every k iterations,
    % the reason being that loss of tangency can lead to more inner
    % iterations being run, which leads to an overall higher
    % computational cost.
    mdelta = tangent(mdelta);

    % Update the P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
    e_Pd = beta*(e_Pd + alpha*d_Pd);
    d_Pd = z_r + beta*beta*d_Pd;

end  % of tCG loop
inner_it = j;

end
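% For reference, tCG is meant to be called from within trustregions, not
% directly. Still, the sketch below illustrates one way to set up valid
% inputs by hand. It is a hypothetical driver (the matrix A and all
% option values are illustrative choices, not defaults taken from this
% file), and it assumes Manopt is on the path and that tCG itself is
% reachable (it ships alongside trustregions):
%
%     n = 100;
%     A = randn(n); A = .5*(A + A');      % symmetric test matrix
%     problem.M = spherefactory(n);       % unit sphere in R^n
%     problem.cost  = @(x) .5*x'*(A*x);
%     problem.egrad = @(x) A*x;
%     problem.ehess = @(x, u) A*u;
%     x = problem.M.rand();               % random point on the sphere
%     storedb = StoreDB();
%     key = storedb.getNewKey();
%     grad = getGradient(problem, x, storedb, key);
%     options = struct('theta', 1, 'kappa', .1, 'useRand', false, ...
%                      'mininner', 1, 'maxinner', problem.M.dim(), ...
%                      'debug', 0);
%     eta0 = problem.M.zerovec(x);
%     Delta = problem.M.typicaldist()/8;  % some trust-region radius
%     [eta, Heta, inner_it, stop_tCG] = ...
%                 tCG(problem, x, grad, eta0, Delta, options, storedb, key);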