Home > manopt > solvers > trustregions > trs_tCG.m

trs_tCG

PURPOSE ^

Truncated (Steihaug-Toint) Conjugate-Gradient method.

SYNOPSIS ^

function trsoutput = trs_tCG(problem, trsinput, options, storedb, key)

DESCRIPTION ^

 Truncated (Steihaug-Toint) Conjugate-Gradient method.

 minimize <eta,grad> + .5*<eta,Hess(eta)>
 subject to <eta,eta>_[inverse precon] <= Delta^2

 function trsoutput = trs_tCG(problem, trsinput, options, storedb, key)

 Inputs:
   problem: Manopt optimization problem structure
   trsinput: structure with the following fields:
       x: point on the manifold problem.M
       fgradx: gradient of the cost function of the problem at x
       vector if options.useRand == true.
       Delta = trust-region radius
   options: structure containing options for the subproblem solver
   storedb, key: manopt's caching system for the point x

 Options specific to this subproblem solver:
   kappa (0.1)
       kappa convergence tolerance.
       kappa > 0 is the linear convergence target rate: trs_tCG will
       terminate early if the residual was reduced by a factor of kappa.
   theta (1.0)
       theta convergence tolerance.
       1+theta (theta between 0 and 1) is the superlinear convergence
       target rate. trs_tCG will terminate early if the residual was 
       reduced by a power of 1+theta.
   mininner (1)
       Minimum number of inner iterations for trs_tCG.
   maxinner (problem.M.dim() : the manifold's dimension)
       Maximum number of inner iterations for trs_tCG.
   useRand (false)
       Set to true if the trust-region solve is to be initiated with a
       random tangent vector. If set to true, no preconditioner will be
       used. This option is set to true in some scenarios to escape saddle
       points, but is otherwise seldom activated. 
       
 Output: the structure trsoutput contains the following fields:
   eta: approximate solution to the trust-region subproblem at x
   Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
       is often naturally available to the subproblem solver at the
       end of execution, so that it may be cheaper to return it here.
   limitedbyTR: true if a boundary solution is returned
   printstr: logged information to be printed by trustregions.
   stats: structure with the following statistics:
       numinner: number of inner loops before returning
       hessvecevals: number of Hessian calls issued
       cauchy: true if cauchy step was used. false by default. Will
       be present in stats only if options.useRand = true

 trs_tCG can also be called in the following way (by trustregions) to
 obtain part of the header to print and an initial stats structure:

 function trsoutput = trs_tCG([], [], options)

 In this case trsoutput contains the following fields:
   printheader: subproblem header to be printed before the first loop of 
       trustregions
   initstats: struct with initial values for stored stats in subsequent
       calls to trs_tCG. Used in the first call to savestats 
       in trustregions to initialize the info struct properly.

 See also: trustregions trs_tCG_cached trs_gep

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SOURCE CODE ^

0001 function trsoutput = trs_tCG(problem, trsinput, options, storedb, key)
0002 % Truncated (Steihaug-Toint) Conjugate-Gradient method.
0003 %
0004 % minimize <eta,grad> + .5*<eta,Hess(eta)>
0005 % subject to <eta,eta>_[inverse precon] <= Delta^2
0006 %
0007 % function trsoutput = trs_tCG(problem, trsinput, options, storedb, key)
0008 %
0009 % Inputs:
0010 %   problem: Manopt optimization problem structure
0011 %   trsinput: structure with the following fields:
0012 %       x: point on the manifold problem.M
0013 %       fgradx: gradient of the cost function of the problem at x
0014 %       vector if options.useRand == true.
0015 %       Delta = trust-region radius
0016 %   options: structure containing options for the subproblem solver
0017 %   storedb, key: manopt's caching system for the point x
0018 %
0019 % Options specific to this subproblem solver:
0020 %   kappa (0.1)
0021 %       kappa convergence tolerance.
0022 %       kappa > 0 is the linear convergence target rate: trs_tCG will
0023 %       terminate early if the residual was reduced by a factor of kappa.
0024 %   theta (1.0)
0025 %       theta convergence tolerance.
0026 %       1+theta (theta between 0 and 1) is the superlinear convergence
0027 %       target rate. trs_tCG will terminate early if the residual was
0028 %       reduced by a power of 1+theta.
0029 %   mininner (1)
0030 %       Minimum number of inner iterations for trs_tCG.
0031 %   maxinner (problem.M.dim() : the manifold's dimension)
0032 %       Maximum number of inner iterations for trs_tCG.
0033 %   useRand (false)
0034 %       Set to true if the trust-region solve is to be initiated with a
0035 %       random tangent vector. If set to true, no preconditioner will be
0036 %       used. This option is set to true in some scenarios to escape saddle
0037 %       points, but is otherwise seldom activated.
0038 %
0039 % Output: the structure trsoutput contains the following fields:
0040 %   eta: approximate solution to the trust-region subproblem at x
0041 %   Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
0042 %       is often naturally available to the subproblem solver at the
0043 %       end of execution, so that it may be cheaper to return it here.
0044 %   limitedbyTR: true if a boundary solution is returned
0045 %   printstr: logged information to be printed by trustregions.
0046 %   stats: structure with the following statistics:
0047 %       numinner: number of inner loops before returning
0048 %       hessvecevals: number of Hessian calls issued
0049 %       cauchy: true if cauchy step was used. false by default. Will
0050 %       be present in stats only if options.useRand = true
0051 %
0052 % trs_tCG can also be called in the following way (by trustregions) to
0053 % obtain part of the header to print and an initial stats structure:
0054 %
0055 % function trsoutput = trs_tCG([], [], options)
0056 %
0057 % In this case trsoutput contains the following fields:
0058 %   printheader: subproblem header to be printed before the first loop of
0059 %       trustregions
0060 %   initstats: struct with initial values for stored stats in subsequent
0061 %       calls to trs_tCG. Used in the first call to savestats
0062 %       in trustregions to initialize the info struct properly.
0063 %
0064 % See also: trustregions trs_tCG_cached trs_gep
0065 
0066 % This file is part of Manopt: www.manopt.org.
0067 % This code is an adaptation to Manopt of the original GenRTR code:
0068 % RTR - Riemannian Trust-Region
0069 % (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
0070 % Florida State University
0071 % School of Computational Science
0072 % (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
0073 % See accompanying license file.
0074 % The adaptation was executed by Nicolas Boumal.
0075 %
0076 % Change log:
0077 %
0078 %   NB Feb. 12, 2013:
0079 %       We do not project r back to the tangent space anymore: it was not
0080 %       necessary, and as of Manopt 1.0.1, the proj operator does not
0081 %       coincide with this notion anymore.
0082 %
0083 %   NB April 3, 2013:
0084 %       tCG now also returns Heta, the Hessian at x along eta. Additional
0085 %       esthetic modifications.
0086 %
0087 %   NB Dec. 2, 2013:
0088 %       If options.useRand is activated, we now make sure the preconditio-
0089 %       ner is not used, as was originally intended in GenRTR. In time, we
0090 %       may want to investigate whether useRand can be modified to work well
0091 %       with preconditioning too.
0092 %
0093 %   NB Jan. 9, 2014:
0094 %       Now checking explicitly for model decrease at each iteration. The
0095 %       first iteration is a Cauchy point, which necessarily realizes a
0096 %       decrease of the model cost. If a model increase is witnessed
0097 %       (which is theoretically impossible if a linear operator is used for
0098 %       the Hessian approximation), then we return the previous eta. This
0099 %       ensures we always achieve at least the Cauchy decrease, which
0100 %       should be sufficient for convergence.
0101 %
0102 %   NB Feb. 17, 2015:
0103 %       The previous update was in effect verifying that the current eta
0104 %       performed at least as well as the first eta (the Cauchy step) with
0105 %       respect to the model cost. While this is an acceptable strategy,
0106 %       the documentation (and the original intent) was to ensure a
0107 %       monotonic decrease of the model cost at each new eta. This is now
0108 %       the case, with the added line: "model_value = new_model_value;".
0109 %
0110 %   NB April 3, 2015:
0111 %       Works with the new StoreDB class system.
0112 %
0113 %   NB April 17, 2018:
0114 %       Two changes:
0115 %        (a) Instead of updating delta and Hdelta, we now update -delta and
0116 %            -Hdelta, named mdelta and Hmdelta. This allows to spare one
0117 %            call to lincomb(x, -1, z).
0118 %        (b) We re-project mdelta to the tangent space at every iteration,
0119 %            to avoid drifting away from it. The choice to project mdelta
0120 %            specifically is motivated by the fact that this is the vector
0121 %            passed to getHessian, hence this is where accurate tangency
0122 %            may be most important. (All other operations are linear
0123 %            combinations of tangent vectors, which should be fairly safe.)
0124 %
0125 %   VL June 29, 2022:
0126 %       Renamed tCG to trs_tCG to keep consistent naming with new
0127 %       subproblem solvers for trustregion. Also modified inputs and
0128 %       outputs to be compatible with other subproblem solvers.
0129 
0130 % All terms involving the trust-region radius use an inner product
0131 % w.r.t. the preconditioner; this is because the iterates grow in
0132 % length w.r.t. the preconditioner, guaranteeing that we do not
0133 % re-enter the trust-region.
0134 %
0135 % The following recurrences for Prec-based norms and inner
0136 % products come from [CGT2000], pg. 205, first edition.
0137 % Below, P is the preconditioner.
0138 %
0139 % <eta_k,P*delta_k> =
0140 %          beta_k-1 * ( <eta_k-1,P*delta_k-1> + alpha_k-1 |delta_k-1|^2_P )
0141 % |delta_k|^2_P = <r_k,z_k> + beta_k-1^2 |delta_k-1|^2_P
0142 %
0143 % Therefore, we need to keep track of:
0144 % 1)   |delta_k|^2_P
0145 % 2)   <eta_k,P*delta_k> = <eta_k,delta_k>_P
0146 % 3)   |eta_k  |^2_P
0147 %
0148 % Initial values are given by
0149 %    |delta_0|_P = <r,z>
0150 %    |eta_0|_P   = 0
0151 %    <eta_0,delta_0>_P = 0
0152 % because we take eta_0 = 0 (if useRand = false).
0153 %
0154 % [CGT2000] Conn, Gould and Toint: Trust-region methods, 2000.
0155 
0156 % trustregions only wants header and default values for stats.
0157 if nargin == 3 && isempty(problem) && isempty(trsinput)
0158     if isfield(options, 'useRand') && options.useRand
0159         trsoutput.printheader = sprintf('%9s   %9s   %11s   %s', ...
0160                                 'numinner', 'hessvec', 'used_cauchy', ...
0161                                 'stopreason');
0162         trsoutput.initstats = struct('numinner', 0, 'hessvecevals', 0, ...
0163                                 'cauchy', false);
0164     else
0165         trsoutput.printheader = sprintf('%9s   %9s   %s', 'numinner', ...
0166                                 'hessvec', 'stopreason');
0167         trsoutput.initstats = struct('numinner', 0, 'hessvecevals', 0);
0168     end
0169     return;
0170 end
0171 
0172 x = trsinput.x;
0173 Delta = trsinput.Delta;
0174 grad = trsinput.fgradx;
0175 
0176 M = problem.M;
0177 
0178 inner   = @(u, v) M.inner(x, u, v);
0179 lincomb = @(a, u, b, v) M.lincomb(x, a, u, b, v);
0180 tangent = @(u) M.tangent(x, u);
0181 
0182 % Set local defaults here
0183 localdefaults.kappa = 0.1;
0184 localdefaults.theta = 1.0;
0185 localdefaults.mininner = 1;
0186 localdefaults.maxinner = M.dim();
0187 localdefaults.useRand = false;
0188 
0189 % Merge local defaults with user options, if any
0190 if ~exist('options', 'var') || isempty(options)
0191     options = struct();
0192 end
0193 options = mergeOptions(localdefaults, options);
0194 
0195 theta = options.theta;
0196 kappa = options.kappa;
0197 
0198 % returned boolean to trustregions.m. true if we are limited by the TR
0199 % boundary (returns boundary solution). Otherwise false.
0200 limitedbyTR = false;
0201 
0202 % Determine eta0 and other useRand dependent initializations
0203 if ~options.useRand
0204     % Pick the zero vector
0205     eta = M.zerovec(x);
0206     Heta = M.zerovec(x);
0207     r = grad;
0208     e_Pe = 0;
0209 else
0210     % Random vector in T_x M (this has to be very small)
0211     eta = M.lincomb(x, 1e-6, M.randvec(x));
0212     % Must be inside trust-region
0213     while M.norm(x, eta) > Delta
0214         eta = M.lincomb(x, sqrt(sqrt(eps)), eta);
0215     end
0216     Heta = getHessian(problem, x, eta, storedb, key);
0217     r = lincomb(1, grad, 1, Heta);
0218     e_Pe = inner(eta, eta);
0219 end
0220 
0221 r_r = inner(r, r);
0222 norm_r = sqrt(r_r);
0223 norm_r0 = norm_r;
0224 
0225 % Precondition the residual.
0226 if ~options.useRand
0227     z = getPrecon(problem, x, r, storedb, key);
0228 else
0229     z = r;
0230 end
0231 
0232 % Compute z'*r.
0233 z_r = inner(z, r);
0234 d_Pd = z_r;
0235 
0236 % Initial search direction (we maintain -delta in memory, called mdelta, to
0237 % avoid a change of sign of the tangent vector.)
0238 mdelta = z;
0239 if ~options.useRand % and therefore, eta == 0
0240     e_Pd = 0;
0241 else % and therefore, no preconditioner
0242     e_Pd = -inner(eta, mdelta);
0243 end
0244 
0245 % If the Hessian or a linear Hessian approximation is in use, it is
0246 % theoretically guaranteed that the model value decreases strictly
0247 % with each iteration of tCG. Hence, there is no need to monitor the model
0248 % value. But, when a nonlinear Hessian approximation is used (such as the
0249 % built-in finite-difference approximation for example), the model may
0250 % increase. It is then important to terminate the tCG iterations and return
0251 % the previous (the best-so-far) iterate. The variable below will hold the
0252 % model value.
0253 %
0254 % This computation could be further improved based on Section 17.4.1 in
0255 % Conn, Gould, Toint, Trust Region Methods, 2000.
0256 % If we make this change, then also modify trustregions to gather this
0257 % value from tCG rather than recomputing it itself.
0258 model_fun = @(eta, Heta) inner(eta, grad) + .5*inner(eta, Heta);
0259 if ~options.useRand
0260     model_value = 0;
0261 else
0262     model_value = model_fun(eta, Heta);
0263 end
0264 
0265 % Pre-assume termination because j == end.
0266 stopreason_str = 'maximum inner iterations';
0267 
0268 % Begin inner/tCG loop.
0269 for j = 1 : options.maxinner
0270     
0271     % This call is the computationally expensive step.
0272     Hmdelta = getHessian(problem, x, mdelta, storedb, key);
0273     
0274     % Compute curvature (often called kappa).
0275     d_Hd = inner(mdelta, Hmdelta);
0276     
0277     
0278     % Note that if d_Hd == 0, we will exit at the next "if" anyway.
0279     alpha = z_r/d_Hd;
0280     % <neweta,neweta>_P =
0281     % <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
0282     e_Pe_new = e_Pe + 2.0*alpha*e_Pd + alpha*alpha*d_Pd;
0283     
0284     if options.debug > 2
0285         fprintf('DBG:   (r,r)  : %e\n', r_r);
0286         fprintf('DBG:   (d,Hd) : %e\n', d_Hd);
0287         fprintf('DBG:   alpha  : %e\n', alpha);
0288     end
0289     
0290     % Check against negative curvature and trust-region radius violation.
0291     % If either condition triggers, we bail out.
0292     if d_Hd <= 0 || e_Pe_new >= Delta^2
0293         % want
0294         %  ee = <eta,eta>_prec,x
0295         %  ed = <eta,delta>_prec,x
0296         %  dd = <delta,delta>_prec,x
0297         % Note (Nov. 26, 2021, NB): numerically, it might be better to call
0298         %   tau = max(real(roots([d_Pd, 2*e_Pd, e_Pe-Delta^2])));
0299         % This should be checked.
0300         % Also, we should safe-guard against 0/0: could happen if grad = 0.
0301         tau = (-e_Pd + sqrt(e_Pd*e_Pd + d_Pd*(Delta^2-e_Pe))) / d_Pd;
0302         if options.debug > 2
0303             fprintf('DBG:     tau  : %e\n', tau);
0304         end
0305         eta = lincomb(1,  eta, -tau,  mdelta);
0306         
0307         % If only a nonlinear Hessian approximation is available, this is
0308         % only approximately correct, but saves an additional Hessian call.
0309         Heta = lincomb(1, Heta, -tau, Hmdelta);
0310         
0311         % Technically, we may want to verify that this new eta is indeed
0312         % better than the previous eta before returning it (this is always
0313         % the case if the Hessian approximation is linear, but I am unsure
0314         % whether it is the case or not for nonlinear approximations.)
0315         % At any rate, the impact should be limited, so in the interest of
0316         % code conciseness (if we can still hope for that), we omit this.
0317 
0318         limitedbyTR = true;
0319 
0320         if d_Hd <= 0
0321             stopreason_str = 'negative curvature';
0322         else
0323             stopreason_str = 'exceeded trust region';
0324         end
0325 
0326         break;
0327     end
0328     
0329     % No negative curvature and eta_prop inside TR: accept it.
0330     e_Pe = e_Pe_new;
0331     new_eta  = lincomb(1,  eta, -alpha,  mdelta);
0332     
0333     % If only a nonlinear Hessian approximation is available, this is
0334     % only approximately correct, but saves an additional Hessian call.
0335     % TODO: this computation is redundant with that of r, L241. Clean up.
0336     new_Heta = lincomb(1, Heta, -alpha, Hmdelta);
0337     
0338     % Verify that the model cost decreased in going from eta to new_eta. If
0339     % it did not (which can only occur if the Hessian approximation is
0340     % nonlinear or because of numerical errors), then we return the
0341     % previous eta (which necessarily is the best reached so far, according
0342     % to the model cost). Otherwise, we accept the new eta and go on.
0343     new_model_value = model_fun(new_eta, new_Heta);
0344     if new_model_value >= model_value
0345         stopreason_str = 'model increased';
0346         break;
0347     end
0348     
0349     eta = new_eta;
0350     Heta = new_Heta;
0351     model_value = new_model_value; %% added Feb. 17, 2015
0352     
0353     % Update the residual.
0354     r = lincomb(1, r, -alpha, Hmdelta);
0355     
0356     % Compute new norm of r.
0357     r_r = inner(r, r);
0358     norm_r = sqrt(r_r);
0359     
0360     % Check kappa/theta stopping criterion.
0361     % Note that it is somewhat arbitrary whether to check this stopping
0362     % criterion on the r's (the gradients) or on the z's (the
0363     % preconditioned gradients). [CGT2000], page 206, mentions both as
0364     % acceptable criteria.
0365     if j >= options.mininner && norm_r <= norm_r0*min(norm_r0^theta, kappa)
0366         % Residual is small enough to quit
0367         if kappa < norm_r0^theta
0368             stopreason_str = 'reached target residual-kappa (linear)';
0369         else
0370             stopreason_str = 'reached target residual-theta (superlinear)';
0371         end
0372         break;
0373     end
0374     
0375     % Precondition the residual.
0376     if ~options.useRand
0377         z = getPrecon(problem, x, r, storedb, key);
0378     else
0379         z = r;
0380     end
0381     
0382     % Save the old z'*r.
0383     zold_rold = z_r;
0384     % Compute new z'*r.
0385     z_r = inner(z, r);
0386     
0387     % Compute new search direction.
0388     beta = z_r/zold_rold;
0389     mdelta = lincomb(1, z, beta, mdelta);
0390     
0391     % Since mdelta is passed to getHessian, which is the part of the code
0392     % we have least control over from here, we want to make sure mdelta is
0393     % a tangent vector up to numerical errors that should remain small.
0394     % For this reason, we re-project mdelta to the tangent space.
0395     % In limited tests, it was observed that it is a good idea to project
0396     % at every iteration rather than only every k iterations, the reason
0397     % being that loss of tangency can lead to more inner iterations being
0398     % run, which leads to an overall higher computational cost.
0399     mdelta = tangent(mdelta);
0400     
0401     % Update new P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
0402     e_Pd = beta*(e_Pd + alpha*d_Pd);
0403     d_Pd = z_r + beta*beta*d_Pd;
0404     
0405 end  % of tCG loop
0406 
0407 % If using randomized approach, compare result with the Cauchy point.
0408 % Convergence proofs assume that we achieve at least (a fraction of)
0409 % the reduction of the Cauchy point. After this if-block, either all
0410 % eta-related quantities have been changed consistently, or none of
0411 % them have changed.
0412 if options.useRand
0413     used_cauchy = false;
0414     
0415     norm_grad = M.norm(x, grad);
0416     
0417     % Check the curvature,
0418     Hg = getHessian(problem, x, grad, storedb, key);
0419     g_Hg = M.inner(x, grad, Hg);
0420     if g_Hg <= 0
0421         tau_c = 1;
0422     else
0423         tau_c = min( norm_grad^3/(Delta*g_Hg) , 1);
0424     end
0425     % and generate the Cauchy point.
0426     eta_c  = M.lincomb(x, -tau_c * Delta / norm_grad, grad);
0427     Heta_c = M.lincomb(x, -tau_c * Delta / norm_grad, Hg);
0428 
0429     % Now that we have computed the Cauchy point in addition to the
0430     % returned eta, we might as well keep the best of them.
0431     mdle  = M.inner(x, grad, eta) + .5*M.inner(x, Heta,   eta);
0432     mdlec = M.inner(x, grad, eta_c) + .5*M.inner(x, Heta_c, eta_c);
0433 
0434     if mdlec < mdle
0435         eta = eta_c;
0436         Heta = Heta_c; % added April 11, 2012
0437         used_cauchy = true;
0438     end
0439 end
0440 
0441 printstr = sprintf('%9d   %9d   %s', j, j, stopreason_str);
0442 stats = struct('numinner', j, 'hessvecevals', j);
0443 
0444 if options.useRand
0445     stats.cauchy = used_cauchy;
0446     printstr = sprintf('%9d   %9d   %11s   %s', j, j, ...
0447                 string(used_cauchy), stopreason_str);
0448 end
0449 
0450 trsoutput.eta = eta;
0451 trsoutput.Heta = Heta;
0452 trsoutput.limitedbyTR = limitedbyTR;
0453 trsoutput.printstr = printstr;
0454 trsoutput.stats = stats;
0455 end

Generated on Fri 30-Sep-2022 13:18:25 by m2html © 2005