Description of trs

0001 function trsoutput = trs_tCG(problem, trsinput, options, storedb, key)
0002 % Truncated (Steihaug-Toint) Conjugate-Gradient method.
0003 %
0004 % minimize <eta,grad> + .5*<eta,Hess(eta)>
0005 % subject to <eta,eta>_[inverse precon] <= Delta^2
0006 %
0007 % function trsoutput = trs_tCG(problem, trsinput, options, storedb, key)
0008 %
0009 % Inputs:
0010 %   problem: Manopt optimization problem structure
0011 %   trsinput: structure with the following fields:
0012 %       x: point on the manifold problem.M
0013 %       fgradx: gradient of the cost function of the problem at x
0014 %       vector if options.useRand == true.
0015 %       Delta = trust-region radius
0016 %   options: structure containing options for the subproblem solver
0017 %   storedb, key: manopt's caching system for the point x
0018 %
0019 % Options specific to this subproblem solver:
0020 %   kappa (0.1)
0021 %       kappa convergence tolerance.
0022 %       kappa > 0 is the linear convergence target rate: trs_tCG will
0023 %       terminate early if the residual was reduced by a factor of kappa.
0024 %   theta (1.0)
0025 %       theta convergence tolerance.
0026 %       1+theta (theta between 0 and 1) is the superlinear convergence
0027 %       target rate. trs_tCG will terminate early if the residual was
0028 %       reduced by a power of 1+theta.
0029 %   mininner (1)
0030 %       Minimum number of inner iterations for trs_tCG.
0031 %   maxinner (problem.M.dim() : the manifold's dimension)
0032 %       Maximum number of inner iterations for trs_tCG.
0033 %   useRand (false)
0034 %       Set to true if the trust-region solve is to be initiated with a
0035 %       random tangent vector. If set to true, no preconditioner will be
0036 %       used. This option is set to true in some scenarios to escape saddle
0037 %       points, but is otherwise seldom activated.
0038 %
0039 % Output: the structure trsoutput contains the following fields:
0040 %   eta: approximate solution to the trust-region subproblem at x
0041 %   Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
0042 %       is often naturally available to the subproblem solver at the
0043 %       end of execution, so that it may be cheaper to return it here.
0044 %   limitedbyTR: true if a boundary solution is returned
0045 %   printstr: logged information to be printed by trustregions.
0046 %   stats: structure with the following statistics:
0047 %       numinner: number of inner loops before returning
0048 %       hessvecevals: number of Hessian calls issued
0049 %       cauchy: true if cauchy step was used. false by default. Will
0050 %       be present in stats only if options.useRand = true
0051 %
0052 % trs_tCG can also be called in the following way (by trustregions) to
0053 % obtain part of the header to print and an initial stats structure:
0054 %
0055 % function trsoutput = trs_tCG([], [], options)
0056 %
0057 % In this case trsoutput contains the following fields:
0058 %   printheader: subproblem header to be printed before the first loop of
0059 %       trustregions
0060 %   initstats: struct with initial values for stored stats in subsequent
0061 %       calls to trs_tCG. Used in the first call to savestats
0062 %       in trustregions to initialize the info struct properly.
0063 %
0064 % See also: trustregions trs_tCG_cached trs_gep
0065 
0066 % This file is part of Manopt: www.manopt.org.
0067 % This code is an adaptation to Manopt of the original GenRTR code:
0068 % RTR - Riemannian Trust-Region
0069 % (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
0070 % Florida State University
0071 % School of Computational Science
0072 % (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
0073 % See accompanying license file.
0074 % The adaptation was executed by Nicolas Boumal.
0075 %
0076 % Change log:
0077 %
0078 %   NB Feb. 12, 2013:
0079 %       We do not project r back to the tangent space anymore: it was not
0080 %       necessary, and as of Manopt 1.0.1, the proj operator does not
0081 %       coincide with this notion anymore.
0082 %
0083 %   NB April 3, 2013:
0084 %       tCG now also returns Heta, the Hessian at x along eta. Additional
0085 %       esthetic modifications.
0086 %
0087 %   NB Dec. 2, 2013:
0088 %       If options.useRand is activated, we now make sure the preconditio-
0089 %       ner is not used, as was originally intended in GenRTR. In time, we
0090 %       may want to investigate whether useRand can be modified to work well
0091 %       with preconditioning too.
0092 %
0093 %   NB Jan. 9, 2014:
0094 %       Now checking explicitly for model decrease at each iteration. The
0095 %       first iteration is a Cauchy point, which necessarily realizes a
0096 %       decrease of the model cost. If a model increase is witnessed
0097 %       (which is theoretically impossible if a linear operator is used for
0098 %       the Hessian approximation), then we return the previous eta. This
0099 %       ensures we always achieve at least the Cauchy decrease, which
0100 %       should be sufficient for convergence.
0101 %
0102 %   NB Feb. 17, 2015:
0103 %       The previous update was in effect verifying that the current eta
0104 %       performed at least as well as the first eta (the Cauchy step) with
0105 %       respect to the model cost. While this is an acceptable strategy,
0106 %       the documentation (and the original intent) was to ensure a
0107 %       monotonic decrease of the model cost at each new eta. This is now
0108 %       the case, with the added line: "model_value = new_model_value;".
0109 %
0110 %   NB April 3, 2015:
0111 %       Works with the new StoreDB class system.
0112 %
0113 %   NB April 17, 2018:
0114 %       Two changes:
0115 %        (a) Instead of updating delta and Hdelta, we now update -delta and
0116 %            -Hdelta, named mdelta and Hmdelta. This allows to spare one
0117 %            call to lincomb(x, -1, z).
0118 %        (b) We re-project mdelta to the tangent space at every iteration,
0119 %            to avoid drifting away from it. The choice to project mdelta
0120 %            specifically is motivated by the fact that this is the vector
0121 %            passed to getHessian, hence this is where accurate tangency
0122 %            may be most important. (All other operations are linear
0123 %            combinations of tangent vectors, which should be fairly safe.)
0124 %
0125 %   VL June 29, 2022:
0126 %       Renamed tCG to trs_tCG to keep consistent naming with new
0127 %       subproblem solvers for trustregion. Also modified inputs and
0128 %       outputs to be compatible with other subproblem solvers.
0129 
0130 % All terms involving the trust-region radius use an inner product
0131 % w.r.t. the preconditioner; this is because the iterates grow in
0132 % length w.r.t. the preconditioner, guaranteeing that we do not
0133 % re-enter the trust-region.
0134 %
0135 % The following recurrences for Prec-based norms and inner
0136 % products come from [CGT2000], pg. 205, first edition.
0137 % Below, P is the preconditioner.
0138 %
0139 % <eta_k,P*delta_k> =
0140 %          beta_k-1 * ( <eta_k-1,P*delta_k-1> + alpha_k-1 |delta_k-1|^2_P )
0141 % |delta_k|^2_P = <r_k,z_k> + beta_k-1^2 |delta_k-1|^2_P
0142 %
0143 % Therefore, we need to keep track of:
0144 % 1)   |delta_k|^2_P
0145 % 2)   <eta_k,P*delta_k> = <eta_k,delta_k>_P
0146 % 3)   |eta_k  |^2_P
0147 %
0148 % Initial values are given by
0149 %    |delta_0|_P = <r,z>
0150 %    |eta_0|_P   = 0
0151 %    <eta_0,delta_0>_P = 0
0152 % because we take eta_0 = 0 (if useRand = false).
0153 %
0154 % [CGT2000] Conn, Gould and Toint: Trust-region methods, 2000.
0155 
0156 % trustregions only wants header and default values for stats.
0157 if nargin == 3 && isempty(problem) && isempty(trsinput)
0158     if isfield(options, 'useRand') && options.useRand
0159         trsoutput.printheader = sprintf('%9s   %9s   %11s   %s', ...
0160                                 'numinner', 'hessvec', 'used_cauchy', ...
0161                                 'stopreason');
0162         trsoutput.initstats = struct('numinner', 0, 'hessvecevals', 0, ...
0163                                 'cauchy', false);
0164     else
0165         trsoutput.printheader = sprintf('%9s   %9s   %s', 'numinner', ...
0166                                 'hessvec', 'stopreason');
0167         trsoutput.initstats = struct('numinner', 0, 'hessvecevals', 0);
0168     end
0169     return;
0170 end
0171 
0172 x = trsinput.x;
0173 Delta = trsinput.Delta;
0174 grad = trsinput.fgradx;
0175 
0176 M = problem.M;
0177 
0178 inner   = @(u, v) M.inner(x, u, v);
0179 lincomb = @(a, u, b, v) M.lincomb(x, a, u, b, v);
0180 tangent = @(u) M.tangent(x, u);
0181 
0182 % Set local defaults here
0183 localdefaults.kappa = 0.1;
0184 localdefaults.theta = 1.0;
0185 localdefaults.mininner = 1;
0186 localdefaults.maxinner = M.dim();
0187 localdefaults.useRand = false;
0188 
0189 % Merge local defaults with user options, if any
0190 if ~exist('options', 'var') || isempty(options)
0191     options = struct();
0192 end
0193 options = mergeOptions(localdefaults, options);
0194 
0195 theta = options.theta;
0196 kappa = options.kappa;
0197 
0198 % returned boolean to trustregions.m. true if we are limited by the TR
0199 % boundary (returns boundary solution). Otherwise false.
0200 limitedbyTR = false;
0201 
0202 % Determine eta0 and other useRand dependent initializations
0203 if ~options.useRand
0204     % Pick the zero vector
0205     eta = M.zerovec(x);
0206     Heta = M.zerovec(x);
0207     r = grad;
0208     e_Pe = 0;
0209 else
0210     % Random vector in T_x M (this has to be very small)
0211     eta = M.lincomb(x, 1e-6, M.randvec(x));
0212     % Must be inside trust-region
0213     while M.norm(x, eta) > Delta
0214         eta = M.lincomb(x, sqrt(sqrt(eps)), eta);
0215     end
0216     Heta = getHessian(problem, x, eta, storedb, key);
0217     r = lincomb(1, grad, 1, Heta);
0218     e_Pe = inner(eta, eta);
0219 end
0220 
0221 r_r = inner(r, r);
0222 norm_r = sqrt(r_r);
0223 norm_r0 = norm_r;
0224 
0225 % Precondition the residual.
0226 if ~options.useRand
0227     z = getPrecon(problem, x, r, storedb, key);
0228 else
0229     z = r;
0230 end
0231 
0232 % Compute z'*r.
0233 z_r = inner(z, r);
0234 d_Pd = z_r;
0235 
0236 % Initial search direction (we maintain -delta in memory, called mdelta, to
0237 % avoid a change of sign of the tangent vector.)
0238 mdelta = z;
0239 if ~options.useRand % and therefore, eta == 0
0240     e_Pd = 0;
0241 else % and therefore, no preconditioner
0242     e_Pd = -inner(eta, mdelta);
0243 end
0244 
0245 % If the Hessian or a linear Hessian approximation is in use, it is
0246 % theoretically guaranteed that the model value decreases strictly
0247 % with each iteration of tCG. Hence, there is no need to monitor the model
0248 % value. But, when a nonlinear Hessian approximation is used (such as the
0249 % built-in finite-difference approximation for example), the model may
0250 % increase. It is then important to terminate the tCG iterations and return
0251 % the previous (the best-so-far) iterate. The variable below will hold the
0252 % model value.
0253 %
0254 % This computation could be further improved based on Section 17.4.1 in
0255 % Conn, Gould, Toint, Trust Region Methods, 2000.
0256 % If we make this change, then also modify trustregions to gather this
0257 % value from tCG rather than recomputing it itself.
0258 model_fun = @(eta, Heta) inner(eta, grad) + .5*inner(eta, Heta);
0259 if ~options.useRand
0260     model_value = 0;
0261 else
0262     model_value = model_fun(eta, Heta);
0263 end
0264 
0265 % Pre-assume termination because j == end.
0266 stopreason_str = 'maximum inner iterations';
0267 
0268 % Begin inner/tCG loop.
0269 for j = 1 : options.maxinner
0270     
0271     % This call is the computationally expensive step.
0272     Hmdelta = getHessian(problem, x, mdelta, storedb, key);
0273     
0274     % Compute curvature (often called kappa).
0275     d_Hd = inner(mdelta, Hmdelta);
0276     
0277     
0278     % Note that if d_Hd == 0, we will exit at the next "if" anyway.
0279     alpha = z_r/d_Hd;
0280     % <neweta,neweta>_P =
0281     % <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
0282     e_Pe_new = e_Pe + 2.0*alpha*e_Pd + alpha*alpha*d_Pd;
0283     
0284     if options.debug > 2
0285         fprintf('DBG:   (r,r)  : %e\n', r_r);
0286         fprintf('DBG:   (d,Hd) : %e\n', d_Hd);
0287         fprintf('DBG:   alpha  : %e\n', alpha);
0288     end
0289     
0290     % Check against negative curvature and trust-region radius violation.
0291     % If either condition triggers, we bail out.
0292     if d_Hd <= 0 || e_Pe_new >= Delta^2
0293         % want
0294         %  ee = <eta,eta>_prec,x
0295         %  ed = <eta,delta>_prec,x
0296         %  dd = <delta,delta>_prec,x
0297         % Note (Nov. 26, 2021, NB): numerically, it might be better to call
0298         %   tau = max(real(roots([d_Pd, 2*e_Pd, e_Pe-Delta^2])));
0299         % This should be checked.
0300         % Also, we should safe-guard against 0/0: could happen if grad = 0.
0301         tau = (-e_Pd + sqrt(e_Pd*e_Pd + d_Pd*(Delta^2-e_Pe))) / d_Pd;
0302         if options.debug > 2
0303             fprintf('DBG:     tau  : %e\n', tau);
0304         end
0305         eta = lincomb(1,  eta, -tau,  mdelta);
0306         
0307         % If only a nonlinear Hessian approximation is available, this is
0308         % only approximately correct, but saves an additional Hessian call.
0309         Heta = lincomb(1, Heta, -tau, Hmdelta);
0310         
0311         % Technically, we may want to verify that this new eta is indeed
0312         % better than the previous eta before returning it (this is always
0313         % the case if the Hessian approximation is linear, but I am unsure
0314         % whether it is the case or not for nonlinear approximations.)
0315         % At any rate, the impact should be limited, so in the interest of
0316         % code conciseness (if we can still hope for that), we omit this.
0317 
0318         limitedbyTR = true;
0319 
0320         if d_Hd <= 0
0321             stopreason_str = 'negative curvature';
0322         else
0323             stopreason_str = 'exceeded trust region';
0324         end
0325 
0326         break;
0327     end
0328     
0329     % No negative curvature and eta_prop inside TR: accept it.
0330     e_Pe = e_Pe_new;
0331     new_eta  = lincomb(1,  eta, -alpha,  mdelta);
0332     
0333     % If only a nonlinear Hessian approximation is available, this is
0334     % only approximately correct, but saves an additional Hessian call.
0335     % TODO: this computation is redundant with that of r, L241. Clean up.
0336     new_Heta = lincomb(1, Heta, -alpha, Hmdelta);
0337     
0338     % Verify that the model cost decreased in going from eta to new_eta. If
0339     % it did not (which can only occur if the Hessian approximation is
0340     % nonlinear or because of numerical errors), then we return the
0341     % previous eta (which necessarily is the best reached so far, according
0342     % to the model cost). Otherwise, we accept the new eta and go on.
0343     new_model_value = model_fun(new_eta, new_Heta);
0344     if new_model_value >= model_value
0345         stopreason_str = 'model increased';
0346         break;
0347     end
0348     
0349     eta = new_eta;
0350     Heta = new_Heta;
0351     model_value = new_model_value; %% added Feb. 17, 2015
0352     
0353     % Update the residual.
0354     r = lincomb(1, r, -alpha, Hmdelta);
0355     
0356     % Compute new norm of r.
0357     r_r = inner(r, r);
0358     norm_r = sqrt(r_r);
0359     
0360     % Check kappa/theta stopping criterion.
0361     % Note that it is somewhat arbitrary whether to check this stopping
0362     % criterion on the r's (the gradients) or on the z's (the
0363     % preconditioned gradients). [CGT2000], page 206, mentions both as
0364     % acceptable criteria.
0365     if j >= options.mininner && norm_r <= norm_r0*min(norm_r0^theta, kappa)
0366         % Residual is small enough to quit
0367         if kappa < norm_r0^theta
0368             stopreason_str = 'reached target residual-kappa (linear)';
0369         else
0370             stopreason_str = 'reached target residual-theta (superlinear)';
0371         end
0372         break;
0373     end
0374     
0375     % Precondition the residual.
0376     if ~options.useRand
0377         z = getPrecon(problem, x, r, storedb, key);
0378     else
0379         z = r;
0380     end
0381     
0382     % Save the old z'*r.
0383     zold_rold = z_r;
0384     % Compute new z'*r.
0385     z_r = inner(z, r);
0386     
0387     % Compute new search direction.
0388     beta = z_r/zold_rold;
0389     mdelta = lincomb(1, z, beta, mdelta);
0390     
0391     % Since mdelta is passed to getHessian, which is the part of the code
0392     % we have least control over from here, we want to make sure mdelta is
0393     % a tangent vector up to numerical errors that should remain small.
0394     % For this reason, we re-project mdelta to the tangent space.
0395     % In limited tests, it was observed that it is a good idea to project
0396     % at every iteration rather than only every k iterations, the reason
0397     % being that loss of tangency can lead to more inner iterations being
0398     % run, which leads to an overall higher computational cost.
0399     mdelta = tangent(mdelta);
0400     
0401     % Update new P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
0402     e_Pd = beta*(e_Pd + alpha*d_Pd);
0403     d_Pd = z_r + beta*beta*d_Pd;
0404     
0405 end  % of tCG loop
0406 
0407 % If using randomized approach, compare result with the Cauchy point.
0408 % Convergence proofs assume that we achieve at least (a fraction of)
0409 % the reduction of the Cauchy point. After this if-block, either all
0410 % eta-related quantities have been changed consistently, or none of
0411 % them have changed.
0412 if options.useRand
0413     used_cauchy = false;
0414     
0415     norm_grad = M.norm(x, grad);
0416     
0417     % Check the curvature,
0418     Hg = getHessian(problem, x, grad, storedb, key);
0419     g_Hg = M.inner(x, grad, Hg);
0420     if g_Hg <= 0
0421         tau_c = 1;
0422     else
0423         tau_c = min( norm_grad^3/(Delta*g_Hg) , 1);
0424     end
0425     % and generate the Cauchy point.
0426     eta_c  = M.lincomb(x, -tau_c * Delta / norm_grad, grad);
0427     Heta_c = M.lincomb(x, -tau_c * Delta / norm_grad, Hg);
0428 
0429     % Now that we have computed the Cauchy point in addition to the
0430     % returned eta, we might as well keep the best of them.
0431     mdle  = M.inner(x, grad, eta) + .5*M.inner(x, Heta,   eta);
0432     mdlec = M.inner(x, grad, eta_c) + .5*M.inner(x, Heta_c, eta_c);
0433 
0434     if mdlec < mdle
0435         eta = eta_c;
0436         Heta = Heta_c; % added April 11, 2012
0437         used_cauchy = true;
0438     end
0439 end
0440 
0441 printstr = sprintf('%9d   %9d   %s', j, j, stopreason_str);
0442 stats = struct('numinner', j, 'hessvecevals', j);
0443 
0444 if options.useRand
0445     stats.cauchy = used_cauchy;
0446     printstr = sprintf('%9d   %9d   %11s   %s', j, j, ...
0447                 string(used_cauchy), stopreason_str);
0448 end
0449 
0450 trsoutput.eta = eta;
0451 trsoutput.Heta = Heta;
0452 trsoutput.limitedbyTR = limitedbyTR;
0453 trsoutput.printstr = printstr;
0454 trsoutput.stats = stats;
0455 end
trs_tCG

PURPOSE

SYNOPSIS

DESCRIPTION

CROSS-REFERENCE INFORMATION

SOURCE CODE