function trsoutput = trs_tCG_cached(problem, trsinput, options, storedb, key)
% Truncated (Steihaug-Toint) Conjugate-Gradient method with caching.
%
% minimize <eta,grad> + .5*<eta,Hess(eta)>
% subject to <eta,eta>_[inverse precon] <= Delta^2
%
% function trsoutput = trs_tCG_cached(problem, trsinput, options, storedb, key)
%
% trs_tCG_cached stores information (when options.trscache == true)
% which can help avoid redundant computations (using tCG_rejectedstep)
% upon step rejection by trustregions, compared to trs_tCG, at the cost
% of using extra memory.
%
% Inputs:
%   problem: Manopt optimization problem structure
%   trsinput: structure with the following fields:
%       x: point on the manifold problem.M
%       fgradx: gradient of the cost function of the problem at x
%       Delta: trust-region radius
%   options: structure containing options for the subproblem solver
%   storedb, key: Manopt's caching system for the point x
%
% Options specific to this subproblem solver:
%   kappa (0.1)
%       kappa convergence tolerance.
%       kappa > 0 is the linear convergence target rate: trs_tCG_cached
%       terminates early if the residual was reduced by a factor of
%       kappa.
%   theta (1.0)
%       theta convergence tolerance.
%       1+theta (theta between 0 and 1) is the superlinear convergence
%       target rate. trs_tCG_cached terminates early if the residual
%       was reduced by a power of 1+theta.
%   mininner (1)
%       Minimum number of inner iterations.
%   maxinner (problem.M.dim())
%       Maximum number of inner iterations.
%   trscache (true)
%       Set to false if no caching for trs_tCG_cached is desired. The
%       default is true, to improve computation time when there are many
%       step rejections in trustregions. Setting trscache to false can
%       reduce memory usage.
%   memorytCG_warningtol (1000)
%       Tolerance memory value, in MB, before issuing a warning when
%       trscache = true.
%       The default is 1000 MB (about 1 GB), but this value can be
%       increased depending on the user's machine. To disable the
%       warning completely, use:
%       warning('off', 'manopt:trs_tCG_cached:memory')
%
% Output: the structure trsoutput contains the following fields:
%   eta: approximate solution to the trust-region subproblem at x
%   Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
%       is often naturally available to the subproblem solver at the
%       end of execution, so that it may be cheaper to return it here.
%   limitedbyTR: true if a boundary solution is returned
%   printstr: logged information to be printed by trustregions.
%   stats: structure with the following statistics:
%       numinner: number of inner loops before returning
%       hessvecevals: number of Hessian calls issued
%       memorytCG_MB: memory of store_iters and store_last in MB
%
% Stored information:
%   store_iters: a struct array with enough information to compute the
%       next step upon step rejection, used when the algorithm exits due
%       to negative curvature or trust-region radius violation.
%   store_last: an additional struct, complementing store_iters, to
%       compute the next step upon step rejection when the algorithm
%       exits, but not due to negative curvature or trust-region radius
%       violation.
%
%
% trs_tCG_cached can also be called in the following way (by trustregions)
% to obtain part of the header to print and an initial stats structure:
%
% function trsoutput = trs_tCG_cached([], [], options)
%
% In this case, trsoutput contains the following fields:
%   printheader: subproblem header to be printed before the first pass
%       of trustregions
%   initstats: struct with initial values for stored stats in subsequent
%       calls to trs_tCG_cached.
%       Used in the first call to savestats
%       in trustregions to initialize the info struct properly.
%
% See also: trustregions trs_tCG tCG_rejectedstep

% This file is part of Manopt: www.manopt.org.
% This code is an adaptation to Manopt of the original GenRTR code:
% RTR - Riemannian Trust-Region
% (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
% Florida State University
% School of Computational Science
% (http://www.math.fsu.edu/~cbaker/GenRTR/?page=download)
% See accompanying license file.
% The adaptation was executed by Nicolas Boumal.
%
% Change log:
%
%   VL June 24, 2022:
%       trs_tCG_cached by default stores information at each iteration,
%       unlike trs_tCG.
%       This can be useful for the next call to trs_tCG_cached, and the
%       work is passed to tCG_rejectedstep rather than the normal tCG
%       loop.


% See trs_tCG for references to relevant equations in
% [CGT2000] Conn, Gould and Toint: Trust-region methods, 2000.

% trustregions only wants the header and default values for the stats.
if nargin == 3 && isempty(problem) && isempty(trsinput)
    trsoutput.printheader = '';
    if options.verbosity == 2
        trsoutput.printheader = sprintf('%9s %9s %9s %s', ...
                                'numinner', 'hessvec', 'numstored', ...
                                'stopreason');
    elseif options.verbosity > 2
        trsoutput.printheader = sprintf('%9s %9s %9s %9s %s', ...
                                'numinner', 'hessvec', 'numstored', ...
                                'memtCG_MB', 'stopreason');
    end
    trsoutput.initstats = struct('numinner', 0, 'hessvecevals', 0, ...
                                 'memorytCG_MB', 0);
    return;
end

if isfield(options, 'useRand') && options.useRand
    warning('manopt:trs_tCG_cached:rand', ...
            ['options.useRand = true but @trs_tCG_cached ignores it.\n' ...
             'You may set options.subproblemsolver = @trs_tCG;\n', ...
             'Alternatively, set options.useRand = false;']);
end

x = trsinput.x;
Delta = trsinput.Delta;
grad = trsinput.fgradx;

inner = @(u, v) problem.M.inner(x, u, v);
lincomb = @(a, u, b, v) problem.M.lincomb(x, a, u, b, v);
tangent = @(u) problem.M.tangent(x, u);

% Set local defaults here
localdefaults.kappa = 0.1;
localdefaults.theta = 1.0;
localdefaults.mininner = 1;
localdefaults.maxinner = problem.M.dim();
localdefaults.trscache = true;
localdefaults.memorytCG_warningtol = 1000;

% Merge local defaults with user options, if any
if ~exist('options', 'var') || isempty(options)
    options = struct();
end
options = mergeOptions(localdefaults, options);

% If the previous step was rejected and we want to use caching,
if ~trsinput.accept && options.trscache
    % then check if there is cached information for the current point.
    store = storedb.get(key);
    if isfield(store, 'store_iters')
        % If so, use that cache to produce the same output as would have
        % been produced by running the code below (after the 'return'),
        % but without issuing Hessian-vector calls that were already
        % issued.
        trsoutput = tCG_rejectedstep(problem, trsinput, options, store);
        return;
    end
end

% Boolean returned to trustregions: true if we are limited by the TR
% boundary (a boundary solution is returned); otherwise false.
limitedbyTR = false;

theta = options.theta;
kappa = options.kappa;

eta = problem.M.zerovec(x);
Heta = problem.M.zerovec(x);
r = grad;
e_Pe = 0;

r_r = inner(r, r);
norm_r = sqrt(r_r);
norm_r0 = norm_r;

% Precondition the residual.
z = getPrecon(problem, x, r, storedb, key);

% Compute z'*r.
z_r = inner(z, r);
d_Pd = z_r;

% Initial search direction (we maintain -delta in memory, called mdelta,
% to avoid a change of sign of the tangent vector).
mdelta = z;
e_Pd = 0;

% If the Hessian or a linear Hessian approximation is in use, it is
% theoretically guaranteed that the model value decreases strictly with
% each iteration of trs_tCG. Hence, there is no need to monitor the
% model value. But when a nonlinear Hessian approximation is used (such
% as the built-in finite-difference approximation, for example), the
% model may increase. It is then important to terminate the trs_tCG
% iterations and return the previous (best-so-far) iterate. The variable
% below holds the model value.
%
% This computation could be further improved based on Section 17.4.1 in
% Conn, Gould, Toint, Trust Region Methods, 2000.
% If we make this change, then also modify trustregions to gather this
% value from trs_tCG rather than recomputing it itself.
model_fun = @(eta, Heta) inner(eta, grad) + .5*inner(eta, Heta);
model_value = 0;

% Pre-assume termination because j == end.
stopreason_str = 'maximum inner iterations';

% Track certain iterations in case the step is rejected.
% store_iters tracks candidate etas with increasing squared norm,
% relevant when limitedbyTR = true, or when <eta, Heta> <= 0.
store_iters = struct('normsq', [], 'numinner', [], 'e_Pe', [], ...
    'd_Pd', [], 'e_Pd', [], 'd_Hd', [], 'eta', [], 'Heta', [], ...
    'mdelta', [], 'Hmdelta', []);

max_normsq = 0;

% We only need to compute the memory of one item in store_iters, in
% megabytes (MB).
peritermemory_MB = 0;

% Total cached memory stored, in MB.
memorytCG_MB = 0;

% Number of iterations where trs_tCG_cached stores information. This
% value will be length(store_iters), plus 1 if store_last is used.
numstored = 0;

% String that is printed by trustregions, for per-iteration information.
printstr = '';

% Begin inner/trs_tCG loop.
for j = 1 : options.maxinner

    % This call is the computationally expensive step.
    Hmdelta = getHessian(problem, x, mdelta, storedb, key);

    % Compute curvature (often called kappa).
    d_Hd = inner(mdelta, Hmdelta);


    % Note that if d_Hd == 0, we will exit at the next "if" anyway.
    alpha = z_r/d_Hd;
    % <neweta,neweta>_P =
    % <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
    e_Pe_new = e_Pe + 2.0*alpha*e_Pd + alpha*alpha*d_Pd;

    if options.debug > 2
        fprintf('DBG: (r,r)  : %e\n', r_r);
        fprintf('DBG: (d,Hd) : %e\n', d_Hd);
        fprintf('DBG: alpha  : %e\n', alpha);
    end

    if options.trscache
        % Selectively store info in store_iters.
        % next_smallest = (1/4^n Delta)^2 with n the smallest integer
        % such that max_normsq <= next_smallest.
        % We use this condition to only store relevant iterations in
        % case of rejection in trustregions.
        if max_normsq > 0
            next_smallest = (1/16)^floor(-(1/4)*(log2(max_normsq) - ...
                                        log2(Delta^2))) * Delta^2;
        else
            next_smallest = 0;
        end

        if d_Hd <= 0 || e_Pe_new >= next_smallest
            numstored = numstored + 1;

            store_iters(numstored) = struct('normsq', e_Pe_new, ...
                'numinner', j, 'e_Pe', e_Pe, 'd_Pd', d_Pd, ...
                'e_Pd', e_Pd, 'd_Hd', d_Hd, 'eta', eta, 'Heta', Heta, ...
                'mdelta', mdelta, 'Hmdelta', Hmdelta);
            max_normsq = e_Pe_new;

            % getsize for one entry in store_iters, which will be the
            % same for all others.
            if peritermemory_MB == 0
                peritermemory_MB = getsize(store_iters(numstored))/1024^2;
            end

            memorytCG_MB = memorytCG_MB + peritermemory_MB;

            if memorytCG_MB > options.memorytCG_warningtol
                warning('manopt:trs_tCG_cached:memory', ...
                [sprintf('trs_tCG_cached will cache %.2f [MB] for at least one iteration of trustregions until a step is accepted.\n', memorytCG_MB) ...
                'If memory is limited, turn off caching with options.trscache = false.\n' ...
                'To disable this warning: warning(''off'', ''manopt:trs_tCG_cached:memory'')']);
            end

        end
    end

    % Check against negative curvature and trust-region radius
    % violation. If either condition triggers, we bail out.
    if d_Hd <= 0 || e_Pe_new >= Delta^2
        % We want tau such that
        %   <eta+tau*delta, eta+tau*delta>_P = Delta^2, with
        %   ee = <eta,eta>_prec,x
        %   ed = <eta,delta>_prec,x
        %   dd = <delta,delta>_prec,x
        % Note (Nov. 26, 2021, NB): numerically, it might be better to
        % call
        %   tau = max(real(roots([d_Pd, 2*e_Pd, e_Pe-Delta^2])));
        % This should be checked.
        % Also, we should safeguard against 0/0: this could happen if
        % grad = 0.
        tau = (-e_Pd + sqrt(e_Pd*e_Pd + d_Pd*(Delta^2-e_Pe))) / d_Pd;
        if options.debug > 2
            fprintf('DBG: tau : %e\n', tau);
        end
        eta = lincomb(1, eta, -tau, mdelta);

        % If only a nonlinear Hessian approximation is available, this
        % is only approximately correct, but it saves an additional
        % Hessian call.
        Heta = lincomb(1, Heta, -tau, Hmdelta);

        % Technically, we may want to verify that this new eta is indeed
        % better than the previous eta before returning it (this is
        % always the case if the Hessian approximation is linear, but I
        % am unsure whether it is the case or not for nonlinear
        % approximations.)
        % At any rate, the impact should be limited, so in the interest
        % of code conciseness (if we can still hope for that), we omit
        % this.

        limitedbyTR = true;

        if d_Hd <= 0
            stopreason_str = 'negative curvature';
        else
            stopreason_str = 'exceeded trust region';
        end
        break;
    end

    % No negative curvature and the proposed eta is inside the TR:
    % accept it.
    e_Pe = e_Pe_new;
    new_eta = lincomb(1, eta, -alpha, mdelta);

    % If only a nonlinear Hessian approximation is available, this is
    % only approximately correct, but it saves an additional Hessian
    % call.
    % TODO: this computation is redundant with the residual update
    % below. Clean up.
    new_Heta = lincomb(1, Heta, -alpha, Hmdelta);

    % Verify that the model cost decreased in going from eta to new_eta.
    % If it did not (which can only occur if the Hessian approximation
    % is nonlinear or because of numerical errors), then we return the
    % previous eta (which necessarily is the best reached so far,
    % according to the model cost). Otherwise, we accept the new eta and
    % go on.
    new_model_value = model_fun(new_eta, new_Heta);
    if new_model_value >= model_value
        stopreason_str = 'model increased';
        break;
    end

    eta = new_eta;
    Heta = new_Heta;
    model_value = new_model_value; %% added Feb. 17, 2015

    % Update the residual.
    r = lincomb(1, r, -alpha, Hmdelta);

    % Compute the new norm of r.
    r_r = inner(r, r);
    norm_r = sqrt(r_r);

    % Check the kappa/theta stopping criterion.
    % Note that it is somewhat arbitrary whether to check this stopping
    % criterion on the r's (the gradients) or on the z's (the
    % preconditioned gradients). [CGT2000], page 206, mentions both as
    % acceptable criteria.
    if j >= options.mininner && norm_r <= norm_r0*min(norm_r0^theta, kappa)
        % Residual is small enough to quit.
        if kappa < norm_r0^theta
            stopreason_str = 'reached target residual-kappa (linear)';
        else
            stopreason_str = 'reached target residual-theta (superlinear)';
        end
        break;
    end

    % Precondition the residual.
    z = getPrecon(problem, x, r, storedb, key);

    % Save the old z'*r.
    zold_rold = z_r;
    % Compute the new z'*r.
    z_r = inner(z, r);

    % Compute the new search direction.
    beta = z_r/zold_rold;
    mdelta = lincomb(1, z, beta, mdelta);

    % Since mdelta is passed to getHessian, which is the part of the
    % code we have least control over from here, we want to make sure
    % mdelta is a tangent vector up to numerical errors that should
    % remain small. For this reason, we re-project mdelta to the
    % tangent space. In limited tests, it was observed that it is a
    % good idea to project at every iteration rather than only every k
    % iterations, the reason being that loss of tangency can lead to
    % more inner iterations being run, which leads to an overall higher
    % computational cost.
    mdelta = tangent(mdelta);

    % Update new P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
    e_Pd = beta*(e_Pd + alpha*d_Pd);
    d_Pd = z_r + beta*beta*d_Pd;

end  % of trs_tCG loop

if options.trscache
    store = storedb.get(key);
    store.store_iters = store_iters;
    if ~limitedbyTR
        % Store extra information, since we did not exit because we
        % were limited by the TR (the model value increased or the
        % kappa/theta stopping criterion was satisfied).
        store_last = struct('numinner', j, 'stopreason_str', ...
            stopreason_str, 'eta', eta, 'Heta', Heta);
        memorytCG_MB = memorytCG_MB + getsize(store_last)/1024^2;

        if memorytCG_MB > options.memorytCG_warningtol
            warning('manopt:trs_tCG_cached:memory', ...
            [sprintf('trs_tCG_cached will cache %.2f [MB] for at least one iteration of trustregions until a step is accepted.\n', memorytCG_MB) ...
            'If memory is limited, turn off caching with options.trscache = false.\n' ...
            'If more memory can be used without problem, increase options.memorytCG_warningtol accordingly.\n' ...
            'To disable this warning: warning(''off'', ''manopt:trs_tCG_cached:memory'')']);
        end
        store.store_last = store_last;

        numstored = numstored + 1;
    end

    storedb.set(store, key);
end

stats = struct('numinner', j, 'hessvecevals', j, ...
               'memorytCG_MB', memorytCG_MB);

if options.verbosity == 2
    printstr = sprintf('%9d %9d %9d %s', j, j, numstored, ...
        stopreason_str);
elseif options.verbosity > 2
    printstr = sprintf('%9d %9d %9d %9.2f %s', j, j, numstored, ...
        memorytCG_MB, stopreason_str);
end

trsoutput.eta = eta;
trsoutput.Heta = Heta;
trsoutput.limitedbyTR = limitedbyTR;
trsoutput.printstr = printstr;
trsoutput.stats = stats;
end
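% For reference, the mathematics of the inner loop above (CG steps with
% three exits: negative curvature, trust-region boundary via the tau
% root, and the kappa/theta residual test) can be sketched outside of
% Manopt. The Python/NumPy sketch below is NOT part of this file or of
% Manopt: it is a hedged, minimal translation for the Euclidean case
% with the identity preconditioner (so <.,.>_P is the plain dot
% product); the function name truncated_cg and its signature are
% illustrative only, and the caching machinery is omitted.

```python
import numpy as np

def truncated_cg(H, grad, Delta, theta=1.0, kappa=0.1, maxinner=None):
    # Minimize <eta, grad> + 0.5 * <eta, H @ eta> s.t. ||eta|| <= Delta.
    # Returns (eta, limitedbyTR). Assumes grad != 0 (no 0/0 safeguard),
    # mirroring the caveat noted in the MATLAB comments.
    n = grad.size
    maxinner = maxinner or n
    eta = np.zeros(n)
    r = grad.copy()               # residual of H @ eta = -grad, at eta = 0
    norm_r0 = np.linalg.norm(r)
    mdelta = r.copy()             # minus the search direction, as in trs_tCG
    for _ in range(maxinner):
        Hmd = H @ mdelta
        d_Hd = mdelta @ Hmd       # curvature along the direction
        z_r = r @ r
        alpha = z_r / d_Hd
        eta_new = eta - alpha * mdelta
        if d_Hd <= 0 or np.linalg.norm(eta_new) >= Delta:
            # Negative curvature or TR violation: take tau >= 0 such that
            # ||eta - tau*mdelta|| = Delta (positive root of a quadratic).
            em, dd, ee = eta @ mdelta, mdelta @ mdelta, eta @ eta
            tau = (em + np.sqrt(em**2 + dd * (Delta**2 - ee))) / dd
            return eta - tau * mdelta, True
        eta = eta_new
        r = r - alpha * Hmd       # update the residual
        if np.linalg.norm(r) <= norm_r0 * min(norm_r0**theta, kappa):
            return eta, False     # kappa/theta stopping criterion met
        beta = (r @ r) / z_r
        mdelta = r + beta * mdelta
    return eta, False
```

% With H = 2*I and grad = (2, 0), the unconstrained minimizer is
% (-1, 0): an interior solution when Delta = 10, and the scaled
% boundary point (-0.5, 0) when Delta = 0.5 (limitedbyTR = true).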