Home > manopt > solvers > trustregions > trs_tCG.m

# trs_tCG

## SYNOPSIS

function trsoutput = trs_tCG(problem, trsinput, options, storedb, key)

## DESCRIPTION

``` Truncated (Steihaug-Toint) Conjugate-Gradient method.

subject to <eta,eta>_[inverse precon] <= Delta^2

function trsoutput = trs_tCG(problem, trsinput, options, storedb, key)

Inputs:
problem: Manopt optimization problem structure
trsinput: structure with the following fields:
x: point on the manifold problem.M
vector if options.useRand == true.
options: structure containing options for the subproblem solver
storedb, key: manopt's caching system for the point x

Options specific to this subproblem solver:
kappa (0.1)
kappa convergence tolerance.
kappa > 0 is the linear convergence target rate: trs_tCG will
terminate early if the residual was reduced by a factor of kappa.
theta (1.0)
theta convergence tolerance.
1+theta (theta between 0 and 1) is the superlinear convergence
target rate. trs_tCG will terminate early if the residual was
reduced by a power of 1+theta.
mininner (1)
Minimum number of inner iterations for trs_tCG.
maxinner (problem.M.dim() : the manifold's dimension)
Maximum number of inner iterations for trs_tCG.
useRand (false)
Set to true if the trust-region solve is to be initiated with a
random tangent vector. If set to true, no preconditioner will be
used. This option is set to true in some scenarios to escape saddle
points, but is otherwise seldom activated.

Output: the structure trsoutput contains the following fields:
eta: approximate solution to the trust-region subproblem at x
Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
is often naturally available to the subproblem solver at the
end of execution, so that it may be cheaper to return it here.
limitedbyTR: true if a boundary solution is returned
printstr: logged information to be printed by trustregions.
stats: structure with the following statistics:
numinner: number of inner loops before returning
hessvecevals: number of Hessian calls issued
cauchy: true if cauchy step was used. false by default. Will
be present in stats only if options.useRand = true

trs_tCG can also be called in the following way (by trustregions) to
obtain part of the header to print and an initial stats structure:

function trsoutput = trs_tCG([], [], options)

In this case trsoutput contains the following fields:
trustregions
initstats: struct with initial values for stored stats in subsequent
calls to trs_tCG. Used in the first call to savestats
in trustregions to initialize the info struct properly.

## CROSS-REFERENCE INFORMATION

This function calls:
• getHessian Computes the Hessian of the cost function at x along d.
• getPrecon Applies the preconditioner for the Hessian of the cost at x along d.
• mergeOptions Merges two options structures with one having precedence over the other.
• lincomb Computes a linear combination of tangent vectors in the Manopt framework.
This function is called by:

## SOURCE CODE

```0001 function trsoutput = trs_tCG(problem, trsinput, options, storedb, key)
0002 % Truncated (Steihaug-Toint) Conjugate-Gradient method.
0003 %
0004 % minimize <eta,grad> + .5*<eta,Hess(eta)>
0005 % subject to <eta,eta>_[inverse precon] <= Delta^2
0006 %
0007 % function trsoutput = trs_tCG(problem, trsinput, options, storedb, key)
0008 %
0009 % Inputs:
0010 %   problem: Manopt optimization problem structure
0011 %   trsinput: structure with the following fields:
0012 %       x: point on the manifold problem.M
0013 %       fgradx: gradient of the cost function of the problem at x
0014 %       vector if options.useRand == true.
0015 %       Delta = trust-region radius
0016 %   options: structure containing options for the subproblem solver
0017 %   storedb, key: manopt's caching system for the point x
0018 %
0019 % Options specific to this subproblem solver:
0020 %   kappa (0.1)
0021 %       kappa convergence tolerance.
0022 %       kappa > 0 is the linear convergence target rate: trs_tCG will
0023 %       terminate early if the residual was reduced by a factor of kappa.
0024 %   theta (1.0)
0025 %       theta convergence tolerance.
0026 %       1+theta (theta between 0 and 1) is the superlinear convergence
0027 %       target rate. trs_tCG will terminate early if the residual was
0028 %       reduced by a power of 1+theta.
0029 %   mininner (1)
0030 %       Minimum number of inner iterations for trs_tCG.
0031 %   maxinner (problem.M.dim() : the manifold's dimension)
0032 %       Maximum number of inner iterations for trs_tCG.
0033 %   useRand (false)
0034 %       Set to true if the trust-region solve is to be initiated with a
0035 %       random tangent vector. If set to true, no preconditioner will be
0036 %       used. This option is set to true in some scenarios to escape saddle
0037 %       points, but is otherwise seldom activated.
0038 %
0039 % Output: the structure trsoutput contains the following fields:
0040 %   eta: approximate solution to the trust-region subproblem at x
0041 %   Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
0042 %       is often naturally available to the subproblem solver at the
0043 %       end of execution, so that it may be cheaper to return it here.
0044 %   limitedbyTR: true if a boundary solution is returned
0045 %   printstr: logged information to be printed by trustregions.
0046 %   stats: structure with the following statistics:
0047 %       numinner: number of inner loops before returning
0048 %       hessvecevals: number of Hessian calls issued
0049 %       cauchy: true if cauchy step was used. false by default. Will
0050 %       be present in stats only if options.useRand = true
0051 %
0052 % trs_tCG can also be called in the following way (by trustregions) to
0053 % obtain part of the header to print and an initial stats structure:
0054 %
0055 % function trsoutput = trs_tCG([], [], options)
0056 %
0057 % In this case trsoutput contains the following fields:
0058 %   printheader: subproblem header to be printed before the first loop of
0059 %       trustregions
0060 %   initstats: struct with initial values for stored stats in subsequent
0061 %       calls to trs_tCG. Used in the first call to savestats
0062 %       in trustregions to initialize the info struct properly.
0063 %
0065
0066 % This file is part of Manopt: www.manopt.org.
0067 % This code is an adaptation to Manopt of the original GenRTR code:
0068 % RTR - Riemannian Trust-Region
0069 % (c) 2004-2007, P.-A. Absil, C. G. Baker, K. A. Gallivan
0070 % Florida State University
0071 % School of Computational Science
0073 % See accompanying license file.
0074 % The adaptation was executed by Nicolas Boumal.
0075 %
0076 % Change log:
0077 %
0078 %   NB Feb. 12, 2013:
0079 %       We do not project r back to the tangent space anymore: it was not
0080 %       necessary, and as of Manopt 1.0.1, the proj operator does not
0081 %       coincide with this notion anymore.
0082 %
0083 %   NB April 3, 2013:
0084 %       tCG now also returns Heta, the Hessian at x along eta. Additional
0085 %       esthetic modifications.
0086 %
0087 %   NB Dec. 2, 2013:
0088 %       If options.useRand is activated, we now make sure the preconditio-
0089 %       ner is not used, as was originally intended in GenRTR. In time, we
0090 %       may want to investigate whether useRand can be modified to work well
0091 %       with preconditioning too.
0092 %
0093 %   NB Jan. 9, 2014:
0094 %       Now checking explicitly for model decrease at each iteration. The
0095 %       first iteration is a Cauchy point, which necessarily realizes a
0096 %       decrease of the model cost. If a model increase is witnessed
0097 %       (which is theoretically impossible if a linear operator is used for
0098 %       the Hessian approximation), then we return the previous eta. This
0099 %       ensures we always achieve at least the Cauchy decrease, which
0100 %       should be sufficient for convergence.
0101 %
0102 %   NB Feb. 17, 2015:
0103 %       The previous update was in effect verifying that the current eta
0104 %       performed at least as well as the first eta (the Cauchy step) with
0105 %       respect to the model cost. While this is an acceptable strategy,
0106 %       the documentation (and the original intent) was to ensure a
0107 %       monotonic decrease of the model cost at each new eta. This is now
0108 %       the case, with the added line: "model_value = new_model_value;".
0109 %
0110 %   NB April 3, 2015:
0111 %       Works with the new StoreDB class system.
0112 %
0113 %   NB April 17, 2018:
0114 %       Two changes:
0115 %        (a) Instead of updating delta and Hdelta, we now update -delta and
0116 %            -Hdelta, named mdelta and Hmdelta. This allows to spare one
0117 %            call to lincomb(x, -1, z).
0118 %        (b) We re-project mdelta to the tangent space at every iteration,
0119 %            to avoid drifting away from it. The choice to project mdelta
0120 %            specifically is motivated by the fact that this is the vector
0121 %            passed to getHessian, hence this is where accurate tangency
0122 %            may be most important. (All other operations are linear
0123 %            combinations of tangent vectors, which should be fairly safe.)
0124 %
0125 %   VL June 29, 2022:
0126 %       Renamed tCG to trs_tCG to keep consistent naming with new
0127 %       subproblem solvers for trustregion. Also modified inputs and
0128 %       outputs to be compatible with other subproblem solvers.
0129
0130 % All terms involving the trust-region radius use an inner product
0131 % w.r.t. the preconditioner; this is because the iterates grow in
0132 % length w.r.t. the preconditioner, guaranteeing that we do not
0133 % re-enter the trust-region.
0134 %
0135 % The following recurrences for Prec-based norms and inner
0136 % products come from [CGT2000], pg. 205, first edition.
0137 % Below, P is the preconditioner.
0138 %
0139 % <eta_k,P*delta_k> =
0140 %          beta_k-1 * ( <eta_k-1,P*delta_k-1> + alpha_k-1 |delta_k-1|^2_P )
0141 % |delta_k|^2_P = <r_k,z_k> + beta_k-1^2 |delta_k-1|^2_P
0142 %
0143 % Therefore, we need to keep track of:
0144 % 1)   |delta_k|^2_P
0145 % 2)   <eta_k,P*delta_k> = <eta_k,delta_k>_P
0146 % 3)   |eta_k  |^2_P
0147 %
0148 % Initial values are given by
0149 %    |delta_0|_P = <r,z>
0150 %    |eta_0|_P   = 0
0151 %    <eta_0,delta_0>_P = 0
0152 % because we take eta_0 = 0 (if useRand = false).
0153 %
0154 % [CGT2000] Conn, Gould and Toint: Trust-region methods, 2000.
0155
0156 % trustregions only wants header and default values for stats.
0157 if nargin == 3 && isempty(problem) && isempty(trsinput)
0158     if isfield(options, 'useRand') && options.useRand
0159         trsoutput.printheader = sprintf('%9s   %9s   %11s   %s', ...
0160                                 'numinner', 'hessvec', 'used_cauchy', ...
0161                                 'stopreason');
0162         trsoutput.initstats = struct('numinner', 0, 'hessvecevals', 0, ...
0163                                 'cauchy', false);
0164     else
0165         trsoutput.printheader = sprintf('%9s   %9s   %s', 'numinner', ...
0166                                 'hessvec', 'stopreason');
0167         trsoutput.initstats = struct('numinner', 0, 'hessvecevals', 0);
0168     end
0169     return;
0170 end
0171
0172 x = trsinput.x;
0173 Delta = trsinput.Delta;
0175
0176 M = problem.M;
0177
0178 inner   = @(u, v) M.inner(x, u, v);
0179 lincomb = @(a, u, b, v) M.lincomb(x, a, u, b, v);
0180 tangent = @(u) M.tangent(x, u);
0181
0182 % Set local defaults here
0183 localdefaults.kappa = 0.1;
0184 localdefaults.theta = 1.0;
0185 localdefaults.mininner = 1;
0186 localdefaults.maxinner = M.dim();
0187 localdefaults.useRand = false;
0188
0189 % Merge local defaults with user options, if any
0190 if ~exist('options', 'var') || isempty(options)
0191     options = struct();
0192 end
0193 options = mergeOptions(localdefaults, options);
0194
0195 theta = options.theta;
0196 kappa = options.kappa;
0197
0198 % returned boolean to trustregions.m. true if we are limited by the TR
0199 % boundary (returns boundary solution). Otherwise false.
0200 limitedbyTR = false;
0201
0202 % Determine eta0 and other useRand dependent initializations
0203 if ~options.useRand
0204     % Pick the zero vector
0205     eta = M.zerovec(x);
0206     Heta = M.zerovec(x);
0208     e_Pe = 0;
0209 else
0210     % Random vector in T_x M (this has to be very small)
0211     eta = M.lincomb(x, 1e-6, M.randvec(x));
0212     % Must be inside trust-region
0213     while M.norm(x, eta) > Delta
0214         eta = M.lincomb(x, sqrt(sqrt(eps)), eta);
0215     end
0216     Heta = getHessian(problem, x, eta, storedb, key);
0217     r = lincomb(1, grad, 1, Heta);
0218     e_Pe = inner(eta, eta);
0219 end
0220
0221 r_r = inner(r, r);
0222 norm_r = sqrt(r_r);
0223 norm_r0 = norm_r;
0224
0225 % Precondition the residual.
0226 if ~options.useRand
0227     z = getPrecon(problem, x, r, storedb, key);
0228 else
0229     z = r;
0230 end
0231
0232 % Compute z'*r.
0233 z_r = inner(z, r);
0234 d_Pd = z_r;
0235
0236 % Initial search direction (we maintain -delta in memory, called mdelta, to
0237 % avoid a change of sign of the tangent vector.)
0238 mdelta = z;
0239 if ~options.useRand % and therefore, eta == 0
0240     e_Pd = 0;
0241 else % and therefore, no preconditioner
0242     e_Pd = -inner(eta, mdelta);
0243 end
0244
0245 % If the Hessian or a linear Hessian approximation is in use, it is
0246 % theoretically guaranteed that the model value decreases strictly
0247 % with each iteration of tCG. Hence, there is no need to monitor the model
0248 % value. But, when a nonlinear Hessian approximation is used (such as the
0249 % built-in finite-difference approximation for example), the model may
0250 % increase. It is then important to terminate the tCG iterations and return
0251 % the previous (the best-so-far) iterate. The variable below will hold the
0252 % model value.
0253 %
0254 % This computation could be further improved based on Section 17.4.1 in
0255 % Conn, Gould, Toint, Trust Region Methods, 2000.
0256 % If we make this change, then also modify trustregions to gather this
0257 % value from tCG rather than recomputing it itself.
0258 model_fun = @(eta, Heta) inner(eta, grad) + .5*inner(eta, Heta);
0259 if ~options.useRand
0260     model_value = 0;
0261 else
0262     model_value = model_fun(eta, Heta);
0263 end
0264
0265 % Pre-assume termination because j == end.
0266 stopreason_str = 'maximum inner iterations';
0267
0268 % Begin inner/tCG loop.
0269 for j = 1 : options.maxinner
0270
0271     % This call is the computationally expensive step.
0272     Hmdelta = getHessian(problem, x, mdelta, storedb, key);
0273
0274     % Compute curvature (often called kappa).
0275     d_Hd = inner(mdelta, Hmdelta);
0276
0277
0278     % Note that if d_Hd == 0, we will exit at the next "if" anyway.
0279     alpha = z_r/d_Hd;
0280     % <neweta,neweta>_P =
0281     % <eta,eta>_P + 2*alpha*<eta,delta>_P + alpha*alpha*<delta,delta>_P
0282     e_Pe_new = e_Pe + 2.0*alpha*e_Pd + alpha*alpha*d_Pd;
0283
0284     if options.debug > 2
0285         fprintf('DBG:   (r,r)  : %e\n', r_r);
0286         fprintf('DBG:   (d,Hd) : %e\n', d_Hd);
0287         fprintf('DBG:   alpha  : %e\n', alpha);
0288     end
0289
0290     % Check against negative curvature and trust-region radius violation.
0291     % If either condition triggers, we bail out.
0292     if d_Hd <= 0 || e_Pe_new >= Delta^2
0293         % want
0294         %  ee = <eta,eta>_prec,x
0295         %  ed = <eta,delta>_prec,x
0296         %  dd = <delta,delta>_prec,x
0297         % Note (Nov. 26, 2021, NB): numerically, it might be better to call
0298         %   tau = max(real(roots([d_Pd, 2*e_Pd, e_Pe-Delta^2])));
0299         % This should be checked.
0300         % Also, we should safe-guard against 0/0: could happen if grad = 0.
0301         tau = (-e_Pd + sqrt(e_Pd*e_Pd + d_Pd*(Delta^2-e_Pe))) / d_Pd;
0302         if options.debug > 2
0303             fprintf('DBG:     tau  : %e\n', tau);
0304         end
0305         eta = lincomb(1,  eta, -tau,  mdelta);
0306
0307         % If only a nonlinear Hessian approximation is available, this is
0308         % only approximately correct, but saves an additional Hessian call.
0309         Heta = lincomb(1, Heta, -tau, Hmdelta);
0310
0311         % Technically, we may want to verify that this new eta is indeed
0312         % better than the previous eta before returning it (this is always
0313         % the case if the Hessian approximation is linear, but I am unsure
0314         % whether it is the case or not for nonlinear approximations.)
0315         % At any rate, the impact should be limited, so in the interest of
0316         % code conciseness (if we can still hope for that), we omit this.
0317
0318         limitedbyTR = true;
0319
0320         if d_Hd <= 0
0321             stopreason_str = 'negative curvature';
0322         else
0323             stopreason_str = 'exceeded trust region';
0324         end
0325
0326         break;
0327     end
0328
0329     % No negative curvature and eta_prop inside TR: accept it.
0330     e_Pe = e_Pe_new;
0331     new_eta  = lincomb(1,  eta, -alpha,  mdelta);
0332
0333     % If only a nonlinear Hessian approximation is available, this is
0334     % only approximately correct, but saves an additional Hessian call.
0335     % TODO: this computation is redundant with that of r, L241. Clean up.
0336     new_Heta = lincomb(1, Heta, -alpha, Hmdelta);
0337
0338     % Verify that the model cost decreased in going from eta to new_eta. If
0339     % it did not (which can only occur if the Hessian approximation is
0340     % nonlinear or because of numerical errors), then we return the
0341     % previous eta (which necessarily is the best reached so far, according
0342     % to the model cost). Otherwise, we accept the new eta and go on.
0343     new_model_value = model_fun(new_eta, new_Heta);
0344     if new_model_value >= model_value
0345         stopreason_str = 'model increased';
0346         break;
0347     end
0348
0349     eta = new_eta;
0350     Heta = new_Heta;
0351     model_value = new_model_value; %% added Feb. 17, 2015
0352
0353     % Update the residual.
0354     r = lincomb(1, r, -alpha, Hmdelta);
0355
0356     % Compute new norm of r.
0357     r_r = inner(r, r);
0358     norm_r = sqrt(r_r);
0359
0360     % Check kappa/theta stopping criterion.
0361     % Note that it is somewhat arbitrary whether to check this stopping
0362     % criterion on the r's (the gradients) or on the z's (the
0363     % preconditioned gradients). [CGT2000], page 206, mentions both as
0364     % acceptable criteria.
0365     if j >= options.mininner && norm_r <= norm_r0*min(norm_r0^theta, kappa)
0366         % Residual is small enough to quit
0367         if kappa < norm_r0^theta
0368             stopreason_str = 'reached target residual-kappa (linear)';
0369         else
0370             stopreason_str = 'reached target residual-theta (superlinear)';
0371         end
0372         break;
0373     end
0374
0375     % Precondition the residual.
0376     if ~options.useRand
0377         z = getPrecon(problem, x, r, storedb, key);
0378     else
0379         z = r;
0380     end
0381
0382     % Save the old z'*r.
0383     zold_rold = z_r;
0384     % Compute new z'*r.
0385     z_r = inner(z, r);
0386
0387     % Compute new search direction.
0388     beta = z_r/zold_rold;
0389     mdelta = lincomb(1, z, beta, mdelta);
0390
0391     % Since mdelta is passed to getHessian, which is the part of the code
0392     % we have least control over from here, we want to make sure mdelta is
0393     % a tangent vector up to numerical errors that should remain small.
0394     % For this reason, we re-project mdelta to the tangent space.
0395     % In limited tests, it was observed that it is a good idea to project
0396     % at every iteration rather than only every k iterations, the reason
0397     % being that loss of tangency can lead to more inner iterations being
0398     % run, which leads to an overall higher computational cost.
0399     mdelta = tangent(mdelta);
0400
0401     % Update new P-norms and P-dots [CGT2000, eq. 7.5.6 & 7.5.7].
0402     e_Pd = beta*(e_Pd + alpha*d_Pd);
0403     d_Pd = z_r + beta*beta*d_Pd;
0404
0405 end  % of tCG loop
0406
0407 % If using randomized approach, compare result with the Cauchy point.
0408 % Convergence proofs assume that we achieve at least (a fraction of)
0409 % the reduction of the Cauchy point. After this if-block, either all
0410 % eta-related quantities have been changed consistently, or none of
0411 % them have changed.
0412 if options.useRand
0413     used_cauchy = false;
0414
0416
0417     % Check the curvature,
0418     Hg = getHessian(problem, x, grad, storedb, key);
0419     g_Hg = M.inner(x, grad, Hg);
0420     if g_Hg <= 0
0421         tau_c = 1;
0422     else
0423         tau_c = min( norm_grad^3/(Delta*g_Hg) , 1);
0424     end
0425     % and generate the Cauchy point.
0427     Heta_c = M.lincomb(x, -tau_c * Delta / norm_grad, Hg);
0428
0429     % Now that we have computed the Cauchy point in addition to the
0430     % returned eta, we might as well keep the best of them.
0431     mdle  = M.inner(x, grad, eta) + .5*M.inner(x, Heta,   eta);
0432     mdlec = M.inner(x, grad, eta_c) + .5*M.inner(x, Heta_c, eta_c);
0433
0434     if mdlec < mdle
0435         eta = eta_c;
0436         Heta = Heta_c; % added April 11, 2012
0437         used_cauchy = true;
0438     end
0439 end
0440
0441 printstr = sprintf('%9d   %9d   %s', j, j, stopreason_str);
0442 stats = struct('numinner', j, 'hessvecevals', j);
0443
0444 if options.useRand
0445     stats.cauchy = used_cauchy;
0446     printstr = sprintf('%9d   %9d   %11s   %s', j, j, ...
0447                 string(used_cauchy), stopreason_str);
0448 end
0449
0450 trsoutput.eta = eta;
0451 trsoutput.Heta = Heta;
0452 trsoutput.limitedbyTR = limitedbyTR;
0453 trsoutput.printstr = printstr;
0454 trsoutput.stats = stats;
0455 end```

Generated on Fri 30-Sep-2022 13:18:25 by m2html © 2005