
# arc_lanczos

## PURPOSE

Subproblem solver for ARC based on a Lanczos process.

## SYNOPSIS

```matlab
function [eta, Heta, hesscalls, stop_str, stats] = arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)
```

## DESCRIPTION

```
Subproblem solver for ARC based on a Lanczos process.

[eta, Heta, hesscalls, stop_str, stats] =
           arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)

This routine approximately solves the following problem:

min_{eta in T_x M}  m(eta),  where

m(eta) = <eta, g> + .5 <eta, H[eta]> + (sigma/3) ||eta||^3

where eta is a tangent vector at x on the manifold given by problem.M,
g = grad is a tangent vector at x, H[eta] is the result of applying the
Hessian of the problem at x along eta and the inner product and norm
are those from the Riemannian structure on the tangent space T_x M.

The solve is approximate in the sense that the returned eta only ought
to satisfy the following conditions:

||gradient of m at eta|| <= theta*||eta||^2   and   m(eta) <= m(0),

where theta is specified in options.theta (see below for default value).
Since the gradient of the model at 0 is g, if g is zero, then eta = 0 is
returned. This is the only scenario where eta = 0 is returned.

Numerical errors can perturb the described expected behavior.

Inputs:
problem: Manopt optimization problem structure
x: point on the manifold problem.M
grad: gradient of the cost function of the problem at x
gradnorm: norm of the gradient, often available to the caller
sigma: cubic regularization parameter (positive scalar)
options: structure containing options for the subproblem solver
storedb, key: caching data for problem at x

Options specific to this subproblem solver:
theta (.5)
Stopping criterion parameter for subproblem solver: the gradient of
the model at the returned step should have norm no more than theta
times the squared norm of the step.
maxinner (M.dim())
Maximum number of iterations of the Lanczos process, which is nearly
the same as the maximum number of calls to the Hessian.
maxiter_newton (100)
Maximum number of iterations of the Newton root finder to solve each
tridiagonal cubic problem.
tol_newton (1e-16)
Tolerance for the Newton root finder.

Outputs:
eta: approximate solution to the cubic regularized subproblem at x
Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
is often naturally available to the subproblem solver at the
end of execution, so that it may be cheaper to return it here.
hesscalls: number of Hessian calls during execution
stop_str: string describing why the subsolver stopped
stats: a structure specifying some statistics about inner work

## CROSS-REFERENCE INFORMATION

This function calls:
• getHessian Computes the Hessian of the cost function at x along d.
• mergeOptions Merges two options structures with one having precedence over the other.
• minimize_cubic_newton Minimize a cubicly regularized quadratic via Newton root finding.
• lincomb Computes a linear combination of tangent vectors in the Manopt framework.
This function is called by:

## SOURCE CODE

```matlab
0001 function [eta, Heta, hesscalls, stop_str, stats] = arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)
0002 % Subproblem solver for ARC based on a Lanczos process.
0003 %
0004 % [eta, Heta, hesscalls, stop_str, stats] =
0005 %            arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)
0006 %
0007 % This routine approximately solves the following problem:
0008 %
0009 %   min_{eta in T_x M}  m(eta),  where
0010 %
0011 %       m(eta) = <eta, g> + .5 <eta, H[eta]> + (sigma/3) ||eta||^3
0012 %
0013 % where eta is a tangent vector at x on the manifold given by problem.M,
0014 % g = grad is a tangent vector at x, H[eta] is the result of applying the
0015 % Hessian of the problem at x along eta and the inner product and norm
0016 % are those from the Riemannian structure on the tangent space T_x M.
0017 %
0018 % The solve is approximate in the sense that the returned eta only ought
0019 % to satisfy the following conditions:
0020 %
0021 %     ||gradient of m at eta|| <= theta*||eta||^2   and   m(eta) <= m(0),
0022 %
0023 % where theta is specified in options.theta (see below for default value).
0024 % Since the gradient of the model at 0 is g, if g is zero, then eta = 0 is
0025 % returned. This is the only scenario where eta = 0 is returned.
0026 %
0027 % Numerical errors can perturb the described expected behavior.
0028 %
0029 % Inputs:
0030 %     problem: Manopt optimization problem structure
0031 %     x: point on the manifold problem.M
0032 %     grad: gradient of the cost function of the problem at x
0033 %     gradnorm: norm of the gradient, often available to the caller
0034 %     sigma: cubic regularization parameter (positive scalar)
0035 %     options: structure containing options for the subproblem solver
0036 %     storedb, key: caching data for problem at x
0037 %
0038 % Options specific to this subproblem solver:
0039 %   theta (.5)
0040 %     Stopping criterion parameter for subproblem solver: the gradient of
0041 %     the model at the returned step should have norm no more than theta
0042 %     times the squared norm of the step.
0043 %   maxinner (M.dim())
0044 %     Maximum number of iterations of the Lanczos process, which is nearly
0045 %     the same as the maximum number of calls to the Hessian.
0046 %   maxiter_newton (100)
0047 %     Maximum number of iterations of the Newton root finder to solve each
0048 %     tridiagonal cubic problem.
0049 %   tol_newton (1e-16)
0050 %     Tolerance for the Newton root finder.
0051 %
0052 % Outputs:
0053 %     eta: approximate solution to the cubic regularized subproblem at x
0054 %     Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
0055 %           is often naturally available to the subproblem solver at the
0056 %           end of execution, so that it may be cheaper to return it here.
0057 %     hesscalls: number of Hessian calls during execution
0058 %     stop_str: string describing why the subsolver stopped
0059 %     stats: a structure specifying some statistics about inner work
0060 %
0062
0063 % This file is part of Manopt: www.manopt.org.
0064 % Original authors: May 1, 2018,
0065 %    Naman Agarwal, Brian Bullins, Nicolas Boumal and Coralia Cartis.
0066 % Contributors:
0067 % Change log:
0068 %   Aug 16, 2019 (NB):
0069 %       Default value for theta changed from 50 to 0.5.
0070 %       Option maxiter_lanczos renamed to maxinner to match trustregions.
0071
0072 % TODO: think whether we can save the Lanczos basis in the storedb at the
0073 % given key in case we get a rejection, and simply "start where we left
0074 % off" with the updated sigma.
0075
0076 % TODO: Lanczos is notoriously numerically unstable, with loss of
0077 % orthogonality being a main hurdle. Look into the literature (Paige,
0078 % Parlett), for possible numerical fixes.
0079
0080     % Some shortcuts
0081     M = problem.M;
0082     n = M.dim();
0083     inner = @(u, v) M.inner(x, u, v);
0084     rnorm = @(u) M.norm(x, u);
0085     tangent = @(u) problem.M.tangent(x, u);
0086     Hess = @(u) getHessian(problem, x, u, storedb, key);
0087
0088     % Counter for Hessian calls issued
0089     hesscalls = 0;
0090
0090     % If the gradient has norm zero, return a zero step
0091     if gradnorm == 0
0093         eta = M.zerovec(x);
0094         Heta = eta;
0095         stop_str = 'Cost gradient is zero';
0096         stats = struct('newton_iterations', 0, 'gradnorms', 0, 'func_values', 0);
0097         return;
0098     end
0099
0100     % Set local defaults here
0101     localdefaults.theta = .5;
0102     localdefaults.maxinner = n;
0103     % The following are here for the Newton solver called below
0104     localdefaults.maxiter_newton = 100;
0105     localdefaults.tol_newton = 1e-16;
0106
0107     % Merge local defaults with user options, if any
0108     if ~exist('options', 'var') || isempty(options)
0109         options = struct();
0110     end
0111     options = mergeOptions(localdefaults, options);
0112
0113     % Vectors where we keep track of the Newton root finder's work, the
0114     % gradient norm, and the function values at each inner iteration
0115     newton_iterations = zeros(n, 1);
0116     gradnorms = zeros(n, 1);
0117     func_values = zeros(n, 1);
0118
0119     % Lanczos iteratively produces an orthonormal basis of tangent vectors
0120     % which tridiagonalize the Hessian. The corresponding tridiagonal
0121     % matrix is preallocated here as a sparse matrix.
0122     T = spdiags(ones(n, 3), -1:1, n, n);
0123
0124     % The orthonormal basis (n tangent vectors at x) is stored in this cell
0125     Q = cell(n, 1);
0126
0127     % Initialize Lanczos along the gradient direction (it is nonzero)
0128     q = M.lincomb(x, 1/gradnorm, grad);
0129     Q{1} = q;
0130     Hq = Hess(q);
0131     hesscalls = hesscalls + 1;
0132     alpha = inner(Hq, q);
0133     T(1, 1) = alpha;
0134     Hq_perp = M.lincomb(x, 1, Hq, -alpha, q);
0135
0136     % Minimizing the cubic restricted to the one-dimensional space spanned
0137     % by Q{1} is easy: it amounts to minimizing a univariate cubic. Indeed,
0138     % with eta = y*q where y is a scalar, we minimize (since g = ||g||q):
0139     %  h(y) = <y*q, g> + .5 <y*q, H[y*q]> + (sigma/3) ||y*q||^3
0140     %       = ||g||*y + .5*alpha*y^2 + (sigma/3) |y|^3.
0141     % The sign of y affects only the linear term, hence it is clear we need
0142     % to pick y nonpositive. In that case, h becomes a cubic polynomial:
0143     %  h(y) = ||g||*y + .5*alpha*y^2 - (sigma/3) y^3
0144     % The derivative is a quadratic polynomial:
0145     %  h'(y) = ||g|| + alpha*y - sigma*y^2.
0146     % Since ||g|| and sigma are positive, h' has two real roots, one
0147     % positive and one negative (strictly). The negative root is the only
0148     % root of interest. It necessarily identifies a minimizer since
0149     % h(0) = 0, h(-inf) = inf and h'(0) > 0.
0150     %
0151     % We take the real part only to be safe.
0152     y = min(real(roots([-sigma, alpha, gradnorm])));
0153
0154     % Main Lanczos iteration
0155     gradnorm_reached = false;
0156     for j = 1 : min(options.maxinner, n) - 1
0157
0158         % Knowing that j Lanczos steps have been executed completely, now
0159         % execute the j+1st step to produce Q{j+1} and populate the
0160         % tridiagonal matrix T over the whole principal submatrix of size
0161         % j+1. This involves one Hessian call.
0162         %
0163         % In effect, we are computing this one step ahead. The reason is
0164         % that this makes it cheaper to compute the norm of the gradient of
0165         % the model, which is needed to check the stopping criterion (see
0166         % below).
0167         beta = rnorm(Hq_perp);
0168         % TODO: Figure out a sensible relative threshold
0169         if beta > 1e-12
0170             q = M.lincomb(x, 1/beta, Hq_perp);
0171         else
0172             % It appears the Krylov space maxed out (Hq is very nearly in
0173             % the space spanned by the existing Lanczos vectors). In order
0174             % to continue, the standard procedure is to generate a random
0175             % vector, and to orthogonalize it against the current basis.
0176             % This event is supposed to be rare.
0177             v = M.randvec(x);
0178             % Orthogonalize in the style of a modified Gram-Schmidt.
0179             for k = 1 : j
0180                 v = M.lincomb(x, 1, v, -inner(v, Q{k}), Q{k});
0181             end
0182             q = M.lincomb(x, 1/rnorm(v), v);
0183         end
0184
0185         q = tangent(q);
0186
0187         Hq = Hess(q);
0188         hesscalls = hesscalls + 1;
0189         Hqm = M.lincomb(x, 1, Hq, -beta, Q{j});
0190         % In exact arithmetic, alpha = <Hq, q>. Doing the computations in
0191         % this order amounts to a modified GS, which may help numerically.
0192         alpha = inner(Hqm, q);
0193         Hq_perp = M.lincomb(x, 1, Hqm, -alpha, q);
0194         Q{j+1} = q;
0195         T(j, j+1) = beta;     %#ok<SPRIX>
0196         T(j+1, j) = beta;     %#ok<SPRIX>
0197         T(j+1, j+1) = alpha;  %#ok<SPRIX>
0198         % End of the Lanczos procedure for step j.
0199
0200         % Computing the norm of the gradient of the model at the computed
0201         % step 'Qy' (linear combination of the Q's with coefficients y.)
0202         % We actually compute the norm of a vector of coordinates for the
0203         % gradient of the model in the basis Q{1}, ..., Q{j+1}.
0204         model_gradnorm = norm(gradnorm*eye(j+1, 1) + ...
0205                               T(1:j+1, 1:j)*y + ...
0206                               sigma*norm(y)*[y ; 0]);
0207         gradnorms(j) = model_gradnorm;
0208         func_values(j) = y(1)*gradnorm + 0.5 * dot(y, T(1:j, 1:j)*y) + (sigma/3) * norm(y)^3;
0209
0210         if options.verbosity >= 4
0211             fprintf('\nModel grad norm: %.16e, Iterate norm: %.16e', model_gradnorm, norm(y));
0212         end
0213
0214         % Check termination condition
0215         if model_gradnorm <= options.theta*norm(y)^2
0216             stop_str = 'Model grad norm condition satisfied';
0217             gradnorm_reached = true;
0218             break;
0219         end
0220
0221         % Minimize the cubic model restricted to the subspace spanned by
0222         % the available Lanczos vectors. In its current form, this solver
0223         % cannot reuse prior work from earlier Lanczos iterations: this may
0224         % be a point to improve.
0225         [y, newton_iter] = minimize_cubic_newton(T(1:j+1, 1:j+1), ...
0226                                      gradnorm*eye(j+1, 1), sigma, options);
0227         newton_iterations(j+1) = newton_iter;
0228     end
0229
0230     % Check why we stopped iterating
0231     points = numel(y);
0232     if ~gradnorm_reached
0233         stop_str = sprintf(['Reached max number of Lanczos iterations ' ...
0234                '(options.maxinner = %d)'], options.maxinner);
0235         points = points - 1;
0236     end
0237
0238     % Construct the tangent vector eta as a linear combination of the basis
0239     % vectors and make sure the result is tangent up to numerical accuracy.
0240     eta = lincomb(M, x, Q(1:numel(y)), y);
0241     eta = tangent(eta);
0242     % We could easily return the norm of eta as the norm of the coefficient
0243     % vector y here, but numerical errors might accumulate.
0244
0245     % In principle we could avoid this call by computing an appropriate
0246     % linear combination of available vectors. For now at least, we favor
0247     % this numerically safer approach.
0248     Heta = Hess(eta);
0249     hesscalls = hesscalls + 1;
0250
0251     stats = struct('newton_iterations', newton_iterations(1:points), ...
0252                    'gradnorms', gradnorms(1:points), ...
0253                    'func_values', func_values(1:points));
0254 
0255 end
```