
arc_lanczos

PURPOSE

Subproblem solver for ARC based on a Lanczos process.

SYNOPSIS

function [eta, Heta, hesscalls, stop_str, stats] = arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)

DESCRIPTION

 Subproblem solver for ARC based on a Lanczos process.

 [eta, Heta, hesscalls, stop_str, stats] = 
     arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)

 This routine approximately solves the following problem:

   min_{eta in T_x M}  m(eta),  where

       m(eta) = <eta, g> + .5 <eta, H[eta]> + (sigma/3) ||eta||^3

 where eta is a tangent vector at x on the manifold given by problem.M,
 g = grad is a tangent vector at x, H[eta] is the result of applying the
 Hessian of the problem at x along eta and the inner product and norm
 are those from the Riemannian structure on the tangent space T_x M.
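
 As an illustration only (not part of this file), the model m can be
 evaluated with standard Manopt tools; the anonymous helper 'm' below is
 hypothetical and assumes the Hessian is accessible through getHessian:

     % Sketch: evaluate the cubic model at a tangent vector eta.
     M = problem.M;
     m = @(eta) M.inner(x, eta, grad) ...
              + .5*M.inner(x, eta, getHessian(problem, x, eta, storedb, key)) ...
              + (sigma/3)*M.norm(x, eta)^3;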

 The solve is approximate in the sense that the returned eta only ought to
 satisfy the following conditions:

     ||gradient of m at eta|| <= theta*||eta||^2   and   m(eta) <= m(0),

 where theta is specified in options.theta (see below for default value).
 Since the gradient of the model at 0 is g, if g is zero, then eta = 0 is
 returned. This is the only scenario where eta = 0 is returned.

 Numerical errors can perturb the described expected behavior.
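
 For reference, the stopping test above can be checked a posteriori from the
 returned eta and Heta; this is a sketch only (arc_lanczos itself evaluates
 the model gradient in coordinates of the Lanczos basis, see the source
 below), with M = problem.M as in the previous sketch:

     % grad m(eta) = g + H[eta] + sigma*||eta||*eta
     gm = M.lincomb(x, 1, grad, 1, Heta);
     gm = M.lincomb(x, 1, gm, sigma*M.norm(x, eta), eta);
     satisfied = (M.norm(x, gm) <= options.theta * M.norm(x, eta)^2);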

 Inputs:
     problem: Manopt optimization problem structure
     x: point on the manifold problem.M
     grad: gradient of the cost function of the problem at x
     gradnorm: norm of the gradient, often available to the caller
     sigma: cubic regularization parameter (positive scalar)
     options: structure containing options for the subproblem solver
     storedb, key: caching data for problem at x

 Options specific to this subproblem solver (see the example after this list):
   theta (50)
     Stopping criterion parameter for subproblem solver: the gradient of
     the model at the returned step should have norm no more than theta
     times the squared norm of the step.
   maxiter_lanczos (M.dim())
     Maximum number of iterations of the Lanczos process, which is nearly
     the same as the maximum number of calls to the Hessian.
   maxiter_newton (100)
     Maximum number of iterations of the Newton root finder to solve each
     tridiagonal cubic problem.
   tol_newton (1e-16)
     Tolerance for the Newton root finder.
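
 For example, and with arbitrary values, these options can be set in the
 options structure passed to arc, which forwards that structure to its
 subproblem solver (assuming the standard arc interface, including its
 subproblemsolver option):

     opts = struct();
     opts.subproblemsolver = @arc_lanczos;  % select this subsolver explicitly
     opts.theta = 10;                       % stricter subproblem stopping test
     opts.maxiter_lanczos = 100;            % cap on Lanczos iterations
     opts.maxiter_newton = 50;              % cap for the Newton root finder
     opts.tol_newton = 1e-14;               % tolerance for that root finder
     [xopt, xcost, info] = arc(problem, [], opts);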

 Outputs:
     eta: approximate solution to the cubic regularized subproblem at x
     Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
           is often naturally available to the subproblem solver at the
           end of execution, so that it may be cheaper to return it here.
     hesscalls: number of Hessian calls during execution
     stop_str: string describing why the subsolver stopped
     stats: structure of statistics about the inner work (Newton iterations per Lanczos step)
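
 The following sketch shows a direct call, assuming problem defines a cost,
 gradient and Hessian; note that the options structure must carry a
 verbosity field, which this solver reads, and the value of sigma below is
 arbitrary:

     x = problem.M.rand();                          % random point on the manifold
     storedb = StoreDB();                           % Manopt caching database
     key = storedb.getNewKey();
     grad = getGradient(problem, x, storedb, key);
     gradnorm = problem.M.norm(x, grad);
     sigma = 1;                                     % illustrative value only
     opts = struct('verbosity', 0);                 % verbosity is read by this solver
     [eta, Heta, hesscalls, stop_str, stats] = ...
         arc_lanczos(problem, x, grad, gradnorm, sigma, opts, storedb, key);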

 See also: arc minimize_cubic_newton

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SOURCE CODE

0001 function [eta, Heta, hesscalls, stop_str, stats] = arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)
0002 % Subproblem solver for ARC based on a Lanczos process.
0003 %
0004 % [eta, Heta, hesscalls, stop_str, stats] =
0005 %     arc_lanczos(problem, x, grad, gradnorm, sigma, options, storedb, key)
0006 %
0007 % This routine approximately solves the following problem:
0008 %
0009 %   min_{eta in T_x M}  m(eta),  where
0010 %
0011 %       m(eta) = <eta, g> + .5 <eta, H[eta]> + (sigma/3) ||eta||^3
0012 %
0013 % where eta is a tangent vector at x on the manifold given by problem.M,
0014 % g = grad is a tangent vector at x, H[eta] is the result of applying the
0015 % Hessian of the problem at x along eta and the inner product and norm
0016 % are those from the Riemannian structure on the tangent space T_x M.
0017 %
0018 % The solve is approximate in the sense that the returned eta only ought to
0019 % satisfy the following conditions:
0020 %
0021 %     ||gradient of m at eta|| <= theta*||eta||^2   and   m(eta) <= m(0),
0022 %
0023 % where theta is specified in options.theta (see below for default value).
0024 % Since the gradient of the model at 0 is g, if g is zero, then eta = 0 is
0025 % returned. This is the only scenario where eta = 0 is returned.
0026 %
0027 % Numerical errors can perturb the described expected behavior.
0028 %
0029 % Inputs:
0030 %     problem: Manopt optimization problem structure
0031 %     x: point on the manifold problem.M
0032 %     grad: gradient of the cost function of the problem at x
0033 %     gradnorm: norm of the gradient, often available to the caller
0034 %     sigma: cubic regularization parameter (positive scalar)
0035 %     options: structure containing options for the subproblem solver
0036 %     storedb, key: caching data for problem at x
0037 %
0038 % Options specific to this subproblem solver:
0039 %   theta (50)
0040 %     Stopping criterion parameter for subproblem solver: the gradient of
0041 %     the model at the returned step should have norm no more than theta
0042 %     times the squared norm of the step.
0043 %   maxiter_lanczos (M.dim())
0044 %     Maximum number of iterations of the Lanczos process, which is nearly
0045 %     the same as the maximum number of calls to the Hessian.
0046 %   maxiter_newton (100)
0047 %     Maximum number of iterations of the Newton root finder to solve each
0048 %     tridiagonal cubic problem.
0049 %   tol_newton (1e-16)
0050 %     Tolerance for the Newton root finder.
0051 %
0052 % Outputs:
0053 %     eta: approximate solution to the cubic regularized subproblem at x
0054 %     Heta: Hess f(x)[eta] -- this is necessary in the outer loop, and it
0055 %           is often naturally available to the subproblem solver at the
0056 %           end of execution, so that it may be cheaper to return it here.
0057 %     hesscalls: number of Hessian calls during execution
0058 %     stop_str: string describing why the subsolver stopped
0059 %     stats: structure of statistics about the inner work (Newton iterations per Lanczos step)
0060 %
0061 % See also: arc minimize_cubic_newton
0062 
0063 % This file is part of Manopt: www.manopt.org.
0064 % Original authors: May 1, 2018,
0065 %    Naman Agarwal, Brian Bullins, Nicolas Boumal and Coralia Cartis.
0066 % Contributors:
0067 % Change log:
0068 
0069 % TODO: think whether we can save the Lanczos basis in the storedb at the
0070 % given key in case we get a rejection, and simply "start where we left
0071 % off" with the updated sigma.
0072 
0073 % TODO: Lanczos is notoriously numerically unstable, with loss of
0074 % orthogonality being a main hurdle. Look into the literature (Paige,
0075 % Parlett), for possible numerical fixes.
0076 
0077     % Some shortcuts
0078     M = problem.M;
0079     n = M.dim();
0080     inner = @(u, v) M.inner(x, u, v);
0081     rnorm = @(u) M.norm(x, u);
0082     tangent = @(u) problem.M.tangent(x, u);
0083     Hess = @(u) getHessian(problem, x, u, storedb, key);
0084     
0085     % Counter for Hessian calls issued
0086     hesscalls = 0;
0087     
0088     % If the gradient has norm zero, return a zero step
0089     if gradnorm == 0
0090         eta = M.zerovec(x);
0091         Heta = eta;
0092         stop_str = 'Cost gradient is zero';
0093         stats = struct('newton_iterations', 0);
0094         return;
0095     end
0096     
0097     % Set local defaults here
0098     localdefaults.theta = 50;
0099     localdefaults.maxiter_lanczos = n;
0100     % The following are here for the Newton solver called below
0101     localdefaults.maxiter_newton = 100;
0102     localdefaults.tol_newton = 1e-16;
0103     
0104     % Merge local defaults with user options, if any
0105     if ~exist('options', 'var') || isempty(options)
0106         options = struct();
0107     end
0108     options = mergeOptions(localdefaults, options);
0109     
0110     % Vector where we keep track of the Newton root finder's work
0111     newton_iterations = zeros(n, 1);
0112     
0113     % Lanczos iteratively produces an orthonormal basis of tangent vectors
0114     % which tridiagonalize the Hessian. The corresponding tridiagonal
0115     % matrix is preallocated here as a sparse matrix.
0116     T = spdiags(ones(n, 3), -1:1, n, n);
0117     
0118     % The orthonormal basis (n tangent vectors at x) is stored in this cell
0119     Q = cell(n, 1);
0120     
0121     % Initialize Lanczos along the gradient direction (it is nonzero)
0122     q = M.lincomb(x, 1/gradnorm, grad);
0123     Q{1} = q;
0124     Hq = Hess(q);
0125     hesscalls = hesscalls + 1;
0126     alpha = inner(Hq, q);
0127     T(1, 1) = alpha;
0128     Hq_perp = M.lincomb(x, 1, Hq, -alpha, q);
0129     
0130     % Minimizing the cubic restricted to the one-dimensional space spanned
0131     % by Q{1} is easy: it amounts to minimizing a univariate cubic. Indeed,
0132     % with eta = y*q where y is a scalar, we minimize (since g = ||g||q):
0133     %  h(y) = <y*q, g> + .5 <y*q, H[y*q]> + (sigma/3) ||y*q||^3
0134     %       = ||g||*y + .5*alpha*y^2 + (sigma/3) |y|^3.
0135     % The sign of y affects only the linear term, hence it is clear we need
0136     % to pick y nonpositive. In that case, h becomes a cubic polynomial:
0137     %  h(y) = ||g||*y + .5*alpha*y^2 - (sigma/3) y^3
0138     % The derivative is a quadratic polynomial:
0139     %  h'(y) = ||g|| + alpha*y - sigma*y^2.
0140     % Since ||g|| and sigma are positive, h' has two real roots, one
0141 % positive and one negative (strictly). The negative root is the only
0142     % root of interest. It necessarily identifies a minimizer since
0143     % h(0) = 0, h(-inf) = inf and h'(0) > 0.
0144     %
0145     % We take the real part only to be safe.
0146     y = min(real(roots([-sigma, alpha, gradnorm])));
0147     
0148     
0149     % Main Lanczos iteration
0150     gradnorm_reached = false;
0151     for j = 1 : min(options.maxiter_lanczos, n) - 1
0152 
0153         % Knowing that j Lanczos steps have been executed completely, now
0154         % execute the j+1st step to produce Q{j+1} and populate the
0155         % tridiagonal matrix T over the whole principal submatrix of size
0156         % j+1. This involves one Hessian call.
0157         %
0158         % In effect, we are computing this one step ahead. The reason is
0159         % that this makes it cheaper to compute the norm of the gradient of
0160         % the model, which is needed to check the stopping criterion (see
0161         % below).
0162         beta = rnorm(Hq_perp);
0163         % TODO: Figure out a sensible relative threshold
0164         if beta > 1e-12
0165             q = M.lincomb(x, 1/beta, Hq_perp);
0166         else
0167             % It appears the Krylov space maxed out (Hq is very nearly in
0168             % the space spanned by the existing Lanczos vectors). In order
0169             % to continue, the standard procedure is to generate a random
0170             % vector, and to orthogonalize it against the current basis.
0171             % This event is supposed to be rare.
0172             v = M.randvec(x);
0173             % Orthogonalize in the style of a modified Gram-Schmidt.
0174             for k = 1 : j
0175                 v = M.lincomb(x, 1, v, -inner(v, Q{k}), Q{k});
0176             end
0177             q = M.lincomb(x, 1/rnorm(v), v);
0178         end
0179         Hq = Hess(q);
0180         hesscalls = hesscalls + 1;
0181         Hqm = M.lincomb(x, 1, Hq, -beta, Q{j});
0182         % In exact arithmetic, alpha = <Hq, q>. Doing the computations in
0183         % this order amounts to a modified GS, which may help numerically.
0184         alpha = inner(Hqm, q);
0185         Hq_perp = M.lincomb(x, 1, Hqm, -alpha, q);
0186         Q{j+1} = q;
0187         T(j, j+1) = beta;     %#ok<SPRIX>
0188         T(j+1, j) = beta;     %#ok<SPRIX>
0189         T(j+1, j+1) = alpha;  %#ok<SPRIX>
0190         % End of the Lanczos procedure for step j.
0191 
0192         % Computing the norm of the gradient of the model at the computed
0193         % step 'Qy' (linear combination of the Q's with coefficients y.)
0194         % We actually compute the norm of a vector of coordinates for the
0195         % gradient of the model in the basis Q{1}, ..., Q{j+1}.
0196         model_gradnorm = norm(gradnorm*eye(j+1, 1) + ...
0197                               T(1:j+1, 1:j)*y + ...
0198                               sigma*norm(y)*[y ; 0]);
0199 
0200         if options.verbosity >= 4
0201             fprintf('\nModel grad norm %.16e', model_gradnorm);
0202         end
0203         
0204         % Check termination condition
0205         if model_gradnorm <= options.theta*norm(y)^2
0206             stop_str = 'Model grad norm condition satisfied';
0207             gradnorm_reached = true;
0208             break;
0209         end
0210         
0211         % Minimize the cubic model restricted to the subspace spanned by
0212         % the available Lanczos vectors. In its current form, this solver
0213         % cannot reuse prior work from earlier Lanczos iterations: this may
0214         % be a point to improve.
0215         [y, newton_iter] = minimize_cubic_newton(T(1:j+1, 1:j+1), ...
0216                                      gradnorm*eye(j+1, 1), sigma, options);
0217         newton_iterations(j+1) = newton_iter;
0218         
0219     end
0220     
0221     % Check why we stopped iterating
0222     if ~gradnorm_reached
0223         stop_str = sprintf(['Reached max number of Lanczos iterations ' ...
0224                '(options.maxiter_lanczos = %d)'], options.maxiter_lanczos);
0225     end
0226     
0227     % Construct the tangent vector eta as a linear combination of the basis
0228     % vectors and make sure the result is tangent up to numerical accuracy.
0229     eta = lincomb(M, x, Q(1:numel(y)), y);
0230     eta = tangent(eta);
0231     % We could easily return the norm of eta as the norm of the coefficient
0232     % vector y here, but numerical errors might accumulate.
0233     
0234     % In principle we could avoid this call by computing an appropriate
0235     % linear combination of available vectors. For now at least, we favor
0236     % this numerically safer approach.
0237     Heta = Hess(eta);
0238     hesscalls = hesscalls + 1;
0239     
0240     stats = struct('newton_iterations', newton_iterations(1:numel(y)));
0241     
0242     if options.verbosity >= 4
0243         fprintf('\n');
0244     end
0245         
0246 end

Generated on Mon 10-Sep-2018 11:48:06 by m2html © 2005