
rlbfgs

PURPOSE

Riemannian limited memory BFGS solver for smooth objective functions.

SYNOPSIS

function [x, cost, info, options] = rlbfgs(problem, x0, options)

DESCRIPTION

 Riemannian limited memory BFGS solver for smooth objective functions.
 
 function [x, cost, info, options] = rlbfgs(problem)
 function [x, cost, info, options] = rlbfgs(problem, x0)
 function [x, cost, info, options] = rlbfgs(problem, x0, options)
 function [x, cost, info, options] = rlbfgs(problem, [], options)


 This is a Riemannian limited memory BFGS solver (quasi-Newton method), 
 which aims to minimize the cost function in the given problem structure.
 It requires access to the gradient of the cost function.

 Parameter options.memory can be used to specify the number of iterations
 the algorithm remembers and uses to approximate the inverse Hessian of
 the cost. Default value is 30.
 For unlimited memory, set options.memory = Inf.


 For a description of the algorithm and theorems offering convergence
 guarantees, see the references below.

 The initial iterate is x0 if it is provided. Otherwise, a random point on
 the manifold is picked. To specify options whilst not specifying an
 initial iterate, give x0 as [] (the empty matrix).

 The two outputs 'x' and 'cost' are the last reached point on the manifold
 and its cost.
 
 The output 'info' is a struct-array which contains information about the
 iterations:
   iter (integer)
       The iteration number. The initial guess is 0.
   cost (double)
       The corresponding cost value.
   gradnorm (double)
       The (Riemannian) norm of the gradient.
   time (double)
       The total elapsed time in seconds to reach the corresponding cost.
   stepsize (double)
       The size of the step from the previous to the new iterate.
   accepted (Boolean)
        true if the step is accepted in the cautious update; false otherwise.
   And possibly additional information logged by options.statsfun.
 For example, type [info.gradnorm] to obtain a vector of the successive
 gradient norms reached at each iteration.
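
 As an illustration, a minimal sketch (assuming the solver has already
 returned the struct-array info described above) to visualize convergence:

   semilogy([info.iter], [info.gradnorm], '.-');
   xlabel('Iteration number');
   ylabel('Gradient norm');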

 The options structure is used to overwrite the default values. All
 options have a default value and are hence optional. To force an option
 value, pass an options structure with a field options.optionname, where
 optionname is one of the following and the default value is indicated
 between parentheses:

   tolgradnorm (1e-6)
       The algorithm terminates if the norm of the gradient drops below
       this. For well-scaled problems, a rule of thumb is that you can
       expect to reduce the gradient norm by 8 orders of magnitude
       (sqrt(eps)) compared to the gradient norm at a "typical" point (a
       rough initial iterate for example). Further decrease is sometimes
       possible, but inexact floating point arithmetic will eventually
       limit the final accuracy. If tolgradnorm is set too low, the
       algorithm may end up iterating forever (or at least until another
       stopping criterion triggers).
   maxiter (1000)
       The algorithm terminates if maxiter iterations were executed.
   maxtime (Inf)
       The algorithm terminates if maxtime seconds elapsed.
   minstepsize (1e-10)
     The minimum norm of the tangent vector that points from the current
     point to the next point. If the norm is less than minstepsize, the 
     program will terminate.
   memory (30)
      The number of previous iterations the program remembers. This is used
      to approximate the inverse Hessian at the current point. Because of
      the difficulty of maintaining a representation of operators in terms
      of coordinates, a recursive method is used. The number of steps in
      the recursion is at most options.memory. This parameter can take any
      integer value >= 0, or Inf, in which case it is set to
      options.maxiter. If options.maxiter is also Inf, then memory is set
      to 10000 and a warning is displayed.
   strict_inc_func (@(t) 1e-4*t)
      The cautious update requires a real function that has value 0 at
      t = 0 and is strictly increasing. See details in Wen Huang's paper
      "A Riemannian BFGS Method without Differentiated Retraction for
      Nonconvex Optimization Problems".
   statsfun (none)
       Function handle to a function that will be called after each
       iteration to provide the opportunity to log additional statistics.
       They will be returned in the info struct. See the generic Manopt
       documentation about solvers for further information. statsfun is
       called with the point x that was reached last.
   stopfun (none)
        Function handle to a function that will be called at each iteration
        to provide the opportunity to specify additional stopping criteria.
        See the generic Manopt documentation about solvers for further
        information, and the sketch after this options list.
   verbosity (2)
       Integer number used to tune the amount of output the algorithm
       generates during execution (mostly as text in the command window).
       The higher, the more output. 0 means silent. 3 and above includes a
       display of the options structure at the beginning of the execution.
   debug (false)
       Set to true to allow the algorithm to perform additional
       computations for debugging purposes. If a debugging test fails, you
       will be informed of it, usually via the command window. Be aware
       that these additional computations appear in the algorithm timings
       too, and may interfere with operations such as counting the number
       of cost evaluations, etc. (the debug calls get storedb too).
   storedepth (2)
       Maximum number of different points x of the manifold for which a
       store structure will be kept in memory in the storedb for caching.
       If memory usage is an issue, you may try to lower this number.
       Profiling may then help to investigate if a performance hit was
       incurred as a result.
   ls_initial_scale (@(gradnorm) 1/gradnorm)
        A function handle that takes a real number (a gradient norm) as
        input and returns a real number used to scale the initial step
        size of the line-search.
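
 As an illustration of statsfun and stopfun, here is a minimal sketch. It
 assumes the calling conventions documented for Manopt solvers, namely
 stats = statsfun(problem, x, stats) and
 stopnow = stopfun(problem, x, info, last); the reference point xref and
 the threshold used below are hypothetical:

   % Record the distance from each iterate to a reference point xref.
   options.statsfun = @(problem, x, stats) ...
       setfield(stats, 'dist_to_xref', problem.M.dist(x, xref));

   % Stop as soon as the cost drops below an illustrative target value.
   options.stopfun = @(problem, x, info, last) info(last).cost < 1e-10;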


 Please cite the Manopt paper as well as the research paper:
 @InBook{Huang2016,
   title     = {A {R}iemannian {BFGS} Method for Nonconvex Optimization Problems},
   author    = {Huang, W. and Absil, P.-A. and Gallivan, K.A.},
   year      = {2016},
   publisher = {Springer International Publishing},
   editor    = {Karas{\"o}zen, B{\"u}lent and Manguo{\u{g}}lu, Murat and Tezer-Sezgin, M{\"u}nevver and G{\"o}ktepe, Serdar and U{\u{g}}ur, {\"O}m{\"u}r},
   address   = {Cham},
   booktitle = {Numerical Mathematics and Advanced Applications ENUMATH 2015},
   pages     = {627--634},
   doi       = {10.1007/978-3-319-39929-4_60}
 }

 We point out that, at the moment, this implementation of RLBFGS can be
 slower than the implementation in ROPTLIB by Wen Huang et al. referenced
 above. For the purpose of comparing to their work, please use their
 implementation.
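
 As a minimal usage sketch (the matrix A, its size and the option values
 below are illustrative), one can compute a dominant eigenvector of a
 symmetric matrix by minimizing f(x) = -x'*A*x over the unit sphere:

   n = 300;
   A = randn(n); A = (A + A')/2;
   problem.M = spherefactory(n);
   problem.cost  = @(x) -x'*(A*x);
   problem.egrad = @(x) -2*(A*x);   % Euclidean gradient; Manopt converts it
   options.memory = 10;             % limited-memory parameter
   options.tolgradnorm = 1e-8;
   [x, xcost, info] = rlbfgs(problem, [], options);
   fprintf('Final gradient norm: %g\n', info(end).gradnorm);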

CROSS-REFERENCE INFORMATION

This function calls:
This function is called by:

SUBFUNCTIONS

function stats = savestats()
function dir = getDirection(M, xCur, xCurGradient, sHistory, yHistory, rhoHistory, scaleFactor, k)

SOURCE CODE

0001 function [x, cost, info, options] = rlbfgs(problem, x0, options)
0002 % Riemannian limited memory BFGS solver for smooth objective functions.
0003 %
0004 % function [x, cost, info, options] = rlbfgs(problem)
0005 % function [x, cost, info, options] = rlbfgs(problem, x0)
0006 % function [x, cost, info, options] = rlbfgs(problem, x0, options)
0007 % function [x, cost, info, options] = rlbfgs(problem, [], options)
0008 %
0009 %
0010 % This is a Riemannian limited memory BFGS solver (quasi-Newton method),
0011 % which aims to minimize the cost function in the given problem structure.
0012 % It requires access to the gradient of the cost function.
0013 %
0014 % Parameter options.memory can be used to specify the number of iterations
0015 % the algorithm remembers and uses to approximate the inverse Hessian of
0016 % the cost. Default value is 30.
0017 % For unlimited memory, set options.memory = Inf.
0018 %
0019 %
0020 % For a description of the algorithm and theorems offering convergence
0021 % guarantees, see the references below.
0022 %
0023 % The initial iterate is x0 if it is provided. Otherwise, a random point on
0024 % the manifold is picked. To specify options whilst not specifying an
0025 % initial iterate, give x0 as [] (the empty matrix).
0026 %
0027 % The two outputs 'x' and 'cost' are the last reached point on the manifold
0028 % and its cost.
0029 %
0030 % The output 'info' is a struct-array which contains information about the
0031 % iterations:
0032 %   iter (integer)
0033 %       The iteration number. The initial guess is 0.
0034 %   cost (double)
0035 %       The corresponding cost value.
0036 %   gradnorm (double)
0037 %       The (Riemannian) norm of the gradient.
0038 %   time (double)
0039 %       The total elapsed time in seconds to reach the corresponding cost.
0040 %   stepsize (double)
0041 %       The size of the step from the previous to the new iterate.
0042 %   accepted (Boolean)
0043 %       true if the step is accepted in the cautious update; false otherwise.
0044 %   And possibly additional information logged by options.statsfun.
0045 % For example, type [info.gradnorm] to obtain a vector of the successive
0046 % gradient norms reached at each iteration.
0047 %
0048 % The options structure is used to overwrite the default values. All
0049 % options have a default value and are hence optional. To force an option
0050 % value, pass an options structure with a field options.optionname, where
0051 % optionname is one of the following and the default value is indicated
0052 % between parentheses:
0053 %
0054 %   tolgradnorm (1e-6)
0055 %       The algorithm terminates if the norm of the gradient drops below
0056 %       this. For well-scaled problems, a rule of thumb is that you can
0057 %       expect to reduce the gradient norm by 8 orders of magnitude
0058 %       (sqrt(eps)) compared to the gradient norm at a "typical" point (a
0059 %       rough initial iterate for example). Further decrease is sometimes
0060 %       possible, but inexact floating point arithmetic will eventually
0061 %       limit the final accuracy. If tolgradnorm is set too low, the
0062 %       algorithm may end up iterating forever (or at least until another
0063 %       stopping criterion triggers).
0064 %   maxiter (1000)
0065 %       The algorithm terminates if maxiter iterations were executed.
0066 %   maxtime (Inf)
0067 %       The algorithm terminates if maxtime seconds elapsed.
0068 %   minstepsize (1e-10)
0069 %     The minimum norm of the tangent vector that points from the current
0070 %     point to the next point. If the norm is less than minstepsize, the
0071 %     program will terminate.
0072 %   memory (30)
0073 %     The number of previous iterations the program remembers. This is used
0074 %     to approximate the inverse Hessian at the current point. Because of
0075 %     the difficulty of maintaining a representation of operators in terms
0076 %     of coordinates, a recursive method is used. The number of steps in
0077 %     the recursion is at most options.memory. This parameter can take any
0078 %     integer value >= 0, or Inf, in which case it is set to
0079 %     options.maxiter. If options.maxiter is also Inf, then memory is set
0080 %     to 10000 and a warning is displayed.
0081 %   strict_inc_func (@(t) 1e-4*t)
0082 %     The cautious update requires a real function that has value 0 at
0083 %     t = 0 and is strictly increasing. See details in Wen Huang's paper
0084 %     "A Riemannian BFGS Method without Differentiated Retraction for
0085 %     Nonconvex Optimization Problems".
0086 %   statsfun (none)
0087 %       Function handle to a function that will be called after each
0088 %       iteration to provide the opportunity to log additional statistics.
0089 %       They will be returned in the info struct. See the generic Manopt
0090 %       documentation about solvers for further information. statsfun is
0091 %       called with the point x that was reached last.
0092 %   stopfun (none)
0093 %       Function handle to a function that will be called at each iteration
0094 %       to provide the opportunity to specify additional stopping criteria.
0095 %       See the generic Manopt documentation about solvers for further
0096 %       information.
0097 %   verbosity (2)
0098 %       Integer number used to tune the amount of output the algorithm
0099 %       generates during execution (mostly as text in the command window).
0100 %       The higher, the more output. 0 means silent. 3 and above includes a
0101 %       display of the options structure at the beginning of the execution.
0102 %   debug (false)
0103 %       Set to true to allow the algorithm to perform additional
0104 %       computations for debugging purposes. If a debugging test fails, you
0105 %       will be informed of it, usually via the command window. Be aware
0106 %       that these additional computations appear in the algorithm timings
0107 %       too, and may interfere with operations such as counting the number
0108 %       of cost evaluations, etc. (the debug calls get storedb too).
0109 %   storedepth (2)
0110 %       Maximum number of different points x of the manifold for which a
0111 %       store structure will be kept in memory in the storedb for caching.
0112 %       If memory usage is an issue, you may try to lower this number.
0113 %       Profiling may then help to investigate if a performance hit was
0114 %       incurred as a result.
0115 %   ls_initial_scale (@(gradnorm) 1/gradnorm)
0116 %       A function handle that takes a real number (a gradient norm) as
0117 %       input and returns a real number used to scale the initial step
0118 %       size of the line-search.
0119 %
0120 %
0121 % Please cite the Manopt paper as well as the research paper:
0122 % @InBook{Huang2016,
0123 %   title     = {A {R}iemannian {BFGS} Method for Nonconvex Optimization Problems},
0124 %   author    = {Huang, W. and Absil, P.-A. and Gallivan, K.A.},
0125 %   year      = {2016},
0126 %   publisher = {Springer International Publishing},
0127 %   editor    = {Karas{\"o}zen, B{\"u}lent and Manguo{\u{g}}lu, Murat and Tezer-Sezgin, M{\"u}nevver and G{\"o}ktepe, Serdar and U{\u{g}}ur, {\"O}m{\"u}r},
0128 %   address   = {Cham},
0129 %   booktitle = {Numerical Mathematics and Advanced Applications ENUMATH 2015},
0130 %   pages     = {627--634},
0131 %   doi       = {10.1007/978-3-319-39929-4_60}
0132 % }
0133 %
0134 % We point out that, at the moment, this implementation of RLBFGS can be
0135 % slower than the implementation in ROPTLIB by Wen Huang et al. referenced
0136 % above. For the purpose of comparing to their work, please use their
0137 % implementation.
0138 
0139 
0140 % This file is part of Manopt: www.manopt.org.
0141 % Original author: Changshuo Liu, July 19, 2017.
0142 % Contributors: Nicolas Boumal
0143 % Change log:
0144 %
0145 %   Nov. 27, 2017 (Wen Huang):
0146 %       Changed the default strict_inc_func to @(t) 1e-4*t from @(t) t.
0147 %
0148 %   Jan. 18, 2018 (NB):
0149 %       Corrected a bug related to the way the line search hint was defined
0150 %       by default.
0151 %
0152 %   Aug. 2, 2018 (NB):
0153 %       Using the new storedb.remove features to keep storedb lean, and
0154 %       reduced the default value of storedepth from 30 to 2 as a result.
0155 %
0156 %   Aug. 2, 2022 (sfrcorne):
0157 %       Added option ls_initial_scale, with default setting to ensure
0158 %       the method is invariant to positive scaling of the cost function.
0159 
0160 
0161     % Verify that the problem description is sufficient for the solver.
0162     if ~canGetCost(problem)
0163         warning('manopt:getCost', ...
0164                 'No cost provided. The algorithm will likely abort.');
0165     end
0166     if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
0167         % Note: we do not give a warning if an approximate gradient is
0168         % explicitly given in the problem description, as in that case the user
0169         % seems to be aware of the issue.
0170         warning('manopt:getGradient:approx', ...
0171            ['No gradient provided. Using an FD approximation instead (slow).\n' ...
0172             'It may be necessary to increase options.tolgradnorm.\n' ...
0173             'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
0174         problem.approxgrad = approxgradientFD(problem);
0175     end
0176     
0177     % This solver uses linesearch_hint as a line search algorithm. By
0178     % default, try a step size of 1, so that if the BFGS approximation of
0179     % the Hessian (or inverse Hessian) is good, then the iteration is close
0180     % to a Newton step.
0181     if ~canGetLinesearch(problem)
0182         problem.linesearch = @(x, d) 1;
0183     end
0184     
0185     % Local defaults for the program
0186     localdefaults.minstepsize = 1e-10;
0187     localdefaults.maxiter = 1000;
0188     localdefaults.tolgradnorm = 1e-6;
0189     localdefaults.memory = 30;
0190     localdefaults.strict_inc_func = @(t) 1e-4*t;
0191     localdefaults.ls_max_steps = 25;
0192     localdefaults.ls_initial_scale = @(gradnorm) 1/gradnorm;
0193     localdefaults.storedepth = 2;
0194     
0195     % Merge global and local defaults, then merge w/ user options, if any.
0196     localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
0197     if ~exist('options', 'var') || isempty(options)
0198         options = struct();
0199     end
0200     options = mergeOptions(localdefaults, options);
0201     
0202     % Make sure memory is in the range [0, Inf).
0203     options.memory = max(options.memory, 0);
0204     if options.memory == Inf
0205         if isinf(options.maxiter)
0206             options.memory = 10000;
0207             warning('rlbfgs:memory', ['options.memory and options.maxiter' ...
0208               ' are both Inf; options.memory has been changed to 10000.']);
0209         else
0210             options.memory = options.maxiter;
0211         end
0212     end
0213     
0214     M = problem.M;
0215     
0216     
0217     timetic = tic();
0218     
0219     
0220     % Create a random starting point if no starting point is provided.
0221     if ~exist('x0', 'var') || isempty(x0)
0222         xCur = M.rand();
0223     else
0224         xCur = x0;
0225     end
0226     
0227     % Create a store database and get a key for the current x
0228     storedb = StoreDB(options.storedepth);
0229     key = storedb.getNewKey();
0230     
0231     % __________Initialization of variables______________
0232     % Number of iterations since the last restart
0233     k = 0;  
0234     % Total number of BFGS iterations
0235     iter = 0; 
0236     
0237     % This cell stores step vectors which point from x_{t} to x_{t+1} for t
0238     % indexing the last iterations, capped at options.memory.
0239     % That is, it stores up to options.memory of the most recent step
0240     % vectors. However, the implementation below does not need step vectors
0241     % in their respective tangent spaces at x_{t}'s. Rather, it requires
0242     % them transported to the current point's tangent space by vector
0243 % transport. For details regarding the requirements on the vector
0244     % transport, see the reference paper by Huang et al.
0245     % In this implementation, those step vectors are iteratively
0246     % transported to the current point's tangent space after every
0247     % iteration. Thus, at every iteration, vectors in sHistory are in the
0248     % current point's tangent space.
0249     sHistory = cell(1, options.memory);
0250     
0251     % This cell stores the differences for latest t's of the gradient at
0252     % x_{t+1} and the gradient at x_{t}, transported to x_{t+1}'s tangent
0253     % space. The memory is also capped at options.memory.
0254     yHistory = cell(1, options.memory);
0255     
0256     % rhoHistory{t} stores the reciprocal of the inner product between
0257     % sHistory{t} and yHistory{t}.
0258     rhoHistory = cell(1, options.memory);
0259     
0260     % Scaling of direction given by getDirection for acceptable step
0261     alpha = 1; 
0262     
0263     % Norm of the step
0264     stepsize = 1;
0265     
0266     % Stores whether the step is accepted by the cautious update check.
0267     accepted = true;
0268     
0269     % Query the cost function and its gradient
0270     [xCurCost, xCurGradient] = getCostGrad(problem, xCur, storedb, key);
0271     
0272     xCurGradNorm = M.norm(xCur, xCurGradient);
0273     
0274     % Scaling of initial matrix, Barzilai-Borwein.
0275     scaleFactor = options.ls_initial_scale(xCurGradNorm);
0276     
0277     % Line-search statistics for recording in info.
0278     lsstats = [];
0279     
0280     % Flag to control restarting scheme to avoid infinite loops (see below)
0281     ultimatum = false;
0282     
0283     % Save stats in a struct array info, and preallocate.
0284     stats = savestats();
0285     info(1) = stats;
0286     info(min(10000, options.maxiter+1)).iter = [];
0287     
0288     if options.verbosity >= 2
0289         fprintf(' iter                   cost val            grad. norm           alpha\n');
0290     end
0291     
0292     % Main iteration
0293     while true
0294 
0295         % Display iteration information
0296         if options.verbosity >= 2
0297             fprintf('%5d    %+.16e        %.8e      %.4e\n', ...
0298                     iter, xCurCost, xCurGradNorm, alpha);
0299         end
0300         
0301         % Start timing this iteration
0302         timetic = tic();
0303         
0304         % Run standard stopping criterion checks
0305         [stop, reason] = stoppingcriterion(problem, xCur, options, ...
0306                                            info, iter+1);
0307         
0308         % If none triggered, run specific stopping criterion check
0309         if ~stop 
0310             if stats.stepsize < options.minstepsize
0311                 % To avoid infinite loop and to push the search further
0312                 % in case BFGS approximation of Hessian is off towards
0313                 % the end, we erase the memory by setting k = 0;
0314                 % In this way, it starts off like a steepest descent.
0315                 % If even steepest descent does not work, then it is
0316                 % hopeless and we will terminate.
0317                 if ~ultimatum
0318                     if options.verbosity >= 2
0319                         fprintf(['stepsize is too small, restarting ' ...
0320                             'the bfgs procedure at the current point.\n']);
0321                     end
0322                     k = 0;
0323                     ultimatum = true;
0324                 else
0325                     stop = true;
0326                     reason = sprintf(['Last stepsize smaller than '  ...
0327                         'minimum allowed; options.minstepsize = %g.'], ...
0328                         options.minstepsize);
0329                 end
0330             else
0331                 % We are not in trouble: lift the ultimatum if it was on.
0332                 ultimatum = false;
0333             end
0334         end  
0335         
0336         if stop
0337             if options.verbosity >= 1
0338                 fprintf([reason '\n']);
0339             end
0340             break;
0341         end
0342 
0343         
0344         % Compute BFGS direction
0345         p = getDirection(M, xCur, xCurGradient, sHistory,...
0346                 yHistory, rhoHistory, scaleFactor, min(k, options.memory));
0347 
0348         % Execute line-search
0349         [stepsize, xNext, newkey, lsstats] = ...
0350             linesearch_hint(problem, xCur, p, xCurCost, ...
0351                             M.inner(xCur, xCurGradient, p), ...
0352                             options, storedb, key);
0353         
0354         % Record the BFGS step-multiplier alpha which was effectively
0355         % selected. Toward convergence, we hope to see alpha = 1.
0356         alpha = stepsize/M.norm(xCur, p);
0357         step = M.lincomb(xCur, alpha, p);
0358         
0359         
0360         % Query cost and gradient at the candidate new point.
0361         [xNextCost, xNextGrad] = getCostGrad(problem, xNext, storedb, newkey);
0362         
0363         % Compute sk and yk
0364         sk = M.transp(xCur, xNext, step);
0365         yk = M.lincomb(xNext, 1, xNextGrad, ...
0366                              -1, M.transp(xCur, xNext, xCurGradient));
0367 
0368         % Computation of the BFGS step is invariant under scaling of sk and
0369         % yk by a common factor. For numerical reasons, we scale sk and yk
0370         % so that sk is a unit norm vector.
0371         norm_sk = M.norm(xNext, sk);
0372         sk = M.lincomb(xNext, 1/norm_sk, sk);
0373         yk = M.lincomb(xNext, 1/norm_sk, yk);
0374         
0375         inner_sk_yk = M.inner(xNext, sk, yk);
0376         inner_sk_sk = M.norm(xNext, sk)^2;    % ensures nonnegativity
0377         
0378         
0379         % If the cautious step is accepted (which is the intended
0380         % behavior), we record sk, yk and rhok and need to do some
0381         % housekeeping. If the cautious step is rejected, these are not
0382         % recorded. In all cases, xNext is the next iterate: the notion of
0383         % accept/reject here is limited to whether or not we keep track of
0384         % sk, yk, rhok to update the BFGS operator.
0385         cap = options.strict_inc_func(xCurGradNorm);
0386         if inner_sk_sk ~= 0 && (inner_sk_yk / inner_sk_sk) >= cap
0387             
0388             accepted = true;
0389             
0390             rhok = 1/inner_sk_yk;
0391             
0392             scaleFactor = inner_sk_yk / M.norm(xNext, yk)^2;
0393             
0394             % Time to store the vectors sk, yk and the scalar rhok.
0395             % Remember: we need to transport all vectors to the most
0396             % current tangent space.
0397             
0398             % If we are out of memory
0399             if k >= options.memory
0400                 
0401                 % sHistory and yHistory are filled from left to right,
0402                 % with the most recent pair in the rightmost occupied
0403                 % cell. When memory is full, rotate the cells left so
0404                 % that the oldest pair lands in the rightmost cell, then
0405                 % overwrite it with the most recent sk, yk.
0406                 for  i = 2 : options.memory
0407                     sHistory{i} = M.transp(xCur, xNext, sHistory{i});
0408                     yHistory{i} = M.transp(xCur, xNext, yHistory{i});
0409                 end
0410                 if options.memory > 1
0411                     sHistory = sHistory([2:end, 1]);
0412                     yHistory = yHistory([2:end, 1]);
0413                     rhoHistory = rhoHistory([2:end, 1]);
0414                 end
0415                 if options.memory > 0
0416                     sHistory{options.memory} = sk;
0417                     yHistory{options.memory} = yk;
0418                     rhoHistory{options.memory} = rhok;
0419                 end
0420                 
0421             % If we are not out of memory
0422             else
0423                 
0424                 for i = 1:k
0425                     sHistory{i} = M.transp(xCur, xNext, sHistory{i});
0426                     yHistory{i} = M.transp(xCur, xNext, yHistory{i});
0427                 end
0428                 sHistory{k+1} = sk;
0429                 yHistory{k+1} = yk;
0430                 rhoHistory{k+1} = rhok;
0431                 
0432             end
0433             
0434             k = k + 1;
0435             
0436         % The cautious step is rejected: we do not store sk, yk, rhok but
0437         % we still need to transport stored vectors to the new tangent
0438         % space.
0439         else
0440             
0441             accepted = false;
0442             
0443             for  i = 1 : min(k, options.memory)
0444                 sHistory{i} = M.transp(xCur, xNext, sHistory{i});
0445                 yHistory{i} = M.transp(xCur, xNext, yHistory{i});
0446             end
0447             
0448         end
0449         
0450         % Update variables to new iterate.
0451         storedb.removefirstifdifferent(key, newkey);
0452         xCur = xNext;
0453         key = newkey;
0454         xCurGradient = xNextGrad;
0455         xCurGradNorm = M.norm(xNext, xNextGrad);
0456         xCurCost = xNextCost;
0457         
0458         % iter is the number of iterations we have accomplished.
0459         iter = iter + 1;
0460         
0461         % Make sure we don't use too much memory for the store database
0462         % (this is independent of the BFGS memory).
0463         storedb.purge();
0464         
0465         
0466         % Log statistics for freshly executed iteration
0467         stats = savestats();
0468         info(iter+1) = stats; 
0469         
0470     end
0471 
0472     
0473     % Housekeeping before we return
0474     info = info(1:iter+1);
0475     x = xCur;
0476     cost = xCurCost;
0477 
0478     if options.verbosity >= 1
0479         fprintf('Total time is %f [s] (excludes statsfun)\n', ...
0480                 info(end).time);
0481     end
0482 
0483     
0484     % Routine in charge of collecting the current iteration stats
0485     function stats = savestats()
0486         stats.iter = iter;
0487         stats.cost = xCurCost;
0488         stats.gradnorm = xCurGradNorm;
0489         if iter == 0
0490             stats.stepsize = NaN;
0491             stats.time = toc(timetic);
0492             stats.accepted = NaN;
0493         else
0494             stats.stepsize = stepsize;
0495             stats.time = info(iter).time + toc(timetic);
0496             stats.accepted = accepted;
0497         end
0498         stats.linesearch = lsstats;
0499         stats = applyStatsfun(problem, xCur, storedb, key, options, stats);
0500     end
0501 
0502 end
0503 
0504 
0505 
0506 
0507 % BFGS step; see the paper by Huang et al. for details. This function
0508 % takes in a tangent vector g and applies an approximate inverse Hessian
0509 % P to it to obtain Pg. Then, -Pg is returned.
0510 %
0511 % Theory requires the vector transport to be isometric and to satisfy the
0512 % locking condition (see paper), but these properties do not seem to be
0513 % crucial in practice. If your manifold provides M.isotransp, it may be
0514 % good to do M.transp = M.isotransp; after loading M with a factory.
0515 %
0516 % This implementation operates in the tangent space of the most recent
0517 % point since all vectors in sHistory and yHistory have been transported
0518 % there.
0519 function dir = getDirection(M, xCur, xCurGradient, sHistory, yHistory, ...
0520                             rhoHistory, scaleFactor, k)
0521     
0522     q = xCurGradient;
0523     
0524     inner_s_q = zeros(1, k);
0525     
0526     for i = k : -1 : 1
0527         inner_s_q(1, i) = rhoHistory{i} * M.inner(xCur, sHistory{i}, q);
0528         q = M.lincomb(xCur, 1, q, -inner_s_q(1, i), yHistory{i});
0529     end
0530     
0531     r = M.lincomb(xCur, scaleFactor, q);
0532     
0533     for i = 1 : k
0534          omega = rhoHistory{i} * M.inner(xCur, yHistory{i}, r);
0535          r = M.lincomb(xCur, 1, r, inner_s_q(1, i)-omega, sHistory{i});
0536     end
0537     
0538     dir = M.lincomb(xCur, -1, r);
0539     
0540 end

Generated on Fri 30-Sep-2022 13:18:25 by m2html © 2005