function [x, cost, info, options] = rlbfgs(problem, x0, options)
% Riemannian limited memory BFGS solver for smooth objective functions.
%
% function [x, cost, info, options] = rlbfgs(problem)
% function [x, cost, info, options] = rlbfgs(problem, x0)
% function [x, cost, info, options] = rlbfgs(problem, x0, options)
% function [x, cost, info, options] = rlbfgs(problem, [], options)
%
%
% This is a Riemannian limited memory BFGS solver (quasi-Newton method),
% which aims to minimize the cost function in the given problem structure.
% It requires access to the gradient of the cost function.
%
% Parameter options.memory can be used to specify the number of iterations
% the algorithm remembers and uses to approximate the inverse Hessian of
% the cost. Default value is 30.
% For unlimited memory, set options.memory = Inf.
%
%
% For a description of the algorithm and theorems offering convergence
% guarantees, see the references below.
%
% The initial iterate is x0 if it is provided. Otherwise, a random point on
% the manifold is picked. To specify options whilst not specifying an
% initial iterate, give x0 as [] (the empty matrix).
%
% The two outputs 'x' and 'cost' are the last reached point on the manifold
% and its cost.
%
% The output 'info' is a struct-array which contains information about the
% iterations:
%   iter (integer)
%       The iteration number. The initial guess is 0.
%   cost (double)
%       The corresponding cost value.
%   gradnorm (double)
%       The (Riemannian) norm of the gradient.
%   time (double)
%       The total elapsed time in seconds to reach the corresponding cost.
%   stepsize (double)
%       The size of the step from the previous to the new iterate.
%   accepted (Boolean)
%       true if the step is accepted in the cautious update; false
%       otherwise.
%   And possibly additional information logged by options.statsfun.
% For example, type [info.gradnorm] to obtain a vector of the successive
% gradient norms reached at each iteration.
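%
% As a minimal sketch (assuming a run has completed), the convergence
% history collected in info can be inspected with standard MATLAB plotting:
%
%     semilogy([info.iter], [info.gradnorm], '.-');
%     xlabel('Iteration'); ylabel('Gradient norm');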
%
% The options structure is used to overwrite the default values. All
% options have a default value and are hence optional. To force an option
% value, pass an options structure with a field options.optionname, where
% optionname is one of the following and the default value is indicated
% between parentheses:
%
%   tolgradnorm (1e-6)
%       The algorithm terminates if the norm of the gradient drops below
%       this. For well-scaled problems, a rule of thumb is that you can
%       expect to reduce the gradient norm by 8 orders of magnitude
%       (sqrt(eps)) compared to the gradient norm at a "typical" point (a
%       rough initial iterate for example). Further decrease is sometimes
%       possible, but inexact floating point arithmetic will eventually
%       limit the final accuracy. If tolgradnorm is set too low, the
%       algorithm may end up iterating forever (or at least until another
%       stopping criterion triggers).
%   maxiter (1000)
%       The algorithm terminates if maxiter iterations were executed.
%   maxtime (Inf)
%       The algorithm terminates if maxtime seconds elapsed.
%   minstepsize (1e-10)
%       The minimum norm of the tangent vector that points from the
%       current point to the next point. If the norm is less than
%       minstepsize, the program will terminate.
%   memory (30)
%       The number of previous iterations the program remembers. This is
%       used to approximate the inverse Hessian at the current point.
%       Because of the difficulty of maintaining a coordinate
%       representation of the operator, a recursive method is used. The
%       number of steps in the recursion is at most options.memory. This
%       parameter can take any integer value >= 0, or Inf, which is taken
%       to be options.maxiter. If options.maxiter has value Inf, then it
%       will take value 10000 and a warning will be displayed.
%   strict_inc_func (@(t) 1e-4*t)
%       The cautious step needs a real function that has value 0 at t = 0
%       and is strictly increasing. See details in Wen Huang's paper
%       "A Riemannian BFGS Method without Differentiated Retraction for
%       Nonconvex Optimization Problems".
%   statsfun (none)
%       Function handle to a function that will be called after each
%       iteration to provide the opportunity to log additional statistics.
%       They will be returned in the info struct. See the generic Manopt
%       documentation about solvers for further information. statsfun is
%       called with the point x that was reached last.
%   stopfun (none)
%       Function handle to a function that will be called at each
%       iteration to provide the opportunity to specify additional
%       stopping criteria. See the generic Manopt documentation about
%       solvers for further information.
%   verbosity (2)
%       Integer number used to tune the amount of output the algorithm
%       generates during execution (mostly as text in the command window).
%       The higher, the more output. 0 means silent. 3 and above includes
%       a display of the options structure at the beginning of the
%       execution.
%   debug (false)
%       Set to true to allow the algorithm to perform additional
%       computations for debugging purposes. If a debugging test fails,
%       you will be informed of it, usually via the command window. Be
%       aware that these additional computations appear in the algorithm
%       timings too, and may interfere with operations such as counting
%       the number of cost evaluations, etc. (the debug calls get storedb
%       too).
%   storedepth (2)
%       Maximum number of different points x of the manifold for which a
%       store structure will be kept in memory in the storedb for caching.
%       If memory usage is an issue, you may try to lower this number.
%       Profiling may then help to investigate if a performance hit was
%       incurred as a result.
%   ls_initial_scale (@(gradnorm) 1/gradnorm)
%       A function handle that takes as input a real number (a gradient
%       norm) and outputs a real number specifying how to scale the
%       initialization of the line-search.
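%
% Usage example: the following is a minimal sketch (the matrix A and the
% use of spherefactory are illustrative choices, not part of this solver)
% that minimizes the negated Rayleigh quotient over the unit sphere, so
% that x converges to a dominant eigenvector of A:
%
%     n = 100;
%     A = randn(n); A = .5*(A + A.');    % a random symmetric matrix
%     problem.M = spherefactory(n);      % unit sphere in R^n
%     problem.cost = @(x) -x.'*(A*x);    % cost to minimize
%     problem.egrad = @(x) -2*A*x;       % Euclidean gradient; Manopt
%                                        % converts it to the Riemannian
%                                        % gradient automatically
%     options.memory = 5;                % small BFGS memory, for example
%     [x, xcost, info] = rlbfgs(problem, [], options);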
%
%
% Please cite the Manopt paper as well as the research paper:
%     @InBook{Huang2016,
%       title     = {A {R}iemannian {BFGS} Method for Nonconvex Optimization Problems},
%       author    = {Huang, W. and Absil, P.-A. and Gallivan, K.A.},
%       year      = {2016},
%       publisher = {Springer International Publishing},
%       editor    = {Karas{\"o}zen, B{\"u}lent and Manguo{\u{g}}lu, Murat and Tezer-Sezgin, M{\"u}nevver and G{\"o}ktepe, Serdar and U{\u{g}}ur, {\"O}m{\"u}r},
%       address   = {Cham},
%       booktitle = {Numerical Mathematics and Advanced Applications ENUMATH 2015},
%       pages     = {627--634},
%       doi       = {10.1007/978-3-319-39929-4_60}
%     }
%
% We point out that, at the moment, this implementation of RLBFGS can be
% slower than the implementation in ROPTLIB by Wen Huang et al. referenced
% above. For the purpose of comparing to their work, please use their
% implementation.


% This file is part of Manopt: www.manopt.org.
% Original author: Changshuo Liu, July 19, 2017.
% Contributors: Nicolas Boumal
% Change log:
%
%   Nov. 27, 2017 (Wen Huang):
%       Changed the default strict_inc_func to @(t) 1e-4*t from @(t) t.
%
%   Jan. 18, 2018 (NB):
%       Corrected a bug related to the way the line search hint was
%       defined by default.
%
%   Aug. 2, 2018 (NB):
%       Using the new storedb.remove features to keep storedb lean, and
%       reduced the default value of storedepth from 30 to 2 as a result.
%
%   Aug. 2, 2022 (sfrcorne):
%       Added option ls_initial_scale, with default setting to ensure
%       the method is invariant to positive scaling of the cost function.


    % Verify that the problem description is sufficient for the solver.
    if ~canGetCost(problem)
        warning('manopt:getCost', ...
                'No cost provided. The algorithm will likely abort.');
    end
    if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
        % Note: we do not give a warning if an approximate gradient is
        % explicitly given in the problem description, as in that case the
        % user seems to be aware of the issue.
        warning('manopt:getGradient:approx', ...
               ['No gradient provided. Using an FD approximation instead (slow).\n' ...
                'It may be necessary to increase options.tolgradnorm.\n' ...
                'To disable this warning: warning(''off'', ''manopt:getGradient:approx'')']);
        problem.approxgrad = approxgradientFD(problem);
    end

    % This solver uses linesearch_hint as a line search algorithm. By
    % default, try a step size of 1, so that if the BFGS approximation of
    % the Hessian (or inverse Hessian) is good, then the iteration is
    % close to a Newton step.
    if ~canGetLinesearch(problem)
        problem.linesearch = @(x, d) 1;
    end

    % Local defaults for the program
    localdefaults.minstepsize = 1e-10;
    localdefaults.maxiter = 1000;
    localdefaults.tolgradnorm = 1e-6;
    localdefaults.memory = 30;
    localdefaults.strict_inc_func = @(t) 1e-4*t;
    localdefaults.ls_max_steps = 25;
    localdefaults.ls_initial_scale = @(gradnorm) 1/gradnorm;
    localdefaults.storedepth = 2;

    % Merge global and local defaults, then merge w/ user options, if any.
    localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
    if ~exist('options', 'var') || isempty(options)
        options = struct();
    end
    options = mergeOptions(localdefaults, options);

    % Make sure options.memory lies in the range [0, Inf).
    options.memory = max(options.memory, 0);
    if options.memory == Inf
        if isinf(options.maxiter)
            options.memory = 10000;
            warning('rlbfgs:memory', ['options.memory and options.maxiter' ...
                    ' are both Inf; options.memory has been changed to 10000.']);
        else
            options.memory = options.maxiter;
        end
    end
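
    % Note: with options.memory = 0, no curvature pairs are ever stored,
    % so the two-loop recursion in getDirection below is empty and the
    % search direction reduces to scaled steepest descent,
    % -scaleFactor * grad.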

    M = problem.M;


    timetic = tic();


    % Create a random starting point if no starting point is provided.
    if ~exist('x0', 'var') || isempty(x0)
        xCur = M.rand();
    else
        xCur = x0;
    end

    % Create a store database and get a key for the current x
    storedb = StoreDB(options.storedepth);
    key = storedb.getNewKey();

    % __________Initialization of variables______________
    % Number of iterations since the last restart
    k = 0;
    % Total number of BFGS iterations
    iter = 0;

    % This cell stores step vectors which point from x_{t} to x_{t+1} for
    % t indexing the last iterations, capped at options.memory.
    % That is, it stores up to options.memory of the most recent step
    % vectors. However, the implementation below does not need the step
    % vectors in their respective tangent spaces at the x_{t}'s. Rather,
    % it requires them transported to the current point's tangent space by
    % vector transport. For details regarding the requirements on the
    % vector transport, see the reference paper by Huang et al.
    % In this implementation, those step vectors are iteratively
    % transported to the current point's tangent space after every
    % iteration. Thus, at every iteration, vectors in sHistory are in the
    % current point's tangent space.
    sHistory = cell(1, options.memory);

    % This cell stores the differences, for the latest t's, between the
    % gradient at x_{t+1} and the gradient at x_{t}, transported to
    % x_{t+1}'s tangent space. The memory is also capped at
    % options.memory.
    yHistory = cell(1, options.memory);

    % rhoHistory{t} stores the reciprocal of the inner product between
    % sHistory{t} and yHistory{t}.
    rhoHistory = cell(1, options.memory);

    % Scaling of direction given by getDirection for acceptable step
    alpha = 1;

    % Norm of the step
    stepsize = 1;

    % Stores whether the step is accepted by the cautious update check.
    accepted = true;

    % Query the cost function and its gradient
    [xCurCost, xCurGradient] = getCostGrad(problem, xCur, storedb, key);

    xCurGradNorm = M.norm(xCur, xCurGradient);

    % Scaling of initial matrix, Barzilai-Borwein.
    scaleFactor = options.ls_initial_scale(xCurGradNorm);

    % Line-search statistics for recording in info.
    lsstats = [];

    % Flag to control restarting scheme to avoid infinite loops (see below)
    ultimatum = false;

    % Save stats in a struct array info, and preallocate.
    stats = savestats();
    info(1) = stats;
    info(min(10000, options.maxiter+1)).iter = [];

    if options.verbosity >= 2
        fprintf(' iter                   cost val            grad. norm           alpha\n');
    end

    % Main iteration
    while true

        % Display iteration information
        if options.verbosity >= 2
            fprintf('%5d    %+.16e        %.8e      %.4e\n', ...
                    iter, xCurCost, xCurGradNorm, alpha);
        end

        % Start timing this iteration
        timetic = tic();

        % Run standard stopping criterion checks
        [stop, reason] = stoppingcriterion(problem, xCur, options, ...
                                           info, iter+1);

        % If none triggered, run specific stopping criterion check
        if ~stop
            if stats.stepsize < options.minstepsize
                % To avoid an infinite loop and to push the search further
                % in case the BFGS approximation of the Hessian is off
                % towards the end, we erase the memory by setting k = 0.
                % In this way, the solver starts off like steepest
                % descent. If even steepest descent does not work, then it
                % is hopeless and we will terminate.
                if ~ultimatum
                    if options.verbosity >= 2
                        fprintf(['stepsize is too small, restarting ' ...
                                 'the bfgs procedure at the current point.\n']);
                    end
                    k = 0;
                    ultimatum = true;
                else
                    stop = true;
                    reason = sprintf(['Last stepsize smaller than ' ...
                                      'minimum allowed; options.minstepsize = %g.'], ...
                                      options.minstepsize);
                end
            else
                % We are not in trouble: lift the ultimatum if it was on.
                ultimatum = false;
            end
        end

        if stop
            if options.verbosity >= 1
                fprintf([reason '\n']);
            end
            break;
        end


        % Compute BFGS direction
        p = getDirection(M, xCur, xCurGradient, sHistory, ...
                         yHistory, rhoHistory, scaleFactor, ...
                         min(k, options.memory));

        % Execute line-search
        [stepsize, xNext, newkey, lsstats] = ...
            linesearch_hint(problem, xCur, p, xCurCost, ...
                            M.inner(xCur, xCurGradient, p), ...
                            options, storedb, key);

        % Record the BFGS step-multiplier alpha which was effectively
        % selected. Toward convergence, we hope to see alpha = 1.
        alpha = stepsize / M.norm(xCur, p);
        step = M.lincomb(xCur, alpha, p);


        % Query cost and gradient at the candidate new point.
        [xNextCost, xNextGrad] = getCostGrad(problem, xNext, storedb, newkey);
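
        % The quasi-Newton pair is formed as in Euclidean BFGS, where
        % s_k = x_{k+1} - x_k and y_k = grad f(x_{k+1}) - grad f(x_k),
        % except that here both vectors must live in the same tangent
        % space: the step and the old gradient are carried to the tangent
        % space at xNext by the vector transport M.transp.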
        % Compute sk and yk
        sk = M.transp(xCur, xNext, step);
        yk = M.lincomb(xNext, 1, xNextGrad, ...
                             -1, M.transp(xCur, xNext, xCurGradient));

        % Computation of the BFGS step is invariant under scaling of sk
        % and yk by a common factor. For numerical reasons, we scale sk
        % and yk so that sk is a unit norm vector.
        norm_sk = M.norm(xNext, sk);
        sk = M.lincomb(xNext, 1/norm_sk, sk);
        yk = M.lincomb(xNext, 1/norm_sk, yk);

        inner_sk_yk = M.inner(xNext, sk, yk);
        inner_sk_sk = M.norm(xNext, sk)^2;    % ensures nonnegativity


        % If the cautious step is accepted (which is the intended
        % behavior), we record sk, yk and rhok and need to do some
        % housekeeping. If the cautious step is rejected, these are not
        % recorded. In all cases, xNext is the next iterate: the notion of
        % accept/reject here is limited to whether or not we keep track of
        % sk, yk, rhok to update the BFGS operator.
        cap = options.strict_inc_func(xCurGradNorm);
        if inner_sk_sk ~= 0 && (inner_sk_yk / inner_sk_sk) >= cap

            accepted = true;

            rhok = 1/inner_sk_yk;
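
            % The initial inverse Hessian approximation used in the next
            % call to getDirection is scaleFactor times the identity; this
            % is the usual <s_k, y_k> / <y_k, y_k> scaling of L-BFGS.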
            scaleFactor = inner_sk_yk / M.norm(xNext, yk)^2;

            % Time to store the vectors sk, yk and the scalar rhok.
            % Remember: we need to transport all vectors to the most
            % current tangent space.

            % If we are out of memory
            if k >= options.memory

                % sk and yk are saved from position 1 to the end, with the
                % most recent pair stored in the rightmost occupied cell.
                % When memory is full, shift everything one slot to the
                % left (which moves the oldest pair to the last slot) and
                % overwrite that last slot with the most recent sk, yk.
                for i = 2 : options.memory
                    sHistory{i} = M.transp(xCur, xNext, sHistory{i});
                    yHistory{i} = M.transp(xCur, xNext, yHistory{i});
                end
                if options.memory > 1
                    sHistory = sHistory([2:end, 1]);
                    yHistory = yHistory([2:end, 1]);
                    rhoHistory = rhoHistory([2:end, 1]);
                end
                if options.memory > 0
                    sHistory{options.memory} = sk;
                    yHistory{options.memory} = yk;
                    rhoHistory{options.memory} = rhok;
                end

            % If we are not out of memory
            else

                for i = 1:k
                    sHistory{i} = M.transp(xCur, xNext, sHistory{i});
                    yHistory{i} = M.transp(xCur, xNext, yHistory{i});
                end
                sHistory{k+1} = sk;
                yHistory{k+1} = yk;
                rhoHistory{k+1} = rhok;

            end

            k = k + 1;

        % The cautious step is rejected: we do not store sk, yk, rhok, but
        % we still need to transport stored vectors to the new tangent
        % space.
        else

            accepted = false;

            for i = 1 : min(k, options.memory)
                sHistory{i} = M.transp(xCur, xNext, sHistory{i});
                yHistory{i} = M.transp(xCur, xNext, yHistory{i});
            end

        end

        % Update variables to new iterate.
        storedb.removefirstifdifferent(key, newkey);
        xCur = xNext;
        key = newkey;
        xCurGradient = xNextGrad;
        xCurGradNorm = M.norm(xNext, xNextGrad);
        xCurCost = xNextCost;

        % iter is the number of iterations we have accomplished.
        iter = iter + 1;

        % Make sure we don't use too much memory for the store database
        % (this is independent of the BFGS memory.)
        storedb.purge();


        % Log statistics for freshly executed iteration
        stats = savestats();
        info(iter+1) = stats;

    end


    % Housekeeping before we return
    info = info(1:iter+1);
    x = xCur;
    cost = xCurCost;

    if options.verbosity >= 1
        fprintf('Total time is %f [s] (excludes statsfun)\n', ...
                info(end).time);
    end


    % Routine in charge of collecting the current iteration stats
    function stats = savestats()
        stats.iter = iter;
        stats.cost = xCurCost;
        stats.gradnorm = xCurGradNorm;
        if iter == 0
            stats.stepsize = NaN;
            stats.time = toc(timetic);
            stats.accepted = NaN;
        else
            stats.stepsize = stepsize;
            stats.time = info(iter).time + toc(timetic);
            stats.accepted = accepted;
        end
        stats.linesearch = lsstats;
        stats = applyStatsfun(problem, xCur, storedb, key, options, stats);
    end

end




% BFGS step, see Wen's paper for details. This function takes in a tangent
% vector g, and applies an approximate inverse Hessian P to it to get Pg.
% Then, -Pg is returned.
%
% Theory requires the vector transport to be isometric and to satisfy the
% locking condition (see the paper), but these properties do not seem to
% be crucial in practice. If your manifold provides M.isotransp, it may be
% good to do M.transp = M.isotransp; after loading M with a factory.
%
% This implementation operates in the tangent space of the most recent
% point since all vectors in sHistory and yHistory have been transported
% there.
function dir = getDirection(M, xCur, xCurGradient, sHistory, yHistory, ...
                            rhoHistory, scaleFactor, k)
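
    % This is the standard L-BFGS two-loop recursion, carried out with the
    % inner product and linear combinations of M at xCur:
    %   first loop (newest to oldest):  q <- q - rho_i * <s_i, q> * y_i
    %   initial scaling:                r <- scaleFactor * q
    %   second loop (oldest to newest): r <- r + (a_i - rho_i * <y_i, r>) * s_i
    % where a_i is the coefficient computed for pair i in the first loop.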

    q = xCurGradient;

    inner_s_q = zeros(1, k);

    for i = k : -1 : 1
        inner_s_q(1, i) = rhoHistory{i} * M.inner(xCur, sHistory{i}, q);
        q = M.lincomb(xCur, 1, q, -inner_s_q(1, i), yHistory{i});
    end

    r = M.lincomb(xCur, scaleFactor, q);

    for i = 1 : k
        omega = rhoHistory{i} * M.inner(xCur, yHistory{i}, r);
        r = M.lincomb(xCur, 1, r, inner_s_q(1, i)-omega, sHistory{i});
    end

    dir = M.lincomb(xCur, -1, r);

end