
# arc

## PURPOSE

Adaptive regularization by cubics (ARC) minimization algorithm for Manopt

## SYNOPSIS

function [x, cost, info, options] = arc(problem, x, options)

## DESCRIPTION

```
Adaptive regularization by cubics (ARC) minimization algorithm for Manopt

function [x, cost, info, options] = arc(problem)
function [x, cost, info, options] = arc(problem, x0)
function [x, cost, info, options] = arc(problem, x0, options)
function [x, cost, info, options] = arc(problem, [], options)

Apply the ARC minimization algorithm to the problem defined in the
problem structure, starting at x0 if it is provided (otherwise, at a
random point on the manifold). To specify options whilst not specifying
an initial guess, give x0 as [] (the empty matrix).

In most of the examples bundled with the toolbox (see link below), the
solver can be replaced by the present one as is.

With the default subproblem solver (@arc_conjugate_gradient), properly
tuning the parameter options.theta appears to be important for
performance. Users may want to try different values in the range 1e-3
to 1e3 for a particular application.

The outputs x and cost are the last reached point on the manifold and its
cost. The struct-array info contains information about the iterations:
  iter (integer)
      The (outer) iteration number, i.e., number of steps considered
      so far (whether accepted or rejected). The initial guess is 0.
  cost (double)
      The corresponding cost value.
  gradnorm (double)
      The (Riemannian) norm of the gradient.
  hessiancalls (integer)
      The number of Hessian calls issued by the subproblem solver to
      compute this iterate.
  time (double)
      The total elapsed time in seconds to reach the corresponding cost.
  rho (double)
      The regularized performance ratio for the iterate.
      See options.rho_regularization.
  rhonum, rhoden (double)
      Numerator and denominator of the performance ratio, before
      regularization.
  accepted (boolean)
      Whether the proposed iterate was accepted or not.
  stepsize (double)
      The (Riemannian) norm of the vector returned by the subproblem
      solver and which is retracted to obtain the proposed next iterate.
      If accepted = true for the corresponding iterate, this is the size
      of the step from the previous to the new iterate. If accepted is
      false, the step was not executed and this is the size of the
      rejected step.
  sigma (double)
      The cubic regularization parameter at the outer iteration.
  And possibly additional information logged by options.statsfun or by
  the subproblem solver.
For example, type [info.gradnorm] to obtain a vector of the successive
gradient norms reached and [info.time] to obtain a vector with the
corresponding computation times to reach that point.
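
For instance, after a completed run, convergence can be visualized with
    semilogy([info.iter], [info.gradnorm], '.-');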

The options structure is used to overwrite the default values. All
options have a default value and are hence optional. To force an option
value, pass an options structure with a field options.optionname, where
optionname is one of the following. The default value is indicated
between parentheses. The subproblem solver may also accept options.

  tolgradnorm (1e-6)
      The algorithm terminates if the norm of the gradient drops below this.
  maxiter (1000)
      The algorithm terminates if maxiter (outer) iterations have been executed.
  maxtime (Inf)
      The algorithm terminates if maxtime seconds elapsed.
  sigma_0 (100 / trust-regions default maximum radius)
      Initial regularization parameter.
  sigma_min (1e-7)
      Minimum regularization parameter.
  eta_1 (0.1)
      If rho is below this, the step is unsuccessful (rejected).
  eta_2 (0.9)
      If rho exceeds this, the step is very successful.
  gamma_1 (0.1)
      Shrinking factor for regularization parameter if very successful.
  gamma_2 (2)
      Expansion factor for regularization parameter if unsuccessful.
  subproblemsolver (@arc_conjugate_gradient)
      Function handle to a subproblem solver. The subproblem solver will
      also see this options structure, so that parameters can be passed
      to it through here as well. Built-in solvers included:
          arc_lanczos
          arc_conjugate_gradient
          arc_gradient_descent
  rho_regularization (1e3)
      See help for the same parameter in the trustregions solver.
  statsfun (none)
      Function handle to a function that will be called after each
      iteration to provide the opportunity to log additional statistics.
      They will be returned in the info struct. See the generic Manopt
      documentation about solvers for further information. An example
      appears after this list.
  stopfun (none)
      Function handle to a function that will be called at each iteration
      to provide the opportunity to specify additional stopping criteria.
      See the generic Manopt documentation about solvers for further
      information.
  verbosity (3)
      Integer number used to tune the amount of output the algorithm
      generates during execution (mostly as text in the command window).
      The higher, the more output. 0 means silent.
  storedepth (2)
      Maximum number of different points x of the manifold for which a
      store structure will be kept in memory in the storedb. If the
      caching features of Manopt are not used, this is irrelevant. As of
      version 5.0, this is not particularly important.
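
For example, additional statistics and a custom stopping criterion can
be specified as follows (an illustrative sketch; recording the current
point and the 10-second budget are arbitrary choices, not defaults):
    % Record the current point in the stats (may use substantial memory).
    options.statsfun = @(problem, x, stats) setfield(stats, 'current_x', x);
    % Additional stopping criterion: stop after 10 seconds of work.
    options.stopfun = @(problem, x, info, last) info(last).time > 10;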

Please cite the Manopt paper as well as the research paper:
@article{agarwal2018arcfirst,
  author  = {Agarwal, N. and Boumal, N. and Bullins, B. and Cartis, C.},
  title   = {Adaptive regularization with cubics on manifolds},
  journal = {arXiv preprint arXiv:1806.00065},
  year    = {2018}
}
```
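
As a minimal usage sketch (assuming Manopt is installed and on the path;
the random symmetric matrix A, the sphere manifold and the option values
are illustrative choices, not defaults), ARC can compute a dominant
eigenvector:

```
% Minimize f(x) = -x'*A*x over the unit sphere; minimizers are
% dominant eigenvectors of the symmetric matrix A.
n = 100;
A = randn(n); A = (A + A')/2;

problem.M = spherefactory(n);
problem.cost  = @(x) -x'*(A*x);
problem.egrad = @(x) -2*A*x;           % Euclidean gradient
problem.ehess = @(x, xdot) -2*A*xdot;  % Euclidean Hessian along xdot

% Options are optional; pass [] as x0 to start at a random point.
options.tolgradnorm = 1e-8;
options.theta = 0.5;  % subproblem tuning parameter: try 1e-3 to 1e3
[x, xcost, info] = arc(problem, [], options);

% Inspect the run: successive gradient norms per outer iteration.
semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Outer iteration'); ylabel('Gradient norm');
```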

## CROSS-REFERENCE INFORMATION

This function calls:
• StoreDB
• applyStatsfun Apply the statsfun function to a stats structure (for solvers).
• canGetApproxGradient Checks whether an approximate gradient can be computed for this problem.
• canGetApproxHessian Checks whether an approximate Hessian can be computed for this problem.
• canGetCost Checks whether the cost function can be computed for a problem structure.
• canGetGradient Checks whether the gradient can be computed for a problem structure.
• canGetHessian Checks whether the Hessian can be computed for a problem structure.
• getCostGrad Computes the cost function and the gradient at x in one call if possible.
• getGlobalDefaults Returns a structure with default option values for Manopt.
• mergeOptions Merges two options structures with one having precedence over the other.
• stoppingcriterion Checks for standard stopping criteria, as a helper to solvers.
• arc_conjugate_gradient Subproblem solver for ARC based on a nonlinear conjugate gradient method.
• approxgradientFD Gradient approx. fnctn handle based on finite differences of the cost.
• approxhessianFD Hessian approx. fnctn handle based on finite differences of the gradient.
This function is called by:

## SOURCE CODE

```
function [x, cost, info, options] = arc(problem, x, options)
% Adaptive regularization by cubics (ARC) minimization algorithm for Manopt
%
% function [x, cost, info, options] = arc(problem)
% function [x, cost, info, options] = arc(problem, x0)
% function [x, cost, info, options] = arc(problem, x0, options)
% function [x, cost, info, options] = arc(problem, [], options)
%
% Apply the ARC minimization algorithm to the problem defined in the
% problem structure, starting at x0 if it is provided (otherwise, at a
% random point on the manifold). To specify options whilst not specifying
% an initial guess, give x0 as [] (the empty matrix).
%
% In most of the examples bundled with the toolbox (see link below), the
% solver can be replaced by the present one as is.
%
% With the default subproblem solver (@arc_conjugate_gradient), properly
% tuning the parameter options.theta appears to be important for
% performance. Users may want to try different values in the range 1e-3
% to 1e3 for a particular application.
%
% The outputs x and cost are the last reached point on the manifold and its
% cost. The struct-array info contains information about the iterations:
%   iter (integer)
%       The (outer) iteration number, i.e., number of steps considered
%       so far (whether accepted or rejected). The initial guess is 0.
%   cost (double)
%       The corresponding cost value.
%   gradnorm (double)
%       The (Riemannian) norm of the gradient.
%   hessiancalls (integer)
%       The number of Hessian calls issued by the subproblem solver to
%       compute this iterate.
%   time (double)
%       The total elapsed time in seconds to reach the corresponding cost.
%   rho (double)
%       The regularized performance ratio for the iterate.
%       See options.rho_regularization.
%   rhonum, rhoden (double)
%       Numerator and denominator of the performance ratio, before
%       regularization.
%   accepted (boolean)
%       Whether the proposed iterate was accepted or not.
%   stepsize (double)
%       The (Riemannian) norm of the vector returned by the subproblem
%       solver and which is retracted to obtain the proposed next iterate.
%       If accepted = true for the corresponding iterate, this is the size
%       of the step from the previous to the new iterate. If accepted is
%       false, the step was not executed and this is the size of the
%       rejected step.
%   sigma (double)
%       The cubic regularization parameter at the outer iteration.
%   And possibly additional information logged by options.statsfun or by
%   the subproblem solver.
% For example, type [info.gradnorm] to obtain a vector of the successive
% gradient norms reached and [info.time] to obtain a vector with the
% corresponding computation times to reach that point.
%
% The options structure is used to overwrite the default values. All
% options have a default value and are hence optional. To force an option
% value, pass an options structure with a field options.optionname, where
% optionname is one of the following. The default value is indicated
% between parentheses. The subproblem solver may also accept options.
%
%   tolgradnorm (1e-6)
%       The algorithm terminates if the norm of the gradient drops below this.
%   maxiter (1000)
%       The algorithm terminates if maxiter (outer) iterations have been executed.
%   maxtime (Inf)
%       The algorithm terminates if maxtime seconds elapsed.
%   sigma_0 (100 / trust-regions default maximum radius)
%       Initial regularization parameter.
%   sigma_min (1e-7)
%       Minimum regularization parameter.
%   eta_1 (0.1)
%       If rho is below this, the step is unsuccessful (rejected).
%   eta_2 (0.9)
%       If rho exceeds this, the step is very successful.
%   gamma_1 (0.1)
%       Shrinking factor for regularization parameter if very successful.
%   gamma_2 (2)
%       Expansion factor for regularization parameter if unsuccessful.
%   subproblemsolver (@arc_conjugate_gradient)
%       Function handle to a subproblem solver. The subproblem solver will
%       also see this options structure, so that parameters can be passed
%       to it through here as well. Built-in solvers included:
%           arc_lanczos
%           arc_conjugate_gradient
%           arc_gradient_descent
%   rho_regularization (1e3)
%       See help for the same parameter in the trustregions solver.
%   statsfun (none)
%       Function handle to a function that will be called after each
%       iteration to provide the opportunity to log additional statistics.
%       They will be returned in the info struct. See the generic Manopt
%       documentation about solvers for further information.
%   stopfun (none)
%       Function handle to a function that will be called at each iteration
%       to provide the opportunity to specify additional stopping criteria.
%       See the generic Manopt documentation about solvers for further
%       information.
%   verbosity (3)
%       Integer number used to tune the amount of output the algorithm
%       generates during execution (mostly as text in the command window).
%       The higher, the more output. 0 means silent.
%   storedepth (2)
%       Maximum number of different points x of the manifold for which a
%       store structure will be kept in memory in the storedb. If the
%       caching features of Manopt are not used, this is irrelevant. As of
%       version 5.0, this is not particularly important.
%
%
% Please cite the Manopt paper as well as the research paper:
% @article{agarwal2018arcfirst,
%   author  = {Agarwal, N. and Boumal, N. and Bullins, B. and Cartis, C.},
%   title   = {Adaptive regularization with cubics on manifolds},
%   journal = {arXiv preprint arXiv:1806.00065},
%   year    = {2018}
% }
%
%

% This file is part of Manopt: www.manopt.org.
% Original authors: May 1, 2018,
%    Naman Agarwal, Brian Bullins, Nicolas Boumal and Coralia Cartis.
% Contributors:
% Change log:
%
%   Aug 14, 2019 (NB):
%       Default subproblem solver for ARC is now arc_conjugate_gradient
%       instead of arc_lanczos. Default gamma_2 changed to 2 from 5.

    M = problem.M;

    % Verify that the problem description is sufficient for the solver.
    if ~canGetCost(problem)
        warning('manopt:getCost', ...
                'No cost provided. The algorithm will likely abort.');
    end
    if ~canGetGradient(problem) && ~canGetApproxGradient(problem)
        % Note: we do not give a warning if an approximate gradient is
        % explicitly given in the problem description, as in that case the
        % user seems to be aware of the issue.
        warning('manopt:getGradient:approx', ['No gradient provided. ' ...
                'Using an FD approximation instead (slow).\n' ...
                'It may be necessary to increase options.tolgradnorm.\n'...
                'To disable this warning: ' ...
                'warning(''off'', ''manopt:getGradient:approx'')']);
        problem.approxgrad = approxgradientFD(problem);
    end
    if ~canGetHessian(problem) && ~canGetApproxHessian(problem)
        % Note: we do not give a warning if an approximate Hessian is
        % explicitly given in the problem description, as in that case the
        % user seems to be aware of the issue.
        warning('manopt:getHessian:approx', ['No Hessian provided. ' ...
                'Using an FD approximation instead.\n' ...
                'To disable this warning: ' ...
                'warning(''off'', ''manopt:getHessian:approx'')']);
        problem.approxhess = approxhessianFD(problem);
    end

    % Set local defaults here
    localdefaults.tolgradnorm = 1e-6;
    localdefaults.maxiter = 1000;
    localdefaults.maxtime = inf;
    localdefaults.sigma_min = 1e-7;
    localdefaults.eta_1 = 0.1;
    localdefaults.eta_2 = 0.9;
    localdefaults.gamma_1 = 0.1;
    localdefaults.gamma_2 = 2;
    localdefaults.storedepth = 2;
    localdefaults.subproblemsolver = @arc_conjugate_gradient;
    localdefaults.rho_regularization = 1e3;

    % Merge global and local defaults, then merge w/ user options, if any.
    localdefaults = mergeOptions(getGlobalDefaults(), localdefaults);
    if ~exist('options', 'var') || isempty(options)
        options = struct();
    end
    options = mergeOptions(localdefaults, options);

    % Default initial sigma_0 is based on the initial Delta_bar of the
    % trustregions solver.
    if ~isfield(options, 'sigma_0')
        if isfield(M, 'typicaldist')
            options.sigma_0 = 100/M.typicaldist();
        else
            options.sigma_0 = 100/sqrt(M.dim());
        end
    end

    timetic = tic();

    % If no initial point x is given by the user, generate one at random.
    if ~exist('x', 'var') || isempty(x)
        x = M.rand();
    end

    % Create a store database and get a key for the current x.
    storedb = StoreDB(options.storedepth);
    key = storedb.getNewKey();

    % Compute objective-related quantities for x.
    [cost, grad] = getCostGrad(problem, x, storedb, key);
    gradnorm = M.norm(x, grad);

    % Initialize regularization parameter.
    sigma = options.sigma_0;

    % Iteration counter.
    % At any point, iter is the number of fully executed iterations so far.
    iter = 0;

    % Save stats in a struct array info, and preallocate.
    stats = savestats(problem, x, storedb, key, options);
    info(1) = stats;
    info(min(10000, options.maxiter+1)).iter = [];

    if options.verbosity >= 2
        fprintf(' iter\t\t\t\t\tcost val\t\t grad norm       sigma   #Hess\n');
    end

    % Iterate until stopping criterion triggers.
    while true

        % Display iteration information.
        if options.verbosity >= 2
            fprintf('%5d\t %+.16e\t %.8e   %.2e', ...
                    iter, cost, gradnorm, sigma);
        end

        % Start timing this iteration.
        timetic = tic();

        % Run standard stopping criterion checks.
        [stop, reason] = stoppingcriterion(problem, x, options, ...
                                                             info, iter+1);

        if stop
            if options.verbosity >= 1
                fprintf(['\n' reason '\n']);
            end
            break;
        end

        % Solve the ARC subproblem.
        % Outputs: eta is the tentative step (it is a tangent vector at x);
        % Heta is the result of applying the Hessian at x along eta (this
        % is often a natural by-product of the subproblem solver);
        % hesscalls is the number of Hessian calls issued in the solver;
        % stop_str is a string describing why the solver terminated; and
        % substats is some statistics about the solver's work to be logged.
        [eta, Heta, hesscalls, stop_str, substats] = ...
             options.subproblemsolver(problem, x, grad, gradnorm, ...
                                             sigma, options, storedb, key);

        etanorm = M.norm(x, eta);

        % Get a tentative next x by retracting the proposed step.
        newx = M.retr(x, eta);
        newkey = storedb.getNewKey();

        % Compute the new cost-related quantities for proposal x.
        % We could just compute the cost here, as the gradient is only
        % necessary if the step is accepted; but we expect most steps are
        % accepted, and sometimes the gradient can be computed faster if it
        % is computed in conjunction with the cost.
        [newcost, newgrad] = getCostGrad(problem, newx, storedb, newkey);

        % Compute a regularized ratio between actual and model improvement.
        rho_num = cost - newcost;
        vec_rho = M.lincomb(x, 1, grad, .5, Heta);
        rho_den = -M.inner(x, eta, vec_rho);
        rho_reg = options.rho_regularization*eps*max(1, abs(cost));
        rho = (rho_num+rho_reg) / (rho_den+rho_reg);
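        % Note: rho_num is the actual cost decrease, while rho_den is the
        % decrease predicted by the quadratic part of the model, that is,
        % minus the inner product of eta with grad + .5*Hess(eta). Near
        % convergence both are tiny differences of nearly equal numbers,
        % so the small additive term rho_reg (a modest multiple of eps)
        % guards the ratio against floating point cancellation; see
        % rho_regularization in the trustregions help for the same
        % heuristic.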

        % In principle, the subproblem solver should guarantee rho_den > 0.
        % In practice, it may fail, in which case we reject the step.
        subproblem_failure = (rho_den+rho_reg <= 0);
        if subproblem_failure
            stop_str = sprintf( ...
                 'SUBPROBLEM FAILURE! (Though it returned: %s)', stop_str);
        end

        % Determine if the tentative step should be accepted or not.
        if rho >= options.eta_1 && ~subproblem_failure
            accept = true;
            arc_str = 'acc ';
            % We accepted this step: erase cache of the previous point.
            storedb.removefirstifdifferent(key, newkey);
            x = newx;
            key = newkey;
            cost = newcost;
            grad = newgrad;
            gradnorm = M.norm(x, grad);
        else
            accept = false;
            arc_str = 'REJ ';
            % We rejected this step: erase cache of the tentative point.
            storedb.removefirstifdifferent(newkey, key);
        end

        % Update the regularization parameter.
        if rho >= options.eta_2 && ~subproblem_failure
            % Very successful step
            arc_str(4) = '-';
            if options.gamma_1 > 0
                sigma = max(options.sigma_min, options.gamma_1 * sigma);
            else
                sigma = max(options.sigma_min, min(sigma, gradnorm)); % TODO document this
            end
        elseif rho >= options.eta_1 && ~subproblem_failure
            % Successful step
            arc_str(4) = ' ';
        else
            % Unsuccessful step
            arc_str(4) = '+';
            sigma = options.gamma_2 * sigma;
        end

        % iter is the number of iterations we have completed.
        iter = iter + 1;

        % Make sure we don't use too much memory for the store database.
        storedb.purge();

        % Log statistics for freshly executed iteration.
        stats = savestats(problem, x, storedb, key, options);
        info(iter+1) = stats;

        if options.verbosity >= 2
            fprintf('   %5d  %s\n', hesscalls, [arc_str ' ' stop_str]);
        end

        % When the subproblem solver fails, it would be nice to have an
        % alternative, such as a slower but more robust solver. For now, we
        % force the solver to terminate when that happens.
        if subproblem_failure
            if options.verbosity >= 1
                fprintf(['\nThe subproblem solver failed to make ' ...
                         'progress even on the model; this is ' ...
                         'likely due to numerical errors.\n']);
            end
            break;
        end

    end

    % Truncate the struct-array to the part we populated
    info = info(1:iter+1);

    if options.verbosity >= 1
        fprintf('Total time is %f [s] (excludes statsfun)\n', info(end).time);
    end

    % Routine in charge of collecting the current iteration statistics
    function stats = savestats(problem, x, storedb, key, options)

        stats.iter = iter;
        stats.cost = cost;
        stats.gradnorm = gradnorm;
        stats.sigma = sigma;

        if iter == 0
            stats.hessiancalls = 0;
            stats.stepsize = NaN;
            stats.time = toc(timetic);
            stats.rho = inf;
            stats.rhonum = NaN;
            stats.rhoden = NaN;
            stats.accepted = true;
            stats.subproblem = struct();
        else
            stats.hessiancalls = hesscalls;
            stats.stepsize = etanorm;
            stats.time = info(iter).time + toc(timetic);
            stats.rho = rho;
            stats.rhonum = rho_num;
            stats.rhoden = rho_den;
            stats.accepted = accept;
            stats.subproblem = substats;
        end

        % Similar to statsfun with trustregions: the x and store passed to
        % statsfun are that of the most recently accepted point after the
        % iteration fully executed.
        stats = applyStatsfun(problem, x, storedb, key, options, stats);

    end

end
```
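
To try a different subproblem solver, pass a function handle through the
options structure. A short sketch (reusing the problem structure from the
earlier example; arc_lanczos ships with the toolbox, as listed above):

```
% Run ARC with the Lanczos-based subproblem solver instead of the
% default nonlinear conjugate gradient one.
options.subproblemsolver = @arc_lanczos;
[x, xcost, info] = arc(problem, [], options);
```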
