function [U, S, V, info] = truncated_svd(A, p)
% Returns an SVD decomposition of A truncated to rank p.
%
% function [U, S, V, info] = truncated_svd(A, p)
%
% Input: A real matrix A of size mxn and an integer p <= min(m, n).
% Output: An orthonormal matrix U of size mxp, an orthonormal matrix V of
% size nxp and a diagonal matrix S of size pxp with nonnegative and
% decreasing diagonal entries such that USV.' is the best rank-p
% approximation of A according to the Frobenius norm. All real.
% This function produces an output akin to svds.
%
% The decomposition is obtained by maximizing
%   f(U, V) = .5*norm(U'*A*V, 'fro')^2
% where U, V are orthonormal. Notice that f(U*Q, V*R) = f(U, V) for all
% orthogonal pxp matrices Q, R. Hence, only the column spaces of U and V
% matter and we may perform the optimization over a product of two
% Grassmann manifolds.
%
% It is easy to show that maximizing f is equivalent to minimizing g with
%   g(U, V) = min_S norm(U*S*V' - A, 'fro')^2,
% which confirms that we are going for a best low-rank approximation of A.
%
% The inner workings of the Grassmann manifold use the built-in svd
% function of Matlab, but only for matrices of size mxp and nxp, to
% re-orthonormalize them.
%
% Notice that we are actually chasing a best fixed-rank approximation of a
% matrix, which is best obtained by working directly over a manifold of
% fixed-rank matrices. This is simply an example script to demonstrate some
% functionalities of the toolbox.
%
% The code can be modified to accept a function handle for A(x) = A*x
% instead of a matrix A, which is often useful. This would further require
% a function handle At for the transpose of A, such that At(x) = A.'*x.

% This file is part of Manopt and is copyrighted. See the license file.
%
% Main author: Nicolas Boumal, July 5, 2013
% Contributors:
%
% Change log:
%
%   Aug. 23, 2021 (XJ):
%       Added AD to compute the egrad and the ehess

    % Generate some random data to test the function if none is given.
    if ~exist('A', 'var') || isempty(A)
        A = randn(42, 60);
    end
    if ~exist('p', 'var') || isempty(p)
        p = 5;
    end

    % Retrieve the size of the problem and make sure the requested
    % approximation rank is at most the maximum possible rank.
    [m, n] = size(A);
    assert(p <= min(m, n), ...
           'p must be at most the smallest dimension of A.');

    % Define the cost and its derivatives on the Grassmann manifold.
    tuple.U = grassmannfactory(m, p);
    tuple.V = grassmannfactory(n, p);
    % All of the code will work just as well if we ignore the invariance
    % property of the cost function indicated above and thus place U and V
    % on the Stiefel manifold (orthonormal matrices) instead of the
    % Grassmann manifold. Working on Stiefel is expected to be slower
    % though, partly because the search space is higher dimensional and
    % partly because the optimizers are not isolated.
    % tuple.U = stiefelfactory(m, p);
    % tuple.V = stiefelfactory(n, p);
    M = productmanifold(tuple);
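    % As mentioned in the help text, A could be supplied as function
    % handles rather than as an explicit matrix. Below is an untested
    % sketch (not part of the original example) of how the cost and
    % gradient might then be phrased, assuming hypothetical handles
    % Afun and Atfun with Afun(X) = A*X and Atfun(X) = A.'*X; the names
    % are illustrative only:
    %
    % function f = cost_handle(X)
    %     f = -.5*norm(X.U'*Afun(X.V), 'fro')^2;
    % end
    % function g = egrad_handle(X)
    %     AV  = Afun(X.V);
    %     AtU = Atfun(X.U);
    %     g.U = -AV*(AV'*X.U);
    %     g.V = -AtU*(AtU'*X.V);
    % end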
    % Define a problem structure, specifying the manifold M, the cost
    % function and its derivatives. Here, to demonstrate the rapid
    % prototyping capabilities of Manopt, we directly define the Euclidean
    % gradient and the Euclidean Hessian egrad and ehess instead of the
    % Riemannian gradient and Hessian grad and hess. Manopt will take care
    % of the conversion. This automatic conversion is usually not
    % computationally optimal though, because much of the computations
    % involved in obtaining the gradient could be reused to obtain the
    % Hessian. After the prototyping stage, when efficiency becomes
    % important, it makes sense to define grad and hess rather than egrad
    % and ehess, and to use the caching system (the store structure).
    problem.M = M;
    problem.cost = @cost;
    problem.egrad = @egrad;
    problem.ehess = @ehess;

    % The functions below make many redundant computations. This
    % performance hit can be alleviated by using the caching system.

    % Cost function
    function f = cost(X)
        U = X.U;
        V = X.V;
        f = -.5*norm(U'*A*V, 'fro')^2;
    end
    % Euclidean gradient of the cost function
    function g = egrad(X)
        U = X.U;
        V = X.V;
        AV = A*V;
        AtU = A'*U;
        g.U = -AV*(AV'*U);
        g.V = -AtU*(AtU'*V);
    end
    % Euclidean Hessian of the cost function
    function h = ehess(X, H)
        U = X.U;
        V = X.V;
        Udot = H.U;
        Vdot = H.V;
        AV = A*V;
        AtU = A'*U;
        AVdot = A*Vdot;
        AtUdot = A'*Udot;
        h.U = -(AVdot*AV'*U + AV*AVdot'*U + AV*AV'*Udot);
        h.V = -(AtUdot*AtU'*V + AtU*AtUdot'*V + AtU*AtU'*Vdot);
    end

    % An alternative way to compute the egrad and the ehess is to use
    % automatic differentiation provided in the Deep Learning Toolbox
    % (slower). Notice that the function norm is not supported for AD so
    % far. Replace norm(..., 'fro') with the backup function cnormsqfro
    % described in manoptADhelp.
    % problem.cost = @cost_AD;
    % function f = cost_AD(X)
    %     U = X.U;
    %     V = X.V;
    %     f = -.5*cnormsqfro(U'*A*V);
    % end
    % Call manoptAD to prepare AD for the problem structure.
    % problem = manoptAD(problem);

    % Execute some checks on the derivatives for early debugging.
    % These things can be commented out of course.
    % checkgradient(problem);
    % pause;
    % checkhessian(problem);
    % pause;

    % Issue a call to a solver. A random initial guess will be chosen and
    % default options are selected. Here, we specify a maximum trust
    % region radius (which in turn induces an initial trust region radius).
    % Note that this is not required: default values are used if we omit
    % this. The diameter of the manifold scales like sqrt(2*p), hence the
    % form of our (empirical) choice.
    options.Delta_bar = 4*sqrt(2*p);
    [X, Xcost, info] = trustregions(problem, [], options); %#ok<ASGLU>
    U = X.U;
    V = X.V;

    % Finish the job by rotating U and V such that the middle matrix S can
    % be diagonal with nonnegative, decreasing entries. This requires a
    % small svd of size pxp.
    Spp = U'*A*V;
    [Upp, Spp, Vpp] = svd(Spp);
    U = U*Upp;
    S = Spp;
    V = V*Vpp;
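    % For a quick sanity check (not part of the original example), one can
    % compare against Matlab's svds, which computes the same truncated SVD
    % directly. The two rank-p approximations should essentially coincide
    % if the solver reached the global optimizer:
    % [U2, S2, V2] = svds(A, p);
    % norm(U*S*V' - U2*S2*V2', 'fro')   % expected to be close to zero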
    % For our information, Manopt can also compute the spectrum of the
    % Riemannian Hessian on the tangent space at (any) X. Computing the
    % spectrum at the solution gives us some idea of the conditioning of
    % the problem. If we were to implement a preconditioner for the
    % Hessian, this would also inform us on its performance.
    %
    % Notice that if the optimization is performed on a product of Stiefel
    % manifolds instead of a product of Grassmannians, the double
    % invariance under the orthogonal group O(p) will appear as twice
    % p*(p-1)/2, thus p*(p-1) zero eigenvalues in the spectrum of the
    % Hessian. This means that the minimizers are not isolated, which
    % typically hinders convergence of second order algorithms.
    if M.dim() < 512
        evs = hessianspectrum(problem, X);
        stairs(sort(evs));
        title(['Eigenvalues of the Hessian of the cost function ' ...
               'at the solution']);
    end

end
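
% Example usage from the Matlab prompt (illustrative; the sizes below are
% made up for demonstration and the call can also be issued with no
% arguments, in which case random data is generated):
%
%   A = randn(100, 80);
%   [U, S, V, info] = truncated_svd(A, 12);
%   semilogy([info.gradnorm], '.-');   % gradient norm at each iteration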