Home > manopt > manifolds > stiefel > stiefelgeneralizedfactory.m

stiefelgeneralizedfactory

PURPOSE ^

Returns a manifold structure of "scaled" orthonormal matrices.

SYNOPSIS ^

function M = stiefelgeneralizedfactory(n, p, B)

DESCRIPTION ^

 Returns a manifold structure of "scaled" orthonormal matrices.

 function M = stiefelgeneralizedfactory(n, p)
 function M = stiefelgeneralizedfactory(n, p, B)

 The generalized Stiefel manifold is the set of "scaled" orthonormal 
 nxp matrices X such that X'*B*X is identity. B must be positive definite.
 If B is identity, then this is the standard Stiefel manifold.

 The generalized Stiefel manifold is endowed with a scaled metric
 by making it a Riemannian submanifold of the Euclidean space,
 again endowed with the scaled inner product.

 Some notions (not all) are from Section 4.5 of the paper
 "The geometry of algorithms with orthogonality constraints",
 A. Edelman, T. A. Arias, S. T. Smith, SIMAX, 1998.

 Paper link: http://arxiv.org/abs/physics/9806030.

 Note: egrad2rgrad and ehess2rhess involve solving linear systems in B. If
 this is a bottleneck for a specific application, then a way forward is to
 create a modified version of this file which preprocesses B to speed this
 up (typically, by computing a Cholesky factorization of it, then calling
 an appropriate solver; or by exploiting sparsity.)

 See also: stiefelfactory  grassmannfactory  grassmanngeneralizedfactory

CROSS-REFERENCE INFORMATION ^

This function calls: This function is called by:

SUBFUNCTIONS ^

SOURCE CODE ^

0001 function M = stiefelgeneralizedfactory(n, p, B)
0002 % Returns a manifold structure of "scaled" orthonormal matrices.
0003 %
0004 % function M = stiefelgeneralizedfactory(n, p)
0005 % function M = stiefelgeneralizedfactory(n, p, B)
0006 %
0007 % The generalized Stiefel manifold is the set of "scaled" orthonormal
0008 % nxp matrices X such that X'*B*X is identity. B must be positive definite.
0009 % If B is identity, then this is the standard Stiefel manifold.
0010 %
0011 % The generalized Stiefel manifold is endowed with a scaled metric
0012 % by making it a Riemannian submanifold of the Euclidean space,
0013 % again endowed with the scaled inner product.
0014 %
0015 % Some notions (not all) are from Section 4.5 of the paper
0016 % "The geometry of algorithms with orthogonality constraints",
0017 % A. Edelman, T. A. Arias, S. T. Smith, SIMAX, 1998.
0018 %
0019 % Paper link: http://arxiv.org/abs/physics/9806030.
0020 %
0021 % Note: egrad2rgrad and ehess2rhess involve solving linear systems in B. If
0022 % this is a bottleneck for a specific application, then a way forward is to
0023 % create a modified version of this file which preprocesses B to speed this
0024 % up (typically, by computing a Cholesky factorization of it, then calling
0025 % an appropriate solver; or by exploiting sparsity.)
0026 %
0027 % See also: stiefelfactory  grassmannfactory  grassmanngeneralizedfactory
0028 
0029 % This file is part of Manopt: www.manopt.org.
0030 % Original author: Bamdev Mishra, June 30, 2015.
0031 % Contributors: Hiroyuki Sato and Kensuke Aihara, Dec. 27, 2018.
0032 %
0033 % Change log:
0034 %
0035 %    Sep.  6, 2018 (NB):
0036 %        Removed M.exp() as it was not implemented.
0037 %
0038 %    Dec. 27, 2018 (NB):
0039 %        Added retraction M.retr_qr (based on QR factorization) as an
0040 %        alternative to the previous retraction, which is now accessible as
0041 %        M.retr_polar. The new retraction should be more efficient: it is
0042 %        now the default, as M.retr = M.retr_qr. This new retraction is a
0043 %        contribution of H. Sato and K. Aihara, as per their paper:
0044 %        'Cholesky QR-based retraction on the generalized Stiefel manifold'
0045 %        https://link.springer.com/article/10.1007%2Fs10589-018-0046-7
0046 
0047     
0048     if ~exist('B', 'var') || isempty(B)
0049         B = speye(n); % Standard Stiefel manifold.
0050     end
0051     
0052     M.name = @() sprintf('Generalized Stiefel manifold St(%d, %d)', n, p);
0053     
0054     M.dim = @() (n*p - .5*p*(p+1));
0055     
0056     M.inner = @(X, eta, zeta) trace(eta'*(B*zeta)); % Scaled metric.
0057     
0058     M.norm = @(X, eta) sqrt(M.inner(X, eta, eta));
0059     
0060     M.dist = @(X, Y) error('stiefelgeneralizedfactory.dist not implemented yet.');
0061     
0062     M.typicaldist = @() sqrt(p);
0063     
0064     % Orthogonal projection of an ambient vector U to the tangent space
0065     % at X.
0066     M.proj = @projection;
0067     function Up = projection(X, U)
0068         BX = B*X;
0069         
0070         % Projection onto the tangent space
0071         Up = U - X*symm(BX'*U);  
0072     end
0073     
0074     M.tangent = M.proj;
0075     
0076     M.egrad2rgrad = @egrad2rgrad;
0077     function rgrad = egrad2rgrad(X, egrad)
0078         
0079         % First, scale egrad according the to the scaled metric in the
0080         % Euclidean space. Ideally, B should be preprocessed to ease
0081         % solving linear systems, e.g., via Cholesky factorization.
0082         egrad_scaled = B\egrad;
0083         
0084         % Second, project onto the tangent space.
0085         % rgrad = egrad_scaled - X*symm((B*X)'*egrad_scaled);
0086         %
0087         % Verify that symm(BX'*egrad_scaled) = symm(X'*egrad).
0088         
0089         rgrad = egrad_scaled - X*symm(X'*egrad);
0090     end
0091     
0092     
0093     
0094     M.ehess2rhess = @ehess2rhess;
0095     function rhess = ehess2rhess(X, egrad, ehess, H)
0096         egraddot = ehess;
0097         Xdot = H;
0098         
0099         % Directional derivative of the Riemannian gradient.
0100         egrad_scaleddot = B\egraddot;
0101         rgraddot = egrad_scaleddot - Xdot*symm(X'*egrad) ...
0102                                    - X*symm(Xdot'*egrad) ...
0103                                    - X*symm(X'*egraddot);
0104         
0105         % Project onto the tangent space.
0106         rhess = M.proj(X, rgraddot);
0107     end
0108     
0109     
0110     M.retr_polar = @retraction_polar;
0111     function Y = retraction_polar(X, U, t)
0112         if nargin < 3
0113             t = 1.0;
0114         end
0115         Y = guf(X + t*U); % Ensures Y'*B*Y is identity.
0116     end
0117 
0118     M.retr_qr = @retraction_qr;
0119     function Y = retraction_qr(X, U, t)
0120         if nargin < 3
0121             t = 1.0;
0122         end
0123         Y = gqf(X + t*U); % Ensures Y'*B*Y is identity.
0124     end
0125 
0126     % By default, we use the QR retraction
0127     M.retr = M.retr_qr;
0128 
0129     M.hash = @(X) ['z' hashmd5(X(:))];
0130     
0131     M.rand = @random;
0132     function X = random()
0133         X = guf(randn(n, p)); % Ensures X'*B*X is identity;
0134         % TODO: test if this can be replaced by gqf.
0135     end
0136     
0137     M.randvec = @randomvec;
0138     function U = randomvec(X)
0139         U = projection(X, randn(n, p));
0140         U = U / norm(U(:));
0141     end
0142     
0143     M.lincomb = @matrixlincomb;
0144     
0145     M.zerovec = @(X) zeros(n, p);
0146     
0147     % This transport is compatible with the generalized polar retraction.
0148     M.transp = @(X1, X2, d) projection(X2, d);
0149     
0150     M.vec = @(X, u_mat) u_mat(:);
0151     M.mat = @(X, u_vec) reshape(u_vec, [n, p]);
0152     M.vecmatareisometries = @() false;
0153     
0154     % Some auxiliary functions
0155     symm = @(D) (D + D')/2;
0156     
0157 
0158     function X = gqf(Y)
0159         % Generalized QR decomposition of an n-by-p matrix Y.
0160         % X'*B*X is identity. See Sato and Aihara 2018,
0161         % Cholesky QR-based retraction on the generalized Stiefel manifold,
0162         % https://link.springer.com/article/10.1007%2Fs10589-018-0046-7
0163         %
0164         % Comment by Nicolas Boumal, Dec. 27, 2018, following discussions
0165         % with Hiroyuki Sato, Kensuke Aihara and Bamdev Mishra:
0166         %
0167         % In principle, to orthonormalize the columns of Y against the
0168         % inner product defined by B, it would be better to implement (for
0169         % example) a modified Gram-Schmidt algorithm, possibly run twice.
0170         % Indeed, the latter would be affected by the condition number of
0171         % sqrtm(B)*Y. In contrast, the method below first squares the data,
0172         % hence is affected by the condition number of Y'*B*Y, which is the
0173         % square of that of sqrtm(B)*Y. Fortunately, it is easily shown
0174         % that, when Y = X + U for a tangent vector U at X, the condition
0175         % number of sqrtm(B)*Y is upper bounded by the square root of
0176         % 1 + ||U||_X^2. In Riemannian optimization, it is seldom necessary
0177         % to retract very large vectors, hence it is expected that the
0178         % condition numbers encountered in practice will be reasonable. As
0179         % a result, the implementation below which calls directly upon
0180         % low-level routines (Cholesky and a small triangular system solve)
0181         % is expected to run faster in practice than a home-made
0182         % implementation of modified Gram-Schmidt, which would involve a
0183         % loop, and still be sufficiently accurate.
0184         R = chol(symm(Y'*(B*Y)));
0185         X = Y / R;
0186     end
0187 
0188     function X = guf(Y)
0189         % Generalized polar decomposition of an n-by-p matrix Y.
0190         % X'*B*X is identity.
0191         
0192         % Method 1
0193         [u, ~, v] = svd(Y, 0);
0194   
0195         % Instead of the following three steps, an equivalent, but an
0196         % expensive way is to do X = u*(sqrtm(u'*(B*u))\(v')).
0197         [q, ssquare] = eig(u'*(B*u));
0198         qsinv = q/sparse(diag(sqrt(diag(ssquare))));
0199         X = u*((qsinv*q')*v'); % X'*B*X is identity.
0200         
0201         
0202         % Another computation using restricted_svd
0203         % [u, ~, v] = restricted_svd(Y);
0204         % X = u*v'; % X'*B*X is identity.
0205         
0206     end
0207     
0208     function [u, s, v] = restricted_svd(Y) %#ok<DEFNU>
0209         % We compute a thin svd-like decomposition of an n-by-p matrix Y
0210         % into matrices u, s, and v such that u is an n-by-p matrix
0211         % with u'*B*u being identity, s is a p-by-p diagonal matrix
0212         % with positive entries, and v is a p-by-p orthogonal matrix.
0213         % Y = u*s*v'.
0214         [v, ssquare] = eig(symm(Y'*(B*Y))); % Y*B*Y is positive definite
0215         ssquarevec = diag(ssquare);
0216         
0217         s = sparse(diag(abs(sqrt(ssquarevec))));
0218         u = Y*(v/s); % u'*B*u is identity.
0219     end
0220 
0221 end

Generated on Sun 05-Sep-2021 17:57:00 by m2html © 2005