Home > examples > sparse_pca.m

# sparse_pca

## PURPOSE Sparse principal component analysis based on optimization over Stiefel.

## SYNOPSIS function [Z, P, X, A] = sparse_pca(A, m, gamma)

## DESCRIPTION ``` Sparse principal component analysis based on optimization over Stiefel.

[Z, P, X] = sparse_pca(A, m, gamma)

We consider sparse PCA applied to a data matrix A of size pxn, where p is
the number of samples (observations) and n is the number of variables
(features). We attempt to extract m different components. The parameter
gamma, which must lie between 0 and the largest 2-norm of a column of
A, tunes the balance between best explanation of the variance of the data
(gamma = 0, mostly corresponds to standard PCA) and best sparsity of the
principal components Z (gamma maximal, Z is zero). The variables
contained in the columns of A are assumed centered (zero-mean).

The output Z of size nxm represents the principal components. There are m
columns, each one of unit norm and capturing a prefered direction of the
data, while trying to be sparse. P has the same size as Z and represents
the sparsity pattern of Z. X is an orthonormal matrix of size pxm
produced internally by the algorithm.

With classical PCA, the variability captured by m components is
sum(svds(A, m))
With the outputted Z, which should be sparser than normal PCA, it is
sum(svd(A*Z))

The method is based on the maximization of a differentiable function over
the Stiefel manifold of dimension pxm. Notice that this dimension is
independent of n, making this method particularly suitable for problems
with many variables but few samples (n much larger than p). The
complexity of each iteration of the algorithm is linear in n as a result.

The theory behind this code is available in the paper
http://jmlr.org/papers/volume11/journee10a/journee10a.pdf
Generalized Power Method for Sparse Principal Component Analysis, by
Journee, Nesterov, Richtarik and Sepulchre, JMLR, 2010.
This implementation is not equivalent to the one described in that paper
(and is independent from their authors) but is close in spirit
nonetheless. It is provided with Manopt as an example file but was not
optimized: please do not judge the quality of the algorithm described by
the authors of the paper based on this implementation.```

## CROSS-REFERENCE INFORMATION This function calls:
• stiefelfactory Returns a manifold structure to optimize over orthonormal matrices.
• norm NORM Norm of a TT/MPS tensor.
• norm NORM Norm of a TT/MPS block-mu tensor.
• trustregions Riemannian trust-regions solver for optimization on manifolds.
This function is called by:

## SOURCE CODE ```0001 function [Z, P, X, A] = sparse_pca(A, m, gamma)
0002 % Sparse principal component analysis based on optimization over Stiefel.
0003 %
0004 % [Z, P, X] = sparse_pca(A, m, gamma)
0005 %
0006 % We consider sparse PCA applied to a data matrix A of size pxn, where p is
0007 % the number of samples (observations) and n is the number of variables
0008 % (features). We attempt to extract m different components. The parameter
0009 % gamma, which must lie between 0 and the largest 2-norm of a column of
0010 % A, tunes the balance between best explanation of the variance of the data
0011 % (gamma = 0, mostly corresponds to standard PCA) and best sparsity of the
0012 % principal components Z (gamma maximal, Z is zero). The variables
0013 % contained in the columns of A are assumed centered (zero-mean).
0014 %
0015 % The output Z of size nxm represents the principal components. There are m
0016 % columns, each one of unit norm and capturing a prefered direction of the
0017 % data, while trying to be sparse. P has the same size as Z and represents
0018 % the sparsity pattern of Z. X is an orthonormal matrix of size pxm
0019 % produced internally by the algorithm.
0020 %
0021 % With classical PCA, the variability captured by m components is
0022 % sum(svds(A, m))
0023 % With the outputted Z, which should be sparser than normal PCA, it is
0024 % sum(svd(A*Z))
0025 %
0026 % The method is based on the maximization of a differentiable function over
0027 % the Stiefel manifold of dimension pxm. Notice that this dimension is
0028 % independent of n, making this method particularly suitable for problems
0029 % with many variables but few samples (n much larger than p). The
0030 % complexity of each iteration of the algorithm is linear in n as a result.
0031 %
0032 % The theory behind this code is available in the paper
0033 % http://jmlr.org/papers/volume11/journee10a/journee10a.pdf
0034 % Generalized Power Method for Sparse Principal Component Analysis, by
0035 % Journee, Nesterov, Richtarik and Sepulchre, JMLR, 2010.
0036 % This implementation is not equivalent to the one described in that paper
0037 % (and is independent from their authors) but is close in spirit
0038 % nonetheless. It is provided with Manopt as an example file but was not
0039 % optimized: please do not judge the quality of the algorithm described by
0040 % the authors of the paper based on this implementation.
0041
0042 % This file is part of Manopt and is copyrighted. See the license file.
0043 %
0044 % Main author: Nicolas Boumal, Dec. 24, 2013
0045 % Contributors:
0046 %
0047 % Change log:
0048 %
0049 %   Aug. 23, 2021 (XJ):
0051
0052     % If no input is provided, generate random data for a quick demo
0053     if nargin == 0
0054         n = 100;
0055         p = 10;
0056         m = 2;
0057
0058         % Data matrix
0059         A = randn(p, n);
0060
0061         % Regularization parameter. This should be between 0 and the largest
0062         % 2-norm of a column of A.
0063         gamma = 1;
0064
0065     elseif nargin ~= 3
0066         error('Please provide 3 inputs (or none for a demo).');
0067     end
0068
0069     % Execute the main algorithm: it will compute a sparsity pattern P.
0070     [P, X] = sparse_pca_stiefel_l1(A, m, gamma);
0071
0072     % Compute the principal components in accordance with the sparsity.
0073     Z = postprocess(A, P, X);
0074
0075 end
0076
0077
0078 % Sparse PCA based on the block sparse PCA algorithm with l1-penalty as
0079 % featured in the reference paper by Journee et al. This is not the same
0080 % algorithm but it is the same cost function optimized over the same search
0081 % space. We force N = eye(m).
0082 function [P, X] = sparse_pca_stiefel_l1(A, m, gamma)
0083
0084     [p, n] = size(A); %#ok<NASGU>
0085
0086     % The optimization takes place over a Stiefel manifold whose dimension
0087     % is independent of n. This is especially useful when there are many
0088     % more variables than samples.
0089     St = stiefelfactory(p, m);
0090     problem.M = St;
0091
0092     % In this helper function, given a point 'X' on the manifold we check
0093     % whether the caching structure 'store' has been populated with
0094     % quantities that are useful to compute at X or not. If they were not,
0095     % then we compute and store them now.
0096     function store = prepare(X, store)
0098             store.AtX = A'*X;
0099             store.absAtX = abs(store.AtX);
0100             store.pos = max(0, store.absAtX - gamma);
0102         end
0103     end
0104
0105     % Define the cost function here and set it in the problem structure.
0106     problem.cost = @cost;
0107     function [f, store] = cost(X, store)
0108         store = prepare(X, store);
0109         pos = store.pos;
0110         f = -.5*norm(pos, 'fro')^2;
0111     end
0112
0114     % grad) : Manopt will take care of converting it to the Riemannian
0117     function [G, store] = egrad(X, store)
0118         if ~isfield(store, 'G')
0119             store = prepare(X, store);
0120             pos = store.pos;
0121             AtX = store.AtX;
0122             sgAtX = sign(AtX);
0123             factor = pos.*sgAtX;
0124             store.G = -A*factor;
0125         end
0126         G = store.G;
0127     end
0128
0129     % An alternative way to compute the egrad is to use automatic
0130     % differentiation provided in the deep learning toolbox (slower).
0131     % Notice that the function norm is not supported for AD so far.
0132     % Replace norm(...,'fro')^2 with cnormsqfro described in the file
0133     % manoptADhelp.m. Notice that the abs function is not differentiable
0136     % function f = cost_AD(X)
0137     %    AtX = A'*X;
0138     %    absAtX = abs(AtX);
0139     %    pos = max(0, absAtX - gamma);
0140     %    f = -.5*cnormsqfro(pos);
0141     % end
0143
0145     % pause;
0146
0147     % The optimization happens here. To improve the method, it may be
0148     % interesting to investigate better-than-random initial iterates and,
0149     % possibly, to fine tune the parameters of the solver.
0150     X = trustregions(problem);
0151
0152     % Compute the sparsity pattern by thresholding
0153     P = abs(A'*X) > gamma;
0154
0155 end
0156
0157
0158 % This post-processing algorithm produces a matrix Z of size nxm matching
0159 % the sparsity pattern P and representing sparse principal components for
0160 % A. This is to be called with the output of the main algorithm. This
0161 % algorithm is described in the reference paper by Journee et al.
0162 function Z = postprocess(A, P, X)
0163     fprintf('Post-processing... ');
0164     counter = 0;
0165     maxiter = 1000;
0166     tolerance = 1e-8;
0167     while counter < maxiter
0168         Z = A'*X;
0169         Z(~P) = 0;
0170         Z = Z*diag(1./sqrt(diag(Z'*Z)));
0171         X = ufactor(A*Z);
0172         counter = counter + 1;
0173         if counter > 1 && norm(Z0-Z, 'fro') < tolerance*norm(Z0, 'fro')
0174             break;
0175         end
0176         Z0 = Z;
0177     end
0178     fprintf('done, in %d iterations (max = %d).\n', counter, maxiter);
0179 end
0180
0181 % Returns the U-factor of the polar decomposition of X
0182 function U = ufactor(X)
0183     [W, S, V] = svd(X, 0); %#ok<ASGLU>
0184     U = W*V';
0185 end```

Generated on Fri 30-Sep-2022 13:18:25 by m2html © 2005