Home > examples > sparse_pca.m

# sparse_pca

## PURPOSE Sparse principal component analysis based on optimization over Stiefel.

## SYNOPSIS function [Z, P, X, A] = sparse_pca(A, m, gamma)

## DESCRIPTION ``` Sparse principal component analysis based on optimization over Stiefel.

[Z, P, X] = sparse_pca(A, m, gamma)

We consider sparse PCA applied to a data matrix A of size pxn, where p is
the number of samples (observations) and n is the number of variables
(features). We attempt to extract m different components. The parameter
gamma, which must lie between 0 and the largest 2-norm of a column of
A, tunes the balance between best explanation of the variance of the data
(gamma = 0, mostly corresponds to standard PCA) and best sparsity of the
principal components Z (gamma maximal, Z is zero). The variables
contained in the columns of A are assumed centered (zero-mean).

The output Z of size nxm represents the principal components. There are m
columns, each one of unit norm and capturing a prefered direction of the
data, while trying to be sparse. P has the same size as Z and represents
the sparsity pattern of Z. X is an orthonormal matrix of size pxm
produced internally by the algorithm.

With classical PCA, the variability captured by m components is
sum(svds(A, m))
With the outputted Z, which should be sparser than normal PCA, it is
sum(svd(A*Z))

The method is based on the maximization of a differentiable function over
the Stiefel manifold of dimension pxm. Notice that this dimension is
independent of n, making this method particularly suitable for problems
with many variables but few samples (n much larger than p). The
complexity of each iteration of the algorithm is linear in n as a result.

The theory behind this code is available in the paper
http://jmlr.org/papers/volume11/journee10a/journee10a.pdf
Generalized Power Method for Sparse Principal Component Analysis, by
Journee, Nesterov, Richtarik and Sepulchre, JMLR, 2010.
This implementation is not equivalent to the one described in that paper
(and is independent from their authors) but is close in spirit
nonetheless. It is provided with Manopt as an example file but was not
optimized: please do not judge the quality of the algorithm described by
the authors of the paper based on this implementation.```

## CROSS-REFERENCE INFORMATION This function calls:
• stiefelfactory Returns a manifold structure to optimize over orthonormal matrices.
• trustregions Riemannian trust-regions solver for optimization on manifolds.
This function is called by:

## SOURCE CODE ```0001 function [Z, P, X, A] = sparse_pca(A, m, gamma)
0002 % Sparse principal component analysis based on optimization over Stiefel.
0003 %
0004 % [Z, P, X] = sparse_pca(A, m, gamma)
0005 %
0006 % We consider sparse PCA applied to a data matrix A of size pxn, where p is
0007 % the number of samples (observations) and n is the number of variables
0008 % (features). We attempt to extract m different components. The parameter
0009 % gamma, which must lie between 0 and the largest 2-norm of a column of
0010 % A, tunes the balance between best explanation of the variance of the data
0011 % (gamma = 0, mostly corresponds to standard PCA) and best sparsity of the
0012 % principal components Z (gamma maximal, Z is zero). The variables
0013 % contained in the columns of A are assumed centered (zero-mean).
0014 %
0015 % The output Z of size nxm represents the principal components. There are m
0016 % columns, each one of unit norm and capturing a prefered direction of the
0017 % data, while trying to be sparse. P has the same size as Z and represents
0018 % the sparsity pattern of Z. X is an orthonormal matrix of size pxm
0019 % produced internally by the algorithm.
0020 %
0021 % With classical PCA, the variability captured by m components is
0022 % sum(svds(A, m))
0023 % With the outputted Z, which should be sparser than normal PCA, it is
0024 % sum(svd(A*Z))
0025 %
0026 % The method is based on the maximization of a differentiable function over
0027 % the Stiefel manifold of dimension pxm. Notice that this dimension is
0028 % independent of n, making this method particularly suitable for problems
0029 % with many variables but few samples (n much larger than p). The
0030 % complexity of each iteration of the algorithm is linear in n as a result.
0031 %
0032 % The theory behind this code is available in the paper
0033 % http://jmlr.org/papers/volume11/journee10a/journee10a.pdf
0034 % Generalized Power Method for Sparse Principal Component Analysis, by
0035 % Journee, Nesterov, Richtarik and Sepulchre, JMLR, 2010.
0036 % This implementation is not equivalent to the one described in that paper
0037 % (and is independent from their authors) but is close in spirit
0038 % nonetheless. It is provided with Manopt as an example file but was not
0039 % optimized: please do not judge the quality of the algorithm described by
0040 % the authors of the paper based on this implementation.
0041
0042 % This file is part of Manopt and is copyrighted. See the license file.
0043 %
0044 % Main author: Nicolas Boumal, Dec. 24, 2013
0045 % Contributors:
0046 %
0047 % Change log:
0048 %
0049
0050     % If no input is provided, generate random data for a quick demo
0051     if nargin == 0
0052         n = 100;
0053         p = 10;
0054         m = 2;
0055
0056         % Data matrix
0057         A = randn(p, n);
0058
0059         % Regularization parameter. This should be between 0 and the largest
0060         % 2-norm of a column of A.
0061         gamma = 1;
0062
0063     elseif nargin ~= 3
0064         error('Please provide 3 inputs (or none for a demo).');
0065     end
0066
0067     % Execute the main algorithm: it will compute a sparsity pattern P.
0068     [P, X] = sparse_pca_stiefel_l1(A, m, gamma);
0069
0070     % Compute the principal components in accordance with the sparsity.
0071     Z = postprocess(A, P, X);
0072
0073 end
0074
0075
0076 % Sparse PCA based on the block sparse PCA algorithm with l1-penalty as
0077 % featured in the reference paper by Journee et al. This is not the same
0078 % algorithm but it is the same cost function optimized over the same search
0079 % space. We force N = eye(m).
0080 function [P, X] = sparse_pca_stiefel_l1(A, m, gamma)
0081
0082     [p, n] = size(A); %#ok<NASGU>
0083
0084     % The optimization takes place over a Stiefel manifold whose dimension
0085     % is independent of n. This is especially useful when there are many
0086     % more variables than samples.
0087     St = stiefelfactory(p, m);
0088     problem.M = St;
0089
0090     % In this helper function, given a point 'X' on the manifold we check
0091     % whether the caching structure 'store' has been populated with
0092     % quantities that are useful to compute at X or not. If they were not,
0093     % then we compute and store them now.
0094     function store = prepare(X, store)
0096             store.AtX = A'*X;
0097             store.absAtX = abs(store.AtX);
0098             store.pos = max(0, store.absAtX - gamma);
0100         end
0101     end
0102
0103     % Define the cost function here and set it in the problem structure.
0104     problem.cost = @cost;
0105     function [f, store] = cost(X, store)
0106         store = prepare(X, store);
0107         pos = store.pos;
0108         f = -.5*norm(pos, 'fro')^2;
0109     end
0110
0112     % grad) : Manopt will take care of converting it to the Riemannian
0115     function [G, store] = egrad(X, store)
0116         if ~isfield(store, 'G')
0117             store = prepare(X, store);
0118             pos = store.pos;
0119             AtX = store.AtX;
0120             sgAtX = sign(AtX);
0121             factor = pos.*sgAtX;
0122             store.G = -A*factor;
0123         end
0124         G = store.G;
0125     end
0126
0128     % pause;
0129
0130     % The optimization happens here. To improve the method, it may be
0131     % interesting to investigate better-than-random initial iterates and,
0132     % possibly, to fine tune the parameters of the solver.
0133     X = trustregions(problem);
0134
0135     % Compute the sparsity pattern by thresholding
0136     P = abs(A'*X) > gamma;
0137
0138 end
0139
0140
0141 % This post-processing algorithm produces a matrix Z of size nxm matching
0142 % the sparsity pattern P and representing sparse principal components for
0143 % A. This is to be called with the output of the main algorithm. This
0144 % algorithm is described in the reference paper by Journee et al.
0145 function Z = postprocess(A, P, X)
0146     fprintf('Post-processing... ');
0147     counter = 0;
0148     maxiter = 1000;
0149     tolerance = 1e-8;
0150     while counter < maxiter
0151         Z = A'*X;
0152         Z(~P) = 0;
0153         Z = Z*diag(1./sqrt(diag(Z'*Z)));
0154         X = ufactor(A*Z);
0155         counter = counter + 1;
0156         if counter > 1 && norm(Z0-Z, 'fro') < tolerance*norm(Z0, 'fro')
0157             break;
0158         end
0159         Z0 = Z;
0160     end
0161     fprintf('done, in %d iterations (max = %d).\n', counter, maxiter);
0162 end
0163
0164 % Returns the U-factor of the polar decomposition of X
0165 function U = ufactor(X)
0166     [W, S, V] = svd(X, 0); %#ok<ASGLU>
0167     U = W*V';
0168 end```

Generated on Mon 10-Sep-2018 11:48:06 by m2html © 2005