The oblique manifold $\mathcal{OB}(n, m)$ (the set of matrices of size nxm with unit-norm columns) is endowed with a Riemannian manifold structure by considering it as a Riemannian submanifold of the embedding Euclidean space $\mathbb{R}^{n\times m}$ endowed with the usual inner product $\langle H_1, H_2 \rangle = \operatorname{trace}(H_1^T H_2)$. Its geometry is exactly the same as that of the product manifold of spheres $\mathbb{S}^{n-1}\times \cdots \times \mathbb{S}^{n-1}$ ($m$ copies), see the sphere manifold.
Factory call: M = obliquefactory(n, m)
.
Alternatively, one can call M = obliquefactory(n, m, true)
to represent the same manifold ($m$ spheres in $\mathbb{R}^n$)
using matrices of size mxn
with unit-norm rows. In this case, for $X$ a point on the
manifold, we have $(XX^T)_{ii} = 1$ for $i = 1:m$.
There is also a complex version of this factory: see obliquecomplexfactory
.
Name | Formula | Numerical representation |
Set | $\mathcal{OB}(n, m) = \{ X \in \mathbb{R}^{n\times m} : (X^TX)_{ii} = 1, i = 1:m \}$ | $X$ is represented as a matrix X of size nxm whose columns
have unit 2-norm, i.e., X(:, i).'*X(:, i) = 1
for i = 1:m . |
Tangent space at $X$ | $T_X \mathcal{OB}(n, m) = \{ U \in \mathbb{R}^{n\times m} : (X^TU)_{ii} = 0, i = 1:m \}$ | A tangent vector $U$ at $X$ is represented as a matrix U
of size nxm such
that each column of U is orthogonal to the
corresponding column of X , i.e., X(:,
i).'*U(:, i) = 0 for i = 1:m . |
Ambient space | $\mathbb{R}^{n\times m}$ | Points and vectors in the ambient space are, naturally, represented as matrices of size nxm. |
The following table shows some of the nontrivial available
functions in the structure M
. The norm $\|\cdot\|$
refers to the norm in the ambient space, which is the Frobenius
norm. The tutorial page
gives more details about the functionality implemented by each
function.
Name | Field usage | Formula |
Dimension | M.dim() |
$\operatorname{dim}\mathcal{M} = m(n-1)$ |
Metric | M.inner(X, U, V) |
$\langle U, V\rangle_X = \operatorname{trace}(U^T V)$ |
Norm | M.norm(X, U) |
$\|U\|_X = \sqrt{\langle U, U \rangle_X}$ |
Distance | M.dist(X, Y) |
$\operatorname{dist}(X, Y) = \sqrt{\sum_{i=1}^m \arccos^2((X^T Y)_{ii})}$ |
Typical distance | M.typicaldist() |
$\pi\sqrt{m}$ |
Tangent space projector | M.proj(X, H) |
$P_X(H) = H - X\operatorname{ddiag}(X^T H)$, where H
represents a vector in the ambient space and
$\operatorname{ddiag}$ sets all off-diagonal entries of a
matrix to zero. |
Euclidean to Riemannian gradient | M.egrad2rgrad(X, egrad) |
$\operatorname{grad} f(X) = P_X(\nabla f(X))$, where egrad
represents the Euclidean gradient $\nabla f(X)$, which is a
vector in the ambient space. |
Euclidean to Riemannian Hessian | M.ehess2rhess(X, egrad, ehess, U) |
$\operatorname{Hess} f(X)[U] = P_X(\nabla^2 f(X)[U]) - U
\operatorname{ddiag}(X^T \nabla f(X))$, where egrad
represents the Euclidean gradient $\nabla f(X)$ and ehess
represents the Euclidean Hessian $\nabla^2 f(X)[U]$, both
being vectors in the ambient space. |
Exponential map | M.exp(X, U, t) |
See the sphere manifold: the same exponential map is applied column-wise. |
Retraction | M.retr(X, U, t) |
$\operatorname{Retr}_X(tU) = \operatorname{normalize}(X+tU)$, where $\operatorname{normalize}$ scales each column of the input matrix to have norm 1. |
Logarithmic map | M.log(X, Y) |
See the sphere manifold: the same logarithmic map is applied column-wise. |
Random point | M.rand() |
Returns a point uniformly at random w.r.t. the natural measure as follows: generate $X$ with i.i.d. normal entries; return $\operatorname{normalize}(X)$. |
Random vector | M.randvec(X) |
Returns a unit-norm tangent vector at $X$ with uniformly random direction, obtained as follows: generate $H$ with i.i.d. normal entries; return: $U = P_X(H) / \|P_X(H)\|$. |
Vector transport | M.transp(X, Y, U) |
$\operatorname{Transp}_{Y\leftarrow X}(U) = P_Y(U)$, where $U$ is a tangent vector at $X$ that is transported to the tangent space at $Y$. |
Pair mean | M.pairmean(X, Y) |
$\operatorname{mean}(X, Y) = \operatorname{normalize}(X+Y)$ |
Let $A\in\mathbb{R}^{n\times m}$ be any matrix. We search for the matrix with unit-norm columns that is closest to $A$ according to the Frobenius norm. Of course, this problem has an obvious solution (simply normalize the columns of $A$). We treat it merely for the sake of example. We minimize the following cost function:
$$f(X) = \frac{1}{2} \|X-A\|^2,$$
such that $X \in \mathcal{OB}(n, m)$. Compute the Euclidean gradient and Hessian of $f$:
$$\nabla f(X) = X-A,$$
$$\nabla^2 f(X)[U] = U.$$
The Riemannian gradient and Hessian are obtained by applying the
M.egrad2rgrad
and M.ehess2rhess
operators. Notice that there is no need to compute these
explicitly: it suffices to write code for the Euclidean quantities
and to apply the conversion tools on them to obtain the Riemannian
quantities, as in the following code:
% Generate the problem data. n = 5; m = 8; A = randn(n, m); % Create the problem structure. manifold = obliquefactory(n, m); problem.M = manifold; % Define the problem cost function and its derivatives. problem.cost = @(X) .5*norm(X-A, 'fro')^2; egrad = @(X) X-A; ehess = @(X, U) U; problem.grad = @(X) manifold.egrad2rgrad(X, egrad(X)); problem.hess = @(X, U) manifold.ehess2rhess(X, egrad(X), ehess(X, U), U); % Numerically check the differentials. checkgradient(problem); pause; checkhessian(problem); pause;
Of course, this is not as efficient as it could be: there are redundant computations. See the various ways of describing the cost function for better alternatives.
Let us consider a second, more interesting, example. A
correlation matrix $C \in \mathbb{R}^{n\times n}$ is a symmetric,
positive semidefinite matrix with 1's on the diagonal. If $C$ is
of rank $k$, there always exists a matrix $X \in \mathcal{OB}(k,
n)$ such that $C = X^TX$. In fact, there exist many such matrices:
given such an $X$, a whole manifold of equivalent matrices is
obtained by considering $QX$ with $Q$ an orthogonal matrix of size
$k$. Disregarding this equivalence relation (see help
elliptopefactory
), we can address the problem of nearest
low-rank correlation matrix as follows:
Let $A \in \mathbb{R}^{n\times n}$ be a given symmetric matrix. We wish to find the correlation matrix $C = X^TX$ of rank at most $k$ which is closest to $A$, according to the Frobenius norm [Hig01]. That is, we wish to minimize:
$$f(X) = \frac{1}{4} \|X^TX - A\|^2$$
with $X \in \mathcal{OB}(k, n)$.The Euclidean gradient and Hessian are given by:
$$\nabla f(X) = X(X^TX - A),$$
$$\nabla^2 f(X)[U] = X(U^TX + X^TU) + U(X^TX-A).$$
In Manopt code, this yields:
% Generate the problem data. n = 10; k = 3; A = randn(n); A = (A + A.')/2; % Create the problem structure. manifold = obliquefactory(k, n); problem.M = manifold; % Define the problem cost function and its derivatives. problem.cost = @(X) .25*norm(X.'*X-A, 'fro')^2; egrad = @(X) X*(X.'*X-A); ehess = @(X, U) X*(U.'*X+X.'*U) + U*(X.'*X-A); problem.grad = @(X) manifold.egrad2rgrad(X, egrad(X)) problem.hess = @(X, U) manifold.ehess2rhess(X, egrad(X), ehess(X, U), U); % Numerically check the differentials. checkgradient(problem); pause; checkhessian(problem); pause; % Solve X = trustregions(problem); C = X.'*X; % C is a rank k (at most) symmetric, positive semidefinite matrix with ones on the diagonal: disp(C); disp(eig(C));
Again, there is a fair bit of redundant computations in this formulation. See the various ways of describing the cost function for better alternatives.
For theory on Riemannian submanifolds, see [AMS08], section 3.6.1 (first-order derivatives) and section 5.3.3 (second-order derivatives, i.e., connections).
For content specifically about the oblique manifold with applications, see [AG06].
For more info about the nearest correlation matrix problem, see [Hig01].
[AMS08] P.-A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms
on Matrix Manifolds, Princeton University Press, 2008.
[AG06] P.-A. Absil, K. A. Gallivan, Joint
Diagonalization on the Oblique Manifold for Independent
Component Analysis, 2006.
[Hig01] N.J. Higham, Computing
the nearest correlation matrix: a problem from finance,
2001.
packing_on_the_sphere, maxcut can be implemented with obliquefactory
as well.