The set $S_+(n, k)$ of rank-$k$ symmetric positive semidefinite matrices $X$ of size $n\times n$ is parametrized as $X = YY^T$ where $Y \in \mathbb{R}^{n\times k}_*$ is a full column rank matrix of size $n\times k$. Notice that for all orthogonal matrices $O$ of size $k$, the transformation $Y \rightarrow YO$ leaves $X$ unchanged. As a result, we may consider all matrices of the form $YO$ as being *equivalent* for our purpose. Quotienting this equivalence relation out results in a quotient space, $\mathbb{R}^{n \times k}_* / {\rm O}(k)$, with ${\rm O}(k)$ the group of orthogonal matrices. A point on the quotient space is (conceptually) an equivalence class $[Y] = \{YO : O \in {\rm O}(k)\}$. Numerically, we represent that equivalence class with (any) matrix $Y$ in $[Y]$.

A function $\phi : S_+(n,k) \to \mathbb{R}: X \mapsto \phi(X) $ induces a function $f: \mathbb{R}^{n \times k}_* \rightarrow \mathbb{R}: Y \mapsto f(Y)$ defined as $f(Y) = \phi(YY^T)$. Of course, $f$ has the appropriate invariance $f(Y) = f(YO)$ for all orthogonal $O$. Thus, $f$ is defined on the total space $\mathbb{R}^{n\times k}_*$ and descends as a well-defined function on the quotient space $\mathbb{R}^{n \times k}_* / {\rm O}(k)$. Endowing the total space with the usual Riemannian structure of a Euclidean space (the inner product $\langle H_1, H_2 \rangle = \operatorname{trace}(H_1^T H_2)$), a Riemannian structure for the quotient space follows.
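As a quick sanity check, the invariance $f(Y) = f(YO)$ is easy to verify numerically. The following NumPy sketch (illustrative, not part of Manopt) uses the cost $\phi(X) = \|X - A\|_F^2$ with an arbitrary symmetric $A$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2

# A point Y on the total space and an arbitrary symmetric matrix A.
Y = rng.standard_normal((n, k))
A = rng.standard_normal((n, n))
A = A + A.T

# An illustrative cost phi(X) = ||X - A||_F^2 and the induced f(Y) = phi(Y Y^T).
f = lambda Y: np.linalg.norm(Y @ Y.T - A, 'fro')**2

# A random orthogonal matrix O of size k, obtained from a QR decomposition.
O, _ = np.linalg.qr(rng.standard_normal((k, k)))

# Y and YO represent the same point X = Y Y^T, so f is unchanged.
print(np.allclose(f(Y), f(Y @ O)))  # True
```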

See the paper [JBAS10] referenced below for details on this geometry, and [AMS08] for general background.

Factory call: `M = symfixedrankYYfactory(n, k)`


| Name | Formula | Numerical representation |
| --- | --- | --- |
| Set | $S_+(n,k) = \{X \in \mathbb{R}^{n\times n} : X=X^T\succeq 0, {\rm rank}(X) = k\} = \{ YY^T: Y \in \mathbb{R}^{n\times k}, {\rm rank}(Y) = k\}$ | $X$ is represented by means of $Y$ as a matrix `Y` of size $n\times k$. |
| Vertical space at $Y$ | $\{ Y\Omega : \Omega \in \mathbb{R}^{k\times k}, \Omega + \Omega^T = 0 \}$ | A vertical vector $U$ at $Y$ is represented as a matrix `U` of size $n\times k$: it is tangent to the equivalence classes. |
| Horizontal space at $Y$ | $\{ U \in \mathbb{R}^{n\times k}: U^T Y = Y^T U \}$ | A horizontal vector $U$ at $Y$ is represented as a matrix `U` of size $n\times k$: it is orthogonal to all vertical vectors. |
| Ambient space | $\mathbb{R}^{n\times k}$ | Points and vectors in the ambient space are, naturally, represented as matrices of size $n\times k$. |
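The characterization of the vertical space can be checked numerically: for a skew-symmetric $\Omega$, the matrix exponential $\exp(t\Omega)$ is orthogonal, so the curve $t \mapsto Y\exp(t\Omega)$ stays inside the equivalence class of $Y$, and its velocity at $t = 0$ is the vertical vector $Y\Omega$. A NumPy/SciPy sketch (illustrative, not part of Manopt):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n, k = 6, 3
Y = rng.standard_normal((n, k))

# A skew-symmetric Omega generates a curve t -> Y expm(t Omega) inside [Y],
# since expm(t Omega) is orthogonal for skew-symmetric Omega.
W = rng.standard_normal((k, k))
Omega = W - W.T

curve = lambda t: Y @ expm(t * Omega)

# The curve stays on the same point X = Y Y^T ...
print(np.allclose(curve(0.5) @ curve(0.5).T, Y @ Y.T))  # True

# ... and its velocity at t = 0 is the vertical vector Y Omega
# (checked with a central finite difference).
t = 1e-6
velocity = (curve(t) - curve(-t)) / (2 * t)
print(np.allclose(velocity, Y @ Omega, atol=1e-6))  # True
```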

The following table shows some of the nontrivial functions available in the structure `M`. The tutorial page gives more details about the functionality implemented by each function.

| Name | Field usage | Formula |
| --- | --- | --- |
| Dimension | `M.dim()` | $\operatorname{dim}\mathcal{M} = kn - k(k-1)/2$ |
| Metric | `M.inner(Y, U, V)` | $\langle U, V\rangle_Y = \operatorname{trace}(U^T V)$ |
| Norm | `M.norm(Y, U)` | $\|U\|_Y = \sqrt{\langle U, U \rangle_Y}$ |
| Horizontal space projector | `M.proj(Y, H)` | $P_Y(H) = H - Y\Omega$, where `H` represents a vector in the ambient space and $\Omega$ is the skew-symmetric matrix obtained as the unique solution of the Sylvester equation $\Omega (Y^T Y) + (Y^T Y)\Omega = Y^T H - H^T Y$. |
| Euclidean to Riemannian gradient | `M.egrad2rgrad(Y, egrad)` | $\operatorname{grad} f(Y) = \nabla f(Y)$, where `egrad` represents the Euclidean gradient $\nabla f(Y)$, a vector in the ambient space. There is indeed nothing to do: the Euclidean gradient of $f$ is already a horizontal vector, thanks to the fact that $f(Y) = f(YO)$ for all orthogonal $O$. |
| Euclidean to Riemannian Hessian | `M.ehess2rhess(Y, egrad, ehess, U)` | $\operatorname{Hess} f(Y)[U] = P_Y(\nabla^2 f(Y)[U])$, where `egrad` represents the Euclidean gradient $\nabla f(Y)$ (which happens not to be necessary in this case) and `ehess` represents the Euclidean Hessian $\nabla^2 f(Y)[U]$, both vectors in the ambient space. |
| Retraction | `M.retr(Y, U, t)` | $\operatorname{Retr}_Y(tU) = Y + tU$ |
| Random point | `M.rand()` | Returns a point at random as follows: generate $Y$ with i.i.d. normal entries; return $Y$. With high probability, $Y$ is indeed full rank. |
| Random vector | `M.randvec(Y)` | Returns a horizontal vector at $Y$ with uniformly random direction, obtained as follows: generate $H$ with i.i.d. normal entries; return $P_Y(H)/\|P_Y(H)\|_Y$. |
| Vector transport | `M.transp(Y, Z, U)` | $\operatorname{Transp}_{Z\leftarrow Y}(U) = P_Z(U)$, where $U$ is a horizontal vector at $Y$ that is transported to the horizontal space at $Z$. |
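The horizontal projector can be sketched in Python using SciPy's Sylvester solver (`scipy.linalg.solve_sylvester` solves $AX + XB = Q$); this is an illustrative NumPy/SciPy translation, not Manopt's MATLAB implementation. The resulting $U = P_Y(H)$ satisfies the horizontality condition $U^T Y = Y^T U$ and is orthogonal to every vertical vector:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(2)
n, k = 8, 3
Y = rng.standard_normal((n, k))
H = rng.standard_normal((n, k))  # a vector in the ambient space

# Solve Omega (Y^T Y) + (Y^T Y) Omega = Y^T H - H^T Y for Omega.
YtY = Y.T @ Y
Omega = solve_sylvester(YtY, YtY, Y.T @ H - H.T @ Y)

# Horizontal projection P_Y(H) = H - Y Omega.
U = H - Y @ Omega

# U is horizontal: U^T Y is symmetric ...
print(np.allclose(U.T @ Y, Y.T @ U))  # True

# ... and U is orthogonal to every vertical vector Y W, W skew-symmetric,
# in the metric <A, B> = trace(A^T B).
W = rng.standard_normal((k, k))
W = W - W.T
print(np.isclose(np.trace(U.T @ (Y @ W)), 0))  # True
```

Since the right-hand side $Y^T H - H^T Y$ is skew-symmetric and $Y^T Y$ is symmetric positive definite, the unique solution $\Omega$ is automatically skew-symmetric, so $Y\Omega$ is indeed a vertical vector.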

This is an example of low-rank matrix approximation. Let $A\in\mathbb{R}^{n\times n}$ be a symmetric matrix. We minimize the following cost function:

$$\phi(X) =\| X - A \|_F ^2,$$

subject to $X \in S_+(n, k)$. $X$ is factorized as $YY^T$ with $Y \in \mathbb{R}^{n \times k}_*$, and the function $f:\mathbb{R}^{n \times k}_* \rightarrow \mathbb{R}$ is defined as $f(Y) = \| YY^T - A \|_F ^2$. The Euclidean gradient and Hessian of $f$ are:

$$\nabla f(Y) = 4(YY^T - A)Y,$$

$$\nabla^2 f(Y)[U] = 4(YU^T + UY^T)Y + 4(YY^T - A)U.$$
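These Euclidean derivatives can be checked against finite differences. A NumPy sketch with small illustrative dimensions (the problem data here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 7, 2
Y0 = rng.standard_normal((n, k))
A = rng.standard_normal((n, n))
A = A + A.T  # symmetric data matrix

f = lambda Y: np.linalg.norm(Y @ Y.T - A, 'fro')**2
egrad = lambda Y: 4 * (Y @ Y.T - A) @ Y
ehess = lambda Y, U: 4 * (Y @ U.T + U @ Y.T) @ Y + 4 * (Y @ Y.T - A) @ U

U = rng.standard_normal((n, k))
t = 1e-6

# The directional derivative of f matches <egrad, U> ...
dirder = (f(Y0 + t * U) - f(Y0 - t * U)) / (2 * t)
print(np.isclose(dirder, np.sum(egrad(Y0) * U), rtol=1e-4))  # True

# ... and the directional derivative of egrad matches ehess.
dgrad = (egrad(Y0 + t * U) - egrad(Y0 - t * U)) / (2 * t)
print(np.allclose(dgrad, ehess(Y0, U), rtol=1e-4, atol=1e-4))  # True
```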

The Riemannian gradient and Hessian are obtained by applying the `M.egrad2rgrad` and `M.ehess2rhess` operators:

$$\operatorname{grad} f(Y) = 4(YY^T - A)Y,$$

$$\operatorname{Hess} f(Y)[U] = \nabla^2 f(Y)[U] - Y\Omega$$ where $\Omega$ is obtained by solving the Sylvester equation, $$ \Omega (Y^T Y) + (Y^T Y )\Omega = Y^T \nabla^2 f(Y)[U] - \nabla^2 f(Y)[U] ^T Y. $$
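Note that the gradient needs no such correction: $\nabla f(Y) = 4(YY^T - A)Y$ is already horizontal, so the Sylvester equation for it has the solution $\Omega = 0$. A NumPy/SciPy sketch of this observation (illustrative data, not Manopt code):

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(4)
n, k = 7, 2
Y = rng.standard_normal((n, k))
A = rng.standard_normal((n, n))
A = A + A.T

# Euclidean gradient of f(Y) = ||Y Y^T - A||_F^2.
G = 4 * (Y @ Y.T - A) @ Y

# G is already horizontal: G^T Y = 4 Y^T (Y Y^T - A) Y is symmetric ...
print(np.allclose(G.T @ Y, Y.T @ G))  # True

# ... so projecting changes nothing: the Sylvester solution is Omega = 0.
YtY = Y.T @ Y
Omega = solve_sylvester(YtY, YtY, Y.T @ G - G.T @ Y)
print(np.allclose(Y @ Omega, np.zeros((n, k))))  # True
```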

The Sylvester equation can be solved efficiently using MATLAB's `lyap` function. Notice that there is no need to compute these quantities explicitly: it suffices to write code for the Euclidean quantities and to apply the conversion tools to them to obtain the Riemannian quantities, as in the following code:

```matlab
% Generate the problem data.
n = 1000;
k = 5;
Y_org = randn(n, k);
A = Y_org*Y_org';

% Create the problem structure.
manifold = symfixedrankYYfactory(n, k);
problem.M = manifold;

% Define the problem cost function and its derivatives.
problem.cost = @(Y) norm(Y*Y' - A, 'fro')^2;
egrad = @(Y) 4*(Y*Y' - A)*Y;
ehess = @(Y, U) 4*(Y*U' + U*Y')*Y + 4*(Y*Y' - A)*U;
problem.grad = @(Y) manifold.egrad2rgrad(Y, egrad(Y));
problem.hess = @(Y, U) manifold.ehess2rhess(Y, egrad(Y), ehess(Y, U), U);

% Numerically check the differentials.
checkgradient(problem); pause;
checkhessian(problem); pause;

% Execute the optimization.
Y = trustregions(problem);
```

[JBAS10] M. Journée, F. Bach, P.-A. Absil and R. Sepulchre, Low-Rank Optimization on the Cone of Positive Semidefinite Matrices, SIAM Journal on Optimization, 20(5), pp. 2327–2351, 2010.

[AMS08] P.-A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, 2008.

We are still working on building a collection of examples for Manopt. The relevant ones will be referenced here when the time comes.