Random Matrix Lecture 02

[[lecture-data]]

:LiArrowBigLeftDash: Random Matrix Lecture 01 | Random Matrix Lecture 03 :LiArrowBigRightDash:

Class Notes

pg 7 - 12 (1.4 - 2.1)

Summary

  • Proof of concentration of Gaussian random vector norms (see end of [[Random Matrix Lecture 01]] / [[concentration inequality for magnitude of standard gaussian random vector]])
  • how to think about high-dimensional geometry
  • multiplying by a Gaussian matrix
  • Gaussian process viewpoint

1. Random Vector Theory

1.4 Consequences for High-Dimensional Geometry

Last time, we saw the statement and proof for the [[concentration inequality for magnitude of standard gaussian random vector]]. This is one of the fundamental facts of high-dimensional geometry.

This tells us that, with high probability, we have

$$\begin{align} \lvert \lvert x \rvert \rvert ^2 &= d + {\cal O}(\sqrt{ d }) \\ \implies \lvert \lvert x \rvert \rvert &= \sqrt{ d + {\cal O}(\sqrt{ d }) } \\ &= \sqrt{ d } \cdot \sqrt{ 1+ {\cal O}\left( \frac{1}{\sqrt{ d }} \right)} \\ &= \sqrt{ d }\left( 1+{\cal O}\left( \frac{1}{\sqrt{ d }} \right) \right) \\ &= \sqrt{ d } + {\cal O}(1) \end{align}$$

ie, the standard [[gaussian random vector]] falls, with high probability, inside a spherical shell of width ${\cal O}(1)$ around the sphere of radius $\sqrt{ d }$.
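
As a quick numerical sanity check (my own sketch, not from the lecture), the following NumPy snippet samples standard Gaussian vectors in several dimensions and measures how far their norms land from $\sqrt{ d }$; the deviations should stay ${\cal O}(1)$ even as $d$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)

# For each dimension d, draw standard Gaussian vectors and measure how far
# their norms land from sqrt(d). The spread should stay O(1), not grow with d.
for d in [10, 100, 1_000, 10_000]:
    x = rng.standard_normal((2_000, d))      # 2000 samples of x ~ N(0, I_d)
    deviations = np.linalg.norm(x, axis=1) - np.sqrt(d)
    print(f"d = {d:6d}   mean deviation = {deviations.mean():+.3f}   "
          f"std = {deviations.std():.3f}")
```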

Important

The standard [[gaussian random vector]] tends towards {as||the surface of the sphere of radius $\sqrt{ d }$||"typical set" shape} and falls within an "error region" surrounding {ha||that surface that has a width of ${\cal O}(1)$||error region description}

Note

This means a high-dimensional Gaussian has a {1||non-convex} typical set - it is the shape of {2||a hollow spherical shell||shape}!

  • (Sometimes these vectors are called thin-shell vectors in convex geometry and probability)
  • Contrast this with how we normally think of a Gaussian high-probability/"typical" set in 1 or 2 dimensions, which is usually some sort of solid blob around the origin.

There are some other high-dimensional behaviors of random variables that can be explained in the same way.

Example

Consider $x \sim \text{Uniform}([-1,+1]^d)$. A simple calculation yields

$$\mathbb{E}[\lvert \lvert x \rvert \rvert ^2 ] = d\, \mathbb{E}[x_{1}^2] = \frac{d}{3}$$

And a similar calculation to the concentration inequality in the gaussian case yields, with high probability, that

$$\lvert \lvert x \rvert \rvert = \sqrt{ \frac{d}{3} } + {\cal O}(1)$$

Which means almost all the mass of this distribution is at the corners of this $d$-dimensional box instead of the inscribed solid unit ball $B^d := \{ x \in \mathbb{R}^d : \lvert \lvert x \rvert \rvert \leq 1 \}$. The concentration inequality in this case also implies that the ball relative to the box has bound

$$\begin{align} \frac{\text{Vol}(B^d)}{\text{Vol}([-1, +1]^d)} &= \mathbb{P}_{x \sim \text{Unif}([-1, +1]^d)}\,[\lvert \lvert x \rvert \rvert \leq 1] \\ &\leq \mathbb{P}_{x} [\,\lvert \, \lvert \lvert x \rvert \rvert ^2 - d \,\rvert \geq d-1] \\ &= \exp(-{\cal O}(d)) \end{align}$$

ie, the volume of the unit ball is exponentially smaller than the volume of the box.
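
Since this probability is far too small to estimate by direct Monte Carlo sampling, here is a sketch (my own, using the standard ball-volume formula $\text{Vol}(B^d) = \pi^{d/2}/\Gamma(d/2+1)$, which is not derived in the notes) that computes the exact ratio in log-space:

```python
from math import lgamma, log, pi

# Exact ratio Vol(B^d) / Vol([-1,1]^d) via Vol(B^d) = pi^(d/2) / Gamma(d/2 + 1).
# Working in log-space avoids underflow; the ratio is exponentially small in d,
# consistent with the exp(-O(d)) bound above.
for d in [2, 10, 50, 100]:
    log_ratio = (d / 2) * log(pi) - lgamma(d / 2 + 1) - d * log(2)
    print(f"d = {d:4d}   Vol(B^d) / Vol([-1,1]^d) = exp({log_ratio:8.1f})")
```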

Our [[concentration inequality for magnitude of standard gaussian random vector]] also gives us an approximation

Take Away

$${\cal N}(0, I_{d}) \approx \text{Unif}(S^{d-1}(\sqrt{ d }))$$

Caution

This is an informal approximation!

We get this by noting

  1. When $x \sim {\cal N}(0, I_{d})$, we have $\text{Law}\left( \frac{x}{\lvert \lvert x \rvert \rvert} \right) = \text{Unif}(S^{d-1})$ ( which we get from [[normalized standard gaussian random vectors have the orthogonally invariant distribution on the unit sphere]] )
  2. $\lvert \lvert x \rvert \rvert \approx \sqrt{ d }$, which follows from the [[concentration inequality for magnitude of standard gaussian random vector]]

This idea is formalized in [[Borel's central limit theorem]]

(see [[gaussian random vectors are approximately uniform on the hollow sphere]])

Theorem

Let $x \sim \text{Unif}(S^{d-1})$. Then $\sqrt{ d }\, x_{1} \to {\cal N}(0,1)$ in distribution as $d \to \infty$.

  • Further, for any fixed $k \geq 1$, we have $\sqrt{ d }\,(x_{1},\dots,x_{k}) \to {\cal N}(0, I_{k})$ in distribution as $d \to \infty$

(without proof, for now)

See [[Borel's central limit theorem]]
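
A small numerical illustration (my own sketch): sample uniform points on $S^{d-1}$ by normalizing Gaussian vectors (item 1 above) and check that $\sqrt{ d }\,x_{1}$ looks approximately standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples = 500, 100_000

# Sample uniform points on S^{d-1} by normalizing standard Gaussian vectors
# (the orthogonal-invariance fact from item 1 above).
g = rng.standard_normal((n_samples, d))
x = g / np.linalg.norm(g, axis=1, keepdims=True)

# Borel's CLT: sqrt(d) * x_1 should be approximately N(0, 1) for large d.
z = np.sqrt(d) * x[:, 0]
print("mean ≈", round(z.mean(), 3), "   var ≈", round(z.var(), 3))
print("P(|z| <= 1.96) ≈", round(float(np.mean(np.abs(z) <= 1.96)), 3))  # ≈ 0.95
```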


2. Rectangular Matrices

We begin by looking at asymmetric, rectangular matrices with iid Gaussian entries.

We are particularly interested in the d×m matrix

$$G \sim {\cal N}(0,1)^{d \times m}$$

2.1 Gaussian Random Field Viewpoint

We can use our results about Gaussian random vectors to characterize the images of fixed vectors under such a matrix.

Proposition

Let $G \sim {\cal N}(0,1)^{d \times m}$ be a [[random matrix]] and $y \in \mathbb{R}^m$. Then

$$\text{Law}(Gy) = {\cal N}(0, \lvert \lvert y \rvert \rvert ^2 I_{d})$$
Proof

Gy is linear in G. The entries of G form a [[gaussian random vector]], and so Gy must be Gaussian as well. Thus, it suffices to calculate the mean and covariance to find its law.

Since each entry of $G$, call them $G_{i\alpha}$, is iid ${\cal N}(0,1)$, we get

$$\mathbb{E}(Gy)_{i} = \mathbb{E}\left[ \sum_{\alpha=1}^m G_{i\alpha} y_{\alpha} \right] = 0$$

And for the covariance

$$\begin{align} \text{Cov}(\,(Gy)_{i}, (Gy)_{j}\,) &= \mathbb{E}[\,(Gy)_{i}(Gy)_{j}\,] \\ &= \mathbb{E}\left( \left( \sum_{\alpha=1}^m G_{i\alpha} y_{\alpha} \right) \left( \sum_{\beta=1}^m G_{j\beta}y_{\beta} \right) \right) \\ &= \sum_{\alpha,\beta=1}^m y_{\alpha}y_{\beta} \cdot \mathbb{E} [G_{i\alpha} G_{j\beta}] \\ &= \begin{cases} 0 & i \neq j \\ \lvert \lvert y \rvert \rvert ^2 & i=j \end{cases} \\ \implies \text{Cov}(Gy,Gy) &= \lvert \lvert y \rvert \rvert ^2 I_{d} \end{align}$$

Above, we found the expectation and covariance in terms of the entry-wise products and sums. We can also do the calculation in terms of matrices.

Proof (matrices)

Expectation
If there is only one random matrix involved, we can write

$$\mathbb{E}[Gy] = \mathbb{E}[G]\,y \;(= 0) \quad (*)$$

$(*)$ In terms of our same setting above. This is a simple result of linearity of expectation.

Covariance
If $g_{1},\dots,g_{m} \in \mathbb{R}^d$ are the columns of $G$ so that each $g_{\alpha} \sim {\cal N}(0, I_{d})$ iid, then we see that

$$\begin{align} \text{Cov}(Gy) &= \mathbb{E}\left[ (Gy)(Gy)^{\intercal} \right] \\ &= \mathbb{E}\left[ \left( \sum_{\alpha=1}^m y_{\alpha} g_{\alpha} \right)\left( \sum_{\beta=1}^m y_{\beta} g_{\beta}^{\intercal} \right) \right] \\ &= \sum_{\alpha,\beta=1}^m y_{\alpha} y_{\beta} \cdot \mathbb{E}\left[ g_{\alpha}g_{\beta}^{\intercal} \right] \\ (*) &= \sum_{\alpha=1}^m y_{\alpha}^2 \cdot I_{d} \\ &= \lvert \lvert y \rvert \rvert ^2 I_{d} \end{align}$$

Where again $(*)$ is in terms of the setting above. In this line we use the law of the $g_{\alpha}$.

see [[gaussian random matrix transforms vectors into gaussian random vectors]]
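
A quick empirical check of the proposition (my own sketch; the dimensions and sample sizes are arbitrary): for a fixed $y$, the vector $Gy$ over many draws of $G$ should have mean $\approx 0$ and covariance $\approx \lvert \lvert y \rvert \rvert ^2 I_{d}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, trials = 3, 20, 100_000

y = rng.standard_normal(m)                 # any fixed y; this choice is arbitrary

# Draw many independent G ~ N(0,1)^{d x m} and record Gy for each draw.
G = rng.standard_normal((trials, d, m))
Gy = G @ y                                  # shape (trials, d)

print("||y||^2        =", round(float(y @ y), 3))
print("empirical mean =", np.round(Gy.mean(axis=0), 3))   # ≈ (0, ..., 0)
print("empirical cov  =\n", np.round(np.cov(Gy.T), 3))    # ≈ ||y||^2 * I_d
```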

Using the same techniques, we can easily find the joint law of $Gy_{1},\dots,Gy_{n}$ for $n < \infty$.

Proposition

Let $y_{1},\dots,y_{n} \in \mathbb{R}^m$ and $G \sim {\cal N}(0,1)^{d \times m}$. Then

$$\text{Law}\left(\begin{bmatrix} Gy_{1} \\ \vdots \\ Gy_{n} \end{bmatrix}\right) = {\cal N}\left(0,\, \begin{bmatrix} \lvert \lvert y_{1} \rvert \rvert ^2 I_{d} & \langle y_{1},y_{2} \rangle I_{d} & \dots & \langle y_{1},y_{n} \rangle I_{d} \\ \langle y_{2} , y_{1} \rangle I_{d} & \lvert \lvert y_{2} \rvert \rvert ^2 I_{d} & \dots & \langle y_{2}, y_{n} \rangle I_{d} \\ \vdots & \vdots & \ddots & \vdots \\ \langle y_{n}, y_{1} \rangle I_{d} & \langle y_{n} , y_{2} \rangle I_{d} & \dots & \lvert \lvert y_{n} \rvert \rvert ^2 I_{d} \end{bmatrix}\right)$$

Or, if we define $Y$ such that the $i$th column is $y_{i}$, then the covariance matrix is given by

$$\text{Cov}\left(\begin{bmatrix} Gy_{1} \\ \vdots \\ Gy_{n} \end{bmatrix}\right) = Y^{\intercal}Y \otimes I_{d}$$
Proof

This follows immediately from [[gaussian random matrix transforms vectors into gaussian random vectors]] by applying the calculation to each block of the covariance matrix. (the expectation is identical)

see [[joint of gaussian random transform of finite vectors]]
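
To see the Kronecker/block structure concretely, here is a small empirical check (my own sketch): stack $(Gy_{1},\dots,Gy_{n})$ over many draws of $G$ and compare the sample covariance to $Y^{\intercal}Y \otimes I_{d}$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n, trials = 2, 10, 3, 200_000

Y = rng.standard_normal((m, n))              # fixed vectors y_1, ..., y_n as columns

# For each draw of G, stack (Gy_1, ..., Gy_n) into a single vector of length n*d.
G = rng.standard_normal((trials, d, m))
stacked = (G @ Y).transpose(0, 2, 1).reshape(trials, n * d)

empirical = np.cov(stacked.T)                # sample (nd x nd) covariance
target = np.kron(Y.T @ Y, np.eye(d))         # Y^T Y  (kron)  I_d
# The gap should be small relative to the entries of Y^T Y (which are O(m) here).
print("max |empirical - target| =",
      round(float(np.abs(empirical - target).max()), 3))
```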

This [[joint of gaussian random transform of finite vectors]] is the characterization of a Gaussian random field, which is a term for a high(er)-dimensional Gaussian Process.


In particular, this characterization associates the [[random vector]] $Gy \in \mathbb{R}^d$ to each fixed vector $y \in \mathbb{R}^m$.

(see [[gaussian random field]])

Geometric Interpretations of Multiplication

From the characterization, we can better understand what G is doing when it transforms vectors.

Expected Magnitude

$$\begin{align} \mathbb{E}[\lvert \lvert Gy \rvert \rvert ^2 ] &= \mathrm{Tr}(\text{Cov}(Gy) ) \\ &= d \lvert \lvert y \rvert \rvert ^2 \end{align}$$

Expected Direction

$$\mathbb{E}[\langle Gy_{1}, Gy_{2} \rangle] = \mathrm{Tr}(\text{Cov}(Gy_{1},Gy_{2})) = d\,\langle y_{1},y_{2} \rangle$$

Exercise

Verify this.

In fact, $G$ acts in this way on the entire [[Gram matrix]] for any finite collection of vectors.

Recall

Let $v_{1},\dots,v_{n} \in \mathbb{F}^k$ be a (finite) collection of vectors in $\mathbb{F}^k$, which has [[inner product]] $\langle \cdot,\cdot \rangle$. The Gram Matrix $G$ of the $v_{i}$ is given by

$$G_{ij} = \langle v_{i},v_{j} \rangle, \qquad G = V^{\intercal}V$$

Where $V$ is the matrix with $v_{i}$ as its $i$th column.

We can see then that if $G \sim {\cal N}(0,1)^{d \times m}$ and $Y \in \mathbb{R}^{m \times n}$, then

Expected Gram Matrix

$$\mathbb{E}[\,(GY)^{\intercal}(GY)\,] = d\, Y^{\intercal}Y$$
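
A small check of this Gram-matrix identity (my own sketch, with arbitrary sizes): averaging $(GY)^{\intercal}(GY)$ over many draws of $G$ should recover $d\,Y^{\intercal}Y$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n, trials = 30, 15, 4, 20_000

Y = rng.standard_normal((m, n))              # fixed point cloud as columns of Y

# Average the random Gram matrix (GY)^T (GY) over many independent draws of G.
acc = np.zeros((n, n))
for _ in range(trials):
    G = rng.standard_normal((d, m))
    GY = G @ Y
    acc += GY.T @ GY
empirical = acc / trials

# The gap should be small relative to the entries of d * Y^T Y.
print("max |avg (GY)^T (GY) - d Y^T Y| ≈",
      round(float(np.abs(empirical - d * (Y.T @ Y)).max()), 3))
```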

Notes on the Gram Matrix

Important

Two point clouds have the same Gram matrix if and only if it is possible to get from one point cloud to the other via some orthogonal transformation.

From the [[joint of gaussian random transform of finite vectors]], we can also consider

$$\hat{G} := \frac{1}{\sqrt{ d }} G, \qquad \text{Law}(\hat{G}) = {\cal N}\left( 0, \frac{1}{d} \right)^{d \times m}$$

Then we can see that

$$\begin{align} \mathbb{E}[\,\lvert \lvert \hat{G}y \rvert \rvert ^2 \,] &= \lvert \lvert y \rvert \rvert ^2 \\ \mathbb{E}[\,\langle \hat{G}y_{1}, \hat{G}y_{2} \rangle \,] &= \langle y_{1}, y_{2} \rangle \\ \mathbb{E}\left[ \,(\hat{G}Y)^{\intercal}(\hat{G}Y)\, \right] &= Y^{\intercal}Y \end{align}$$

ie, in expectation, $\hat{G}$ preserves the lengths, angles, and Gram matrices. That is, it preserves all the geometry of $\mathbb{R}^m$.

see [[normalized gaussian random matrix preserves geometry in expectation]]
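
A sketch of this (mine, not from the notes): for one fixed $y$, the average of $\lvert \lvert \hat{G}y \rvert \rvert ^2$ matches $\lvert \lvert y \rvert \rvert ^2$ at every $d$, but the fluctuations around that average shrink only as $d$ grows, which is exactly the caveat discussed below.

```python
import numpy as np

rng = np.random.default_rng(0)
m, trials = 100, 2_000
y = rng.standard_normal(m)
target = float(y @ y)                          # ||y||^2

# For each d, repeatedly draw G_hat = G / sqrt(d) and record ||G_hat y||^2.
# The mean matches ||y||^2 for every d; the spread shrinks only as d grows.
for d in [1, 10, 100, 1000]:
    sq_norms = np.empty(trials)
    for t in range(trials):
        G_hat = rng.standard_normal((d, m)) / np.sqrt(d)
        sq_norms[t] = float(np.sum((G_hat @ y) ** 2))
    print(f"d = {d:5d}   mean ||Ghat y||^2 ≈ {sq_norms.mean():7.2f} "
          f"(target {target:.2f}),   std ≈ {sq_norms.std():6.2f}")
```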

Note

Note that this holds for any $d$, including $d \ll m$ and $d = 1$. This is because we are taking expectations.

If $y$ is selected first, before we realize $\hat{G}$, then we hope that $\lvert \lvert \hat{G}y \rvert \rvert$ will concentrate about $\lvert \lvert y \rvert \rvert$. Our calculations for the joint show that

$$\text{Law}(\hat{G}y) = {\cal N}\left( 0, \frac{1}{d}\lvert \lvert y \rvert \rvert ^2 I_{d} \right)$$

So (with rescaling, obviously), our [[concentration inequality for magnitude of standard gaussian random vector]] holds and we should see the desired behavior.

Caution

Assuming we ensure $\hat{G}$ is drawn independently of $y$, the probability of actually preserving geometry still depends on $d$.

Example

If $d = 1$, then $\hat{G} = g^{\intercal}$ for some $g \sim {\cal N}(0, I_{m})$. We still have

$$\mathbb{E}[\lvert \lvert \hat{G}y_{i} \rvert \rvert ^2 ] = \mathbb{E}[\langle g,y_{i} \rangle ^2 ] = \lvert \lvert y_{i} \rvert \rvert ^2$$

But clearly $\langle g,y_{i} \rangle ^2 \approx \lvert \lvert y_{i} \rvert \rvert ^2$ cannot hold simultaneously for many $y_{i}$, regardless of whether we choose them before realizing $\hat{G}$.

Exercise

Construct some examples of this case
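
One possible construction (a sketch of mine, not an official solution): take the standard basis vectors $y_{i} = e_{i}$, so that $\langle g, e_{i} \rangle^2 = g_{i}^2$. These average to $1 = \lvert \lvert e_{i} \rvert \rvert ^2$, but individually most of them are nowhere near $1$.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1_000

# Candidate example: y_i = e_i (standard basis), so <g, e_i>^2 = g_i^2.
# These average to 1 = ||e_i||^2, but most individual values are far from 1.
g = rng.standard_normal(m)
sq = g ** 2
print("mean of <g, e_i>^2      :", round(float(sq.mean()), 3))   # ≈ 1
print("min / max               :", round(float(sq.min()), 4), "/",
      round(float(sq.max()), 2))
print("fraction within 10% of 1:", round(float(np.mean(np.abs(sq - 1) < 0.1)), 3))
```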

Caution

If $d < m$, then once $\hat{G}$ is realized, we can pick $y \in \text{Null}(\hat{G})$ so that $0 = \lvert \lvert \hat{G}y \rvert \rvert \neq \lvert \lvert y \rvert \rvert$.

  • In this case, $\hat{G}$ will not preserve the geometry of arbitrary vectors $y$.
  • In particular, we can find vectors $y$ that depend on $\hat{G}$ such that the geometry is badly distorted (see the sketch below).
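
Here is the promised sketch (mine): for $d < m$, an SVD gives a unit vector in $\text{Null}(\hat{G})$ whose image has norm (numerically) zero.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 20                              # d < m, so Null(G_hat) is nontrivial

G_hat = rng.standard_normal((d, m)) / np.sqrt(d)

# The last right-singular vectors of G_hat span its null space (m - d of them
# when d < m); take one of them as the adversarial y.
_, _, Vt = np.linalg.svd(G_hat)
y = Vt[-1]                                 # unit vector with G_hat @ y ≈ 0

print("||y||       =", round(float(np.linalg.norm(y)), 6))    # = 1
print("||G_hat y|| =", float(np.linalg.norm(G_hat @ y)))      # ≈ 0 (machine precision)
```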

Next time:
As long as

  1. $d$ is not too small
  2. The point cloud $y_{1},\dots,y_{n}$ is fixed before drawing $\hat{G}$

then $\hat{G}$ approximately preserves the geometry of the $y_{i}$, acting as an approximate isometry even for $d \ll m$.
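
As a numerical preview of that claim (a sketch of mine; the precise statement and proof come next lecture), fix a point cloud first, draw $\hat{G}$, and measure the worst relative distortion of pairwise distances:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 10_000, 20
Y = rng.standard_normal((m, n))            # point cloud fixed BEFORE drawing G_hat

def max_distortion(d):
    """Worst relative change of a pairwise distance under G_hat = G / sqrt(d)."""
    G_hat = rng.standard_normal((d, m)) / np.sqrt(d)
    Z = G_hat @ Y
    worst = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            orig = np.linalg.norm(Y[:, i] - Y[:, j])
            proj = np.linalg.norm(Z[:, i] - Z[:, j])
            worst = max(worst, abs(proj / orig - 1))
    return worst

# Even for d far smaller than m, the fixed point cloud is only mildly distorted.
for d in [10, 100, 1_000]:
    print(f"d = {d:5d}   worst relative distortion ≈ {max_distortion(d):.3f}")
```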

Review

#flashcards/math
