Random Matrix Lecture 03

[[lecture-data]]

:LiArrowBigLeftDash: Random Matrix Lecture 02 | Random Matrix Lecture 04 :LiArrowBigRightDash:

Class Notes

pg 12 - 15 (2.2 Johnson-Lindenstrauss Lemma)

Summary

  • Random matrices for dimensionality reduction
    • Johnson-Lindenstrauss transform
  • First moment method
  • Some applications

2. Rectangular Matrices

2.2 Dimensionality Reduction and Johnson-Lindenstrauss

From last time, we saw that for a point cloud $y_1, \dots, y_n \in \mathbb{R}^m$, as long as

  1. $d$ is not too small with respect to $m$, and
  2. the $y_i$ are fixed before drawing $\hat{G}$,

then $\hat{G}$ approximately preserves the geometry of the $y_i$.
Definition ($\varepsilon$-faithful)

Suppose $y_1, \dots, y_n \in \mathbb{R}^m$. A function $f : \mathbb{R}^m \to \mathbb{R}^d$ is called $\varepsilon$-faithful on the $y_i$ if for all $i, j \in [n]$ we have

$$(1-\varepsilon)\lVert y_i - y_j \rVert^2 \leq \lVert f(y_i) - f(y_j) \rVert^2 \leq (1+\varepsilon)\lVert y_i - y_j \rVert^2$$

see [[epsilon faithful function]]
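As a quick sanity check, here is a minimal sketch (Python/NumPy; the function name and interface are my own) of testing this condition directly for a given map $f$ and point set:

```python
import numpy as np

def is_eps_faithful(f, ys, eps):
    """Check the epsilon-faithful condition for all pairs y_i, y_j.

    f   : callable mapping a vector in R^m to a vector in R^d
    ys  : (n, m) array whose rows are the points y_1, ..., y_n
    eps : tolerance in (0, 1)
    """
    n = ys.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            orig = np.sum((ys[i] - ys[j]) ** 2)          # ||y_i - y_j||^2
            mapped = np.sum((f(ys[i]) - f(ys[j])) ** 2)  # ||f(y_i) - f(y_j)||^2
            if not ((1 - eps) * orig <= mapped <= (1 + eps) * orig):
                return False
    return True
```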

Theorem (Johnson-Lindenstrauss)

Let $y_1, \dots, y_n \in \mathbb{R}^m$ and fix $\varepsilon \in (0,1)$. Let $\hat{G} \sim \mathcal{N}\left(0, \frac{1}{d}\right)^{\otimes d \times m}$. Suppose that

$$d \geq \frac{24 \log n}{\varepsilon^2}$$

and denote by $(\star)$ the event that multiplication by $\hat{G}$ is pairwise $\varepsilon$-faithful for the $y_i$. Then we have

$$\mathbb{P}[(\star)] \geq 1 - \frac{1}{n}$$
Proof

Note that being pairwise $\varepsilon$-faithful to the $y_i$ requires preserving the $\binom{n}{2}$ squared norms $\lVert y_i - y_j \rVert^2$ within a factor of $1 \pm \varepsilon$.

Note that we can do this for any $\binom{n}{2}$ vectors, and in particular for the differences of the $y_i$ specified in the statement!

Let $x \in \mathbb{R}^m$. Recall that the distribution of $\hat{G}x$ is $\mathcal{N}\left(0, \frac{1}{d} \lVert x \rVert^2 I_d\right)$. Thus we have

$$\begin{align} \mathbb{P}_{\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\, \left\lvert\, \lVert \hat{G}x \rVert^2 - \lVert x \rVert^2 \,\right\rvert > \varepsilon \lVert x \rVert^2 \,\right] &= \mathbb{P}_{g \sim \mathcal{N}\left( 0, \frac{1}{d} \lVert x \rVert^2 I_d \right)} \left[\, \left\lvert\, \lVert g \rVert^2 - \lVert x \rVert^2 \,\right\rvert > \varepsilon \lVert x \rVert^2 \,\right] \\ &= \mathbb{P}_{g \sim \mathcal{N}(0, I_d)} \left[\, \left\lvert\, \frac{1}{d} \lVert x \rVert^2 \cdot \lVert g \rVert^2 - \lVert x \rVert^2 \,\right\rvert > \varepsilon \lVert x \rVert^2 \,\right] \\ &= \mathbb{P}_{g \sim \mathcal{N}(0, I_d)} \left[\, \left\lvert\, \lVert g \rVert^2 - d \,\right\rvert > \varepsilon d \,\right] \end{align}$$
So we can apply our [[concentration inequality for magnitude of standard gaussian random vector]]! Since $\varepsilon < 1$, we have $\varepsilon d < d$ and thus $\min\left\{ \frac{(\varepsilon d)^2}{d}, \varepsilon d \right\} = \varepsilon^2 d$. Thus we have

$$\begin{align} \mathbb{P}_{\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\, \left\lvert\, \lVert \hat{G}x \rVert^2 - \lVert x \rVert^2 \,\right\rvert > \varepsilon \lVert x \rVert^2 \,\right] &\leq 2 \exp\left( -\frac{1}{8} \varepsilon^2 d \right) \\ \implies \mathbb{P}_{\hat{G} \sim \mathcal{N}\left( 0, \frac{1}{d} \right)^{\otimes d\times m}} \left[\, (1-\varepsilon) \lVert x \rVert^2 \leq \lVert \hat{G}x \rVert^2 \leq (1+\varepsilon) \lVert x \rVert^2 \,\right] &\geq 1 - 2\exp\left( -\frac{1}{8}\varepsilon^2 d \right) \end{align}$$

Now, let $S_{ij}$ be the event that $(1-\varepsilon)\lVert x \rVert^2 \leq \lVert \hat{G}x \rVert^2 \leq (1+\varepsilon)\lVert x \rVert^2$ for $x = y_i - y_j$. Then by the union bound, we have

$$\begin{align} \mathbb{P}\left[ S_{ij}^C \text{ for some } i,j \right] &\leq \binom{n}{2} \cdot 2\exp\left( -\frac{1}{8}\varepsilon^2 d \right) \\ &\leq n^2 \exp\left( -\frac{1}{8}\varepsilon^2 d \right) \\ &= \frac{1}{n} \exp\left( 3\log n - \frac{1}{8}\varepsilon^2 d \right) \\ &\leq \frac{1}{n} \exp\left( 3\log n - 3\log n \right) \qquad \left( \text{since } d \geq \frac{24\log n}{\varepsilon^2} \right) \\ &= \frac{1}{n} \end{align}$$

so that $\mathbb{P}[(\star)] \geq 1 - \frac{1}{n}$ as claimed. $\blacksquare$
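As an aside (not from the notes), here is a small Monte Carlo sketch of the single-vector concentration step used above; the parameter choices are arbitrary:

```python
import numpy as np

# For a fixed x, estimate P[ | ||G x||^2 - ||x||^2 | > eps ||x||^2 ] and
# compare it to the bound 2 exp(-eps^2 d / 8).
rng = np.random.default_rng(0)
m, d, eps, trials = 50, 400, 0.25, 5_000

x = rng.standard_normal(m)     # any fixed vector in R^m
x_sq = np.sum(x ** 2)          # ||x||^2

failures = 0
for _ in range(trials):
    G = rng.normal(0.0, np.sqrt(1.0 / d), size=(d, m))  # entries N(0, 1/d)
    gx_sq = np.sum((G @ x) ** 2)
    if abs(gx_sq - x_sq) > eps * x_sq:
        failures += 1

print("empirical failure prob:", failures / trials)
print("theoretical bound     :", 2 * np.exp(-eps**2 * d / 8))
```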
Note

In fact, we can write an expression for the bound on $d$ such that if $d \geq \frac{C(k) \log n}{\varepsilon^2}$ then we can get a success probability of $1 - \frac{1}{n^k}$.

Note

Johnson-Lindenstrauss only deals with the pairwise distances between the $y_i$; there are other aspects of the geometry that it does not address.

Despite this, it is an intuitive result and is still relevant to numerous applications.

see [[Johnson-Lindenstrauss lemma]]
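A small numerical sketch of the theorem (my own illustration, with arbitrary parameter choices): draw $\hat{G}$ with i.i.d. $\mathcal{N}(0, \frac{1}{d})$ entries, take $d \geq \frac{24\log n}{\varepsilon^2}$, and check pairwise $\varepsilon$-faithfulness on a random point cloud:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, eps = 200, 10_000, 0.5
d = int(np.ceil(24 * np.log(n) / eps**2))            # dimension from the theorem

Y = rng.standard_normal((n, m))                      # point cloud y_1, ..., y_n in R^m
G = rng.normal(0.0, np.sqrt(1.0 / d), size=(d, m))   # hat{G} ~ N(0, 1/d)^{d x m}
Z = Y @ G.T                                          # rows are hat{G} y_i in R^d

def pdist_sq(A):
    """Matrix of squared pairwise distances between the rows of A."""
    sq = np.sum(A ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * A @ A.T

i, j = np.triu_indices(n, k=1)
ratios = pdist_sq(Z)[i, j] / pdist_sq(Y)[i, j]
print("d =", d)
print("min ratio:", ratios.min(), " max ratio:", ratios.max())
print("eps-faithful:", np.all((1 - eps <= ratios) & (ratios <= 1 + eps)))
```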

2.2.1 Simple Extensions

It is easy to see that

$$(1-\varepsilon)\lVert y_i \rVert \leq \lVert f(y_i) \rVert \leq (1+\varepsilon)\lVert y_i \rVert$$

with the same probability bound, by adding $0$ to the collection of $y_i$ and increasing $n$ to $n+1$.

2.2.2 Lower Bounds

Is the [[Johnson-Lindenstrauss lemma]] result optimal? Can we reduce the output dimension $d$ further?

The following two papers show that this result is tight up to constants:

2.2.3 Sparse and Fast Johnson-Lindenstrauss Transforms

How can we speed up the multiplications $\hat{G}y_i$?

The main approach is to find special matrices $\hat{G}$ that allow for faster matrix-vector multiplication.
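As a rough illustration (not from the lecture), one classical construction in this direction, due to Achlioptas, replaces the Gaussian entries with sparse $\pm 1$ entries of the same variance, so roughly two thirds of the entries are zero:

```python
import numpy as np

def sparse_jl_matrix(d, m, rng):
    """Sparse random projection in the spirit of Achlioptas (2003):
    entries are +sqrt(3/d) or -sqrt(3/d) with probability 1/6 each,
    and 0 with probability 2/3, so each entry still has variance 1/d
    but most multiplications can be skipped."""
    signs = rng.choice([-1.0, 0.0, 1.0], size=(d, m), p=[1/6, 2/3, 1/6])
    return np.sqrt(3.0 / d) * signs

# Usage: G = sparse_jl_matrix(d, m, np.random.default_rng(0)); z = G @ y
```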

2.2.4 Proof Technique: First Moment Method

The proof above made use of the union bound. This turns out to be a very powerful tool if we choose our events carefully.

If we want to bound the probability of a "bad" event $E$, we can decompose it into $E = E_1 \cup E_2 \cup \dots \cup E_N$ and bound it as

$$\mathbb{P}[E] \leq \sum_{i=1}^{N} \mathbb{P}[E_i] \leq N \max_{i} \left\{ \mathbb{P}[E_i] \right\}$$

Often, for a well-chosen set of $E_i$, the $\mathbb{P}[E_i]$ are about the same. If $N$ is large and $\mathbb{P}[E_i]$ is very small, we can often see a bound with an exponential scale (like we did in the proof above).

This approach is sometimes called the first moment method because we may write

$$\mathbb{P}\left[ \text{some } E_i \text{ occurs} \right] = \mathbb{P}\left[ \sum_{i=1}^{N} \mathbb{1}\{E_i\} \geq 1 \right] \leq \mathbb{E}\left[ \sum_{i=1}^{N} \mathbb{1}\{E_i\} \right] = \sum_{i=1}^{N} \mathbb{P}[E_i]$$

which we get from [[Markov's Inequality]]. Here, we compute the expectation, or first moment, of the random variable $\#E = \lvert \{ i : E_i \text{ occurs} \} \rvert$.
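A toy numerical illustration (my own, with arbitrary parameters): $N$ independent rare events, comparing the probability that some $E_i$ occurs against the first moment bound $\sum_i \mathbb{P}[E_i]$:

```python
import numpy as np

# N rare independent events E_i, each with probability p.
rng = np.random.default_rng(2)
N, p, trials = 50, 1e-3, 200_000

flips = rng.random((trials, N)) < p     # indicator 1{E_i} in each trial
count = flips.sum(axis=1)               # #E = number of events that occur
print("P[some E_i occurs] ~", np.mean(count >= 1))
print("first moment bound  :", N * p)   # sum_i P[E_i] = E[#E]
```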

Note

There is also the "second moment method", which involves computing the second moment of $\#E$. This method is usually used to show that some $E_i$ does occur with high probability.

Example

If $d$ is too small, then $\hat{G}$ is not pairwise $\varepsilon$-faithful.

see [[first moment method]]
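A quick numerical sketch of this failure mode (my own illustration, arbitrary parameters): with $d$ far below $\frac{24\log n}{\varepsilon^2}$, many pairs violate the $\varepsilon$-faithfulness condition:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, eps, d = 200, 1000, 0.5, 5                     # d far below 24 log(n) / eps^2

Y = rng.standard_normal((n, m))
G = rng.normal(0.0, np.sqrt(1.0 / d), size=(d, m))   # hat{G} ~ N(0, 1/d)^{d x m}
Z = Y @ G.T

def pdist_sq(A):
    """Matrix of squared pairwise distances between the rows of A."""
    sq = np.sum(A ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2 * A @ A.T

i, j = np.triu_indices(n, k=1)
ratios = pdist_sq(Z)[i, j] / pdist_sq(Y)[i, j]
bad = np.sum((ratios < 1 - eps) | (ratios > 1 + eps))
print("pairs violating eps-faithfulness:", bad, "of", len(ratios))
```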

Review

TODO

const { dateTime } = await cJS()

return function View() {
	const file = dc.useCurrentFile();
	return <p class="dv-modified">Created {dateTime.getCreated(file)}     ֍     Last Modified {dateTime.getLastMod(file)}</p>
}