\end{align}$$
So we can apply our [[concentration inequality for magnitude of standard gaussian random vector]]! Since , we have and thus . Thus we have
Now, let be the event that for . Then by the union bound, we have
Note
In fact, we can write an expression for the bound on such that if then we can get a success probability of .
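For reference, the standard form of this bound (stated with notation introduced here, so the symbols may not match the ones used above: $n$ points, target dimension $k$, distortion $\varepsilon$, failure probability $\delta$) is
$$k \ge \frac{C}{\varepsilon^{2}}\log\frac{n}{\delta}$$
for an absolute constant $C$: with this choice of $k$, all pairwise distances are preserved to within a factor of $1 \pm \varepsilon$ with probability at least $1-\delta$.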
Note
Johnson-Lindenstrauss only deals with the pairwise distances between the points; there are other aspects of the geometry that it does not address.
Despite this, it is an intuitive result and is still relevant to numerous applications.
see [[Johnson-Lindenstrauss lemma]]
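As a quick empirical illustration, here is a minimal sketch in Python/NumPy of the Gaussian construction (assuming, as in the proof above, a projection matrix with i.i.d. $N(0, 1/k)$ entries; the dimensions and names below are chosen arbitrarily for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 2000, 500                        # points, ambient dim, target dim

U = rng.standard_normal((n, d))                # n arbitrary points in R^d
Pi = rng.standard_normal((k, d)) / np.sqrt(k)  # entries i.i.d. N(0, 1/k)
V = U @ Pi.T                                   # projected points in R^k

# Worst-case relative distortion of squared pairwise distances.
worst = 0.0
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((U[i] - U[j]) ** 2)
        proj = np.sum((V[i] - V[j]) ** 2)
        worst = max(worst, abs(proj / orig - 1.0))
print(f"worst relative distortion: {worst:.3f}")
```

With $k$ on the order of $\varepsilon^{-2}\log n$, the printed distortion should come out below the chosen $\varepsilon$ with high probability.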
2.2.1 Simple Extensions
It is easy to see that
with the same probability bound, by adding the zero vector to the collection of points and increasing the number of points by one.
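A sketch of the argument, with notation introduced here (the notes' symbols may differ): apply the lemma to the $n+1$ points $u_1, \dots, u_n, 0$. A linear map $\Pi$ sends $0$ to $0$, so preserving the pairwise distances of this enlarged collection in particular preserves the distances $\|u_i - 0\| = \|u_i\|$, giving
$$(1-\varepsilon)\|u_i\|^2 \le \|\Pi u_i\|^2 \le (1+\varepsilon)\|u_i\|^2 \quad \text{for all } i$$
with the same probability bound as before.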
2.2.2 Lower Bounds
Is the [[Johnson-Lindenstrauss lemma]] result optimal? Can we reduce the output dimension more?
The original paper showed that dimension is required independent of .
The following two papers show that this result is tight up to constants:
Kasper Green Larsen and Jelani Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. arXiv preprint arXiv:1411.2404, 2014.
Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss lemma. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 633–638. IEEE, 2017.
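Concretely (glossing over the exact range of $\varepsilon$ each paper covers), they exhibit $n$-point sets for which any embedding with distortion $1 \pm \varepsilon$ requires target dimension
$$\Omega\!\left(\frac{\log n}{\varepsilon^{2}}\right),$$
matching the upper bound above; the 2014 paper handles linear maps and the 2017 paper arbitrary maps.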
2.2.3 Sparse and Fast Johnson-Lindenstrauss Transforms
How can we speed up the matrix-vector multiplications required to apply the map?
The main approach to this is to find special matrices that allow for faster matrix-vector multiplication.
Another option is the fast [[Fourier Transform]]: multiply by the discrete Fourier transform matrix (which can be applied quickly via the FFT), then subsample entries of the result.
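To make the "multiply by the DFT matrix, then subsample" idea concrete, here is a toy sketch in Python/NumPy. The random sign flip, the uniform subsampling, and the scaling are assumptions made for this illustration; they are in the spirit of fast JL constructions but not a verbatim implementation of any particular one.

```python
import numpy as np

def fft_subsample_sketch(x, k, rng):
    """Map x in R^d to k (complex) coordinates: sign flip, FFT, subsample."""
    d = x.shape[0]
    signs = rng.choice([-1.0, 1.0], size=d)      # random diagonal sign matrix
    y = np.fft.fft(signs * x)                    # DFT in O(d log d) time via the FFT
    rows = rng.choice(d, size=k, replace=False)  # keep k of the d output coordinates
    # np.fft.fft is unnormalized (the unitary DFT is fft/sqrt(d)); subsampling k of d
    # coordinates needs a sqrt(d/k) rescaling, so overall we divide by sqrt(k).
    return y[rows] / np.sqrt(k)

rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
y = fft_subsample_sketch(x, k=256, rng=rng)
print(np.linalg.norm(x) ** 2, np.linalg.norm(y) ** 2)  # close on average
```

The point is that applying this map costs $O(d \log d)$ time, instead of the $O(kd)$ time of a dense matrix-vector product.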
2.2.4 Proof Technique: First Moment Method
The proof above relied on the union bound. This turns out to be a very powerful tool if we can carefully choose our events.
If we want to bound the probability of a "bad" event $B$, we can decompose it into smaller events $B = \bigcup_i B_i$ and bound it as $\Pr[B] \le \sum_i \Pr[B_i]$.
Often, for a well-chosen set of events $B_i$, the probabilities $\Pr[B_i]$ are roughly the same. If the number of events is large and each $\Pr[B_i]$ is very small, we can often still get a bound that is exponentially small (like we did in the proof above).
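For instance, in the proof above (constants and notation here are illustrative): there is one bad event per pair of points, so fewer than $n^2$ events, each of probability at most $2e^{-c\varepsilon^{2}k}$, and the union bound gives
$$\Pr[\text{some pair is distorted}] \le n^2 \cdot 2e^{-c\varepsilon^{2}k} \le \delta \quad \text{once} \quad k \ge \frac{1}{c\varepsilon^{2}}\log\frac{2n^{2}}{\delta}.$$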
This approach is sometimes called the first moment method because we may write
$$\Pr\Big[\bigcup_i B_i\Big] = \Pr\Big[\sum_i \mathbb{1}_{B_i} \ge 1\Big] \le \mathbb{E}\Big[\sum_i \mathbb{1}_{B_i}\Big] = \sum_i \Pr[B_i],$$
which we get from [[Markov's Inequality]]. Here, we compute the expectation, or first moment, of the random variable $\sum_i \mathbb{1}_{B_i}$.
Note
There is also the "second moment method", which involves computing the second moment of such a counting random variable. This method is usually used to show that some event does occur with high probability.