\end{align}$$
So we can apply our [[concentration inequality for magnitude of standard gaussian random vector]]! Since , we have and thus . Thus we have
Now, let be the event that for . Then by the union bound, we have
Note
In fact, we can write an expression for the bound on such that if then we can get a success probability of .
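For reference, the standard form of this bound (stated with notation introduced here, so the symbols may not match the ones used above: $n$ points, target dimension $k$, distortion $\varepsilon$, failure probability $\delta$) is
$$k \ge \frac{C}{\varepsilon^{2}}\log\frac{n}{\delta}$$
for an absolute constant $C$: with this choice of $k$, all pairwise distances are preserved to within a factor of $1 \pm \varepsilon$ with probability at least $1-\delta$.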
Note
Johnson-Lindenstrauss only deals with the pairwise distances between the points; there are other aspects of the geometry that it does not address.
Despite this, it is an intuitive result and is still relevant to numerous applications.
see [[Johnson-Lindenstrauss lemma]]
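As a quick empirical illustration, here is a minimal sketch in Python/NumPy of the Gaussian construction (assuming, as in the proof above, a projection matrix with i.i.d. $N(0, 1/k)$ entries; the dimensions and names below are chosen arbitrarily for the demo):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 2000, 500                        # points, ambient dim, target dim

U = rng.standard_normal((n, d))                # n arbitrary points in R^d
Pi = rng.standard_normal((k, d)) / np.sqrt(k)  # entries i.i.d. N(0, 1/k)
V = U @ Pi.T                                   # projected points in R^k

# Worst-case relative distortion of squared pairwise distances.
worst = 0.0
for i in range(n):
    for j in range(i + 1, n):
        orig = np.sum((U[i] - U[j]) ** 2)
        proj = np.sum((V[i] - V[j]) ** 2)
        worst = max(worst, abs(proj / orig - 1.0))
print(f"worst relative distortion: {worst:.3f}")
```

With $k$ on the order of $\varepsilon^{-2}\log n$, the printed distortion should come out below the chosen $\varepsilon$ with high probability.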
2.2.1 Simple Extensions
It is easy to see that
with the same probability bound, by adding the zero vector to the collection of points and increasing the number of points by one.
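A sketch of the argument, with notation introduced here (the notes' symbols may differ): apply the lemma to the $n+1$ points $u_1, \dots, u_n, 0$. A linear map $\Pi$ sends $0$ to $0$, so preserving the pairwise distances of this enlarged collection in particular preserves the distances $\|u_i - 0\| = \|u_i\|$, giving
$$(1-\varepsilon)\|u_i\|^2 \le \|\Pi u_i\|^2 \le (1+\varepsilon)\|u_i\|^2 \quad \text{for all } i$$
with the same probability bound as before.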
2.2.2 Lower Bounds
Is the [[Johnson-Lindenstrauss lemma]] result optimal? Can we reduce the output dimension more?
The original paper showed that dimension is required independent of .
The following two papers show that this result is tight up to constants:
Kasper Green Larsen and Jelani Nelson. The Johnson-Lindenstrauss lemma is optimal for linear dimensionality reduction. arXiv preprint arXiv:1411.2404, 2014.
Kasper Green Larsen and Jelani Nelson. Optimality of the Johnson-Lindenstrauss lemma. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pages 633–638. IEEE, 2017.
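Concretely (glossing over the exact range of $\varepsilon$ each paper covers), they exhibit $n$-point sets for which any embedding with distortion $1 \pm \varepsilon$ requires target dimension
$$\Omega\!\left(\frac{\log n}{\varepsilon^{2}}\right),$$
matching the upper bound above; the 2014 paper handles linear maps and the 2017 paper arbitrary maps.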
2.2.3 Sparse and Fast Johnson-Lindenstrauss Transforms
How can we speed up the matrix-vector multiplications required to apply the map?
The main approach to this is to find special matrices that allow for faster matrix-vector multiplication.
Another option is the fast [[Fourier Transform]]: multiply by the discrete Fourier transform matrix (which can be applied quickly via the FFT), then subsample entries of the result.
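To make the "multiply by the DFT matrix, then subsample" idea concrete, here is a toy sketch in Python/NumPy. The random sign flip, the uniform subsampling, and the scaling are assumptions made for this illustration; they are in the spirit of fast JL constructions but not a verbatim implementation of any particular one.

```python
import numpy as np

def fft_subsample_sketch(x, k, rng):
    """Map x in R^d to k (complex) coordinates: sign flip, FFT, subsample."""
    d = x.shape[0]
    signs = rng.choice([-1.0, 1.0], size=d)      # random diagonal sign matrix
    y = np.fft.fft(signs * x)                    # DFT in O(d log d) time via the FFT
    rows = rng.choice(d, size=k, replace=False)  # keep k of the d output coordinates
    # np.fft.fft is unnormalized (the unitary DFT is fft/sqrt(d)); subsampling k of d
    # coordinates needs a sqrt(d/k) rescaling, so overall we divide by sqrt(k).
    return y[rows] / np.sqrt(k)

rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
y = fft_subsample_sketch(x, k=256, rng=rng)
print(np.linalg.norm(x) ** 2, np.linalg.norm(y) ** 2)  # close on average
```

The point is that applying this map costs $O(d \log d)$ time, instead of the $O(kd)$ time of a dense matrix-vector product.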
2.2.4 Proof Technique: First Moment Method
The proof above relied on the union bound. This turns out to be a very powerful tool if we can carefully choose our events.
If we want to bound the probability of a "bad" event $B$, we can decompose it into smaller events $B = \bigcup_i B_i$ and bound it as $\Pr[B] \le \sum_i \Pr[B_i]$.
Often, for a well-chosen set of events $B_i$, the probabilities $\Pr[B_i]$ are roughly the same. If the number of events is large and each $\Pr[B_i]$ is very small, we can often still get a bound that is exponentially small (like we did in the proof above).
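For instance, in the proof above (constants and notation here are illustrative): there is one bad event per pair of points, so fewer than $n^2$ events, each of probability at most $2e^{-c\varepsilon^{2}k}$, and the union bound gives
$$\Pr[\text{some pair is distorted}] \le n^2 \cdot 2e^{-c\varepsilon^{2}k} \le \delta \quad \text{once} \quad k \ge \frac{1}{c\varepsilon^{2}}\log\frac{2n^{2}}{\delta}.$$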
This approach is sometimes called the first moment method because we may write
$$\Pr\Big[\bigcup_i B_i\Big] = \Pr\Big[\sum_i \mathbb{1}_{B_i} \ge 1\Big] \le \mathbb{E}\Big[\sum_i \mathbb{1}_{B_i}\Big] = \sum_i \Pr[B_i],$$
which we get from [[Markov's Inequality]]. Here, we compute the expectation, or first moment, of the random variable $\sum_i \mathbb{1}_{B_i}$.
Note
There is also the "second moment method", which involves computing the second moment of such a counting random variable. This method is usually used to show that some event does occur with high probability.