2025-02-18 equivariant lecture 3

[[lecture-data]]

2025-02-18

Readings

Summary

Today


Invariant functions on sets
Zaheer et al 2017
Goals:

  1. Define a class of functions that can deal with sets/multisets as inputs
  2. Translate this class of functions into an efficient neural network
  3. Show that it works well (state-of-the-art) for several real-world applications

Sets/multisets: $\{x_1,\dots,x_n\}=\{x_{\sigma(1)},\dots,x_{\sigma(n)}\}$ for all permutations $\sigma:[n]\to[n]$
+++ see notes
We can think of these as $\mathbb{R}^n / S_n$, or unordered tuples from the reals

example
$(1,1,2)=(1,2,1)\neq(1,2)$ as multisets

Motivation

prior implementations of invariant functions on these sets
$f:\mathbb{R}^n\to\mathbb{R}$, $f(x_1,\dots,x_n)=f(x_{\sigma(1)},\dots,x_{\sigma(n)})$

Invariant polynomials

$e_1(x_1,\dots,x_n)=\sum_{i=1}^n x_i,\quad e_2=\sum_{1\le i<j\le n} x_i x_j,\quad \dots,\quad e_n=x_1 x_2\cdots x_n,\quad e_{n+1}=0$
Equivalently, the power sums $p_1=\sum_i x_i,\ p_2=\sum_i x_i^2,\ p_3=\sum_i x_i^3,\ \dots,\ p_n=\sum_i x_i^n$ are also a basis/generating set for the symmetric polynomials
+++++ see notes

This requires $n$ polynomials of degree up to $n$.
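Both generating sets can be checked numerically; a minimal sketch (the helper names are my own, not from the lecture):

```python
# Hedged sketch: evaluate the elementary symmetric polynomials e_k and the
# power sums p_k, and check numerically that both are permutation invariant.
import numpy as np

def elementary_symmetric(x):
    # np.poly gives the coefficients of prod_i (t - x_i): [1, -e1, e2, -e3, ...]
    coeffs = np.poly(x)
    return np.array([(-1) ** k * coeffs[k] for k in range(1, len(x) + 1)])

def power_sums(x):
    x = np.asarray(x, dtype=float)
    return np.array([np.sum(x ** k) for k in range(1, len(x) + 1)])

x = np.array([1.0, 1.0, 2.0])
perm = np.array([2, 0, 1])
assert np.allclose(elementary_symmetric(x), elementary_symmetric(x[perm]))
assert np.allclose(power_sums(x), power_sums(x[perm]))
```

For the earlier example $(1,1,2)$ this gives $e=(4,5,2)$ and $p=(4,6,10)$.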

Another way to describe them

Canonicalization

Given $(x_1,\dots,x_n)$, find $\sigma\in S_n$ that sorts the entries such that
$x_{\sigma(1)}\le x_{\sigma(2)}\le\dots\le x_{\sigma(n)}$
A function $h$ is invariant if and only if $h$ is a function of the sorted entries

choose a "representative" or "template" of my tuple

Deep Sets Paper

Input $X=\{x_1,\dots,x_m\}$, $x_i\in\mathfrak{X}$, where $\mathfrak{X}$ is countable. Consider the space of functions $f:2^{\mathfrak{X}}\to\mathbb{R}$
$f$ is permutation invariant if and only if $f$ can be decomposed as

$f(X)=\rho\left(\sum_{x\in X}\phi(x)\right)$

for suitable $\phi:\mathfrak{X}\to\mathbb{R}$ and $\rho:\mathbb{R}\to\mathbb{R}$

(can think of this as a layer in a neural network)
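A minimal Deep Sets layer as a sketch (the one-layer $\phi$, linear $\rho$, and all shapes are illustrative choices of mine, not from the paper):

```python
# Hedged sketch: f(X) = rho(sum_i phi(x_i)) with a random one-layer phi and a
# linear rho; the sum pooling is what makes the whole map permutation invariant.
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(1, 8))   # phi: R -> R^8 (illustrative width)
W_rho = rng.normal(size=(8, 1))   # rho: R^8 -> R

def deep_sets(x):
    feats = np.tanh(np.asarray(x, dtype=float)[:, None] * W_phi)  # (m, 8)
    pooled = feats.sum(axis=0)     # order-independent pooling over the set
    return (pooled @ W_rho).item()

x = np.array([0.5, -1.0, 2.0])
assert np.isclose(deep_sets(x), deep_sets(x[::-1]))   # invariant to reordering
```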

Wait whoa

With the previous polynomial representation, we needed $n$ features per element.

Note

$f(x)=\rho\left(\sum_{i=1}^M \phi(x_i)\right)$

in 2nd lecture

Here $G=S_M$, the permutations of $M$ objects, so $|G|=M!$, which is a huge set

But Deep Sets tells us that we only need to sum over $M$ objects instead of $M!$ group elements

Proof (sketch)
($\Leftarrow$) Suppose we have a function $f$ of the form

$f(x)=\rho\left(\sum_{i=1}^M \phi(x_i)\right)$

Then $f$ is invariant to any permutation $\pi$:

$f(\pi(x))=\rho\left(\sum_{i=1}^M \phi(x_{\pi(i)})\right)=\rho\left(\sum_{i=1}^M \phi(x_i)\right)$

($\Rightarrow$) (not the cleanest version, but this was the first paper to show it)

  1. use $\phi$ to encode each possible input as a real number
    • injective over the whole set of sets
  2. learn $\rho:\mathbb{R}\to\mathbb{R}$ that executes the desired function on that encoding

Since $\mathfrak{X}$ is countable, there exists a bijection $c:\mathfrak{X}\to\mathbb{N}$. We can encode each subset of $\mathfrak{X}$ (i.e. each element of $2^{\mathfrak{X}}$)

Let $Y\subseteq\mathfrak{X}$. We can encode $Y$ as a binary sequence $0.a_1 a_2 a_3\dots$
such that $a_j=1 \iff x_j\in Y$, where $x_j=c^{-1}(j)$
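The encoding step can be made concrete for finite subsets (a sketch; the toy universe and the identity bijection are my assumptions):

```python
# Hedged sketch: with a bijection c: X -> N, setting phi(x) = 2^{-c(x)} makes
# sum_{x in Y} phi(x) the binary expansion 0.a1a2... of the subset Y; for
# finite subsets this sum is injective, so rho can recover Y from it.
from fractions import Fraction

def phi(x, c):
    return Fraction(1, 2 ** c(x))

def encode(Y, c):
    return sum((phi(x, c) for x in Y), Fraction(0))   # order of Y is irrelevant

c = lambda x: x   # toy universe X = {1, 2, 3, ...} with the identity bijection
assert encode({1, 3}, c) == Fraction(5, 8)            # 0.101 in binary
assert encode({1, 3}, c) != encode({2, 3}, c)         # distinct subsets, distinct codes
```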

There are also better theoretical proofs of the same thing: Tabaghi and Wang 2024
$f:\mathbb{R}^{k\times n}\to\mathbb{R}^s$

If this function is $S_n$ invariant (permutation invariant over the elements of the set), then there exist some
$\phi:\mathbb{R}^k\to\mathbb{R}^L$ and $\rho:\mathbb{R}^L\to\mathbb{R}^s$ such that

$f(V)=\rho\left(\sum_{v_i\in \mathrm{cols}(V)}\phi(v_i)\right)$

and the latent dimension $L$ can be chosen such that $L\le 2kn$

architecture (++++++see notes)

point clouds

invariant functions on point clouds

motivation:

n points in d dimensions

$f:\mathbb{R}^{d\times n}\to\mathbb{R}$, $E(d)$ invariant, and $V\mapsto f(V)$ permutation invariant in the columns

translations

permutations
Want invariance under $O(d)$ and $S_n$

want functions of the Gram matrix $V^\top V$ that are invariant to simultaneous permutations of rows and columns
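A numerical sketch of why the Gram matrix is the right object: it is untouched by $O(d)$ acting on the left, and a column permutation of $V$ becomes conjugation of $V^\top V$ (variable names are mine):

```python
# Hedged sketch: G = V^T V removes the O(d) action and converts column
# permutations of V into simultaneous row/column permutations of G.
import numpy as np

rng = np.random.default_rng(1)
d, n = 3, 5
V = rng.normal(size=(d, n))

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))   # a random orthogonal matrix
assert np.allclose((Q @ V).T @ (Q @ V), V.T @ V)        # O(d) invariance of G

P = np.eye(n)[rng.permutation(n)]              # a random permutation matrix
G = V.T @ V
assert np.allclose((V @ P).T @ (V @ P), P.T @ G @ P)    # permutation -> conjugation
```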

Why invariant theory?

  1. instead of constructing the Gram matrix with $O(n^2)$ complexity, we'll be able to learn invariant functions with $O(nd)$ complexity
  2. ++++ see notes
  3. Galois theory - interesting mathematical properties

Galois Theory (setup)
$S_n$ is acting by conjugation (permuting rows and columns simultaneously)

the $S_n$ above is a subgroup of $S_n\times S_{\binom{n}{2}}$ (which permutes the diagonal entries and the off-diagonal entries independently)
2024-02-18-equivariant.png

++++see notes
$f$ is $S_n$ invariant if $f$ is $S_n\times S_{\binom{n}{2}}$ invariant
$f_d, f_o$ characterize ++++ notes

Galois theory approach

2024-02-18-equivariant-1.png
$\mathrm{Id}\le S_n\le S_n\times S_{\binom{n}{2}}$

++see notes

$\mathbb{R}(f_d,f_o)=\left\{\dfrac{p}{q}\ :\ p,q\ \text{polynomials generated by}\ (f_d,f_o)\right\}$
$f_d=\{\sum_i x_{ii},\ \sum_i x_{ii}^2,\ \dots\}$
$f_o=\{\sum_{i\ne j} x_{ij},\ \sum_{i\ne j} x_{ij}^2,\ \dots\}$

that is, the field generated by $f_d$ and $f_o$; these sets are field generators for the space.
And note that $\mathbb{R}(f_d,f_o,f)$ is an algebraic extension of $\mathbb{R}(f_d,f_o)$.
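The generators $f_d, f_o$ are easy to compute; a sketch checking that they are invariant under conjugation by a permutation (the function names and the truncation at three power sums are mine):

```python
# Hedged sketch: diagonal and off-diagonal power sums of a symmetric matrix,
# both invariant under G -> P^T G P for a permutation matrix P.
import numpy as np

def f_d(G, kmax=3):
    diag = np.diag(G)
    return np.array([np.sum(diag ** k) for k in range(1, kmax + 1)])

def f_o(G, kmax=3):
    off = G[~np.eye(len(G), dtype=bool)]       # all entries with i != j
    return np.array([np.sum(off ** k) for k in range(1, kmax + 1)])

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
G = A + A.T                                    # symmetric test matrix
P = np.eye(4)[rng.permutation(4)]
assert np.allclose(f_d(G), f_d(P.T @ G @ P))
assert np.allclose(f_o(G), f_o(P.T @ G @ P))
```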

(galois theory is not a necessary argument, uses easier argument - construct set of invariants that are only fixed by desired group, then can separate orbits almost everywhere)

using Deep Sets
we can universally approximate the invariant functions everywhere outside an invariant, closed, zero-measure "bad set" where orbits are not separable

From O(n2) to O(nd) invariant features
$V=[v_1\ \cdots\ v_n]\in\mathbb{R}^{d\times n}$

VC=[]

Invariant machine learning model with $O(nd)$ features:
$V\mapsto h(\mathrm{DeepSet}(C^T V),\ C^T C)$, and this is $S_n$ invariant
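A sketch of the claimed invariance (how $C$ is actually constructed is in the elided notes; here $C$ is just a fixed placeholder matrix, and the pooling and outer $h$ are arbitrary stand-ins):

```python
# Hedged sketch: V -> h(DeepSet(C^T V), C^T C). Permuting the columns of V only
# permutes the columns of C^T V, which the sum pooling ignores; C^T C does not
# depend on V at all. Hence the model is S_n invariant.
import numpy as np

rng = np.random.default_rng(3)
d, n = 3, 5
C = rng.normal(size=(d, n))                    # placeholder for the reference matrix

def deepset(M):
    return np.tanh(M).sum(axis=1)              # sum over columns: S_n invariant

def model(V):
    a, B = deepset(C.T @ V), C.T @ C
    return float(a.sum() + np.trace(B))        # arbitrary stand-in for h

V = rng.normal(size=(d, n))
P = np.eye(n)[rng.permutation(n)]
assert np.isclose(model(V), model(V @ P))      # S_n invariance
```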

Theorem (Huang Blum-Smith 2024 etc)