2025-02-19 graphs lecture 9

[[lecture-data]]

2025-02-19

Summary

Last Time

Today

  • GATs (graph attention networks)
  • Expressivity in graph-level tasks
  • Graph isomorphism problem

1. Graph Signals and Graph Signal Processing

Graph Attention (GAT)

$$(x_\ell)_i = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, (x_{\ell-1})_j\, H_\ell\right)$$
Where the αij are the learned graph attention coefficients, computed as

$$\alpha_{ij} = \operatorname{softmax}_{\mathcal{N}(i)}(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})}, \qquad e_{ij} = \rho\big(a\,(x_i H \,\|\, x_j H)\big)$$
Here, we are learning the graph weight matrix $E$ (the matrix of attention coefficients $\alpha_{ij}$) via the learned parameters $a$ and $H$. We can think of $\alpha_{ij}$ as the "relative similarity" between the transformed features of $i$ and $j$, compared to the similarities between $i$ and all of its neighbors.

  • $(\cdot \,\|\, \cdot)$ is the row-wise concatenation operation
    • so $a \in \mathbb{R}^{1 \times 2d}$
    • and each $x_i H \in \mathbb{R}^{d}$
  • ρ is a pointwise nonlinearity, typically leaky ReLU

The learnable parameters for this model are $a_\ell$ and $H_\ell$ for $1 \le \ell \le L$.

Notes

  • This architecture is local because it still respects the graph sparsity pattern, learning the "similarities" of the transformed features at each layer of the network.
  • This architecture does not depend on the size of the graph, only the dimension of the features
  • It can still be expensive to compute, however, if the graph is dense. We need to compute $|E|$ coefficients $\alpha_{ij}$, which can be up to $n^2$ in complete graphs.

This additional flexibility increases the capacity of the architecture (it is less likely to underfit the training data), and this has been observed empirically. But it comes at the cost of a more expensive forward pass.

This architecture is available and implemented in PyTorch Geometric (as the `GATConv` layer).
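A minimal sketch of a two-layer GAT using PyTorch Geometric's `GATConv` (the hidden size, number of heads, and the leaky ReLU between layers are illustrative choices, not from the lecture):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, heads=1):
        super().__init__()
        # Each GATConv learns the attention vector a and the feature transform H
        self.conv1 = GATConv(in_dim, hidden_dim, heads=heads)
        self.conv2 = GATConv(hidden_dim * heads, out_dim, heads=1)

    def forward(self, x, edge_index):
        # The attention coefficients alpha_ij are only computed over existing edges,
        # so the layer respects the graph sparsity pattern
        x = F.leaky_relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)
```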

Note

We can think of the GAT layer $(x_\ell)_i = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} (x_{\ell-1})_j H_\ell\right)$ as a convolutional layer where the GSO $S$ is learned

  • however, this architecture is not convolutional since S is changing at each step.
  • But it can be useful to see how we can write/think about it as something that looks convolutional (if you squint)
Leaky ReLU

The leaky ReLU is a pointwise nonlinearity that is given as:

$$\rho(x) = \begin{cases} x, & x \ge 0 \\ \alpha x, & x < 0 \end{cases}$$

(see leaky ReLU)

the "readout" in graph-level tasks

Idea

In graph-level problems, the output does not need to be a graph signal - it could be a scalar, a class label, or some vector in $\mathbb{R}^d$. That is, $y \in \mathbb{R}$, $y \in \{0, 1, \dots, C\}$, and $y \in \mathbb{R}^{d}$ are all possible.

Question

How do we map GNN layers to such outputs?

Answer

We can use "readout" layers

Readout Layer

A readout layer is an additional layer added to a GNN to achieve the desired output type/dimension for graph-level tasks or other learning tasks on graphs that require an output that is not a graph signal.

(see readout layer)

Example

In node-level tasks (e.g. source localization, community detection, citation networks, etc.), both the input and the output are graph signals: $x \in \mathbb{R}^{n \times d_0}$, $y \in \mathbb{R}^{n \times d_L}$. Thus, the map $\Phi_H$ is composed strictly of GNN layers.

GNN layer equation

$$X_\ell = \sigma(U_\ell) = \sigma\left(\sum_{k=0}^{K-1} S^k X_{\ell-1} H_{\ell,k}\right)$$

When we learn our GNN layers, we fix $S$ and learn only the coefficient matrices $H_{\ell,k} \in \mathbb{R}^{d_{\ell-1} \times d_\ell}$, whose size does not depend on the graph size.
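A minimal PyTorch sketch of one such layer (the class name, dimensions, and the choice of ReLU as $\sigma$ are illustrative assumptions, not the lecture's implementation):

```python
import torch

class GraphFilterLayer(torch.nn.Module):
    def __init__(self, d_in, d_out, K):
        super().__init__()
        # One d_in x d_out coefficient matrix H_{l,k} per filter tap k = 0, ..., K-1
        self.H = torch.nn.Parameter(torch.randn(K, d_in, d_out) / d_in ** 0.5)

    def forward(self, S, X):
        # U = sum_k S^k X H_{l,k}, computed by repeatedly applying the GSO S
        U, SkX = 0, X
        for k in range(self.H.shape[0]):
            U = U + SkX @ self.H[k]
            SkX = S @ SkX
        return torch.relu(U)  # sigma = ReLU here
```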

Option 1

fully connected readout layer

In a fully connected readout layer we define

$$y = \rho\big(C \operatorname{vec}(x_L)\big)$$

Where

  • $C \in \mathbb{R}^{d \times n d_L}$
  • $\operatorname{vec}(x_L) \in \mathbb{R}^{n d_L}$
    • in general, $\operatorname{vec}(\cdot)$ vectorizes $\mathbb{R}^{m \times n}$ matrices into $\mathbb{R}^{mn}$ vectors
  • $\rho$ can be the identity or some other pointwise nonlinearity (ReLU, softmax, etc.)

(see fully connected readout layer)
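A minimal sketch of this readout in PyTorch (the dimensions $n$, $d_L$, $d$ below are arbitrary illustrative values):

```python
import torch

n, d_L, d = 10, 16, 3            # number of nodes, last GNN layer width, output dimension
x_L = torch.randn(n, d_L)        # last-layer GNN output (a graph signal)
C = torch.randn(d, n * d_L)      # readout matrix: note that its size depends on n

y = torch.relu(C @ x_L.reshape(n * d_L))   # y = rho(C vec(x_L)), with rho = ReLU here
print(y.shape)                             # torch.Size([3])
```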

Note

There are some downsides of a fully connected readout layer

  • The number of parameters depends on $n$ - it adds $n d_L d$ learnable parameters, which grows with the graph size. This does not scale well to (families of) large graphs
  • No longer permutation invariant because of the $\operatorname{vec}(\cdot)$ operation
Exercise

Verify that fully connected readout layers are no longer permutation invariant (a quick numerical sketch follows this list)

  • No longer transferable across graphs.
    • unlike the GNN layer parameters, $C$ depends on $n$, so if the number of nodes $n$ changes, we have to relearn $C$
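A quick numerical sketch of the permutation-invariance failure (dimensions and the particular relabeling are arbitrary illustrative choices):

```python
import torch

n, d_L, d = 4, 3, 2
x_L = torch.randn(n, d_L)              # last-layer GNN output
C = torch.randn(d, n * d_L)            # fully connected readout matrix
perm = torch.tensor([1, 0, 2, 3])      # relabel the graph by swapping the first two nodes

y  = C @ x_L.reshape(-1)               # readout under the original node ordering
yp = C @ x_L[perm].reshape(-1)         # readout after relabeling the nodes
print(torch.allclose(y, yp))           # generally False: the output depends on node ordering
```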

These drawbacks make the fully connected readout a not-so-attractive option, so we usually use an aggregation readout layer instead

Aggregation Readout Layer

The aggregation readout layer is a readout layer given by

$$y = \operatorname{aggr}(x_L C)$$

where $C \in \mathbb{R}^{d_L \times d}$ and $\operatorname{aggr} : \mathbb{R}^{n \times d} \to \mathbb{R}^{1 \times d}$ is a node-level aggregation
2025-02-19-graph.png
Typical aggregation functions are the mean, sum, max, min, median, etc

Note

  • this is now independent of n (my aggregation functions don't change based on the size of my graph)
  • permutation invariant since the aggregation functions are also permutation invariant
  • this remains transferable across graphs, unlike the fully connected readout layer
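A minimal sketch of a mean-aggregation readout using PyTorch Geometric's `global_mean_pool` (the linear map playing the role of $C$ and the dimensions are illustrative):

```python
import torch
from torch_geometric.nn import global_mean_pool

d_L, d = 16, 3
C = torch.nn.Linear(d_L, d, bias=False)     # plays the role of the d_L x d matrix C

def readout(x_L, batch):
    # x_L: [num_nodes, d_L] last-layer node features; batch: graph index of each node
    # Averaging over nodes is permutation invariant and independent of the graph size n
    return global_mean_pool(C(x_L), batch)   # -> [num_graphs, d]
```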

graph isomorphism problem

Graph Isomorphism

Let $G$ and $G'$ be two graphs. A graph isomorphism between $G$ and $G'$ is a bijection $M : V(G) \to V(G')$ such that for all $i, j \in V(G)$

$$(i, j) \in E(G) \iff (M(i), M(j)) \in E(G')$$

(see graph isomorphism)

A graph isomorphism exists exactly when the two graphs are equivalent, i.e., identical up to a relabeling of the nodes.
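A brute-force check that follows this definition directly (only feasible for very small graphs, since it tries all $n!$ bijections; written as an illustrative sketch over adjacency matrices):

```python
import itertools
import numpy as np

def are_isomorphic(A, B):
    """Test whether adjacency matrices A and B describe isomorphic graphs."""
    n = len(A)
    if len(B) != n:
        return False
    # Try every bijection M and check (i, j) in E(G)  <=>  (M(i), M(j)) in E(G')
    for perm in itertools.permutations(range(n)):
        P = np.eye(n, dtype=int)[list(perm)]   # permutation matrix representing M
        if np.array_equal(P @ A @ P.T, B):
            return True
    return False
```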

Question

Can we train a GNN to detect if graphs are isomorphic?

  • ie, to produce identical outputs for graphs in the same equivalence class/graphs which are isomorphic, and different outputs for graphs that are not isomorphic?

Consider two graphs $G$ and $G'$ with Laplacians $L = V \Lambda V^T$ and $L' = V' \Lambda' V'^T$. Assume they do not have node features (but we can impute them). Consider the graph convolution

$$y = \sum_{k=0}^{K-1} h_k L^k x \qquad (*)$$

Suppose there is some $\lambda'_j$ such that $\lambda'_j \neq \lambda_i$ for all $i$ - i.e., $L'$ has at least one eigenvalue that is different from those of $L$.

We can then verify that graphs without node features but with different Laplacian eigenvalues are not isomorphic:

Theorem

Consider two graphs $G$ and $G'$ without node features, with Laplacians $L = V \Lambda V^T$ and $L' = V' \Lambda' V'^T$. Further suppose that there is some $\lambda'_j$ such that $\lambda'_j \neq \lambda_i$ for all $i$ - i.e., $L'$ has at least one eigenvalue that is different from those of $L$.

Then there exists a graph convolution we can use to verify that the two graphs are NOT isomorphic, i.e., to certify that $G \not\cong G'$.

Proof

Consider the following single-layer (linear) GNN

$$y = \sum_{k=0}^{K-1} h_k L^k x \qquad (*)$$

Define the graph feature $x$ to be white, i.e., such that $\mathbb{E}[\hat{x}_i] = 1$ for all $i = 1, \dots, n$, where $\hat{x} = V^T x$ is the graph Fourier transform of $x$.

Set $h_k = \begin{cases} 1 & \text{if } k = 1 \\ 0 & \text{otherwise} \end{cases}$ so that $(*)$ reduces to $y = Lx$.

Then we have

$$\mathbb{E}[\hat{y}] = \mathbb{E}[V^T y] = \mathbb{E}[V^T L x] = \mathbb{E}[V^T V \Lambda V^T x] = \mathbb{E}[\Lambda \hat{x}] = \Lambda\, \mathbb{E}[\hat{x}] = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{bmatrix}$$

And if there is some $\lambda'_j$ such that $\lambda'_j \neq \lambda_i$ for all $i$ as above, then it suffices to compare

$$\mathbb{E}[\mathbf{1}^T \hat{y}] = \sum_i \lambda_i = \operatorname{Tr}(L) \quad \text{to} \quad \mathbb{E}[\mathbf{1}^T \hat{y}'] = \sum_i \lambda'_i = \operatorname{Tr}(L').$$

Since the Laplacian is positive semidefinite, a difference between these sums certifies that the spectra differ, and hence that $G \not\cong G'$. (Comparing a single trace is not conclusive on its own, since any two graphs with the same number of edges have equal $\operatorname{Tr}(L) = 2|E|$; repeating the construction with other filter taps compares $\operatorname{Tr}(L^k) = \sum_i \lambda_i^k$ for higher powers $k$, and these power sums together determine the whole eigenvalue multiset.)

(see the paper "Graph Neural Networks Are More Powerful Than You Think")
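A small numerical sketch of this spectral check (the two toy adjacency matrices are illustrative, not the graphs from the lecture):

```python
import numpy as np

def laplacian_spectrum(A):
    # Combinatorial Laplacian L = D - A and its sorted eigenvalues
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))

# Toy example: a path on 3 nodes vs. a triangle
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A_tri  = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])

# Different spectra certify non-isomorphism; equal spectra are inconclusive
# (cospectral non-isomorphic graphs exist, as in the example below)
print(np.allclose(laplacian_spectrum(A_path), laplacian_spectrum(A_tri)))  # False
```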

However, there are many non-isomorphic graphs that share the same Laplacian eigenvalues.

Example

2025-02-19-graph-1.png
These graphs are clearly non-isomorphic, but their Laplacian eigenvalues are the same!

So the eigenvalue-based heuristic fails for graphs like these. A simple heuristic that does distinguish them is counting degrees: the problem then amounts to comparing the multisets of node degrees of the two graphs.

(continuing the example from above)
$G_1$ has degrees $\{1, 3, 3, 2, 2, 3\}$ and $G_2$ has degrees $\{2, 2, 2, 4, 2, 2\}$, so the two degree multisets differ and the graphs cannot be isomorphic.

A stronger alternative that generalizes this idea is the Weisfeiler-Leman graph isomorphism test, which consists of running the color refinement algorithm on both graphs and comparing the resulting multisets of colors: if they differ, the graphs are certifiably not isomorphic (if they agree, the test is inconclusive).

Motivation

Color refinement generalizes the degree-counting heuristic: starting from identical colors, the colors after one round encode the node degrees, and subsequent rounds refine them using the colors of each node's neighbors.

The color refinement algorithm is an algorithm to assign colorings to nodes.

Algorithm

Given a graph $G = (V, E)$ with node features $x$:

let $c_i^{(0)} = f(\varnothing, \{x_i\})$ for all $i \in V$ (assign initial colors based on the node feature values)
let $t = 1$

while True:

  1. let $c_i^{(t)} = f\big(c_i^{(t-1)}, \{\!\{ c_j^{(t-1)} : j \in \mathcal{N}(i) \}\!\}\big)$ for all $i \in V$
  • assign each node a new color based on its own previous color and the multiset of its neighbors' previous colors
  2. if $c_i^{(t)} = c_i^{(t-1)}$ for all $i$, break; else set $t = t + 1$ and repeat

Here, $f$ is a hash function that assigns colors.
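A minimal Python sketch of color refinement (the graph is given as adjacency lists, and compact integer ids computed from sorted signatures play the role of the hash $f$; an illustrative implementation, not the lecture's):

```python
def color_refinement(neighbors, features=None):
    """neighbors: dict {node: iterable of neighbor nodes}; features: optional dict of node features."""
    nodes = list(neighbors)
    # Initial colors from node features (all identical if there are no features)
    colors = {i: (features[i] if features else 0) for i in nodes}
    while True:
        # Signature = (own color, multiset of neighbor colors stored as a sorted tuple)
        sig = {i: (colors[i], tuple(sorted(colors[j] for j in neighbors[i]))) for i in nodes}
        # Relabel signatures with compact color ids (this plays the role of the hash f)
        palette = {s: k for k, s in enumerate(sorted(set(sig.values())))}
        new_colors = {i: palette[sig[i]] for i in nodes}
        # Refinement only ever splits color classes, so if no class was split we are done
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors

# Example: the path 1 - 2 - 3 with no node features
print(color_refinement({1: [2], 2: [1, 3], 3: [2]}))  # {1: 0, 2: 1, 3: 0}
```

To use this as the Weisfeiler-Leman test, run the refinement on both graphs jointly (for example on their disjoint union, so the color ids share one palette) and compare the multisets of final colors of the two graphs.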

Example

2025-02-19-graph-2.png

Color refinement for G1

No node features, so assume all 1.
Assign colors based on node features: $c_1^{(0)} = c_2^{(0)} = c_3^{(0)} = f(\varnothing, 1) = k_1$

step t=1:

$c_1^{(1)} = f(k_1, \{k_1\}) = k_1$
$c_3^{(1)} = f(k_1, \{k_1\}) = k_1$
$c_2^{(1)} = f(k_1, \{k_1, k_1\}) = k_2$

Evaluate: did the colors change?

  • Yes: continue!

step t=2:

$c_1^{(2)} = f(k_1, \{k_2\}) = k_1$
$c_3^{(2)} = f(k_1, \{k_2\}) = k_1$
$c_2^{(2)} = f(k_2, \{k_1, k_1\}) = k_2$

Evaluate: did the colors change?

  • No: the partition into color classes did not change, so we are done!
  • This is the final coloring.