2025-02-19 graphs lecture 9

[[lecture-data]]

2025-02-19

Summary

Last Time

Today

  • GATs (graph attention networks)
  • Expressivity in graph-level tasks
  • Graph isomorphism problem

1. Graph Signals and Graph Signal Processing

Graph Attention (GAT)

$$(x_\ell)_i = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij}\, (x_{\ell-1})_j\, H_\ell\right)$$
Where the αij are the learned graph attention coefficients, computed as

$$\alpha_{ij} = \operatorname{softmax}_{\mathcal{N}(i)}(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}(i)} \exp(e_{ik})}, \qquad e_{ij} = \rho\big(a\,(x_i H \,\|\, x_j H)\big)$$
Here, we are learning the graph weight matrix $E$ (the matrix of attention coefficients $\alpha_{ij}$) via the learned parameters $a$ and $H$. We can think of $\alpha_{ij}$ as the "relative similarity" between the transformed features of $i$ and $j$, compared to the similarities between $i$ and all of its neighbors.

  • $(\cdot \,\|\, \cdot)$ is the row-wise concatenation operation
    • so $a \in \mathbb{R}^{1 \times 2d}$
    • and each $x_i H \in \mathbb{R}^{d}$
  • ρ is a pointwise nonlinearity, typically leaky ReLU

The learnable parameters for this model are $a_\ell$ and $H_\ell$ for $1 \le \ell \le L$.

Notes

  • This architecture is local because it still respects the graph sparsity pattern, learning the "similarities" of the transformed features at each layer of the network.
  • This architecture does not depend on the size of the graph, only the dimension of the features
  • It can still be expensive to compute, however, if the graph is dense. We need to compute $|E|$ coefficients $\alpha_{ij}$, which can be up to $n^2$ in complete graphs.

This additional flexibility increases the capacity of the architecture (it is less likely to underfit the training data), and this has been observed empirically. But it comes at the cost of a more expensive forward pass.

This architecture is available and implemented in PyTorch Geometric (as the `GATConv` layer).
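A minimal sketch of a two-layer GAT using PyTorch Geometric's `GATConv` (the hidden size, number of heads, and the leaky ReLU between layers are illustrative choices, not from the lecture):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class GAT(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, heads=1):
        super().__init__()
        # Each GATConv learns the attention vector a and the feature transform H
        self.conv1 = GATConv(in_dim, hidden_dim, heads=heads)
        self.conv2 = GATConv(hidden_dim * heads, out_dim, heads=1)

    def forward(self, x, edge_index):
        # The attention coefficients alpha_ij are only computed over existing edges,
        # so the layer respects the graph sparsity pattern
        x = F.leaky_relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)
```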

Note

We can think of the GAT layer $(x_\ell)_i = \sigma\left(\sum_{j \in \mathcal{N}(i)} \alpha_{ij} (x_{\ell-1})_j H_\ell\right)$ as a convolutional layer where the GSO $S$ is learned

  • however, this architecture is not convolutional since S is changing at each step.
  • But it can be useful to see how we can write/think about it as something that looks convolutional (if you squint)
Leaky ReLU

The leaky ReLU is a pointwise nonlinearity that is given as:

$$\rho(x) = \begin{cases} x, & x \ge 0 \\ \alpha x, & x < 0 \end{cases}$$

(see leaky ReLU)

the "readout" in graph-level tasks

Idea

In graph-level problems, the output does not need to be a graph signal - it could be a scalar, a class label, or some vector in $\mathbb{R}^d$. That is, $y \in \mathbb{R}$, $y \in \{0, 1, \dots, C\}$, and $y \in \mathbb{R}^{d}$ are all possible.

Question

How do we map GNN layers to such outputs?

Answer

We can use "readout" layers

Readout Layer

A readout layer is an additional layer added to a GNN to achieve the desired output type/dimension for graph-level tasks or other learning tasks on graphs that require an output that is not a graph signal.

(see readout layer)

Example

In node-level tasks (e.g. source localization, community detection, citation networks, etc.), both the input and the output are graph signals: $x \in \mathbb{R}^{n \times d_0}$, $y \in \mathbb{R}^{n \times d_L}$. Thus, the map $\Phi_H$ is composed strictly of GNN layers.

GNN layer equation

$$X_\ell = \sigma(U_\ell) = \sigma\left(\sum_{k=0}^{K-1} S^k X_{\ell-1} H_{\ell,k}\right)$$

When we learn our GNN layers, we fix $S$ and learn only the coefficient matrices $H_{\ell,k} \in \mathbb{R}^{d_{\ell-1} \times d_\ell}$, whose size does not depend on the graph size.
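A minimal PyTorch sketch of one such layer (the class name, dimensions, and the choice of ReLU as $\sigma$ are illustrative assumptions, not the lecture's implementation):

```python
import torch

class GraphFilterLayer(torch.nn.Module):
    def __init__(self, d_in, d_out, K):
        super().__init__()
        # One d_in x d_out coefficient matrix H_{l,k} per filter tap k = 0, ..., K-1
        self.H = torch.nn.Parameter(torch.randn(K, d_in, d_out) / d_in ** 0.5)

    def forward(self, S, X):
        # U = sum_k S^k X H_{l,k}, computed by repeatedly applying the GSO S
        U, SkX = 0, X
        for k in range(self.H.shape[0]):
            U = U + SkX @ self.H[k]
            SkX = S @ SkX
        return torch.relu(U)  # sigma = ReLU here
```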

Option 1

fully connected readout layer

In a fully connected readout layer we define

$$y = \rho\big(C \operatorname{vec}(x_L)\big)$$

Where

  • $C \in \mathbb{R}^{d \times n d_L}$
  • $\operatorname{vec}(x_L) \in \mathbb{R}^{n d_L}$
    • in general, $\operatorname{vec}(\cdot)$ vectorizes $\mathbb{R}^{m \times n}$ matrices into $\mathbb{R}^{mn}$ vectors
  • $\rho$ can be the identity or some other pointwise nonlinearity (ReLU, softmax, etc.)

(see fully connected readout layer)
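A minimal sketch of this readout in PyTorch (the dimensions $n$, $d_L$, $d$ below are arbitrary illustrative values):

```python
import torch

n, d_L, d = 10, 16, 3            # number of nodes, last GNN layer width, output dimension
x_L = torch.randn(n, d_L)        # last-layer GNN output (a graph signal)
C = torch.randn(d, n * d_L)      # readout matrix: note that its size depends on n

y = torch.relu(C @ x_L.reshape(n * d_L))   # y = rho(C vec(x_L)), with rho = ReLU here
print(y.shape)                             # torch.Size([3])
```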

Note

There are some downsides of a fully connected readout layer

  • The number of parameters depends on $n$ - it adds $n d_L d$ learnable parameters, which grows with the graph size. This does not scale well to (families of) large graphs
  • No longer permutation invariant because of the $\operatorname{vec}(\cdot)$ operation
Exercise

Verify that fully connected readout layers are no longer permutation invariant (a quick numerical sketch follows this list)

  • No longer transferable across graphs.
    • unlike the GNN layer parameters, $C$ depends on $n$, so if the number of nodes $n$ changes, we have to relearn $C$
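A quick numerical sketch of the permutation-invariance failure (dimensions and the particular relabeling are arbitrary illustrative choices):

```python
import torch

n, d_L, d = 4, 3, 2
x_L = torch.randn(n, d_L)              # last-layer GNN output
C = torch.randn(d, n * d_L)            # fully connected readout matrix
perm = torch.tensor([1, 0, 2, 3])      # relabel the graph by swapping the first two nodes

y  = C @ x_L.reshape(-1)               # readout under the original node ordering
yp = C @ x_L[perm].reshape(-1)         # readout after relabeling the nodes
print(torch.allclose(y, yp))           # generally False: the output depends on node ordering
```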

These drawbacks make the fully connected readout a not-so-attractive option, so we usually use an aggregation readout layer instead

Aggregation Readout Layer

The aggregation readout layer is a readout layer given by

$$y = \operatorname{aggr}(x_L C)$$

where $C \in \mathbb{R}^{d_L \times d}$ and $\operatorname{aggr} : \mathbb{R}^{n \times d} \to \mathbb{R}^{1 \times d}$ is a node-level aggregation
2025-02-19-graph.png
Typical aggregation functions are the mean, sum, max, min, median, etc

Note

  • this is now independent of n (my aggregation functions don't change based on the size of my graph)
  • permutation invariant since the aggregation functions are also permutation invariant
  • this remains transferable across graphs, unlike the fully connected readout layer
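A minimal sketch of a mean-aggregation readout using PyTorch Geometric's `global_mean_pool` (the linear map playing the role of $C$ and the dimensions are illustrative):

```python
import torch
from torch_geometric.nn import global_mean_pool

d_L, d = 16, 3
C = torch.nn.Linear(d_L, d, bias=False)     # plays the role of the d_L x d matrix C

def readout(x_L, batch):
    # x_L: [num_nodes, d_L] last-layer node features; batch: graph index of each node
    # Averaging over nodes is permutation invariant and independent of the graph size n
    return global_mean_pool(C(x_L), batch)   # -> [num_graphs, d]
```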

graph isomorphism problem

Graph Isomorphism

Let $G$ and $G'$ be two graphs. A graph isomorphism between $G$ and $G'$ is a bijection $M : V(G) \to V(G')$ such that for all $i, j \in V(G)$

$$(i, j) \in E(G) \iff (M(i), M(j)) \in E(G')$$

(see graph isomorphism)

A graph isomorphism exists exactly when the two graphs are equivalent, i.e., identical up to a relabeling of the nodes.
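A brute-force check that follows this definition directly (only feasible for very small graphs, since it tries all $n!$ bijections; written as an illustrative sketch over adjacency matrices):

```python
import itertools
import numpy as np

def are_isomorphic(A, B):
    """Test whether adjacency matrices A and B describe isomorphic graphs."""
    n = len(A)
    if len(B) != n:
        return False
    # Try every bijection M and check (i, j) in E(G)  <=>  (M(i), M(j)) in E(G')
    for perm in itertools.permutations(range(n)):
        P = np.eye(n, dtype=int)[list(perm)]   # permutation matrix representing M
        if np.array_equal(P @ A @ P.T, B):
            return True
    return False
```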

Question

Can we train a GNN to detect if graphs are isomorphic?

  • ie, to produce identical outputs for graphs in the same equivalence class/graphs which are isomorphic, and different outputs for graphs that are not isomorphic?

Consider two graphs $G$ and $G'$ with Laplacians $L = V \Lambda V^T$ and $L' = V' \Lambda' V'^T$. Assume they do not have node features (but we can impute them). Consider the graph convolution

$$y = \sum_{k=0}^{K-1} h_k L^k x \qquad (*)$$

Suppose there is some $\lambda'_j$ such that $\lambda'_j \neq \lambda_i$ for all $i$ - i.e., $L'$ has at least one eigenvalue that is different from those of $L$.

We can then verify that graphs without node features but with different Laplacian eigenvalues are not isomorphic:

Theorem

Consider two graphs $G$ and $G'$ without node features, with Laplacians $L = V \Lambda V^T$ and $L' = V' \Lambda' V'^T$. Further suppose that there is some $\lambda'_j$ such that $\lambda'_j \neq \lambda_i$ for all $i$ - i.e., $L'$ has at least one eigenvalue that is different from those of $L$.

Then there exists a graph convolution we can use to verify that the two graphs are NOT isomorphic, i.e., to certify that $G \not\cong G'$.

Proof

Consider the following single-layer (linear) GNN

$$y = \sum_{k=0}^{K-1} h_k L^k x \qquad (*)$$

Define the graph feature $x$ to be white, i.e., such that $\mathbb{E}[\hat{x}_i] = 1$ for all $i = 1, \dots, n$, where $\hat{x} = V^T x$ is the graph Fourier transform of $x$.

Set $h_k = \begin{cases} 1 & \text{if } k = 1 \\ 0 & \text{otherwise} \end{cases}$ so that $(*)$ reduces to $y = Lx$.

Then we have

$$\mathbb{E}[\hat{y}] = \mathbb{E}[V^T y] = \mathbb{E}[V^T L x] = \mathbb{E}[V^T V \Lambda V^T x] = \mathbb{E}[\Lambda \hat{x}] = \Lambda\, \mathbb{E}[\hat{x}] = \begin{bmatrix} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_n \end{bmatrix}$$

And if there is some $\lambda'_j$ such that $\lambda'_j \neq \lambda_i$ for all $i$ as above, then it suffices to compare

$$\mathbb{E}[\mathbf{1}^T \hat{y}] = \sum_i \lambda_i = \operatorname{Tr}(L) \quad \text{to} \quad \mathbb{E}[\mathbf{1}^T \hat{y}'] = \sum_i \lambda'_i = \operatorname{Tr}(L').$$

Since the Laplacian is positive semidefinite, a difference between these sums certifies that the spectra differ, and hence that $G \not\cong G'$. (Comparing a single trace is not conclusive on its own, since any two graphs with the same number of edges have equal $\operatorname{Tr}(L) = 2|E|$; repeating the construction with other filter taps compares $\operatorname{Tr}(L^k) = \sum_i \lambda_i^k$ for higher powers $k$, and these power sums together determine the whole eigenvalue multiset.)

(see the paper "Graph Neural Networks Are More Powerful Than You Think")
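A small numerical sketch of this spectral check (the two toy adjacency matrices are illustrative, not the graphs from the lecture):

```python
import numpy as np

def laplacian_spectrum(A):
    # Combinatorial Laplacian L = D - A and its sorted eigenvalues
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))

# Toy example: a path on 3 nodes vs. a triangle
A_path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A_tri  = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])

# Different spectra certify non-isomorphism; equal spectra are inconclusive
# (cospectral non-isomorphic graphs exist, as in the example below)
print(np.allclose(laplacian_spectrum(A_path), laplacian_spectrum(A_tri)))  # False
```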

However, there are many non-isomorphic graphs that share the same Laplacian eigenvalues.

Example

2025-02-19-graph-1.png
These graphs are clearly non-isomorphic, but their Laplacian eigenvalues are the same!

So the eigenvalue-based heuristic fails for graphs like these. A simple heuristic that does distinguish them is counting degrees: the problem then amounts to comparing the multisets of node degrees of the two graphs.

(continuing the example from above)
$G_1$ has degrees $\{1, 3, 3, 2, 2, 3\}$ and $G_2$ has degrees $\{2, 2, 2, 4, 2, 2\}$, so the two degree multisets differ and the graphs cannot be isomorphic.

A stronger alternative that generalizes this idea is the Weisfeiler-Leman graph isomorphism test, which consists of running the color refinement algorithm on both graphs and comparing the resulting multisets of colors: if they differ, the graphs are certifiably not isomorphic (if they agree, the test is inconclusive).

Motivation

Color refinement generalizes the degree-counting heuristic: starting from identical colors, the colors after one round encode the node degrees, and subsequent rounds refine them using the colors of each node's neighbors.

The color refinement algorithm is an algorithm to assign colorings to nodes.

Algorithm

Given a graph $G = (V, E)$ with node features $x$:

let $c_i^{(0)} = f(\varnothing, \{x_i\})$ for all $i \in V$ (assign initial colors based on the node feature values)
let $t = 1$

while True:

  1. let $c_i^{(t)} = f\big(c_i^{(t-1)}, \{\!\{ c_j^{(t-1)} : j \in \mathcal{N}(i) \}\!\}\big)$ for all $i \in V$
  • assign each node a new color based on its own previous color and the multiset of its neighbors' previous colors
  2. if $c_i^{(t)} = c_i^{(t-1)}$ for all $i$, break; else set $t = t + 1$ and repeat

Here, $f$ is a hash function that assigns colors.
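A minimal Python sketch of color refinement (the graph is given as adjacency lists, and compact integer ids computed from sorted signatures play the role of the hash $f$; an illustrative implementation, not the lecture's):

```python
def color_refinement(neighbors, features=None):
    """neighbors: dict {node: iterable of neighbor nodes}; features: optional dict of node features."""
    nodes = list(neighbors)
    # Initial colors from node features (all identical if there are no features)
    colors = {i: (features[i] if features else 0) for i in nodes}
    while True:
        # Signature = (own color, multiset of neighbor colors stored as a sorted tuple)
        sig = {i: (colors[i], tuple(sorted(colors[j] for j in neighbors[i]))) for i in nodes}
        # Relabel signatures with compact color ids (this plays the role of the hash f)
        palette = {s: k for k, s in enumerate(sorted(set(sig.values())))}
        new_colors = {i: palette[sig[i]] for i in nodes}
        # Refinement only ever splits color classes, so if no class was split we are done
        if len(set(new_colors.values())) == len(set(colors.values())):
            return new_colors
        colors = new_colors

# Example: the path 1 - 2 - 3 with no node features
print(color_refinement({1: [2], 2: [1, 3], 3: [2]}))  # {1: 0, 2: 1, 3: 0}
```

To use this as the Weisfeiler-Leman test, run the refinement on both graphs jointly (for example on their disjoint union, so the color ids share one palette) and compare the multisets of final colors of the two graphs.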

Example

2025-02-19-graph-2.png

Color refinement for G1

No node features, so assume all 1.
Assign colors based on node features: $c_1^{(0)} = c_2^{(0)} = c_3^{(0)} = f(\varnothing, 1) = k_1$

step t=1:

$c_1^{(1)} = f(k_1, \{k_1\}) = k_1$
$c_3^{(1)} = f(k_1, \{k_1\}) = k_1$
$c_2^{(1)} = f(k_1, \{k_1, k_1\}) = k_2$

Evaluate: did the colors change?

  • Yes: continue!

step t=2:

$c_1^{(2)} = f(k_1, \{k_2\}) = k_1$
$c_3^{(2)} = f(k_1, \{k_2\}) = k_1$
$c_2^{(2)} = f(k_2, \{k_1, k_1\}) = k_2$

Evaluate: did the colors change?

  • No: the partition into color classes did not change, so we are done!
  • This is the final coloring.