The 0th layer of the GNN assigns $h_v^{(0)} = h_u^{(0)}$ for all $v \in V(G_1)$ and $u \in V(G_2)$, since all feature values are the same.
Induction Hypothesis
Suppose the claim holds for iteration/layer $t$: whenever $c_v^{(t)} = c_u^{(t)}$, we also have $h_v^{(t)} = h_u^{(t)}$.
Now suppose the WL test outputs the same colors $c_v^{(t+1)}$ and $c_u^{(t+1)}$ for nodes $v$ and $u$. That is,
$$\text{HASH}\left(c_v^{(t)}, \{\!\{c_w^{(t)} : w \in N(v)\}\!\}\right) = \text{HASH}\left(c_u^{(t)}, \{\!\{c_w^{(t)} : w \in N(u)\}\!\}\right),$$
and since the hash is injective, the inputs must agree: $c_v^{(t)} = c_u^{(t)}$ and $\{\!\{c_w^{(t)} : w \in N(v)\}\!\} = \{\!\{c_w^{(t)} : w \in N(u)\}\!\}$.
By the induction hypothesis, we have that
$$h_v^{(t)} = h_u^{(t)} \quad \text{and} \quad \{\!\{h_w^{(t)} : w \in N(v)\}\!\} = \{\!\{h_w^{(t)} : w \in N(u)\}\!\}.$$
Since the same AGGREGATE and COMBINE functions are applied to both graphs, we automatically get that
$$h_v^{(t+1)} = \text{COMBINE}\left(h_v^{(t)}, \text{AGGREGATE}\left(\{\!\{h_w^{(t)} : w \in N(v)\}\!\}\right)\right) = h_u^{(t+1)}.$$
Thus, by induction, if the colors $c_v^{(t)} = c_u^{(t)}$ for all $t$, then the embeddings $h_v^{(t)} = h_u^{(t)}$ at any $t$.
Note
This creates a valid map $\varphi$ for all $t$ between the embeddings of the nodes and the coloring assignment: $h_v^{(t)} = \varphi\left(c_v^{(t)}\right)$.
We can extend this mapping to include the multiset of embeddings/colors of the neighbors as well, and say that the multiset of all embedding and neighborhood-embedding pairs also has a map:
$$\left(h_v^{(t)}, \{\!\{h_w^{(t)} : w \in N(v)\}\!\}\right) = \varphi\left(c_v^{(t)}, \{\!\{c_w^{(t)} : w \in N(v)\}\!\}\right).$$
And these are each the same for the two graphs.
Thus, the multisets $\{\!\{h_v^{(t)} : v \in V(G_1)\}\!\}$ and $\{\!\{h_u^{(t)} : u \in V(G_2)\}\!\}$ are the same for all $t$. And, due to the permutation invariance of the readout layer as defined above, we must have $h_{G_1} = h_{G_2}$.
But we assumed that the embeddings were NOT equal! Thus, it must be the case that the WL test CAN decide if $G_1$ and $G_2$ are non-isomorphic.
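To make the color-refinement step concrete, here is a minimal sketch of the 1-WL test, assuming graphs given as adjacency lists; the names (`wl_test`, `num_iters`) and the relabel-by-sorted-signature "hash" are illustrative choices, not from the notes:

```python
from collections import Counter

def wl_test(adj1, adj2, num_iters=3):
    """Return True if 1-WL decides the two graphs are non-isomorphic.
    Graphs are adjacency lists: {node: [neighbors]}."""
    # Refine both graphs jointly so colors are comparable across graphs.
    nodes = [(1, v) for v in adj1] + [(2, v) for v in adj2]
    nbrs = {(1, v): [(1, u) for u in adj1[v]] for v in adj1}
    nbrs.update({(2, v): [(2, u) for u in adj2[v]] for v in adj2})
    colors = {x: 0 for x in nodes}  # all nodes start with the same color
    for _ in range(num_iters):
        # New color = injective hash of (own color, multiset of neighbor colors).
        sigs = {x: (colors[x], tuple(sorted(colors[y] for y in nbrs[x])))
                for x in nodes}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        colors = {x: palette[sigs[x]] for x in nodes}
    # Declared non-isomorphic iff the final color histograms differ.
    hist_1 = Counter(c for (g, _), c in colors.items() if g == 1)
    hist_2 = Counter(c for (g, _), c in colors.items() if g == 2)
    return hist_1 != hist_2
```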
Introduced by Kipf and Welling, 2017, a graph convolutional network (GCN) layer is given by
$$h_v^{(k)} = \sigma\left(\frac{1}{|N(v)| + 1} \sum_{u \in N(v) \cup \{v\}} h_u^{(k-1)} W^{(k)}\right).$$
Note that the $h_v^{(k)}$ are row vectors. We can think of each layer as a "degree normalized aggregation".
In AGGREGATE-COMBINE form, a GCN layer is given by
$$h_v^{(k)} = \sigma\left(\underbrace{\mathrm{MEAN}\left(\{\!\{h_u^{(k-1)} : u \in N(v) \cup \{v\}\}\!\}\right)}_{\text{AGGREGATE, self included}} W^{(k)}\right),$$
with the COMBINE step being the linear map $W^{(k)}$ followed by the nonlinearity $\sigma$.
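As a sketch of the mean-aggregation layer above, using dense NumPy matrices; the function name and the choice of ReLU for $\sigma$ are illustrative assumptions:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One mean-aggregation GCN layer: h_v = ReLU(mean over N(v) ∪ {v} of h_u @ W).
    A: (n, n) adjacency matrix; H: (n, d) node embeddings as rows; W: (d, d')."""
    A_tilde = A + np.eye(A.shape[0])            # add self-loops: N(v) ∪ {v}
    D_inv = np.diag(1.0 / A_tilde.sum(axis=1))  # 1 / (|N(v)| + 1)
    return np.maximum(D_inv @ A_tilde @ H @ W, 0.0)  # normalized aggregation + ReLU
```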
WLOG, assume $h_v^{(0)} = 1$ for all $v$. The first layer with readout gives us
$$h_v^{(1)} = \sigma\left(\mathrm{MEAN}\left(\{\!\{h_u^{(0)} : u \in N(v) \cup \{v\}\}\!\}\right) W^{(1)}\right) = \sigma\left(W^{(1)}\right) \quad \text{for every node } v,$$
so the readout sees an identical multiset of embeddings. But then we get the same output at every layer, and this GNN fails for permutation-invariant readouts.
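Running the `gcn_layer` sketch above on the path graph 1–2–3 (an assumed example graph, consistent with the diameter-2 note below) shows that identical inputs stay identical:

```python
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])          # path graph 1 - 2 - 3
H = np.ones((3, 1))                   # h_v^{(0)} = 1 for all v
W = np.random.default_rng(0).standard_normal((1, 4))  # arbitrary weights
H1 = gcn_layer(A, H, W)
print(np.allclose(H1, H1[0]))         # True: every node gets the same embedding,
                                      # so any permutation-invariant readout is constant
```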
Takeaway
By the example above, we can easily see that not every GNN is as powerful as the WL test. However, there are some GNNs that can do better, and which can be at least as powerful.
In the example above, where did things break? Pros: GCNs, and any permutation-equivariant GNN, do not distinguish between nodes 1 and 3. This is an implicit data augmentation mechanism! Cons: The GCN also cannot distinguish node 2 from nodes 1 and 3, even though the structure at nodes 1 and 3 is different from the structure at node 2.
Let's define the computational graph
computational graph
The computational graph tracks the communications that each node makes at each layer or iteration of an algorithm/network.
Example
Since nodes 1 and 3 are symmetric, their computational graphs are the same.
Note
$K = 2$ layers is sufficient because the diameter (the longest shortest path between two nodes) is of length 2.
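A quick way to see this computationally: unroll each node's computational graph to depth $K$ as a canonical nested tuple, sorting children so that symmetric nodes compare equal. This representation is an illustrative choice, not the notes' definition:

```python
def comp_graph(adj, v, depth):
    """Unrolled computational graph of node v, canonicalized by sorting children."""
    if depth == 0:
        return ()
    return tuple(sorted(comp_graph(adj, u, depth - 1) for u in adj[v]))

path = {1: [2], 2: [1, 3], 3: [2]}                       # path graph 1 - 2 - 3
print(comp_graph(path, 1, 2) == comp_graph(path, 3, 2))  # True: 1 and 3 are symmetric
print(comp_graph(path, 1, 2) == comp_graph(path, 2, 2))  # False: node 2's structure differs
```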
We can imagine that we are mapping computational graphs to some embedding space. The issue with the GCN is that it maps all three computational graphs to the same point in embedding space.
We need functions/embeddings/maps that are injective, i.e., which map different computational graphs (or different multisets of embeddings) to different values in embedding space.
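For instance, mean aggregation is not injective on multisets, while sum aggregation separates multisets with different element counts. A small illustration, assuming one-hot node embeddings:

```python
import numpy as np

a, b = np.array([1., 0.]), np.array([0., 1.])  # two one-hot "colors"
ms1 = [a, a, b]                                # multiset {a, a, b}
ms2 = [a, a, a, a, b, b]                       # multiset {a, a, a, a, b, b}

# Mean only sees the ratio of colors, so the two multisets collide:
print(np.mean(ms1, axis=0), np.mean(ms2, axis=0))  # both [0.667, 0.333]
# Sum recovers the exact counts, mapping different multisets to different values:
print(np.sum(ms1, axis=0), np.sum(ms2, axis=0))    # [2., 1.] vs [4., 2.]
```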