fully connected readout layer

[[concept]]
fully connected readout layer

In a fully connected readout layer we define

$y = \rho(C \,\mathrm{vec}(x))$

Where

  • $C \in \mathbb{R}^{d \times n d_L}$
  • $\mathrm{vec}(x) \in \mathbb{R}^{n d_L}$
    • in general, $\mathrm{vec}(\cdot)$ vectorizes $\mathbb{R}^{m \times n}$ matrices into $\mathbb{R}^{mn}$ vectors
  • $\rho$ can be the identity or some other pointwise nonlinearity (ReLU, softmax, etc.)
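As a concrete sketch of the definition above (assuming NumPy, example sizes $n=4$, $d_L=3$, $d=2$, a column-stacking $\mathrm{vec}(\cdot)$, and ReLU as $\rho$ — all illustrative choices, not fixed by the definition):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d_L, d = 4, 3, 2                     # n nodes, d_L features per node, d outputs
x = rng.standard_normal((n, d_L))       # node feature matrix at the last GNN layer
C = rng.standard_normal((d, n * d_L))   # readout weights: d x (n * d_L)

vec_x = x.flatten(order="F")            # vec(): stack columns into one long vector

rho = lambda z: np.maximum(z, 0)        # ReLU as the pointwise nonlinearity
y = rho(C @ vec_x)                      # y has shape (d,)
print(y.shape)
```

Note that `C` mixes every entry of every node's feature vector into each output, which is where both the parameter count and the loss of permutation invariance come from.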
Note

There are some downsides of a fully connected readout layer

  • The number of parameters depends on $n$: the readout adds $n \, d_L \, d$ learnable parameters, which grows with the graph size. This does not scale well to large graphs
  • No longer permutation invariant, because of the $\mathrm{vec}(\cdot)$ operation
Exercise

Verify that fully connected readout layers are no longer permutation invariant
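A quick numerical check of the exercise (a sketch, assuming NumPy; here $\rho$ is the identity and the node relabeling is a cyclic shift — any non-identity permutation works):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_L, d = 5, 3, 2
x = rng.standard_normal((n, d_L))       # node features
C = rng.standard_normal((d, n * d_L))   # fixed readout weights

readout = lambda feats: C @ feats.flatten(order="F")  # rho = identity here

P = np.roll(np.eye(n), 1, axis=0)  # cyclic node relabeling (a permutation matrix)
y = readout(x)
y_perm = readout(P @ x)            # same graph, nodes relabeled

print(np.allclose(y, y_perm))      # False: the readout is not permutation invariant
```

Permuting the rows of $x$ permutes the entries of $\mathrm{vec}(x)$, but the columns of $C$ are tied to fixed node positions, so the output changes.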

  • No longer transferable across graphs.
    • unlike the GNN filter weights, $C$ depends on $n$, so if the number of nodes $n$ changes, $C$ must be relearned
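To make the parameter growth concrete (a small sketch; the count $n \, d_L \, d$ follows directly from the shape of $C$, and $d_L = 64$, $d = 10$ are illustrative values):

```python
# readout parameter count n * d_L * d for a few graph sizes
d_L, d = 64, 10
for n in [100, 1_000, 10_000]:
    print(n, n * d_L * d)  # grows linearly in n
```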

These drawbacks make the fully connected readout a not-so-attractive option, so we usually use an aggregation readout layer instead

Mentions

File
aggregation readout layer
fully connected readout layer
2025-02-19 graphs lecture 9