Lecture 1.7
Theorem: G-Convs Are All You Need
We conclude Module 1 with the fundamental theorem of equivariant deep learning. Is the Group Convolution just one way to achieve equivariance? Or is it the only way?
1. Linear Operators as Integral Transforms
In classical (fully connected) neural networks, layers are matrix-vector multiplications. For continuous data (functions), the analog is an integral transform with a two-argument kernel $\kappa(x, x')$:
$$ [\mathcal{K} f](x) = \int \kappa(x, x') f(x') \, dx' $$
2. The Theorem
A bounded linear operator $\mathcal{K}$ is equivariant to a group $G$ acting transitively on its domain and codomain if and only if it is a group convolution.
Proof Sketch:
- Assume equivariance: $\mathcal{K}(\mathcal{L}_g f) = \mathcal{L}_g (\mathcal{K} f)$.
- Write out the integrals for both sides.
- Use a change of variables (substitution) to align the domains.
- This implies a symmetry constraint on the kernel: $\kappa(g \cdot x, g \cdot x') = \kappa(x, x')$. The kernel must be invariant under the diagonal action of the group.
- Using transitivity, we can relate any $x$ to an origin $x_0$. This allows us to reduce the two-argument kernel $\kappa(x, x')$ to a single-argument kernel $k(g)$ defined on the group.
- The resulting expression matches the definition of a group convolution.
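The proof sketch can be checked numerically in the simplest setting: translations on the cyclic group $\mathbb{Z}_n$, where the integral transform becomes an $n \times n$ matrix. The sketch below (a minimal numpy illustration, with names like `K_sym` chosen here, not from the lecture) symmetrizes an arbitrary two-argument kernel over the diagonal group action and verifies that the result is equivariant and acts as a convolution with a one-argument kernel:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

# Arbitrary two-argument kernel kappa(x, x') as an n x n matrix.
K = rng.standard_normal((n, n))

# Impose the symmetry constraint kappa(g + x, g + x') = kappa(x, x')
# by averaging over the diagonal action of the translation group Z_n.
K_sym = np.mean(
    [np.roll(np.roll(K, g, axis=0), g, axis=1) for g in range(n)], axis=0
)

# 1) The constrained operator commutes with every translation (equivariance).
f = rng.standard_normal(n)
for g in range(n):
    assert np.allclose(K_sym @ np.roll(f, g), np.roll(K_sym @ f, g))

# 2) The kernel now depends only on x - x', so applying the operator is a
#    circular convolution with the one-argument kernel k(d) = kappa(d, 0).
k = K_sym[:, 0]
conv = np.array(
    [sum(k[(x - xp) % n] * f[xp] for xp in range(n)) for x in range(n)]
)
assert np.allclose(K_sym @ f, conv)
```

Averaging over the group is exactly the projection onto kernels satisfying the invariance constraint from the proof; on $\mathbb{Z}_n$ such kernels are the circulant matrices, i.e., convolutions.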
3. Kernel Constraints
If we map between homogeneous spaces $X=G/H$, the kernel $k$ must satisfy further constraints imposed by the stabilizer $H$.
- Planar Convolution ($\mathbb{R}^2 \to \mathbb{R}^2$): The stabilizer is the rotation group $SO(2)$. The kernel must be isotropic (rotationally symmetric). This is very restrictive (e.g., Gaussian- or Laplacian-like filters).
- Group Convolution ($SE(2) \to SE(2)$): The stabilizer is trivial. No constraints on the kernel! We can learn any kernel we want. This explains why lifting to the group is necessary for expressive power.
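The isotropy constraint can be made concrete for the finite rotation subgroup $C_4$ (90° rotations; full $SO(2)$ isotropy is the continuum version of the same statement). In the sketch below, a hypothetical random kernel `k` is projected onto its rot90-invariant part `k_iso`; only the projected kernel commutes with rotations under ordinary planar convolution:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)

# Random 5x5 kernel, projected onto its C4-invariant (rot90-symmetric) part.
k = rng.standard_normal((5, 5))
k_iso = sum(np.rot90(k, r) for r in range(4)) / 4
assert np.allclose(np.rot90(k_iso), k_iso)

f = rng.standard_normal((16, 16))

# With the symmetrized kernel, planar convolution commutes with rotation.
lhs = convolve2d(np.rot90(f), k_iso, mode='same')
rhs = np.rot90(convolve2d(f, k_iso, mode='same'))
assert np.allclose(lhs, rhs)

# The unconstrained kernel breaks equivariance.
lhs_bad = convolve2d(np.rot90(f), k, mode='same')
rhs_bad = np.rot90(convolve2d(f, k, mode='same'))
assert not np.allclose(lhs_bad, rhs_bad)
```

This is exactly the trade-off in the list above: staying on $\mathbb{R}^2$ forces the kernel into a symmetric subspace, while lifting to $SE(2)$ leaves the kernel unconstrained.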
4. Conclusion
This theorem justifies the widespread use of group convolutions. If you want a linear layer to be equivariant, you have no choice but to use a group convolution (or a constrained version thereof).