Lecture 2.2
Regular G-Conv in Steerable Basis
In this lecture, we connect the concept of steerable bases back to Regular Group Convolutions. We show how to perform a group convolution without explicitly rotating the kernel for every angle, by expanding the kernel in a steerable basis.
1. Regular Group Convolution Recap
Recall that a standard (lifting) group convolution involves correlating an input image $f$ with a filter $k$ rotated by every possible angle $\theta \in G$:
$$ (k \star f)(x, \theta) = \int_{\mathbb{R}^2} k(\mathbf{R}_{-\theta}(\mathbf{y}-x)) f(\mathbf{y}) d\mathbf{y} $$

In a standard implementation, we discretize the group (e.g., 8 orientations), rotate the kernel 8 times, and run 8 separate convolutions. This is computationally heavy and introduces interpolation artifacts.
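The explicit-rotation implementation can be sketched in a few lines of numpy. To sidestep interpolation entirely, this sketch restricts the group to C4 (90-degree rotations), where `np.rot90` rotates the kernel exactly; the function names are illustrative, not from any particular library.

```python
import numpy as np

def correlate2d_valid(f, k):
    """Valid-mode cross-correlation (plain loops for clarity, not speed)."""
    H, W = f.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i+h, j:j+w] * k)
    return out

def lifting_gconv_c4(f, k):
    """Discretized lifting group convolution over C4: one planar
    correlation per rotated copy of the kernel. With 90-degree steps
    np.rot90 is exact; finer groups would need interpolation."""
    return np.stack([correlate2d_valid(f, np.rot90(k, r)) for r in range(4)])

rng = np.random.default_rng(0)
f = rng.standard_normal((10, 10))   # input image
k = rng.standard_normal((3, 3))     # learnable kernel
out = lifting_gconv_c4(f, k)        # shape (4, 8, 8): one map per orientation
```

For a group of $|G|$ orientations this runs $|G|$ full convolutions, which is exactly the cost the steerable trick below avoids.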
2. The Steerable Trick
Instead of rotating the kernel pixels, we expand the kernel $k$ in our steerable basis $\Psi = (\psi_1, \dots, \psi_N)^T$ with weights $\mathbf{w}$:
$$ k(\mathbf{y}) = \sum_n w_n \psi_n(\mathbf{y}) = \mathbf{w}^T \Psi(\mathbf{y}) $$

Because the basis is steerable, the rotated kernel is simply a linear combination of the same basis functions, but with steered weights:
$$ k_{\theta}(\mathbf{y}) = (\rho(\theta)\mathbf{w})^T \Psi(\mathbf{y}) $$

3. The Algorithm: Implicit Rotation
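Before deriving the algorithm, it helps to fix a concrete basis and check the steering identity numerically. The sketch below assumes a circular-harmonic basis $\psi_n(r, \phi) = g(r)\, e^{in\phi}$ with a Gaussian radial profile $g$, for which $\rho(\theta)$ is diagonal with entries $e^{-in\theta}$; both the radial profile and the grid size are illustrative choices.

```python
import numpy as np

# Polar coordinates on a 15x15 grid centered at the origin
n_grid = 15
ys, xs = np.meshgrid(np.arange(n_grid) - 7, np.arange(n_grid) - 7, indexing="ij")
r, phi = np.hypot(xs, ys), np.arctan2(ys, xs)
freqs = np.arange(4)  # angular frequencies n = 0..3

def make_basis(angles):
    """Circular harmonics: Gaussian radial profile times angular phase."""
    return np.exp(-r**2 / 10)[None] * np.exp(1j * freqs[:, None, None] * angles)

Psi = make_basis(phi)                                   # (4, 15, 15)
rng = np.random.default_rng(0)
w = rng.standard_normal(4) + 1j * rng.standard_normal(4)

# Rotating the kernel shifts the angular coordinate: phi -> phi - theta
theta = 0.9
k_rot = np.einsum("n,nij->ij", w, make_basis(phi - theta))
# Steering instead multiplies each weight by e^{-i n theta}: rho(theta) w
k_steer = np.einsum("n,nij->ij", w * np.exp(-1j * freqs * theta), Psi)
assert np.allclose(k_rot, k_steer)  # identical, with no resampling
```

No pixels were interpolated: the rotation acts entirely on the four complex weights. (A real-valued kernel would take the real part, or use conjugate-symmetric weights.)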
We can now rewrite the group convolution operation by swapping the order of summation and integration:
$$ \begin{aligned} (k \star f)(x, \theta) &= \int (\rho(\theta)\mathbf{w})^T \Psi(\mathbf{y}-x) f(\mathbf{y}) d\mathbf{y} \\ &= (\rho(\theta)\mathbf{w})^T \left( \int \Psi(\mathbf{y}-x) f(\mathbf{y}) d\mathbf{y} \right) \\ &= \mathbf{w}_{\theta}^T (\Psi \star f)(x) \end{aligned} $$

This gives us a highly efficient two-step algorithm:
- Basis Convolution: Convolve the input image $f$ with the basis functions $\Psi$ once. This produces a vector field of responses $\mathbf{F}(x) = (\Psi \star f)(x)$.
- Steering (Query): To obtain the response for any rotation $\theta$, simply take the dot product of the steered weights $\mathbf{w}_\theta = \rho(\theta)\mathbf{w}$ with the basis response $\mathbf{F}(x)$.
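The two steps above can be sketched end to end in numpy. As before, the sketch assumes a circular-harmonic basis (Gaussian radial profile, $\rho(\theta)$ diagonal with entries $e^{-in\theta}$); grid sizes and function names are illustrative. The final assertion checks that steering the weights gives exactly the response of correlating with the analytically rotated kernel.

```python
import numpy as np

def correlate2d_valid(f, k):
    """Valid-mode cross-correlation (plain loops for clarity, not speed)."""
    H, W = f.shape
    h, w = k.shape
    out = np.zeros((H - h + 1, W - w + 1), dtype=np.result_type(f, k))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(f[i:i+h, j:j+w] * k)
    return out

# Circular-harmonic basis on a 9x9 grid (assumed form)
n_grid = 9
ys, xs = np.meshgrid(np.arange(n_grid) - 4, np.arange(n_grid) - 4, indexing="ij")
r, phi = np.hypot(xs, ys), np.arctan2(ys, xs)
freqs = np.arange(4)                                      # angular frequencies n
Psi = np.exp(-r**2 / 6)[None] * np.exp(1j * freqs[:, None, None] * phi)

rng = np.random.default_rng(0)
f = rng.standard_normal((16, 16))                         # input image
w = rng.standard_normal(4) + 1j * rng.standard_normal(4)  # kernel weights

# Step 1 (Basis Convolution): done once, independent of theta
F = np.stack([correlate2d_valid(f, p) for p in Psi])      # feature field, (4, 8, 8)

# Step 2 (Steering / Query): any angle is a cheap dot product
def response(theta):
    w_theta = w * np.exp(-1j * freqs * theta)             # rho(theta) w
    return np.einsum("n,nij->ij", w_theta, F)

# Check against explicitly correlating with the rotated kernel
theta = 0.7
k_theta = np.einsum("n,nij->ij", w * np.exp(-1j * freqs * theta), Psi)
assert np.allclose(response(theta), correlate2d_valid(f, k_theta))
```

Note that step 1 costs four planar correlations regardless of how many angles are queried; each additional angle in step 2 is just a length-4 dot product per pixel.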
4. Vector Fields & Benefits
The intermediate result $\mathbf{F}(x)$ is a vector field (or feature field): it stores all the information needed to reconstruct the response for any orientation. This approach has several major advantages:
- Exact Equivariance: Steering is analytical (multiplying by $\rho(\theta)$), so there are no interpolation errors.
- Continuous Groups: We are not limited to a grid of 4, 8, or 12 rotations. We can query the network response at any continuous angle.
- Memory Efficiency: We only store the basis responses, which is often much more compact than storing a full stack of rotated feature maps.