Lecture 2.6

Activation Functions

Designing activation functions for Steerable CNNs requires care. We cannot simply apply an element-wise ReLU to a vector field, because this breaks equivariance.

1. Why Element-wise ReLU Fails

Applying a non-linearity $\sigma$ independently to each component of a vector $\mathbf{v} = (v_1, v_2)$ destroys its transformation behavior: the result no longer rotates like a vector.

Example: Consider the vector $\mathbf{v} = (1, 0)$. Rotating it 90° counter-clockwise gives $\mathbf{v}' = (0, 1)$.

  • Apply ReLU then Rotate: $\text{Rot}(\text{ReLU}(1,0)) = \text{Rot}(1,0) = (0,1)$.
  • Rotate then Apply ReLU: $\text{ReLU}(\text{Rot}(1,0)) = \text{ReLU}(0,1) = (0,1)$. (Matches)

Counter-Example: Consider $\mathbf{v} = (1, -1)$. Rotating 90° gives $(1, 1)$.

  • Apply ReLU then Rotate: $\text{Rot}(\text{ReLU}(1,-1)) = \text{Rot}(1, 0) = (0, 1)$.
  • Rotate then Apply ReLU: $\text{ReLU}(\text{Rot}(1,-1)) = \text{ReLU}(1, 1) = (1, 1)$. Mismatch! (See the numerical check below.)
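
This failure is easy to verify numerically. The following is a minimal NumPy check of the counter-example above (the names `R` and `relu` are our own, not from any library):

```python
import numpy as np

# 90-degree counter-clockwise rotation matrix.
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

def relu(x):
    return np.maximum(x, 0.0)

v = np.array([1.0, -1.0])
print(R @ relu(v))  # ReLU, then rotate:  [0. 1.]
print(relu(R @ v))  # rotate, then ReLU:  [1. 1.]  -> mismatch
```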

2. Permissible Nonlinearities

To maintain equivariance, the operation must commute with the group action. We have three main options:

A. Norm-Based Nonlinearities

Apply the nonlinearity only to the magnitude of the vector, preserving its direction:

$$ \mathbf{v}' = \sigma(\|\mathbf{v}\|) \frac{\mathbf{v}}{\|\mathbf{v}\|} $$

Since the norm $\|\mathbf{v}\|$ is invariant to rotation, the nonlinearity only touches an invariant quantity while the direction rotates with the input, so equivariance is preserved.
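
As a concrete sketch (our own minimal NumPy version, not a library API): since $\|\mathbf{v}\| \geq 0$, a plain ReLU applied to the norm would be the identity, so implementations commonly subtract a learnable bias $b$ first.

```python
import numpy as np

def norm_relu(v, bias=0.5, eps=1e-8):
    """Norm-based nonlinearity for a field of 2D vectors, shape (..., 2).

    Applies ReLU(||v|| - bias) to the magnitude and keeps the direction.
    `bias` would normally be a learned parameter; `eps` avoids division
    by zero for zero vectors.
    """
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    scale = np.maximum(norm - bias, 0.0) / (norm + eps)
    return scale * v
```

Because the scale factor depends only on the rotation-invariant norm, `norm_relu(R @ v)` equals `R @ norm_relu(v)` for any rotation matrix `R`.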

B. Gated Nonlinearities

This is the most common approach. We produce a separate scalar gate $s$ (a type-0 field) alongside the vector field $\mathbf{v}$, apply a sigmoid to the gate, and use the result to scale the corresponding vector channels:

$$ \mathbf{v}' = \sigma(s) \cdot \mathbf{v} $$

Since $s$ transforms trivially ($\rho(g)=1$), the scaling factor $\sigma(s)$ is invariant to rotation, so multiplying by it preserves how $\mathbf{v}$ transforms.
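
A minimal sketch of this gate (our own NumPy version; in practice $s$ would be an extra scalar output channel produced by the previous layer):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_nonlinearity(v, s):
    """Gate a vector field with a scalar field.

    v: vector (type-1) field, shape (..., 2)
    s: scalar (type-0) gate field, shape (...)

    sigmoid(s) is rotation-invariant because s transforms trivially,
    so the gated output transforms exactly like v.
    """
    return sigmoid(s)[..., None] * v
```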

C. Inverse Fourier Nonlinearities

We can also (1) use the inverse Fourier transform to recover the signal on the group from its Fourier (irrep) coefficients, (2) apply a standard point-wise ReLU to the sampled signal, and (3) transform back with the Fourier transform. This is computationally more expensive, but it allows using standard ReLUs.
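
For $SO(2)$, the Fourier transform on the group is the ordinary Fourier transform on the circle, so the idea can be sketched with NumPy's FFT (our own minimal version; note that the ReLU creates frequencies above the band limit, and discarding them when transforming back makes the operation only approximately equivariant):

```python
import numpy as np

def fourier_relu(coeffs, n_samples=16):
    """Fourier-based nonlinearity for a band-limited real signal on SO(2).

    coeffs: complex coefficients for frequencies 0..K (rfft layout).
    """
    # (1) Inverse FFT: sample the signal at n_samples rotations of the circle.
    f = np.fft.irfft(coeffs, n=n_samples)
    # (2) Point-wise ReLU on the sampled group elements.
    f = np.maximum(f, 0.0)
    # (3) FFT back, re-truncating to the original band limit K.
    return np.fft.rfft(f)[: len(coeffs)]
```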