Lecture 1.2

Group Theory I: Groups, Actions, and Representations

This lecture covers the minimal group-theoretical prerequisites needed to understand Regular Group Convolutional Neural Networks. We look at groups as mathematical constructs that represent transformations.

1. What is a Group?

Intuitively, a group is a set of transformations that we can combine (perform one after another) and undo (invert). Formally:

Definition: Group

A group is a set $G$ equipped with a binary operation $\cdot$ (group product) that satisfies four axioms:

  • Closure: For all $g, h \in G$, $g \cdot h \in G$.
  • Associativity: For all $g, h, k \in G$, $(g \cdot h) \cdot k = g \cdot (h \cdot k)$.
  • Identity: There exists an element $e \in G$ such that $e \cdot g = g \cdot e = g$.
  • Inverse: For every $g \in G$, there exists $g^{-1} \in G$ such that $g \cdot g^{-1} = g^{-1} \cdot g = e$.

Example: The Translation Group $(\mathbb{R}^2, +)$

The set of 2D translation vectors forms a group.
Product: Vector addition, $\mathbf{x} + \tilde{\mathbf{x}}$.
Identity: The zero vector $\mathbf{0}$.
Inverse: The negative vector $-\mathbf{x}$.
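
As a quick sanity check, here is a minimal NumPy sketch that verifies these axioms numerically for $(\mathbb{R}^2, +)$ (the helper names are illustrative, not from any particular library):

```python
import numpy as np

# Group structure of (R^2, +): product, identity, and inverse.
product = lambda g, h: g + h
identity = np.zeros(2)
inverse = lambda g: -g

g, h, k = np.array([1.0, 2.0]), np.array([-0.5, 3.0]), np.array([4.0, 0.0])

# Closure is automatic: the sum of two 2-vectors is a 2-vector.
# Associativity: (g . h) . k == g . (h . k)
assert np.allclose(product(product(g, h), k), product(g, product(h, k)))
# Identity: e . g == g . e == g
assert np.allclose(product(identity, g), g) and np.allclose(product(g, identity), g)
# Inverse: g . g^{-1} == e
assert np.allclose(product(g, inverse(g)), identity)
```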

2. The Roto-Translation Group ($SE(2)$)

Things get interesting when we combine different types of transformations, such as translations and rotations. The Special Euclidean group $SE(2)$ consists of pairs $(\mathbf{x}, \theta)$, where $\mathbf{x} \in \mathbb{R}^2$ is a translation and $\theta \in [0, 2\pi)$ is a rotation angle.

Notice that we cannot simply add these elements as vectors: rotating then translating gives a different outcome than translating then rotating. The group product reflects the concatenation of transformations:

SE(2) Group Product $$ g_2 \cdot g_1 = (\mathbf{x}_2, \theta_2) \cdot (\mathbf{x}_1, \theta_1) = (\mathbf{R}_{\theta_2}\mathbf{x}_1 + \mathbf{x}_2, \theta_2 + \theta_1) $$

Here, the rotation $\mathbf{R}_{\theta_2}$ acts on the translation vector $\mathbf{x}_1$. This mixing of components makes $SE(2)$ a semi-direct product group: $SE(2) \cong \mathbb{R}^2 \rtimes SO(2)$.
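
To make the product concrete, here is a minimal sketch, storing a group element as a pair `(x, theta)` (the pair convention and function names are my own, purely illustrative):

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def se2_product(g2, g1):
    """(x2, th2) . (x1, th1) = (R_th2 x1 + x2, th2 + th1); angles understood mod 2*pi."""
    (x2, th2), (x1, th1) = g2, g1
    return (rot(th2) @ x1 + x2, th2 + th1)

def se2_inverse(g):
    """g^{-1} = (-R_{-theta} x, -theta), so that g^{-1} . g = (0, 0)."""
    x, th = g
    return (-rot(-th) @ x, -th)

g1 = (np.array([1.0, 0.0]), np.pi / 2)
g2 = (np.array([0.0, 2.0]), np.pi / 4)

# Non-commutativity: rotate-then-translate != translate-then-rotate.
assert not np.allclose(se2_product(g1, g2)[0], se2_product(g2, g1)[0])

# g^{-1} . g is the identity (0, 0).
x_e, th_e = se2_product(se2_inverse(g1), g1)
assert np.allclose(x_e, 0.0) and np.isclose(th_e, 0.0)
```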

Matrix Representation

Often, it is convenient to represent group elements as matrices. For $SE(2)$ with $g = (\mathbf{x}, \theta)$ and $\mathbf{x} = (x, y)^T$, we can use $3 \times 3$ homogeneous matrices:

$$ \mathbf{M}(g) = \begin{pmatrix} \cos\theta & -\sin\theta & x \\ \sin\theta & \cos\theta & y \\ 0 & 0 & 1 \end{pmatrix} $$

Then the group product corresponds to matrix multiplication: $\mathbf{M}(g_2 \cdot g_1) = \mathbf{M}(g_2)\,\mathbf{M}(g_1)$.
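
We can check this correspondence numerically; the following self-contained sketch reuses the pair convention from above (again with illustrative names):

```python
import numpy as np

def mat(g):
    """Homogeneous 3x3 matrix M(g) for g = ((x, y), theta) in SE(2)."""
    (x, y), th = g
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, x],
                     [s,  c, y],
                     [0,  0, 1]])

def se2_product(g2, g1):
    """SE(2) product (x2, th2) . (x1, th1) = (R_th2 x1 + x2, th2 + th1)."""
    (x2, th2), (x1, th1) = g2, g1
    c, s = np.cos(th2), np.sin(th2)
    return (np.array([[c, -s], [s, c]]) @ x1 + x2, th2 + th1)

g1 = (np.array([1.0, 0.0]), np.pi / 2)
g2 = (np.array([0.0, 2.0]), np.pi / 4)

# The group product becomes plain matrix multiplication.
assert np.allclose(mat(se2_product(g2, g1)), mat(g2) @ mat(g1))
```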

3. Affine Groups

Many groups we care about (translation, scale, rotation) fall under the umbrella of Affine Groups. These are groups $G = \mathbb{R}^d \rtimes H$ consisting of a translation part $\mathbb{R}^d$ and a linear transformation group $H$ (like rotations $SO(d)$ or scaling).
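
The product formula is the same in all these cases: the $H$-part acts on the translation part, $(\mathbf{x}_2, h_2) \cdot (\mathbf{x}_1, h_1) = (h_2 \odot \mathbf{x}_1 + \mathbf{x}_2, h_2 \cdot h_1)$. A minimal sketch with $H$ the scaling group $(\mathbb{R}_{>0}, \times)$ (a toy choice, for illustration only):

```python
import numpy as np

def affine_product(g2, g1):
    """(x2, s2) . (x1, s1) = (s2 * x1 + x2, s2 * s1) for the scale-translation group."""
    (x2, s2), (x1, s1) = g2, g1
    return (s2 * x1 + x2, s2 * s1)

g1 = (np.array([1.0, 2.0]), 2.0)   # translate by (1, 2), scale by 2
g2 = (np.array([0.0, 1.0]), 0.5)   # translate by (0, 1), scale by 1/2
g3 = (np.array([3.0, 0.0]), 4.0)

# Associativity holds, even though the components "mix".
x_a, s_a = affine_product(affine_product(g3, g2), g1)
x_b, s_b = affine_product(g3, affine_product(g2, g1))
assert np.allclose(x_a, x_b) and np.isclose(s_a, s_b)
```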

4. Group Actions

The group product tells us how group elements interact with each other. But how do they affect the world (e.g., vectors, images)? This is defined by a Group Action.

Definition: Group Action

A group action $\odot$ of $G$ on a set $X$ is a map $G \times X \to X$ such that:

$$ g \odot (h \odot x) = (g \cdot h) \odot x $$ $$ e \odot x = x $$

For example, $SO(2)$ acts on vectors in $\mathbb{R}^2$ by matrix-vector multiplication: $\mathbf{R}_\theta \odot \mathbf{x}$.
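
A quick numerical check of these two properties for $SO(2)$ acting on $\mathbb{R}^2$ (identifying each group element with its angle, so that the group product is addition of angles):

```python
import numpy as np

def rot(theta):
    """SO(2) element acting on R^2 by matrix-vector multiplication."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

x = np.array([1.0, 2.0])
g, h = np.pi / 3, np.pi / 6

# Compatibility: g . (h . x) == (g h) . x, where the product of angles is their sum.
assert np.allclose(rot(g) @ (rot(h) @ x), rot(g + h) @ x)
# Identity: e . x == x
assert np.allclose(rot(0.0) @ x, x)
```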

5. Representations

When the set $X$ is a vector space (like $\mathbb{R}^d$ or the space of images/functions $\mathbb{L}_2(\mathbb{R}^d)$) and the action is linear and invertible, we call it a Representation, denoted $\rho(g)$. A representation respects the group structure: $\rho(g)\rho(h) = \rho(g \cdot h)$.

The Regular Representation

How does a group act on an image/function? If we have a function $f: \mathbb{R}^2 \to \mathbb{R}$ (an image) and we transform its domain by $g$, the function transforms as:

Left-Regular Representation $$ [\mathcal{L}_g f](x) = f(g^{-1} \odot x) $$

In other words, to find the value of the transformed image at $x$, we look up the value of the original image at the point that $g$ maps to $x$, namely $g^{-1} \odot x$.
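
As a sketch, we can implement $\mathcal{L}_g$ for a rotation acting on a toy "image", here a Gaussian bump (the functions below are illustrative):

```python
import numpy as np

def rot(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# A toy image: a Gaussian bump with its peak at (1, 0).
f = lambda x: np.exp(-np.sum((x - np.array([1.0, 0.0])) ** 2))

def L(theta, f):
    """Left-regular representation of a rotation: [L_g f](x) = f(R_theta^{-1} x)."""
    return lambda x: f(rot(-theta) @ x)  # R_theta^{-1} = R_{-theta}

# Rotating by pi/2 moves the peak from (1, 0) to (0, 1), as expected.
f_rot = L(np.pi / 2, f)
assert np.isclose(f_rot(np.array([0.0, 1.0])), 1.0)  # value at the new peak
assert f_rot(np.array([1.0, 0.0])) < 1.0             # the old peak has moved away
```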

6. Equivariance Defined

Finally, we define the core concept of the course formally.

Definition: Equivariance

An operator $\Phi$ is equivariant if it commutes with the group action:

$$ \Phi(\mathcal{L}_g f) = \mathcal{L}'_g (\Phi(f)) $$

Note that the representations $\mathcal{L}$ and $\mathcal{L}'$ can be different. For example, the input might rotate (standard representation), while the output (a feature map) might undergo a shift-twist transformation (induced representation).
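
For instance, here is a numerical sanity check on a toy case where $\mathcal{L} = \mathcal{L}'$: a circular local-averaging filter on a 1D signal commutes with circular shifts (a hypothetical example, not code from the course):

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(32)  # a 1D "image" on the cyclic group Z_32

# Phi: a local mean filter; shift: the regular representation of a translation.
Phi = lambda f: (np.roll(f, -1) + f + np.roll(f, 1)) / 3
shift = lambda f, t: np.roll(f, t)

# Equivariance: Phi(L_g f) == L'_g Phi(f), here with L = L' = shift.
t = 5
assert np.allclose(Phi(shift(f, t)), shift(Phi(f), t))
```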