Linear Algebra

Linear algebra is the branch of mathematics concerned with linear equations and with numbers organized as scalars, vectors, matrices, and tensors. Because activations, parameters, and weights in machine learning and neural networks are usually represented as vectors, matrices, or tensors, linear algebra is central to the underlying theory of machine learning.

Sets of Numbers

Scalars, vectors, matrices, and tensors contain numbers, and these numbers belong to sets of numbers. A few sets are common:

\(\mathbb{N}\) represents the set of natural numbers, the positive integers \((1, 2, 3, 4, \ldots)\). Depending on the definition used, 0 may or may not belong to this set.

\(\mathbb{Z}\) represents the set of all integers: negative, zero, and positive \((\ldots, -4, -3, -2, -1, 0, 1, 2, 3, 4, \ldots)\).

\(\mathbb{Q}\) represents the set of rational numbers (numbers that may be expressed as a fraction of two integers).

\(\mathbb{R}\) represents the set of real numbers, which contains the rational numbers (\(\mathbb{Q}\)) and the irrational numbers such as \(\pi\) or \(\sqrt{2}\). In the following, scalars, vectors, matrices, and tensors usually contain numbers from the set \(\mathbb{R}\).

Scalars, Vectors, Matrices, and Tensors

Scalars

Scalars are single values like:
\begin{equation}
x \in \mathbb{R}
\end{equation}

Vectors

Vectors are ordered one-dimensional lists of \(n \in \mathbb{N}\) single numbers or scalars. They are denoted by boldface lower case letters:
\begin{equation}
\textbf{x} = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^n
\end{equation}
Vectors with \(n\) numbers can be interpreted as points in an \(n\)-dimensional vector space.

Matrices

Matrices are rectangular two-dimensional arrays consisting of numbers or scalars. Matrices are denoted in boldface upper case letters:
\begin{equation}
\boldsymbol{A}=\begin{bmatrix}
\kern4pt a_{11} & a_{12} & a_{13} & \ldots & a_{1n} \kern4pt \\
\kern4pt a_{21} & a_{22} & a_{23} & \ldots & a_{2n} \kern4pt \\
\kern4pt a_{31} & a_{32} & a_{33} & \ldots & a_{3n} \kern4pt \\
\kern4pt \vdots & \vdots & \vdots & \ddots & \vdots \kern4pt \\
\kern4pt a_{m1} & a_{m2} & a_{m3} & \ldots & a_{mn} \kern4pt
\end{bmatrix} \in \mathbb{R}^{m \times n}
\end{equation}
The matrix \(\boldsymbol{A}\) can also be written as:
\begin{equation}
\boldsymbol{A} = [a_{ij}]_{m \times n}, \quad m, n \in \mathbb{N}
\end{equation}
For a matrix \(\boldsymbol{A} \in \mathbb{R}^{m \times n}\), \(m\) always denotes the number of rows and \(n\) the number of columns.
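The row and column convention can be illustrated with a minimal sketch. It assumes a matrix is stored as a Python list of rows, which is a representation chosen here for illustration only; the helper name `shape` is hypothetical.

```python
def shape(matrix):
    """Return (m, n): the number of rows and columns of a list-of-rows matrix."""
    m = len(matrix)
    n = len(matrix[0]) if m > 0 else 0
    # Every row must have the same length for the array to be rectangular.
    if any(len(row) != n for row in matrix):
        raise ValueError("rows have different lengths")
    return m, n


A = [[1, 2, 3],
     [4, 5, 6]]
print(shape(A))  # (2, 3): m = 2 rows, n = 3 columns
```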

A vector can be either a row vector (which is a \(1 \times n\) matrix):
\begin{equation}
\textbf{x} = [x_1, x_2, \ldots, x_n]
\end{equation}
or a column vector (which is an \(n \times 1\) matrix):
\begin{equation}
\textbf{x}{^T}=\begin{bmatrix}
\kern4pt x_1 \kern4pt \\
\kern4pt x_2 \kern4pt \\
\kern4pt \vdots \kern4pt \\
\kern4pt x_n \kern4pt
\end{bmatrix}
\end{equation}
Column vectors are transposed row vectors and are therefore denoted as \(\textbf{x}^T\).

Tensors

Tensors are more general entities that encapsulate scalars, vectors, and matrices. Scalars are 0th-order tensors, vectors are 1st-order tensors, and matrices are 2nd-order tensors. Tensors can also have orders higher than 2. For example, a tensor that represents a 2-dimensional pixel image, where each pixel is described by three numbers for the three colors (RGB), is a 3rd-order tensor (sometimes also called a 3-dimensional matrix).
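One way to make the notion of order concrete is to count the nesting depth of a nested Python list. The following is a sketch under that assumption; the function name `order` and the tiny 2 x 2 image are hypothetical and serve only as illustration.

```python
def order(tensor):
    """Return the order of a tensor stored as nested lists (its nesting depth)."""
    depth = 0
    while isinstance(tensor, list):
        depth += 1
        tensor = tensor[0]  # assumes non-empty, regularly nested lists
    return depth


scalar = 3.5
vector = [1.0, 2.0, 3.0]
matrix = [[1, 2], [3, 4]]
# A hypothetical 2 x 2 pixel image with three color values (R, G, B) per pixel.
image = [[[255, 0, 0], [0, 255, 0]],
         [[0, 0, 255], [255, 255, 255]]]

print(order(scalar), order(vector), order(matrix), order(image))  # 0 1 2 3
```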

Operations on Vectors and Matrices

Operations on scalars are familiar to everybody. This section looks at operations on vectors and matrices, in particular transposition and multiplication. These operations can be generalized to tensors.

Matrix Transpose

If you have an \(m \times n\) matrix \(\boldsymbol{A}\):
\begin{equation}
\boldsymbol{A} = [a_{ij}]_{m \times n}, \quad m, n \in \mathbb{N}
\end{equation}
then the transposed matrix \(\boldsymbol{A}^{T}\) has the size \(n \times m\):
\begin{equation}
\boldsymbol{A}^{T} = [a_{ji}]_{n \times m}, \quad m, n \in \mathbb{N}
\end{equation}
The transpose of a matrix swaps the indices \(i\) and \(j\), so rows become columns and columns become rows.

Examples:

\(\boldsymbol{A}\) is a \(2 \times 3\) matrix and \(\boldsymbol{A}^{T}\) is its \(3 \times 2\) transpose:

\begin{equation}
\boldsymbol{A} = \left[ \begin{array}{ccc}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23}
\end{array} \right], \quad
\boldsymbol{A}^T = \left[ \begin{array}{cc}
a_{11} & a_{21} \\
a_{12} & a_{22} \\
a_{13} & a_{23}
\end{array} \right]
\end{equation}

The transpose of a \(1 \times n\) row vector is an \(n \times 1\) column vector, and vice versa:
\begin{equation}
\textbf{x} = [x_1, x_2, \ldots, x_n] , \quad
\textbf{x}{^T}=\begin{bmatrix}
\kern4pt x_1 \kern4pt \\
\kern4pt x_2 \kern4pt \\
\kern4pt \vdots \kern4pt \\
\kern4pt x_n \kern4pt
\end{bmatrix}
\end{equation}
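The index swap can be sketched in a few lines of Python. This assumes matrices are stored as lists of rows and uses 0-based indices instead of the 1-based indices in the formulas; the function name `transpose` is an illustrative choice.

```python
def transpose(matrix):
    """Return the transpose of a list-of-rows matrix: entry (i, j) moves to (j, i)."""
    m, n = len(matrix), len(matrix[0])
    return [[matrix[i][j] for i in range(m)] for j in range(n)]


A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3 matrix
print(transpose(A))      # [[1, 4], [2, 5], [3, 6]]

x = [[7, 8, 9]]          # 1 x 3 row vector
print(transpose(x))      # [[7], [8], [9]]
```

Transposing twice returns the original matrix, which is a quick sanity check for any implementation.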

Matrix-Matrix Multiplication

In a matrix-matrix multiplication, each element of the resulting matrix is calculated from an entire row of the first matrix and an entire column of the second matrix. Therefore, matrix-matrix multiplication is only defined for matrices of compatible sizes. The first matrix must have as many columns as the second matrix has rows, and the resulting matrix has the number of rows of the first matrix and the number of columns of the second matrix:
\begin{equation}
\boldsymbol{A}^{\, m \times n} \boldsymbol{B}^{\, n \times p} = \boldsymbol{C}^{\; m \times p}
\end{equation}

If \(\boldsymbol{A}=[a_{ij}]_{m \times n}\) and \(\boldsymbol{B}=[b_{jk}]_{n \times p}\), then the matrix product is defined as:
\begin{equation}
\boldsymbol{C}=\boldsymbol{A}\boldsymbol{B}=[c_{ik}]_{m \times p}
\quad \text{ with } \quad
c_{ik} = \sum^n_{j=1} a_{ij} b_{jk}
\end{equation}

The element \(c_{ik}\) of the matrix \( \boldsymbol{C}=\boldsymbol{AB}\) is given by summing the products of the elements of the \(i\)-th row of \( \boldsymbol{A}\) with the corresponding elements of the \(k\)-th column of \( \boldsymbol{B}\).
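The summation formula translates almost directly into three nested loops. The following is a minimal sketch, assuming matrices are stored as lists of rows and using 0-based indices; the function name `matmul` and the small test matrices are illustrative only.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (both lists of rows)."""
    m, n, p = len(A), len(B), len(B[0])
    # The product is only defined if A has as many columns as B has rows.
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    C = [[0] * p for _ in range(m)]
    for i in range(m):          # row of A
        for k in range(p):      # column of B
            for j in range(n):  # summation index
                C[i][k] += A[i][j] * B[j][k]
    return C


A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
print(matmul(A, B))  # [[2, 1], [4, 3]]
```

The explicit loops are chosen for readability rather than speed; numerical libraries provide much faster implementations of the same definition.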

Examples

\begin{equation}
\boldsymbol{A} = \left[ \begin{array}{ccc}
2 & 5 & 1 \\
7 & 3 & 6
\end{array} \right], \quad
\boldsymbol{B} = \left[ \begin{array}{cc}
1 & 8 \\
9 & 4 \\
3 & 5
\end{array} \right]
\end{equation}

\begin{equation}
\boldsymbol{AB} = \left[ \begin{array}{cc}
2 \cdot 1 + 5 \cdot 9 + 1 \cdot 3 & 2 \cdot 8 + 5 \cdot 4 + 1 \cdot 5 \\
7 \cdot 1 + 3 \cdot 9 + 6 \cdot 3 & 7 \cdot 8 + 3 \cdot 4 + 6 \cdot 5
\end{array} \right] =
\left[ \begin{array}{cc}
50 & 41 \\
52 & 98
\end{array} \right]
\end{equation}

\begin{equation}
\boldsymbol{BA} = \left[ \begin{array}{ccc}
1 \cdot 2 + 8 \cdot 7 & 1 \cdot 5 + 8 \cdot 3 & 1 \cdot 1 + 8 \cdot 6 \\
9 \cdot 2 + 4 \cdot 7 & 9 \cdot 5 + 4 \cdot 3 & 9 \cdot 1 + 4 \cdot 6 \\
3 \cdot 2 + 5 \cdot 7 & 3 \cdot 5 + 5 \cdot 3 & 3 \cdot 1 + 5 \cdot 6
\end{array} \right] =
\left[ \begin{array}{ccc}
58 & 29 & 49 \\
46 & 57 & 33 \\
41 & 30 & 33
\end{array} \right]
\end{equation}
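The two products above can be reproduced with a short script. This is a sketch that pairs each row of the first matrix with each column of the second. It assumes matrices are stored as lists of rows and that the sizes are compatible, since it performs no size check; the function name `matmul` is an illustrative choice.

```python
def matmul(X, Y):
    """Row-by-column product of two list-of-rows matrices of compatible size."""
    columns = list(zip(*Y))  # the columns of Y
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in X]


A = [[2, 5, 1],
     [7, 3, 6]]
B = [[1, 8],
     [9, 4],
     [3, 5]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB)  # [[50, 41], [52, 98]]
print(BA)  # [[58, 29, 49], [46, 57, 33], [41, 30, 33]]
```

Here \(\boldsymbol{AB}\) is a \(2 \times 2\) matrix while \(\boldsymbol{BA}\) is a \(3 \times 3\) matrix, so the order of the factors matters.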

Links

A great refresher course about linear algebra can be found here:
Part 1: Scalars, Vectors, Matrices, and Tensors
Part 2: Matrix Algebra
