Determinant
By: Gijs Bellaard
Definition
We loosely define the determinant of a square matrix in \(\R^{n \times n}\) as the signed hypervolume of the parallelepiped spanned by its column vectors. This way the determinant is nothing but a function that takes in \(n\) vectors and spits out a real number: \[ \det: \underbrace{\R^n \times \cdots \times \R^n}_{\text{\(n\) times}} \to \R\] But how does this help us calculate and understand the determinant? We will see in this article that it actually tells us all we need to know. Starting with just two dimensions, we'll explore some of the natural properties the determinant has under this definition, which afterwards allow us to derive how the determinant is calculated in full generality.
Two-dimensional case
In the two-dimensional case the determinant is a function that takes in two vectors and returns the signed area of the parallelogram spanned by them. So, as promised, let us explore some of the properties the determinant has under our vague definition.
Identity
We can all agree that a square with side length 1 has an area of 1. So this property is quite natural: \[ \det(e_1,e_2) = 1\] Or, in other words, the determinant of the identity matrix is 1.
Bilinearity
We see that the determinant is bilinear, that is, it is linear in each of its two arguments.
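Spelled out, bilinearity amounts to two rules per argument, additivity and homogeneity. The vectors \(u, v, w\) and the scalar \(\lambda\) below are names chosen here for illustration: \[ \begin{align*} \det(u+v,w) &= \det(u,w) + \det(v,w) & \det(\lambda u,w) &= \lambda\det(u,w) \\ \det(u,v+w) &= \det(u,v) + \det(u,w) & \det(u,\lambda w) &= \lambda\det(u,w) \end{align*} \] Geometrically, homogeneity says that stretching one side of the parallelogram by a factor \(\lambda\) scales its area by the same factor.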
Alternating
Since we are measuring area, the determinant of two equal vectors is 0: they span an "empty" parallelogram. When a bilinear map has this property it is said to be alternating. \[ \det(v,v) = 0 \]
Forced
Take any two vectors \(a\) and \(b\). Then the following must be true due to bilinearity and the alternating property: \[ \begin{align*} 0 &= \det(a+b,a+b) \\ &= \det(a,a) + \det(a,b) + \det(b,a) + \det(b,b)\\ &= \det(a,b) + \det(b,a) \end{align*} \] So: \[ \det(a,b) = -\det(b,a) \] We are forced to conclude that the determinant can take on negative values. I hear you saying: "But how does that make any sense, how can an area be negative?" Well, that is because the determinant does not measure the absolute area but the signed area. That is, it also encodes the orientation of the vectors in the sign of the output. The usual notion of area is encoded in the magnitude of the answer.
Derivation
Even though we have no way (yet) of calculating the area of the parallelogram directly, we can use our (new-found) knowledge of the properties of the determinant to derive it! Take a general \(2\times2\) matrix: \[ \begin{bmatrix} a & b \\ c & d \end{bmatrix}\] We will use the following notation to allow us to stop writing \(\det\) over and over again: \[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} := \det\begin{pmatrix} a & b \\ c & d \end{pmatrix} \] Using the additivity part of bilinearity on the first column we can split this into: \[ \begin{vmatrix} a & b \\ c & d \end{vmatrix} = \begin{vmatrix} a & b \\ 0 & d \end{vmatrix} + \begin{vmatrix} 0 & b \\ c & d \end{vmatrix} \] And on the second column: \begin{equation} \label{additivity} \cdots = \begin{vmatrix} a & b \\ 0 & 0 \end{vmatrix} + \begin{vmatrix} a & 0 \\ 0 & d \end{vmatrix} + \begin{vmatrix} 0 & b \\ c & 0 \end{vmatrix} + \begin{vmatrix} 0 & 0 \\ c & d \end{vmatrix} \end{equation} Then we can use homogeneity to extract the scalars: \begin{equation} \label{homogeneity} \cdots = ab\begin{vmatrix} 1 & 1 \\ 0 & 0 \end{vmatrix} + ad\begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} + bc\begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} + cd\begin{vmatrix} 0 & 0 \\ 1 & 1 \end{vmatrix} \end{equation} Lastly we use that the determinant is alternating, together with the determinant of the identity matrix being 1. The first and last determinants have two equal columns and therefore vanish, the second is the identity, and the third is the identity with its columns swapped, which flips the sign: \begin{equation} \label{final} \cdots = ab \cdot 0 + ad\cdot 1 + bc\cdot (-1) + cd\cdot 0 \end{equation} And we conclude that the determinant of a general \(2\times2\) matrix is equal to: \[ ad - bc \]
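As a quick sanity check of this result, here is a small worked example with the illustrative values \(a=3\), \(b=1\), \(c=1\), \(d=2\) (chosen here, not taken from the derivation): \[ \begin{vmatrix} 3 & 1 \\ 1 & 2 \end{vmatrix} = 3\cdot 2 - 1\cdot 1 = 5 \] Swapping the two columns gives \[ \begin{vmatrix} 1 & 3 \\ 2 & 1 \end{vmatrix} = 1\cdot 1 - 3\cdot 2 = -5 \] The magnitude, and hence the area, is unchanged, while the sign flips, exactly as the alternating property demands.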
General case
In full generality, we require the determinant to be multilinear and alternating, and the determinant of the identity matrix to be 1. Just as in the two-dimensional case, these three properties let us calculate the determinant of any square matrix in exactly the same way. We start by repeatedly using the additivity part of multilinearity to split the determinant into many simpler determinants in which every column has at most one nonzero entry, analogously to equation \eqref{additivity}. Then we extract the scalars using homogeneity to arrive at an equation analogous to \eqref{homogeneity}. We are left with only "trivial determinants" in which every column consists of a single \(1\) and zeros elsewhere. Finally we use the alternating property, together with the fact that the determinant of the identity matrix is 1, to calculate the value of each trivial determinant, just like in equation \eqref{final}.
Final Step
As we saw in the two-dimensional case, many of the trivial determinants we are left with evaluate to \(0\) because of a duplicate column. The only trivial determinants that matter are the ones that evaluate to either \(-1\) or \(1\); these correspond to column-permuted identity matrices, which is why such matrices are also called permutation matrices. The determinant of a permutation matrix is equal to the sign of the corresponding permutation. This is obvious in hindsight because the sign of a permutation flips when two elements are swapped, just like the sign of the determinant flips when two columns are swapped!
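To make this concrete, consider \(n = 3\) (an example chosen here for illustration). Splitting each of the three columns into three pieces gives \(3^3 = 27\) trivial determinants, of which only the \(3! = 6\) with pairwise distinct columns survive. One of them is: \[ \det(e_2, e_3, e_1) = -\det(e_1, e_3, e_2) = \det(e_1, e_2, e_3) = 1 \] Two column swaps are needed to reach the identity, so the sign flips twice and this permutation matrix has determinant \(+1\). A permutation matrix that is a single swap away from the identity, such as the one with columns \(e_2, e_1, e_3\), has determinant \(-1\).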
All this talk about permutations allows us to write down how to calculate the determinant in full generality: \[ \det(a^1, \ldots, a^n) = \sum_{\pi \in S_n} a^1_{\pi(1)} \cdots a^n_{\pi(n)} \operatorname{sgn}(\pi) \] where \(a^k\) denotes the \(k\)-th column of the matrix, \(a^k_i\) its \(i\)-th entry, and \(S_n\) is the set of all permutations of \(n\) elements. We take a permutation, form the product of the corresponding entries of the matrix, multiply it by the sign of the permutation, and sum these terms over all possible permutations.
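The permutation sum can also be read as a recipe for a program. The following is a minimal Python sketch of that reading, not an efficient algorithm, since it visits all \(n!\) permutations. The function names `permutation_sign` and `leibniz_det` are made up for this illustration, the matrix is assumed to be given as a list of its columns, and indices are 0-based rather than 1-based.

```python
from itertools import permutations
from math import prod


def permutation_sign(perm):
    """Sign of a permutation given as a tuple of 0-based indices.

    The sign is +1 for an even number of inversions and -1 for an odd number.
    """
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def leibniz_det(columns):
    """Determinant of the square matrix whose columns are listed in `columns`."""
    n = len(columns)
    total = 0
    for perm in permutations(range(n)):
        # From column k take the entry in row perm[k]; this is a^k_{pi(k)}.
        term = prod(columns[k][perm[k]] for k in range(n))
        total += permutation_sign(perm) * term
    return total


# The 2x2 matrix with rows (a, b) and (c, d) has columns (a, c) and (b, d).
a, b, c, d = 3, 1, 1, 2
print(leibniz_det([[a, c], [b, d]]) == a * d - b * c)
```

For \(n = 2\) the loop runs over exactly two permutations and reproduces \(ad - bc\). In practice one would compute determinants with an elimination-based method instead; the sketch is only meant to mirror the formula term by term.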