I spent years feeling like I didn’t know what a matrix truly was. I no longer feel that way, but I think it’s just because I gave up. I don’t think I really know any more about them now than I did when I was confused.
Box o’ Numbers
is a matrix. It could be the matrix of the row vectors and . It could be the matrix of the column vectors and . It could be the matrix of the coefficients of the left-hand side of the linear equation . We can say is an element of the set of 2×2 matrices, or an element of the set of invertible linear transformations from the plane to itself.
Initially, I wanted to motivate all that is The Matrix, but that turned out to be a super huge ordeal that I decided was ultimately not worth it.
Okay, so last time we saw that vectors were arrows, but I didn’t say how to write them down. If you understand what the coordinate (1,2) is, then replace the ( ) with $\langle\ \rangle$ and you’ve got the vector $\langle 1, 2 \rangle$, which starts at the origin and ends at the point (1, 2). Since a vector is just the arrow, not attached to any particular location, you can move it around (without changing length or direction) and still call it the same thing, so $\langle 1, 2 \rangle$ could also be the vector that starts at (1, 1) and goes over 1 and up 2 to the point (2, 3).
Canoes and Waterfalls
Given two vectors u and v, there’s this thing called the dot product that you can do. You can’t just multiply vectors together (our intuition of multiplying numbers together sort of fails us when we’re dealing with numbers that have direction), but there are two natural(ish) sorts of products you can do. If $u = \langle u_1, u_2 \rangle$ and $v = \langle v_1, v_2 \rangle$, then $u \cdot v = u_1 v_1 + u_2 v_2$, and we get the nice property that the length of u is $\sqrt{u \cdot u}$.
Since we can write any vector in the plane as a column vector $\begin{pmatrix} x \\ y \end{pmatrix}$, this gives us a first hint at how we might multiply boxes of numbers together. If u and v are two column vectors, then $u \cdot v$ is defined to be $u^T v$, where $u^T$ is the row vector obtained by taking the “transpose” of u.
How I picture the dot product operation is by seeing the row vector as a canoe and the column vector as a waterfall. Seriously. You multiply the entries that line up and then add those products together.
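We can extend this to matrix multiplication by viewing the left-hand matrix as a box of canoes (rows) and the right-hand matrix as a box of waterfalls (columns). If R and C are matrices such that RC makes sense, then the entry of the resulting product matrix which is in the ith row and jth column is the dot product of the ith canoe with the jth waterfall. For 2×2 matrices, writing $r_1, r_2$ for the rows of R and $c_1, c_2$ for the columns of C:
$$RC = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} \begin{pmatrix} c_1 & c_2 \end{pmatrix} = \begin{pmatrix} r_1 \cdot c_1 & r_1 \cdot c_2 \\ r_2 \cdot c_1 & r_2 \cdot c_2 \end{pmatrix}$$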
For matrix multiplication to work, the length of the canoes must equal the length of the waterfalls, in which case the product will have the same number of canoes as your canoe matrix and the same number of waterfalls as your waterfall matrix. We also have that matrix multiplication does not “commute,” which is to say that in general for two matrices A and B, $AB \neq BA$. If you consider matrix multiplication “on the left,” which is to say that “A acts on B” in the product AB, then we have that the rows of A act on the rows of B (meaning that each row in the product is just a linear combination of the rows in B). On the other hand if you consider the product AB to be B acting on A “on the right,” then we have that the columns of B act on the columns of A.
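If it helps to see the canoes and waterfalls as code, here is a minimal Python sketch. It is only an illustration: the function name matmul and the example matrices A and B are made up for this post, and matrices are assumed to be plain lists of rows.

```python
def matmul(R, C):
    """Multiply two matrices given as lists of rows (canoes times waterfalls)."""
    if len(R[0]) != len(C):
        raise ValueError("canoe length must equal waterfall length")
    waterfalls = list(zip(*C))  # the columns of C
    # Entry (i, j) is the dot product of the ith canoe with the jth waterfall.
    return [[sum(r * c for r, c in zip(canoe, waterfall))
             for waterfall in waterfalls]
            for canoe in R]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

AB = matmul(A, B)  # B acts on the right: the columns of A get swapped
BA = matmul(B, A)  # B acts on the left: the rows of A get swapped

print(AB)  # [[2, 1], [4, 3]]
print(BA)  # [[3, 4], [1, 2]]
```

The two products are different, so this particular A and B do not commute, and you can see the "rows act on rows, columns act on columns" behavior in which parts of A got shuffled.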
About that dot product
Now that we see that matrix multiplication can fill our world with dot products, let’s say a bit more about what the dot product gives us. For two vectors u and v, it turns out that $u \cdot v = |u||v|\cos\theta$, where |vector| means its length, and $\theta$ is the angle between the two vectors. (From here we can confirm what I said earlier about $u \cdot u$ being its length squared (since the angle between a vector and itself is 0 and the cosine of 0 is 1).) So what? This means that our product of canoes and waterfalls above is actually a bunch of lengths and angles:
$$RC = \begin{pmatrix} |r_1||c_1|\cos\theta_{11} & |r_1||c_2|\cos\theta_{12} \\ |r_2||c_1|\cos\theta_{21} & |r_2||c_2|\cos\theta_{22} \end{pmatrix}$$
where $\theta_{ij}$ is the angle between $r_i$ (the ith row of R) and $c_j$ (the jth column of C).
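Here is a small Python sketch of going from a dot product back to a length and an angle. The helper names dot, length, and angle are my own, and the vectors u and v are just an example I picked.

```python
import math


def dot(u, v):
    """Multiply the entries that line up, then add the products."""
    return sum(a * b for a, b in zip(u, v))


def length(u):
    """The length of u is the square root of u . u."""
    return math.sqrt(dot(u, u))


def angle(u, v):
    """Solve u . v = |u||v|cos(theta) for theta (in radians)."""
    cosine = dot(u, v) / (length(u) * length(v))
    # Clamp to [-1, 1] so floating-point rounding cannot break acos.
    return math.acos(max(-1.0, min(1.0, cosine)))


u = (1, 0)
v = (1, 1)
theta = angle(u, v)
print(math.degrees(theta))  # approximately 45
```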
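So, remember early on I said we could turn a column vector into a row vector by taking its “transpose”? Let’s look at this now for matrices. The transpose of a matrix R is the matrix whose ith column is the ith row of R. For a 2×2 matrix R with rows $r_1$ and $r_2$:
$$R = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix}, \qquad R^T = \begin{pmatrix} r_1^T & r_2^T \end{pmatrix}$$
And you might ask yourself what happens when you multiply a matrix by its transpose.
$$RR^T = \begin{pmatrix} r_1 \cdot r_1 & r_1 \cdot r_2 \\ r_2 \cdot r_1 & r_2 \cdot r_2 \end{pmatrix}$$
This is the Gram matrix of the vectors $r_1$ and $r_2$ and it encodes the lengths of the vectors and all the angles between them. Uhhhhh, hold that thought.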
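To make the Gram matrix concrete, here is a short Python sketch that computes it for one small matrix. The names transpose, matmul, and gram, and the example matrix R, are assumptions of mine for illustration.

```python
def transpose(R):
    """The ith column of the result is the ith row of R."""
    return [list(column) for column in zip(*R)]


def matmul(R, C):
    """Entry (i, j) is the dot product of row i of R with column j of C."""
    columns = list(zip(*C))
    return [[sum(r * c for r, c in zip(row, column)) for column in columns]
            for row in R]


def gram(R):
    """Gram matrix of the rows of R: all pairwise dot products of the rows."""
    return matmul(R, transpose(R))


R = [[1, 0],
     [1, 1]]
G = gram(R)
print(G)  # [[1, 1], [1, 2]]
```

The diagonal entries 1 and 2 are the squared lengths of the two rows, and the off-diagonal 1 is their dot product, which with those lengths works out to an angle of 45 degrees.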
Okay, so we have boxes of numbers and we have a way to multiply two matrices together. This means we might be able to form a “group.” In math, a group is a set G equipped with an (associative) operation such that operating on any two elements in the set keeps you in the set, there is an identity element e such that $eg = ge = g$ for all g in G, and every element g has an inverse $g^{-1}$ such that $gg^{-1} = g^{-1}g = e$. (Matrix multiplication is associative, so I won’t mention that requirement again.) If that seems super random and abstract, you can think of $\mathbb{Q}^\times$, the set of non-zero rational numbers (fractions, including (non-zero) integers, which are just secret fractions) with respect to multiplication. The identity is 1, and every element of $\mathbb{Q}^\times$ is invertible (take the reciprocal).
I’m just going to stick to 2×2 matrices, but aside from explicit formulas, nothing I say is dependent on the size of the matrix. That it needs to be “square” (same number of rows as columns) will become clear.
Given two 2×2 matrices, we know how to multiply them together and we saw that we would get a 2×2 matrix back. Now we need an identity element. You can find the identity by setting up two arbitrary matrices A and I (i.e., you fill in their entries with distinct letters rather than specific numbers), using the equation AI = IA = A, and solving for the entries of I. You could also possibly just stare at the canoe and waterfall matrices above until enlightenment strikes. Either way, the identity matrix will be $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, and you should verify that this satisfies AI = IA = A for all 2×2 matrices A.
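Now we need to find the inverse of an arbitrary 2×2 matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. First, let’s define a new matrix $B = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$. Now find AB and BA.
You should get that $AB = BA = \begin{pmatrix} ad-bc & 0 \\ 0 & ad-bc \end{pmatrix}$, which we also can write as $(ad-bc)I$. This means that whenever $ad-bc \neq 0$, we have that $A^{-1} = \frac{1}{ad-bc}B$ and thus all matrices A for which $ad-bc \neq 0$ are invertible. Furthermore, if ad-bc = 0, then we’ve found a non-zero matrix B such that AB=0 (B is non-zero as long as A is, and the zero matrix is clearly not invertible). If A is in fact invertible, then we get that $A^{-1}AB = A^{-1}0 = 0$, which implies that B=0, but we already said B was non-zero, therefore A cannot be invertible.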
So, no, the set of 2×2 matrices is not a group, but the set of 2×2 invertible matrices is. When the allowable entries are real numbers, we call this group $GL_2(\mathbb{R})$. And ad-bc is the determinant of A, written det(A).
(To see why an invertible matrix must be square, remember what we said about the number of rows and columns in the product matrix and the fact that $AA^{-1}$ has to equal $A^{-1}A$.)
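The inverse recipe above (swap a and d, negate b and c, divide by ad-bc) can be sketched in a few lines of Python. This is only an illustration under the assumption that the entries are integers, so that exact fractions can be used; the names det, inverse, and matmul are mine.

```python
from fractions import Fraction


def det(A):
    """Determinant ad - bc of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = A
    return a * d - b * c


def inverse(A):
    """Inverse of a 2x2 integer matrix, as exact fractions."""
    (a, b), (c, d) = A
    D = det(A)
    if D == 0:
        raise ValueError("ad - bc = 0, so this matrix is not invertible")
    return [[Fraction(d, D), Fraction(-b, D)],
            [Fraction(-c, D), Fraction(a, D)]]


def matmul(R, C):
    """Entry (i, j) is the dot product of row i of R with column j of C."""
    columns = list(zip(*C))
    return [[sum(r * c for r, c in zip(row, column)) for column in columns]
            for row in R]


A = [[1, 2], [3, 4]]
A_inv = inverse(A)
print(det(A))            # -2
print(matmul(A, A_inv))  # the identity matrix, with Fraction entries
```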
Lattice as a box o’ numbers
Back to lattices! For any 2-dimensional lattice L in the plane, pick two basis vectors u and v (which determine the fundamental parallelogram) and put them as rows of a 2×2 matrix: $M_L = \begin{pmatrix} u \\ v \end{pmatrix}$.
Fun fact! The (absolute value of the) determinant of $M_L$ gives the area of the fundamental parallelogram!
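As a quick sanity check of the fun fact, here is a tiny Python sketch. The function names and the two example bases are made up by me and are not the lattices pictured in this post.

```python
def det(M):
    """Determinant of a 2x2 matrix whose rows are the basis vectors."""
    (a, b), (c, d) = M
    return a * d - b * c


def parallelogram_area(M):
    """Area of the fundamental parallelogram spanned by the rows of M."""
    return abs(det(M))


unit_square = [[1, 0], [0, 1]]
slanted = [[2, 1], [1, 3]]

print(parallelogram_area(unit_square))  # 1
print(parallelogram_area(slanted))      # 5
# Listing the basis vectors in the other order flips the sign of the
# determinant, which is why the absolute value is needed.
print(det([[1, 3], [2, 1]]))            # -5
```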
I grabbed two images from the previous post. They aren’t labeled, but I’ve assumed that one basis vector is length 1 on the positive x-axis.
This lattice could be represented by , and the area of its fundamental parallelogram is 1.
This lattice, on the other hand, could be represented by , and the area of its fundamental parallelogram is .
Last time, we mentioned that certain actions don’t affect the shape. Namely, change of basis, scaling, and rotating and/or reflecting. Let’s now see how we can represent these actions in terms of matrix multiplication.
Change of basis. What does it mean for u and v to be a basis for a lattice L? It means that L is the set of all integer linear combinations of u and v. So $L = \{xu + yv : x, y \in \mathbb{Z}\}$, and notice that the whole plane is the set of xu+yv where x and y are real numbers. The first thing we could do to a lattice that doesn’t change its shape is do absolutely nothing to the lattice. When we pick a new basis for a lattice, we aren’t touching the lattice at all, though we do end up with a different matrix representation for it. Picking a new basis means we take two new (“linearly independent”) vectors from our lattice and use those as a basis instead. That means that these new basis elements have to be integer linear combinations of our old basis elements. Now, since we put our original basis vectors as the rows of our matrix $M_L = \begin{pmatrix} u \\ v \end{pmatrix}$, what we want is an action that takes integer linear combinations of the rows. To get this we act on the left by a matrix with integer entries:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} au + bv \\ cu + dv \end{pmatrix}$$
So far, so good. The condition that our new vectors be linearly independent (in two dimensions, this is just that one is not a scalar multiple of the other) means that we must act by an invertible matrix (because the new basis elements are linearly dependent if, and only if, the area of the fundamental parallelogram (which will just be a line) is zero). Okay, so take an element g of $GL_2(\mathbb{R})$ with integer entries. Then $gM_L$ represents a sublattice of L (a subset of L which is itself a lattice). If $g^{-1}$ also has integer entries (i.e., g is an element of $GL_2(\mathbb{Z})$, the set of 2×2 matrices invertible “over the integers”), then $M_L = g^{-1}(gM_L)$ represents a sublattice of the lattice given by $gM_L$. If two lattices are sublattices of each other (or if two sets are subsets of each other) then they are actually equal to each other. Thus, acting on the left by $GL_2(\mathbb{Z})$ is how we get a change of basis.
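Here is a Python sketch of the difference between acting by something invertible over the integers and acting by something that merely has integer entries. Everything named here (the basis matrix M, the matrices g and h, and the helper functions) is a hypothetical example of mine.

```python
from fractions import Fraction


def matmul(R, C):
    """Entry (i, j) is the dot product of row i of R with column j of C."""
    columns = list(zip(*C))
    return [[sum(r * c for r, c in zip(row, column)) for column in columns]
            for row in R]


def inverse(g):
    """Exact inverse of a 2x2 integer matrix with non-zero determinant."""
    (a, b), (c, d) = g
    D = a * d - b * c
    return [[Fraction(d, D), Fraction(-b, D)],
            [Fraction(-c, D), Fraction(a, D)]]


def has_integer_entries(M):
    return all(Fraction(x).denominator == 1 for row in M for x in row)


M = [[1, 0], [1, 2]]  # rows are a basis of some lattice
g = [[2, 1], [1, 1]]  # determinant 1: invertible over the integers
h = [[2, 0], [0, 1]]  # determinant 2: invertible, but not over the integers

new_basis = matmul(g, M)
print(new_basis)                                # [[3, 2], [2, 2]]
print(has_integer_entries(inverse(g)))          # True: same lattice, new basis
print(has_integer_entries(inverse(h)))          # False: hM is a proper sublattice
print(matmul(inverse(g), new_basis) == M)       # True: we can get the old basis back
```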
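Scaling. To see how we can scale our lattice using matrices, let’s start with the square lattice whose points are all points (m, n) where m and n are integers. This lattice is generated by the vectors $\langle 1, 0 \rangle$ and $\langle 0, 1 \rangle$, and so we can define its matrix representation as $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Scaling this lattice by 2 means forming the lattice generated by $\langle 2, 0 \rangle$ and $\langle 0, 2 \rangle$, and we have that $\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$. In general, scaling a matrix by a non-zero real number $\lambda$, by which I mean the shape-preserving scaling of each generator by $\lambda$, is acting by $\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$, and we say this is acting by $\mathbb{R}^\times$.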
Rotations and Reflections. Let O be an element of $GL_2(\mathbb{R})$ such that $OO^T = I$. This means that the Gram matrix of the rows of O is the identity. In other words the rows are orthogonal to each other (thus the cosine of the angle between them is 0) and they are all of length 1 (and the same is true of the columns). Such matrices form the orthogonal group which we write as $O_2(\mathbb{R})$. We want to see that acting on the right by such a matrix is just a rotation or reflection of our lattice. When two lattices differ only by rotations and/or reflections, it means that vector lengths and angles are preserved. In other words, the Gram matrix of the basis vectors is preserved, and in fact if two lattices have the same Gram matrix they can only differ by a product of rotations and reflections.
To see the relationship between Gram matrices and the orthogonal group, let $L$ and $L'$ be two lattices such that their matrix representations differ by an orthogonal matrix O acting on the right: $M_{L'} = M_L O$. Then we have that $M_{L'}M_{L'}^T = (M_L O)(M_L O)^T = M_L O O^T M_L^T = M_L M_L^T$, meaning that lattices that differ by the orthogonal group acting on the right have the same Gram matrix. (If you’re wondering why two lattices that have the same Gram matrix necessarily differ only by an element of the orthogonal group, I’m told that you should look into Cholesky decomposition.)
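You can watch the Gram matrix survive an orthogonal matrix in a small Python sketch. I chose a rotation built from the 3-4-5 triangle so that all the arithmetic is exact; the matrices M, O, and F and the helper names are my own examples.

```python
from fractions import Fraction


def transpose(R):
    return [list(column) for column in zip(*R)]


def matmul(R, C):
    """Entry (i, j) is the dot product of row i of R with column j of C."""
    columns = list(zip(*C))
    return [[sum(r * c for r, c in zip(row, column)) for column in columns]
            for row in R]


def gram(M):
    """Gram matrix of the rows of M."""
    return matmul(M, transpose(M))


M = [[1, 0], [1, 2]]                         # rows are a lattice basis
O = [[Fraction(3, 5), Fraction(-4, 5)],
     [Fraction(4, 5), Fraction(3, 5)]]       # a rotation with exact entries
F = [[1, 0], [0, -1]]                        # reflection across the x-axis

print(matmul(O, transpose(O)) == [[1, 0], [0, 1]])  # True: O is orthogonal
print(gram(M))                                      # [[1, 1], [1, 5]]
print(gram(matmul(M, O)) == gram(M))                # True
print(gram(matmul(M, F)) == gram(M))                # True
```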
Last time, we saw the shape of a two-dimensional lattice as a point in the plane. For any given lattice L, we rotated, reflected, scaled, and changed basis until we had that one generator was $\langle 1, 0 \rangle$ and the other generator was a point in a specific fundamental domain. Equivalently, we could start with the matrix $M_L$ of L, and act on the left by $GL_2(\mathbb{Z})$ and on the right by $\mathbb{R}^\times O_2(\mathbb{R})$ until we get a matrix whose rows are $\langle 1, 0 \rangle$ and v where v lies in the fundamental domain (its x coordinate is between 0 and 1/2, and its length is at least 1). That gives you a representative shape matrix.
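If you want to actually find that representative, one way to do it is sketched below in Python. To be clear about what is swapped in: this uses Lagrange-Gauss reduction on the basis, treating vectors as complex numbers, which is a standard technique but not something spelled out in this post. The function name shape_point is hypothetical, and the sketch assumes the two input vectors are linearly independent.

```python
def shape_point(u, v):
    """Return the second generator, as a complex number, after moving the
    lattice with basis u, v so that the first generator is <1, 0> and the
    second lies in the fundamental domain."""
    u, v = complex(*u), complex(*v)

    def norm2(w):
        return w.real ** 2 + w.imag ** 2

    if norm2(u) > norm2(v):
        u, v = v, u
    while True:
        # Change of basis: subtract the nearest integer multiple of u from v.
        m = round((v * u.conjugate()).real / norm2(u))
        v -= m * u
        if norm2(v) >= norm2(u):
            break
        u, v = v, u  # swapping the generators is also a change of basis
    # Rotate and scale so that u becomes <1, 0>; then v becomes z.
    z = v / u
    # Reflect (and replace v by -v if needed) to land in the quadrant x, y >= 0.
    return complex(abs(z.real), abs(z.imag))


print(shape_point((1, 0), (0, 1)))  # 1j, the square lattice
print(shape_point((2, 1), (1, 1)))  # 1j again: same lattice, different basis
print(shape_point((3, 3), (1, 3)))  # (0.5+1.5j)
```

The returned point has x coordinate between 0 and 1/2 and length at least 1, and two bases of the same lattice land on the same point, which is the whole idea of a representative shape.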
On the other hand, the Semi-Super* Mathy way to define shape is to use “double cosets.” Which is to say that the shape of the lattice is the set of all lattices with the same shape. Or, in terms of matrices, we can define the shape of L to be $GL_2(\mathbb{Z})\, M_L\, \mathbb{R}^\times O_2(\mathbb{R})$, which is the set of all matrices which represent lattices with the same shape as L. It is this perspective that allows us to define the space of shapes of two-dimensional lattices to be $GL_2(\mathbb{Z}) \backslash GL_2(\mathbb{R}) / \mathbb{R}^\times O_2(\mathbb{R})$. (And people familiar with such things can look at this and say “Ah, that space has finite volume!”)
*The actual Super Mathy way to define shape is something something quadratic form, but I have never enjoyed that view.
Alternatively, you could look at the Gram matrix of the lattice instead of its shape matrix.
AND MAYBE NEXT TIME I’LL KNOW WHY YOU WOULD DO THAT…