WEBVTT
00:00:00.600 --> 00:00:26.200
Traditionally, dot products are introduced really early on in a linear algebra course, typically right at the start.
00:00:26.720 --> 00:00:29.560
So it might seem strange that I’ve pushed them back this far in the series.
00:00:30.240 --> 00:00:35.720
I did this because there’s a standard way to introduce the topic which requires nothing more than a basic understanding of vectors.
00:00:36.240 --> 00:00:42.280
But a fuller understanding of the role the dot products play in math can only really be found under the light of linear transformations.
00:00:43.440 --> 00:00:50.400
Before that though, let me just briefly cover the standard way that dot products are introduced, which I’m assuming is at least partially review for a number of viewers.
00:00:51.320 --> 00:01:04.960
Numerically, if you have two vectors of the same dimension, two lists of numbers with the same length, taking their dot product means pairing up all of the coordinates, multiplying those pairs together, and adding the result.
00:01:06.800 --> 00:01:13.080
So the vector one, two dotted with three, four would be one times three plus two times four.
00:01:14.640 --> 00:01:23.520
The vector six, two, eight, three dotted with one, eight, five, three would be six times one plus two times eight plus eight times five plus three times three.
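That coordinate recipe is small enough to sketch in a few lines of Python; the `dot` helper here is just an illustrative name, not anything from a particular library.

```python
# Coordinate recipe for the dot product: pair up the coordinates,
# multiply those pairs together, and add up the results.
def dot(v, w):
    assert len(v) == len(w), "vectors must have the same dimension"
    return sum(a * b for a, b in zip(v, w))

# The two examples from above:
first = dot([1, 2], [3, 4])              # 1*3 + 2*4 = 11
second = dot([6, 2, 8, 3], [1, 8, 5, 3])  # 6*1 + 2*8 + 8*5 + 3*3 = 71
```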
00:01:24.920 --> 00:01:28.680
Luckily, this computation has a really nice geometric interpretation.
00:01:29.320 --> 00:01:39.320
To think about the dot product between two vectors 𝐕 and 𝐖, imagine projecting 𝐖 onto the line that passes through the origin and the tip of 𝐕.
00:01:39.320 --> 00:01:44.440
Multiplying the length of this projection by the length of 𝐕, you have the dot product 𝐕 dot 𝐖.
00:01:46.520 --> 00:01:52.120
Except when this projection of 𝐖 is pointing in the opposite direction from 𝐕, that dot product will actually be negative.
00:01:54.000 --> 00:01:57.800
So when two vectors are generally pointing in the same direction, their dot product is positive.
00:01:59.560 --> 00:02:05.600
When they’re perpendicular, meaning the projection of one onto the other is the zero vector, their dot product is zero.
00:02:06.080 --> 00:02:09.520
And if they’re pointing generally the opposite direction, their dot product is negative.
00:02:11.760 --> 00:02:16.520
Now, this interpretation is weirdly asymmetric; it treats the two vectors very differently.
00:02:16.960 --> 00:02:19.880
So when I first learned this, I was surprised that order doesn’t matter.
00:02:21.360 --> 00:02:28.040
You could instead project 𝐕 onto 𝐖, multiply the length of the projected 𝐕 by the length of 𝐖, and get the same result.
00:02:30.360 --> 00:02:32.880
I mean, doesn’t that feel like a really different process?
00:02:35.560 --> 00:02:54.960
Here’s the intuition for why order doesn’t matter: if 𝐕 and 𝐖 happened to have the same length, we could leverage some symmetry, since projecting 𝐖 onto 𝐕 then multiplying the length of that projection by the length of 𝐕 is a complete mirror image of projecting 𝐕 onto 𝐖 then multiplying the length of that projection by the length of 𝐖.
00:02:57.280 --> 00:03:04.280
Now, if you scale one of them, say 𝐕 by some constant like two, so that they don’t have equal length, the symmetry is broken.
00:03:05.040 --> 00:03:09.920
But let’s think through how to interpret the dot product between this new vector two times 𝐕 and 𝐖.
00:03:10.960 --> 00:03:19.720
If you think of 𝐖 as getting projected onto 𝐕, then the dot product two 𝐕 dot 𝐖 will be exactly twice the dot product 𝐕 dot 𝐖.
00:03:20.480 --> 00:03:29.200
This is because when you scale 𝐕 by two, it doesn’t change the length of the projection of 𝐖, but it doubles the length of the vector that you’re projecting onto.
00:03:29.200 --> 00:03:34.080
But, on the other hand, let’s say you’re thinking about 𝐕 getting projected onto 𝐖.
00:03:35.000 --> 00:03:39.680
Well, in that case, the length of the projection is the thing to get scaled when we multiply 𝐕 by two.
00:03:40.200 --> 00:03:42.800
The length of the vector that you’re projecting onto stays constant.
00:03:43.520 --> 00:03:46.520
So the overall effect is still to just double the dot product.
00:03:47.240 --> 00:03:55.000
So, even though symmetry is broken in this case, the effect that this scaling has on the value of the dot product is the same under both interpretations.
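Both facts from this argument, that order doesn’t matter and that scaling one vector scales the result, are easy to confirm numerically; the example vectors are arbitrary.

```python
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def scale(c, v):
    return tuple(c * x for x in v)

v, w = (1, 2), (4, -1)
assert dot(v, w) == dot(w, v)                # order doesn't matter
assert dot(scale(2, v), w) == 2 * dot(v, w)  # doubling v doubles the result
```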
00:03:56.920 --> 00:04:00.120
There’s also one other big question that confused me when I first learned this stuff.
00:04:00.880 --> 00:04:08.720
Why on earth does this numerical process of matching coordinates, multiplying pairs, and adding them together have anything to do with projection?
00:04:10.960 --> 00:04:21.160
Well, to give a satisfactory answer and also to do full justice to the significance of the dot product, we need to unearth something a little bit deeper going on here, which often goes by the name “duality.”
00:04:22.120 --> 00:04:29.880
But before getting into that, I need to spend some time talking about linear transformations from multiple dimensions to one dimension, which is just the number line.
00:04:32.680 --> 00:04:35.560
These are functions that take in a 2D vector and spit out some number.
00:04:36.160 --> 00:04:42.240
But linear transformations are, of course, much more restricted than your run-of-the-mill function with a 2D input and a 1D output.
00:04:43.000 --> 00:04:49.880
As with transformations in higher dimensions, like the ones I talked about in chapter 3, there are some formal properties that make these functions linear.
00:04:50.280 --> 00:04:58.040
But I’m going to purposely ignore those here so as to not distract from our end goal and instead focus on a certain visual property that’s equivalent to all the formal stuff.
00:04:59.160 --> 00:05:10.920
If you take a line of evenly spaced dots and apply a transformation, a linear transformation will keep those dots evenly spaced once they land in the output space, which is the number line.
00:05:12.320 --> 00:05:17.080
Otherwise, if there’s some line of dots that gets unevenly spaced, then your transformation is not linear.
00:05:19.360 --> 00:05:26.360
As with the cases we’ve seen before, one of these linear transformations is completely determined by where it takes 𝑖-hat and 𝑗-hat.
00:05:26.960 --> 00:05:30.080
But this time, each one of those basis vectors just lands on a number.
00:05:30.640 --> 00:05:36.720
So when we record where they land as the columns of a matrix, each of those columns just has a single number.
00:05:38.480 --> 00:05:39.800
This is a one-by-two matrix.
00:05:41.840 --> 00:05:45.440
Let’s walk through an example of what it means to apply one of these transformations to a vector.
00:05:46.360 --> 00:05:51.520
Let’s say you have a linear transformation that takes 𝑖-hat to one and 𝑗-hat to negative two.
00:05:52.440 --> 00:06:00.880
To follow where a vector with coordinates, say, four, three, ends up, think of breaking up this vector as four times 𝑖-hat plus three times 𝑗-hat.
00:06:01.840 --> 00:06:15.280
A consequence of linearity is that after the transformation, the vector will be four times the place where 𝑖-hat lands, one, plus three times the place where 𝑗-hat lands, negative two, which in this case implies that it lands on negative two.
00:06:18.160 --> 00:06:22.280
When you do this calculation purely numerically, it’s matrix-vector multiplication.
00:06:26.080 --> 00:06:32.880
Now, this numerical operation of multiplying a one-by-two matrix by a vector feels just like taking the dot product of two vectors.
00:06:33.360 --> 00:06:36.760
Doesn’t that one-by-two matrix just look like a vector that we tipped on its side?
00:06:38.000 --> 00:06:52.600
In fact, we could say right now that there’s a nice association between one-by-two matrices and 2D vectors, defined by tilting the numerical representation of a vector on its side to get the associated matrix or to tip the matrix back up to get the associated vector.
00:06:53.560 --> 00:07:00.680
Since we’re just looking at numerical expressions right now, going back and forth between vectors and one-by-two matrices might feel like a silly thing to do.
00:07:01.760 --> 00:07:04.960
But this suggests something that’s truly awesome from the geometric view.
00:07:05.720 --> 00:07:11.800
There’s some kind of connection between linear transformations that take vectors to numbers and vectors themselves.
00:07:15.560 --> 00:07:21.240
Let me show an example that clarifies the significance and which just so happens to also answer the dot product puzzle from earlier.
00:07:22.080 --> 00:07:27.120
Unlearn what you have learned and imagine that you don’t already know that the dot product relates to projection.
00:07:29.120 --> 00:07:35.960
What I’m gonna do here is take a copy of the number line and place it diagonally in space somehow, with the number zero sitting at the origin.
00:07:36.880 --> 00:07:41.840
Now think of the two-dimensional unit vector whose tip sits where the number one on the number line is.
00:07:42.480 --> 00:07:44.480
I want to give that guy a name, 𝐮-hat.
00:07:45.720 --> 00:07:49.840
This little guy plays an important role in what’s about to happen, so just keep him in the back of your mind.
00:07:51.040 --> 00:07:58.960
If we project 2D vectors straight onto this diagonal number line, in effect, we’ve just defined a function that takes 2D vectors to numbers.
00:07:59.680 --> 00:08:08.800
What’s more, this function is actually linear since it passes our visual test that any line of evenly spaced dots remains evenly spaced once it lands on the number line.
00:08:11.920 --> 00:08:19.320
Just to be clear, even though I’ve embedded the number line in 2D space like this, the outputs of the function are numbers, not 2D vectors.
00:08:19.920 --> 00:08:23.600
You should think of a function that takes in two coordinates and outputs a single coordinate.
00:08:25.080 --> 00:08:29.000
But that vector 𝐮-hat is a two-dimensional vector living in the input space.
00:08:29.400 --> 00:08:33.040
It’s just situated in such a way that it overlaps with the embedding of the number line.
00:08:34.680 --> 00:08:44.560
With this projection, we just defined a linear transformation from 2D vectors to numbers, so we’re gonna be able to find some kind of one-by-two matrix that describes that transformation.
00:08:45.480 --> 00:08:56.480
To find that one-by-two matrix, let’s zoom in on this diagonal number line setup and think about where 𝑖-hat and 𝑗-hat each land, since those landing spots are gonna be the columns of the matrix.
00:08:58.600 --> 00:09:01.680
This part is super cool; we can reason through it with a really elegant piece of symmetry.
00:09:01.680 --> 00:09:13.160
Since 𝑖-hat and 𝐮-hat are both unit vectors, projecting 𝑖-hat onto the line passing through 𝐮-hat looks totally symmetric to projecting 𝐮-hat onto the 𝑥-axis.
00:09:13.760 --> 00:09:17.000
So when we ask: what number does 𝑖-hat land on when it gets projected?
00:09:17.440 --> 00:09:22.120
The answer is gonna be the same as whatever 𝐮-hat lands on when it’s projected onto the 𝑥-axis.
00:09:22.880 --> 00:09:28.560
But projecting 𝐮-hat onto the 𝑥-axis just means taking the 𝑥-coordinate of 𝐮-hat.
00:09:29.080 --> 00:09:36.560
So, by symmetry, the number where 𝑖-hat lands when it’s projected onto that diagonal number line is gonna be the 𝑥-coordinate of 𝐮-hat.
00:09:37.120 --> 00:09:37.640
Isn’t that cool?
00:09:39.320 --> 00:09:41.760
The reasoning is almost identical for the 𝑗-hat case.
00:09:42.160 --> 00:09:43.000
Think about it for a moment.
00:09:49.480 --> 00:09:56.520
For all the same reasons, the 𝑦-coordinate of 𝐮-hat gives us the number where 𝑗-hat lands when it’s projected onto the number line copy.
00:09:57.560 --> 00:10:00.160
Pause and ponder that for a moment; I just think that’s really cool.
00:10:01.120 --> 00:10:07.160
So the entries of the one-by-two matrix describing the projection transformation are going to be the coordinates of 𝐮-hat.
00:10:07.960 --> 00:10:18.800
And computing this projection transformation for arbitrary vectors in space, which requires multiplying that matrix by those vectors, is computationally identical to taking a dot product with 𝐮-hat.
00:10:21.960 --> 00:10:30.360
This is why taking the dot product with a unit vector can be interpreted as projecting a vector onto the span of that unit vector and taking the length.
00:10:34.280 --> 00:10:35.880
So what about non-unit vectors?
00:10:36.320 --> 00:10:40.520
For example, let’s say we take that unit vector 𝐮-hat, but we scale it up by a factor of three.
00:10:41.320 --> 00:10:44.320
Numerically, each of its components gets multiplied by three.
00:10:44.840 --> 00:10:52.320
So looking at the matrix associated with that vector, it takes 𝑖-hat and 𝑗-hat to three times the values where they landed before.
00:10:55.600 --> 00:11:04.560
Since this is all linear, it implies more generally, that the new matrix can be interpreted as projecting any vector onto the number line copy and multiplying where it lands by three.
00:11:05.400 --> 00:11:14.840
This is why the dot product with a non-unit vector can be interpreted as first projecting onto that vector then scaling up the length of that projection by the length of the vector.
00:11:17.880 --> 00:11:19.080
Take a moment to think about what happened here.
00:11:19.440 --> 00:11:30.720
We had a linear transformation from 2D space to the number line, which was not defined in terms of numerical vectors or numerical dot products; it was just defined by projecting space onto a diagonal copy of the number line.
00:11:31.640 --> 00:11:36.880
But because the transformation is linear, it was necessarily described by some one-by-two matrix.
00:11:37.280 --> 00:11:47.880
And since multiplying a one-by-two matrix by a 2D vector is the same as turning that matrix on its side and taking a dot product, this transformation was inescapably related to some 2D vector.
00:11:49.840 --> 00:12:06.200
The lesson here is that anytime you have one of these linear transformations whose output space is the number line, no matter how it was defined, there’s gonna be some unique vector 𝐕 corresponding to that transformation, in the sense that applying the transformation is the same thing as taking a dot product with that vector.
00:12:09.920 --> 00:12:11.840
To me, this is utterly beautiful.
00:12:12.840 --> 00:12:15.120
It’s an example of something in math called “duality.”
00:12:16.320 --> 00:12:21.880
Duality shows up in many different ways and forms throughout math, and it’s super tricky to actually define.
00:12:22.560 --> 00:12:30.120
Loosely speaking, it refers to situations where you have a natural, but surprising correspondence between two types of mathematical thing.
00:12:31.040 --> 00:12:37.800
For the linear algebra case that you just learned about, you’d say that the dual of a vector is the linear transformation that it encodes.
00:12:38.720 --> 00:12:44.600
And the dual of a linear transformation from some space to one dimension is a certain vector in that space.
00:12:47.080 --> 00:12:56.240
So, to sum up, on the surface, the dot product is a very useful geometric tool for understanding projections and for testing whether or not vectors tend to point in the same direction.
00:12:56.920 --> 00:13:00.360
And that’s probably the most important thing for you to remember about the dot product.
00:13:00.360 --> 00:13:07.640
But at a deeper level, dotting two vectors together is a way to translate one of them into the world of transformations.
00:13:08.480 --> 00:13:14.320
Again, numerically, this might feel like a silly point to emphasize; it’s just two computations that happen to look similar.
00:13:15.000 --> 00:13:30.040
But the reason I find this so important is that, throughout math, when you’re dealing with a vector, once you really get to know its personality, sometimes you realize that it’s easier to understand it not as an arrow in space, but as the physical embodiment of a linear transformation.
00:13:30.760 --> 00:13:40.800
It’s as if the vector is really just a conceptual shorthand for a certain transformation, since it’s easier for us to think about arrows in space rather than moving all of that space to the number line.
00:13:43.080 --> 00:14:10.640
In the next video, you’ll see another really cool example of this duality in action as I talk about the cross product.