Linear Algebra for Deep Learning

From Deep Learning Course Wiki


Linear Algebra

Linear Algebra is a branch of mathematics that seeks to describe lines and planes using structures like vectors and matrices.


Vectors in geometry are 1-dimensional arrays of numbers or functions used to operate on points on a line or plane. Vectors store the magnitude and direction of a potential change to a point. A 2-dimensional array of numbers, with both rows and columns, is called a matrix.

Vector Notation

There are a variety of ways to represent vectors:

Vector notation.png

Or more simply:

How Vectors Work

Vectors typically represent movement from a point. They store both the magnitude and direction of potential changes to a point. For example, the vector (-2, 5) says move left 2 units and up 5 units.

A vector can be applied to any point on a plane:

Vector field scalar.png

The vector’s direction equals the slope created by moving up 5 and left 2. Its magnitude equals the length of the hypotenuse (the longest side of a right-angled triangle) whose other two sides are those horizontal and vertical moves.

Vector magnitude direction.png
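
As a rough check of the numbers above, the magnitude and slope of the vector that moves left 2 and up 5 can be computed directly. This is a minimal sketch in plain Python; writing the vector as (-2, 5) is an assumption based on the description in the text, and the variable names are only illustrative.

```python
import math

# "Left 2, up 5" written as (x, y) components (assumed from the text).
v = (-2, 5)

# Magnitude: length of the hypotenuse, via the Pythagorean theorem.
magnitude = math.hypot(v[0], v[1])

# Direction expressed as a slope: rise over run.
slope = v[1] / v[0]

print(magnitude)  # sqrt(29), roughly 5.385
print(slope)      # -2.5
```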


Why are vectors useful?

Vectors can be used to manipulate gradients. Vector operations on gradients help us find the slope of a function at a point in any direction.

Vector Addition

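Vector addition is fairly straightforward: add the corresponding elements of the two vectors. The vectors must be of equal length, and the result is a vector of the same length.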
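
Element-wise addition might be sketched as follows. The function name `add_vectors` and the sample values are illustrative choices, not taken from the original page.

```python
def add_vectors(u, v):
    """Add two equal-length vectors element by element."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [a + b for a, b in zip(u, v)]


print(add_vectors([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```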

Dot Product

The dot product is a way of multiplying two vectors: multiply the corresponding elements and add up the results. The vectors must be of equal length and the output is a scalar number.
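
A minimal sketch of the dot product in plain Python, assuming the usual definition (multiply matching elements, then sum). The helper name `dot` and the example vectors are illustrative.

```python
def dot(u, v):
    """Return the dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32
```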


Unit Vectors

A unit vector is a vector with a magnitude of 1. Unit vectors can have any slope (move in any direction), but the magnitude (length of vector) must equal 1. They are useful when you care only about the direction of the change, and not the magnitude. Unit vectors are used by directional derivatives.


Given the vector (3, 4), the magnitude is 5 (hypotenuse). This is not a unit vector. To find the unit vector of this vector, we divide each value by the magnitude of the vector. The new vector (3/5, 4/5) points in the same direction and has a magnitude of 1.
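
The same normalization can be written as a short function. This is a sketch only; the name `unit_vector` is hypothetical, and the zero-vector check is an added safeguard rather than something the text discusses.

```python
import math


def unit_vector(v):
    """Scale v so that its magnitude becomes 1, keeping its direction."""
    magnitude = math.sqrt(sum(x * x for x in v))
    if magnitude == 0:
        raise ValueError("the zero vector has no direction")
    return [x / magnitude for x in v]


u = unit_vector([3, 4])
print(u)  # [0.6, 0.8]
```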

Vector Fields

A vector field is a diagram that shows, for a given point (a, b) in space, where that point would move if you applied a vector function to it. Given a point, the vector field shows the “power” and “direction” of our vector function. Here is an example vector field:

Vector field functions.png Source

Wait, why does the vector field point in different directions? Shouldn’t it look like this:

Vector field scalar.png Source

The difference is that the second vector contains only scalar numbers, so from any point we always move over 2 and up 5. The first vector, on the other hand, contains functions, so for each point we derive the direction by inputting the coordinates into a function. For non-linear functions, things can become very fancy indeed.
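
The contrast between the two kinds of field can be sketched in a few lines. Both functions are hypothetical: the constant field assumes the (-2, 5) displacement described earlier, and `function_field` uses made-up component functions, not the ones plotted in the figure.

```python
def constant_field(x, y):
    # Scalar components: the displacement is the same at every point.
    return (-2, 5)


def function_field(x, y):
    # Hypothetical components that depend on the point's coordinates.
    return (2 * x, y ** 2)


for point in [(0, 0), (1, 2), (-3, 1)]:
    print(point, constant_field(*point), function_field(*point))
```

The constant field returns the same arrow for every point, while the function-based field returns a different arrow at each one, which is why its diagram points in many directions.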


A matrix is a rectangular grid of numbers, like an Excel spreadsheet. We describe the dimensions of a matrix as Rows x Columns. There are a variety of matrix operations, but we will focus on multiplication as it is the most relevant to deep learning.

Matrix Dimensions

Matrix Multiplication

Matrix multiplication specifies a set of rules for multiplying matrices to produce a new matrix.

Why is it Useful?

It turns complicated problems into simple, more efficiently calculated problems. It’s used in a number of fields including machine learning, computer graphics, and population ecology. Source


Not all matrices are eligible for multiplication. In addition, there is a requirement on the dimensions of the matrix product. Source.

  1. The number of columns in the first matrix must equal the number of rows in the second
  2. The product of an M x N matrix and an N x K matrix is an M x K matrix. The new matrix has as many rows as the first matrix and as many columns as the second.
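
The two rules above can be sketched as a small shape check. The function name `product_shape` and the (rows, columns) tuple convention are assumptions made for illustration.

```python
def product_shape(shape_a, shape_b):
    """Return the (rows, columns) of A x B, or raise if they cannot be multiplied."""
    rows_a, cols_a = shape_a
    rows_b, cols_b = shape_b
    # Rule 1: columns of the first must equal rows of the second.
    if cols_a != rows_b:
        raise ValueError("columns of the first matrix must equal rows of the second")
    # Rule 2: an M x N matrix times an N x K matrix gives an M x K matrix.
    return (rows_a, cols_b)


print(product_shape((2, 3), (3, 4)))  # (2, 4)
```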


Matrix multiplication uses Dot Product to multiply various combinations of rows and columns to derive its product. In the image below, each entry in Matrix C is the dot product of a row in matrix A and a column in matrix B.

Matrix product steps.png Source.

In the image above, each row of matrix A and each column of matrix B is treated as a vector. The top-left entry of matrix C, for example, is the dot product of the first row in matrix A and the first column in matrix B.
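
The row-by-column procedure might look like this in plain Python, with matrices stored as lists of rows. The function name `matmul` and the sample matrices are illustrative and are not the ones shown in the image.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first matrix must equal rows of the second")
    # Transpose b so that each of its columns can be read as a vector.
    columns_of_b = list(zip(*b))
    # Each entry is the dot product of one row of a and one column of b.
    return [
        [sum(x * y for x, y in zip(row, col)) for col in columns_of_b]
        for row in a
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
```

Here the top-left entry, 19, is the dot product of the first row of A, (1, 2), and the first column of B, (5, 7).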


Q1: What are the dimensions of the matrix product?

Q2: What are the dimensions of the matrix product?

Q3: What is the matrix product?

Why Does It Work This Way?

It’s a definition chosen by people rather than a law of nature. Mathematicians settled on this approach because it turned out to be very useful in real life: with this rule, multiplying two matrices corresponds to applying one linear transformation after another.

Linear Algebra in Deep Learning

Explanation of how linear algebra is used in Deep Learning.