# Linear Algebra for Deep Learning



# Linear Algebra

Linear Algebra is a branch of mathematics that seeks to describe lines and planes using structures like vectors and matrices. Khan Academy has awesome Linear Algebra tutorials.

# Vector

Vectors in geometry are 1-dimensional arrays of numbers or functions used to operate on points on a line or plane. A vector stores the magnitude and direction of a potential change to a point. An array with more than one dimension is called a matrix.

## Vector Notation

There are a variety of ways to represent vectors. A common one is as a column: $\vec v = \begin{bmatrix} v_1 \\ v_2 \\ \end{bmatrix}$

Or more simply: $\vec v = (v_1, v_2)$

## How Vectors Work

Vectors typically represent movement from a point. They store both the magnitude and direction of potential changes to a point. The vector $\vec v = (-2, 5)$ says: move left 2 units and up 5 units.

A vector can be applied to any point on a plane.

The vector's direction equals the slope created by moving up 5 and left 2. Its magnitude equals the length of the hypotenuse (the long side of a right triangle) formed by those two movements.
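For example, the magnitude of $\vec v = (-2, 5)$ can be checked with NumPy, a quick sketch using `np.linalg.norm`, which computes the Euclidean length:

```python
import numpy as np

v = np.array([-2, 5])
magnitude = np.linalg.norm(v)  # sqrt((-2)**2 + 5**2) = sqrt(29) ≈ 5.385
```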

## Scalar Operations

The simplest vector operation is a scalar operation. It involves one vector and one number. You simply apply the number to every value in the vector--add, subtract, multiply, etc. $\begin{bmatrix} {2} \\ {2} \\ {2} \\ \end{bmatrix} + 1 = \begin{bmatrix} {3} \\ {3} \\ {3} \\ \end{bmatrix}$
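A minimal sketch of scalar operations in NumPy, where the operator is applied to every element:

```python
import numpy as np

v = np.array([2, 2, 2])
v + 1  # array([3, 3, 3])
v - 1  # array([1, 1, 1])
v * 3  # array([6, 6, 6])
```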

## Elementwise Operations

The vector operations below are elementwise: the first value in the first vector is paired with the first value in the second vector, the second with the second, and so on. This means the vectors must have equal dimensions to complete the operation. There is one exception: if one of the vectors has size one, it can be treated like a scalar and applied to every value in the larger vector.
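The size-one exception is NumPy broadcasting in action; a minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([2])  # size-one vector, broadcast like a scalar
a + b              # array([3, 4, 5])
```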

## Addition

The output of vector addition is another vector. $\begin{bmatrix} {a_1} \\ {a_2} \\ {...} \\ {a_n} \\ \end{bmatrix} + \begin{bmatrix} {b_1} \\ {b_2} \\ {...} \\ {b_n} \\ \end{bmatrix} = \begin{bmatrix} {a_1+b_1} \\ {a_2+b_2} \\ {...} \\ {a_n+b_n} \\ \end{bmatrix}$

Code

```python
import numpy as np

y = np.array([1, 2, 3])
x = np.array([2, 3, 4])
y + x  # array([3, 5, 7])
```


## Subtraction

The output of vector subtraction is another vector. $\begin{bmatrix} {a_1} \\ {a_2} \\ {...} \\ {a_n} \\ \end{bmatrix} - \begin{bmatrix} {b_1} \\ {b_2} \\ {...} \\ {b_n} \\ \end{bmatrix} = \begin{bmatrix} {a_1-b_1} \\ {a_2-b_2} \\ {...} \\ {a_n-b_n} \\ \end{bmatrix}$

Code

```python
y = np.array([1, 2, 3])
x = np.array([2, 3, 4])
y - x  # array([-1, -1, -1])
```


## Division

The output of vector division is another vector. $\begin{bmatrix} {a_1} \\ {a_2} \\ {...} \\ {a_n} \\ \end{bmatrix} / \begin{bmatrix} {b_1} \\ {b_2} \\ {...} \\ {b_n} \\ \end{bmatrix} = \begin{bmatrix} {\frac{a_1}{b_1}} \\ {\frac{a_2}{b_2}} \\ {...} \\ {\frac{a_n}{b_n}} \\ \end{bmatrix}$

Code

```python
y = np.array([1, 2, 3], dtype=float)
x = np.array([2, 3, 4], dtype=float)
y / x  # array([0.5, 0.667, 0.75])  (approximately)
```


## Multiplication

Deep learning relies on two different techniques for multiplying vectors/matrices: Dot Product and Hadamard Product. In both cases the input vectors must have the same length.

### Dot Product

The output of dot product is a scalar number. $\begin{bmatrix} {a_1} \\ {a_2} \\ {...} \\ {a_n} \\ \end{bmatrix} \cdot \begin{bmatrix} {b_1} \\ {b_2} \\ {...} \\ {b_n} \\ \end{bmatrix} = a_1 b_1+a_2 b_2+ ... +a_n b_n$

Code

```python
y = np.array([1, 2, 3])
x = np.array([2, 3, 4])
np.dot(y, x)  # 20
```


### Hadamard Product

Hadamard Product is elementwise multiplication of two vectors. Its output is another vector. $\begin{bmatrix} {a_1} \\ {a_2} \\ {...} \\ {a_n} \\ \end{bmatrix} \odot \begin{bmatrix} {b_1} \\ {b_2} \\ {...} \\ {b_n} \\ \end{bmatrix} = \begin{bmatrix} {a_1 \cdot b_1} \\ {a_2 \cdot b_2} \\ {...} \\ {a_n \cdot b_n} \\ \end{bmatrix}$

Code

```python
y = np.array([1, 2, 3])
x = np.array([2, 3, 4])
y * x  # array([2, 6, 12])
```


## Unit Vectors

A unit vector is a vector with a magnitude of 1. Unit vectors can have any slope (move in any direction), but the magnitude (length of the vector) must equal 1. They are useful when you care only about the direction of a change, not its magnitude. Unit vectors are used in directional derivatives.

Example

Given the vector (3, 4), the magnitude is 5 (hypotenuse). This is not a unit vector. To find the unit vector of this vector, we divide each value by the magnitude of the vector. The new vector (3/5, 4/5) points in the same direction and has a magnitude of 1.
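The same normalization in NumPy, as a sketch:

```python
import numpy as np

v = np.array([3.0, 4.0])
magnitude = np.linalg.norm(v)  # 5.0
unit = v / magnitude           # array([0.6, 0.8]) == (3/5, 4/5)
np.linalg.norm(unit)           # 1.0
```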

## Vector Fields

A vector field is a diagram that shows, for a given point in space $(a, b)$, where that point would move if we applied a vector function to it. Given a point, the vector field shows the "power" and "direction" of our vector function.

Why might a vector field point in different directions at different points? A field built from a vector of scalar numbers, like $(-2, 5)$, moves every point the same way: over 2 and up 5. A vector of functions, on the other hand, derives a direction for each point by plugging that point's coordinates into the functions. For non-linear functions, things can become very fancy indeed.
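A minimal sketch of a function-valued field (the function `f` here is a made-up example, not one from the text); each point gets its own direction from its coordinates:

```python
import numpy as np

def f(x, y):
    # Hypothetical vector function: the output direction
    # depends on the input coordinates.
    return np.array([y, -x])

f(1, 0)  # array([ 0, -1])  -- points down
f(0, 1)  # array([1, 0])    -- points right
```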

# Matrix

A matrix is a rectangular grid of numbers or terms (like an Excel spreadsheet) with special rules for addition, subtraction, and multiplication.

## Dimensions

We describe the dimensions of a matrix as rows x columns. $\begin{bmatrix} 2 & 4 \\ 5 & -7 \\ 12 & 5 \\ \end{bmatrix} = \text{3 x 2}$ $\begin{bmatrix} 2 \\ -3 \\ 5 \\ \end{bmatrix} = \text{3 x 1}$ $\begin{bmatrix} a^2 & 2a & 8\\ 18 & 7a-4 & 10\\ \end{bmatrix} = \text{2 x 3}$

Code

```python
a = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
a.shape  # (2, 3)
```


## Scalar Operations

Scalar operations with matrices work the same way as they do for vectors. Simply apply the scalar to every value in the matrix--add, subtract, divide, multiply, etc. $\begin{bmatrix} 2 & 3 \\ 2 & 3 \\ 2 & 3 \\ \end{bmatrix} + 1 = \begin{bmatrix} 3 & 4 \\ 3 & 4 \\ 3 & 4 \\ \end{bmatrix}$

Code

```python
a = np.array([
    [1, 2],
    [3, 4]
])
a + 1
# [[2, 3],
#  [4, 5]]
```


## Elementwise Operations

In order to add, subtract, or divide two matrices, their dimensions must be the same. We combine the corresponding values in each matrix in elementwise fashion. $\begin{bmatrix} a & b \\ c & d \\ \end{bmatrix} + \begin{bmatrix} 1 & 2\\ 3 & 4 \\ \end{bmatrix} = \begin{bmatrix} a+1 & b+2\\ c+3 & d+4 \\ \end{bmatrix}$

Code

```python
a = np.array([
    [1, 2],
    [3, 4]
])
b = np.array([
    [1, 2],
    [3, 4]
])

a + b
# [[2, 4],
#  [6, 8]]

a - b
# [[0, 0],
#  [0, 0]]
```


## Multiplication

Matrix multiplication specifies a set of rules for multiplying matrices to produce a new matrix.

### Why is it Useful?

It turns complicated problems into simpler ones that can be calculated more efficiently. It's used in a number of fields including machine learning, computer graphics, and population ecology.

### Rules

Not all pairs of matrices are eligible for multiplication, and the dimensions of the matrix product follow from the dimensions of the inputs.

1. The number of columns in the first matrix must equal the number of rows in the second
2. The product of an M x N matrix and an N x K matrix is an M x K matrix. The new matrix takes the rows of the first and columns of the second.
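A sketch of the dimension rules in NumPy: multiplying an M x N matrix by an N x K matrix yields an M x K matrix, and mismatched inner dimensions raise an error:

```python
import numpy as np

a = np.ones((2, 3))  # M x N
b = np.ones((3, 4))  # N x K
np.dot(a, b).shape   # (2, 4), i.e. M x K

try:
    np.dot(b, a)     # inner dimensions 4 and 2 don't match
except ValueError:
    print("shapes not aligned")
```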

### Steps

Matrix multiplication uses Dot Product to multiply various combinations of rows and columns to derive its product: each entry in matrix C is the dot product of a row in matrix A and a column in matrix B.

The operation $a_1 \cdot b_1$ means we take the dot product of the 1st row in matrix A $(1, 7)$ and the 1st column in matrix B $(3, 5)$. $a_1 \cdot b_1 = \begin{bmatrix} 1 \\ 7 \\ \end{bmatrix} \cdot \begin{bmatrix} 3 \\ 5 \\ \end{bmatrix} = (1 \cdot 3) + (7 \cdot 5) = 38$

Here's another way to look at it: $\begin{bmatrix} a & b \\ c & d \\ e & f \\ \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ \end{bmatrix} = \begin{bmatrix} 1a + 3b & 2a + 4b \\ 1c + 3d & 2c + 4d \\ 1e + 3f & 2e + 4f \\ \end{bmatrix}$

### Examples

Q1: What are the dimensions of the matrix product? $\begin{bmatrix} 1 & 2 \\ 5 & 6 \\ \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 & 3 \\ 5 & 6 & 7 \\ \end{bmatrix} = \text{2 x 3}$

Q2: What are the dimensions of the matrix product? $\begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \\ \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 \\ 5 & 6 \\ 3 & 0 \\ 2 & 1 \\ \end{bmatrix} = \text{3 x 2}$

Q3: What is the matrix product? $\begin{bmatrix} 2 & 3 \\ 1 & 4 \\ \end{bmatrix} \cdot \begin{bmatrix} 5 & 4 \\ 3 & 5 \\ \end{bmatrix} = \begin{bmatrix} 19 & 23 \\ 17 & 24 \\ \end{bmatrix}$

Q4: What is the matrix product? $\begin{bmatrix} 3 \\ 5 \\ \end{bmatrix} \cdot \begin{bmatrix} 1 & 2 & 3\\ \end{bmatrix} = \begin{bmatrix} 3 & 6 & 9 \\ 5 & 10 & 15 \\ \end{bmatrix}$

Q5: What is the matrix product? $\begin{bmatrix} 1 & 2 & 3\\ \end{bmatrix} \cdot \begin{bmatrix} 4 \\ 5 \\ 6 \\ \end{bmatrix} = \begin{bmatrix} 32 \\ \end{bmatrix}$
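The worked examples above can be checked with NumPy; a quick verification sketch for Q3 and Q5:

```python
import numpy as np

q3 = np.dot([[2, 3], [1, 4]], [[5, 4], [3, 5]])
q3.tolist()  # [[19, 23], [17, 24]]

q5 = np.dot([[1, 2, 3]], [[4], [5], [6]])
q5.tolist()  # [[32]]
```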

### Why Does It Work This Way?

It’s an arbitrary human construct. There is no mathematical law underlying why it's done this way. Mathematicians decided on this approach because it turned out to be very useful in real life.

### Code

Matrix multiplication with NumPy.

```python
a = np.array([
    [1, 2]
])
a.shape  # (1, 2)

b = np.array([
    [3, 4],
    [5, 6]
])
b.shape  # (2, 2)

# Matrix multiply
mm = np.dot(a, b)
mm        # [[13, 16]]
mm.shape  # (1, 2)
```


### Hadamard Product

Hadamard Product is elementwise multiplication: positionally corresponding values are multiplied to produce a new matrix. $\begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \\ \end{bmatrix} \odot \begin{bmatrix} b_1 & b_2 \\ b_3 & b_4 \\ \end{bmatrix} = \begin{bmatrix} a_1 \cdot b_1 & a_2 \cdot b_2 \\ a_3 \cdot b_3 & a_4 \cdot b_4 \\ \end{bmatrix}$

```python
a = np.array([
    [2, 3],
    [2, 3]
])
b = np.array([
    [3, 4],
    [5, 6]
])
a * b
# [[ 6, 12],
#  [10, 18]]
```


You can take the Hadamard product of a matrix and a vector as long as their dimensions are compatible (see NumPy broadcasting). $\begin{bmatrix} {a_1} \\ {a_2} \\ \end{bmatrix} \odot \begin{bmatrix} b_1 & b_2 \\ b_3 & b_4 \\ \end{bmatrix} = \begin{bmatrix} a_1 \cdot b_1 & a_1 \cdot b_2 \\ a_2 \cdot b_3 & a_2 \cdot b_4 \\ \end{bmatrix}$

```python
# Same number of rows
a = np.array([
    [1],
    [2]
])
b = np.array([
    [3, 4],
    [5, 6]
])
a * b
# [[ 3,  4],
#  [10, 12]]

# Same number of columns
c = np.array([
    [1, 2]
])
b * c
# [[ 3,  8],
#  [ 5, 12]]
```


## Identity and Inverse

The Identity Matrix is an $n$ x $n$ square matrix that, when multiplied by any other matrix of the same dimensions, leaves that matrix unchanged. It's like multiplying by 1. $I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix}$

The identity matrix consists of all zeros except for a diagonal of 1s from the top left to the bottom right.

The inverse of a square matrix is the matrix of equal dimensions that, when multiplied with the original, produces the identity matrix.
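A sketch of both in NumPy: `np.eye` builds an identity matrix and `np.linalg.inv` computes an inverse.

```python
import numpy as np

i = np.eye(3)  # 3 x 3 identity matrix

a = np.array([
    [2.0, 3.0],
    [1.0, 4.0]
])
a_inv = np.linalg.inv(a)

# Multiplying a matrix by its inverse gives the identity
# (up to floating-point error).
np.allclose(np.dot(a, a_inv), np.eye(2))  # True
```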

## Transpose

To transpose a matrix there are two steps:

1. Rotate the matrix 90°
2. Reverse the order of elements in each row (e.g. [a b c] becomes [c b a])

Transpose M into T: $M = \begin{bmatrix} a & b \\ c & d \\ e & f \\ \end{bmatrix}$ $T = \begin{bmatrix} a & c & e \\ b & d & f \\ \end{bmatrix}$

Code

Matrix transpose with NumPy.

```python
a = np.array([
    [1, 2],
    [3, 4]
])

a.T
# [[1, 3],
#  [2, 4]]
```


## Initializing Matrices

Here are various ways to initialize a matrix with NumPy.

```python
a = np.array([
    [1, 2],
    [3, 4]
])

b = np.ones((2, 3))
# [[1, 1, 1],
#  [1, 1, 1]]

c = np.zeros((3, 2))
# [[0, 0],
#  [0, 0],
#  [0, 0]]

d = np.random.randn(2, 2)
# e.g. [[-0.474, -1.040],
#       [ 2.063, -0.083]]  (random; values will vary)
```


# Tutorials

I recommend starting here. These tutorials explain things intuitively with very little math: