# Linear Algebra for Deep Learning


# Linear Algebra

Linear algebra is a branch of mathematics that describes lines, planes, and their higher-dimensional analogues using structures like vectors and matrices. Khan Academy has awesome linear algebra tutorials.

# Vector

Vectors are 1-dimensional arrays of numbers or functions. In geometry they are used to operate on points on a line or plane: a vector stores the magnitude and direction of a potential change to a point. A 2-dimensional array of numbers is called a matrix.

## Vector Notation

There are a variety of ways to represent vectors. One of the simplest is a tuple of components:

${\displaystyle {\vec {v}}=(v_{1},v_{2})}$

## How Vectors Work

Vectors typically represent movement from a point. They store both the magnitude and direction of potential changes to a point. This vector says move left 2 units and up 5 units.

${\displaystyle {\vec {v}}=(-2,5)}$

A vector can be applied to any point on a plane:

The vector’s direction equals the slope created by moving up 5 and left 2. Its magnitude equals the length of the hypotenuse (the long side in a right angle triangle).
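As a rough sketch of the idea above, the magnitude and direction of the vector (-2, 5) can be computed with NumPy. The starting point `(3, 1)` is a made-up example, not something taken from the text.

```python
import numpy as np

v = np.array([-2.0, 5.0])  # move left 2 units and up 5 units

# Magnitude: length of the hypotenuse, sqrt((-2)^2 + 5^2) = sqrt(29)
magnitude = np.linalg.norm(v)

# Direction expressed as a slope (rise over run) and as an angle
slope = v[1] / v[0]                             # 5 / -2 = -2.5
angle_deg = np.degrees(np.arctan2(v[1], v[0]))  # measured from the positive x-axis

# Applying the vector to a point moves the point by v
point = np.array([3.0, 1.0])  # hypothetical starting point
moved = point + v             # [1.0, 6.0]
```

The same vector can be added to any other point; only the starting location changes, not the magnitude or direction of the move.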

## Why are Vectors Useful?

Vectors can be used to manipulate gradients. Vector operations on gradients help us find the slope of a function at a point in any direction.

## Addition

The output of vector addition is another vector.

${\displaystyle {\begin{bmatrix}{a_{1}}\\{a_{2}}\\{...}\\{a_{n}}\\\end{bmatrix}}+{\begin{bmatrix}{b_{1}}\\{b_{2}}\\{...}\\{b_{n}}\\\end{bmatrix}}={\begin{bmatrix}{a_{1}+b_{1}}\\{a_{2}+b_{2}}\\{...}\\{a_{n}+b_{n}}\\\end{bmatrix}}}$

Code

import numpy as np
y = np.array([1,2,3])
x = np.array([2,3,4])
y + x  # array([3, 5, 7])


## Subtraction

The output of vector subtraction is another vector.

${\displaystyle {\begin{bmatrix}{a_{1}}\\{a_{2}}\\{...}\\{a_{n}}\\\end{bmatrix}}-{\begin{bmatrix}{b_{1}}\\{b_{2}}\\{...}\\{b_{n}}\\\end{bmatrix}}={\begin{bmatrix}{a_{1}-b_{1}}\\{a_{2}-b_{2}}\\{...}\\{a_{n}-b_{n}}\\\end{bmatrix}}}$

Code

y = np.array([1,2,3])
x = np.array([2,3,4])
y - x  # array([-1, -1, -1])


## Division

The output of vector division is another vector.

${\displaystyle {\begin{bmatrix}{a_{1}}\\{a_{2}}\\{...}\\{a_{n}}\\\end{bmatrix}}/{\begin{bmatrix}{b_{1}}\\{b_{2}}\\{...}\\{b_{n}}\\\end{bmatrix}}={\begin{bmatrix}{\frac {a_{1}}{b_{1}}}\\{\frac {a_{2}}{b_{2}}}\\{...}\\{\frac {a_{n}}{b_{n}}}\\\end{bmatrix}}}$

Code

y = np.array([1,2,3]).astype(float)
x = np.array([2,3,4]).astype(float)
y / x  # approximately array([0.5, 0.667, 0.75])


## Multiplication

Deep learning relies on two different techniques for multiplying vectors/matrices: Dot Product and Hadamard Product. For two vectors, both require inputs of equal length.

### Dot Product

The output of dot product is a scalar number.

${\displaystyle {\begin{bmatrix}{a_{1}}\\{a_{2}}\\{...}\\{a_{n}}\\\end{bmatrix}}\cdot {\begin{bmatrix}{b_{1}}\\{b_{2}}\\{...}\\{b_{n}}\\\end{bmatrix}}=a_{1}b_{1}+a_{2}b_{2}+...+a_{n}b_{n}}$

Code

y = np.array([1,2,3])
x = np.array([2,3,4])
np.dot(y, x)  # 20


### Hadamard Product

The Hadamard product is the element-wise multiplication of two vectors or matrices. For two vectors, the output is another vector.

${\displaystyle {\begin{bmatrix}{a_{1}}\\{a_{2}}\\{...}\\{a_{n}}\\\end{bmatrix}}\odot {\begin{bmatrix}{b_{1}}\\{b_{2}}\\{...}\\{b_{n}}\\\end{bmatrix}}={\begin{bmatrix}{a_{1}\cdot b_{1}}\\{a_{2}\cdot b_{2}}\\{...}\\{a_{n}\cdot b_{n}}\\\end{bmatrix}}}$

Code

y = np.array([1,2,3])
x = np.array([2,3,4])
y * x  # array([2, 6, 12])


With NumPy broadcasting, you can take the element-wise product of a matrix and a vector as long as they have a compatible dimension: a column vector needs the same number of rows as the matrix, and a row vector needs the same number of columns. This rule extends to all the above operations except for dot product.

${\displaystyle {\begin{bmatrix}{a_{1}}\\{a_{2}}\\\end{bmatrix}}\odot {\begin{bmatrix}b_{1}&b_{3}\\b_{2}&b_{4}\\\end{bmatrix}}={\begin{bmatrix}a_{1}\cdot b_{1}&a_{1}\cdot b_{3}\\a_{2}\cdot b_{2}&a_{2}\cdot b_{4}\\\end{bmatrix}}}$

Code

a = np.array([
[1],
[2]
])
b = np.array([
[3,4],
[5,6]
])
a * b  # [[3, 4], [10, 12]]
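The row-vector case, and a shape mismatch, can be sketched the same way. This is only an illustration of NumPy's broadcasting behaviour; the values in `row` are made up.

```python
import numpy as np

b = np.array([[3, 4],
              [5, 6]])

col = np.array([[1], [2]])  # shape (2, 1): one value per row of b
row = np.array([10, 20])    # shape (2,): one value per column of b

col_product = col * b  # [[3, 4], [10, 12]]
row_product = row * b  # [[30, 80], [50, 120]]

# A vector whose length matches neither dimension cannot be broadcast
try:
    np.array([1, 2, 3]) * b
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```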


## Unit Vectors

A unit vector is a vector with a magnitude of 1. Unit vectors can have any slope (move in any direction), but the magnitude (length of vector) must equal 1. They are useful when you care only about the direction of the change, and not the magnitude. Unit vectors are used by directional derivatives.

Example

Given the vector (3, 4), the magnitude is 5 (hypotenuse). This is not a unit vector. To find the unit vector of this vector, we divide each value by the magnitude of the vector. The new vector (3/5, 4/5) points in the same direction and has a magnitude of 1.
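The example above can be checked with a few lines of NumPy. This is a minimal sketch that uses `np.linalg.norm` for the magnitude.

```python
import numpy as np

v = np.array([3.0, 4.0])

magnitude = np.linalg.norm(v)  # sqrt(3^2 + 4^2) = 5.0
unit = v / magnitude           # (3/5, 4/5)

# The unit vector keeps the direction of v but has length 1
unit_length = np.linalg.norm(unit)
```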

## Vector Fields

A vector field shows, for each point (a, b) in space, where that point would move if you applied a vector function to it. Given a point, the vector field shows the “power” and “direction” of our vector function. Here is an example vector field:

Wait, why does the vector field point in different directions? Shouldn’t it look like this:

The difference is that the second vector contains only scalar numbers, so from any point we always move over 2 and up 5. The first vector, on the other hand, contains functions, so for each point we derive the direction by inputting the coordinates into a function. For non-linear functions, things can become very fancy indeed.
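To make the contrast concrete, here is a small sketch of both kinds of field. The function-valued field `(2x, x + y)` and the sample points are hypothetical, chosen only for illustration. The constant field uses the "over 2 and up 5" vector from the text.

```python
import numpy as np

def constant_field(x, y):
    """Same vector at every point: always move over 2 and up 5."""
    return np.array([2.0, 5.0])

def function_field(x, y):
    """Hypothetical field whose components depend on the point."""
    return np.array([2.0 * x, x + y])

points = [(0.0, 1.0), (1.0, 1.0), (-2.0, 3.0)]  # made-up sample points

# The constant field gives the same arrow everywhere
constant_vectors = [constant_field(x, y) for x, y in points]

# The function-valued field gives a different arrow at each point
function_vectors = [function_field(x, y) for x, y in points]
```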

# Matrix

A matrix is a rectangular grid of numbers or terms (like an Excel spreadsheet) with special rules for addition, subtraction, and multiplication.

## Dimensions

We describe the dimensions of a matrix as rows x columns.

${\displaystyle {\begin{bmatrix}2&4\\5&-7\\12&5\\\end{bmatrix}}={\text{3 x 2}}}$
${\displaystyle {\begin{bmatrix}2\\-3\\5\\\end{bmatrix}}={\text{3 x 1}}}$
${\displaystyle {\begin{bmatrix}a^{2}&2a&8\\18&7a-4&10\\\end{bmatrix}}={\text{2 x 3}}}$

Code

a = np.array([
[1,2,3],
[4,5,6]
])
a.shape == (2,3)


## Addition and Subtraction

In order to add or subtract, the dimensions must be the same. We then add/subtract the corresponding values in each matrix.

${\displaystyle {\begin{bmatrix}a&b\\c&d\\\end{bmatrix}}+{\begin{bmatrix}1&2\\3&4\\\end{bmatrix}}={\begin{bmatrix}a+1&b+2\\c+3&d+4\\\end{bmatrix}}}$

Code

a = np.array([
[1,2],
[3,4]
])
b = np.array([
[1,2],
[3,4]
])

a + b  # [[2, 4], [6, 8]]
a - b  # [[0, 0], [0, 0]]


## Multiplication

Matrix multiplication specifies a set of rules for multiplying matrices to produce a new matrix.

### Why is it Useful?

It turns complicated problems into simpler ones that can be calculated more efficiently. It’s used in a number of fields including machine learning, computer graphics, and population ecology.

### Rules

Not all matrices are eligible for multiplication, and the dimensions of the inputs determine the dimensions of the matrix product:

1. The number of columns in the first matrix must equal the number of rows in the second.
2. The product of an M x N matrix and an N x K matrix is an M x K matrix. The new matrix takes the number of rows of the first and the number of columns of the second.
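The two rules can be sketched as a small shape check. The helper `product_shape` is a hypothetical name written for this example, not part of NumPy.

```python
import numpy as np

def product_shape(a_shape, b_shape):
    """Return the shape of A @ B, or None if the shapes are incompatible."""
    (m, n), (n2, k) = a_shape, b_shape
    if n != n2:    # rule 1: columns of A must equal rows of B
        return None
    return (m, k)  # rule 2: rows of A by columns of B

A = np.ones((3, 4))
B = np.ones((4, 2))

predicted = product_shape(A.shape, B.shape)   # (3, 2)
actual = (A @ B).shape                        # shape NumPy actually produces
incompatible = product_shape((3, 4), (3, 4))  # None: 4 columns vs 3 rows
```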

### Steps

Matrix multiplication uses the dot product to combine rows and columns: each entry in the product matrix C is the dot product of a row in matrix A and a column in matrix B.

The operation ${\displaystyle a_{1}\cdot b_{1}}$ means we take the dot product of the 1st row in matrix A ${\displaystyle (1,7)}$ and the 1st column in matrix B ${\displaystyle (3,5)}$.

${\displaystyle a_{1}\cdot b_{1}={\begin{bmatrix}1\\7\\\end{bmatrix}}\cdot {\begin{bmatrix}3\\5\\\end{bmatrix}}=(1\cdot 3)+(7\cdot 5)=38}$

Here's another way to look at it:

${\displaystyle {\begin{bmatrix}a&b\\c&d\\e&f\\\end{bmatrix}}\cdot {\begin{bmatrix}1&2\\3&4\\\end{bmatrix}}={\begin{bmatrix}1a+3b&2a+4b\\1c+3d&2c+4d\\1e+3f&2e+4f\\\end{bmatrix}}}$
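The row-times-column procedure can be written out explicitly as a sketch and compared against NumPy's built-in product. The first row of `A` and the first column of `B` come from the worked example above. The remaining entries, and the helper name `matmul_by_dot_products`, are assumptions made for illustration.

```python
import numpy as np

def matmul_by_dot_products(A, B):
    """Build A @ B one entry at a time from row/column dot products."""
    rows, inner = A.shape
    inner_b, cols = B.shape
    assert inner == inner_b, "columns of A must equal rows of B"
    C = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            # entry (i, j) is the dot product of row i of A and column j of B
            C[i, j] = np.dot(A[i, :], B[:, j])
    return C

A = np.array([[1, 7],
              [2, 4]])  # second row is hypothetical
B = np.array([[3, 3],
              [5, 2]])  # second column is hypothetical

C = matmul_by_dot_products(A, B)  # C[0, 0] is (1*3) + (7*5) = 38
```

In practice `A @ B` (or `np.dot(A, B)`) does the same thing far faster; the explicit loops are only there to show where each entry comes from.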

### Examples

Q1: What are the dimensions of the matrix product?

${\displaystyle {\begin{bmatrix}1&2\\5&6\\\end{bmatrix}}\cdot {\begin{bmatrix}1&2&3\\5&6&7\\\end{bmatrix}}={\text{2 x 3}}}$

Q2: What are the dimensions of the matrix product?

${\displaystyle {\begin{bmatrix}1&2&3&4\\5&6&7&8\\9&10&11&12\\\end{bmatrix}}\cdot {\begin{bmatrix}1&2\\5&6\\3&0\\2&1\\\end{bmatrix}}={\text{3 x 2}}}$

Q3: What is the matrix product?

${\displaystyle {\begin{bmatrix}2&3\\1&4\\\end{bmatrix}}\cdot {\begin{bmatrix}5&4\\3&5\\\end{bmatrix}}={\begin{bmatrix}19&23\\17&24\\\end{bmatrix}}}$

Q4: What is the matrix product?

${\displaystyle {\begin{bmatrix}3\\5\\\end{bmatrix}}\cdot {\begin{bmatrix}1&2&3\\\end{bmatrix}}={\begin{bmatrix}3&6&9\\5&10&15\\\end{bmatrix}}}$

Q5: What is the matrix product?

${\displaystyle {\begin{bmatrix}1&2&3\\\end{bmatrix}}\cdot {\begin{bmatrix}4\\5\\6\\\end{bmatrix}}={\begin{bmatrix}32\\\end{bmatrix}}}$
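The answers to Q3, Q4, and Q5 can be checked with NumPy's `@` operator, as a quick sketch:

```python
import numpy as np

# Q3: 2 x 2 times 2 x 2
q3 = np.array([[2, 3], [1, 4]]) @ np.array([[5, 4], [3, 5]])

# Q4: 2 x 1 times 1 x 3 gives a 2 x 3 matrix
q4 = np.array([[3], [5]]) @ np.array([[1, 2, 3]])

# Q5: 1 x 3 times 3 x 1 gives a 1 x 1 matrix
q5 = np.array([[1, 2, 3]]) @ np.array([[4], [5], [6]])
```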

### Why Does It Work This Way?

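Matrix multiplication is a definition rather than a law of nature, but it is not arbitrary. The row-by-column rule is chosen so that multiplying two matrices corresponds to applying one linear transformation after another. Mathematicians settled on this approach because it turned out to be very useful in real life.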
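One property that makes this definition useful is that a matrix product behaves like applying two transformations in sequence. The sketch below uses two made-up matrices, a rotation and a scaling, to show it.

```python
import numpy as np

A = np.array([[0, -1],
              [1,  0]])  # rotate 90 degrees counterclockwise
B = np.array([[2, 0],
              [0, 3]])   # scale x by 2 and y by 3
x = np.array([1, 1])

# Apply B first, then A
step_by_step = A @ (B @ x)

# Multiply the matrices first, then apply the combined matrix once
combined = (A @ B) @ x
```

Both routes give the same vector, which is why a chain of linear steps can be collapsed into a single matrix.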

### Code

Matrix multiplication with NumPy.

a = np.array([
[1, 2]
])
a.shape == (1,2)

b = np.array([
[3, 4],
[5, 6]
])
b.shape == (2,2)

mm = np.dot(a, b)  # matrix multiply; same as a @ b for 2-D arrays
mm  # array([[13, 16]])
mm.shape == (1,2)


## Identity and Inverse

The identity matrix is an ${\displaystyle n}$ x ${\displaystyle n}$ square matrix that, when multiplied by any other matrix of the same dimensions, gives back that matrix. It's like multiplying by 1.

${\displaystyle M={\begin{bmatrix}1&0&0\\0&1&0\\0&0&1\\\end{bmatrix}}}$

The identity matrix consists of all zeros except for a diagonal line of 1s from top left to the bottom right. Wikipedia.

The inverse of a square matrix ${\displaystyle A}$ is the square matrix ${\displaystyle A^{-1}}$ of the same dimensions such that multiplying the two together produces the identity matrix. Not every square matrix has an inverse.
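Both ideas can be tried out in NumPy with `np.eye` and `np.linalg.inv`. The matrix `A` below is a hypothetical example chosen because its determinant is 1, so its inverse has whole-number entries.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])  # hypothetical invertible matrix

I = np.eye(2)               # 2 x 2 identity matrix

# Multiplying by the identity leaves A unchanged
unchanged = A @ I

# The inverse undoes A: A @ A_inv is the identity (up to rounding)
A_inv = np.linalg.inv(A)    # [[3, -1], [-5, 2]]
product = A @ A_inv
```

`np.linalg.inv` raises `LinAlgError` for a singular matrix, which is NumPy's way of reporting that no inverse exists.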

## Transpose

To transpose a matrix, turn its rows into columns. One way to picture this takes two steps:

1. Rotate the matrix right (clockwise) 90°
2. Reverse the order of elements in each row (e.g. [a b c] becomes [c b a])

Transpose M into T:

${\displaystyle M={\begin{bmatrix}a&b\\c&d\\e&f\\\end{bmatrix}}}$
${\displaystyle T={\begin{bmatrix}a&c&e\\b&d&f\\\end{bmatrix}}}$
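The two-step recipe can be sketched with `np.rot90` and a slice, assuming the rotation is clockwise (`k=-1` in NumPy). The numbers stand in for the letters a to f.

```python
import numpy as np

M = np.array([[1, 2],
              [3, 4],
              [5, 6]])

# Step 1: rotate 90 degrees clockwise (k=-1; positive k is counterclockwise)
rotated = np.rot90(M, k=-1)  # [[5, 3, 1], [6, 4, 2]]

# Step 2: reverse the order of elements in each row
T = rotated[:, ::-1]         # [[1, 3, 5], [2, 4, 6]]
```

The result matches `M.T`, which is the usual way to take a transpose in NumPy.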

Code

Matrix transpose with NumPy.

a = np.array([
[1, 2],
[3, 4]
])

a.T  # [[1, 3], [2, 4]]


# Linear Algebra in Deep Learning

Explanation of how linear algebra is used in Deep Learning.

# Tutorials

I recommend starting here. These tutorials explain things intuitively with very little math: