Linear Algebra for Deep Learning
WORK IN PROGRESS!
- 1 Linear Algebra
- 2 Vector
- 3 Matrix
- 4 Tutorials
- 5 References
Linear Algebra is a branch of mathematics that seeks to describe lines and planes using structures like vectors and matrices. Khan Academy has awesome Linear Algebra tutorials.
Vectors are 1-dimensional arrays of numbers or functions used to operate on points on a line or plane. Vectors store the magnitude and direction of a potential change to a point. An array with more than one dimension is called a matrix.
There are a variety of ways to represent vectors; the simplest is as an ordered list of numbers, such as v = (-2, 5).
How Vectors Work
Vectors typically represent movement from a point. They store both the magnitude and direction of potential changes to a point. For example, the vector (-2, 5) says move left 2 units and up 5 units.
A vector can be applied to any point in the plane. The vector's direction equals the slope created by moving up 5 and left 2. Its magnitude equals the length of the hypotenuse (the long side of the right triangle formed by those two movements).
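As a small sketch (assuming the example vector above is written as [-2, 5]), both quantities are easy to compute with numpy:

import numpy as np

v = np.array([-2, 5])          # move left 2, up 5
slope = v[1] / v[0]            # rise over run: 5 / -2 = -2.5
magnitude = np.linalg.norm(v)  # hypotenuse length: sqrt(4 + 25) ~= 5.39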
The simplest vector operation is a scalar operation. It involves one vector and one number. You simply apply the number to every value in the vector--add, subtract, multiply, etc.
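For instance, here is a minimal numpy sketch of a scalar operation (the values are made up for illustration):

import numpy as np

y = np.array([1, 2, 3])
y + 2   # array([3, 4, 5]) -- the scalar is applied to every element
y * 3   # array([3, 6, 9])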
The vector operations below are elementwise. The first value in the first vector is matched with the first value in the second vector, the second with the second, and so on. This means the vectors MUST have equal dimensions to complete the operation. There is one exception: if one of the vectors is of size one, it can be treated like a scalar and applied to all the values in the larger vector.
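Here is a rough sketch of that size-one exception, which numpy implements as broadcasting:

import numpy as np

y = np.array([1, 2, 3])
x = np.array([2])   # size-one vector, treated like a scalar
y + x               # array([3, 4, 5])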
The output of vector addition is another vector.
import numpy as np

y = np.array([1, 2, 3])
x = np.array([2, 3, 4])
y + x  # array([3, 5, 7])
The output of vector subtraction is another vector.
y = np.array([1, 2, 3])
x = np.array([2, 3, 4])
y - x  # array([-1, -1, -1])
The output of vector division is another vector.
y = np.array([1, 2, 3]).astype(float)
x = np.array([2, 3, 4]).astype(float)
y / x  # array([0.5, 0.667, 0.75]) (rounded)
Deep learning relies on two different techniques for multiplying vectors and matrices: the dot product and the Hadamard product. For the vector versions below, the inputs must have equal dimensions.
The output of dot product is a scalar number.
y = np.array([1, 2, 3])
x = np.array([2, 3, 4])
np.dot(y, x)  # 20
Element-wise multiplication of two vectors. The output of Hadamard product is another vector.
y = np.array([1, 2, 3])
x = np.array([2, 3, 4])
y * x  # array([2, 6, 12])
A unit vector is a vector with a magnitude of 1. Unit vectors can have any slope (move in any direction), but the magnitude (length of vector) must equal 1. They are useful when you care only about the direction of the change, and not the magnitude. Unit vectors are used by directional derivatives.
Given the vector (3, 4), the magnitude is 5 (hypotenuse). This is not a unit vector. To find the unit vector of this vector, we divide each value by the magnitude of the vector. The new vector (3/5, 4/5) points in the same direction and has a magnitude of 1.
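A minimal numpy sketch of that normalization (np.linalg.norm returns the magnitude):

import numpy as np

v = np.array([3.0, 4.0])
magnitude = np.linalg.norm(v)   # 5.0
unit = v / magnitude            # array([0.6, 0.8]), i.e. (3/5, 4/5)
np.linalg.norm(unit)            # 1.0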
A vector field is a diagram that shows, for each point in space (a, b), where that point would move if you applied a vector function to it. Given a point, the vector field shows the "power" and "direction" of our vector function at that point.
Why might a vector field point in different directions at different points, rather than the same direction everywhere? The difference comes down to what the vector contains. A vector of scalar numbers (like our "left 2, up 5" example) moves every point over 2 and up 5, so the field looks identical everywhere. A vector of functions, on the other hand, derives the direction at each point by inputting that point's coordinates into the functions. For non-linear functions, things can become very fancy indeed.
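To make this concrete, here is a sketch with a made-up vector function (not the one from the original figure); the key point is that the output depends on the input coordinates:

import numpy as np

def f(x, y):
    # Hypothetical vector function with terms like 2x and x**2
    # instead of fixed scalars, so the arrow changes from point to point.
    return np.array([2 * x, x ** 2])

f(-2, 1)   # array([-4,  4])
f(0, 0)    # array([0, 0])
f(3, 2)    # array([6, 9])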
A matrix is a rectangular grid of numbers or terms (like an Excel spreadsheet) with special rules for addition, subtraction, and multiplication.
We describe the dimensions of a matrix as rows x columns.
a = np.array([
  [1, 2, 3],
  [4, 5, 6]
])
a.shape == (2, 3)  # 2 rows, 3 columns
Scalar operations with matrices work the same way as they do for vectors. Simply apply the scalar to every value in the matrix--add, subtract, divide, multiply, etc.
a = np.array([
  [1, 2],
  [3, 4]
])
a + 1
# array([[2, 3],
#        [4, 5]])
In order to add, subtract, or divide two matrices, their dimensions must be the same. We combine the corresponding values in each matrix in elementwise fashion.
a = np.array([
  [1, 2],
  [3, 4]
])
b = np.array([
  [1, 2],
  [3, 4]
])
a + b
# array([[2, 4],
#        [6, 8]])

a - b
# array([[0, 0],
#        [0, 0]])
Matrix multiplication specifies a set of rules for multiplying matrices to produce a new matrix.
Why is it Useful?
It turns complicated problems into simpler, more efficiently computed ones. Matrix multiplication is used in a number of fields, including machine learning, computer graphics, and population ecology.
Not all matrices are eligible for multiplication, and the dimensions of the resulting matrix are constrained. Two rules apply (see the numpy sketch after this list):
- The number of columns in the first matrix must equal the number of rows in the second
- The product of an M x N matrix and an N x K matrix is an M x K matrix: the new matrix takes its number of rows from the first and its number of columns from the second.
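Here's a quick numpy sketch of these rules; the shapes are arbitrary examples:

import numpy as np

a = np.random.randn(2, 3)    # an M x N matrix (2 x 3)
b = np.random.randn(3, 4)    # an N x K matrix (3 x 4)
np.dot(a, b).shape           # (2, 4) -- an M x K matrix

c = np.random.randn(4, 2)
# np.dot(a, c) raises ValueError: a has 3 columns but c has 4 rows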
Matrix multiplication uses Dot Product to multiply various combinations of rows and columns to derive its product. In the image below, each entry in Matrix C is the dot product of a row in matrix A and a column in matrix B.
To compute the first entry of the product, for example, we take the dot product of the 1st row in matrix A and the 1st column in matrix B.
Practice: pick a pair of small matrices, work out the dimensions of their product, then compute the product itself.
Why Does It Work This Way?
It’s an arbitrary human construct. There is no mathematical law underlying why it's done this way. Mathematicians decided on this approach because it turned out to be very useful in real life.
Matrix multiplication with Numpy.
a = np.array([
  [1, 2]
])
a.shape == (1, 2)

b = np.array([
  [3, 4],
  [5, 6]
])
b.shape == (2, 2)

# Matrix multiply
mm = np.dot(a, b)
mm  # array([[13, 16]])
mm.shape == (1, 2)
Hadamard Product is an elementwise multiplication: positionally corresponding values are multiplied to produce a new matrix.
a = np.array([
  [2, 3],
  [2, 3]
])
b = np.array([
  [3, 4],
  [5, 6]
])
a * b
# array([[ 6, 12],
#        [10, 18]])
You can take the Hadamard product of a matrix and a vector as long as their dimensions are compatible (see numpy broadcasting).
# Same number of rows
a = np.array([
  [1],
  [2]
])
b = np.array([
  [3, 4],
  [5, 6]
])
a * b
# array([[ 3,  4],
#        [10, 12]])

# Same number of columns
c = np.array([
  [1, 2]
])
b * c
# array([[ 3,  8],
#        [ 5, 12]])
Identity and Inverse
The identity matrix is a square matrix that, when multiplied by any other matrix of the same dimensions, leaves that matrix unchanged. It's like multiplying by 1.
The identity matrix consists of all zeros except for a diagonal of 1s running from the top left to the bottom right (Wikipedia).
The inverse of a square matrix is the matrix of the same dimensions that, when multiplied with the original, produces the identity matrix.
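A minimal numpy sketch of both ideas, using an arbitrary 2 x 2 matrix (np.eye builds an identity matrix and np.linalg.inv computes an inverse):

import numpy as np

i = np.eye(2)             # array([[1., 0.],
                          #        [0., 1.]])
a = np.array([
  [1., 2.],
  [3., 4.]
])
np.dot(a, i)              # equals a -- multiplying by the identity changes nothing

a_inv = np.linalg.inv(a)  # array([[-2. ,  1. ],
                          #        [ 1.5, -0.5]])
np.dot(a, a_inv)          # the identity matrix (up to floating point error)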
To transpose a matrix there are two steps:
- Rotate the matrix 90°
- Reverse the order of elements in each row (e.g. [a b c] becomes [c b a])
Matrix transpose with Numpy.
a = np.array([
  [1, 2],
  [3, 4]
])
a.T
# array([[1, 3],
#        [2, 4]])
Here are various ways to initialize a matrix with Numpy.
a = np.array([
  [1, 2],
  [3, 4]
])

b = np.ones((2, 3))
# array([[1., 1., 1.],
#        [1., 1., 1.]])

c = np.zeros((3, 2))
# array([[0., 0.],
#        [0., 0.],
#        [0., 0.]])

d = np.random.randn(2, 2)
# e.g. array([[-0.474, -1.040],
#             [ 2.063, -0.083]])
I recommend starting here. These tutorials explain things intuitively with very little math:
- Khan Academy Linear Algebra
- Andrew Ng's Course Notes
- Explanation of Linear Algebra
- Explanation of Matrices
- Intro To Linear Algebra
- matrix product
- matrix inverse
- orthogonal matrices