Weeks 12 and 13 are about using external scientific and engineering libraries that are not inside core Python, but are popular enough that we should learn them too.
NumPy is a Python library for easy and efficient numerical computation of multi-dimensional arrays (vectors, matrices, etc.) You can install it via either pip or conda with, for example the following command:
pip install numpy
We first have to import numpy:
import numpy as np
Numpy works with lists (and lists of lists, etc.) and the core data type is something called numpy.ndarray, which we can create using the np.array() construction function:
l = [1, 2, 3]
arr = np.array(l)
type(arr)
numpy.ndarray
This is how we can create a 2D matrix:
arr = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8]
])
A numpy array holds integers or floating point numbers, which can be checked by looking at its dtype:
arr.dtype
dtype('int64')
We can get the dimensions of a numpy array using .shape:
arr.shape # 2 rows, 4 columns
(2, 4)
.ndim gives the number of dimensions:
arr.ndim
2
A numpy array can be reshaped into a different shape using .reshape(new_shape). This will create a new array with the requested shape.
arr.reshape(8,1)
array([[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8]])
arr.reshape(2, 2, 2) # 2x2x2 cube
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
To get the number of elements inside an array, use .size:
arr.size
8
Similarly to lists in Python, we can access elements inside an array using the same square bracket syntax:
print(arr)
[[1 2 3 4] [5 6 7 8]]
arr[0][2]
3
As a special trick, we can access the same index as above using the following syntax:
arr[0, 2] # separated by commas
3
To get all of the elements of the second columns (counting from zero):
arr[:, 2]
array([3, 7])
To get all of the elements of the first row (counting from zero):
arr[1, :]
array([5, 6, 7, 8])
We can also create arrays that are automatically filled with a specific value:
np.zeros((10, 10))
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
np.ones((5, 5))
array([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])
np.identity(8) # Create an 8x8 identity matrix
array([[1., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0.],
[0., 0., 1., 0., 0., 0., 0., 0.],
[0., 0., 0., 1., 0., 0., 0., 0.],
[0., 0., 0., 0., 1., 0., 0., 0.],
[0., 0., 0., 0., 0., 1., 0., 0.],
[0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 1.]])
# Create a matrix filled with random values sampled from [0, 1]
np.random.normal(size=(4, 4))
array([[ 0.59264239, 1.22055596, 1.45809488, 1.48312865],
[ 0.39275385, -0.4658526 , 0.75872202, -1.82229149],
[-1.24125527, -1.73792977, 1.32602209, 0.33317698],
[ 0.1169786 , -2.50504403, 1.1665632 , -1.15352151]])
We can create arrays that with .arange, that mimics the behavior of range, as follows:
np.arange(1, 10, 2)
array([1, 3, 5, 7, 9])
list(range(1, 10, 2))
[1, 3, 5, 7, 9]
One difference is that .arange allows us to give a real-valued step size:
np.arange(1, 10, 0.2)
array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2, 3.4,
3.6, 3.8, 4. , 4.2, 4.4, 4.6, 4.8, 5. , 5.2, 5.4, 5.6, 5.8, 6. ,
6.2, 6.4, 6.6, 6.8, 7. , 7.2, 7.4, 7.6, 7.8, 8. , 8.2, 8.4, 8.6,
8.8, 9. , 9.2, 9.4, 9.6, 9.8])
For equally-spaced sampled points, we can also use .linspace:
np.linspace(2, 5, 10) # Sample 10 equally-spaced points from 2 to 5
array([2. , 2.33333333, 2.66666667, 3. , 3.33333333,
3.66666667, 4. , 4.33333333, 4.66666667, 5. ])
NumPy comes with lots of operations that can be done between arrays. The simplest is addition / subtraction:
a = np.array([
[1, 2],
[3, 4]
])
b = np.array([
[5, 1],
[3, 4]
])
a + b
array([[6, 3],
[6, 8]])
a - b
array([[-4, 1],
[ 0, 0]])
b - b
array([[0, 0],
[0, 0]])
We can also do logical comparison between arrays: These will be done element-wise, i.e, each element in the first array will be compared with the element in the second array that has the same index:
a < b
array([[ True, False],
[False, False]])
a > 2.5
array([[False, False],
[ True, True]])
There's also lots of ready-to-use functions that are in numpy, including:
np.sin(), np.cos(), ...np.min(), np.max(), ...np.mean(), np.std(), np.sum(), ...np.exp(), np.log(), ...Most of these can also be called as a class method (a.mean() vs np.mean(a)).
np.sin(0)
0.0
np.sin(np.pi / 2)
1.0
a
array([[1, 2],
[3, 4]])
np.min(a)
1
a.min()
1
a.mean()
2.5
def standardize(a):
return (a - a.mean()) / a.std()
standardize(a)
array([[-1.34164079, -0.4472136 ],
[ 0.4472136 , 1.34164079]])
Some operations can be specified to be done along an axis (i.e, row-wise, column-wise, etc.)
To do that, we specify the axis parameter:
a
array([[1, 2],
[3, 4]])
a.sum() # Default: sum up everything
10
a.sum(axis=0) # Sum up columns
array([4, 6])
a.sum(axis=1) # Sum up rows
array([3, 7])
a.min(axis=1)
array([1, 3])
a = np.arange(1, 37).reshape(6, 6)
a
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24],
[25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36]])
.hsplit and .vsplit are for partitioning an array into sub-arrays:
part1, part2 = np.hsplit(a, 2) # Divide into two horizontal splits
part1
array([[ 1, 2, 3],
[ 7, 8, 9],
[13, 14, 15],
[19, 20, 21],
[25, 26, 27],
[31, 32, 33]])
part2
array([[ 4, 5, 6],
[10, 11, 12],
[16, 17, 18],
[22, 23, 24],
[28, 29, 30],
[34, 35, 36]])
part1, part2 = np.vsplit(a, 2)
part1
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18]])
part2
array([[19, 20, 21, 22, 23, 24],
[25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36]])
Counterpart to .hsplit and .vsplit are .hstack and .vstack for combining arrays into a single array.
np.hstack([part1, part2]) # Horizontally stack parts
array([[ 1, 2, 3, 4, 5, 6, 19, 20, 21, 22, 23, 24],
[ 7, 8, 9, 10, 11, 12, 25, 26, 27, 28, 29, 30],
[13, 14, 15, 16, 17, 18, 31, 32, 33, 34, 35, 36]])
np.vstack([part1, part2]) # Vertically stack parts
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24],
[25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36]])
a
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24],
[25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36]])
A for loop with numpy arrays will loop over each row:
for row in a:
print('row:', row)
row: [1 2 3 4 5 6] row: [ 7 8 9 10 11 12] row: [13 14 15 16 17 18] row: [19 20 21 22 23 24] row: [25 26 27 28 29 30] row: [31 32 33 34 35 36]
We can transpose an array by using .T or .transpose(). Transposition makes it that each element at index (row, col) is now at index (col, row):
a
array([[ 1, 2, 3, 4, 5, 6],
[ 7, 8, 9, 10, 11, 12],
[13, 14, 15, 16, 17, 18],
[19, 20, 21, 22, 23, 24],
[25, 26, 27, 28, 29, 30],
[31, 32, 33, 34, 35, 36]])
a.T
array([[ 1, 7, 13, 19, 25, 31],
[ 2, 8, 14, 20, 26, 32],
[ 3, 9, 15, 21, 27, 33],
[ 4, 10, 16, 22, 28, 34],
[ 5, 11, 17, 23, 29, 35],
[ 6, 12, 18, 24, 30, 36]])
This lets us iterate over the rows of the original matrix, for example:
for col in a.T:
print('col:', col)
col: [ 1 7 13 19 25 31] col: [ 2 8 14 20 26 32] col: [ 3 9 15 21 27 33] col: [ 4 10 16 22 28 34] col: [ 5 11 17 23 29 35] col: [ 6 12 18 24 30 36]
Nested for loops allows us the "unravel" an array as follows:
for row in a:
for num in row:
print(num)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
NumPy provides .flat to do the same thing:
for i in a.flat:
print(i)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
NumPy also comes with a lot of tools for linear algebra.
As an example, suppose we have the following equations:
2x + 5y + z = 5
x + 2y + 3z = 3
3x + 3y + 5z = 10
We can find x, y, and z by representing the equations with the following matrices:
a = np.array([ # From the coefficients
[2, 5, 1],
[1, 2, 3],
[3, 3, 5]
])
b = np.array([5, 3, 10]) # From the results
np.linalg.solve will provide a solution if it exists:
x = np.linalg.solve(a, b)
x
array([ 3.63157895, -0.47368421, 0.10526316])
We can check the solution by computing the dot product with each row of the first matrix:
for row in a:
print(row.dot(x))
5.000000000000001 3.0000000000000004 10.0
b
array([ 5, 3, 10])
In general, we can compute the inverse of a matrix (if it exists) with np.linalg.inv:
a
array([[2, 5, 1],
[1, 2, 3],
[3, 3, 5]])
a_inv = np.linalg.inv(a)
a_inv
array([[ 0.05263158, -1.15789474, 0.68421053],
[ 0.21052632, 0.36842105, -0.26315789],
[-0.15789474, 0.47368421, -0.05263158]])
The inverse of a matrix, when multiplied with original matrix, should produce an identity matrix:
np.matmul(a_inv, a)
array([[ 1.00000000e+00, 0.00000000e+00, 4.44089210e-16],
[ 0.00000000e+00, 1.00000000e+00, 0.00000000e+00],
[ 0.00000000e+00, -5.55111512e-17, 1.00000000e+00]])
np.matmul(a, a_inv)
array([[ 1.00000000e+00, 2.77555756e-16, -5.55111512e-17],
[ 5.55111512e-17, 1.00000000e+00, -5.55111512e-17],
[ 0.00000000e+00, 4.44089210e-16, 1.00000000e+00]])
There are many more functions that are available, such as:
np.linalg.det() for computing determinantsnp.trace() to get the trace of a matrixnp.linalg.eigh() to compute eigenvalues and eigenvectors for a given matrix.np.linalg.svd(), np.linalg.cholesky(), etc. to get matrix decompositions.np.linalg.det(a)
19.000000000000004
np.trace(a)
9
np.linalg.eigh(a)
(array([-0.35889894, 1. , 8.35889894]),
array([[ 0.55439511, 0.70710678, -0.43891465],
[ 0.55439511, -0.70710678, -0.43891465],
[-0.62071905, 0. , -0.78403308]]))
SciPy is another external Python library for scientific computing. It has multiple libraries for various scientific computation tools, including but not limited to:
See the textbook for further details about SciPy.