Weeks 12 and 13 are about using external scientific and engineering libraries that are not inside core Python, but are popular enough that we should learn them too.
NumPy is a Python library for easy and efficient numerical computation of multi-dimensional arrays (vectors, matrices, etc.) You can install it via either pip or conda with, for example the following command:
pip install numpy
We first have to import numpy:
import numpy as np
Numpy works with lists (and lists of lists, etc.) and the core data type is something called numpy.ndarray
, which we can create using the np.array()
construction function:
l = [1, 2, 3]
arr = np.array(l)
type(arr)
numpy.ndarray
This is how we can create a 2D matrix:
arr = np.array([
[1, 2, 3, 4],
[5, 6, 7, 8]
])
A numpy array holds integers or floating point numbers, which can be checked by looking at its dtype
:
arr.dtype
dtype('int64')
We can get the dimensions of a numpy array using .shape
:
arr.shape # 2 rows, 4 columns
(2, 4)
.ndim
gives the number of dimensions:
arr.ndim
2
A numpy array can be reshaped into a different shape using .reshape(new_shape)
. This will create a new array with the requested shape.
arr.reshape(8,1)
array([[1], [2], [3], [4], [5], [6], [7], [8]])
arr.reshape(2, 2, 2) # 2x2x2 cube
array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
To get the number of elements inside an array, use .size
:
arr.size
8
Similarly to lists in Python, we can access elements inside an array using the same square bracket syntax:
print(arr)
[[1 2 3 4] [5 6 7 8]]
arr[0][2]
3
As a special trick, we can access the same index as above using the following syntax:
arr[0, 2] # separated by commas
3
To get all of the elements of the second columns (counting from zero):
arr[:, 2]
array([3, 7])
To get all of the elements of the first row (counting from zero):
arr[1, :]
array([5, 6, 7, 8])
We can also create arrays that are automatically filled with a specific value:
np.zeros((10, 10))
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
np.ones((5, 5))
array([[1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.]])
np.identity(8) # Create an 8x8 identity matrix
array([[1., 0., 0., 0., 0., 0., 0., 0.], [0., 1., 0., 0., 0., 0., 0., 0.], [0., 0., 1., 0., 0., 0., 0., 0.], [0., 0., 0., 1., 0., 0., 0., 0.], [0., 0., 0., 0., 1., 0., 0., 0.], [0., 0., 0., 0., 0., 1., 0., 0.], [0., 0., 0., 0., 0., 0., 1., 0.], [0., 0., 0., 0., 0., 0., 0., 1.]])
# Create a matrix filled with random values sampled from [0, 1]
np.random.normal(size=(4, 4))
array([[ 0.59264239, 1.22055596, 1.45809488, 1.48312865], [ 0.39275385, -0.4658526 , 0.75872202, -1.82229149], [-1.24125527, -1.73792977, 1.32602209, 0.33317698], [ 0.1169786 , -2.50504403, 1.1665632 , -1.15352151]])
We can create arrays that with .arange
, that mimics the behavior of range
, as follows:
np.arange(1, 10, 2)
array([1, 3, 5, 7, 9])
list(range(1, 10, 2))
[1, 3, 5, 7, 9]
One difference is that .arange
allows us to give a real-valued step size:
np.arange(1, 10, 0.2)
array([1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4, 2.6, 2.8, 3. , 3.2, 3.4, 3.6, 3.8, 4. , 4.2, 4.4, 4.6, 4.8, 5. , 5.2, 5.4, 5.6, 5.8, 6. , 6.2, 6.4, 6.6, 6.8, 7. , 7.2, 7.4, 7.6, 7.8, 8. , 8.2, 8.4, 8.6, 8.8, 9. , 9.2, 9.4, 9.6, 9.8])
For equally-spaced sampled points, we can also use .linspace
:
np.linspace(2, 5, 10) # Sample 10 equally-spaced points from 2 to 5
array([2. , 2.33333333, 2.66666667, 3. , 3.33333333, 3.66666667, 4. , 4.33333333, 4.66666667, 5. ])
NumPy comes with lots of operations that can be done between arrays. The simplest is addition / subtraction:
a = np.array([
[1, 2],
[3, 4]
])
b = np.array([
[5, 1],
[3, 4]
])
a + b
array([[6, 3], [6, 8]])
a - b
array([[-4, 1], [ 0, 0]])
b - b
array([[0, 0], [0, 0]])
We can also do logical comparison between arrays: These will be done element-wise, i.e, each element in the first array will be compared with the element in the second array that has the same index:
a < b
array([[ True, False], [False, False]])
a > 2.5
array([[False, False], [ True, True]])
There's also lots of ready-to-use functions that are in numpy, including:
np.sin()
, np.cos()
, ...np.min()
, np.max()
, ...np.mean()
, np.std()
, np.sum()
, ...np.exp()
, np.log()
, ...Most of these can also be called as a class method (a.mean()
vs np.mean(a)
).
np.sin(0)
0.0
np.sin(np.pi / 2)
1.0
a
array([[1, 2], [3, 4]])
np.min(a)
1
a.min()
1
a.mean()
2.5
def standardize(a):
return (a - a.mean()) / a.std()
standardize(a)
array([[-1.34164079, -0.4472136 ], [ 0.4472136 , 1.34164079]])
Some operations can be specified to be done along an axis (i.e, row-wise, column-wise, etc.)
To do that, we specify the axis
parameter:
a
array([[1, 2], [3, 4]])
a.sum() # Default: sum up everything
10
a.sum(axis=0) # Sum up columns
array([4, 6])
a.sum(axis=1) # Sum up rows
array([3, 7])
a.min(axis=1)
array([1, 3])
a = np.arange(1, 37).reshape(6, 6)
a
array([[ 1, 2, 3, 4, 5, 6], [ 7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36]])
.hsplit
and .vsplit
are for partitioning an array into sub-arrays:
part1, part2 = np.hsplit(a, 2) # Divide into two horizontal splits
part1
array([[ 1, 2, 3], [ 7, 8, 9], [13, 14, 15], [19, 20, 21], [25, 26, 27], [31, 32, 33]])
part2
array([[ 4, 5, 6], [10, 11, 12], [16, 17, 18], [22, 23, 24], [28, 29, 30], [34, 35, 36]])
part1, part2 = np.vsplit(a, 2)
part1
array([[ 1, 2, 3, 4, 5, 6], [ 7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18]])
part2
array([[19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36]])
Counterpart to .hsplit
and .vsplit
are .hstack
and .vstack
for combining arrays into a single array.
np.hstack([part1, part2]) # Horizontally stack parts
array([[ 1, 2, 3, 4, 5, 6, 19, 20, 21, 22, 23, 24], [ 7, 8, 9, 10, 11, 12, 25, 26, 27, 28, 29, 30], [13, 14, 15, 16, 17, 18, 31, 32, 33, 34, 35, 36]])
np.vstack([part1, part2]) # Vertically stack parts
array([[ 1, 2, 3, 4, 5, 6], [ 7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36]])
a
array([[ 1, 2, 3, 4, 5, 6], [ 7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36]])
A for loop with numpy arrays will loop over each row:
for row in a:
print('row:', row)
row: [1 2 3 4 5 6] row: [ 7 8 9 10 11 12] row: [13 14 15 16 17 18] row: [19 20 21 22 23 24] row: [25 26 27 28 29 30] row: [31 32 33 34 35 36]
We can transpose an array by using .T
or .transpose()
. Transposition makes it that each element at index (row, col) is now at index (col, row):
a
array([[ 1, 2, 3, 4, 5, 6], [ 7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18], [19, 20, 21, 22, 23, 24], [25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36]])
a.T
array([[ 1, 7, 13, 19, 25, 31], [ 2, 8, 14, 20, 26, 32], [ 3, 9, 15, 21, 27, 33], [ 4, 10, 16, 22, 28, 34], [ 5, 11, 17, 23, 29, 35], [ 6, 12, 18, 24, 30, 36]])
This lets us iterate over the rows of the original matrix, for example:
for col in a.T:
print('col:', col)
col: [ 1 7 13 19 25 31] col: [ 2 8 14 20 26 32] col: [ 3 9 15 21 27 33] col: [ 4 10 16 22 28 34] col: [ 5 11 17 23 29 35] col: [ 6 12 18 24 30 36]
Nested for loops allows us the "unravel" an array as follows:
for row in a:
for num in row:
print(num)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
NumPy provides .flat
to do the same thing:
for i in a.flat:
print(i)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
NumPy also comes with a lot of tools for linear algebra.
As an example, suppose we have the following equations:
2x + 5y + z = 5
x + 2y + 3z = 3
3x + 3y + 5z = 10
We can find x, y, and z by representing the equations with the following matrices:
a = np.array([ # From the coefficients
[2, 5, 1],
[1, 2, 3],
[3, 3, 5]
])
b = np.array([5, 3, 10]) # From the results
np.linalg.solve
will provide a solution if it exists:
x = np.linalg.solve(a, b)
x
array([ 3.63157895, -0.47368421, 0.10526316])
We can check the solution by computing the dot product with each row of the first matrix:
for row in a:
print(row.dot(x))
5.000000000000001 3.0000000000000004 10.0
b
array([ 5, 3, 10])
In general, we can compute the inverse of a matrix (if it exists) with np.linalg.inv
:
a
array([[2, 5, 1], [1, 2, 3], [3, 3, 5]])
a_inv = np.linalg.inv(a)
a_inv
array([[ 0.05263158, -1.15789474, 0.68421053], [ 0.21052632, 0.36842105, -0.26315789], [-0.15789474, 0.47368421, -0.05263158]])
The inverse of a matrix, when multiplied with original matrix, should produce an identity matrix:
np.matmul(a_inv, a)
array([[ 1.00000000e+00, 0.00000000e+00, 4.44089210e-16], [ 0.00000000e+00, 1.00000000e+00, 0.00000000e+00], [ 0.00000000e+00, -5.55111512e-17, 1.00000000e+00]])
np.matmul(a, a_inv)
array([[ 1.00000000e+00, 2.77555756e-16, -5.55111512e-17], [ 5.55111512e-17, 1.00000000e+00, -5.55111512e-17], [ 0.00000000e+00, 4.44089210e-16, 1.00000000e+00]])
There are many more functions that are available, such as:
np.linalg.det()
for computing determinantsnp.trace()
to get the trace of a matrixnp.linalg.eigh()
to compute eigenvalues and eigenvectors for a given matrix.np.linalg.svd()
, np.linalg.cholesky()
, etc. to get matrix decompositions.np.linalg.det(a)
19.000000000000004
np.trace(a)
9
np.linalg.eigh(a)
(array([-0.35889894, 1. , 8.35889894]), array([[ 0.55439511, 0.70710678, -0.43891465], [ 0.55439511, -0.70710678, -0.43891465], [-0.62071905, 0. , -0.78403308]]))
SciPy is another external Python library for scientific computing. It has multiple libraries for various scientific computation tools, including but not limited to:
See the textbook for further details about SciPy.