Numpy#
NumPy (or Numpy) is a numerical computing and linear algebra library for Python. It is so important for data science with Python because almost all of the libraries in the Python data science ecosystem rely on NumPy as one of their main building blocks. Numpy is also very fast, because its core routines are implemented in C.
!pip install numpy
Requirement already satisfied: numpy in d:\github\python-for-data-science\venv\lib\site-packages (1.25.2)
Using NumPy#
Before we use numpy, we need to import it as a library.
import warnings
warnings.simplefilter(action='ignore')
import numpy as np
Numpy has many built-in functions and capabilities. We won’t cover them all; instead, we will focus on some of the most important aspects of Numpy: vectors, arrays, matrices, and number generation. Let’s start with arrays.
Numpy Arrays#
Numpy arrays are the main way we will use Numpy throughout the course. Numpy arrays essentially come in two flavours: vectors and matrices. Vectors are strictly 1-D arrays and matrices are 2-D (but note that a matrix can still have only one row or one column).
Let’s begin by creating arrays.
Creating Numpy Arrays#
From a Python list
We can create an array by directly converting a list or list of lists.
my_list = [1,2,3]
my_list
[1, 2, 3]
np.array(my_list)
array([1, 2, 3])
print(type(my_list))
print(type(np.array(my_list)))
<class 'list'>
<class 'numpy.ndarray'>
my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
my_matrix
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
np.array(my_matrix)
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
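As a small sketch of the vector/matrix distinction described earlier, the snippet below checks the number of dimensions and the shape of each kind of array. The variable names (vector, matrix, single_row, single_col) are illustrative and not part of the original notebook.

```python
import numpy as np

vector = np.array([1, 2, 3])                           # 1-D array (vector)
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # 2-D array (matrix)

print(vector.ndim, vector.shape)   # 1 (3,)
print(matrix.ndim, matrix.shape)   # 2 (3, 3)

# A 2-D array can still hold a single row or a single column.
single_row = np.array([[1, 2, 3]])
single_col = np.array([[1], [2], [3]])
print(single_row.shape)   # (1, 3)
print(single_col.shape)   # (3, 1)
```

Note that the 1-D shape (3,) and the 2-D shape (1, 3) are different, even though both hold three values.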
There are lots of built-in ways to generate Arrays.
arange
Returns evenly spaced values within a given interval, similar to the range function we used before. The start value is included and the stop value is excluded.
np.arange(0, 10)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
np.arange(0, 11, 2)
array([ 0, 2, 4, 6, 8, 10])
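To make the comparison with range concrete, here is a minimal sketch. It also shows one difference worth knowing: arange accepts a non-integer step, which range does not. The names ints and halves are illustrative.

```python
import numpy as np

# The stop value is excluded, just like Python's built-in range.
ints = np.arange(0, 10)
print(ints.tolist() == list(range(0, 10)))   # True

# Unlike range, arange also accepts a non-integer step.
halves = np.arange(0, 2, 0.5)
print(halves.tolist())   # [0.0, 0.5, 1.0, 1.5]
```

For non-integer steps, linspace is often the safer choice, since floating-point rounding can make the number of elements returned by arange hard to predict.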
zeros and ones
We can use the functions zeros or ones to create arrays consisting of only zeros or ones respectively.
np.zeros(3)
array([0., 0., 0.])
np.zeros((5,5))
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
np.ones(3)
array([1., 1., 1.])
np.ones((3, 4))
array([[1., 1., 1., 1.],
[1., 1., 1., 1.],
[1., 1., 1., 1.]])
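The outputs above are floats because zeros and ones default to a floating-point data type. The sketch below shows how the dtype argument changes that, and how np.full can fill an array with any constant. The names int_zeros and sevens are illustrative.

```python
import numpy as np

# By default the elements are 64-bit floats.
print(np.zeros(3).dtype)   # float64

# Pass dtype to get integers instead.
int_zeros = np.zeros((2, 3), dtype=int)
print(int_zeros.tolist())  # [[0, 0, 0], [0, 0, 0]]

# np.full creates an array filled with an arbitrary constant.
sevens = np.full((2, 3), 7.0)
print(sevens.tolist())     # [[7.0, 7.0, 7.0], [7.0, 7.0, 7.0]]
```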
linspace
The linspace function also returns evenly spaced numbers over a specified interval, but it differs from arange in how the spacing is specified. In arange, similar to the range function, we define the step of each increment and the stop value is excluded. In linspace, we define the number of elements we need in the specified interval, and by default both endpoints are included.
np.linspace(0, 10, 3)
array([ 0., 5., 10.])
np.arange(0, 10, 3)
array([0, 3, 6, 9])
np.linspace(0, 10, 50)
array([ 0. , 0.20408163, 0.40816327, 0.6122449 , 0.81632653,
1.02040816, 1.2244898 , 1.42857143, 1.63265306, 1.83673469,
2.04081633, 2.24489796, 2.44897959, 2.65306122, 2.85714286,
3.06122449, 3.26530612, 3.46938776, 3.67346939, 3.87755102,
4.08163265, 4.28571429, 4.48979592, 4.69387755, 4.89795918,
5.10204082, 5.30612245, 5.51020408, 5.71428571, 5.91836735,
6.12244898, 6.32653061, 6.53061224, 6.73469388, 6.93877551,
7.14285714, 7.34693878, 7.55102041, 7.75510204, 7.95918367,
8.16326531, 8.36734694, 8.57142857, 8.7755102 , 8.97959184,
9.18367347, 9.3877551 , 9.59183673, 9.79591837, 10. ])
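The relationship between the number of elements and the resulting step can be sketched as follows. The retstep and endpoint arguments are real linspace parameters; the particular values (0, 10, 5) are chosen only for illustration.

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed.
points, step = np.linspace(0, 10, 5, retstep=True)
print(points.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]
print(step)              # 2.5, i.e. (10 - 0) / (5 - 1)

# endpoint=False excludes the stop value, which matches arange's behaviour.
open_ended = np.linspace(0, 10, 5, endpoint=False)
print(open_ended.tolist())   # [0.0, 2.0, 4.0, 6.0, 8.0]
```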
eye
The function eye creates an identity matrix: a square matrix with ones on the main diagonal and zeros everywhere else.
np.eye(4)
array([[1., 0., 0., 0.],
[0., 1., 0., 0.],
[0., 0., 1., 0.],
[0., 0., 0., 1.]])
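As a quick illustration of why the identity matrix is useful, the sketch below checks that multiplying a matrix by it leaves the matrix unchanged. The matrix a is an arbitrary example, and the k argument (a diagonal offset) is shown as an optional extra.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
identity = np.eye(2)

# Matrix multiplication by the identity returns the original matrix.
product = a @ identity
print(np.array_equal(product, a))   # True

# k shifts the diagonal of ones; k=1 is the diagonal above the main one.
upper = np.eye(3, k=1)
print(upper.tolist())   # [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
```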
Random
Numpy also has lots of ways to create random number arrays:
rand
Creates an array of the given shape and populates it with random samples from a uniform distribution over [0, 1).
np.random.rand(2)
array([0.95929715, 0.67075412])
np.random.rand(5, 5)
array([[0.31551959, 0.62479383, 0.80138271, 0.45669335, 0.02614221],
[0.99411757, 0.168258 , 0.37732228, 0.62424973, 0.22165636],
[0.32028392, 0.76110266, 0.45402502, 0.53789367, 0.51218525],
[0.41061842, 0.24484808, 0.57213579, 0.20334385, 0.58985469],
[0.21772777, 0.31404079, 0.45276074, 0.32084539, 0.74839634]])
randn
Returns a sample (or samples) from the “standard normal” distribution (mean 0, standard deviation 1), unlike rand, which samples from a uniform distribution.
np.random.randn(2)
array([0.36979384, 0.27957466])
np.random.randn(5, 5)
array([[ 0.06473084, -0.54752886, 0.98757083, 1.56280885, 0.83742714],
[-0.75804819, 0.01884728, -0.7796917 , -1.06170928, -0.80635061],
[ 0.15349417, 0.19364278, -0.053014 , -0.46270241, -0.15764704],
[ 0.60906722, 2.7324825 , 0.43388339, 0.19960402, -0.17349376],
[ 0.69714979, -0.0393381 , 0.63608993, 0.33190352, -0.1884811 ]])
randint
Returns random integers from low (inclusive) to high (exclusive).
np.random.randint(1, 100)
5
np.random.randint(1, 100, 10)
array([60, 7, 59, 7, 70, 6, 23, 9, 73, 80])
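The random outputs above will differ on every run. If reproducible results are needed, one option is a seeded generator. The sketch below uses np.random.default_rng, NumPy's newer Generator interface, as an alternative to the functions shown above. The seed value 42 is an arbitrary assumption.

```python
import numpy as np

# Two generators created with the same seed produce the same numbers.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
sample_a = rng_a.random(3)   # uniform over [0, 1), like rand
sample_b = rng_b.random(3)
print(np.array_equal(sample_a, sample_b))   # True

# Counterparts of randn and randint on the Generator interface.
normals = rng_a.standard_normal((5, 5))     # standard normal samples
ints = rng_a.integers(1, 100, size=10)      # low inclusive, high exclusive
print(normals.shape, ints.shape)            # (5, 5) (10,)
```

For the legacy functions used in this section, np.random.seed plays a similar role.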
Array Attributes and Methods#
Let’s discuss some useful attributes and methods on an array.
arr = np.arange(25)
arr
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24])
ranarr = np.random.randint(0, 50, 10)
ranarr
array([39, 34, 42, 40, 36, 11, 42, 34, 48, 9])
Reshape
Returns an array containing the same data with a new shape.
arr.reshape(5, 5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
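Two details of reshape are worth a short sketch: the new shape must hold exactly the same number of elements, and one dimension can be given as -1 so that NumPy infers it. The shape (4, 7) below is a deliberately wrong example.

```python
import numpy as np

arr = np.arange(25)

# -1 asks NumPy to work out that dimension from the array's size.
grid = arr.reshape(5, -1)
print(grid.shape)   # (5, 5)

# reshape returns a new array object; the original keeps its shape.
print(arr.shape)    # (25,)

# 25 elements cannot fill a 4 x 7 array, so this raises a ValueError.
try:
    arr.reshape(4, 7)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("reshape failed:", err)
```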
Max, Min, Argmax, Argmin
These are useful methods for finding the maximum and minimum values, or for finding their index locations using argmax or argmin.
ranarr
array([39, 34, 42, 40, 36, 11, 42, 34, 48, 9])
ranarr.max()
48
ranarr.min()
9
ranarr.argmax()
8
ranarr.argmin()
9
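The sketch below ties the index methods back to the values and shows the optional axis argument on a 2-D array. It reuses the values printed for ranarr above as a fixed array so the results are reproducible. The name grid is illustrative.

```python
import numpy as np

# Fixed copy of the random values shown above.
ranarr = np.array([39, 34, 42, 40, 36, 11, 42, 34, 48, 9])

# Indexing with argmax gives back the maximum value.
print(ranarr[ranarr.argmax()] == ranarr.max())   # True

# When the maximum appears more than once, argmax returns the first index.
print(np.array([42, 7, 42]).argmax())            # 0

# On a 2-D array, axis selects the direction of the reduction.
grid = ranarr.reshape(2, 5)
print(grid.max(axis=0).tolist())      # [39, 42, 42, 48, 36] (per column)
print(grid.argmax(axis=1).tolist())   # [2, 3] (per row)
```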
Shape
Shape is an attribute that arrays have (not a method).
arr
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24])
arr.shape
(25,)
arr.reshape(1, 25)
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
16, 17, 18, 19, 20, 21, 22, 23, 24]])
arr.reshape(1, 25).shape
(1, 25)
arr.reshape(5, 5)
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
arr.reshape(5, 5).shape
(5, 5)
Dtype
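The dtype attribute gives the data type of the elements in the array. The default integer type depends on the platform and Numpy version, so you may see int64 instead of int32.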
arr.dtype
dtype('int32')
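A brief sketch of how the data type can be set or changed explicitly, so results do not depend on the platform default. The names ints, floats, explicit and mixed are illustrative.

```python
import numpy as np

ints = np.arange(5)

# astype returns a converted copy; the original array is unchanged.
floats = ints.astype(np.float64)
print(floats.dtype)     # float64

# Most creation functions accept a dtype argument.
explicit = np.arange(5, dtype=np.int64)
print(explicit.dtype)   # int64

# Mixing integers and floats in one array upcasts everything to float.
mixed = np.array([1, 2.5])
print(mixed.dtype)      # float64
```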
Numpy Indexing and Selection#
Indexing and Selection help us to select elements or groups of elements from an array.
arr = np.arange(2, 11)
arr
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10])
Bracket Indexing
Similar to lists and other sequence objects, we can use brackets to pick one or more elements from an array.
arr[8]
10
arr[1:5]
array([3, 4, 5, 6])
Broadcasting
Numpy arrays differ from a normal Python list because of their ability to broadcast: assigning a single value to a slice sets every element of that slice.
arr[0:5] = 100
arr
array([100, 100, 100, 100, 100, 7, 8, 9, 10])
slice_of_arr = arr[0:6]
slice_of_arr
array([100, 100, 100, 100, 100, 7])
slice_of_arr[:]=99
slice_of_arr
array([99, 99, 99, 99, 99, 99])
arr
array([99, 99, 99, 99, 99, 99, 8, 9, 10])
The data is not copied; the slice is a view of the original array, so changing the slice also changes the original. This avoids making unnecessary copies of large arrays in memory.
#To get a copy, need to be explicit
arr_copy = arr.copy()
arr_copy
array([99, 99, 99, 99, 99, 99, 8, 9, 10])
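The difference between a view and a copy can be checked directly with np.shares_memory. The following is a minimal sketch; the names view, copied and py_list are illustrative. It also contrasts the behaviour with a Python list, where slicing always copies.

```python
import numpy as np

arr = np.arange(2, 11)
view = arr[0:3]             # a view: shares data with arr
copied = arr[0:3].copy()    # an independent copy

print(np.shares_memory(arr, view))     # True
print(np.shares_memory(arr, copied))   # False

# Writing through the view changes the original; the copy is unaffected.
view[:] = 0
print(arr[:3].tolist())    # [0, 0, 0]
print(copied.tolist())     # [2, 3, 4]

# Slicing a plain Python list makes a copy, so the original stays as it was.
py_list = [2, 3, 4]
py_slice = py_list[0:2]
py_slice[0] = 99
print(py_list)             # [2, 3, 4]
```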
Indexing a 2D array#
The general format is arr_2d[row][col] or arr_2d[row, col].
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
arr_2d
array([[ 5, 10, 15],
[20, 25, 30],
[35, 40, 45]])
# Indexing row
arr_2d[1]
array([20, 25, 30])
# Format is arr_2d[row][col] or arr_2d[row,col]
# Getting individual element value
arr_2d[1][0]
20
# 2D array slicing
#Shape (2,2) from top right corner
arr_2d[:2,1:]
array([[10, 15],
[25, 30]])
# Bottom row
arr_2d[2]
array([35, 40, 45])
# Bottom row again, with an explicit slice for the columns
arr_2d[2,:]
array([35, 40, 45])
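The two indexing formats return the same element, and the comma form extends naturally to selecting columns. A short sketch using the same arr_2d values follows; the names middle_col and bottom_right are illustrative.

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# Both formats pick the element in row 1, column 0.
print(arr_2d[1][0], arr_2d[1, 0])   # 20 20

# A full slice for the rows plus a column index selects one column.
middle_col = arr_2d[:, 1]
print(middle_col.tolist())          # [10, 25, 40]

# Slices on both axes select a sub-matrix.
bottom_right = arr_2d[1:, 1:]
print(bottom_right.tolist())        # [[25, 30], [40, 45]]
```

The comma form is generally preferred because arr_2d[row][col] first builds an intermediate row and then indexes into it.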
Fancy Indexing#
Fancy indexing allows you to select entire rows or columns out of order.
#Set up matrix
arr2d = np.zeros((10,5))
arr2d
array([[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0.]])
arr2d.shape
(10, 5)
#Length of array
arr_length = arr2d.shape[0]
arr_length
10
#Set up array
for i in range(arr_length):
    arr2d[i] = i
arr2d
array([[0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2.],
[3., 3., 3., 3., 3.],
[4., 4., 4., 4., 4.],
[5., 5., 5., 5., 5.],
[6., 6., 6., 6., 6.],
[7., 7., 7., 7., 7.],
[8., 8., 8., 8., 8.],
[9., 9., 9., 9., 9.]])
Fancy indexing allows the following:
arr2d[[2,4,6,8]]
array([[2., 2., 2., 2., 2.],
[4., 4., 4., 4., 4.],
[6., 6., 6., 6., 6.],
[8., 8., 8., 8., 8.]])
#Allows in any order
arr2d[[6,4,2,7]]
array([[6., 6., 6., 6., 6.],
[4., 4., 4., 4., 4.],
[2., 2., 2., 2., 2.],
[7., 7., 7., 7., 7.]])
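The examples above select rows. The sketch below shows the same idea for columns and for individual elements. It uses a small array named grid (an illustrative name) so the results are easy to follow. Unlike slicing, fancy indexing returns a copy rather than a view.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0,  1,  2,  3],
#  [ 4,  5,  6,  7],
#  [ 8,  9, 10, 11]]

# Select columns 3 and 0, in that order.
cols = grid[:, [3, 0]]
print(cols.tolist())     # [[3, 0], [7, 4], [11, 8]]

# Two index lists are paired up: elements (0, 1) and (2, 3).
picked = grid[[0, 2], [1, 3]]
print(picked.tolist())   # [1, 11]

# The result is a copy, so modifying it leaves grid untouched.
rows = grid[[2, 0]]
rows[:] = -1
print(grid[2, 0])        # 8
```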
Selection#
We can select parts of arrays using comparison operators.
arr = np.arange(1,11)
arr
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
arr > 4
array([False, False, False, False, True, True, True, True, True,
True])
bool_arr = arr > 4
bool_arr
array([False, False, False, False, True, True, True, True, True,
True])
arr[bool_arr]
array([ 5, 6, 7, 8, 9, 10])
x = 2
arr[arr > x]
array([ 3, 4, 5, 6, 7, 8, 9, 10])
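Conditions can also be combined. The sketch below uses the element-wise operators & (and) and | (or), each condition wrapped in parentheses, together with np.where for choosing between values. The names middle, edges, evens_or_zero and count are illustrative.

```python
import numpy as np

arr = np.arange(1, 11)

# Parentheses are required because & and | bind tighter than comparisons.
middle = arr[(arr > 3) & (arr < 8)]
print(middle.tolist())   # [4, 5, 6, 7]

edges = arr[(arr <= 2) | (arr >= 9)]
print(edges.tolist())    # [1, 2, 9, 10]

# np.where keeps the value where the condition holds and uses 0 elsewhere.
evens_or_zero = np.where(arr % 2 == 0, arr, 0)
print(evens_or_zero.tolist())   # [0, 2, 0, 4, 0, 6, 0, 8, 0, 10]

# Summing a boolean array counts the True values.
count = (arr > 4).sum()
print(count)             # 6
```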
Numpy Operations#
Arithmetic#
You can easily perform array with array arithmetic or scalar with array arithmetic.
import numpy as np
arr = np.arange(0, 10)
arr
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr + arr
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
arr - arr
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
# Division by zero gives a warning, not an error!
# The 0/0 entry is replaced with nan
arr/arr
array([nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])
# Also a warning rather than an error; the 1/0 entry becomes infinity
1/arr
array([ inf, 1. , 0.5 , 0.33333333, 0.25 ,
0.2 , 0.16666667, 0.14285714, 0.125 , 0.11111111])
arr ** 3
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729], dtype=int32)
arr + 4
array([ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
arr - 5
array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4])
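The division warnings mentioned above can be handled more locally than with a global warnings filter. The sketch below shows two options: np.errstate silences the warnings for one block only, and np.divide with its where and out arguments skips the zero entries altogether. Filling the skipped entries with 0.0 is an arbitrary choice made for illustration.

```python
import numpy as np

arr = np.arange(0, 10)

# Option 1: silence the warnings only inside this block.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr        # 0/0 becomes nan
    reciprocal = 1 / arr     # 1/0 becomes inf

print(np.isnan(ratio[0]), np.isinf(reciprocal[0]))   # True True

# Option 2: divide only where arr is non-zero; other entries keep the
# value already stored in `out` (0.0 here).
safe = np.divide(1.0, arr, out=np.zeros(arr.shape), where=arr != 0)
print(safe[:3].tolist())     # [0.0, 1.0, 0.5]
```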
Universal Array Functions#
Numpy comes with many Universal Array Functions (ufuncs), which are mathematical operations applied element by element across the whole array. Here are a few common ones:
# Taking Square Roots
np.sqrt(arr)
array([0. , 1. , 1.41421356, 1.73205081, 2. ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ])
# Calculating exponential (e^)
np.exp(arr)
array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
2.98095799e+03, 8.10308393e+03])
np.max(arr) #same as arr.max()
9
np.sin(arr)
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ,
-0.95892427, -0.2794155 , 0.6569866 , 0.98935825, 0.41211849])
np.log(arr)
array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436,
1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])
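To illustrate what "applied across the array" means, the sketch below compares np.sqrt with an explicit Python loop over the same values. It also shows that universal functions work on 2-D arrays and that arithmetic operators map to them. The names vectorised, looped and grid are illustrative.

```python
import math
import numpy as np

arr = np.arange(0, 10)

# One call applies the function to every element...
vectorised = np.sqrt(arr)
# ...which gives the same result as looping over the elements by hand.
looped = np.array([math.sqrt(x) for x in arr])
print(np.allclose(vectorised, looped))   # True

# The same functions work unchanged on arrays of any shape.
grid = np.arange(6).reshape(2, 3)
print(np.square(grid).tolist())          # [[0, 1, 4], [9, 16, 25]]

# Arithmetic operators are shorthand for universal functions.
print(np.array_equal(np.add(arr, 4), arr + 4))   # True
```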