Numpy#

NumPy (or Numpy) is a linear algebra library for Python. It is so important for data science with Python because almost all of the libraries in the Python data science ecosystem rely on NumPy as one of their main building blocks. NumPy is also fast, because its core routines are implemented in C.

!pip install numpy

Requirement already satisfied: numpy in d:\github\python-for-data-science\venv\lib\site-packages (1.25.2)


Using NumPy#

Before we use numpy, we need to import it as a library.

import warnings
warnings.simplefilter(action='ignore')

import numpy as np

Numpy has many built-in functions and capabilities. We won’t cover them all; instead, we will focus on some of the most important aspects of Numpy: vectors, arrays, matrices, and number generation. Let’s start with arrays.

Numpy Arrays#

Numpy arrays are the main way we will use Numpy throughout the course. Numpy arrays essentially come in two flavours: vectors and matrices. Vectors are strictly 1-D arrays and matrices are 2-D (but note that a matrix can still have only one row or one column).
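As a small sketch of the difference, the example below builds a vector, a one-row matrix, and a one-column matrix, and prints how many dimensions each has. The variable names are only illustrative, and the ndim and shape attributes are used here ahead of their proper introduction later on.

```python
import numpy as np

vector = np.array([1, 2, 3])            # 1-D: a vector
row_matrix = np.array([[1, 2, 3]])      # 2-D with a single row
col_matrix = np.array([[1], [2], [3]])  # 2-D with a single column

print(vector.ndim, vector.shape)          # 1 (3,)
print(row_matrix.ndim, row_matrix.shape)  # 2 (1, 3)
print(col_matrix.ndim, col_matrix.shape)  # 2 (3, 1)
```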

Let’s begin by creating arrays.

Creating Numpy Arrays#

From a Python list

We can create an array by directly converting a list or list of lists.

my_list = [1,2,3]
my_list

[1, 2, 3]

np.array(my_list)

array([1, 2, 3])

print(type(my_list))
print(type(np.array(my_list)))

<class 'list'>
<class 'numpy.ndarray'>

my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
my_matrix

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

np.array(my_matrix)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

There are lots of built-in ways to generate Arrays.

arange

Returns evenly spaced values within a given interval, similar to the range function we used before. As with range, the stop value is not included.

np.arange(0, 10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.arange(0, 11, 2)

array([ 0,  2,  4,  6,  8, 10])
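To make the comparison with range concrete, here is a minimal sketch. It checks that arange and range produce the same integers, and shows one difference: arange also accepts a non-integer step. The variable names are just for illustration.

```python
import numpy as np

# Same start/stop/step semantics as range: the stop value is excluded.
evens = np.arange(0, 11, 2)
print(evens.tolist() == list(range(0, 11, 2)))  # True

# Unlike range, arange also accepts a non-integer step.
halves = np.arange(0, 2, 0.5)
print(halves.tolist())  # [0.0, 0.5, 1.0, 1.5]
```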

zeros and ones

We can use the functions zeros or ones to create arrays consisting of only zeros or ones respectively.

np.zeros(3)

array([0., 0., 0.])

np.zeros((5,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

np.ones(3)

array([1., 1., 1.])

np.ones((3, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])
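The outputs above are floats (note the trailing dots). If a different element type is needed, both functions accept a dtype argument. The following sketch, with illustrative variable names, shows a few variations.

```python
import numpy as np

float_zeros = np.zeros((2, 3))           # default dtype is float64
int_zeros = np.zeros((2, 3), dtype=int)  # integer zeros instead
bool_ones = np.ones(4, dtype=bool)       # an array of True values

print(float_zeros.dtype)   # float64
print(int_zeros.tolist())  # [[0, 0, 0], [0, 0, 0]]
print(bool_ones.tolist())  # [True, True, True, True]
```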

linspace

The linspace function also returns evenly spaced numbers over a specified interval. However, there is a slight difference between linspace and arange. In arange, similar to the range function, we define the step of each increment. In linspace, we define the number of elements we need in the specified interval, and by default the stop value is included.

np.linspace(0, 10, 3)

array([ 0.,  5., 10.])

np.arange(0, 10, 3)

array([0, 3, 6, 9])

np.linspace(0, 10, 50)

array([ 0.        ,  0.20408163,  0.40816327,  0.6122449 ,  0.81632653,
        1.02040816,  1.2244898 ,  1.42857143,  1.63265306,  1.83673469,
        2.04081633,  2.24489796,  2.44897959,  2.65306122,  2.85714286,
        3.06122449,  3.26530612,  3.46938776,  3.67346939,  3.87755102,
        4.08163265,  4.28571429,  4.48979592,  4.69387755,  4.89795918,
        5.10204082,  5.30612245,  5.51020408,  5.71428571,  5.91836735,
        6.12244898,  6.32653061,  6.53061224,  6.73469388,  6.93877551,
        7.14285714,  7.34693878,  7.55102041,  7.75510204,  7.95918367,
        8.16326531,  8.36734694,  8.57142857,  8.7755102 ,  8.97959184,
        9.18367347,  9.3877551 ,  9.59183673,  9.79591837, 10.        ])
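The sketch below looks at the spacing linspace chooses. It uses two optional parameters of linspace: retstep, which also returns the computed step, and endpoint, which controls whether the stop value is included. The variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed.
points, step = np.linspace(0, 10, 3, retstep=True)
print(points.tolist(), step)  # [0.0, 5.0, 10.0] 5.0

# endpoint=False leaves out the stop value, so the result
# lines up with arange using the same spacing.
open_points = np.linspace(0, 10, 5, endpoint=False)
print(open_points.tolist())          # [0.0, 2.0, 4.0, 6.0, 8.0]
print(np.arange(0, 10, 2).tolist())  # [0, 2, 4, 6, 8]
```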

eye

The function eye creates an identity matrix: a square matrix in which the diagonal elements are one and all other elements are zero.

np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])
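One way to see why the identity matrix matters is to multiply another matrix by it. The sketch below uses the @ operator for matrix multiplication, which is not covered elsewhere in this section, and a small example matrix A chosen only for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
I = np.eye(2)

# Matrix multiplication by the identity leaves A's values unchanged.
print(np.array_equal(A @ I, A))  # True
print(np.array_equal(I @ A, A))  # True
```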

Random

Numpy also has lots of ways to create random number arrays:

rand

Creates an array of the given shape and populates it with random samples from a uniform distribution over [0, 1).

np.random.rand(2)

array([0.95929715, 0.67075412])

np.random.rand(5, 5)

array([[0.31551959, 0.62479383, 0.80138271, 0.45669335, 0.02614221],
       [0.99411757, 0.168258  , 0.37732228, 0.62424973, 0.22165636],
       [0.32028392, 0.76110266, 0.45402502, 0.53789367, 0.51218525],
       [0.41061842, 0.24484808, 0.57213579, 0.20334385, 0.58985469],
       [0.21772777, 0.31404079, 0.45276074, 0.32084539, 0.74839634]])

randn

Returns a sample (or samples) from the “standard normal” distribution, unlike rand, which samples from a uniform distribution.

np.random.randn(2)

array([0.36979384, 0.27957466])

np.random.randn(5, 5)

array([[ 0.06473084, -0.54752886,  0.98757083,  1.56280885,  0.83742714],
       [-0.75804819,  0.01884728, -0.7796917 , -1.06170928, -0.80635061],
       [ 0.15349417,  0.19364278, -0.053014  , -0.46270241, -0.15764704],
       [ 0.60906722,  2.7324825 ,  0.43388339,  0.19960402, -0.17349376],
       [ 0.69714979, -0.0393381 ,  0.63608993,  0.33190352, -0.1884811 ]])

randint

Returns random integers from low (inclusive) to high (exclusive).

np.random.randint(1, 100)

np.random.randint(1, 100, 10)

array([60,  7, 59,  7, 70,  6, 23,  9, 73, 80])
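Because these functions are random, the outputs shown above will differ on every run. A minimal sketch of making results repeatable is to set a seed with np.random.seed first; the seed value 42 here is arbitrary. The same sketch also checks the inclusive and exclusive bounds of randint.

```python
import numpy as np

np.random.seed(42)          # fix the global seed
first = np.random.rand(3)
np.random.seed(42)          # reset to the same seed
second = np.random.rand(3)
print(np.array_equal(first, second))  # True

# randint: low is inclusive, high is exclusive.
draws = np.random.randint(1, 100, 1000)
print(draws.min() >= 1, draws.max() <= 99)  # True True
```

For new code, NumPy's documentation recommends the newer generator interface created with np.random.default_rng, but the functions used in this section continue to work.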

Array Attributes and Methods#

Let’s discuss some useful attributes and methods on an array.

arr = np.arange(25)
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

ranarr = np.random.randint(0, 50, 10)
ranarr

array([39, 34, 42, 40, 36, 11, 42, 34, 48,  9])

Reshape

Returns an array containing the same data with a new shape.

arr.reshape(5, 5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
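Two details of reshape are worth a quick sketch: one dimension can be given as -1 so that NumPy infers it, and the new shape must hold exactly the same number of elements. The shape (4, 6) below is just an example of a shape that does not fit 25 elements.

```python
import numpy as np

arr = np.arange(25)

# -1 asks NumPy to infer that dimension from the array's size.
inferred = arr.reshape(5, -1)
print(inferred.shape)  # (5, 5)

# The new shape must account for exactly 25 elements.
try:
    arr.reshape(4, 6)
except ValueError as err:
    print("reshape failed:", err)
```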

Max, Min, Argmax, Argmin

These are useful methods to find the maximum and minimum values, or to find their index locations using argmax or argmin.

ranarr

array([39, 34, 42, 40, 36, 11, 42, 34, 48,  9])

ranarr.max()

ranarr.min()

ranarr.argmax()

ranarr.argmin()
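Since ranarr is random, the sketch below uses a fixed copy of the values displayed above so that the results can be followed by hand.

```python
import numpy as np

# Fixed copy of the ranarr values displayed above
ranarr = np.array([39, 34, 42, 40, 36, 11, 42, 34, 48, 9])

print(ranarr.max())     # 48
print(ranarr.min())     # 9
print(ranarr.argmax())  # 8  (index of the value 48)
print(ranarr.argmin())  # 9  (index of the value 9)
```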

Shape

Shape is an attribute that arrays have (not a method).

arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

arr.shape

(25,)

arr.reshape(1, 25)

array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
        16, 17, 18, 19, 20, 21, 22, 23, 24]])

arr.reshape(1, 25).shape

(1, 25)

arr.reshape(5, 5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

arr.reshape(5, 5).shape

(5, 5)

Dtype

The dtype attribute gives the data type of the elements in the array.

arr.dtype

dtype('int32')
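The default integer type can differ between platforms and NumPy versions, so the int32 shown above may appear as int64 elsewhere. As a sketch, a dtype can be requested explicitly when creating an array, and astype returns a converted copy. The variable names are illustrative.

```python
import numpy as np

ints = np.array([1, 2, 3], dtype=np.int64)  # request a dtype explicitly
floats = ints.astype(np.float64)            # astype returns a converted copy

print(ints.dtype)       # int64
print(floats.dtype)     # float64
print(floats.tolist())  # [1.0, 2.0, 3.0]
```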

Numpy Indexing and Selection#

Indexing and Selection help us to select elements or groups of elements from an array.

arr = np.arange(2, 11)
arr

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10])

Bracket Indexing

Similar to lists and other sequence objects, we can use brackets to pick one or more elements from an array.

arr[8]

arr[1:5]

array([3, 4, 5, 6])

Broadcasting

Numpy arrays differ from a normal Python list because of their ability to broadcast: a single value assigned to a slice is applied to every element of that slice.

arr[0:5] = 100
arr

array([100, 100, 100, 100, 100,   7,   8,   9,  10])

slice_of_arr = arr[0:6]
slice_of_arr

array([100, 100, 100, 100, 100,   7])

slice_of_arr[:]=99
slice_of_arr

array([99, 99, 99, 99, 99, 99])

arr

array([99, 99, 99, 99, 99, 99,  8,  9, 10])

The data is not copied: the slice is a view of the original array, so changing the slice also changes the original. This avoids memory problems when working with large arrays.

# To get a copy, we need to be explicit
arr_copy = arr.copy()
arr_copy

array([99, 99, 99, 99, 99, 99,  8,  9, 10])
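To check whether two arrays share the same underlying data, np.shares_memory can be used. The sketch below contrasts a view with an explicit copy, and with a Python list slice, which is always a copy. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(5)
view = arr[0:3]         # slicing an array gives a view
copy = arr[0:3].copy()  # an explicit, independent copy

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False

view[:] = 99            # writes through to arr
print(arr.tolist())     # [99, 99, 99, 3, 4]
print(copy.tolist())    # [0, 1, 2]

# A Python list slice, by contrast, is already a copy.
items = [0, 1, 2, 3, 4]
part = items[0:3]
part[0] = 99
print(items)            # [0, 1, 2, 3, 4]
```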

Indexing a 2D array#

The general format is arr_2d[row][col] or arr_2d[row, col]

arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
arr_2d

array([[ 5, 10, 15],
       [20, 25, 30],
       [35, 40, 45]])

# Indexing row
arr_2d[1]

array([20, 25, 30])

# Format is arr_2d[row][col] or arr_2d[row,col]

# Getting individual element value
arr_2d[1][0]

# 2D array slicing

#Shape (2,2) from top right corner
arr_2d[:2,1:]

array([[10, 15],
       [25, 30]])

#Shape bottom row
arr_2d[2]

array([35, 40, 45])

#Shape bottom row
arr_2d[2,:]

array([35, 40, 45])
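As a short sketch of the two index formats, the example below confirms that they return the same element, and uses a colon in the row position to pick out a whole column, which the examples above do not show.

```python
import numpy as np

arr_2d = np.array([[5, 10, 15], [20, 25, 30], [35, 40, 45]])

# Both forms return the same element.
print(arr_2d[1][0], arr_2d[1, 0])  # 20 20

# A colon in the row position selects every row, so this picks a column.
print(arr_2d[:, 1].tolist())       # [10, 25, 40]
```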

Fancy Indexing#

Fancy indexing allows you to select entire rows or columns out of order.

#Set up matrix
arr2d = np.zeros((10,5))
arr2d

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

arr2d.shape

(10, 5)

#Length of array
arr_length = arr2d.shape[0]
arr_length

#Set up array

for i in range(arr_length):
    arr2d[i] = i
arr2d

array([[0., 0., 0., 0., 0.],
       [1., 1., 1., 1., 1.],
       [2., 2., 2., 2., 2.],
       [3., 3., 3., 3., 3.],
       [4., 4., 4., 4., 4.],
       [5., 5., 5., 5., 5.],
       [6., 6., 6., 6., 6.],
       [7., 7., 7., 7., 7.],
       [8., 8., 8., 8., 8.],
       [9., 9., 9., 9., 9.]])

Fancy indexing allows the following:

arr2d[[2,4,6,8]]

array([[2., 2., 2., 2., 2.],
       [4., 4., 4., 4., 4.],
       [6., 6., 6., 6., 6.],
       [8., 8., 8., 8., 8.]])

#Allows in any order
arr2d[[6,4,2,7]]

array([[6., 6., 6., 6., 6.],
       [4., 4., 4., 4., 4.],
       [2., 2., 2., 2., 2.],
       [7., 7., 7., 7., 7.]])
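The examples above select rows. A sketch of the column case is given below, using a small array built with arange and reshape so that each column is easy to recognise. It also illustrates that, unlike a slice, fancy indexing returns a copy rather than a view.

```python
import numpy as np

arr2d = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Columns 3 and 0, in that order, from every row.
cols = arr2d[:, [3, 0]]
print(cols.tolist())  # [[3, 0], [7, 4], [11, 8]]

# The result is a copy: editing it leaves arr2d unchanged.
cols[0, 0] = -1
print(arr2d[0, 3])    # 3
```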

Selection#

We can select parts of arrays using comparison operators.

arr = np.arange(1,11)
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

arr > 4

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

bool_arr = arr > 4
bool_arr

array([False, False, False, False,  True,  True,  True,  True,  True,
        True])

arr[bool_arr]

array([ 5,  6,  7,  8,  9, 10])

x = 2

arr[arr > x]

array([ 3,  4,  5,  6,  7,  8,  9, 10])
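Conditions can also be combined. In the sketch below, & means element-wise "and", | means element-wise "or", and ~ negates a condition. Each comparison needs its own parentheses because these operators bind more tightly than the comparisons.

```python
import numpy as np

arr = np.arange(1, 11)

print(arr[(arr > 3) & (arr < 8)].tolist())  # [4, 5, 6, 7]
print(arr[(arr < 3) | (arr > 8)].tolist())  # [1, 2, 9, 10]
print(arr[~(arr > 4)].tolist())             # [1, 2, 3, 4]
```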

Numpy Operations#

Arithmetic#

You can easily perform arithmetic between two arrays or between an array and a scalar. The operations are applied element by element.

import numpy as np

arr = np.arange(0, 10)

arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

arr + arr

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

arr - arr

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

# Warning on division by zero, but not an error!
# Just replaced with nan
arr/arr

array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])

# Also a warning rather than an error; the result is infinity
1/arr

array([       inf, 1.        , 0.5       , 0.33333333, 0.25      ,
       0.2       , 0.16666667, 0.14285714, 0.125     , 0.11111111])
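If the nan and inf values need to be located afterwards, np.isnan and np.isinf can help. The sketch below also uses np.errstate, one way to silence the division warnings only inside a block instead of globally. A shorter array is used to keep the output readable.

```python
import numpy as np

arr = np.arange(0, 5)

# errstate silences the divide-by-zero and invalid-value warnings
# only inside this block.
with np.errstate(divide="ignore", invalid="ignore"):
    ratio = arr / arr    # 0/0 -> nan
    inverse = 1 / arr    # 1/0 -> inf

print(np.isnan(ratio).tolist())    # [True, False, False, False, False]
print(np.isinf(inverse).tolist())  # [True, False, False, False, False]
```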

arr ** 3

array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729], dtype=int32)

arr + 4

array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13])

arr - 5

array([-5, -4, -3, -2, -1,  0,  1,  2,  3,  4])
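The same idea extends beyond scalars. As a sketch, a 1-D array can be combined with a 2-D array when its length matches the number of columns; NumPy then applies it to each row. The arrays below are illustrative, and the last part shows that shapes which cannot be aligned raise an error.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
row = np.array([10, 20, 30])

# row has shape (3,), matrix has shape (2, 3): row is added to each row.
print((matrix + row).tolist())  # [[11, 22, 33], [14, 25, 36]]

# Shapes that cannot be aligned raise a ValueError.
try:
    matrix + np.array([1, 2])
except ValueError as err:
    print("cannot broadcast:", err)
```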

Universal Array Functions#

Numpy comes with many Universal Array Functions, which are essentially mathematical operations that are applied to every element of an array. Here are a few common ones:

# Taking Square Roots
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

# Calculating exponential (e^x)
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

np.max(arr) #same as arr.max()

np.sin(arr)

array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

np.log(arr)

array([      -inf, 0.        , 0.69314718, 1.09861229, 1.38629436,
       1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458])
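Some of these functions take two inputs rather than one. The sketch below shows np.add and np.maximum working element by element, and checks that squaring the square roots recovers the original values up to rounding error.

```python
import numpy as np

arr = np.arange(0, 10)

# Functions with two inputs also work element by element.
print(np.array_equal(np.add(arr, arr), arr + arr))  # True
print(np.maximum(arr, 5).tolist())  # [5, 5, 5, 5, 5, 5, 6, 7, 8, 9]

# Squaring the square roots recovers the input, up to rounding error.
print(np.allclose(np.sqrt(arr) ** 2, arr))          # True
```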