## Overview

Here are some notes I took while watching Keith Galli’s video providing an introduction to NumPy.

Colab Notebook

## What is NumPy

• A multi-dimensional array library

### How Are Lists Different from NumPy Arrays?

• Lists are very slow
  • Lists are dynamically typed
    • Lists need to store a lot more information to account for unfixed data types
    • A list needs to keep track of the following information for a single integer
      • Size: 4 bytes
      • Reference Count: 8 bytes
      • Object Type: 8 bytes
      • Object Value: 8 bytes
  • Lists do not use contiguous memory
    • Different list elements are scattered in different parts of memory
• NumPy is very fast
  • NumPy uses fixed types
    • Don’t need to do type checking
    • Default integer type is platform-dependent: int64 (8 bytes) on most 64-bit Linux and macOS systems, int32 (4 bytes) on Windows before NumPy 2.0
    • Faster to read fewer bytes of memory
    • Can specify smaller data types (e.g. Int16, Int8)
  • Uses contiguous memory
    • Data for an array is in the same chunk of memory
      • Faster to access
    • Can leverage SIMD Vector Processing
      • Single Instruction, Multiple Data
      • Can perform the same operation on multiple elements simultaneously
    • Effective CPU cache utilization
• Lots more functionality
  • Example: element-wise array multiplication arrayA*arrayB
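
The points above can be checked with a small sketch. The value printed by `sys.getsizeof` is a CPython implementation detail and varies by platform, so treat it as illustrative rather than exact. The variable names here are made up for the example.

```python
import sys
import numpy as np

# A Python list stores references to full Python int objects
py_list = [1, 2, 3, 4]
# A NumPy array stores raw fixed-size values in one contiguous buffer
np_array = np.array([1, 2, 3, 4], dtype='int32')

# Size of a single Python int object (implementation-dependent)
print(f'Python int object: {sys.getsizeof(py_list[0])} bytes')
# Size of a single array element (4 bytes for int32)
print(f'NumPy int32 element: {np_array.itemsize} bytes')

# Element-wise multiplication needs an explicit loop for lists...
list_product = [x * y for x, y in zip(py_list, py_list)]
# ...but is a single vectorized expression for arrays
array_product = np_array * np_array

print(list_product)
print(array_product)
```

Note that `py_list * py_list` raises a `TypeError`, which is why the list version needs the comprehension.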

## Applications of NumPy

• MATLAB replacement
• SciPy has even more mathematical capability
• Plotting (Matplotlib)
• Backend (Pandas, Digital Photography)
• Machine Learning (Tensors)

## Install NumPy

• pip install numpy
• conda install numpy

## Import NumPy

import numpy as np


## The Basics

Initialize a 1D array

# Initialize a 1D array
a = np.array([1,2,3])
a

array([1, 2, 3])


Initialize a 2D array of floats

# Initialize a 2D array of floats
b = np.array([[9.0,8.0,7.0],[6.0,5.0,4.0]])
b

array([[9., 8., 7.],
[6., 5., 4.]])


Get Dimension

# Get Dimension
a.ndim

1


Get Shape

# Get Shape
b.shape

(2, 3)


Get Type

# Get Type
a.dtype

dtype('int64')


Specify data type

# Specify data type
a = np.array([1,2,3], dtype='int16')
a.dtype

dtype('int16')


Get Size

# Get Size: the number of bytes per array element
a.itemsize

2 (for int16)


Get total size

# Get total size: number of elements times the number of bytes per element
a.size * a.itemsize
# or
a.nbytes

6 (for 3 int16 elements)
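
To see how the choice of data type affects memory use, here is a minimal sketch that stores the same 100 values (an arbitrary count) in several integer types.

```python
import numpy as np

values = np.arange(100)  # 0..99 fits in every dtype used below

for dtype in ('int8', 'int16', 'int32', 'int64'):
    arr = values.astype(dtype)
    # nbytes is always size * itemsize
    print(f'{dtype}: itemsize={arr.itemsize}, nbytes={arr.nbytes}')
```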


## Accessing and Changing Arrays

Indexing - NumPy v1.13 Manual

a = np.array([[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]])
print(f'Values: {a}')
print(f'Shape: {a.shape}')

# Get a specific element [r, c]
a[1, 5]

Values: [[ 1  2  3  4  5  6  7]
[ 8  9 10 11 12 13 14]]
Shape: (2, 7)
13


Get a specific row

# Get a specific row
a[0, :]

array([1, 2, 3, 4, 5, 6, 7])


Get a specific column

# Get a specific column
a[:, 2]

array([ 3, 10])

# Getting a little more fancy [startindex:endindex:stepsize]
a[0, 1:6:2]
# or
a[0, 1:-1:2]

array([2, 4, 6])


Change elements

# Change elements
a[1,5] = 20
a

array([[ 1,  2,  3,  4,  5,  6,  7],
[ 8,  9, 10, 11, 12, 20, 14]])


Change column index 2

# Change column index 2
a[:, 2] = 5
a

array([[ 1,  2,  5,  4,  5,  6,  7],
[ 8,  9,  5, 11, 12, 20, 14]])


Change column with two different numbers

# Change column with two different numbers
# Needs to be the same shape as the part you want to modify
# Two elements in each column means a length of 2
a[:, 2] = [1,2]
a

array([[ 1,  2,  1,  4,  5,  6,  7],
[ 8,  9,  2, 11, 12, 20, 14]])
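
The shape requirement can be illustrated with a short sketch. It shows that a single value is broadcast to the whole column, that a two-element list fits, and that a three-element list is rejected. The `mismatch_rejected` flag is only there for illustration.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14]])

# A single value is broadcast to every selected element
a[:, 2] = 5

# A sequence must match the selected shape: the column has 2 elements
a[:, 2] = [1, 2]

# Three values cannot fit into a 2-element column
mismatch_rejected = False
try:
    a[:, 2] = [1, 2, 3]
except ValueError as err:
    mismatch_rejected = True
    print(f'ValueError: {err}')

print(a)
```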


3D Example

# 3D Example
b = np.array([[[1,2],[3,4]], [[5,6],[7,8]]])
b

array([[[1, 2],
[3, 4]],

[[5, 6],
[7, 8]]])


Get specific element

# Get specific element (work outside in)
# [first_dim, second_dim, third_dim]
b[0, 1, 1]

4


Get a slice across dimensions

# Get row index 1 of every 2D block (work outside in)
b[:,1,:]

array([[3, 4],
[7, 8]])


Replace values

# Replace
# New value needs to be the same dimensions as what is being replaced
b[:,1,:] = [[9,9],[8,8]]
b

array([[[1, 2],
[9, 9]],

[[5, 6],
[8, 8]]])


## Initialize Different Types of Arrays

Array creation routines - NumPy v1.21 Manual

All 0s matrix

# All 0s matrix
print(f'1D: {np.zeros(5)}')
print(f'2D: {np.zeros((2,3))}')
print(f'3D: {np.zeros((2,3,4))}')

1D: [0. 0. 0. 0. 0.]
2D: [[0. 0. 0.]
[0. 0. 0.]]
3D: [[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]

[[0. 0. 0. 0.]
[0. 0. 0. 0.]
[0. 0. 0. 0.]]]


All 1s matrix

# All 1s matrix
print(f'1D: {np.ones(5)}')
print(f'2D: {np.ones((2,3))}')
print(f'3D: {np.ones((2,3,4))}')

1D: [1. 1. 1. 1. 1.]
2D: [[1. 1. 1.]
[1. 1. 1.]]
3D: [[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]]

[[1. 1. 1. 1.]
[1. 1. 1. 1.]
[1. 1. 1. 1.]]]


Any number

# Any other number
np.full((2,2), 99)

array([[99, 99],
[99, 99]])


Any other number with the same shape as another array

# Any other number (full_like)
# Use the same shape as the provided array
np.full_like(a, 55)

array([[55, 55, 55, 55, 55, 55, 55],
[55, 55, 55, 55, 55, 55, 55]])


Random decimal numbers between 0 and 1

# Random decimal numbers between 0 and 1
# shape of (4,2)
np.random.rand(4,2)

array([[0.90796667, 0.18775268],
[0.36853663, 0.82186396],
[0.75724737, 0.09608278],
[0.5953758 , 0.57110868]])


Random decimal number from shape

# Random decimal number from shape
np.random.random_sample(a.shape)

array([[0.96539982, 0.72943229, 0.10863575, 0.84796304, 0.09610215,
0.88132328, 0.56848496],
[0.27198747, 0.2295634 , 0.40931032, 0.99669531, 0.90768254,
0.1626064 , 0.80310083]])


Random integer values

# Random integer values
# Max value (exclusive) and shape
np.random.randint(7, size=(3,3))

array([[6, 2, 0],
[4, 2, 4],
[4, 0, 4]])


Random integer values in a range

# Random integer values
# Range of values (low inclusive, high exclusive) and shape
np.random.randint(4,7, size=(3,3))

array([[5, 4, 6],
[4, 5, 6],
[4, 6, 6]])
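
The random outputs above change on every run. As a hedged aside that is not from the video, NumPy also provides a `Generator` interface through `np.random.default_rng`, which can be seeded for repeatable results. The seed value 42 below is arbitrary.

```python
import numpy as np

# Any fixed integer seed gives repeatable output
rng = np.random.default_rng(42)

floats = rng.random((4, 2))             # floats in [0, 1)
ints = rng.integers(4, 7, size=(3, 3))  # integers in [4, 7): 4, 5, or 6

print(floats)
print(ints)
```

Creating a second generator with the same seed reproduces the same sequence of values.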


Identity matrix

# Identity matrix
np.identity(3)

array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])


Repeat array

# Repeat array
arr = np.array([1,2,3])
# Repeat arr 3 times element-wise
r1 = np.repeat(arr,3)
r1

array([1, 1, 1, 2, 2, 2, 3, 3, 3])


Repeat 2D array

# Repeat 2D array
arr = np.array([[1,2,3]])
# Repeat each row of arr 3 times along axis 0
r1 = np.repeat(arr,3, axis=0)
r1

array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])


Recreate this array

[1, 1, 1, 1, 1]
[1, 0, 0, 0, 1]
[1, 0, 9, 0, 1]
[1, 0, 0, 0, 1]
[1, 1, 1, 1, 1]

c = np.ones((5,5), dtype='int32')
c[1:-1, 1:-1] = 0
c[2,2] = 9
c

array([[1, 1, 1, 1, 1],
[1, 0, 0, 0, 1],
[1, 0, 9, 0, 1],
[1, 0, 0, 0, 1],
[1, 1, 1, 1, 1]], dtype=int32)


Be careful when copying arrays!!!

# Assignment does not copy: b is another name for the same array
a = np.array([1,2,3])
b = a
b[0] = 100
a

array([100,   2,   3])

# Independent copy: changes to b do not affect a
a = np.array([1,2,3])
b = a.copy()
b[0] = 100
a

array([1, 2, 3])
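
Whether two arrays share the same underlying data can be checked with `np.shares_memory`. The sketch below compares plain assignment, a basic slice (which returns a view), and `copy()`. The names `alias`, `view`, and `independent` are made up for the example.

```python
import numpy as np

a = np.array([1, 2, 3])

alias = a               # same object, no data copied
view = a[:2]            # basic slicing returns a view onto the same buffer
independent = a.copy()  # new buffer with its own data

print(alias is a)                        # True
print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False

# Writing through the view changes a, but not the independent copy
view[0] = 100
print(a)
print(independent)
```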


## Mathematics

Mathematical functions - NumPy v1.21 Manual

a = np.array([1,2,3,4])
a

array([1, 2, 3, 4])


Add

a + 2

array([3, 4, 5, 6])


Subtract

a - 2

array([-1,  0,  1,  2])


Multiply

a * 2

array([2, 4, 6, 8])


Divide

a / 2

array([0.5, 1. , 1.5, 2. ])


Shorthand

a += 2
a

array([3, 4, 5, 6])


Add two arrays

b = np.array([1,0,1,0])
a + b

array([4, 4, 6, 6])


Exponents

a ** 2

array([ 9, 16, 25, 36])


Sine

# Take the sine
np.sin(a)

array([ 0.14112001, -0.7568025 , -0.95892427, -0.2794155 ])


Cosine

# Take the cosine
np.cos(a)

array([-0.9899925 , -0.65364362,  0.28366219,  0.96017029])


### Linear Algebra

Linear algebra (numpy.linalg) - NumPy v1.21 Manual

a = np.ones((2,3))
a

array([[1., 1., 1.],
[1., 1., 1.]])

b = np.full((3,2),2)
b

array([[2, 2],
[2, 2],
[2, 2]])


Matrix multiplication

# Matrix multiplication
np.matmul(a,b)

array([[6., 6.],
[6., 6.]])
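
As a small sketch of the shape rule behind matrix multiplication: the inner dimensions must match, and the result takes the outer dimensions. The example also uses the `@` operator, which gives the same result as `np.matmul` for these 2D arrays.

```python
import numpy as np

a = np.ones((2, 3))
b = np.full((3, 2), 2)

# (2, 3) @ (3, 2) -> (2, 2)
product = a @ b
print(product.shape)

# Inner dimensions must match: (2, 3) @ (2, 3) is not defined
shape_error = False
try:
    a @ a
except ValueError as err:
    shape_error = True
    print(f'ValueError: {err}')

# a * a, by contrast, multiplies element by element and keeps the shape
print((a * a).shape)
```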


Find the determinant

# Find the determinant
c = np.identity(3)
np.linalg.det(c)

1.0


### Statistics

stats = np.array([[1,2,3],[4,5,6]])
stats

array([[1, 2, 3],
[4, 5, 6]])


Get lowest value in array

# Get lowest value in array
np.min(stats)

1


Get lowest value in each column

# Get lowest value in array along specific axis
# axis=0: min values in each column
np.min(stats, axis=0)

array([1, 2, 3])


Get lowest value in each row

# Get lowest value in array along specific axis
# axis=1: min values in each row
np.min(stats, axis=1)

array([1, 4])


Get highest value in array

# Get highest value in array
np.max(stats)

6


Sum up values in array

# Sum up values in array
np.sum(stats)

21


Sum up values in each column

# Sum up values in array across axis
# axis=0: sum values in each column
np.sum(stats, axis=0)

array([5, 7, 9])


Sum up values in each row

# Sum up values in array across axis
# axis=1: sum values in each row
np.sum(stats, axis=1)

array([ 6, 15])
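
The `axis` argument works the same way for other reductions. Here is a minimal sketch using `np.mean` that also prints the result shapes, to show which dimension is collapsed.

```python
import numpy as np

stats = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# axis=0 collapses the rows, leaving one result per column
col_means = np.mean(stats, axis=0)
# axis=1 collapses the columns, leaving one result per row
row_means = np.mean(stats, axis=1)

print(col_means, col_means.shape)
print(row_means, row_means.shape)
```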


## Reorganizing Arrays

Note: New shape needs to maintain the same number of values

before = np.array([[1,2,3,4],[5,6,7,8]])
print(before)
print(f'Shape: {before.shape}')

[[1 2 3 4]
[5 6 7 8]]
Shape: (2, 4)


Reshape from (2,4) to (8,1)

# Reshape array
# Reshape from (2,4) to (8,1)
after = before.reshape((8,1))
after

array([[1],
[2],
[3],
[4],
[5],
[6],
[7],
[8]])


Reshape from (2,4) to (4,2)

# Reshape array
# Reshape from (2,4) to (4,2)
after = before.reshape((4,2))
after

array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])


Reshape from (2,4) to (2,2,2)

# Reshape array
# Reshape from (2,4) to (2,2,2)
after = before.reshape((2,2,2))
after

array([[[1, 2],
[3, 4]],

[[5, 6],
[7, 8]]])
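
The rule that the new shape must hold the same number of values can be sketched as follows. Passing `-1` for one dimension lets NumPy infer it from the total, while a shape with the wrong element count raises a `ValueError`.

```python
import numpy as np

before = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# -1 asks NumPy to infer that dimension from the total element count
inferred = before.reshape((4, -1))
print(inferred.shape)

# 8 values cannot fill a (3, 3) array
reshape_rejected = False
try:
    before.reshape((3, 3))
except ValueError as err:
    reshape_rejected = True
    print(f'ValueError: {err}')
```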


Vertically stacking vectors

# Vertically stacking vectors
v1 = np.array([1,2,3,4])
v2 = np.array([5,6,7,8])

np.vstack([v1,v2])

array([[1, 2, 3, 4],
[5, 6, 7, 8]])


Stack vectors multiple times

# Stack vectors multiple times
np.vstack([v1,v2,v2,v1])

array([[1, 2, 3, 4],
[5, 6, 7, 8],
[5, 6, 7, 8],
[1, 2, 3, 4]])


Horizontal stacks

# Horizontal stacks
np.hstack([v1, v2])

array([1, 2, 3, 4, 5, 6, 7, 8])


Combining Horizontal and Vertical Stacks

np.hstack([np.vstack([v1,v2,v2,v1]), np.vstack([v1,v2,v2,v1])])

array([[1, 2, 3, 4, 1, 2, 3, 4],
[5, 6, 7, 8, 5, 6, 7, 8],
[5, 6, 7, 8, 5, 6, 7, 8],
[1, 2, 3, 4, 1, 2, 3, 4]])
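
For comparison, the same stacks can be written with `np.concatenate`. This is only a sketch to show what `vstack` and `hstack` do with 1D vectors, not a recommendation to replace them.

```python
import numpy as np

v1 = np.array([1, 2, 3, 4])
v2 = np.array([5, 6, 7, 8])

# vstack treats each 1D vector as a row before stacking
stacked = np.vstack([v1, v2])

# With concatenate, the rows have to be made explicit first
rows = np.concatenate([v1.reshape(1, -1), v2.reshape(1, -1)], axis=0)

# For 1D inputs, hstack matches a plain concatenate
flat = np.concatenate([v1, v2])

print(np.array_equal(stacked, rows))
print(np.array_equal(flat, np.hstack([v1, v2])))
```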


## Miscellaneous

# Load data from text file
# Pass in file path and the delimiter character that separates values
# Casts values to float
filedata = np.genfromtxt('data.txt', delimiter=',')


Cast array values to specific type

# Cast array values to specific type
filedata = filedata.astype('int32')
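
The loading and casting steps can be tried without a file on disk. In this sketch, an in-memory `io.StringIO` buffer with made-up contents stands in for `data.txt`.

```python
import io
import numpy as np

# Hypothetical file contents; a real call would pass a path such as 'data.txt'
text = io.StringIO('1,2,3\n4,5,6\n')

# Values are parsed as floats by default
filedata = np.genfromtxt(text, delimiter=',')
print(filedata.dtype)

# Cast to 32-bit integers
filedata = filedata.astype('int32')
print(filedata)
```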


stats = np.array([[10,2,3],[-4,5,6]])
stats

array([[10,  2,  3],
[-4,  5,  6]])


Boolean mask for values greater than 3

# Boolean mask for values greater than 3
stats > 3

array([[ True, False, False],
[False,  True,  True]])


Index array using a boolean mask

# Index array using a boolean mask
stats[stats > 3]

array([10,  5,  6])


Index with a list

# Index with a list
a = np.array([1,2,3,4,5,6,7,8,9])
# List of indices
a[[1,2,8]]

array([2, 3, 9])


Check if any values in the array satisfy a condition

# Check if any values in the array satisfy a condition
np.any(a > 3, axis=0)

True


Check if all values in the array satisfy a condition

# Check if all values in the array satisfy a condition
np.all(a > 3, axis=0)

False


Use multiple conditions

# Use multiple conditions
((a > 3) & (a < 7))

array([False, False, False,  True,  True,  True, False, False, False])


Use multiple conditions with negation

# Use multiple conditions with negation
(~((a > 3) & (a < 7)))

array([ True,  True,  True, False, False, False,  True,  True,  True])
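
Masks can be combined with `|` (element-wise OR) as well as `&`, and a mask can be used on the left side of an assignment. This is a minimal sketch, and the names `outside` and `clipped` are made up for the example.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Parentheses are required around each comparison
outside = (a <= 3) | (a >= 7)
print(a[outside])

# A mask can also select the elements to overwrite
clipped = a.copy()
clipped[clipped > 6] = 6
print(clipped)
```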


Test Array

test_array = np.arange(36).reshape(6, -1)
test_array

array([[ 0,  1,  2,  3,  4,  5],
[ 6,  7,  8,  9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])


Range of indices

# 2:4: range of rows to index
# 0:2: range of columns to index
test_array[2:4, 0:2]

array([[12, 13],
[18, 19]])


List of indices

# [0,1,2,3,4]: list of row indices
# [1,2,3,4,5]: list of column indices, paired one-to-one with the rows
test_array[[0,1,2,3,4], [1,2,3,4,5]]

array([ 1,  8, 15, 22, 29])


Combine range and list of indices

# [0,4,5]: The list of rows
# Columns 3 and later
test_array[[0,4,5], 3:]

array([[ 3,  4,  5],
[27, 28, 29],
[33, 34, 35]])
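
One detail worth sketching: two index lists are paired element by element and do not select a rectangular block. To get every combination of the listed rows and columns, `np.ix_` can be used.

```python
import numpy as np

test_array = np.arange(36).reshape(6, -1)

# Two index lists are paired up: elements (0, 1) and (4, 5)
pairs = test_array[[0, 4], [1, 5]]
print(pairs)

# np.ix_ builds the cross product: rows 0 and 4, columns 1 and 5
block = test_array[np.ix_([0, 4], [1, 5])]
print(block)
```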

