What is NumPy
NumPy, which stands for Numerical Python, is an open-source library that lets users store large amounts of data using less memory and perform a wide range of operations (mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation, etc.) easily using homogeneous one-dimensional and multidimensional arrays.
The basic data structure of NumPy is an ndarray, which is similar to a list.
An array in NumPy is a data structure organized like a grid of rows and columns, containing values of the same data type that can be indexed and manipulated efficiently as the problem requires.
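As a rough illustration of the memory claim above, the sketch below compares the bytes used by a Python list of integers with those used by an equivalent NumPy array. The variable names are illustrative, and the exact numbers depend on the Python build and platform.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores the raw 8-byte integers in one contiguous buffer.
array_bytes = np_array.nbytes

print("list :", list_bytes, "bytes")
print("array:", array_bytes, "bytes")
```

The array figure covers only the data buffer; the ndarray object itself adds a small fixed overhead on top of it.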
Difference between NumPy and Python standard List
The three most important differences between NumPy arrays and standard Python sequences are:
Feature | NumPy array | Python sequences (list, tuple, range) |
---|---|---|
Size | Fixed at creation | A Python list can grow dynamically |
Datatype | Elements are of the same datatype | Elements can be of multiple datatypes |
Speed | Fast, as it's partially written in C | Slower compared to NumPy |
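The first two rows of the table can be checked with a short sketch. The variable names here are illustrative and not part of the original tutorial.

```python
import numpy as np

# Mixed input is coerced to one common dtype (here, float64).
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Arrays have a fixed size: np.append returns a new array
# and leaves the original untouched.
original = np.array([1, 2, 3])
extended = np.append(original, 4)
print(original.size, extended.size)  # 3 4

# A Python list, by contrast, grows in place and can mix types.
items = [1, 2, 3]
items.append("four")
print(items)  # [1, 2, 3, 'four']
```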
Why use NumPy: Computation time
A Python list can perform the same operations as a NumPy array, but NumPy arrays are faster and more convenient for large, complex computations.
Let's add two arrays of roughly 9 million elements each and compare the computation time.
import time
import numpy as np
# python standard list
list_A = [i for i in range(1,9000000)]
list_B = [j**2 for j in range(1,9000000)]
t0 = time.time()
sum_list = list(map(lambda x, y: x+y, list_A, list_B))
t1 = time.time()
list_time = t1 - t0
print ("Time taken by Python standard list is ",list_time)
# numpy array
array_A = np.arange(1, 9000000, dtype=np.int64)
array_B = np.arange(1, 9000000, dtype=np.int64) ** 2  # same values as list_B
t0 = time.time()
sum_numpy = array_A + array_B
t1 = time.time()
numpy_time = t1 - t0
print ("Time taken by NumPy array is ",numpy_time)
print("The ratio of time taken is {}".format(list_time//numpy_time))
Time taken by Python standard list is 0.6801159381866455
Time taken by NumPy array is 0.04106783866882324
The ratio of time taken is 16.0
Notice that NumPy is much faster than the list. The table below compares the computation speed of the standard Python list and NumPy for different operations.
Size of each matrix | Type of operation | Time taken by list | Time taken by numpy | Ratio (List Time / Numpy Time) |
---|---|---|---|---|
9 million | Addition (+) | 0.56s | 0.017s | 32.0 |
9 million | Subtraction (-) | 0.61s | 0.016s | 36.0 |
9 million | Multiplication (*) | 0.69s | 0.016s | 42.0 |
9 million | Division (/) | 0.51s | 0.022s | 23.0 |
From the above table, we can conclude that NumPy is much faster than the standard Python list for element-wise arithmetic. In real-world workloads, where datasets are larger and the operations more complex, the absolute time saved grows accordingly.
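A single time.time() measurement can be noisy. As a hedged alternative, the sketch below uses the standard library's timeit module and keeps the best of several repeats. The array size and repeat counts are arbitrary choices to keep the run short, not the values used for the table above.

```python
import timeit
import numpy as np

n = 100_000  # much smaller than 9 million, so the sketch runs quickly
list_a = list(range(n))
list_b = list(range(n))
array_a = np.arange(n)
array_b = np.arange(n)

def add_lists():
    # Element-wise addition with plain Python lists.
    return [x + y for x, y in zip(list_a, list_b)]

def add_arrays():
    # Element-wise addition with NumPy arrays.
    return array_a + array_b

# Each measurement runs the function 10 times; keep the fastest of 3 repeats.
list_time = min(timeit.repeat(add_lists, number=10, repeat=3))
numpy_time = min(timeit.repeat(add_arrays, number=10, repeat=3))

print(f"list: {list_time:.4f}s  NumPy: {numpy_time:.4f}s")
```

Taking the minimum of several repeats reduces the influence of background activity on the machine.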
Installing NumPy
To start working with NumPy, you need to install it; the instructions on the official NumPy website are the most reliable guide.
[Optional]: Follow this guide to install Python if you don't already have it installed. It's not required, but it's best to install Python packages inside a virtual environment to avoid version-related conflicts in the future.
Basics of NumPy
As a prerequisite, you will need to know beginner-level Python. See this Python tutorial to refresh your concepts.
Every NumPy array is an object of the ndarray class of the NumPy library.
Whenever you work with a dataset, the first step is to get an idea about the dataset array. Four important attributes of a NumPy array that describe the dataset are:
.ndim: returns the number (int) of dimensions (axes) of the array.
.shape: returns a tuple with the size of the array along each dimension; for a 2-D array with n rows and m columns this is (n, m).
.size: returns the total number (int) of elements in the array.
.dtype: returns an object of numpy.dtype that describes the type of elements in the array.
Below is a code snippet of the attributes described above, along with .itemsize (the size of one element in bytes) and .data (the memory buffer that holds the elements).
array = np.array([[1,2,3],[4,5,6]]) # Creating NumPy array from list
print("Dimension: ",array.ndim, type(array.ndim))
print("Shape: ",array.shape, type(array.shape))
print("Size: ",array.size, type(array.size))
print("Datatype: ",array.dtype, type(array.dtype))
print("Itemsize: ",array.itemsize, type(array.itemsize))
print("Data: ",array.data, type(array.data))
Dimension: 2 <class 'int'>
Shape: (2, 3) <class 'tuple'>
Size: 6 <class 'int'>
Datatype: int64 <class 'numpy.dtype[int64]'>
Itemsize: 8 <class 'int'>
Data: <memory at 0x7f2d807312b0> <class 'memoryview'>
Array Creation
A NumPy array is created by passing an array-like data structure, such as a Python list or tuple, to np.array().
Let's create a 0-D, 1-D, 2-D, and a 3-D array from a list.
0-D array:
np.array(11)
1-D array:
np.array([1, 2, 3, 4, 5])
2-D array:
np.array([[1, 2, 3], [4, 5, 6]])
3-D array:
np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
array_0D = np.array(11)
array_1D = np.array([1, 2, 3, 4, 5])
array_2D = np.array([[1, 2, 3], [4, 5, 6]])
array_3D = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(array_0D)
print(array_1D)
print(array_2D)
print(array_3D)
11
[1 2 3 4 5]
[[1 2 3]
 [4 5 6]]
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
Here are 7 common ways to create a NumPy array.
.array([1,2,3]): Returns an array created from a list.
.array((1.1,2.2,3.3)): Returns an array created from a tuple.
.zeros((2,3)): Returns an array filled with zeros (2 rows, 3 columns).
.ones((2,3)): Returns an array filled with ones (2 rows, 3 columns).
.empty((2,4)): Returns an uninitialized array of the given shape and type; its contents are arbitrary.
.arange(2,10,2): Returns evenly spaced values within a given range, excluding the stop value. Similar to Python's range().
.linspace(2,4,9): Returns 9 evenly spaced numbers between 2 and 4, inclusive.
array_list = np.array([1,2,3], dtype=int) # From List
array_tuple = np.array((1.1,2.2,3.3)) # From Tuple
array_zeroes = np.zeros((2,3)) # Array of zeroes: 2 rows and 3 columns
array_ones = np.ones((2,3)) # Array of ones: 2 rows and 3 columns
array_empty = np.empty((2,4)) # Uninitialized array: 2 rows and 4 columns
array_arange = np.arange(2,10,2) # Similar to python range()
array_linspace = np.linspace(2,4,9) # Array of 9 numbers between 2 and 4
Just like the dtype=int parameter, np.array accepts other parameters such as copy, order, subok, ndmin, and like. You can explore the other NumPy array parameters.
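As a brief sketch of how some of these np.array parameters behave, the example below uses dtype, ndmin, and order. The variable names are illustrative only.

```python
import numpy as np

# ndmin pads the shape with leading axes until the array has at least 2 dimensions.
row = np.array([1, 2, 3], ndmin=2)
print(row.shape)  # (1, 3)

# dtype converts the elements when the array is created.
as_float = np.array([1, 2, 3], dtype=float)
print(as_float.dtype)  # float64

# order selects the memory layout: 'C' (row-major) or 'F' (column-major).
column_major = np.array([[1, 2], [3, 4]], order="F")
print(column_major.flags["F_CONTIGUOUS"])  # True
```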
Let's practice some methods to create arrays
Tip: Use help to see the syntax when required
help(np.zeros)
>>> np.zeros((2, 1))
array([[ 0.],
       [ 0.]])
>>> s = (2,2)
>>> np.zeros(s)
array([[ 0.,  0.],
       [ 0.,  0.]])
>>> np.zeros((2,), dtype=[('x', 'i4'), ('y', 'i4')]) # custom dtype
array([(0, 0), (0, 0)],
      dtype=[('x', '<i4'), ('y', '<i4')])
Create a 1D array of ones.
arr = np.ones(9)
print(arr)
print(arr.dtype)
[1. 1. 1. 1. 1. 1. 1. 1. 1.]
float64
Notice that, by default, NumPy creates the array with data type float64. Let's provide dtype explicitly.
arr = np.ones(9, dtype=int)
print(arr)
print(arr.dtype)
[1 1 1 1 1 1 1 1 1]
int64
Create a 4x3 array of zeroes.
arr = np.zeros((4,3), dtype=int)
print(arr)
[[0 0 0]
 [0 0 0]
 [0 0 0]
 [0 0 0]]
Create an array of the integers strictly between 3 and 7.
arr = np.arange(4,7)
print(arr)
[4 5 6]
Create an array of integers from 5 to 20 with a step of 2
arr = np.arange(5,21,2)
print(arr)
[ 5 7 9 11 13 15 17 19]
Create an array of 10 random integers from 0 to 4.
arr = np.random.randint(5,size=10)
print(arr)
[3 2 2 0 4 0 1 3 2 0]
Create an array of 10 random integers strictly between 6 and 9 (that is, 7 or 8).
arr = np.random.randint(7,9,size=10)
print(arr)
[8 8 7 7 8 8 8 7 7 7]
Create a 2x3 2D array of random numbers.
arr = np.random.random([2,3])
print(arr)
[[0.9664729 0.33623868 0.52633769]
[0.80454667 0.68146984 0.08063325]]
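The calls above use the legacy np.random functions, which give different values on every run. As an alternative sketch, NumPy's Generator interface (np.random.default_rng) produces the same kinds of arrays and can be seeded for reproducible results. The seed value 42 is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # any fixed seed makes the results repeatable

# Integers from 7 up to, but not including, 9.
ints = rng.integers(7, 9, size=10)

# Floats in the half-open interval [0, 1), arranged as 2 rows and 3 columns.
floats = rng.random((2, 3))

print(ints)
print(floats)
```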
Create an array of 10 evenly spaced numbers between 1.5 and 2, inclusive.
arr = np.linspace(1.5,2,10)
print(arr)
[1.5 1.55555556 1.61111111 1.66666667 1.72222222 1.77777778
1.83333333 1.88888889 1.94444444 2. ]
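Because linspace includes both endpoints by default, the spacing between its values is (stop - start) / (num - 1), whereas arange stops before the stop value. The sketch below checks both behaviours; the variable names are illustrative.

```python
import numpy as np

arr = np.linspace(1.5, 2, 10)

# With both endpoints included, 10 points span 9 equal gaps.
step = (2 - 1.5) / (10 - 1)
gaps_match = bool(np.allclose(np.diff(arr), step))
print(step, gaps_match)

# arange excludes the stop value, like Python's range().
last_arange = int(np.arange(5, 21, 2)[-1])
print(last_arange)  # 19
```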
That's all for the basic ways of creating arrays. You can also explore these 4 other ways to create arrays:
.full(): Create a constant array filled with any number n
.tile(): Create a new array by repeating an existing array for a particular number of times
.eye(): Create an identity matrix of any dimension
.random.randint(): Create a random array of integers within a particular range
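A minimal sketch of the first three functions in this list follows. The shapes and fill values are arbitrary examples.

```python
import numpy as np

# 2x3 array where every element is 7.
constant = np.full((2, 3), 7)

# Repeat the sequence [1, 2] three times.
repeated = np.tile([1, 2], 3)

# 3x3 identity matrix with integer entries.
identity = np.eye(3, dtype=int)

print(constant)
print(repeated)
print(identity)
```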
Basic Operations
NumPy can perform a variety of operations; the most basic are addition, subtraction, and multiplication. Below are a few basic operations that can be done in NumPy without using loops.
Create a NumPy array to store the marks of 5 students.
marks = [1, 2, 3, 4, 5]
marks_np = np.array(marks)
print(marks_np)
[1 2 3 4 5]
Add marks of 5 subjects of two different students.
marks_A = [10,20,10,20,14]
marks_B = [23,12,43,12,43]
marks_np_A = np.array(marks_A)
marks_np_B = np.array(marks_B)
total = marks_np_A + marks_np_B # Add using + operator
print(total)
[33 32 53 32 57]
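It may help to contrast this with plain lists, where the + operator concatenates instead of adding element by element. The sketch below reuses the marks from the example above.

```python
import numpy as np

marks_A = [10, 20, 10, 20, 14]
marks_B = [23, 12, 43, 12, 43]

# With lists, + joins the two sequences into one list of 10 items.
concatenated = marks_A + marks_B
print(len(concatenated))  # 10

# With arrays, + adds matching positions.
elementwise = np.array(marks_A) + np.array(marks_B)
print(elementwise.tolist())  # [33, 32, 53, 32, 57]
```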
Convert the weights of 5 students from kilograms to grams.
weight = [45, 55, 53, 63, 60] # In kg
weight_np = np.array(weight)
weight_in_gram = weight_np * 1000 # 1 kg = 1000 g
print(weight_in_gram)
[45000 55000 53000 63000 60000]
Calculate the BMI of 5 students. To calculate BMI, we need:
Two arrays, one of heights and one of weights
The formula weight_in_kg / (height_in_m ** 2)
heights_in_inch = [71,72,73,74,75]
weights_in_lbs = [195, 180, 250, 230, 200]
First, let's convert height from inches to meters and weight from lbs to kg.
height_in_m = np.array(heights_in_inch) * 0.0254
weight_in_kg = np.array(weights_in_lbs) * 0.453592
Now that we have converted the arrays into the right units, let's calculate BMI.
BMI = weight_in_kg / (height_in_m ** 2)
print("BMI",BMI)
BMI [27.19667848 24.41211827 32.98315848 29.52992539 24.99800911]
Here is a list of 5 common basic methods of a NumPy ndarray:
.sum: returns the sum of elements over a given axis.
.min: returns the minimum along a given axis.
.max: returns the maximum along a given axis.
.cumsum: returns the cumulative sum of elements along a given axis.
.mean: returns the average of elements along a given axis.
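The effect of the axis argument is easiest to see on a small 2-D array. The sketch below uses a hypothetical marks table with 2 students (rows) and 3 subjects (columns); axis=0 works down the columns and axis=1 works across the rows.

```python
import numpy as np

# Hypothetical data: 2 students (rows) x 3 subjects (columns).
marks = np.array([[10, 20, 30],
                  [40, 50, 60]])

total = marks.sum()                  # all elements: 210
per_subject = marks.sum(axis=0)      # down each column: 50, 70, 90
per_student = marks.sum(axis=1)      # across each row: 60, 150
lowest = marks.min(axis=1)           # lowest mark of each student: 10, 40
highest = marks.max(axis=0)          # highest mark in each subject: 40, 50, 60
running = marks.cumsum(axis=1)       # running total across each row
average = marks.mean(axis=0)         # mean of each subject: 25.0, 35.0, 45.0

print(total, per_subject, per_student)
print(running)
```

Calling a method without axis, as in marks.sum(), reduces over the whole array.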
NumPy also provides universal functions (ufuncs) such as sin, cos, and exp, which operate element by element on an array.
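A short sketch of these universal functions applied to whole arrays, with no explicit loop. The input values are arbitrary.

```python
import numpy as np

angles = np.array([0, np.pi / 2, np.pi])

sines = np.sin(angles)      # approximately 0, 1, 0
cosines = np.cos(angles)    # approximately 1, 0, -1
powers = np.exp(np.array([0, 1, 2]))  # e**0, e**1, e**2

# Round to hide tiny floating-point residue such as 1.2e-16.
print(np.round(sines, 3))
print(np.round(cosines, 3))
print(powers)
```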
Indexing, Slicing, and Iterating
bmi_first_element = BMI[0] # first element
bmi_last_element = BMI[-1] # last element
bmi_first_five_elements = BMI[0:5] # first five elements
bmi_last_five_elements = BMI[-5:] # last five elements
Filter BMI array where BMI > 23
# Conditional Filter
BMI_filtered = BMI[BMI > 23]
print(BMI_filtered)
[27.19667848 24.41211827 32.98315848 29.52992539 24.99800911]
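The same ideas extend to 2-D arrays, and arrays can also be iterated over. The sketch below uses a small hypothetical grid to show element access, column slicing, a boolean mask that removes some values, and two ways of iterating.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

element = grid[0, 1]          # row 0, column 1 -> 2
second_column = grid[:, 1]    # every row, column 1 -> 2 and 5
mask = grid > 3               # boolean array with the same shape as grid
filtered = grid[mask]         # 1-D array of the values that passed -> 4, 5, 6

# Iterating over a 2-D array yields one row at a time.
row_sums = [int(row.sum()) for row in grid]

# .flat iterates over every element regardless of shape.
all_values = [int(v) for v in grid.flat]

print(element, second_column, filtered)
print(row_sums, all_values)
```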
Now you know the basics to work with a NumPy array and you should be able to create arrays and perform operations on them.