NumPy For Machine Learning & Data Science

Machine learning and data science rely heavily on mathematical operations, and many Python libraries offer support for mathematical and scientific computing.

One of them is NumPy, which you may have heard of. NumPy is a library that allows us to work with multidimensional arrays and matrices and provides us with high-level mathematical functions.

Because NumPy is a C-based math and scientific library, it also performs well and uses memory efficiently.

1 – NumPy Array – Fast & Functional

We have mentioned how good NumPy is in terms of speed and performance. In this section, we will look at how to use arrays in NumPy.

import numpy as np

# Basic List - Slow
x = [1 , 2 , 3]

# NumPy Array - Fast
np.array(x)

NumPy arrays also come with their own functions, and every data scientist should know how to use them. Let’s look at the performance difference compared with a list.

We will compare performance as follows: we load the same data into both data types, raise each element to its own power, and assign the result to a variable.

# Libraries
import numpy as np
import time

x = np.arange(0 , 10000)
y = list(range(0 , 10000))

# Take Start Time
t1 = time.time()

for i in x:
  var = i ** i

# Take End Time
t2 = time.time()

# Find Work Time
print(t2 - t1)

# Take Start Time
t1 = time.time()

for j in y:
  var = j ** j

# Take End Time
t2 = time.time()

# Find Work Time
print(t2 - t1)
NumPy Array: 0.0036
List: 5.0438
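
The loops above visit one element at a time, which is not how NumPy is usually used. As a rough sketch (the names `arr`, `lst`, and the squaring operation are illustrative assumptions, not part of the benchmark above), the same kind of work can be written as a single vectorized expression and compared with a list comprehension. Timings depend on your machine, so none are shown here.

```python
import time
import numpy as np

n = 100_000
arr = np.arange(n, dtype=np.int64)  # NumPy array with 64-bit integers
lst = list(range(n))                # plain Python list

# Vectorized: one expression squares every element in compiled code
t1 = time.perf_counter()
arr_sq = arr ** 2
t2 = time.perf_counter()
print("NumPy vectorized:", t2 - t1)

# List comprehension: the interpreter squares the elements one by one
t1 = time.perf_counter()
lst_sq = [v ** 2 for v in lst]
t2 = time.perf_counter()
print("List comprehension:", t2 - t1)

# Both approaches produce the same values
print(arr_sq.tolist() == lst_sq)
```

Squaring is used instead of raising each element to its own power so that the results fit in a 64-bit integer and both versions compute the same numbers.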

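Now that we have seen the performance difference, let’s move on to other features. Keep in mind that part of this gap comes from NumPy using fixed-width integers, which overflow for large powers, while Python integers grow without limit. The next important feature is the shape of arrays and reshaping.

2 – Shape of Arrays & Reshaping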

Arrays are structures that store data in sequence, and we can learn their shape with the shape attribute that NumPy provides. If you are not familiar with data types in Python, it is worth reviewing them first.

When an array has more than one dimension, it forms a matrix. A NumPy array holds values of a single data type. Tabular structures whose columns hold different data types are called data frames (for example, in pandas).

x = [np.arange(0 , 5) , np.arange(0 , 5)]
x = np.array(x)

print(x)
[0,1,2,3,4],
[0,1,2,3,4]

This is a 2-dimensional array. Each time you nest a list inside another list, the resulting array gains one more dimension.
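
As a small sketch of how to inspect an array like this one, the attributes `shape`, `ndim`, and `size` report its dimensions. The array is rebuilt here so the example runs on its own.

```python
import numpy as np

x = np.array([np.arange(0, 5), np.arange(0, 5)])

print(x.shape)  # (2, 5): 2 rows, 5 columns
print(x.ndim)   # 2: number of dimensions
print(x.size)   # 10: total number of elements
```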

Arrays can be reshaped: you can convert a 2D array to 1D, a 1D array to 2D, or a 1D or 2D array to 3D, provided the total number of elements stays the same.

x = np.arange(1 , 7)

x.reshape(2 , 3)
[1 , 2 , 3]
[4 , 5 , 6]

The first argument is the number of rows and the second is the number of columns; their product must equal the number of elements in the array. You can manipulate shapes with this method.
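
The sketch below tries a few more target shapes on the same six elements. The variable names are illustrative; `-1` asks NumPy to infer that dimension from the others, and a shape that does not match the element count raises a `ValueError`.

```python
import numpy as np

x = np.arange(1, 7)          # 6 elements: 1 to 6

a = x.reshape(3, 2)          # 3 rows, 2 columns
b = x.reshape(2, -1)         # -1 lets NumPy infer the column count (3)
c = x.reshape(1, 2, 3)       # a 3D array with the same 6 elements
d = a.reshape(-1)            # back to a 1D array

print(a.shape, b.shape, c.shape, d.shape)

# A shape whose product is not 6 is rejected
try:
    x.reshape(4, 2)
except ValueError as err:
    print("reshape failed:", err)
```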

3 – Array Indexing

A data scientist should pull out and examine the data in order to detect problems such as null values and to clean the data.

Array indexing not only helps you fix issues such as dummy variables but also gives you a preview of the dataset.

# Creating 1D Array
x = np.arange(0 , 100)

# Indexing 1D Array
x[12]
x[0:5]
Output: 12
Output2: 0 1 2 3 4

Indexing works differently for multidimensional arrays: you need one index per dimension.

In a one-dimensional array a single index number is enough, but in a 2-dimensional array we must give both the row and the column.

# Creating 2D Array
x = [range(0 , 100) , range(0 , 100)]
x = np.array(x)

# Indexing 2D Array
x[0 , 12]
x[1 , 6:9]
Output: 12
Output2: 6,7,8

In the first example, the element at column index 12 of the first row (row 0) is taken, while in the second example, the elements at column indexes 6 to 8 of the second row (row 1) are taken.
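
To make the row and column roles easier to see, here is a short sketch in which the two rows hold different values (an assumption made for illustration; the example above uses identical rows). A bare `:` selects everything along that dimension.

```python
import numpy as np

x = np.array([range(0, 100), range(100, 200)])  # shape (2, 100)

single = x[1, 12]      # row 1, column 12
row = x[0]             # the whole first row
column = x[:, 12]      # column 12 from every row
block = x[:, 6:9]      # columns 6, 7, 8 from every row

print(single)          # 112
print(column)          # the values 12 and 112
print(block.shape)     # (2, 3)
```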

4 – Array Slicing

Slicing arrays, lists, and data frames is an extremely important concept for machine learning. It is great for leaving useless data behind and focusing only on the data you need.

Slicing can also be done on data frames, but since this article is about NumPy, we will only slice NumPy arrays.

data = [range(0 , 10) , range(0 , 10)]
data = np.array(data)

sliced = data[0 , 5:8]
sliced
Output: [5 , 6 , 7]

You can use the same slicing method to leave out irrelevant columns or rows in large data sets. Slicing can be thought of as selecting a range of indexes and assigning the result to another variable.
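
One detail worth knowing is that a basic slice is a view of the original array, not a copy. The sketch below (variable names are illustrative) shows the difference and also uses `np.delete`, which returns a new array with the chosen column removed.

```python
import numpy as np

data = np.array([range(0, 10), range(0, 10)])

view = data[0, 5:8]          # basic slicing returns a view
copy = data[0, 5:8].copy()   # an independent copy

data[0, 5] = 99              # change the original array
print(view[0])               # 99: the view sees the change
print(copy[0])               # 5: the copy does not

# Drop column 0 from every row; np.delete returns a new array
trimmed = np.delete(data, 0, axis=1)
print(trimmed.shape)         # (2, 9)
```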

5 – Operations on Arrays

You can perform different operations on arrays. In this section, we will examine simple operations that can be performed on arrays.

x = np.arange(0 , 50)
x.sum()
Output: 1225

Now let’s examine the mean and median functions. mean is available as an array method, while median is called as the function np.median.

x = np.array([22 , 77 , 33 , 21 , 0])

x.mean()
np.median(x)
Mean: 30.6
Median: 22.0
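
On a 2D array the same operations accept an `axis` argument. As a small sketch with made-up values, `axis=0` aggregates down each column and `axis=1` aggregates across each row.

```python
import numpy as np

scores = np.array([[22, 77, 33],
                   [21,  0, 45]])

print(scores.sum())                 # 198: all elements
print(scores.sum(axis=0))           # column sums: 43, 77, 78
print(scores.mean(axis=1))          # row means: 44.0 and 22.0
print(np.median(scores, axis=1))    # row medians: 33.0 and 21.0
```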

6 – Random Arrays & Special Arrays

Random arrays are a very convenient tool for tests and simulations. You can use them when you need random values to test your algorithm or just want to try it out.

You may also need to add 1 to each value for a machine learning formula; for such cases, there are special arrays.

There are several ways to create an array with random data. You can use different functions such as randn, rand, randint, and random.

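import numpy as np

# Create 2D Array (uniform values in [0, 1))
np.random.rand(2, 5)

# Create 1D Array (uniform values in [0, 1))
np.random.random(10)

# Array With Negative Values (standard normal distribution)
np.random.randn(2 , 2)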
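
If you need the same random numbers on every run, for example in a test, one option is NumPy's `Generator` interface created with `np.random.default_rng`. This is an alternative to the functions above, not something the article relies on, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=42)    # seeded generator

uniform = rng.random((2, 5))            # floats in [0, 1), shape (2, 5)
integers = rng.integers(0, 10, size=5)  # integers from 0 to 9
normal = rng.standard_normal((2, 2))    # standard normal, may be negative

# A generator created with the same seed repeats the same numbers
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform, rng_again.random((2, 5))))  # True
```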

Now let’s look at constant arrays that you can use in machine learning formulas or other math formulas.

Every element of these arrays holds the same value, so you can use them when you need constants.

# Create Array With Ones
np.ones([2 , 2] , int)
np.ones(5 , float)

# Create Array With Zeros
np.zeros([2 , 2] , float)
np.zeros(5 , int)
[1 , 1],
[1 , 1]

[1.,1.,1.,1.,1.]

[0. , 0.],
[0. , 0.]

[0,0,0,0,0]
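
Here is a minimal sketch of the "add 1 to each value" case. It assumes a small made-up array `x`, and it also shows that adding the scalar 1 gives the same result through broadcasting. `np.full` builds a constant array with any value.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])

with_ones = x + np.ones([2, 2], int)   # add an array of ones
with_scalar = x + 1                    # broadcasting gives the same result
sevens = np.full([2, 2], 7)            # constant array filled with 7

print(np.array_equal(with_ones, with_scalar))  # True
print(sevens)
```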

7 – Maximum, Minimum, Absolute

Other operations you can perform on Arrays: find the largest and smallest element, get the absolute value of all elements.

x = np.array([1,-2,3,-4,-5])

x.max()
x.min()
np.abs(x)
Max: 3
Min: -5
Abs: [1,2,3,4,5]

The absolute value function returns an array, while max and min each return a single number. You can convert any of these results to another type, such as a string, with type conversions.

8 – Trigonometric Functions (Sine & Cosine)

You can’t be a mathematician or scientist without calculating sines and cosines 🙂 NumPy includes two functions to prevent you from writing long code for sin and cos.

np.sin(np.pi / 3)
np.cos(np.pi / 3)
Output: 0.866
Output: 0.5
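
Both functions expect radians and also work element by element on whole arrays. As a short sketch with angles chosen for illustration, `np.deg2rad` converts degrees first:

```python
import numpy as np

degrees = np.array([0, 30, 60, 90])
radians = np.deg2rad(degrees)   # convert degrees to radians

sines = np.sin(radians)
cosines = np.cos(radians)

print(np.round(sines, 3))
print(np.round(cosines, 3))

# sin^2 + cos^2 equals 1 for every angle (up to floating-point error)
print(np.allclose(sines ** 2 + cosines ** 2, 1.0))  # True
```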

9 – Sorting, Searching, Counting

With NumPy, you can sort arrays, search for elements in them, and count the values that meet a condition.

x = np.array([5,3,1,2,4])

arr1 = np.sort(x,axis=0)
arr2 = np.sort(x)[::-1]
[1,2,3,4,5]
[5,4,3,2,1]
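
When a second array has to follow the same order, `np.argsort` is a useful companion to `np.sort`. In this sketch the `labels` array is a made-up example of such a second array.

```python
import numpy as np

x = np.array([5, 3, 1, 2, 4])
labels = np.array(["e", "c", "a", "b", "d"])

order = np.argsort(x)     # indexes that would sort x
print(order)              # positions 2, 3, 1, 4, 0
print(x[order])           # x in ascending order
print(labels[order])      # labels reordered to match
```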

You can search arrays with the help of functions such as argmax, nanargmax, and argmin, so you can easily find the position of a particular item in the array.

x = [[1,2,3] , [6,9,11] , [88,56,77]]
x = np.array(x)

# Index of the largest number in the flattened array
np.argmax(x)

# Row index of the largest number in each column
np.argmax(x , axis=0)

# Column index of the largest number in each row
np.argmax(x , axis=1)
[1,2,3],
[6,9,11],
[88,56,77]

Flattened: 6
axis=0: [2,2,2]
axis=1: [2,2,0]

In the 2-dimensional array here, setting the axis to 0 searches down each column and returns the row index of the largest number in that column, while setting the axis to 1 searches across each row and returns the column index of the largest number in that row. Without an axis, argmax returns the index into the flattened array.

If you change the function to argmin, it will give you the index of the smallest number instead of the largest number.

In nanargmax or nanargmin, NaN values are ignored and the desired index is then searched for among the remaining values.
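
A short sketch of the difference, using a made-up array that contains one NaN:

```python
import numpy as np

x = np.array([1.0, np.nan, 5.0, 3.0])

print(np.argmax(x))       # 1: argmax treats NaN as the maximum
print(np.nanargmax(x))    # 2: index of 5.0, NaN ignored
print(np.nanargmin(x))    # 0: index of 1.0, NaN ignored
```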

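x = np.array([[0 , 7 , 3 , 0] , [0 , 0 , 5 , 0]])

np.count_nonzero(x)
np.count_nonzero(x , axis = 1)
Count: 3
[2,1]

With the help of count_nonzero, you can see how many values in your array are not 0. The axis argument only works when the array has that dimension, so a 2-dimensional array is used here to count per row. This can be useful for datasets where null values are saved as 0.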
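
Because True counts as nonzero, count_nonzero can also count the values that satisfy a condition. This is a small sketch with an illustrative array; `np.where` is used to find where the zeros are.

```python
import numpy as np

x = np.array([0, 7, 3, 0, 9])

print(np.count_nonzero(x > 3))    # 2: values greater than 3 (7 and 9)
print(np.count_nonzero(x == 0))   # 2: zeros
print(np.where(x == 0)[0])        # positions of the zeros: 0 and 3
```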

10 – Creating NaN Values & Checking

The NumPy library contains very useful functions for creating, managing, and manipulating null values. NumPy is often used in the data cleaning phase.

In this section, we will examine topics such as how to create, update, delete null values with NumPy.

x = [1,np.nan,2,3,4,5]
x = np.array(x)

In large data sets, we may not be able to spot null values by eye; in such cases, it is important to use the isnan() function.

np.isnan(x)
[False,  True, False, False, False, False]

It may be difficult to spot a True in a large result, so we can use the any() function: it returns True if at least one element of the array is True.

np.isnan(x).any()
Output: True

You can use the nan_to_num function in NumPy to fill the NaN values. It takes the array as its first argument, and the nan keyword argument sets the number that replaces NaN. Here the NaN values are replaced with the mean of the remaining values.

np.nan_to_num(x , nan = np.nanmean(x))

If you want, you can neutralize these values by replacing them with 0, or you can create a new array without the NaN values with the help of the isnan function.
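
The options above can be sketched as follows. The names `mask`, `clean`, `zero_filled`, and `mean_filled` are illustrative, and the array is rebuilt so the example runs on its own.

```python
import numpy as np

x = np.array([1, np.nan, 2, 3, 4, 5])

mask = np.isnan(x)                              # True where the value is NaN
clean = x[~mask]                                # new array without NaN values
zero_filled = np.nan_to_num(x, nan=0.0)         # NaN replaced with 0
mean_filled = np.where(mask, np.nanmean(x), x)  # NaN replaced with the mean

print(clean)
print(zero_filled)
print(mean_filled)
```

Here `np.nanmean` computes the mean while ignoring NaN, which is 3.0 for this array.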

Conclusion

This article covers important topics in NumPy. I tried to include features that are frequently used in data science and machine learning.

Hope you like it, thanks for reading and happy coding…
