We covered the matplotlib and pandas libraries in previous lessons. Now we start on NumPy, which is just as important; once we finish it, the TensorFlow lessons will begin.
NumPy helps us perform mathematical operations, and combined with pandas it lets us shape datasets to our needs. It offers powerful features and is compatible with other libraries, so let's get started with the first NumPy lesson.
What you should know before you start
- Data Visualization Like a Professional
- Things to Know on Pandas for Data Scientists
- Machine Learning For Beginner (1)
- Data Visualization And Analyze With Pandas And Matplotlib
What is NumPy?
NumPy is a Python library created to simplify numerical work in scientific research and data science projects. It is also used for statistical operations.
Setting Up the Project
The library is easy to install: as with pandas and matplotlib, you can install it with pip install numpy and then import it. This project runs in a Jupyter notebook, so if your Jupyter environment already includes NumPy (the Anaconda distribution does, for example), you can skip the pip install and import the library directly.
import numpy as np
Creating NumPy Array
Python already has a built-in sequence type, the list. Compared with a list, a NumPy array takes less memory, is faster, and comes with optimized functions. For numerical work, a NumPy array is more convenient than a list.
import numpy as np

np.array([1, 2, 3, 4, 5])                 # 1D array
np.array([[1, 2, 3], [4, 5, 6]])          # 2D array
x = np.array([[[1, 2], [3, 4], [5, 6]]])  # 3D array
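To make the comparison with lists concrete, here is a minimal sketch (the variable names are illustrative and not part of the original lesson). It doubles every element once with a list comprehension and once with a single array expression, then reads the size of the array's raw data from its nbytes attribute. The dtype is set explicitly so that the byte count does not depend on the platform.

```python
import numpy as np

values = [1, 2, 3, 4, 5]
arr = np.array(values, dtype=np.int64)  # explicit dtype: 8 bytes per element

# With a list, element-wise math needs an explicit loop or comprehension.
doubled_list = [v * 2 for v in values]

# With an array, the same operation is one vectorized expression.
doubled_arr = arr * 2

print(doubled_list)          # [2, 4, 6, 8, 10]
print(doubled_arr.tolist())  # [2, 4, 6, 8, 10]

# Five 8-byte integers stored contiguously: 40 bytes of raw data.
print(arr.nbytes)            # 40
```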
The easiest way to find out how many dimensions an array has is to read its ndim attribute.
import numpy as np

# The rows must be wrapped in one outer list to form a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr.ndim  # 2
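As a quick check, the sketch below rebuilds the 1D, 2D and 3D arrays from the previous section (the names a1, a2 and a3 are illustrative) and prints ndim next to shape, which gives the length of each dimension rather than just their count.

```python
import numpy as np

a1 = np.array([1, 2, 3, 4, 5])
a2 = np.array([[1, 2, 3], [4, 5, 6]])
a3 = np.array([[[1, 2], [3, 4], [5, 6]]])

for a in (a1, a2, a3):
    # ndim is the number of dimensions, shape is the size along each one
    print(a.ndim, a.shape)
# 1 (5,)
# 2 (2, 3)
# 3 (1, 3, 2)
```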
Indexing NumPy Array
Indexing works much like it does for lists, so it is easy to pick up. It is used to access data in the array. An example is shown below.
import numpy as np

arr = np.array([1, 2, 3])  # 1D array
arr[0]                     # first element: 1

arr2D = np.array([[1, 2, 3], [4, 5, 6]])  # 2D array
arr2D[0]                                  # first row (a 1D array): [1 2 3]
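Going one step further than the example above, a single element of a 2D array can be reached by giving a row index and a column index together. The sketch below reuses arr2D and also shows negative indices, which count from the end just as they do for lists.

```python
import numpy as np

arr2D = np.array([[1, 2, 3], [4, 5, 6]])

print(arr2D[0, 1])    # 2 -> row 0, column 1
print(arr2D[1][2])    # 6 -> same element access, chained like nested lists
print(arr2D[-1])      # [4 5 6] -> last row
print(arr2D[-1, -1])  # 6 -> last element of the last row
```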
Accessing elements one index at a time quickly gets tedious. Slicing offers an easier way: with “:” you can pull a range of elements, from a start index up to (but not including) an end index.
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr[0:3]  # elements at indices 0, 1 and 2 (index 3 is excluded): [1 2 3]
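A few more slicing forms are worth seeing side by side. In this sketch the start or end is left out, a step is added, and the same syntax is applied per axis on a 2D array (arr2D is the array from the indexing example).

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr[:3])   # [1 2 3] -> start omitted, begins at index 0
print(arr[2:])   # [3 4 5] -> end omitted, runs to the last element
print(arr[::2])  # [1 3 5] -> every second element

arr2D = np.array([[1, 2, 3], [4, 5, 6]])

print(arr2D[:, 1])   # [2 5] -> column 1 of every row
print(arr2D[0, 1:])  # [2 3] -> row 0, from column 1 onward
```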
Using NumPy with Series
We use the pandas library to build a Series from a NumPy array. NumPy can also be used to create null values, and its linspace function is handy for generating test data.
import pandas as pd
import numpy as np

# linspace already returns a 1D array, so it can be passed to Series directly
data = np.linspace(0, 10, 20)
Ser1 = pd.Series(data)
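To see what linspace(0, 10, 20) actually produces, the following sketch inspects the result of that same call. The step variable is introduced here only for illustration.

```python
import numpy as np

data = np.linspace(0, 10, 20)

print(len(data))  # 20 -> the third argument is the number of values
print(data[0])    # 0.0 -> the first argument is the start
print(data[-1])   # 10.0 -> the second argument is the (included) end

# The values are evenly spaced: (10 - 0) / (20 - 1) between neighbours.
step = data[1] - data[0]
print(np.isclose(step, 10 / 19))  # True
```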
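In the example above, 20 values are created, evenly spaced between the first two arguments. The first argument is the starting number, the second is the end, and the third is how many values to generate, so the last value (index 19) is equal to the second argument, i.e. 10. Now let's create some null values.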
import pandas as pd
import numpy as np

data = np.array([1, 2, np.nan, 4, 5])  # np.nan marks the missing value
Ser1 = pd.Series(data)
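Before filling or dropping anything, it helps to know where the nulls are. The sketch below uses the Series method isna for that; the names ser and mask are illustrative. It also shows why an equality test is not a reliable way to find NaN.

```python
import numpy as np
import pandas as pd

data = np.array([1, 2, np.nan, 4, 5])
ser = pd.Series(data)

# NaN is a float, so the whole Series is stored as float64.
print(ser.dtype)         # float64

mask = ser.isna()        # True where the value is missing
print(mask.tolist())     # [False, False, True, False, False]
print(int(mask.sum()))   # 1 -> number of missing values

# NaN never compares equal to itself, so use isna() instead of ==.
print(np.nan == np.nan)  # False
```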
pandas gives us two functions for handling empty values: fillna() fills them and dropna() deletes them. Let's first examine fillna(), the more commonly used of the two.
import pandas as pd
import numpy as np

data = np.array([1, 2, np.nan, 4, 5])
series = pd.Series(data)
series.fillna("Value")  # returns a new Series with NaN replaced by "Value"
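In numeric data, a number is often a more practical fill value than a string. As one possible choice (an assumption for illustration, not something the lesson prescribes), this sketch fills the gap with the mean of the remaining values and confirms that the original Series is left unchanged.

```python
import numpy as np
import pandas as pd

ser = pd.Series(np.array([1, 2, np.nan, 4, 5]))

# mean() skips NaN by default: (1 + 2 + 4 + 5) / 4 = 3.0
filled = ser.fillna(ser.mean())

print(filled.tolist())        # [1.0, 2.0, 3.0, 4.0, 5.0]
print(int(ser.isna().sum()))  # 1 -> fillna returned a copy, ser still has its NaN
```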
The argument passed to fillna is the value written in place of each null value, and it can be of any data type. Now let's look at the function that removes rows and columns containing null values.
import pandas as pd
import numpy as np

data = np.array([1, 2, np.nan, 4, 5])
series = pd.Series(data)
series.dropna(axis=0)  # returns a new Series without the NaN entry
The dropna function removes rows or columns that contain NaN values. The axis argument selects which: 0 drops rows and 1 drops columns. (A Series has no columns, so axis 1 does not apply to it; use it with DataFrames.)
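Since axis 1 only makes sense on a DataFrame, here is a small sketch of both settings. The DataFrame df and its column names "a" and "b" are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, np.nan, 6.0],  # one missing value, in row 1
})

rows_dropped = df.dropna(axis=0)  # drop every row that contains NaN
cols_dropped = df.dropna(axis=1)  # drop every column that contains NaN

print(rows_dropped.index.tolist())    # [0, 2] -> row 1 is gone
print(cols_dropped.columns.tolist())  # ['a'] -> column "b" is gone
```

Both calls return new DataFrames, so df itself still holds the original data.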