Things to Know About Pandas for Data Scientists

In pandas, there are patterns and functions that every data scientist uses at least once in a project. In this article, we will examine the most important of these functions and concepts. You can access the code via the GitHub link at the end of the article. You can also access the summary note on GitHub after reading this article.

You Should Know These Topics
Work with Empty Values in Pandas

Null values are found in almost all datasets, and we can use pandas to clean up this data. We will use the NumPy library to insert null values into our example. NumPy also offers many useful, easy-to-use functions that you can adapt to your needs.

pip install pandas
pip install numpy

After installing the libraries, you can import them in your project and create a dataset. If you don’t know how to create one, you can find it in the contents section at the top.

import pandas as pd
import numpy as np

data = {"Column 1": [20, 30, np.nan], "Column 2": [10, np.nan, np.nan]}
# Named dataframe so that it matches the examples that follow
dataframe = pd.DataFrame(data)

Let’s start with deleting the empty data. You can delete the rows or the columns that contain empty data with the dropna function. It can be called with or without arguments, and the one you will use most often is axis.

dataframe.dropna(axis=0)
# axis=0 drops the rows that contain NaN, axis=1 drops the columns that contain NaN
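To make the effect of axis concrete, here is a minimal sketch that rebuilds the small dataset from above and drops rows, then columns. The how="all" call is an extra option of dropna that the article does not cover; it is included only to show that axis is not the sole argument.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame(
    {"Column 1": [20, 30, np.nan], "Column 2": [10, np.nan, np.nan]}
)

# axis=0 (the default): drop every row that contains at least one NaN
rows_dropped = dataframe.dropna(axis=0)

# axis=1: drop every column that contains at least one NaN
columns_dropped = dataframe.dropna(axis=1)

# how="all": drop a row only when every value in it is NaN
all_nan_rows_dropped = dataframe.dropna(how="all")

print(rows_dropped)
print(columns_dropped.shape)
print(all_nan_rows_dropped)
```

Because both columns in this dataset contain a NaN, axis=1 removes all of them, which is worth keeping in mind before dropping columns on real data.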

Instead of deleting the empty data, you can assign a value to it with the fillna function, which writes the value you pass as its argument into every NaN cell. It accepts scalar values such as int, float, bool, or string.

dataframe.fillna(value=0)
# Every NaN is replaced with the given value (0 here).
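As a rough illustration, the sketch below fills the same small dataset in two ways: once with a single scalar, and once with a different value per column by passing a dict. The fill values (0, 25, 10) are arbitrary choices for the example.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame(
    {"Column 1": [20, 30, np.nan], "Column 2": [10, np.nan, np.nan]}
)

# Replace every NaN with the same scalar
filled_zero = dataframe.fillna(0)

# Replace NaN with a different value in each column by passing a dict
filled_per_column = dataframe.fillna({"Column 1": 25, "Column 2": 10})

print(filled_zero)
print(filled_per_column)

# fillna returns a new DataFrame, so the original still holds its NaN values
print(dataframe.isna().sum())
```

Since fillna returns a new object by default, assign the result to a variable if you want to keep the filled data.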

Group Data in a Dataset

Grouping is an important part of analyzing data. Pandas can group your data with a single function, so let’s examine it. The groupby function splits the rows into groups according to the values in one or more columns.

group = dataframe.groupby("Column 1")  # pass the name of the column to group by

group.count() # number of non-null values in each column, per group
group.mean() # average of each numeric column, per group
group.max() # highest value in each column, per group
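The three aggregations are easier to read on data with repeated keys. The sketch below uses a hypothetical dataset (the Department and Salary columns and their values are made up for illustration) and groups it by department.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are invented for this example
dataframe = pd.DataFrame(
    {
        "Department": ["Sales", "Sales", "IT", "IT", "IT"],
        "Salary": [100, 200, 300, np.nan, 500],
    }
)

group = dataframe.groupby("Department")

# count() skips NaN, so IT has 2 salaries even though it has 3 rows
print(group.count())

# mean() also skips NaN: IT -> 400.0, Sales -> 150.0
print(group.mean())

# max(): IT -> 500.0, Sales -> 200.0
print(group.max())
```

Each result is itself a DataFrame indexed by the group key, so a single value can be read with, for example, group.mean().loc["IT", "Salary"].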

Concatenate DataFrames with Concat and Merge

We can combine two or more datasets, and pandas offers us two functions for this: concat and merge. Let’s start with concat. We will combine 2 datasets, which you can create with the method shown above.

We create two separate data frames. Note that we change the index numbers in the second one, because otherwise its index would start from 0 again when the two are combined. In the concatenation, the second data frame is added to the end of the first.
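The concat step can be sketched as follows. The two data frames and their values are hypothetical stand-ins, since the original datasets are not shown here. The second frame is given index labels 3 to 5 so that they continue after the first frame's 0 to 2.

```python
import pandas as pd

# Hypothetical data frames; the second continues the index of the first
first = pd.DataFrame(
    {"Column 1": [1, 2, 3], "Column 2": [10, 20, 30]}, index=[0, 1, 2]
)
second = pd.DataFrame(
    {"Column 1": [4, 5, 6], "Column 2": [40, 50, 60]}, index=[3, 4, 5]
)

# Rows of the second frame are appended below the rows of the first
combined = pd.concat([first, second])
print(combined)

# Alternative: let pandas renumber the rows instead of setting the index by hand
renumbered = pd.concat([first, first], ignore_index=True)
print(renumbered)
```

Setting the index by hand and passing ignore_index=True both give a result whose index runs from 0 to 5 without repeats.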

The merge function joins two data frames on a column they have in common. Columns that exist in only one of the data frames are carried over into the result. The on argument specifies which column's values to match, and that column must be present in both data frames.
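A minimal sketch of merge, assuming two hypothetical data frames that share an id column (all column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical data frames that share the "id" column
left = pd.DataFrame({"id": [1, 2, 3], "name": ["Ada", "Bob", "Cem"]})
right = pd.DataFrame({"id": [1, 2, 4], "salary": [100, 200, 400]})

# Default inner join: keep only the ids found in both frames (1 and 2)
merged = pd.merge(left, right, on="id")
print(merged)

# how="outer" keeps every id from either frame and fills the gaps with NaN
merged_outer = pd.merge(left, right, on="id", how="outer")
print(merged_outer)
```

The name column from the first frame and the salary column from the second both appear in the result, lined up by matching id.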
