Data Analysis Tutorial In Python

In this series I will try to gather the essential information about data analysis. Having covered the R programming language, we now move on to Python.

This series will consist of approximately 3 parts. The first part covers simple analysis functions, indexing, and the pandas library. Let’s start by examining series, data frames, and arrays.

Series, Data Frames, and Arrays

Before learning these structures, let’s import the necessary modules. Two modules will be used in this tutorial: pandas and NumPy.

import pandas as pd
import numpy as np

A Series is a one-dimensional data structure. It can hold numbers, strings, or other Python objects, and you access its elements through its index.

Series1 = pd.Series([1 , 2 , 3 , 4 , 5])

Here we keep the numbers from 1 to 5 in the Series1 variable. By default, the index runs from 0 to 4. If we want different index labels, we can pass a second list as the index.

Series1 = pd.Series([1 , 2 , 3 ,4 , 5] , ["a" , "b" , "c" , "d" , "e"])
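To make the role of the index concrete, here is a minimal sketch that builds the same kind of series and reads a value back by its label. The variable names (series, labels, values, value_at_c) are only illustrative.

```python
import pandas as pd

# Build a Series whose labels are passed through the index keyword.
series = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

labels = list(series.index)   # the index labels: a, b, c, d, e
values = series.to_list()     # the stored values: 1 to 5
value_at_c = series["c"]      # look up a value by its label

print(labels, values, value_at_c)
```

Passing the labels with the index keyword does the same thing as passing them as the second positional argument, but it reads more clearly.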

We will create an array next. Python lists are not designed for numerical work, so we use a NumPy array, which is usually faster than a list for this kind of operation. Its usage is similar to Series.

Arr1 = np.array([1 , 2 , 3 , 4 , 5])
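One practical difference from lists is that arithmetic on a NumPy array applies to every element at once. The following is a small sketch of that idea, with illustrative variable names.

```python
import numpy as np

numbers = [1, 2, 3, 4, 5]
arr = np.array(numbers)

# With a list, element-wise arithmetic needs a loop or a comprehension.
doubled_list = [n * 2 for n in numbers]

# With a NumPy array, the same operation is applied to every element at once.
doubled_arr = arr * 2

# Aggregations such as the sum are available as methods on the array.
total = arr.sum()

print(doubled_list, doubled_arr, total)
```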

A Series is a one-dimensional data structure, but arrays can also have 2 dimensions. To create a 2-dimensional array, you nest 2 lists inside an outer list.

Arr2 = np.array([[1 , 2 , 3 , 4 , 5] , [10 , 20, 30, 40 , 50]])
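A quick way to check that an array really has 2 dimensions is to look at its ndim and shape attributes. The sketch below uses an illustrative variable name, arr2d.

```python
import numpy as np

# Two inner lists of equal length become the two rows of the array.
arr2d = np.array([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]])

n_dims = arr2d.ndim    # number of dimensions
shape = arr2d.shape    # (rows, columns)

print(n_dims, shape)
```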

We will see the advantages of working with 2 dimensions again in the indexing section. For now, it is enough to know how to make a 2-dimensional array. Next is the data frame, which is the most used data structure.

Data frames consist of rows and columns, just like an SQL table. They make data easy to process, which is why they are the most commonly preferred data structure.

data1 = dict(a = 1 , b = 10 , c = 12 , d = 13)
data2 = dict(a = 3 , b = 22 , c = 12 , d = 145)

data3 = dict(First = data1 , Second = data2)
df = pd.DataFrame(data3)

This may look a little complicated at first. We create two dictionaries, each of which becomes a column, and their keys become the row labels. We then combine them in a single dictionary and pass it to pd.DataFrame.
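To see how the nested dictionaries map onto rows and columns, the sketch below rebuilds the same data frame and inspects it. The names column_names, row_labels, and cell are only illustrative.

```python
import pandas as pd

data1 = dict(a=1, b=10, c=12, d=13)
data2 = dict(a=3, b=22, c=12, d=145)

# The outer keys become column names, the inner keys become row labels.
df = pd.DataFrame(dict(First=data1, Second=data2))

column_names = list(df.columns)
row_labels = list(df.index)
cell = df.loc["b", "Second"]   # row "b", column "Second"

print(df)
print(column_names, row_labels, cell)
```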

Subsetting Data Types

After storing the data, we need to access it, which is done through indexing. Let’s try it on a series first.

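Serie1 = pd.Series([1 , 2 , 3] , ["a" , "b" , "c"])

Serie1.iloc[0] # Accessing the first element by position (plain Serie1[0] is deprecated for label-indexed series in recent pandas versions)
Serie1[0:2] # Accessing positions 0 and 1 (the end of the slice is excluded)
Serie1[0:] # Accessing from position 0 to the end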
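Because this series has string labels, it can be indexed either by position or by label. The sketch below uses the explicit .iloc (position) and .loc (label) accessors to keep the two apart. The variable names are illustrative.

```python
import pandas as pd

series = pd.Series([1, 2, 3], index=["a", "b", "c"])

first_by_position = series.iloc[0]    # position 0
first_by_label = series.loc["a"]      # label "a"

# A positional slice excludes its end, like a Python list slice.
positional_slice = series.iloc[0:2]

# A label slice includes its end label.
label_slice = series.loc["a":"b"]

print(first_by_position, first_by_label)
print(positional_slice.to_list(), label_slice.to_list())
```

Both slices return the first two elements here, but for different reasons: the positional slice stops before position 2, while the label slice stops at label "b" and includes it.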

Indexing a series is similar to indexing NumPy arrays. Let’s try it on arrays.

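# Mixing numbers and strings makes NumPy store every element as a string
Arr1 = np.array([[1 , 2 , 3] , ["a" , "b" , "c"]])
Arr1[0] # Accessing the first row
Arr1[0 , 1] # Accessing the second element of the first row (the string '2')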
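The row, column form of indexing also works with slices, which makes it possible to pull out a whole column. The sketch below uses a purely numeric array and illustrative names.

```python
import numpy as np

arr = np.array([[1, 2, 3], [10, 20, 30]])

first_row = arr[0]          # the whole first row
second_column = arr[:, 1]   # every row, column at position 1
element = arr[1, 2]         # second row, third column

print(first_row, second_column, element)
```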

One-dimensional arrays are indexed in the same way as series, and 2-dimensional arrays use the indexing method shown above. Data frame indexing is a little different from the others. Let’s examine it now.

data = {"First": [1 , 2 , 3] , "Second": ["a" ,"b" ,"c"]}
Df1 = pd.DataFrame(data)
Df1["First"] # Accessing first column
Df1[0:1] # Accessing first row
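Besides square brackets, data frames provide the .iloc (position) and .loc (label) accessors, which are explicit about rows versus columns. The sketch below assumes the default integer row labels 0, 1, 2. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"First": [1, 2, 3], "Second": ["a", "b", "c"]})

first_column = df["First"]         # a column, returned as a Series
first_row = df.iloc[0]             # the first row by position, as a Series
cell = df.loc[1, "Second"]         # row label 1, column "Second"

# Label slices with .loc include the end label, so this keeps rows 0 and 1.
subset = df.loc[0:1, ["First"]]

print(first_column.to_list(), first_row["First"], cell)
print(subset)
```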
