In this tutorial, we will look at which type of data is more efficient to work with: quality data or big data. We will first explain these two concepts, and then evaluate them in a project written in the Python programming language.
What is quality data?
Data that meets criteria such as reliability is called quality data, although the exact definition of quality data may vary. The quality data set used in this article is taken from a real and reliable source.
What is big data?
If you have been interested in data science before, you have probably heard of this concept. It refers to an abundance of data, that is, a very large volume of records.
What will we investigate in this article?
In this project, we will investigate which type of data set produces more useful results. To get better-quality results, I will take the quality data set from an official source, and I will build the big data set from values I have created.
An important note: the big data set will contain data collected from different sources, while the quality data set will be created with data coming from a single, reliable source.
Project preparation phase
You can find the big data set and the quality data set used here at the end of the article. Let’s proceed by examining the code. First, let’s create a linear regression model on the quality data.
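As a rough illustration of what fitting a linear regression model involves, here is a minimal sketch in plain Python using ordinary least squares. The function names `fit_line` and `predict` and the toy numbers are assumptions made for this example; they are not the project's actual code or data. In practice a library such as scikit-learn would usually do this step.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two or more paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-movement of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a single x with the fitted line."""
    return slope * x + intercept


# Hypothetical toy data, NOT taken from either data set in the article
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
print(predict(5.0, slope, intercept))
```

The same two functions can be applied to either data set once one column is chosen as the input and another as the target.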
Our big data set is a weather data set consisting of 96,453 records. The variables we use from it are air temperature and rainfall. The program gives a deviation error of about 0.13, which can be considered very good.
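The article does not say which metric the "deviation error" refers to. As a sketch only, and assuming it is something like the mean absolute error or the root mean squared error between the actual and predicted values, such an error could be computed as follows. The function names and sample values are hypothetical.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


# Hypothetical values for illustration, not results from the article
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]

mae = mean_absolute_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(mae, rmse)
```

Both metrics are expressed in the units of the target variable, so an error measured on one data set is not automatically comparable with an error measured on another.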
The quality data set contains data on Amazon book sales. It has 550 records, which is 95,903 fewer than the big data set. However, it still has an error rate of only 0.16.
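As a result, although the error rate of the big data set is slightly lower than that of the quality data set, the quality data set reaches a nearly identical error with 550 records instead of 96,453, so quality data is superior in terms of performance.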
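The comparison can be made concrete with a little arithmetic on the figures reported above. This sketch only restates those numbers; it assumes that the two errors were measured with the same metric and are directly comparable, which the article does not confirm. The variable names are chosen for this example.

```python
# Figures as reported in the article
big = {"records": 96453, "error": 0.13}
quality = {"records": 550, "error": 0.16}

# How many more records the big data set has
record_gap = big["records"] - quality["records"]

# How many times larger the big data set is
size_ratio = big["records"] / quality["records"]

# How much lower the big data set's error is
error_gap = quality["error"] - big["error"]

print(f"Record gap: {record_gap}")
print(f"Size ratio: {size_ratio:.1f}x")
print(f"Error gap:  {error_gap:.2f}")
```

On these numbers, the big data set is roughly 175 times larger while its error is lower by only about 0.03.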