For programming and computer science, data structures are the main subject. It is utilized in nearly all fields of computer science.
In this post, I will describe the data structures that are crucial for data science and programming, as well as what data structures mean in general.
I will also add documentations to use for get more information in this article. You can use this article as guide for quick introduction to data structures.
What are Data Structures?
In computer science, a data structure is a data organization, management, and storage format that enables efficient access and modification according to Wikipedia.
If this term seems complicated, let me clarify. These are structures that can store, organize and help us modify multiple data (like lists that you use everyday).
Lists, dictionaries, and arrays are simple data structures present in all programming languages. Although their syntax varies, their logic stays the same, so you can adapt the examples we will conduct in Python to languages such as Java and C++.
Caution some data types can be special for language or some data types can need external libraries and sources to use.
Dictionaries , Maps , Hash Tables
When you start working with these structures, which I will give you important information instead of explaining at length, you will start to discover some features yourself.
In Python, dictionaries are for storing an arbitrary number of data, each with its own keywords. Maps are also called hash maps and lookup tables or associative arrays.
A dictionary makes easier to organize data associated and presents it in a more organized form.
In dictionary values stored with keywords and programmer can use these keywords after access values.
For example, a dictionary may store attributes (data) such as age, date of birth, horoscope based on the name keyword.
data = {
"Mark": 12,
"Alice": 23,
"David": 8,
}
data["Mark"]
Output: 12
You can also type curly braces to fit on a single line, but dictionaries with large data will mess up, so it’s better to use indentation.
Let’s make another example to show power of dictionaries. In this example we will create a dictionary to show 2 different lists holding names and ages.
dataDict = {
'names': ['David' , 'Emma'],
'ages': [12 , 19]
}
print(dataDict['names'][0])
Output: David
In here we define a names list (or array because in here we store values with same data type) and we called names from dataDict and we get first index of this list.
Since we will come back to the indexing logic later, for now you can think that index 0 corresponds to the first value in a list.
Documentation For More: Python Official
OrderedDict & DefualtDict & ChainMap
We have dictionaries with different features. Although each of them has a different purpose, I collected them under the same title because they are in the same library.
You need the collections library to access these data structures so don’t forget to install it on your computer.
As the name suggests, OrderedDict is a dictionary that does not forget the order of your keys and acts accordingly.
import collections as cs
dict1 = cs.OrderedDict(one=1,two=2)
print(dict1)
# Adding New Data
dict1["three"] = 3
print(dict1)
([('one', 1), ('two', 2)])
([('one', 1), ('two', 2), ('three', 3)])
You can use defaultdict to prepare messages that will inform you when you try to pull data that is not in the dictionaries.
Preparing the error messages in a way that is easy for you will help you in the debug phase (also it’s good way to show users their mistakes).
from collections import defaultdict
# Error message
def error():
return "Index Not Found"
x = defaultdict(error)
# 0 to 9 dictionary
for i in range(0,10):
x[i] = i
# Try get 10'th index
print(x[9])
# Try get 11'th index
print(x[11])
9
Index Not Found
Do you have multiple dictionaries, let’s turn them into a single dictionary with Chainmap.
from collections import ChainMap
dict1 = {1: 'alice', 2: 'emma'}
dict2 = {3: 12, 4: 25}
main = ChainMap(dict1, dict2)
print(main[1] , main[3])
Output: alice 12
ChainMap Documentation (Official)
OrderedDict Documentation (Official)
DefualtDict Documentation (Official)
Array Data Structures
Arrays are one of the most widely used and well-known data structures. In this section, we’ll talk about how arrays work.
Arrays are called structures that hold the same data together, each array is a whole, and the data in it is of the same data type.
Although lists and arrays are separated in other programming languages, in Python they both have the same syntax. If you use different data, it becomes a list, if you use the same data, it becomes an array.
But there is another way to use arrays in Python. NumPy a scientific computing library for Python. We will see NumPy arrays and lists in this section together.
# from 0 to 10 value
arr = [1,2,3,4,5,6,7,8,9,10]
# String Array
arr1 = ["a" , "b" , "c"]
# Get First Indexing
arr1[0]
# Get from 0 to 4
arr[0:4]
# Deleting Element
del arr[0]
# Adding Element
arr.append(11)
Output 1: a
Output 2: [1, 2, 3, 4]
You can create 3, 2 and 1 dimensional arrays in NumPy. Let’s start with Array creations.
array1D = np.array([1, 2, 3, 4])
array2D = np.array([[1, 2], [3, 4]])
array3D = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
Also while creating new arrays you should consider dtype of this array. Don’t forget unlike built-in list in Python NumPy don’t allow for different data types in same array.
arr = np.array(['a', 'b', 'c'], np.dtype('U'))
In here we defined our values in arr as Unicode string. You can get more information about NumPy and NumPy Arrays from here (Official Documentation).
Tuple Data Structures
Tuples are another data structure capable of holding any data type components. With this flexibility, however, that the data is less tight than in a typed array.
In this data type, the data cannot be dynamically deleted, edited or any action can be taken, although it is often seen as a disadvantage, you can turn it in your favor for security.
tuple = (1 , 2 , 3)
tuple[0]
tuple1 = ("x" , 1 , 1.25)
tuple1[2]
# you'll get error
del tuple[0]
tuple[1] = "y"
Output: 1
Output: 1.25
Official Documentation For Tuples and Sequences
Using Array with Array Library
Python’s array module stores fundamental C-style data types like bytes, 32-bit integers, floating-point numbers, and so on in a space-efficient manner.
Unlike lists it only accepts data with the same data type, so it is impossible to enter different data types.
Because of this constraint, array.array objects with many elements are more space-efficient than lists and tuples.
# Accessing array
from array import array
# use type code
arr = array("f" , [1.0 , 1.2])
Since the working logic of arrays in NumPy is the same, I will not delve into them further. If you want more information about this library, you can read it here.
String – Arrays of Unicode Characters
String objects save space since they are densely packed and specialize in a particular data type. If you want to save Unicode text, you should use a string.
In this section, we will save not only texts but also emojis, as I said, every Unicode must be stored in a string object.
str = "55555"
emoji = "😀"
print(str , emoji)
Output: 55555 😀
Bytes & ByteArray
Bytes objects are structures that can hold numbers between 0 and 255. If you exceed 255, you will get an error, the same is true for numbers below 0.
They are high-performance and stable structures like strings, if the numbers you are dealing with are in this value range, it will be useful to use bytes.
x = bytes([1 , 2 , 3])
y = bytes([-1 , 2 , 3])
z = bytes([100 , 200 , 300])
Output: b'\x01\x02\x03'
Output: error
Output: error
With the indexing method, you can pull your data, but you cannot edit, add or delete like tuple structures.
However, you can add, update and delete data with the byte array class. bytes rules apply, they can only be manipulated.
x = bytearray((10 , 20 , 30))
# Update
x[1] = 11
# Remove
del x[0]
# Adding
x.append(40)
# Show all data
for i in x:
print(i)
11
30
40
Writing Custom Class For Productivity
For more control, more power, all you need is yourself. Don’t be afraid to create and use your own classes. Creating complex classes can be tiring at times, but it will increase your productivity.
When you want to add business logic and activity to your record objects via methods, creating a custom class is a fantastic solution. However, this means that these things are no longer just data objects.
class Student:
def __init__(self, name, note):
self.name = name
self.note = note
x = Student("David" , 55)
y = Student("Mark" , 35)
# Access Data
print(x.name , x.note)
print(Student)
Output: David 55
Output: <main.Student object at 0x7f53925c2400>
If you’ve noticed, it doesn’t exactly look like a data class. The easiest way out of this situation is to use @dataclass, Python 3.7+ is required.
from dataclasses import dataclass
@dataclass
class Student:
name: str
note: int
x = Student("David" , 55)
# Write class
print(x)
# Write note data for x
print(x.note)
Output: Student(name='David', note=55)
Output: 55
Sets & Multisets Data Structures
The set type is the built-in set implementation in Python. It’s mutable and allows for the dynamic insertion and deletion of elements.
Python’s sets are backed by the dict data type and share the same performance characteristics. Any hashable object can be stored in a set.
set = {1 , 2 , 3}
set.add(4)
set.remove(3)
If it doesn’t seem like it’s structurally different from an array, don’t worry, I’ll summarize the important differences.
While renewed items can be found in lists and arrays, sets cannot. every data should be unique, like passwords, usernames, nicknames…
Frozen Sets
There is almost no difference between the set data structure and frozen set, the only difference is that value (data) can’t be changed in set.
frozen = frozenset({"x" , "y" , "z"})
frozen.add("k")
Output: Erorr
Using Multiplesets With Counter
The counter class in the Collections library allows us to combine more than one set. if you have more than one set but some have the same values it will not add the same values to the new set instead, it shows how much the key is used by increasing the value.
from collections import Counter
merge = Counter()
# first counter update
fruits = {"apple" , "banana" , "orange"}
merge.update(fruits)
print(merge)
# second counter update
fruits1 = {"apple" , "banana" , "watermelon"}
merge.update(fruits1)
print(merge)
{'orange': 1, 'apple': 1, 'banana': 1})
{'apple': 2, 'banana': 2, 'orange': 1, 'watermelon': 1})
Although it carries each value separately in the set, it only stores the main key in length, you can briefly view it with the code below.
# Unique Value
len(merge)
# Total Elements
sum(merge.values())
# Print unique keys
for i in merge:
print(i)
Output: 4
Output: 6
-------------
apple
orange
banana
watermelon
Stacks (LIFOs) in Python
A stack is a collection of items that fast in/out semantics (LIFO) for insert and delete are supported.
Unlike Arrays and lists, you can’t do random access, you need to use functions for insertion and deletion.
Using Lists For Stacks
You can add the data to the end with append and remove it from the LIFO queue with the help of pop, but let’s not say that the performance is very bad as the list grows.
stack = []
stack.append(1)
stack.append(2)
stack.append(3)
print(stack)
stack.pop()
stack.pop()
print(stack)
Output: [1,2,3]
Output: [1]
Every time you use append, it is appended in the last index, and every time the pop function runs, it deletes in the last index.
Using Deque For Stack
The difference of deque from the list also supports adding and removing elements at the beginning of the array in fixed time.
Therefore, it works more efficiently and faster than a list. It also supports random access.
You may lose performance if you try to delete the data in between with the deque, the main reason is that all the elements up to the two ends move to make room or fill the gap.
from collections import deque
stack = deque()
stack.append("a")
stack.append("b")
stack.append("c")
print(stack)
print(stack.pop())
print(stack.pop())
print(stack)
Output: deque(['a', 'b', 'c'])
Output: deque(['a'])
Queues (FIFO’s) in Python
The logic on the stack is slightly different here, where first-in, first-out (FIFO) is applied, while in the stack, first-in, last-out is applied.
We can use the list and deque data structures that we use in the stack here also use the Queue class in the queue.
queue = []
queue.append("x")
queue.append("y")
queue.append("z")
print(queue)
queue.pop(0)
queue.pop(0)
print(queue)
Output: ['x', 'y', 'z']
Output: ['z']
Using Deque For Queue
Everything in the stack is valid, the only difference is that we use popleft instead of the pop function, for use first-in-first-out logic.
from collections import deque
queue = deque()
queue.append("x")
queue.append("y")
queue.append("z")
print(queue)
queue.popleft()
queue.popleft()
print(queue)
Output: deque(['x', 'y', 'z'])
Output: deque(['z'])
Using Queue Class
A queue is a structure by which we can determine how much data a queue can hold and store. It is mostly used to implement a queue.
You can create an infinite queue by setting the max size parameter to 0 and this conforms to the FIFO rule.
from queue import Queue
queue = Queue(maxsize = 0)
# Adding element
queue.put(10)
queue.put(20)
queue.put(30)
print(queue.queue)
# Removing element
queue.get()
queue.get()
print(queue.queue)
Output: deque([10, 20, 30])
Output: deque([30])
Thanks to Yuqing For Everything…
My life and my everything. Please stay with me forever! Merry Christmas ❤️