Python is one of the most popular programming languages today (in 2021), but there is one fact no one can deny: Python is slow.
Even so, there are ways to speed it up, and in this article we will examine one of them: Cython.
Why is Python Slow?
1 – Interpreted Language
The first and biggest reason Python is slow is that it is interpreted rather than compiled to machine code. The standard implementation, CPython, also does not use a JIT (just-in-time) compiler.
Let's explain JIT: while the program runs, a JIT compiler translates the code blocks you actually use into machine code, and code that has not been compiled yet is interpreted.
This is one of the reasons why Java and C# are faster than Python, although the method means you wait a bit longer the first time the code runs.
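Strictly speaking, CPython does compile your source code, but only to bytecode for its own virtual machine, not to machine code. The interpreter then executes that bytecode one instruction at a time on every call. The standard `dis` module shows those instructions; the function name `add_numbers` is made up for this illustration, and the exact instructions differ between Python versions.

```python
import dis

def add_numbers(a, b):
    return a + b

# CPython has already compiled add_numbers to bytecode; dis prints the
# instructions that the interpreter loop executes one by one on every call.
dis.dis(add_numbers)

# The same instructions as a list of opcode names
opnames = [ins.opname for ins in dis.get_instructions(add_numbers)]
print(opnames)
```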
2 – Dynamically Typed Language
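In languages such as C++, Java, and C#, you have to declare the data type of each variable, but in Python, types are handled automatically in the background while the program runs.
In Python, a variable is just a name that refers to an object. When you assign a value of a different type, Python creates a new object in memory, points the name at it, and frees the old object once nothing refers to it anymore. On top of that, every operation has to check the types of its operands at run time, and all of this degrades performance.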
The way Python tries to make everything dynamic has a bad effect on performance.
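To see why this costs time, consider a tiny function with no type information. The same `a + b` expression performs integer addition, string concatenation, or list concatenation depending on what is passed in, so the interpreter has to look up the operand types on every call. This is a plain-Python sketch; the names `add`, `x` and `old` exist only for this example.

```python
def add(a, b):
    # Nothing here says what a and b are; CPython looks up the operands'
    # types and picks the matching "+" operation at run time.
    return a + b

print(add(2, 3))          # 5 (integer addition)
print(add("py", "thon"))  # python (string concatenation)
print(add([1], [2, 3]))   # [1, 2, 3] (list concatenation)

x = 5
old = x
x = "five"  # rebinding the name to a new object of another type
print(x is old, type(old).__name__, type(x).__name__)  # False int str
```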
How to Boost Python?
There are several ways to speed up Python, such as using PyPy, an alternative Python implementation with a JIT compiler, or using Cython, which compiles Python code (optionally annotated with C data types) to C.
Cython translates your Python code into C code, which is then compiled into a Python extension module; in other words, it removes much of the overhead of an interpreted language by turning the code into compiled code.
You can feed Cython plain Python code as-is, or you can make a few changes to speed up your project further. Even if you leave the code unchanged, Cython can make it around 2 to 3 times faster.
Installing Cython in Jupyter Notebook
Jupyter Notebook, one of the environments where you can use Cython comfortably, lets you work with Cython from the browser.
First, let's install Jupyter Notebook and start it from the terminal. Alternatively, you can install the Jupyter extension in VS Code and open a file with the .ipynb extension.
$ pip install notebook
$ jupyter notebook
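When Jupyter Notebook opens in the browser and the connection to the server is established, create your first cell and install Cython. Inside a notebook cell, use the %pip magic so the package goes into the environment the notebook is running in:
%pip install cython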
Next, let's load Cython into IPython with a magic command. After that, whenever we need it, we can invoke it by name.
# Load the Cython extension (once per session, in its own cell)
%load_ext Cython
# Put this on the first line of every cell that should be compiled
%%cython
Every cell that needs Cython must start with %%cython. You don't need to load the extension again; the %%cython line alone is sufficient.
Installing Cython on a Local Machine
If you don't want to use Jupyter Notebook or Google Colab, you can prepare your own machine for Cython with a few installations.
$ pip install cython
After that, you need a C/C++ compiler such as MinGW; download it and complete its installation (these steps are for Windows users). Note that Microsoft's Visual C++ Build Tools are the compiler officially supported for building CPython extensions on Windows.
Once the installations are finished, let's see how to compile a file. First, create a file with the .pyx extension; this is where we will write our code.
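```cython
# file.pyx
def example(x):
    # Add up the items of any iterable of numbers
    total = 0  # named total so the built-in sum() is not shadowed
    for i in x:
        total += i
    return total
```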
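To compile this file, create a Python build script (with the .py extension, conventionally named setup.py) that references it.
```python
# setup.py
# distutils is deprecated (and removed in Python 3.12), so setuptools is used
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("file.pyx"))
```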
Finally, run the command below in the terminal. It generates a C file from the .pyx file and compiles it into an extension module (.pyd on Windows, .so on Linux and macOS) that you can import like any other Python module.
$ python setup.py build_ext --inplace
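Once the build finishes, a quick way to check the gain is to time the compiled function against an interpreted copy of it. The sketch below assumes the build above produced an importable module named `file` from file.pyx; the script name compare.py and the names `example_py` and `example_cy` are made up for this example. If the compiled module is missing, the script only times the Python version.

```python
# compare.py: times the compiled example() against a plain-Python copy.
# Assumes "python setup.py build_ext --inplace" produced a module named "file".
import timeit

def example_py(x):
    # Same logic as example() in file.pyx, but run by the CPython interpreter
    total = 0
    for i in x:
        total += i
    return total

try:
    from file import example as example_cy  # the compiled Cython version
except ImportError:
    example_cy = None  # not built yet: only the Python version is timed

data = list(range(1_000_000))

if __name__ == "__main__":
    t_py = timeit.timeit(lambda: example_py(data), number=5)
    print(f"Python: {t_py:.3f}s for 5 runs")
    if example_cy is not None:
        # Both versions must agree before their timings mean anything
        assert example_cy(data) == example_py(data)
        t_cy = timeit.timeit(lambda: example_cy(data), number=5)
        print(f"Cython: {t_cy:.3f}s for 5 runs")
```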
Using C Data Types in Python
We have dealt with the problems that arise from Python being an interpreted language; now let's remove the problems that arise from it being dynamic.
When creating a variable in C, we specify its data type; for example, for a variable that will hold integers, we write int before its name.
Because the data types are declared in advance, the compiler knows them before the program runs, so the program runs faster: it does not have to work out types and manage objects in memory at run time, as Python does.
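- int = integers
- long int = larger integers (long long int for even larger ones)
- float = decimal numbers (single precision)
- double = decimal numbers with more precision (double precision)
- long double = decimal numbers with even more precision than double
- char = a single character
- char arrays / char* = text (C has no built-in string type)
- bool (from stdbool.h) = logical values; in Cython, bint is the usual choice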
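The exact sizes of these types depend on the platform and compiler. A quick way to check them on your own machine is Python's standard ctypes module, which mirrors the C types; the dictionary name `c_types` below is just for this sketch. On mainstream platforms char is 1 byte, float 4 and double 8, while long is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux and macOS.

```python
import ctypes

# ctypes equivalents of the C types listed above
c_types = {
    "char": ctypes.c_char,
    "int": ctypes.c_int,
    "long int": ctypes.c_long,
    "long long": ctypes.c_longlong,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
    "long double": ctypes.c_longdouble,
}

# Print how many bytes each type occupies on this machine
for name, ctype in c_types.items():
    print(f"{name:<12} {ctypes.sizeof(ctype)} bytes")
```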
These are the general data types. Next, let's look at the concepts of signed and unsigned. (There are also types such as unsigned char, which is used for working with raw bytes, but I will not cover them here.)
- signed int = the value can be negative or positive.
- unsigned int = the value can only be zero or positive; negative values cannot be represented.
You can also use signed and unsigned with other integer types such as long. With the same number of bits, an unsigned type can store positive numbers roughly 2 times larger than its signed counterpart, so you can sometimes use a smaller unsigned type instead of a larger type such as long and save memory.
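```cython
cdef signed int a = 32767      # INT_MAX is at least 32767 (typically 2147483647)
cdef unsigned int b = 65535    # UINT_MAX is at least 65535 (typically 4294967295)
cdef float c = 2.000000        # single precision
cdef double d = 2.0000000000   # double precision
```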
In Cython, write the cdef keyword before the data type when declaring a C variable.
Now let's create 2 loops: the first uses ordinary Python variables, and the second uses these C data types, so we can compare performance.
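The "roughly 2 times larger" claim follows from how the bits are used: an n-bit signed integer covers $-2^{n-1}$ to $2^{n-1}-1$, while an n-bit unsigned integer covers $0$ to $2^n-1$. The plain-Python sketch below computes these limits with a hypothetical helper `int_range`, and uses ctypes, which does no overflow checking, to show that a value that does not fit wraps around as it does in C.

```python
import ctypes

def int_range(bits, signed):
    # Smallest and largest value of an n-bit integer type
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

print(int_range(16, signed=True))    # (-32768, 32767)
print(int_range(16, signed=False))   # (0, 65535)
print(int_range(32, signed=False))   # (0, 4294967295)

# ctypes integers are not range-checked, so they wrap around like C integers
print(ctypes.c_uint16(-1).value)     # 65535: -1 does not fit in an unsigned type
print(ctypes.c_int16(32768).value)   # -32768: one past the signed maximum
```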
```python
import time

x = 0
y = 0
z = 0.0
t1 = time.time()
for k in range(100000000):
    y += 5
    x += 2
    z = x * y
t2 = time.time()
t = t2 - t1
print("%.10f" % t)
```
```cython
%%cython
import time

# Give the C variables explicit starting values
cdef unsigned int x = 0
cdef signed int y = 0
cdef float z = 0.0

t1 = time.time()
for i in range(100000000):
    y += 5
    x += 2
    z = x * y
t2 = time.time()
t = t2 - t1
print("%.10f" % t)
```
Python Code: 27.5845s
Cython Code: 4.74465s
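Timing module-level code once with time.time() is easy but noisy. A slightly more careful approach, sketched here in plain Python with made-up helper names `benchmark` and `python_loop`, wraps the loop in a function that returns its result and keeps the best of several runs measured with time.perf_counter(), a clock intended for this purpose. It uses a smaller loop so it finishes quickly.

```python
import time

def benchmark(func, *args, repeat=3):
    # Run func several times and keep the fastest wall-clock time
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args)
        best = min(best, time.perf_counter() - start)
    return result, best

def python_loop(n):
    # The Python loop from above, wrapped in a function that returns z
    x = 0
    y = 0
    z = 0
    for _ in range(n):
        y += 5
        x += 2
        z = x * y
    return z

result, seconds = benchmark(python_loop, 1_000_000)
print(f"result={result}, best of 3: {seconds:.4f}s")
```

Keep in mind that CPython accesses local variables inside a function faster than module-level globals, so this version of the Python loop is itself somewhat faster than the module-level one, and its times are not directly comparable to the figures above.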
Declaring the type of every variable here has a big impact on the speed of the code. Let's also declare the doubles holding the time values and the variables used by the loop.
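```cython
%%cython
import time

cdef unsigned int x = 0
cdef signed int y = 0
cdef float z = 0.0
cdef double t, t1, t2
cdef int i, r

r = 9999999  # note: about 10x fewer iterations than the 100000000 used above
t1 = time.time()
for i in range(r):  # with i and r typed as C ints, this becomes a plain C loop
    y += 5
    x += 2
    z = x * y
t2 = time.time()
t = t2 - t1
print("%.10f" % t)
# Time: 3.43s
```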
Using C data types can speed up your programs considerably. Now let's look at how to work more efficiently with data structures such as arrays.
Using Cython and NumPy Together
NumPy is one of the fastest and most widely used libraries for scientific computing in Python, and you can use this powerful tool together with Cython.
You can speed up NumPy array processing with the NumPy definitions that ship with Cython (loaded with cimport numpy) and the statically typed ndarray.
Below is the baseline Python code that we will speed up with Cython.
```python
import numpy as np
import time

arr = np.arange(1, 100000000)
total = 1
t1 = time.time()
for k in arr:
    total += k
t2 = time.time()
print(t2 - t1)
# Speed: 29.92s
```
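For a plain sum, NumPy already has a vectorized method, ndarray.sum(), that runs the loop in compiled code, so it makes a useful reference point before writing any Cython. A minimal sketch, with a made-up function name `numpy_total` and a smaller array so it runs quickly and fits comfortably in memory:

```python
import numpy as np

def numpy_total(n):
    # Same result as the loop above (total starts at 1, then adds 1 .. n-1),
    # but arr.sum() does the looping inside NumPy's compiled code.
    arr = np.arange(1, n, dtype=np.int64)
    return 1 + int(arr.sum())

print(numpy_total(10_000_000))
```

Cython becomes worthwhile when the per-element work is something NumPy does not offer as a single vectorized call.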
We will make this baseline code run about 1000 times faster than it does in Python. Let's optimize it step by step.
Use NumPy Array & C Data Types
The cimport numpy statement imports a Cython definition file named "numpy". This is needed because that file contains the data types for handling NumPy arrays.
We can speed up the baseline further by using C data types.
```cython
%%cython
import numpy as np
cimport numpy
import time

cdef numpy.ndarray arr
cdef unsigned long long total = 0  # 64 bits even on Windows, where long is 32 bits
cdef int n  # named n so the built-in max() is not shadowed
cdef int k
cdef double t1, t2

n = 100000000
arr = np.arange(n)
t1 = time.time()
for k in arr:
    total += k
t2 = time.time()
print(t2 - t1)
# Speed: 10.02s
```
It is about 3 times faster, but still not enough. We can gain a little more speed by adding a few more declarations when specifying the array.
NumPy Array with Function
We can save time by declaring the element type and the number of dimensions of the ndarray. To do this, we need to put the loop in a function.
```cython
%%cython
import time
import numpy as np
cimport numpy

# Element type of the array: a 64-bit integer on every platform
ctypedef numpy.int64_t type1

cdef int r

def do_calc(numpy.ndarray[type1, ndim=1] arr):
    cdef unsigned long long total = 0
    cdef type1 k
    cdef double t1, t2
    t1 = time.time()
    for k in arr:
        total = total + k
    t2 = time.time()
    print(t2 - t1)
    return total

r = 100000000
arr = np.arange(r, dtype=np.int64)  # must match type1 (np.int was removed in NumPy 1.24)
do_calc(arr)
# Pure Python: 29s
# Cython: 9.80s
```
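A buffer declaration such as `numpy.ndarray[..., ndim=1]` fixes two properties of the array at compile time: its element type and its number of dimensions. When the function is called, Cython checks the actual array against them and raises a ValueError (a buffer dtype mismatch) if they disagree, which is why the array is created with an explicit dtype. The plain NumPy snippet below shows the two properties in question.

```python
import numpy as np

arr = np.arange(10, dtype=np.int64)

# The element type (and its size in bytes) and the number of dimensions
print(arr.dtype, arr.dtype.itemsize)  # int64 8
print(arr.ndim)                       # 1

# Without an explicit dtype, np.arange uses a platform-dependent default
# integer type (int32 on Windows with NumPy before 2.0, int64 on most others).
default = np.arange(10)
print(default.dtype)
```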
That is a small improvement, only a slight increase in speed; with the technique we apply next, the code will run about 500 times faster.
Use Indexing Instead of Iterating in Cython
The main reason for the slowdown here is the loop. We are currently using a "for each" loop, which C does not have, so Cython runs it through Python's iteration machinery.
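You can see the per-element cost of a "for each" loop from plain Python: iterating over an ndarray produces a separate NumPy scalar object for every element. A minimal sketch:

```python
import numpy as np

arr = np.arange(5, dtype=np.int64)

# Each step of "for k in arr" builds a new NumPy scalar object
# for the current element instead of reading a raw C integer.
first = next(iter(arr))
print(type(first))                     # <class 'numpy.int64'>
print(isinstance(first, np.generic))   # True
```

In Cython, indexing a typed buffer with a C integer (arr[i]) instead reads the value directly from the array's memory and skips creating these objects.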
We can eliminate this overhead by letting the loop run in C; for this, we need to change the loop a little.
```cython
cdef Py_ssize_t i
cdef Py_ssize_t count = arr.shape[0]  # number of elements along the first axis
for i in range(count):
    total += arr[i]
```
Now the for loop is compiled to plain C, so it runs almost 500 times faster. If you are not familiar with the for-each construct, read this article.
Now let’s add this type of loop to the function we created earlier and compare it with the previous version.
```cython
%%cython
import time
import numpy as np
cimport numpy

ctypedef numpy.int64_t type1

cdef int r

def do_calc(numpy.ndarray[type1, ndim=1] arr):
    cdef unsigned long long total = 0
    cdef Py_ssize_t i
    cdef Py_ssize_t count = arr.shape[0]
    cdef double t1, t2
    t1 = time.time()
    for i in range(count):   # typed index: a plain C loop
        total += arr[i]      # typed buffer + C index: direct memory access
    t2 = time.time()
    print(t2 - t1)
    return total

r = 100000000
arr = np.arange(r, dtype=np.int64)
do_calc(arr)
# Pure Python: 29s
# Cython: 0.04s
```
A loop over 1 billion numbers took 288 seconds in Python, while it took 0.3s in Cython. A serious difference is starting to appear.
Disable Negative Indexes & Bounds Checking
To improve performance further, we can disable two features: bounds checking and negative (wraparound) indexing. Turning these two features off lets the loop run faster.
```cython
cimport cython

# Disable bounds checking
@cython.boundscheck(False)
# Disable negative (wraparound) indexing
@cython.wraparound(False)
```
If you add these two decorators directly above your function definition, loops can run about 100, sometimes 200 times faster.
Do not apply this section if you need negative indexing or want protection against invalid indices: with these checks off, an out-of-range index is not caught and silently reads or writes memory outside the array.
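Here is what the two checks protect you from, shown with ordinary NumPy indexing in Python, where both are always on. The flag name `out_of_range_caught` exists only for this example.

```python
import numpy as np

arr = np.arange(5, dtype=np.int64)  # [0 1 2 3 4]

# Wraparound: a negative index counts back from the end of the array
print(arr[-1])  # 4

# Bounds checking: an index past the end raises IndexError
out_of_range_caught = False
try:
    arr[5]
except IndexError as exc:
    out_of_range_caught = True
    print("IndexError:", exc)
```

Inside a function decorated with @cython.wraparound(False), arr[-1] is no longer turned into the last element, and with @cython.boundscheck(False), arr[5] is no longer caught; both access memory outside the array, which is undefined behavior rather than an error.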
```cython
%%cython
import time
import numpy as np
cimport numpy
cimport cython

ctypedef numpy.int64_t type1

cdef int r

@cython.boundscheck(False)  # no IndexError checks inside this function
@cython.wraparound(False)   # negative indices are not translated
def do_calc(numpy.ndarray[type1, ndim=1] arr):
    cdef unsigned long long total = 0
    cdef Py_ssize_t i
    cdef Py_ssize_t count = arr.shape[0]
    cdef double t1, t2
    t1 = time.time()
    for i in range(count):
        total += arr[i]
    t2 = time.time()
    print(t2 - t1)
    return total

r = 100000000
arr = np.arange(r, dtype=np.int64)
do_calc(arr)
# Pure Python: 29.01s
# Cython: 0.000001s
```
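We have significantly improved the performance, and I think you will now want to use NumPy arrays with Cython. Doing a 29-second task in close to 0 seconds is a great advantage. Be careful with such tiny timings, though: if the loop's result is never used, the C compiler is free to remove the loop entirely, so make sure the function returns or prints its result before trusting the number.
The speedup clearly outweighs what you give up with these two features, so I generally recommend disabling them when you don't need them.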