Cython Tutorial: Fast & Efficient Python

Python is a very popular programming language today (in 2021), but there is one fact that no one can deny: Python is inefficient in terms of speed.

However, even though Python is slow, we have solutions to speed it up, and in this article, we will examine one of them, Cython.

Why is Python Slow?

1 – Interpreted Language

The first and biggest reason Python is slow is that its standard implementation, CPython, is interpreted rather than compiled to machine code. It also does not use JIT (just-in-time) compilation by default.

Let’s explain JIT: a code block is compiled only when you start using it, and the code blocks you have not used yet remain uncompiled.

This is one of the reasons why Java and C# are faster than Python, but this method requires you to wait longer the first time you run the code.

2 – Dynamic Type Language

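In languages such as C++, Java, and C#, you have to specify the data type of the variables, but in Python, it is handled automatically in the background.

When you assign a value of a different type to a variable you created in Python, Python creates a new object in memory and binds the name to it; the old object is removed from memory once nothing refers to it. In addition, the interpreter has to check the types of the values at run time for every operation. This process degrades performance.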

The way Python tries to make everything dynamic has a bad effect on performance.
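As a small illustration of the dynamic typing described above, the following sketch rebinds one name to values of two different types and calls one function with two different operand types. The names value and add are chosen only for this example.

```python
# The same name can be bound to objects of different types.
value = 10
first_type = type(value).__name__

value = "ten"  # rebinding: the name now points to a new str object
second_type = type(value).__name__

def add(a, b):
    # The interpreter decides at run time what "+" means for these operands.
    return a + b

int_result = add(2, 3)        # integer addition
str_result = add("2", "3")    # string concatenation

print(first_type, second_type, int_result, str_result)
```

Because the meaning of a + b is only known once the operand types are inspected, the interpreter repeats that check every time the line runs.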

How to Boost Python?

There are several ways to speed up Python, such as using PyPy to add JIT to Python, or using Cython to bring C data types into Python.

Cython translates your Python code into C code; that is, it reduces the performance disadvantages of an interpreted language by turning the code into a compiled extension module.

You can feed Cython plain Python code directly, or you can increase the speed of your project further with some changes. Even if you leave the code the same, Cython can speed it up, in some cases by 2 or 3 times.

Installing Cython on Jupyter Notebook

Jupyter Notebook, which is one of the environments where you can use Cython comfortably, allows you to access Cython through the browser.

First, let’s install Jupyter Notebook and run it through the terminal. Alternatively, you can add the Jupyter extension to VS Code and open a file with the .ipynb extension.

$ pip install notebook
$ jupyter notebook

When Jupyter Notebook has opened in the browser and the connection with the server is complete, create your first code block and install Cython.

%pip install cython

Let’s load Cython with the help of IPython. We can use a magic command for this; then, whenever we need it, we can access it by typing its name.

# Load the Cython extension (once per notebook)
%load_ext Cython

# Use it (must be the first line of the cell)
%%cython

Every block that needs Cython must start with %%cython. You don’t need to load the extension again; the %%cython expression alone is sufficient.

Install Cython on Local Machine

If you don’t want to use Jupyter Notebook or Google Colab, you can get your own machine ready for Cython with a few installations.

$ pip install cython

After that, you need a C/C++ compiler such as MinGW. You can quickly download it from here and complete its installation from here (these steps are for Windows users).

After finishing the installations, let’s see how to compile a file. First, we need to create a file with the .pyx extension; this is the file where we will write the code.

def example(x):

  sum = 0

  for i in x:
    sum += i

  return sum

Now, to translate this file, we need to create a Python file named setup.py that references it.

from setuptools import setup
from Cython.Build import cythonize

# distutils was removed in Python 3.12; setuptools provides the same setup()
setup(ext_modules=cythonize("file.pyx"))

Finally, we will use this .pyx file to create a C file and a compiled file from the terminal. For this process, use the command below.

$ python setup.py build_ext --inplace
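Once the build finishes, the compiled module can be imported like any other Python module. The sketch below assumes the module is named file (from file.pyx) and can be imported from the current directory. The fallback function pure_python_example is a hypothetical helper added here so that the snippet still runs when the extension has not been built.

```python
import importlib

def pure_python_example(x):
    # Same logic as example() in file.pyx, kept as a plain-Python fallback.
    total = 0
    for i in x:
        total += i
    return total

try:
    # Assumption: the extension built from file.pyx is importable as "file".
    compiled = importlib.import_module("file")
    example = compiled.example
except (ImportError, AttributeError):
    # The extension is not available, so use the pure-Python version.
    example = pure_python_example

result = example([1, 2, 3])
print(result)
```

Keeping a pure-Python fallback like this is one possible design choice; it lets the same script run on machines where no C compiler is installed.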

Using C Data Types in Python

We fixed the problems that arise from Python being an interpreted language. Now let’s remove the problems that arise from it being dynamic.

When creating a variable in C, we specify the data type, so for a variable that will contain integers, we must write int before its name.

When data types are specified in advance, the compiler knows them before the program runs, so the program runs faster: it does not have to work out the types in memory while the program is running, as Python does.

int = for integers
long int = for long integers

float = for decimal numbers (single precision)
double = for decimal numbers with more precision (double precision)

char = for 1 character
char* = for text (C has no built-in string type)

bint = for logical values (Cython's boolean type)

long double = for larger doubles

These are the general data types. Let’s take a look at the concepts of signed and unsigned. There are also types such as unsigned char, which are used for byte calculations, but I will not cover them here.

signed int = the given value can be both negative and positive.
unsigned int = the given value can only be zero or positive; negative values are invalid.

You can also use the concepts of signed and unsigned with data types such as long. By using unsigned, we can store positive numbers about 2 times larger than with signed, without taking up the extra memory that the long data type would.

cdef signed int a = 32767
cdef unsigned int b = 65535

cdef float c = 2.000000
cdef double d = 2.0000000000

You should use the cdef statement just before specifying the data type in Cython.
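To see the signed and unsigned ranges on your own machine, the following sketch uses Python's standard ctypes module, which exposes the C int types. It is only an illustration in plain Python, not Cython code, and the variable names are chosen for this example.

```python
import ctypes

# Width of a C int on this platform, in bits.
bits = 8 * ctypes.sizeof(ctypes.c_int)

signed_max = 2 ** (bits - 1) - 1
signed_min = -(2 ** (bits - 1))
unsigned_max = 2 ** bits - 1

# ctypes does no overflow checking, so out-of-range values wrap around,
# just as they would in a C variable of the same type.
wrapped_unsigned = ctypes.c_uint(-1).value
wrapped_signed = ctypes.c_int(signed_max + 1).value

print(bits, signed_min, signed_max, unsigned_max)
print(wrapped_unsigned, wrapped_signed)
```

The unsigned maximum is roughly twice the signed maximum because the bit that would otherwise hold the sign is used for the value.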

Now let’s create 2 loops so we can measure performance: the first loop uses ordinary Python data types, and the second loop uses these C data types.

import time

x = 0
y = 0 
z = 0.0

t1 = time.time()

for k in range(99999999):
    y += 5
    x += 2

    z = x * y

t2 = time.time()
t = t2-t1
print("%.10f" % t)

%%cython

import time

cdef unsigned int x;
cdef signed int y;
cdef float z;

t1 = time.time()

for i in range(100000000):
  y += 5
  x += 2

  z = x * y


t2 = time.time()
t = t2-t1

print("%.10f" % t)

Python Code: 27.5845s
Cython Code: 4.74465s
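The timings above come from single runs measured with time.time(). As a hedged sketch, the same loop body can be wrapped in a function and timed with time.perf_counter(), a clock intended for measuring short durations. The helper name time_loop and the much smaller iteration count are assumptions made for illustration and are not part of the original benchmark.

```python
import time

def time_loop(iterations):
    """Run the loop body from the benchmark and return (elapsed, x, y, z)."""
    x = 0
    y = 0
    z = 0.0
    start = time.perf_counter()
    for _ in range(iterations):
        y += 5
        x += 2
        z = x * y
    elapsed = time.perf_counter() - start
    return elapsed, x, y, z

# Hypothetical, much smaller iteration count so the sketch finishes quickly.
elapsed, x, y, z = time_loop(100_000)
print("%.10f" % elapsed)
```

Returning the computed values along with the elapsed time makes it easy to confirm that two versions of the loop do the same work before comparing their speed.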

Specifying the type of every variable in the Cython version has a great impact on the speed of the code. Let’s also declare the doubles holding the time values and the variables we use in the loop.

%%cython

import time

cdef unsigned int x;
cdef signed int y;
cdef float z;
cdef double t,t1,t2;
cdef int r;

r = 9999999

t1 = time.time()

for i in range(r):
  y += 5
  x += 2

  z = x * y


t2 = time.time()
t = t2-t1

print("%.10f" % t)

# Time: 3.43s

Using C data types can speed up your programs considerably. Now let’s look at how we can work more efficiently with structures such as arrays.

Using Cython and NumPy Together

NumPy is one of the fastest and most widely used scientific libraries in Python. You can use NumPy, one of Python’s most powerful tools, with Cython.

You can speed up NumPy array processing with the NumPy definitions that ship with Cython and the statically typed ndarray.

Below is the baseline Python code whose NumPy array processing we will speed up with Cython.

import numpy as np
import time 

arr = np.arange(1,100000000)
total = 1

t1 = time.time()
for k in arr:
    total += k
t2 = time.time()

print(t2 - t1)

# Speed: 29.92s

We will make this baseline code run about 1000 times faster than it runs in Python. Now let’s optimize this code step by step.

Use NumPy Array & C Data Types

The cimport numpy statement imports a definition file in Cython named “numpy”. This is done because the Cython “numpy” file has the data types for handling NumPy arrays.

We can further speed up this main code by using data types in C.

%%cython 
import numpy as np
cimport numpy
import time 

cdef numpy.ndarray arr;
cdef unsigned long total;
cdef int max;
cdef int k;
cdef double t1,t2;

max = 100000000
arr = np.arange(max)

t1 = time.time()
for k in arr:
    total += k
t2 = time.time()

print(t2 - t1)

# Speed: 10.02s

It is about 3 times faster, but it is still not enough. We can try to gain more speed with a few additional details when declaring the array.

NumPy Array with Function

We can save time by specifying the data type of the elements in the ndarray and the number of dimensions of the array. Cython only allows this kind of array declaration inside a function, so we need to create a function for this.

%%cython

import time
import numpy as np
cimport numpy

ctypedef numpy.int_t type1
cdef int r;

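def do_calc(numpy.ndarray[type1, ndim=1] arr):

    # Initialise the accumulator: a C local is not set to 0 automatically
    cdef unsigned long total = 0
    cdef int k
    cdef double t1, t2
    
    t1 = time.time()

    for k in arr:
        total = total + k
    
    t2 = time.time()
    print(t2 - t1)

r = 100000000
# np.int was removed from NumPy; np.int_ matches numpy.int_t
arr = np.arange(r, dtype=np.int_)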
do_calc(arr)

# Pure Python: 29s
# Cython: 9.80s

We made a small improvement. Although the increase in speed here is slight, we will process about 500 times faster with the technique we will apply soon.

Use Indexing, Don’t Use Iterating Over in Cython

The main reason for the slowdown here is the loop. The type of loop we currently use is “for each”, and C doesn’t have it, so the iteration still goes through Python.

We can eliminate the time lost to Python by running the loop in C. For this, we need to change the loop a little.

cdef int count = arr.shape[0]
for i in range(count):
  total += arr[i]

Now the for loop is compiled to C, so the loop runs almost 500 times faster. If you don’t know the for each structure, read this article.
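For readers unfamiliar with the two loop shapes, the sketch below writes both in plain Python over a small array.array of C longs and checks that they produce the same sum. It only illustrates the difference in form; the speed-up described in this section appears only when Cython compiles the indexed version with typed variables. The variable names are chosen for this example.

```python
from array import array

arr = array("l", range(1, 11))  # ten C longs: 1..10

# "For each" style: iterate over the elements directly.
total_foreach = 0
for value in arr:
    total_foreach += value

# Indexed style: loop over positions and read each element by index.
count = len(arr)
total_indexed = 0
for i in range(count):
    total_indexed += arr[i]

print(total_foreach, total_indexed)
```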

Now let’s add this type of loop to the function we created earlier and compare it with the previous version.

%%cython

import time
import numpy as np
cimport numpy

ctypedef numpy.int_t type1
cdef int r;

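def do_calc(numpy.ndarray[type1, ndim=1] arr):

    # Initialise the accumulator: a C local is not set to 0 automatically
    cdef unsigned long total = 0
    cdef int i
    cdef double t1, t2
    cdef int count = arr.shape[0]

    t1 = time.time()

    for i in range(count):
        total += arr[i]
    
    t2 = time.time()
    print(t2 - t1)

r = 100000000
# np.int was removed from NumPy; np.int_ matches numpy.int_t
arr = np.arange(r, dtype=np.int_)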
do_calc(arr)

# Pure Python: 29s
# Cython: 0.04s

A loop over 1 billion numbers took 288 seconds in Python, while it took 0.3s in Cython. A serious difference has started to appear.

Disable Negative Indexes & Bound Checking

To improve performance, we can disable two features: bounds (index) checking and negative indexes. You can process faster if you disable these two features.

# Disable bounds checking
@cython.boundscheck(False)

# Disable negative indexes
@cython.wraparound(False)

If you add these two short decorators directly above your function (after cimport cython), you can run loops about 100, sometimes 200 times faster.

Do not apply this section if you want to use negative indexing or if you want to protect the array from invalid index numbers.
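To make clear what is being given up, this plain-Python sketch shows the two safety features at work on an ordinary list: a negative index wraps around to the end, and an out-of-range index raises IndexError. With both Cython directives set to False, typed array access no longer performs either step, so an invalid index can read or write the wrong memory instead of raising an error. The variable names are chosen for this example.

```python
values = [10, 20, 30]

# Negative indexing: -1 wraps around to the last element.
last = values[-1]

# Bounds checking: index 3 is past the end, so Python raises IndexError.
try:
    values[3]
    out_of_range_caught = False
except IndexError:
    out_of_range_caught = True

print(last, out_of_range_caught)
```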

%%cython

import time
import numpy as np
cimport numpy
cimport cython

ctypedef numpy.int_t type1
cdef int r;

@cython.boundscheck(False)
@cython.wraparound(False)
def do_calc(numpy.ndarray[type1, ndim=1] arr):

    # Initialise the accumulator: a C local is not set to 0 automatically
    cdef unsigned long total = 0
    cdef int i
    cdef double t1, t2
    cdef int count = arr.shape[0]

    t1 = time.time()

    for i in range(count):
        total += arr[i]
    
    t2 = time.time()
    print(t2 - t1)

r = 100000000
# np.int was removed from NumPy; np.int_ matches numpy.int_t
arr = np.arange(r, dtype=np.int_)
do_calc(arr)

# Pure Python: 29.01s
# Cython: 0.000001s

We have significantly improved the performance and speed, and I think you will now use NumPy arrays with Cython. It is a great advantage to perform a 29-second task in a time close to 0 seconds.

The performance gain definitely outweighs the benefit of these two features, so I mostly recommend disabling them if you don’t need them.