Use Rcpp For More Performance In R

R is often regarded as slow, and one of its biggest rivals, Python, often looks better than R in terms of performance.

In this article, we will see how to write higher-performance R code with Rcpp.

Why Is R Slow?

R has performance limitations that come from its design and implementation, but these are not the only obstacle.

Most R users do not have any formal training in programming and software development, so they have trouble optimizing their code.

Performance problems are inevitable in unoptimized code or poorly optimized projects.

How Much Can Rcpp Speed Up Your Project?

With just a little C++ knowledge, you can speed up your code with Rcpp by up to 7x, and in some cases 20x. This is very useful in areas where speed is required.

Because of this performance difference, many CRAN packages take advantage of the power of Rcpp or are built entirely on Rcpp.

Where Rcpp Can Be Useful for You

  • Loop operations where the next iterations depend on the previous iterations
  • Access to each element of a vector/matrix/array etc.
  • Repetitive function calls in loops
  • Powering machine learning and statistical computing
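
As a rough illustration of the first bullet, here is a minimal sketch of a loop in which every step needs the result of the previous one. It is plain C++ and uses std::vector as a stand-in for an Rcpp vector so that it compiles without R. The function name ema and the smoothing parameter alpha are assumptions made for this example, not something taken from Rcpp.

```cpp
#include <cstddef>
#include <vector>

// Exponential moving average: out[i] depends on out[i - 1],
// which makes it awkward to write as a plain vectorized R expression.
// `ema` and `alpha` are hypothetical names used only for this sketch.
std::vector<double> ema(const std::vector<double>& x, double alpha)
{
  std::vector<double> out(x.size());
  if (x.empty()) return out;

  out[0] = x[0];
  for (std::size_t i = 1; i < x.size(); i++)
  {
    // Each new value mixes the current input with the previous output.
    out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1];
  }
  return out;
}
```

In an Rcpp file, the same loop would most likely take and return a NumericVector instead of std::vector<double>.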

Installing and Preparing Rcpp

Rcpp installation may seem confusing at first, but in this section I will walk through it step by step in the simplest form.

1 – Install Radian

Radian is an open-source console for R that is available on GitHub. You can install it with the command below.

pip install radian

When the installation is complete, just open the console and type radian, and a terminal will open where you can use your R commands.

2 – Install R Tools

Rtools provides tools such as MinGW for compiling your C++ code, but it is only available for Windows. If you are on Linux or macOS, use the commands below to install a compiler instead.

# Debian/Ubuntu
sudo apt install gcc build-essential

# macOS
xcode-select --install

After the compiler is installed, we can start installing Rcpp. Windows users should add Rtools to PATH. For more information read this article.

3 – Install Rcpp on R

In this section, we will install the Rcpp library via Radian and integrate it into the project.

r$> install.packages("Rcpp")
r$> library(Rcpp)

High Performance Functions via Rcpp

In this section, we will see how to write high-performance functions with Rcpp and the data structures we use frequently.

1 – C++ Functions

C++ functions are similar to R functions, with a few differences and rules, which are listed below.

  • Every statement in C++ must end with “;”. If you leave it out, you will get an error.
  • You must specify the return type of the function you create, as well as the data type of each parameter.
  • Functions with a return type other than void must return a value (of the function's return type).
  • A function is not created by assignment as in R. Once the function is defined, you can assign its result to a variable.
  • Use “=” for assignment instead of “<-”. In C++, x <- 5 is read as the comparison x < -5, so it will not assign anything.

Now, with the help of this information, let’s create our first C++ function.

int add(int x , int y)
{
  return x + y;
}

If you want to run this function in plain C++, you must also create a main function. Running the function from R is covered in the next section.

int add(int x , int y)
{
  return x + y;
}

int main()
{
  // Variables must be declared with a type before use
  int x = add(2,4);
  return 0;
}

Below you can find the data types we use frequently, so you can use them after you decide what value your functions will return.

char = single character
int = integer
float = single precision floating-point number
double = double precision floating-point number
void = no value
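
The choice of type changes the result, which is easy to miss when coming from R. The sketch below computes the same average twice, once as int and once as double. The function names mean_int and mean_double are made up for this illustration.

```cpp
// Average of two integers, computed two ways.
// `mean_int` and `mean_double` are hypothetical names for this sketch.
int mean_int(int a, int b)
{
  return (a + b) / 2;      // integer division: the fractional part is dropped
}

double mean_double(int a, int b)
{
  return (a + b) / 2.0;    // 2.0 is a double, so the fraction is kept
}
```

For the inputs 1 and 2, mean_int returns 1 while mean_double returns 1.5, so a function meant to return a fractional number should be declared as double.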

sourceCpp & cppFunction

With Rcpp we can compile a C++ function directly from an R file. For this we use cppFunction and pass the whole function definition as a string argument.

# Import Library
library("Rcpp")

# Add Function
cppFunction("

  int add(int x , int y)
  {
    return x + y;
  }

")

When cppFunction is finished, we can call the function as if it had been written in R, with no further setup on the R side.

r$> add(12 , 2)

While cppFunction is suitable for small examples or functions, you will often prefer to work in a separate C++ file for your large projects.

We use sourceCpp provided by Rcpp to compile the C++ file. Compiled functions can be used within the R file.

// Import Rcpp
#include <Rcpp.h>

// Use Rcpp names without the Rcpp:: prefix
using namespace Rcpp;

// Export the function to R (the attribute goes directly above the function)
// [[Rcpp::export]]
int add(int x, int y)
{
  int z = x + y;
  return z;
}

Let’s summarize each section in the code so that people unfamiliar with C++ can easily understand it.

  • #include works like library() in R. It is used to add libraries to a C++ file.
  • C++ groups names into containers called namespaces. Instead of writing the Rcpp:: prefix every time we use something from Rcpp, “using namespace Rcpp;” makes those names available directly.
  • The // [[Rcpp::export]] comment exports the function to R. Place it directly before each function you want to call from R.
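
For readers who have not met namespaces before, the following plain C++ sketch shows what the using-directive changes. The namespace mathtools and the function square are invented for this example; in a real Rcpp file the namespace would be Rcpp.

```cpp
// A namespace groups related names under a common prefix.
// `mathtools` and `square` are hypothetical names used only for this sketch.
namespace mathtools
{
  double square(double x)
  {
    return x * x;
  }
}

// Without a using-directive, the prefix is required.
double with_prefix(double x)
{
  return mathtools::square(x);
}

// After `using namespace`, later code in the file can drop the prefix.
using namespace mathtools;

double without_prefix(double x)
{
  return square(x);
}
```

Both functions return the same value. The directive only saves typing, which is why it appears at the top of most Rcpp examples.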

Now let’s compile this C++ file in R and use the function inside it from R. We will use sourceCpp for this.

# Import Rcpp
library(Rcpp)

# Compile File
sourceCpp("File.cpp")

x <- add(12 , 8)
print(x)

Vectors, Matrices, Lists, DataFrames & Loops

In this section we will look at iterable data structures: first Vectors, Matrices, and Lists, and then DataFrames.

We will also see Loops used in C++ to access these iterable data structures. Thus, we will spend less time when we want to retrieve data from iterable data structures.

Vectors & Loops

Below you can find the Vector types available in Rcpp. We will create Vectors with these keywords.

NumericVector = Real
IntegerVector = Integer
LogicalVector = Logical
ComplexVector = Complex
StringVector = String

Although you do not have to choose between these vector variants in R, you must specify one in Rcpp, and a vector cannot hold data of any type other than the one you specified.

// Vectors of a given length, filled with default values (0 or "")
NumericVector v1 (10);
StringVector v2 (5);
IntegerVector v3 (20);

// Vectors created from a list of values
NumericVector v4 {-1,2,3};
StringVector v5 {"a","b"};
IntegerVector v6 {1,2,3};

Of course, one of the most important features of Vectors is that we can access the data they store. In this part we will look at how to access that data with a loop in Rcpp.

for(datatype name = start; condition; name++)
{
  // Body
}

The last part of the loop header does not have to be an increment; you can also count down or use a different step. If you are familiar with Java, the for loop in C++ is written the same way.
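
To show a loop that does not increment, here is a small sketch that walks a vector from the last element to the first. It uses std::vector from the C++ standard library rather than an Rcpp type, and the function name reversed is only an assumption for this example.

```cpp
#include <vector>

// Copy a vector in reverse order using a loop that counts down.
// `reversed` is a hypothetical name used only for this sketch.
std::vector<double> reversed(const std::vector<double>& v)
{
  std::vector<double> out;
  out.reserve(v.size());

  // C++ indices start at 0, so the last element is at size - 1.
  for (int i = static_cast<int>(v.size()) - 1; i >= 0; i--)
  {
    out.push_back(v[i]);
  }
  return out;
}
```

Note that indexing starts at 0 in C++, unlike in R where it starts at 1.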

// Using Rcpp Library
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector show()
{
  NumericVector v1 (10);

  for(int i = 0; i < 10; i++)
  {
     v1[i] = i;
  }

  return v1;
}

library(Rcpp)

sourceCpp("test.cpp")

print(show())

[1] 0 1 2 3 4 5 6 7 8 9

Congratulations, you can now carry over the Vector operations you do in R to Rcpp. Rcpp vectors also offer member functions that improve productivity further.

  • length() = Returns the number of elements of this vector.
  • names() = Returns the element names of this vector.
  • sort() = Sorts this vector in ascending order and returns it.
  • push_back(x) = Appends a scalar value x to the end of this vector.
  • push_front(x) = Prepends a scalar value x to the front of this vector.
  • erase(i) = Erases the element at numerical index i and returns an iterator pointing to the element just after the erased one.
  • erase(first,last) = Erases the elements from the position pointed to by numerical index first to last.
  • get_na() = Returns the NA value of this Vector class.
  • is_na(x) = Returns true if x is the NA value of this Vector class.

Matrices & Loops

Matrices are very similar to Vectors and are created in almost the same way; you just have to specify the number of rows and columns.

// 3x3 Matrix
NumericMatrix m1(3);

// 2x5 Matrix
NumericMatrix m2(2,5);

// 2x5 Matrix filled from v, a NumericVector holding 10 values
NumericMatrix m3(2,5,v.begin());

We can convert a vector to a matrix, or we can give a vector two dimensions and use it like a matrix without changing its type.

// Using Rcpp Library
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericMatrix show()
{
   NumericVector v {1,2,3,4};
   NumericMatrix v1 (2,2,v.begin());

   return v1;
}

The begin() function returns an iterator pointing to the first element of a vector. After specifying the number of rows and columns, you can fill the matrix by passing this iterator as the third parameter.

// Using Rcpp Library
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector show()
{
   NumericVector v1 {1,2,3,4};
   v1.attr("dim") = Dimension(2, 2);

   return v1;
}

With this method, you can create a matrix by making your vectors 2D, but the data type will still remain vector.

// Using Rcpp Library
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
int show()
{
   NumericVector v = {1,2,3,4};
   NumericMatrix m (2,2,v.begin());

   int x = m(0 , 1);
   return x;
}

Output: [1] 3

When indexing a matrix, you must give both the row index and the column index, unlike a vector, which is indexed linearly. The matrix is filled column by column, which is why m(0 , 1) returns 3 here.
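
The result 3 in the example above may look surprising until you recall that R stores matrices column by column. The sketch below spells out that layout in plain C++ with std::vector. The helper name at_column_major is an assumption for this illustration and is not part of Rcpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element (i, j) of an nrow x ncol matrix stored column by column,
// which is the layout R (and therefore Rcpp) uses.
// `at_column_major` is a hypothetical helper for this sketch.
double at_column_major(const std::vector<double>& data,
                       std::size_t nrow, std::size_t ncol,
                       std::size_t i, std::size_t j)
{
  assert(data.size() == nrow * ncol);
  assert(i < nrow && j < ncol);
  return data[i + j * nrow];   // skip j full columns, then move down i rows
}
```

With the data 1, 2, 3, 4 in a 2x2 matrix, the first column holds 1 and 2 and the second holds 3 and 4, so row 0, column 1 is 3.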

// Using Rcpp Library
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector show()
{
 NumericVector v1 (100);

 for(int i = 0; i < 100; i++)
 {
   v1[i] = i;
 }

 NumericMatrix m (5,20,v1.begin());
 NumericVector m2 = m(_,0);

 return m2;
}

[1] 0 1 2 3 4

You can get all data in a column or row using “_”. Now let’s look at the functions you can use with matrices.

  • nrow() = Returns the number of rows.
  • ncol() = Returns the number of columns.
  • rownames(m) = Gets or sets the row names of matrix m.
  • colnames(m) = Gets or sets the column names of matrix m.

Lists & Loops

To create a List object we use the List::create() function. To name the elements when creating a List, use the Named() function or _[].

List l1 = List::create(v1, v2);
List l2 = List::create(Named("name1") = v1, _["name2"] = v2);

You can access the elements of a List by index number, or by name if you named them. You can fill a List in a for loop, or supply all the data when you create it.

NumericVector x1 = l1[0];
NumericVector x2 = l2["name2"];

A List is itself a kind of vector, so it has the same member functions as a vector; there are no new function names or rules to learn.

DataFrames & Loops

Creating a DataFrame is very similar to creating a List; just call the create function of the DataFrame class.

DataFrame df1 = DataFrame::create(v1, v2);
DataFrame df2 = DataFrame::create(_["name1"] = v1 , Named("name2") = v2);

DataFrame::create() does not duplicate the values of the original Vector. Instead, the columns are “references” to the original Vector.

So if you change the values of the referenced vector, the data in the column will also change.

If you want to avoid this (we usually do), wrap the Vector in the clone() function when creating the DataFrame column so that its values are duplicated.
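
The difference between a reference and a clone can be pictured with a rough analogy in plain C++. This is not how Rcpp is implemented; it is only a sketch that uses std::shared_ptr to mimic two handles pointing at the same data. The names Column, share, and deep_copy are assumptions for this example.

```cpp
#include <memory>
#include <vector>

// A shared handle to a column of numbers; a rough stand-in for the way
// an Rcpp vector refers to memory owned by R. `Column`, `share` and
// `deep_copy` are hypothetical names used only for this sketch.
using Column = std::shared_ptr<std::vector<double>>;

// Returns a second handle to the SAME storage (no data is copied).
Column share(const Column& c)
{
  return c;
}

// Returns a handle to NEW storage holding a copy of the data,
// which is the role clone() plays in Rcpp.
Column deep_copy(const Column& c)
{
  return std::make_shared<std::vector<double>>(*c);
}
```

A change made through the original handle is visible through the shared one but not through the deep copy, which is the behavior you get from clone().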

// Using Rcpp Library
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
DataFrame show()
{
  NumericVector v = {1,2,3,4};
  DataFrame df = DataFrame::create(Named("V1") = v,
                                   Named("V2") = v);
    
  v = v * 2;
  return df;
}

  V1 V2
1  2  2
2  4  4
3  6  6
4  8  8

As you can see, every change in the vector is reflected in the DataFrame. To prevent this, you must wrap the vector in the clone function when creating the column, for example Named("V2") = clone(v).

NumericVector v1 = df[0];
NumericVector v2 = df["V2"];

You can access the columns of the DataFrame this way. If you want to access a row, you will need to index into the column vectors.

  • size() = Returns the number of columns.
  • fill(x) = Fills all the columns of this DataFrame with Vector x.
  • push_back(x) = Appends a vector x to the end of this DataFrame.
  • push_front(x) = Prepends a vector x to the front of this DataFrame.
  • erase(i) = Deletes the i’th column of this DataFrame and returns an iterator to the column just after the erased column.
  • erase(first,last) = Erases the columns from the position pointed to by numerical index first to last.

Using R Functions in Rcpp

You can call familiar R functions from Rcpp, which can greatly increase your productivity. Keep in mind that each such call goes back through the R interpreter, so it runs no faster than it would in R.

// Using Rcpp Library
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
NumericVector nums(int x)
{
    // calling seq()
    Function f("seq");   

    return f(0 , x);
}

library(Rcpp)

sourceCpp("test.cpp")

print(nums(10))

 [1]  0  1  2  3  4  5  6  7  8  9 10

NA, NaN, NULL in Rcpp

Empty and missing values are an integral part of data science. You can handle null values and missing data in Rcpp as well.

Inf = R_PosInf
-Inf = R_NegInf
NaN = R_NaN

To express the values Inf, -Inf, and NaN in Rcpp, use the symbols R_PosInf, R_NegInf, and R_NaN.

NA_REAL for NumericVector
NA_INTEGER for IntegerVector
NA_LOGICAL for LogicalVector
NA_STRING for CharacterVector

On the other hand, for NA, different NA symbols are defined for each Vector type. You can see these symbols above.

Detecting Null Values

You can use the is_na(), is_nan(), and is_infinite() functions to check for values such as NA, NaN, Inf, and -Inf. They work much like their R counterparts. Note that is_na() also returns TRUE for NaN.

// Using Rcpp Library
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
LogicalVector f()
{
  NumericVector v {1,NA_REAL,R_NaN,4,5};
  LogicalVector l1 = is_na(v);

  return l1;
}

// Output: FALSE  TRUE  TRUE FALSE FALSE

// Using Rcpp Library
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
LogicalVector f()
{
  NumericVector v {1,R_NaN,R_NaN,4,5};
  LogicalVector l1 = is_nan(v);
  
  return l1;
}

// Output: FALSE  TRUE  TRUE FALSE FALSE

// Using Rcpp Library
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
LogicalVector f()
{
  NumericVector v {1,R_PosInf,R_NegInf,4,5};
  LogicalVector l1 = is_infinite(v);
  
  return l1;
}

// Output: FALSE  TRUE  TRUE FALSE FALSE
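
Once missing values have been detected, a common next step is to replace them. The following is a minimal sketch of that idea in plain C++, using std::isnan on a std::vector instead of Rcpp's is_na on a NumericVector so that it compiles without R. The function name replace_nan and the parameter fill are assumptions made for this example.

```cpp
#include <cmath>
#include <vector>

// Replace every NaN in `v` with `fill`.
// R's NA_real_ is itself a NaN with a special payload, so std::isnan
// also flags it; infinities are left untouched.
// `replace_nan` is a hypothetical name used only for this sketch.
std::vector<double> replace_nan(std::vector<double> v, double fill)
{
  for (double& x : v)
  {
    if (std::isnan(x)) x = fill;
  }
  return v;
}
```

The vector is taken by value, so the caller's data is left unchanged and a cleaned copy is returned.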

By performing operations on vectors, you can delete or update missing values. You are now ready to speed up your R project with Rcpp!
