Introduction to luajr

luajr allows you to run Lua code from R.

Lua is a lightweight, simple, and fast scripting language that is used in a variety of settings. The standard Lua interpreter is already reasonably fast, but there is also a just-in-time compiler for Lua called LuaJIT that is even faster. luajr uses LuaJIT.

This is not a guide to Lua or LuaJIT; it is a quick-start guide to luajr for people who already know how to program in Lua. See the Lua web site for resources related to coding in Lua.

Running Lua code: lua() and lua_shell()

library(luajr)

To get a feel for luajr or to run “one-off” Lua code from your R project, use lua() and lua_shell().

When you pass a character string to lua(), it is run as Lua code:

lua("return 'Hello ' .. 'world!'")
#> [1] "Hello world!"

Assignments to global variables will persist between calls to lua():

lua("my_animal = 'walrus'")
lua("return my_animal")
#> [1] "walrus"

This is because luajr maintains a “default Lua state” which holds all global variables. This default Lua state is opened the first time a package function is used. You can create your own, separate Lua states, or reset the default Lua state (see Lua states, below).

Assignments to local variables will not persist between calls to lua():

lua("local my_animal = 'donkey'")
lua("return my_animal")
#> [1] "walrus"

In this case, the second line returns "walrus" because the local variable my_animal goes out of scope after the first call to lua() ends, so the second call to lua() is referring back to the global variable my_animal from before.

You can include more than one statement in the code run by lua():

lua("local my_veg = 'potato'; local my_dish = my_veg .. ' pie'; return my_dish")
#> [1] "potato pie"

You can also use the filename argument to lua() to load and run a Lua source file, instead of running the contents of a string.

Call lua_shell() to open an interactive Lua shell at the R prompt. This can can be helpful for debugging or for testing Lua statements.

Calling Lua functions from R: lua_func()

The key piece of functionality for luajr is probably lua_func(). This allows you to call Lua functions from R.

The first argument to lua_func(), func, is a string that should evaluate to a Lua function. lua_func() then returns an R function that can be used to call that Lua function from R. For example, you can use lua_func() to access an existing Lua function from R:

luatype = lua_func("type")
luatype(TRUE)
#> [1] "boolean"

Here, "type" is just referring to the built-in Lua function type which returns a string describing the Lua type of the value passed to it. You can also use lua_func() to refer to a previously defined function in the default Lua state:

lua("function squared(x) return x^2 end")
lua("return squared(4)")
#> [1] 16

sq = lua_func("squared")
sq(8)
#> [1] 64

Or you can use lua_func() to define an anonymous Lua function:

timestwo = lua_func("function(x) return x*2 end")
timestwo(123)
#> [1] 246

Under the hood, lua_func() just takes its first parameter (a string), adds "return " to the front of it, executes it as Lua code, and registers the result as the function.

The second argument to lua_func(), argcode, is also very important. argcode determines how the arguments passed to the function from R are translated into Lua values for use inside the function.

The permissible arg codes are:

  • 's': simplest Lua type (the default)
  • 'a': array type
  • '1': same as 's', but require that the argument has length 1
  • '9': same as 's', but require that the argument has length 9
  • 'v': pass by value
  • 'r': pass by reference

The kinds of R values that can be passed to Lua functions, and their behaviour under different arg codes, is summarized in the following table:

R type Example R value arg code ‘s’ ‘a’ ‘1’ ‘2’ ‘v’ ‘r’
NULL NULL nil nil nil nil nil nil
logical(1) TRUE true {true} true error luajr.logical({true}) luajr.logical_r({true})
integer(1) 1L 1 {1} 1 error luajr.integer({1}) luajr.integer_r({1})
numeric(1) 3.14159 3.14159 {3.14159} 3.14149 error luajr.numeric({3.14159}) luajr.numeric_r({3.14159})
character(1) "howdy" "howdy" {"howdy"} "howdy" error luajr.character({"howdy"}) luajr.character_r({"howdy"})
logical(nn) c(TRUE, FALSE) {true, false} {true, false} error {true, false} luajr.logical({true, false}) luajr.logical_r({true, false})
integer(nn) 1:2 {1, 2} {1.0, 2.0} error {1.0, 2.0} luajr.integer({1, 2}) luajr.integer_r({1, 2})
numeric(nn) exp(0:1) {1, 2.71828...} {1, 2.71828...} error {1, 2.71828...} luajr.numeric({1, 2.71828...}) luajr.numeric_r({1, 2.71828...})
character(nn) letters[1:2] {"a", "b"} {"a", "b"} error {"a", "b"} luajr.character({"a", "b"}) luajr.character_r({"a", "b"})
list(...) list(1, b='b') {[1]=1, b='b'} error error error x = luajr.list(); x[1] = luajr.numeric({1});
x.b = luajr.character({'b'}); return x
x = luajr.list(); x[1] = luajr.numeric_r({1});
x.b = luajr.character_r({'b'}); return x
external pointer lua_open() userdata:… userdata:… userdata:… userdata:… userdata:… userdata:…

Above, nn stands for an integer that is greater than 1; in the examples, it stands specifically for 2.

There should be one character in argcode for every argument of the function, but the string is “recycled” when there are more arguments passed than characters in the argcode string. So, for example, just passing "s" as argcode means all parameters will be passed as the simplest Lua type, while if argcode is "sr", then the first argument has arg code "s", the second argument has arg code "r", the third argument has arg code "s", etc.

When a vector (logical vector, integer vector, numeric vector, or character vector) is passed from R to Lua by reference, modifications made to the elements of that passed-in vector persist back in the R calling frame. For example:

values = c(1.0, 2.0, 3.0)
keep = lua_func("function(x) x[1] = 999 end", "v") # passed by value
keep(values)
print(values)
#> [1] 1 2 3

change = lua_func("function(x) x[1] = 999 end", "r") # passed by reference
change(values)
print(values)
#> [1] 999   2   3

Vectors can never be resized by reference; only their already-existing elements can be changed by reference.

Lists are always passed by value, but their vector elements can be either passed by value or by reference depending on the arg code. In order to make changes to a list, it needs to be returned to the calling function. For example:

x = list(1)

f1 = lua_func("function(x) x[1][1] = 999; x.a = 42; end", "v")
f1(x)
print(x)
#> [[1]]
#> [1] 1

f2 = lua_func("function(x) x[1][1] = 999; x.a = 42; end", "r")
f2(x)
print(x)
#> [[1]]
#> [1] 999

Note that in the examples above, we modify x[1][1], not just x[1]. That is because, even when lists are passed by reference, we need to modify specific elements of their vector elements if we want to make changes by reference. Setting x[1] = 999 would only reassign an element of the list x, and lists cannot be passed by reference itself, so this would have no effect on the R object passed in. By contrast, setting x[1][1] = 999 modifies the passed-in vector x[1], which can be passed in by reference, so this does change the R object that is passed in.

To modify a list, we have to return its new value from the function:

x = list(1)

f3 = lua_func("function(x) x[1][1] = 999; x.a = 42; return x; end", "v")
x = f3(x)
print(x)
#> [[1]]
#> [1] 999
#> 
#> $a
#> [1] 42

f4 = lua_func("function(x) x[1] = luajr.numeric({888, 999}); return x; end", "v")
x = f4(x)
print(x)
#> [[1]]
#> [1] 888 999
#> 
#> $a
#> [1] 42

Here, because we are returning a modified value, we can add elements to the list (f3) and change whole entries of the list (f4).

Benchmarking

In the same way that using R packages cpp11 or Rcpp allows you to bridge your R code with C++ code that runs faster than R, you can use luajr to bridge your R code with Lua code that runs faster than R.

In general, C/C++ code runs about 5-1,000 times faster than the equivalent R code.

In my experience, luajr code often presents a similar speedup, about the same as C/C++ code, or in the worst case, maybe half as fast. Sometimes luajr code can be faster than C/C++, though usually it isn’t quite as good.

Why use luajr then? Rcpp and cpp11 require a C++ toolchain (e.g. gcc, clang, etc.) and requires long compilation times, whereas luajr doesn’t. This means that luajr is usable when a C++ compiler isn’t available, or when compilation times are prohibitive or an annoyance.

In the following section, we look at two aspects of benchmarking. In the first example, we compare different ways of passing vectors to Lua functions relative to R. In the second example, we compare more fundamentally the difference in running a whole algorithm in Lua versus R.

Parameter passing example: sum of squares

For reasonably long vectors, passing by reference ('r') is faster than passing by value ('v'), which is faster than passing by simplify ('s', 'a', or '1''9'). But for relatively short vectors, passing by simplify can avoid some overhead, and for very simple operations on very short vectors, it might not be worth using Lua at all.

To illustrate the above points, here is some code that takes a numeric vector and calculates the sum of squares, comparing a pure R function with three alternatives written in Lua, respectively passing the vector by reference, by value, and by simplify.

v1 = rnorm(1e1)
v4 = rnorm(1e4)
v7 = rnorm(1e7)

lua("sum2 = function(x) local s = 0; for i=1,#x do s = s + x[i]*x[i] end; return s end")
sum2 = function(x) sum(x*x)
sum2_r = lua_func("sum2", "r")
sum2_v = lua_func("sum2", "v")
sum2_s = lua_func("sum2", "s")

# Comparing the results of each function:
sum2(v1)    # Pure R version
#> [1] 8.496783
sum2_r(v1)  # luajr pass-by-reference
#> [1] 8.496783
sum2_v(v1)  # luajr pass-by-value
#> [1] 8.496783
sum2_s(v1)  # luajr pass-by-simplify
#> [1] 8.496783

The time taken to sum v1, v4 and v7, depending upon the function kind, on a 2019-era MacBook Pro is summarised in this table:

Call Number of summands Method Median runtime
sum2(v1) 10 Pure R 1 µs
sum2_r(v1) 10 luajr pass by reference 6 µs
sum2_v(v1) 10 luajr pass by value 7 µs
sum2_s(v1) 10 luajr pass by simplify 4 µs
sum2(v4) 10,000 Pure R 63 µs
sum2_r(v4) 10,000 luajr pass by reference 18 µs
sum2_v(v4) 10,000 luajr pass by value 93 µs
sum2_s(v4) 10,000 luajr pass by simplify 78 µs
sum2(v7) 10,000,000 Pure R 49,000 µs
sum2_r(v7) 10,000,000 luajr pass by reference 13,000 µs
sum2_v(v7) 10,000,000 luajr pass by value 96,000 µs
sum2_s(v7) 10,000,000 luajr pass by simplify 124,000 µs

When the vector is 10 elements long, the R version wins handily.

When the vector is 10,000 elements long, the pass-by-reference Lua version is fastest, but the other methods are all comparable.

When the vector is 10,000,000 elements long, the story is similar.

This is an example where luajr doesn’t add that much speed—the function is relatively short, and there is a certain amount of overhead in invoking it, while the R function doesn’t have the overhead of transferring control between languages.

Processing time example: logistic map

Consider the following:

logistic_map_R = function(x0, burn, iter, A)
{
    result_x = numeric(length(A) * iter)
    
    j = 1
    for (a in A) {
        x = x0
        for (i in 1:burn) { 
            x = a * x * (1 - x)
        }
        for (i in 1:iter) { 
            result_x[j] = x
            x = a * x * (1 - x)
            j = j + 1
        }
    }
    
    return (list2DF(list(a = rep(A, each = iter), x = result_x)))
}

logistic_map_L = lua_func(
"function(x0, burn, iter, A)
    local dflen = #A * iter
    local result = luajr.dataframe()
    result.a = luajr.numeric_r(dflen, 0)
    result.x = luajr.numeric_r(dflen, 0)
    
    local j = 1
    for k,a in pairs(A) do
        local x = x0
        for i = 1, burn do
            x = a * x * (1 - x)
        end
        for i = 1, iter do
            result.a[j] = a
            result.x[j] = x
            x = a * x * (1 - x)
            j = j + 1
        end
    end
    
    return result
end", "sssr")

Here we are comparing two different versions (R versus Lua) of running a parameter sweep of the logistic map, a chaotic dynamical system popularized by Bob May in a 1976 Nature article. The output looks like this:

logistic_map = logistic_map_L(0.5, 50, 100, 200:385/100)
plot(logistic_map$a, logistic_map$x, pch = ".")

The times taken by each function on a 2019-era MacBook Pro are as follows:

Call Method Median runtime
logistic_map_R(0.5, 50, 100, 200:385/100)) R function 1900 µs
logistic_map_L(0.5, 50, 100, 200:385/100)) Lua function 200 µs

The version written in Lua is around 10 times faster than the version in R.

The speedup was much more notable in an earlier test where the R version first created the data frame and then performed the iteration, i.e. with the line result$x[j] = x instead of result_x[j] = x. The median runtime for that R version was two orders of magnitude slower; the extra overhead associated with data.frame methods was pointed out by Tim Taylor.

Working with Lua States: lua_open(), lua_reset()

All the functions mentioned above (lua(), lua_shell(), and lua_func()) can also take an argument L that specifies a particular Lua state that the function operates in.

When L = NULL (the default) the functions operate on the default Lua state.

But you can also open alternative Lua states using lua_open(), and then by passing the result as the parameter L, specify that the function operates in that specific state. For example:

L1 = lua_open()
lua("a = 2")
lua("a = 4", L = L1)
lua("return a")
#> [1] 2
lua("return a", L = L1)
#> [1] 4

There is no lua_close in luajr because Lua states are closed automatically when they are garbage collected in R.

lua_reset() resets the default Lua state:

lua("a = 2")
lua("return a")
#> [1] 2
lua_reset()
lua("return a")
#> NULL

To reset a non-default Lua state L returned by lua_open(), just do L = lua_open() again. The memory previously used by L will be cleaned up at the next garbage collection.

Further reading

For notes on the luajr Lua module – which contains functions and types for interacting with R from Lua code – see vignette("luajr-module").