LuaJIT Tests Faster than Julia for Stock Prediction Engine

I started testing Julia as a possible alternative because its advocates claimed the interpreter loop was nearly as fast as C, and the language is similar in concept to Python, which I love but which was too slow for our application. I recently ran across a blog entry mentioning LuaJIT and found it intriguing, because Lua did quite well during our last round of tests.

Performance Comparison: Julia versus LuaJIT

Relative Execution Time, LuaJIT as baseline (lower is better)

Operation                    Lua52    LuaJIT    Julia    Julia Typed Array
Parse File into Data Frame    2.42      1.0      5.64      5.64
Compute SMA(14)               2.81      1.0      6.87      0.70
Compute SMA(600)             33.32      1.0     80.00      1.30
Compute SMA_slice(14)         2.42      1.0     11.87      1.83
Compute SMA_slice(600)       33.32      1.0     15.52      5.90

Values are multiples of the LuaJIT time for each operation.
Slice was not implemented in Lua, so the nested-loop timings are reused.

Only one tested Julia operation was faster than LuaJIT

The only case where Julia outperformed LuaJIT was the typed-array SMA(14); everything else tested was slower. I think Julia did better in this instance because the SMA function must allocate a new array with 71K rows to store the results. In Julia you can do this as a single typed array of floats; in Lua it is done by appending to a table, which allocates memory in little pieces. In the SMA(600) test LuaJIT was faster again because the tight compute loop does more work relative to the memory-allocation overhead.
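To make that allocation difference concrete, here is a minimal Lua sketch, separate from the benchmark itself, contrasting a result table grown element by element (the pattern the SMA code below uses) with one whose slots already exist and are only overwritten. The row count and dummy values are assumptions for illustration.

```lua
-- Hypothetical sketch: growing a result table incrementally vs. reusing
-- a pre-sized one. N is an assumed row count close to the test file's.
local N = 71879

-- Grows the array part of a fresh table piece by piece, the pattern
-- used by the sma() functions below.
function build_grow()
  local t = {}
  for i = 1, N do
    t[i] = i * 0.5
  end
  return t
end

-- Reuses a table whose array part was already sized by a prior fill,
-- so the loop only overwrites existing slots.
function build_reuse(t)
  for i = 1, N do
    t[i] = i * 0.5
  end
  return t
end

local beg = os.clock()
local a = build_grow()
print("grow  sec=" .. os.clock() - beg)

beg = os.clock()
build_reuse(a)
print("reuse sec=" .. os.clock() - beg)
```

The reuse case skips the incremental growth of the table's array part, which is the overhead a typed Julia array avoids entirely by allocating one contiguous block up front.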

Is this a mistake? It is not what I expected.

The results of these tests are not what I expected. I thought that since Julia's type system and array allocation let it allocate typed arrays as contiguous blocks, the allocation should be faster and the statistical loops should have been able to run very fast. Also, since Julia has real numeric indexes while Lua's array indexes smell like hash indexes, I thought Lua would be slower: a hash lookup should be more complex than iterating and advancing a pointer, and I didn't see how LuaJIT could overcome that extra complexity.

Since the answers are not what I expected, I am posting this in the hope somebody else will see something I missed that would materially affect the results.

At some point you always have to stop thinking "this seems fast" and actually measure results. I am a big believer in testing a language in a way that models the kinds of problems we will be solving, using production-scale data. I borrowed some code from a prior article, "Why Language Performance Matters", for the stock prediction domain. Two of the things we do thousands of times per day are parsing CSV files and manipulating the columns by computing a wide variety of sliding-window indicators; we then either use those new columns to drive our stock prediction engine or write the results back to disk for a future stage. This code attempts to mimic that process in micro.

For this test I used a 1-minute bar file with 71,879 rows. It can be obtained for free using the historical download function built into MetaTrader. The CSV file is of the following form, though I had to manually add the CSV header string on line 1.

date,time,open,close,high,low,volume
2014.09.23,03:51,1.10430,1.10438,1.10426,1.10435,55
2014.09.23,03:52,1.10434,1.10438,1.10431,1.10438,28
2014.09.23,03:53,1.10439,1.10450,1.10436,1.10450,31
2014.09.23,03:54,1.10450,1.10450,1.10428,1.10439,34
2014.09.23,03:55,1.10437,1.10450,1.10427,1.10432,38
2014.09.23,03:56,1.10433,1.10451,1.10433,1.10450,28
2014.09.23,03:57,1.10449,1.10450,1.10441,1.10441,16
2014.09.23,03:58,1.10441,1.10445,1.10439,1.10440,28
2014.09.23,03:59,1.10439,1.10441,1.10438,1.10441,10
2014.09.23,04:00,1.10440,1.10451,1.10440,1.10451,24
2014.09.23,04:01,1.10451,1.10451,1.10451,1.10451,1

For the Julia experts out there: I expected some benefit from forcing the explicit conversion out of the DataArray we get from readtable, but the amount of improvement was greater than expected. I think we should either fix the performance problems in DataArray or provide convenience methods to convert to typed arrays. I hate converting to typed arrays because I want to be able to easily append new values to the columns, which becomes more cumbersome once forced out of the DataArray format. Also note how much slower the array-slice version runs than explicitly unrolling the work into the inner loop; at its worst it was 4.5 times slower. The functional-style slices reduced Julia's performance by enough to make it slower than LuaJIT.

Run Times Measured in Seconds (lower is better)

Operation                    Lua52    LuaJIT    Julia    Julia Typed Array
Parse File into Data Frame   0.756    0.312     1.786     1.786
Compute SMA(14)              0.045    0.016     0.11      0.0113
Compute SMA(600)             1.033    0.031     2.48      0.0403
Compute SMA_slice(14)        0.045    0.016     0.19      0.0292
Compute SMA_slice(600)       1.033    0.031     0.45      0.1829

Versions used for the test: Lua 5.2.1, LuaJIT 2.0.3, Julia 0.3.2+2.


Relative Execution Time, Lua52 as baseline (lower is better)

Operation                    Lua52    LuaJIT    Julia    Julia Typed Array
Parse File into Data Frame    1.0      41%      233%      233%
Compute SMA(14)               1.0      36%      244%       25%
Compute SMA(600)              1.0       3%      240%        4%
Compute SMA_slice(14)         1.0      36%      422%       65%
Compute SMA_slice(600)        1.0       3%       43%       18%

Slice was not implemented in Lua, so the nested-loop timings are reused.
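The slice variant was never written in Lua; for reference, a hypothetical Lua analogue of the Julia sma_slice (untimed, with names chosen to mirror the Julia version) might look like:

```lua
-- Hypothetical Lua analogue of the Julia sma_slice(): copies each window
-- into a fresh table and averages it, paying one allocation per row.
function mean(t)
  local sum = 0.0
  for i = 1, #t do
    sum = sum + t[i]
  end
  return sum / #t
end

function sma_slice(arr, numPer)
  local tout = {}
  for ndx = 1, #arr do
    local begndx = math.max(1, ndx - numPer)
    local window = {}          -- fresh copy per row, like avect[begndx:ndx]
    for i = begndx, ndx do
      window[#window + 1] = arr[i]
    end
    tout[ndx] = mean(window)
  end
  return tout
end
```

Each row copies its window into a throwaway table, the same per-row allocation the Julia slice version pays for avect[begndx:ndx].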


  • All timings are the average of the best 8 out of 10 runs.
  • Timings do not include startup time, only execution of the main algorithm steps. Julia starts many times slower than Lua or LuaJIT, which is almost enough to make me want to switch just to save debugging time.
  • Tests ran on Windows on a Lenovo T420s: 8 GB RAM, 2.33 GHz i7, Samsung SSD.


Lua Code

-- Provides a CSV parse similar to the generic table parse
-- in R and julia where each column winds up as a vector
-- of columns so we can easily apply statistical functions
-- to the columns.
require "os"
function split(aStr, sub)
  local out = {}
  local i = 1
  local pat = "[^" .. sub .. "]+"
  for val in string.gmatch(aStr, pat) do
    out[i] = val
    i = i + 1
  end
  return out
end

function trim(s)
  return (string.gsub(s, "^%s*(.-)%s*$", "%1"))
end

function curr_sec()
  return os.clock()
end

function print_arr(arr)
  for k, v in pairs(arr) do print(" " .. k .. " = " .. v) end
end


-- NOTE: To be a valid comparison to the DataFrame parser this must create
--   an array of columns that can be accessed by name and
--   has been parsed into a numeric type when it is numeric.
--   Columns must be accessible as single vectors to pass into
--   stats functions.
--
-- Reads a CSV file as a set of columns. If a column
-- can be treated as numeric then all its values are parsed
-- as numbers.   Columns can be indexed as out["names"]["close"],
-- where close is the column name, or accessed as out["columns"][3]
-- to obtain the column by position in the file.

function read_csv(fiName)
  local file = assert(io.open(fiName, "r"))
  local tout = {}
  local headline = file:read("*l")
  local headarr = split(headline, ",")
  -- Build the structure that holds our output columns.
  local cols = {}
  local names = {}
  tout["names"] = names
  tout["columns"] = cols

  for fldNum, fldName in pairs(headarr) do
    fldName = trim(fldName)
    local avector = {}
    -- Tables are by reference, so cols and names
    -- point at the same column array.
    cols[fldNum] = avector
    names[fldName] = avector
  end

  local i = 1
  for line in file:lines() do
    local arr = split(line, ",")
    for fldNum, fldName in pairs(headarr) do
      local acol = cols[fldNum]
      local fldval = arr[fldNum]
      local nval = tonumber(fldval)
      if nval == nil then
        acol[i] = fldval
      else
        acol[i] = nval
      end
    end -- for each field
    i = i + 1
  end -- for each line
  file:close()
  --print("num rec=" .. i - 1)
  return tout
end -- func()


-- Note: There are faster ways to compute SMA but this
-- is closest to the method shown in basic tutorials
-- and is a valid test of a tight loop that spends a
-- lot of time indexing into an array.
function sma(arr, numPer)
  local tout = {}
  local adjdiv = numPer + 1
  for ndx, tval in pairs(arr) do
    local begndx = math.max(1, ndx - numPer)
    local tsum = 0.0
    for slicendx = begndx, ndx, 1 do
      tsum = tsum + arr[slicendx]
    end
    if ndx > adjdiv then
      tout[ndx] = tsum / adjdiv
    else
      tout[ndx] = tsum / ndx
    end
  end
  return tout
end
 
beg_time = curr_sec()
freemem = collectgarbage('count')
print("megabytes in use=" .. freemem/1024)
dtbl = read_csv("2014.M1.csv")
end_time = curr_sec()
elap = end_time - beg_time

print("elap sec time=" .. elap)
--print("num records=" .. # tout)
freemem = collectgarbage('count')
print("megabytes in use=" .. freemem/1024)
 

print("Column Names")
names = dtbl["names"]
print(names)
for k,v in pairs(names) do
  print(" " .. k )
end

beg_time = curr_sec()
closeVect = names["close"]
sma14 = sma(closeVect, 14)
end_time = curr_sec()
elap = end_time - beg_time
freemem = collectgarbage('count')
print("sma(14) elap=" .. elap .. " num sma=" .. # sma14 .. " megabytes in use=" .. freemem/1024)

beg_time = curr_sec()
sma600 = sma(closeVect, 600)
end_time = curr_sec()
elap = end_time - beg_time
freemem = collectgarbage('count')
print("sma(600) elap=" .. elap .. " num sma=" .. # sma600 .. " megabytes in use=" .. freemem/1024)

Julia Code

# Julia function to load CSV and compute
# a couple of SMA. written to be identical
# to lua version.
using DataFrames

# Note: There are faster ways to compute SMA but this
# is closest to the method shown in basic tutorials
# and is a valid test of a tight loop that spends a
# lot of time indexing into an array.   We show it
# as a nested loop instead of a slice.
function sma(avect, numPer)
  print(length(avect), " numper=", numPer)
  numEle = length(avect)
  tout = Array(Float32, numEle)
  numdiv = numPer + 1
  for ndx = 1:numEle
    tsum = 0.0
    begndx = max(1, ndx - numPer)
    for slicendx = begndx:ndx
      tsum += avect[slicendx]
    end
    if ndx < numdiv
      tout[ndx] = tsum / float32(ndx)
    else
      tout[ndx] = tsum / float32(numdiv)
    end
  end
  return tout
end

## Showing the more common Julia version of
## slicing out what we need and applying the built-in
## mean instead of a manual sub-loop.
function sma_slice(avect, numPer)
  #print(length(avect), " numper=", numPer)
  numEle = length(avect)
  tout = Array(Float32, numEle)
  for ndx = 1:numEle
    begndx = max(1, ndx - numPer)
    tout[ndx] = mean(avect[begndx:ndx])
  end
  return tout
end
 
function runTest()
  print("read table")
  tic()
  dta = readtable("2014.M1.csv")
  toc()
  println("finished read table")
  closeVect = dta[:close]
  println("compute sma(14)")
  tic()
  sma14 = sma(closeVect, 14)
  toc()
  println("finished sma(14)")

  println("compute sma (600)")
  tic()
  sma600 = sma(closeVect, 600)
  toc()

  println("finished sma(600)")
  closeVect = dta[:close]

  println("compute sma_slice(14)")
  tic()
  sma14 = sma_slice(closeVect, 14)
  toc()
  println("finished sma_slice(14)")

  println("compute sma_slice(600)")
  tic()
  sma600 = sma_slice(closeVect, 600)
  toc()
  println("finished sma_slice(600)")
  println("\n\n Convert to Typed Array\n\n")
  tic()
  tlen = length(closeVect)
  tvect = Array(Float32, tlen)
  for ndx = 1:tlen
    tvect[ndx] = closeVect[ndx]
  end
  toc()
  println("vector conversion complete")
  println("compute sma(14)")
  tic()
  sma14 = sma(tvect, 14)
  toc()

  println("finished sma(14)")
  println("compute sma (600)")
  tic()

  sma600 = sma(tvect, 600)
  toc()
  println("finished sma(600)")

  closeVect = dta[:close]
  println("compute sma_slice(14)")
  tic()
  sma14 = sma_slice(tvect, 14)
  toc()

  println("finished sma_slice(14)")
  println("compute sma_slice(600)")
  tic()
  sma600 = sma_slice(tvect, 600)
  toc()
  println("finished sma_slice(600)")
end
runTest()


Lua with ffi array of Double

Coming soon. We actually tested part of this with a sum of an array of doubles. The ffi version was faster, but not by as much as I had expected. This was on a LuaJIT user-group thread; now I just need to track down that email.
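Until that thread turns up, here is a rough sketch of the kind of test described: summing an ffi array of doubles. The array size is an assumption and `sum_doubles` is a placeholder name; the ffi library is LuaJIT-only, so the require is guarded to keep the file loadable under plain Lua.

```lua
-- Hypothetical sketch of summing an ffi array of doubles (LuaJIT only).
-- pcall guards the require so the script still loads under plain Lua 5.2.
local ok, ffi = pcall(require, "ffi")

function sum_doubles(n)
  if not ok then return nil end        -- plain Lua: no ffi, skip the test
  local arr = ffi.new("double[?]", n)  -- contiguous, unboxed C doubles
  for i = 0, n - 1 do                  -- ffi arrays are 0-based
    arr[i] = 1.0
  end
  local sum = 0.0
  for i = 0, n - 1 do
    sum = sum + arr[i]
  end
  return sum
end

print("sum=" .. tostring(sum_doubles(71879)))
```

Unlike a Lua table, the ffi array stores raw doubles in one contiguous block, which is the closest Lua analogue of Julia's typed Float32 array.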

5 Replies to “LuaJIT Tests Faster than Julia for Stock Prediction Engine”

  1. Great comparison.

    You should be able to shave some milliseconds in the CSV parsing by using Lpeg[1] with something like this:

    local field = lpeg.C((1 - lpeg.S',\n"')^0)
    local record = lpeg.Ct(field * (',' * field)^0) * (lpeg.P'\n' + -1)

    A quick test on my computer takes the LuaJIT CSV parsing from 0.26 to 0.21 seconds.

    [1] http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html

  2. Since you asked: all the timings were in seconds. I adjusted the workload to make the runs long enough to get consistent readings from the timer, then ran them several times and averaged the results. I feel this is superior to worrying about more precise timing, because there is enough variability in preemptive multitasking systems that very fine-grained timers don't deliver predictably accurate results anyway. Looking at the results, I think it would be a better investment to use a larger source file to increase the workload even more. I will switch over to the finer-resolution timers some of you have recommended, but I do not expect it to change the fundamental conclusion, and it is unlikely to change the results enough to matter.
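    A concrete version of that methodology, sketched in Lua with a placeholder workload, could average the best 8 of 10 runs like this:

```lua
-- Hypothetical timing harness: run a function `runs` times, keep the
-- fastest `keep` os.clock() readings, and average them.
function time_best_of(fn, runs, keep)
  local times = {}
  for i = 1, runs do
    local beg = os.clock()
    fn()
    times[i] = os.clock() - beg
  end
  table.sort(times)             -- ascending: fastest runs first
  local sum = 0.0
  for i = 1, keep do
    sum = sum + times[i]
  end
  return sum / keep
end

-- Example: time a dummy workload, averaging the best 8 of 10 runs.
local avg = time_best_of(function()
  local s = 0
  for i = 1, 100000 do s = s + i end
end, 10, 8)
print("avg sec=" .. avg)
```

    Discarding the slowest runs filters out the preemption spikes mentioned above without needing a finer-resolution clock.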
