LuaJIT Access 20 Gig or More of Memory

Access 20 gig or more of memory from LuaJIT while coding in native Lua and minimizing GC speed penalties.

I started using LuaJIT after first using F#, Python, Julia and C for stock and Forex related predictive work. I am always on the lookout for a language that gets me as close to C speed as possible without having to write low level C all the time.

Lua is a language that feels somewhat like a cross between BASIC and Ruby and has been around for a long time. It may be embedded or used stand-alone, and it has been embedded into many games, entertainment consoles and other devices as a scripting language. LuaJIT is a newer compiler technology that takes what was already a fast interpreted language and, in some of our tests, made it run over 20X faster, with a few tests reaching 80X faster.

LuaJIT seemed like the ideal combination: it provides a language any Ruby or Python programmer would find readable, with fast start-up times, excellent run-time speeds and good error messages.

Almost C speed, but a memory limit

I was quite simply amazed at the speed of LuaJIT because it outperformed some of our well optimized Java and F# code. There is a gotcha, though, and it is a big one. Within a couple of weeks of work I ran into a problem of running out of memory, which seemed odd when I could do the same task in Python, F# and Java with no problem, and when the machine still had plenty of memory available.

After some Google searches I found there are design limits which prevent the LuaJIT garbage collector and memory allocator from accessing more than 2 gig of RAM; 1.8 gig is about where we encountered the problems. In addition, LuaJIT is not graceful about it: instead of stopping and collecting what it can when an allocation is about to fail, it simply aborts, which is a real pain for an analysis that is 20 minutes into a job.

Mike Pall, the driving force behind LuaJIT, is working on a version 3.0 of the LuaJIT garbage collector, but I needed a solution now; otherwise I had no choice but to jump out to C, possibly Go, or fall back to F#.

I already had F# code that ran faster than Java. I wanted to escape the F# .NET-on-Windows environment, and F# on Mono is very slow. I don't like where Microsoft has been taking F#, or the lack of command line tools usable in isolation from Visual Studio. The worst factor was that F# programmers wanted to use Scala style functional techniques which took F# from faster than Java to slower than Python, and retraining violated their world view (similar to the O.O. purists in the late 90's). I still feel there is a productivity advantage in high level languages, especially when exploring ideas that may morph many times before you find the perfect, or at least good enough, design. LuaJIT seemed to deliver almost the programming conciseness of Python and F#, but with speed and cross platform portability to boot.

Escape the LuaJIT GC limit

After first discovering the memory limits of LuaJIT and reading the blogs, I was thinking I had made a mistake porting the code to Lua. After interacting with the LuaJIT users group, one of the users sent me some code which showed how to store data outside the space managed by the garbage collector but still access the objects almost as if they were native Lua objects.

It turns out the approach he proposed is an elegant way to escape GC hell, and it has some strategic benefits that are not immediately apparent. These strategic advantages could make LuaJIT an ideal platform for major big data projects. I think it could be better than some common tools because it keeps the GC available when needed but lets you use hand optimized C style structures where it makes sense. This kind of escape could make it competitive even for major projects like Elasticsearch, which regularly battle the Java garbage collector.

So far I have been able to access up to 10 gig on a 64 bit Windows 8.1 machine. If anybody has a chance to try it on a Linux box with 32 or more gig, please let me know how far you were able to go.

LuaJIT made it simple to gain the best of the GC and manually managed worlds

The complete sample code below demonstrates creating gigabytes worth of double arrays, encoded identically to how they would be in C, but accessing them from Lua. Even better, the generated assembler to iterate the array looks very similar to the assembler produced by a C compiler.
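
If you want to check the generated code yourself, LuaJIT ships with a trace dump module. As far as I can tell you can enable it from the command line roughly like this (the file name here is just my sample script):

luajit -jdump ffi_non_gc_double_array_v2.lua
-- or from inside a script, something like: require("jit.dump").on()

which prints the bytecode, IR and machine code for each compiled trace.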

Download Source as Text file: ffi_non_gc_double_array_v2.lua

-- Declaring a C style structure that makes casting easy.
-- (ffi.metatype() also takes a metatable argument; the full source below
--  uses it to attach methods, this fragment just shows the core idea.)
local ffi = require("ffi")
ffi.cdef("void* malloc(size_t size); void free(void* ptr);")
local DArr = ffi.metatype("struct{uint32_t size; double* a;}", {})

local ptr = ffi.C.malloc(size_in_bytes) -- allocate memory from C
local tobj = DArr(adj_len, ptr)         -- create a wrapper and cast to double*
  -- After this I can access tobj.a as a normal array

ffi.C.free(tobj.a) -- return the memory to C
tobj.a = nil       -- make sure we don't try to re-use it
tobj = nil         -- all cleaned up

-- I could have skipped the structure definition and directly cast the result
-- to a pointer to doubles, but I wanted the size included as part of the core
-- structure.

Using the wrapper is trivially easy

The following example allocates an array of 600,000 doubles, uses it and frees it. As you can see, it is about as easy as it could possibly get. This is enabled by a small wrapper written using FFI, which accesses portions of C libraries from inside of Lua. FFI allowed me to call the underlying C malloc and free directly from Lua and cast the results to my desired object type very easily.

local tmparr = double_array(600000)

The code to free it up again is:

tmparr:done()

The destructor is safe: if you call :done() a second time it simply returns false.

The code to iterate is native Lua and performs incredibly well.

function avg_arr(tarr, start_ndx, end_ndx)
  local tsum = 0.0
  local num_ele = (end_ndx - start_ndx) + 1
  for x = start_ndx, end_ndx do
   tsum = tsum + tarr[x]
  end
  local avg = tsum / num_ele
  return avg
end

One of the features I don't like about Lua is that if you do not specifically declare something as a local it becomes a global. This can be a nuisance when debugging, but there are some tools to help find those issues.
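
One common trick, for example, is a strict-globals guard along the lines of the strict.lua module that ships with the Lua source distribution; a minimal sketch:

-- Trap accidental creation of globals (sketch only; whitelist intentional
-- globals with rawset(_G, name, value) before installing the guard).
setmetatable(_G, {
  __newindex = function(t, name, value)
    error("attempt to create global '" .. tostring(name) .. "'", 2)
  end
})

With this in place a typo like tsun = tsum + 1 raises an error instead of silently creating a global.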

The strategic value of Lua FFI for external memory management

The ability to choose when to use the GC and when to step outside the bounds of what it manages is a huge benefit. The ability to access both sets of objects from native Lua at high speed is a huge bonus. Let me explain why in the form of a machine learning (ML) use case.

In machine learning we compute what math geeks call features, but you can think of them as a column of numbers, such as the SMA(30), which is a moving average across 30 rows of data. These are computed from columns of numbers such as open price, close price, age, weight, etc. Some practitioners call the source data features and the computed data facets, but the line gets blurry. We then combine these arrays in a wide variety of ways using algorithms like SVM, decision trees, KNN and Bayes.

For 1 minute Forex data a fairly small data set is 250,000 rows with 8 base attributes and generally between 10 and a few hundred derived features. This is a lot of data, but you can squeeze it into a 4 gig memory space if you are very careful. If we have 50 computed features using 64 bit floats, then our total memory space is 250,000 rows * (8 base features + 50 computed features) * 8 bytes each = 116 megabytes. Unfortunately no run-time system is perfectly efficient, and this is before we build the model based on these features or use it to predict the future. Our medium data sets are easily 20 times this size. Needless to say, 2 gig and even 4 gig of RAM doesn't stretch all that far.
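
The arithmetic is simple enough to keep in a scratch script; a sketch with my numbers (adjust rows and features for your own data set):

local rows     = 250000
local features = 8 + 50                 -- base + computed columns
local bytes    = rows * features * 8    -- 64 bit doubles
print(bytes / 1e6, "MB")                -- roughly 116 MB per copy of the matrix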

Once we load the source data and compute the features, we use them over and over again to compute intermediate results. Once they are computed, the most critical aspect is very high access speed as we apply them.

Some genetic algorithms will combine features or change the weights of features millions of times and then either rebuild or re-apply the model. The bulk of the run time is generally spent applying the model in the process of optimization (a form of learning). Some algorithms, such as random forests, take a lot of processing time to recursively build their models as well. Even when using trees or random forests, choosing the features used in each tree of the forest is an optimization trick that requires consuming the underlying features thousands of times as you rebuild and test different models.

I found it easiest when loading the data to use native Lua tables, which grow automatically as needed but are limited in size by the 2 gig failure mentioned above. We don't always know how many rows of data we will be receiving or retaining, since we reject some as noise. Once each column (feature) is computed, the only thing that changes is that we add new numbers to the end, which we can easily accommodate by reserving some extra space for new data points. You have to keep the source columns because they need to be referenced as historical data when computing the feature values for new data rows.

In this instance we can use native Lua with the full GC functionality to simplify our load / build loop, and then once we have a column ready we allocate an external C array and copy the data over to it. This moves that column of data outside the purview of the GC and keeps the space managed by the GC smaller, which makes the GC faster.
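
In code the pattern looks roughly like this, using the double_array() constructor and copy_in() method from the source listed below (compute_sma30 is just a stand-in for whatever feature calculation you run):

local col = {}                      -- native Lua table, grows as needed
for row = 1, 250000 do
  col[row] = compute_sma30(row)     -- hypothetical feature computation
end
local ext = double_array(#col)      -- allocate the finished column outside the GC
ext:copy_in(col, 1, #col, 1)        -- copy the data into the external array
col = nil                           -- let the GC reclaim the Lua table
collectgarbage()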

When I stressed the LuaJIT GC so it had 1.5 gig of data, a full GC run took between 0.25 and 0.38 seconds even when there were relatively few objects to clean up. When I had only a few megabytes of data managed by the GC, it took 0.001 seconds, so there is a huge performance advantage in moving long-persistent data outside the purview of the GC. I think this will remain true even with the new and improved GC.
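
For reference, the measurement itself was nothing fancy; roughly this:

local before = os.clock()
collectgarbage("collect")                      -- force a full GC cycle
local elapsed = os.clock() - before
print(string.format("full GC took %.3f s, %.1f MB still managed by the GC",
      elapsed, collectgarbage("count") / 1024))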

It would be a viable design decision to populate the feature data directly in the external array, but that requires growing and/or copying the data if we need a larger one. This would incur some overhead, and while it keeps most of the pressure off the LuaJIT GC, it would increase the risk of fragmenting the C heap. LuaJIT seems to be good at tail addition with minimal overhead, so I have been building in native tables and then copying over to an external array when complete. If I run short of memory in the native Lua memory space, I may have to change this approach. I will also move the derived ML models to external data structures.

Java Comparison

If I had this option available in Java I would have used it when optimizing enterprise scale applications. In Java something similar could be done using the JNI interface, but that can be a time consuming experience. The normal solution is to allocate a large block of memory in Java and re-use it; the core concept is similar to how I am using these external arrays. The problem is that Java still has a GC, and the GC still gets slower as you add more memory, even when it holds long-persistent objects. This can degrade performance to the point where major Java projects like Elasticsearch have been forced to establish best practices that recommend no more than 32 gig of memory, and I have still seen GC pauses bring Elasticsearch to its knees. The .NET garbage collector is pretty good, but it still has similar issues, even if they manifest under different conditions.

The LuaJIT + FFI approach is superior

I think the external memory concept supported by LuaJIT plus FFI is a superior approach when compared to a fully managed GC environment. I could use it to build the field caches, bit vectors and query caches in Elasticsearch using external memory, very similar to how I plan to ship features to memory outside the GC. In Java I would also use a rotating set of external character buffers and serialize my search results directly into them, which would take a burden off the GC.

I think Java could benefit from adding support for unmanaged arrays as transparent as what LuaJIT + FFI supplies today. If they did, I would be very surprised if Elasticsearch couldn't triple their load capacity on the same piece of hardware.

Be nice to the GC (garbage collector)

Once everything is ready, predictions are fairly worthless unless they arrive fast enough to apply to the business problem at hand; in the trading arena a few seconds can be a lifetime. As such, unplanned GC pauses are highly undesirable. One benefit of the LuaJIT FFI external arrays is that I can ship almost everything I am using outside the responsibility of the GC, which gives the GC less work to do and means any pauses are likely to be shorter. If this isn't good enough, I can go back through the code and only use locals that are re-used, and if that still isn't good enough, I can port the tight loops to C where needed, so they update one static array with the results of the computation and all the Lua GC has to do is receive the computed results.
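
Beyond moving data out of the managed heap, the standard collectgarbage() knobs let you schedule the remaining work; a sketch only, since the right values depend on your allocation pattern:

collectgarbage("setpause", 150)     -- start the next cycle sooner (default 200)
collectgarbage("setstepmul", 300)   -- do more work per incremental step
-- or drive the collector by hand around latency critical sections:
collectgarbage("stop")              -- no automatic collection for now
-- ... latency critical prediction code ...
collectgarbage("step", 100)         -- pay some GC cost at a time we choose
collectgarbage("restart")           -- back to automatic collection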

Summary

It is still too early to tell if LuaJIT is really ready for prime time, but it is showing a lot of promise. I can tell you that I have been able to test sophisticated ideas in LuaJIT as fast as I could have written them in Python, F# or Node.js, with better net performance and readability better than F#. I really like the ability to ship the feature and model data outside the GC. I can also confirm that the same code run in Lua without the JIT is too slow to be viable.

I think LuaJIT has a viable chance of displacing Java in strategic big data ML projects, but only if some of the items described in "Notes for the LuaJIT community" are implemented.

Lua Source to manage objects outside the Lua GC space

download source

-- Demonstrates a way to use Lua to manage C memory outside
-- that considered by the GC.  This allows nearly normal Lua
-- code to manage larger amounts of memory, bypassing the GC.
-- Intended to make it feasible to use the speed of Lua while
-- bypassing the downside normally encountered with GC cleanup.
-- You have to manually call :done() for the objects to free up
-- their memory, but that is a pretty low overhead to gain access
-- to more of the memory in a 64 bit machine.

local ffi = require("ffi") 
local size_double = ffi.sizeof("double")
local size_char    = ffi.sizeof("char")
local size_float    = ffi.sizeof("float")
ffi.cdef"void* malloc (size_t size);"
ffi.cdef"void free (void* ptr);"
local chunk_size  = 16384 
   -- want big enough chunks that we don't fragment the C 
   -- memory manager
  
-- define a structure which contains a size and array of 
-- doubles where we dynamically allocate the array using 
-- malloc() Do it this way just in case we want to write C
-- code that needs the size.
local DArr = ffi.metatype(
  --               size,               array
  "struct{uint32_t size; double* a;}",
  -- add some methods to our array
  { __index = {
    done = function(self) 
      if self.size == 0  then
        return false
      else
        ffi.C.free(self.a)
        self.a = nil
        self.size = 0
        return true
      end
    end,
    -- copy data element into our externally managed array from the
    -- supplied src array.   Start copying src[beg_ndx], stop copying
    -- at src[end_ndx],  copy into our array starting at self.a[dest_offset]
    copy_in = function(self,  src, beg_ndx, end_ndx, dest_offset)
      -- Cannot use ffi.copy/memcpy here because the source is likely
      -- a native Lua table rather than a C array.
      print ("self=", self,  " beg_ndx=", beg_ndx, 
         " end_ndx=", end_ndx, "dest_offset=", dest_offset)
      local mydata = self.a
      local dest_ndx  = dest_offset
      for src_ndx = beg_ndx, end_ndx do
         mydata[dest_ndx] = src[src_ndx]
         dest_ndx = dest_ndx + 1
      end
    end,
    -- copy data elements out of our externally managed array to another
    -- array.  Start copying at self.a[beg_ndx] ,  stop copying at self.a[end_ndx]
    -- place elements in dest starting at dest[dest_offset] and working up.
    copy_out = function(self, dest, beg_ndx, end_ndx, dest_offset)
      -- Cannot use ffi.copy/memcpy here because the dest is likely
      -- a native Lua table rather than a C array.
      local mydata = self.a
      local dest_ndx  = dest_offset
      for ndx = beg_ndx, end_ndx do
        dest[dest_ndx] = mydata[ndx]
        dest_ndx = dest_ndx + 1
      end
    end,  
    -- return true if I still have a valid data pointer.
    -- return false if I have already been destroyed.
    is_valid = function(self)
      print("is_valid() size=", self.size, " self.a=", 
        self.a, " self=", self)
      return self.size ~= 0 and  self.a ~= nil
    end,
    
    -- fill the array (or the slice start_ndx..end_ndx) with anum
    fill = function(self, anum, start_ndx, end_ndx)
      if end_ndx == nil then
        end_ndx = self.size - 1   -- last valid 0 based index
      end
      if start_ndx == nil then
        start_ndx = 0
      end
      local mydata = self.a
      for ndx = start_ndx, end_ndx do   -- was hard coded to start at 1
        mydata[ndx] = anum
      end
    end,  -- func fill
    },

    __gc = function(self) 
         self:done()
    end
  }
)  -- end Darr()

-------------------
--- Constructor for DArr
-------------------
function double_array(leng)
   -- allocate the actual dynamic buffer.
   local size_in_bytes = (leng + 1) * size_double
   local adj_bytes = (math.floor(size_in_bytes / chunk_size) + 1) * chunk_size
   local adj_len  = math.floor(adj_bytes / size_double)
   
   local ptr = ffi.C.malloc(adj_bytes)
   if ptr == nil then
     return nil
   end
   return DArr( adj_len,  ptr)
end


function avg_arr(tarr, start_ndx, end_ndx)
  local tsum = 0.0
  local num_ele = (end_ndx - start_ndx) + 1
  for x = start_ndx, end_ndx do
    tsum = tsum + tarr[x]
    --print ("tarr[x]=", tarr[x])
  end
  local avg = tsum / num_ele
  return avg
end


Test Function for DArr Lib

function use_up_memory(targetMeg)
   -- keep allocating Lua tables until the GC managed heap reaches targetMeg
   local tout = {}
   while collectgarbage('count') / 1024 < targetMeg do
      local tx = {}
      tout[#tout+1] = tx
      for i = 1, 100000 do
        tx[i] = i + 1.2
      end
   end
   return tout
end

function basic_test()
  -- Adjust the use_up_memory() call below to see how the Lua GC
  -- interacts with the external malloc().
  local  waste_space = use_up_memory(100)
     -- Change 100 to 1500 to run Lua close to the limit of its internal GC.
     -- Change to 1000 to use up 1 gig, which will cause some of the malloc() calls to fail.
  local num_ele = 75000
  local tmparr = double_array(num_ele)
  -- Note:  Each 75000 array should occupy roughly  600K of RAM.
  --  plus the overhead for our label, size, counter and containing table.
 
  local nla = {}
  -- put something interesting into our native Lua table
  for x = 1, num_ele do
    nla[x] = x
  end
  local nlaavg = avg_arr(nla, 1, num_ele)
  print ("nlaavg = ", nlaavg)

   -- put something interesting into our external object
  local ptr = tmparr.a
  for x = 1, num_ele do
    ptr[x] = x
  end
  local darravg = avg_arr(ptr, 1, num_ele)
  print("darravg=", darravg)
 
  assert(nlaavg == darravg, 
    "Expected average of nla and dla to be identical and they are not")
 
  -- Demonstrate copying a portion of the lua array into the external buffer
  -- copies elements 100 to 250 into external array into positions
  -- 320 .. 470.
  tmparr:copy_in(nla, 100, 250,  320)
  print ("ptr[320]=", ptr[320])
  assert(ptr[320] == nla[100],   
     "expected ptr[320] to contain" .. nla[100] 
     .. " after the copy_in but received " .. ptr[320])
 
 
  -- Demonstrate copying a portion back to native lua array
  -- copies element 74,950 to 75,000 into dest nla positions
  -- 1  to 50.
  local sbeg = num_ele - 50
  local send = num_ele
  tmparr:copy_out(nla, sbeg,  num_ele, 1)
  print("nla[1] = ", nla[1])
  assert(nla[1] ==  tmparr.a[sbeg], "expected nla[1] to contain "
           .. tmparr.a[sbeg] .. " after copy_out but got " .. nla[1])
 
 
  print ("pre destroy is_valid=", tmparr:is_valid())
  assert(tmparr:is_valid() == true, 
    "tmparr.is_valid() should be true before the delete")
 
  local dres = tmparr:done() -- destroy and release our memory
  assert(dres == true,  "First destroy failed")
  print "successful destroy"
  print ("post destroy is_valid=",  tmparr:is_valid())
  assert(tmparr:is_valid() == false, 
    "tmparr.is_valid() should be false after done()")
 
  print "try to do second destroy which should not work"
  --- see what happens when we destroy it a second time
  local qres = tmparr:done()
  assert(qres == false, "Second delete should have failed")
  print "second destroy failed as planned"
 
  -- Let's see how much memory the Lua GC is managing at this point.
  print("using ",  collectgarbage('count') / 1024,
    " meg before GC")
  collectgarbage()
  print("using ",  collectgarbage('count') / 1024, 
    " meg after GC") 
  print("Lua GC will not show the externally allocated buffers")
  -- TODO:  Figure out FFI Call to get total process memory from Windows.
 
 -- for num_pass = 1,50 do
    -- Now try to create enough of the external arrays that it would
    -- normally crash Lua.
    local target_num = 3000000000 -- 3.0 gig
    local array_size_bytes  = num_ele * size_double
    local num_arr_to_create = math.floor(target_num / array_size_bytes) + 1
    print ("attempting to create ", num_arr_to_create,
            " arrays each ", array_size_bytes, 
            " bytes in size")
    local tholder = {}
    for andx = 1, num_arr_to_create do        
      local da =  double_array(num_ele)
      if da == nil then
        local mba = (andx * array_size_bytes) / 1000000
        local lua_mem = collectgarbage('count') / 1024
        local lua_mem = math.floor(lua_mem * 100) / 100
        local tot_meg = mba + lua_mem
        print ("failed to create andx=", andx,  
                " mb attempt=", mba,  
                " total Meg Used=", tot_meg)
      else
        --print ("create andx=", andx, " da=", da, " da.a=", da.a)
        da:fill(andx)
        tholder[andx] = da  
      end
    end
    print "finished create"
    print("using ",  collectgarbage('count') / 1024, 
       " meg before GC")
    collectgarbage()
    print("using ",  collectgarbage('count') / 1024, 
           " meg after GC") 

  
    print "dbl check access every element dbl avg"
    for andx = 1, num_arr_to_create do
      local da = tholder[andx]
      if da ~= nil then
        local darravg = avg_arr(da.a,1, num_ele)
        --print("andx=", andx, " da=", da, "da.a=", 
        --         da.a, " darravg = " , darravg)
        local calcavg = andx 
           -- we know we filled each array with it's index 
           -- value so that is what the average should be.
        local roundavg = math.floor(darravg * 1000) / 1000 
            -- have to round because of accumulated floating 
            -- point error
        assert(roundavg == calcavg,  
           " avg failed expected " .. calcavg ..  " got " 
           .. roundavg .. " ndx=" .. andx)
      end
    end
    print "Finished access check"
    print ("using ",  collectgarbage('count') / 1024, " meg before GC")
    collectgarbage()
    print ("using ",  collectgarbage('count') / 1024, " meg after GC")

  
    print "Start destroying our external arrays"
    for andx = 1, num_arr_to_create do
      local da = tholder[andx]
      if da ~= nil then
        --print ("destroy andx=", andx, " da=", da, " da.a=", da.a)  
        local deleteok = da:done()
        assert(deleteok == true, 
           " delete failed array #" .. andx)
      end
    end
    tholder = nil
    print "Finished destroy"
    print ("using ",  collectgarbage('count') / 1024, 
      " meg before GC")
    collectgarbage()
    print ("using ",  collectgarbage('count') / 1024, 
       " meg after GC")
    print (" We expect the GC to reclaim some here because"
           .. " we free up the space in our container array")
 -- end
end -- func


------------------------
---- MAIN ----------
------------------------
if arg[0] == "ffi_non_gc_double_array_v2.lua"  then
   basic_test()
end

Caveats

For this process to access memory past 4 gig it must be built in 64 bit mode. You can tell whether it is running in 64 bit or 32 bit mode by starting a copy of LuaJIT: if it shows a 32 next to it in the task manager then you are going to be limited to either 2 gig or 4 gig of total process space, depending on the compiler and OS. I never did get a successful 64 bit build using Cygwin and the native mingw32-make, but after considerable frustration I was able to get a 64 bit build using msvcbuild.
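
You can also check from inside LuaJIT itself rather than the task manager; a small check using the FFI library:

local ffi = require("ffi")
print(ffi.abi("64bit") and "64 bit build" or "32 bit build")
print("pointer size:", ffi.sizeof("void *"), "bytes")   -- 8 on a 64 bit build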

Instructions for getting 64 bit LuaJIT to build with Visual Studio 2012 Express Desktop on 64 bit Windows 8.1

The command line SDK was failing to install on my machine, and I had Visual Studio 2012 Express Web previously installed, but it didn't include the C compiler, so I also installed Visual Studio 2012 Express Desktop.

Microsoft took away the setenv command for 64 bit on Windows 8.1, but they replaced it with a built-in command called "VS2012 X64 Cross Tools Command Prompt".

Unfortunately Microsoft broke their own tool, so it came up with the error "\Microsoft was unexpected at this time". See: http://www.blinnov.com/en/2010/06/04/microsoft-was-unexpected-at-this-time/

They also got the path wrong for the location of a couple of dependencies, so I added these to the front of the default system path.

After some spelunking I did the following:

  • Removed the stray quotation marks surrounding any Visual Studio component in the PATH statement.
  • Added the following to the front of the system PATH using Control Panel -> System Properties -> Advanced -> Environment Variables:
    C:\VisualStudio11\VC\bin\x86_amd64;C:\VisualStudio11\VC;C:\VisualStudio11\Common7\IDE;

Once I did this I restarted the VS2012 X64 Cross Tools Command Prompt and ran msvcbuild.bat; it worked great, and the new version passed the 4 gig boundary just fine.

Note: on my machine Visual Studio is installed in c:\VisualStudio11; if yours is installed elsewhere you will have to modify the path accordingly.

Side note: the build I did for 64 bit Vista on an i7 didn't work when copied directly to 64 bit Windows 8 running on an i7. This is a little worrisome because every other program I copied between these machines ran without problem. There is an underlying brittleness that will need to be resolved before LuaJIT can hit mainstream.

Notes for the LuaJIT community

I think the community needs to work on improving the standard build so you get a version with most batteries included, such as sockets, and to provide full build support on every major Windows and Linux platform, with true 64 bit capabilities.

We need libraries that extend what I have shown here so we could move some of these external arrays into CUDA memory on GPU cards, and it would be ideal if we could ship the basic compiled code down to the CUDA device from Lua at the same time.

My GPU cards have 2 gig built in, so care will have to be taken to design our models to fit into the CUDA memory if we want to maximize the speed advantages of the GPUs.

Perhaps most important, they need to improve their outreach and start recruiting more engineers / users out of the big data and analytics space. This means time invested in examples that can be directly applied without becoming FFI and 64 bit build experts. It only took a couple of hours to write a pretty good CSV parser similar to the R data frame, but that really should be part of the base package; it must be part of the base package if they want to attract the big data projects that have large budgets. One thing Guido did right with Python is that very early in the project he produced small samples that showed real life use of every Python function. Lua in general, and the LuaJIT community in particular, needs to copy some of his best practices.

I understand and agree with the tight LuaJIT build philosophy, given its roots as an embedded environment. I almost think there are two discrete audiences that have different requirements. I suspect the big data audience will spend more money faster, so perhaps they need a second build focused there. It seems like the tight, focused product is a subset of the larger product, which could be handled with two build scripts without violating the original philosophy.

The LuaJIT developers need  some organization that allows them to collect money to spend on future enhancements so they can keep more of their senior contributors focused on the important technical improvements.

It is kind of scary when the main driver is distracted by unrelated projects. I understand the need for day to day income, but this feels risky; it is probably one of the greater risks of adopting the platform. I would rather see him well compensated but dependent on the success and widespread adoption of LuaJIT.

If some of these things happen then I think LuaJIT has a viable chance of displacing Java in strategic big data ML projects.

Disclaimer

I am an expert engineer and distributed architect, but I have only been using Lua for about a month. There are very likely ways to dramatically improve this code. Please leave a comment or send me a private note about your suggested enhancements; the best way to learn any language is from other experts.

  • LuaJIT is Copyright © 2005-2014 Mike Pall, released under the MIT open source license.
  • The prior version didn't use the same meta programming style. I am still not sure if one is really better than the other, but I think the meta style is a little cleaner. Download prior version.
  • It is possible to divide the data set and allow a set of processes to work on different pieces; this is particularly effective when computing expensive features. It can overcome the 2 gig limit simply by shaping the data to stay under that limit. The downside is that it requires design effort to move the data between processes and merge the results. This process is made easier by Hadoop. I love Hadoop, but if you can do the same job with less code, less complex logic, less hardware and equally fast, then Hadoop is not always the right answer. Needless to say, a 20 core Hadoop cluster costs more than a 4 core server, so there is always an ROI tradeoff.


5 Replies to “LuaJIT Access 20 Gig or More of Memory”

  1. It only took a couple hours to write a pretty good CSV parser similar to the R data frame but that really should be part of the base package.

    There are many packages out there, check out luarocks to discover them.

    Also take a look at Torch7 project . It’s focused on ML with CUDA support ( it also comes with a csv package. )

    1. I will take a look at Torch7, but what do you want to bet that it takes more than 30 minutes to figure out a build that will work cleanly with LuaJIT. I would love to be wrong. You missed the point on the CSV: the basic parser was only part of it, but I needed the full load, morph, save functionality built into the R data tables. There may even be one available that does everything I needed, but I looked at several and they only seemed like half of what I needed. The key is I shouldn't need to go searching or struggle with integrating incompatible builds cleanly with LuaJIT. I can normally get things like this working, but the fact of the matter is that with R and Julia that functionality was pre-integrated into the base, so I didn't have to waste time trying to get some random 3rd party package to build cleanly to integrate with LuaJIT. I do not want to spend my very limited hours doing build activities that are not really a value add for either me or my clients. If you want to attract people focused in the ML big data space then you will have to raise the bar, because I have more tolerance for this kind of work than most engineers working in that space.

      Case in point: every programming language I have used in the last 5 years except Lua came with functions for directory_exist and create_directory. Yes, I found a filesystem module, but when I built it from the github download it had errors. Yes, I can figure it out, but why should I have to when other programming languages include it? Yeah, I saw the script to do it with a shell, but that is so hokey and would perform poorly. I also saw the script reading directories like files, but that is not cross platform. Even Node.js has it built in. Checking and creating directories is kind of common for a tool language. Another case in point: I still don't have a working socket module with LuaJIT on Windows; after sinking 3 hours into that build I just gave up, but the key is I should not have needed to worry about building something as common as sockets. It should just be there as part of the core build.

      Yes, I understand the point about being a lean language, but there is a point of being too lean. Read my comments in the article about two audiences. In this instance I am running on a 64 bit system with a minimum of 16 gig of RAM, which is small for big data projects, so just give us a build that has LuaJIT enabled and has all the common libraries built and ready to load. I found two separate batteries-included Lua packages; neither of them integrated LuaJIT, so your lower effort strategy may be to work with them to fully integrate LuaJIT.

      I hate to admit it but the Julia approach for this seems better. Most of the packages I wanted in Julia just downloaded and installed themselves with a single command and I only had one out of 30 fail. The Node.js package system also works pretty cleanly.

  2. Another user sent me a comment that ultimately guided me to doing a 64 bit build with the Microsoft tools. Once I got this to work it used up 10 gig of memory on a 16 Gig system just like I wanted. I had to jump through some hoops to get the build to work on a Windows 8.1 machine using Visual studio 2012 but it was worth the effort. I documented my solution in the Caveats section of article above. The other user claims to have allocated all their physical RAM but didn’t say how much RAM they had available.

  3. Please, use proper software name while discussing it.

    From luajit.org:
    LuaJIT is a Just-In-Time Compiler (JIT) for the Lua programming language. Lua is a powerful, dynamic and light-weight programming language. It may be embedded or used as a general-purpose, stand-alone language.

    LuaJIT is Copyright © 2005-2014 Mike Pall, released under the MIT open source license.

    1. I added copyright attribution and changed some of the lua jit to LuaJit. How I describe the product is my prerogative as the author. I like the description you sent over but it didn’t fit where I was introducing the topic. I left some of the “lua jit” in for SEO.

      Now how about something even more helpful like the answer for how to modify what I have to manage more than 4 Gig on 64 bit systems.
