Optimize Your NPL/Lua Code

Basic facts

Tips from the Lua Programming Gems book: http://www.lua.org/gems/sample.pdf

Before running any code, Lua translates (precompiles) the source into an internal format. This format is a sequence of instructions for a virtual machine, similar to machine code for a real CPU. This internal format is then interpreted by C code that is essentially a while loop with a large switch inside, one case for each instruction.

Perhaps you have already read somewhere that, since version 5.0, Lua uses a register-based virtual machine. The “registers” of this virtual machine do not correspond to real registers in the CPU, because this correspondence would not be portable and would be quite limited in the number of registers available. Instead, Lua uses a stack (implemented as an array plus some indices) to accommodate its registers. Each active function has an activation record, which is a stack slice wherein the function stores its registers. So, each function has its own registers. Each function may use up to 250 registers, because each instruction has only 8 bits to refer to a register.

Given that large number of registers, the Lua precompiler is able to store all local variables in registers. The result is that access to local variables is very fast in Lua. For instance, if a and b are local variables, a Lua statement like a = a + b generates one single instruction: ADD 0 0 1 (assuming that a and b are in registers 0 and 1, respectively). For comparison, if both a and b were globals, the code for that addition would be like this:

GETGLOBAL 0 0 ; a
GETGLOBAL 1 1 ; b
ADD 0 0 1
SETGLOBAL 0 0 ; a

So, it is easy to justify one of the most important rules to improve the performance of Lua programs: use locals! If you need to squeeze performance out of your program, there are several places where you can use locals besides the obvious ones. For instance, if you call a function within a long loop, you can assign the function to a local variable. The code

for i = 1, 1000000 do
  local x = math.sin(i)
end

runs 30% slower than this one:

local sin = math.sin
for i = 1, 1000000 do
  local x = sin(i)
end
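
The exact gap depends on the Lua version, the runtime and the hardware, so it is worth measuring on the target system. The following is a minimal sketch of such a measurement with os.clock. The helper name time_it, the two wrapper functions and the iteration count are illustrative assumptions, not part of the original text. The loops accumulate a sum so that the two variants can be checked against each other.

```lua
-- Hypothetical helper: returns the CPU seconds spent in f(n) and f's result.
local function time_it(f, n)
  local start = os.clock()
  local result = f(n)
  return os.clock() - start, result
end

-- Looks up the global `math` and its field `sin` on every iteration.
local function global_lookup(n)
  local sum = 0
  for i = 1, n do
    sum = sum + math.sin(i)
  end
  return sum
end

-- Caches math.sin in a local once, so the loop only reads a register.
local function local_lookup(n)
  local sin = math.sin
  local sum = 0
  for i = 1, n do
    sum = sum + sin(i)
  end
  return sum
end

local n = 1000000
local t_global, sum_global = time_it(global_lookup, n)
local t_local, sum_local = time_it(local_lookup, n)
print(string.format("global lookup: %.3f s", t_global))
print(string.format("local lookup:  %.3f s", t_local))
```

Running the measurement several times and comparing the typical values gives a more reliable picture than a single run.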

Access to external locals (that is, variables that are local to an enclosing function) is not as fast as access to local variables, but it is still faster than access to globals. Consider the next fragment:

function foo (x)
  for i = 1, 1000000 do
    x = x + math.sin(i)
  end
  return x
end
print(foo(10))

We can optimize it by declaring sin once, outside function foo:

local sin = math.sin
function foo (x)
  for i = 1, 1000000 do
    x = x + sin(i)
  end
  return x
end
print(foo(10))

This second code runs 30% faster than the original one.
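
Because an external local (an upvalue) is still somewhat slower than a true local, a hot loop can go one step further and copy the upvalue into a local of the function itself. The following is a minimal sketch of that idea. The name foo_local is made up for this illustration, and whether the extra step is measurable depends on the runtime.

```lua
local sin = math.sin   -- external local: an upvalue of the functions below

-- Reads the upvalue `sin` on every iteration.
local function foo(x)
  for i = 1, 1000000 do
    x = x + sin(i)
  end
  return x
end

-- Hypothetical variant: copies the upvalue into a function-local first,
-- so the loop body only touches registers.
local function foo_local(x)
  local sin = sin      -- right-hand side still refers to the upvalue
  for i = 1, 1000000 do
    x = x + sin(i)
  end
  return x
end

print(foo(10), foo_local(10))
```

Both functions compute the same value; only the way sin is reached inside the loop differs.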

Although the Lua compiler is quite efficient when compared with compilers for other languages, compilation is a heavy task. So, you should avoid compiling code in your program (e.g., function loadstring) whenever possible. Unless you must run code that is really dynamic, such as code entered by an end user, you seldom need to compile dynamic code. As an example, consider the next code, which creates a table with functions to return constant values from 1 to 100000:

local lim = 100000
local a = {}
for i = 1, lim do
  a[i] = loadstring(string.format("return %d", i))
end
print(a[10]()) --> 10

This code runs in 1.4 seconds. With closures, we avoid the dynamic compilation. The next code creates the same 100000 functions in 1/10 of the time (0.14 seconds):

function fk (k)
  return function () return k end
end

local lim = 100000
local a = {}
for i = 1, lim do a[i] = fk(i) end
print(a[10]()) --> 10
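
When the code really is dynamic, the compilation cost can at least be paid only once: compile the chunk a single time and call the resulting function repeatedly. The following is a minimal sketch, assuming the expression arrives as a string. The names compile, compile_expr and source are illustrative and not from the original text. It uses loadstring where available and falls back to load on newer Lua versions.

```lua
-- loadstring exists in Lua 5.1 and LuaJIT; later versions compile strings with load.
local compile = loadstring or load

-- Hypothetical helper: compiles "return function (x) return <expr> end" once
-- and returns the inner function, or nil plus an error message on failure.
local function compile_expr(expr)
  local chunk, err = compile("return function (x) return " .. expr .. " end")
  if not chunk then
    return nil, err
  end
  return chunk()
end

local source = "x * 2 + 1"   -- stands in for text entered by an end user
local f = assert(compile_expr(source))

-- The compiled function is reused; no compilation happens inside the loop.
local total = 0
for i = 1, 100000 do
  total = total + f(i)
end
print(total)
```

The design choice here is to keep the compile call out of any loop, so the heavy step runs once per distinct piece of source text rather than once per use.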


Topic revision: r1 - 2009-02-20 - LiXizhi
 