Delve deeper into Lua and compilation

In the last article there was a bit of scattered information about how Lua compiles, a bit about the opcodes, and so on. One of the basic questions a developer might ask is how it all works, and the best part is that the answer is available for free. It is all in a couple of APIs that are not commonly exposed or known about.

Lua code

Let us look at this sample code:

function a() print("Hello") end

This function can be invoked by declaring it in your app and then simply calling a().

If you have worked a bit with Lua, you are aware that you can also declare this (which is what Lua does internally anyway) as

a = function() print("Hello") end
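
A quick way to check that the two spellings really are the same thing to the compiler is to compile each from a string and compare the bytecode. This is only a sketch: it leans on loadstring and string.dump, both covered later in this article, and the chunk name "=demo" is an arbitrary label chosen here so that both chunks carry the same source name.

```lua
-- loadstring is the Lua 5.1 name; later versions fold it into load.
local compile = loadstring or load

-- Compile both spellings under the same (made-up) chunk name.
local sugar = assert(compile('function a() print("Hello") end', "=demo"))
local plain = assert(compile('a = function() print("Hello") end', "=demo"))

-- string.dump returns the compiled chunk as a binary string,
-- so comparing the two strings compares the bytecode byte for byte.
local same = string.dump(sugar) == string.dump(plain)
print("identical bytecode:", same)
```

Both chunks should compile to the same instructions, but run the comparison on your own build rather than taking it on trust.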

Now, if we were to see how this looks to the compiler, this is what we would get:
; x86 standard (32-bit, little endian, doubles)

; function [0] definition (level 1)
; 0 upvalues, 0 params, 2 stacks
.function  0 0 2 2
.const  "a"                 ; 0

; function [0] definition (level 2)
; 0 upvalues, 0 params, 2 stacks
.function  0 0 0 2
.const  "print"             ; 0
.const  "Hello"             ; 1
[1] getglobal  0   0        ; print
[2] loadk      1   1        ; "Hello"
[3] call       0   2   1  
[4] return     0   1      
; end of function

[1] closure    0   0        ; 0 upvalues
[2] setglobal  0   0        ; a
[3] return     0   1      
; end of function

Now, rather than typing out the opcodes or hex dumping the compiled output by hand, how about generating the same from Lua itself? Can this be done? Yes, it can.

The API

There are a couple of APIs in Lua that are sandboxed by certain frameworks because they allow for unlimited power. These two APIs are loadstring and dofile. You hand them Lua instructions in plain text (loadstring takes a string, dofile takes a file name) and voila, dynamic code. This is what Apple might consider downloading and executing code, and it could get your app rejected.

A lesser known fact is that these can also take compiled chunks, that is, bytecode rather than plain text, and execute them. The reason is that the Lua interpreter converts everything into bytecode before executing it anyway.
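
The round trip can be sketched in a few lines: compile a function to a binary string, hand that string back to loadstring, and call the result. The function name greet and its return value are made up for this illustration. The dumped function deliberately uses no upvalues, since a dump does not carry upvalue values along.

```lua
-- loadstring is the Lua 5.1 name; later versions fold it into load.
local loadstring = loadstring or load

-- A hypothetical function with no upvalues, so it survives the dump intact.
local function greet() return "hello from bytecode" end

-- bytecode is a binary string, not source text.
local bytecode = string.dump(greet)

-- The same API that accepts plain text accepts the compiled form.
local restored = assert(loadstring(bytecode))
print(restored())  --> hello from bytecode
```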

Prove it

Yep, that's the point. It does not matter which Lua framework you use; as long as it is Lua 5.1, this code will run perfectly fine:

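--filename : test.lua

local l = string.dump(function() print("hello") end)
-- the numeric for loop declares its own local i, so no separate declaration is needed
for i=1,#l do
 -- %02x pads each byte to two hex digits
 print(i,string.format("%02x",string.byte(l,i)))
end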

Though this code looks simple, when you run it, it dumps the compiled bytecode of the anonymous function. That function has the same body as a(), apart from the lowercase "hello", so its instructions match the assembler listing displayed above.

Run through a disassembler, those bytes display as follows:
0000                     ** global header start **
0000  1B4C7561           header signature: "\27Lua"
0004  51                 version (major:minor hex digits)
0005  00                 format (0=official)
0006  01                 endianness (1=little endian)
0007  04                 size of int (bytes)
0008  04                 size of size_t (bytes)
0009  04                 size of Instruction (bytes)
000A  08                 size of number (bytes)
000B  00                 integral (1=integral)
                         * number type: double
                         * x86 standard (32-bit, little endian, doubles)
                         ** global header end **
                         
000C                     ** function [0] definition (level 1)
                         ** start of function **
000C  0A000000           string size (10)
0010  40746573742E6C75+  "@test.lu"
0018  6100               "a\0"
                         source name: @test.lua
001A  03000000           line defined (3)
001E  03000000           last line defined (3)
0022  00                 nups (0)
0023  00                 numparams (0)
0024  00                 is_vararg (0)
0025  02                 maxstacksize (2)
                         * code:
0026  04000000           sizecode (4)
002A  05000000           [1] getglobal  0   0        ; print
002E  41400000           [2] loadk      1   1        ; "hello"
0032  1C400001           [3] call       0   2   1  
0036  1E008000           [4] return     0   1      
                         * constants:
003A  02000000           sizek (2)
003E  04                 const type 4
003F  06000000           string size (6)
0043  7072696E7400       "print\0"
                         const [0]: "print"
0049  04                 const type 4
004A  06000000           string size (6)
004E  68656C6C6F00       "hello\0"
                         const [1]: "hello"
                         * functions:
0054  00000000           sizep (0)
                         * lines:
0058  04000000           sizelineinfo (4)
                         [pc] (line)
005C  03000000           [1] (3)
0060  03000000           [2] (3)
0064  03000000           [3] (3)
0068  03000000           [4] (3)
                         * locals:
006C  00000000           sizelocvars (0)
                         * upvalues:
0070  00000000           sizeupvalues (0)
                         ** end of function **

0074                     ** end of chunk **
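
The first few header fields in the listing can be checked from Lua itself. This is a minimal sketch that assumes the stock interpreter, whose chunks begin with the "\27Lua" signature followed by a version byte; other implementations may use a different header, so the script reports what it finds instead of assuming it.

```lua
-- Dump a trivial function and look at the first bytes of the chunk header.
local dump = string.dump(function() return 1 end)

local signature = dump:sub(1, 4)  -- "\27Lua" on the reference interpreter
local version   = dump:byte(5)    -- 0x51 in the Lua 5.1 listing above

print("signature ok:", signature == "\27Lua")
print(string.format("version byte: 0x%02X", version))
print(string.format("total size:   %d bytes", #dump))
```

The size printed here will differ from the listing above, because the dumped function and the source name are different.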

You can also test print(#l); it will show 116, which is 0x74 in hex, the offset at which the chunk ends in the listing above. If you want to write this to a file, it is as simple as

 local filename = "output.lu"
 -- "wb" keeps the bytes untouched on platforms that translate line endings
 local fh = assert(io.open(filename,"wb"))
 fh:write(l)
 fh:close()

So if you want to compile your own code and run it rather than ship it as plain text, you now know a good way to create your own compiled files.
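
To close the loop, here is a sketch of writing a compiled chunk to disk and running it straight from the file with dofile. The function answer is made up for the illustration, and the file goes to a temporary path from os.tmpname(), which is an assumption about the platform. Substitute any writable path if that call is unavailable in your framework.

```lua
-- Temporary path for the compiled chunk (assumes os.tmpname works here).
local path = os.tmpname()

-- A hypothetical function to compile; it uses no upvalues.
local function answer() return 6 * 7 end

-- Write the bytecode in binary mode.
local fh = assert(io.open(path, "wb"))
fh:write(string.dump(answer))
fh:close()

-- dofile detects the binary chunk and runs it like any other file,
-- returning whatever the chunk returns.
local result = dofile(path)
print(result)  --> 42

os.remove(path)
```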

In Closing

Lua is a surprisingly powerful and interesting language. It has so much more to offer, and every time there is something new to learn. This opens up new possibilities for integrating Lua with your own code, more so if you use C/C++ wrappers. If you have a look at LuaForge, there are several projects that do many wonderful things, and if you look carefully, most of them date from the 2005-2009 timeframe (perhaps a bit too soon for such wonderful technology). Maybe we can look at creating a debugger, a compiler and a disassembler, all from Lua itself. If you look at dotNet, Java and Objective-C, they have similar shortcomings that leave your code open to those who can and want to peek.

For those who were not around in the days of the 8-bit home computers, POKE and PEEK were the two most popular commands used to cheat in games. Changing certain memory locations would give you infinite lives, strength, items, etc. The bytecode format looks strangely familiar, and I am not sure as yet if it can be modified and still work. There are fun applications, but then there are also the scary parts, where ads can be disabled (oh, the horrors!) and code can be altered to remove checks for In-App purchases, and so on.
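
One way to find out is a small, harmless experiment: dump a chunk, swap a string constant for another of the same length, and load the result. This is only a sketch. The chunk name "=demo" and the strings are made up, and keeping the replacement the same length is an assumption that avoids having to rewrite the size field stored in front of the constant.

```lua
-- loadstring is the Lua 5.1 name; later versions fold it into load.
local compile = loadstring or load

-- Compile a tiny chunk under a made-up name so the source text
-- itself does not end up inside the dump.
local original = assert(compile('return "hello"', "=demo"))
local dump = string.dump(original)

-- POKE, bytecode style: replace the constant with one of equal length.
local patched_dump, count = dump:gsub("hello", "HELLO")
local patched = assert(compile(patched_dump))

print(original(), patched(), count)
```

If the patched chunk loads and returns the new string, the format tolerates at least this kind of edit.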

More soon...
