Lua, RIFF, Source Code and Bytecode
It was around 1996, after the launch of Windows 95, that many people first saw the new .ani format: animated cursors, be it the walking dinosaur, the drum, the metronome or the hourglass. This was quite new and interesting territory at the time. An .ani file was nothing more than a series of .ico images put together in the ani file format, which is a rather specialized form of the RIFF format. Many of you might not have heard of RIFF; in fact, the RIFF format might be older than many of today's developers. My own encounters with this file format were through applications on Windows 3.1, though I am not 100% sure whether that was Painter, or BitEdit and PalEdit, which also offered saving in RIFF. RIFF stands for Resource Interchange File Format. It is a generic format that can encapsulate many other kinds of files and data, organized into what the format calls chunks and sub-chunks.
It is in fact a very simple file format, almost like a file index on a filesystem. Each chunk has an identifier, followed by the size and then the data, which can in turn contain further chunks (sub-chunks). Even in the days of 16-bit systems, chunks were WORD aligned (a BYTE was 8 bits and a WORD 16 bits), which simply means that chunk data always occupies an even number of bytes: if the data length is odd, a trailing 00 pad byte is added.
Common Usages
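ANI - Animated cursors
AVI - Video (Audio Video Interleave)
PAL - Palette information
RMI - MIDI sound/tracks (RIFF MIDI; plain .mid files use the Standard MIDI File format instead)
WAV - Waveform audio, usually raw (uncompressed PCM) digital audio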
Format
ID      4 bytes      A four-character identifier, generally padded with spaces
SIZE    4 bytes      Size of the data in bytes (little endian), not counting any pad byte
DATA    SIZE bytes   The data, followed by a 00 pad byte if SIZE is odd
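The chunk layout above is simple enough to sketch in a few lines of Python. This is a minimal sketch, assuming little-endian SIZE fields (as in RIFF) and the even-length padding described earlier; the helper names and the chunk IDs `odd ` and `even` are made up for the example. A real file such as an .ani wraps its chunks in an outer `RIFF` chunk whose data starts with a form type (`ACON` for animated cursors); the sketch shows only the bare chunk layer.

```python
import struct

def write_chunk(chunk_id: bytes, data: bytes) -> bytes:
    """Encode one chunk: ID, little-endian SIZE, DATA and an optional pad byte."""
    if len(chunk_id) != 4:
        raise ValueError("a chunk ID is exactly four bytes")
    pad = b"\x00" if len(data) % 2 else b""   # keep chunks WORD aligned
    return chunk_id + struct.pack("<I", len(data)) + data + pad

def read_chunks(buf: bytes):
    """Yield (chunk_id, data) for each chunk in buf, skipping pad bytes."""
    pos = 0
    while pos + 8 <= len(buf):
        chunk_id = buf[pos:pos + 4]
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        yield chunk_id, buf[pos + 8:pos + 8 + size]
        pos += 8 + size + (size & 1)          # SIZE does not count the pad byte

blob = write_chunk(b"odd ", b"abc") + write_chunk(b"even", b"1234")
print(list(read_chunks(blob)))
```

The 3-byte chunk occupies 12 bytes on disk (8 bytes of ID and SIZE, 3 data bytes, 1 pad byte), while its SIZE field still says 3.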
Here (http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html) and here (http://www.daubnet.com/en/file-format-riff) are some wonderful explanations about RIFF. This article is not about RIFF and animated cursors, but about how it is used (although adapted or modified) in practical applications.
If you are working with Lua, you are probably aware of two wonderful Lua utilities, luac and luadec. luac is a compiler that converts Lua code into bytecode, much like the compilers for Java, Python and even .NET, and that bytecode is just as easy to decompile into something close to the original source using luadec. If you are a bit more adventurous and/or used to working with Assembly, the Lua disassembler (also available in luadec via the -dis option) will feel close to home. The consequence is that compiling does not make the source code secure: it can be decompiled, or to a large extent rebuilt.
Building the bytecode
When we compile a simple line like
local a = 1
it is converted into a series of instructions that the Lua VM can understand. Disassembled, the same line looks like this:
; x86 standard (32-bit, little endian, doubles)
; function [0] definition (level 1)
; 0 upvalues, 0 params, 2 stacks
.function 0 0 2 2
.local "a" ; 0
.const 1 ; 0
[1] loadk 0 0 ; 1
[2] return 0 1
; end of function
This function has no nested function definitions, 0 upvalues, 0 parameters and a maximum stack size of 2. It has one local variable, "a", at index 0, and one constant, at index 0, with the value 1. The LOADK instruction loads constant number Bx into register A, so the instruction
loadk 0 0
simply loads constant 0 (whose value is 1) into register 0, which holds the local "a". This is our equivalent of
local a=1
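To make the register and constant operands concrete, here is a minimal Python sketch that decodes the raw 32-bit instruction words. The field layout follows Lua 5.1's lopcodes.h: a 6-bit opcode in the low bits, then an 8-bit A, a 9-bit C and a 9-bit B, with B and C read together as the 18-bit Bx for instructions such as LOADK. The `decode` helper is illustrative, and its opcode table is deliberately partial.

```python
import struct

# Lua 5.1 opcode numbers (from lopcodes.h) for the instructions used in this
# article, with the operands each one takes; the full 5.1 set has 38 opcodes.
OPCODES = {
    1: ("loadk", "ABx"),       # R(A) := Kst(Bx)
    5: ("getglobal", "ABx"),   # R(A) := Gbl[Kst(Bx)]
    7: ("setglobal", "ABx"),   # Gbl[Kst(Bx)] := R(A)
    30: ("return", "AB"),      # return R(A), ..., R(A+B-2)
}

def decode(word: int) -> str:
    """Split a 32-bit Lua 5.1 instruction into its fields and format it."""
    op = word & 0x3F             # bits 0-5: opcode
    a = (word >> 6) & 0xFF       # bits 6-13: A
    c = (word >> 14) & 0x1FF     # bits 14-22: C
    b = (word >> 23) & 0x1FF     # bits 23-31: B
    bx = (word >> 14) & 0x3FFFF  # bits 14-31: Bx, i.e. B and C read as one field
    name, mode = OPCODES.get(op, (f"op{op}", "ABC"))
    if mode == "ABx":
        return f"{name} {a} {bx}"
    if mode == "AB":
        return f"{name} {a} {b}"
    return f"{name} {a} {b} {c}"

# The two instructions of `local a = 1`, as little-endian words from the file.
code = bytes.fromhex("01000000" "1E008000")
print([decode(w) for (w,) in struct.iter_unpack("<I", code)])  # ['loadk 0 0', 'return 0 1']
```

LOADK 0 0 is simply the opcode 1 with every operand zero, which is why it is stored as 01000000; RETURN 0 1 sets bit 23 (B = 1) on top of opcode 30 (0x1E), giving 0x0080001E.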
and in the compiled file, it will look like
This is the hex dump of the bytecode file. As you can see, every compiled Lua file starts with a signature header that identifies how the file was compiled, along with other attributes:
HEADER        4 bytes   ESC + "Lua", or 0x1B4C7561
VERSION       1 byte    Q or 0x51 (version 5.1)
FORMAT        1 byte    0 = official version
ENDIANNESS    1 byte    0 = big endian, 1 = little endian
SIZEOFINT     1 byte    default 4
SIZE_T        1 byte    default 4
SIZE_INSTR    1 byte    default 4
SIZE_NUMBER   1 byte    default 8
INTEGRALFLAG  1 byte    0 = floating point, 1 = integral number type
So on an x86 (32-bit) platform, the default header looks like
1B4C7561 51000104 04040800
This header is always 12 bytes long. When a chunk is loaded, the interpreter compares it with the header it would write itself; if any of the 12 bytes differ, the chunk is rejected and cannot run.
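A minimal Python sketch of that check: build the 12-byte header from a platform description and compare all 12 bytes. The function names `lua51_header` and `can_load` are hypothetical; the defaults are the x86 values from the table above.

```python
def lua51_header(little_endian=True, sizeof_int=4, sizeof_size_t=4,
                 sizeof_instruction=4, sizeof_number=8, integral=False) -> bytes:
    """Build the 12-byte Lua 5.1 header for a platform description."""
    return b"\x1bLua" + bytes([
        0x51,                          # version 5.1
        0,                             # format: 0 = official
        1 if little_endian else 0,     # endianness
        sizeof_int,
        sizeof_size_t,
        sizeof_instruction,
        sizeof_number,
        1 if integral else 0,          # 0 = floating-point numbers
    ])

def can_load(chunk: bytes, platform_header: bytes) -> bool:
    # All-or-nothing comparison: every one of the 12 bytes must match.
    return chunk[:12] == platform_header

x86 = lua51_header()
x86_64 = lua51_header(sizeof_size_t=8)   # e.g. a 64-bit build with an 8-byte size_t
print(x86.hex().upper())                  # 1B4C75615100010404040800
print(can_load(x86, x86_64))              # False
```

A single differing byte, here the size of size_t, is enough for a chunk compiled on one platform to be refused on another.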
The header is followed by the top-level function block, which has the following structure:
Source Name               STRING
Line defined              INTEGER
Last line defined         INTEGER
No. of Upvalues           1 BYTE
No. of Params             1 BYTE
is_vararg Flag            1 BYTE
Max Stack Size            1 BYTE
List of Instructions      LIST
List of Constants         LIST
List of Function Protos   LIST
Source Line Positions     LIST
List of Locals            LIST
List of Upvalues          LIST
where a STRING is a structure with two elements:
SIZE_T   Size of the string data, including the terminating NUL
BYTES    The string data, terminated with a NUL (ASCII 0)
The source name is generally populated only in the top-level function; in nested functions the source name has a size_t value of 0.
If you refer to the hex dump above, the bytes after the first 12 (header) bytes are
0A000000                  This is 10 (decimal), the string size
406D6169 6E2E6C75 6100    This equates to @main.lua\0
The instruction list is defined as follows:
INTEGER         Number of instructions
[INSTRUCTION]   The VM instructions, SIZE_INSTR (4) bytes each
The constant list is defined as follows:
INTEGER      Number of constants
[
  1 byte     Type of constant
  Constant   The constant itself
]
where the constant types are:
0 = LUA_TNIL
1 = LUA_TBOOLEAN
3 = LUA_TNUMBER
4 = LUA_TSTRING
The constant field does not exist if the constant type is 0; it is a single byte (0 or 1) if the type is 1, a number if the type is 3 and a STRING if it is 4. Numbers are stored in IEEE 754 64-bit double format, and like all the multi-byte fields they are endian-sensitive.
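Going the other way, here is a minimal Python sketch that encodes constants in the layout just described, assuming the x86 defaults (little endian, 4-byte int and size_t, 8-byte doubles). `dump_constant` and `dump_constant_list` are hypothetical helpers that mirror the description; they are not luac's own code. The second example is the constant table of `a, b = 1, "ball"`.

```python
import struct

LUA_TNIL, LUA_TBOOLEAN, LUA_TNUMBER, LUA_TSTRING = 0, 1, 3, 4

def dump_constant(value) -> bytes:
    """Encode one constant: a type byte followed by its (possibly empty) payload."""
    if value is None:
        return bytes([LUA_TNIL])                         # no payload at all
    if isinstance(value, bool):                          # test before int: bool is an int
        return bytes([LUA_TBOOLEAN, 1 if value else 0])  # a single byte, 0 or 1
    if isinstance(value, (int, float)):
        return bytes([LUA_TNUMBER]) + struct.pack("<d", float(value))  # IEEE 754 double
    if isinstance(value, str):
        raw = value.encode("latin-1") + b"\x00"          # size_t length counts the NUL
        return bytes([LUA_TSTRING]) + struct.pack("<I", len(raw)) + raw
    raise TypeError(f"no Lua 5.1 constant type for {value!r}")

def dump_constant_list(values) -> bytes:
    """INTEGER count, then each constant in turn."""
    return struct.pack("<i", len(values)) + b"".join(dump_constant(v) for v in values)

print(dump_constant_list([1]).hex().upper())   # 0100000003000000000000F03F
print(dump_constant_list(["a", "b", 1, "ball"]).hex().upper())
```

The first line reproduces the constant section of the main.lua dump byte for byte: a count of 1, type 3, then the double 1.0.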
The function prototype list is defined as follows:
INTEGER      Number of function prototypes
[FUNCTION]   Each nested function, stored with the same structure as the top-level function
This is followed by the source line position list, which gives the source line number for each instruction in the function. This information is used by error handlers and debuggers.
INTEGER      Size of the source line position list
[INTEGER]    Source line number; the list index corresponds to the instruction position
The local list: each local variable has three fields, a string and two integers.
INTEGER      Size of the local list
[
  STRING     Name of the local variable
  INTEGER    Start of the local variable's scope
  INTEGER    End of the local variable's scope
]
And lastly, the upvalue list:
INTEGER      Size of the upvalue list
[
  STRING     Name of the upvalue
]
Now, if we have a look at the hex dump above, field by field:
0000                     ** global header start **
0000  1B4C7561           header signature: "\27Lua"
0004  51                 version (major:minor hex digits)
0005  00                 format (0=official)
0006  01                 endianness (1=little endian)
0007  04                 size of int (bytes)
0008  04                 size of size_t (bytes)
0009  04                 size of Instruction (bytes)
000A  08                 size of number (bytes)
000B  00                 integral (1=integral)
                         * number type: double
                         * x86 standard (32-bit, little endian, doubles)
                         ** global header end **
000C                     ** function [0] definition (level 1)
                         ** start of function **
000C  0A000000           string size (10)
0010  406D61696E2E6C75+  "@main.lu"
0018  6100               "a\0"
                         source name: @main.lua
001A  00000000           line defined (0)
001E  00000000           last line defined (0)
0022  00                 nups (0)
0023  00                 numparams (0)
0024  02                 is_vararg (2)
0025  02                 maxstacksize (2)
                         * code:
0026  02000000           sizecode (2)
002A  01000000           [1] loadk 0 0 ; 1
002E  1E008000           [2] return 0 1
                         * constants:
0032  01000000           sizek (1)
0036  03                 const type 3
0037  000000000000F03F   const [0]: (1)
                         * functions:
003F  00000000           sizep (0)
                         * lines:
0043  02000000           sizelineinfo (2)
                         [pc] (line)
0047  01000000           [1] (1)
004B  01000000           [2] (1)
                         * locals:
004F  01000000           sizelocvars (1)
0053  02000000           string size (2)
0057  6100               "a\0"
                         local [0]: a
0059  01000000           startpc (1)
005D  01000000           endpc (1)
                         * upvalues:
0061  00000000           sizeupvalues (0)
                         ** end of function **
0065                     ** end of chunk **
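To check that the structure described above accounts for every byte, here is a minimal Python sketch that walks the same 101-byte chunk. It assumes the x86 defaults from the header (little endian; 4-byte int, size_t and Instruction; 8-byte doubles) and only the Lua 5.1 layout described in this article; `parse_chunk` is an illustrative name, not an existing tool.

```python
import struct

def parse_chunk(buf: bytes) -> dict:
    """Walk a Lua 5.1 chunk; only the x86 default header is accepted."""
    if buf[:12] != bytes.fromhex("1B4C75615100010404040800"):
        raise ValueError("header does not match the x86 defaults")
    pos = 12

    def take(fmt: str):
        # Read one little-endian value of the given struct format code.
        nonlocal pos
        (value,) = struct.unpack_from("<" + fmt, buf, pos)
        pos += struct.calcsize("<" + fmt)
        return value

    def string():
        nonlocal pos
        size = take("I")              # size_t, counts the trailing NUL
        if size == 0:
            return None               # e.g. the source name of nested functions
        text = buf[pos:pos + size - 1].decode("latin-1")
        pos += size
        return text

    def constant():
        tag = take("B")
        if tag == 0:
            return None               # LUA_TNIL: no payload
        if tag == 1:
            return take("B") != 0     # LUA_TBOOLEAN: one byte
        if tag == 3:
            return take("d")          # LUA_TNUMBER: IEEE 754 double
        if tag == 4:
            return string()           # LUA_TSTRING
        raise ValueError(f"unknown constant type {tag}")

    def read_function() -> dict:
        f = {"source": string(), "linedefined": take("i"),
             "lastlinedefined": take("i"), "nups": take("B"),
             "numparams": take("B"), "is_vararg": take("B"),
             "maxstacksize": take("B")}
        f["code"] = [take("I") for _ in range(take("i"))]
        f["constants"] = [constant() for _ in range(take("i"))]
        f["protos"] = [read_function() for _ in range(take("i"))]
        f["lineinfo"] = [take("i") for _ in range(take("i"))]
        f["locals"] = [(string(), take("i"), take("i")) for _ in range(take("i"))]
        f["upvalues"] = [string() for _ in range(take("i"))]
        return f

    top = read_function()
    if pos != len(buf):
        raise ValueError(f"{len(buf) - pos} unexpected trailing bytes")
    return top

# The chunk for `local a = 1`, copied from the dump above.
DUMP = bytes.fromhex(
    "1B4C7561 51000104 04040800"                 # header
    "0A000000 406D61696E2E6C75 6100"             # source name "@main.lua"
    "00000000 00000000 00 00 02 02"              # lines, nups, params, vararg, stack
    "02000000 01000000 1E008000"                 # 2 instructions
    "01000000 03 000000000000F03F"               # 1 constant: the number 1.0
    "00000000"                                   # 0 nested functions
    "02000000 01000000 01000000"                 # line of each instruction
    "01000000 02000000 6100 01000000 01000000"   # local "a", startpc, endpc
    "00000000"                                   # 0 upvalues
)
print(parse_chunk(DUMP))
```

Parsing ends exactly at offset 0x65, the "end of chunk" in the dump. A chunk from a build with an 8-byte size_t would be rejected at the header check rather than misread.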
Side note: according to the IEEE 754 number format, the following numbers are represented (in 64 bits) as
Number  Sign   Exponent[11]      Significand[52]                                                   Hex
1       0 (+)  01111111111 (0)   1.0000000000000000000000000000000000000000000000000000 (1.00)     0x3FF0000000000000
2       0 (+)  10000000000 (+1)  1.0000000000000000000000000000000000000000000000000000 (1.00)     0x4000000000000000
3       0 (+)  10000000000 (+1)  1.1000000000000000000000000000000000000000000000000000 (1.50)     0x4008000000000000
4       0 (+)  10000000001 (+2)  1.0000000000000000000000000000000000000000000000000000 (1.00)     0x4010000000000000
5       0 (+)  10000000001 (+2)  1.0100000000000000000000000000000000000000000000000000 (1.25)     0x4014000000000000
6       0 (+)  10000000001 (+2)  1.1000000000000000000000000000000000000000000000000000 (1.50)     0x4018000000000000
7       0 (+)  10000000001 (+2)  1.1100000000000000000000000000000000000000000000000000 (1.75)     0x401C000000000000
8       0 (+)  10000000010 (+3)  1.0000000000000000000000000000000000000000000000000000 (1.00)     0x4020000000000000
9       0 (+)  10000000010 (+3)  1.0010000000000000000000000000000000000000000000000000 (1.125)    0x4022000000000000
10      0 (+)  10000000010 (+3)  1.0100000000000000000000000000000000000000000000000000 (1.25)     0x4024000000000000
11      0 (+)  10000000010 (+3)  1.0110000000000000000000000000000000000000000000000000 (1.375)    0x4026000000000000
-1      1 (-)  01111111111 (0)   1.0000000000000000000000000000000000000000000000000000 (1.00)     0xBFF0000000000000
So the number 1 is represented as 0x3FF0000000000000, and because the file is little endian, the dump above displays its bytes in reverse order, as 000000000000F03F.
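The table can be reproduced with Python's standard `struct` module. Packing a number as a big-endian double gives the hex column, packing it little-endian gives the byte order seen in the x86 dump, and masking the 64-bit pattern splits out the sign, exponent and significand fields. `double_fields` is an illustrative helper and is only meaningful for normal (non-zero, finite) numbers.

```python
import struct

def double_fields(x: float):
    """Split an IEEE 754 binary64 value into (sign, unbiased exponent, fraction bits).

    Valid for normal numbers only; zero and subnormals use a different encoding.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = ((bits >> 52) & 0x7FF) - 1023   # 11-bit exponent, bias 1023
    fraction = bits & ((1 << 52) - 1)          # 52 bits after the implicit leading 1
    return sign, exponent, fraction

for n in (1.0, 3.0, 8.0, 11.0, -1.0):
    big = struct.pack(">d", n).hex().upper()     # as written in the Hex column
    little = struct.pack("<d", n).hex().upper()  # as stored in the x86 chunk
    print(n, double_fields(n), "0x" + big, little)
```

For 1.0 this prints the fields (0, 0, 0), the pattern 0x3FF0000000000000 and the stored bytes 000000000000F03F.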
As you can see, it is not very difficult to understand how Lua bytecode works, and in many cases a simple compiled bytecode file can be converted back into source code. Note that this is easiest on Windows. On the Mac there are issues with the 32-bit and 64-bit versions, and since luadec is not available there in compiled (binary) form, building it yourself is a bit of a pain and in many cases frustrating.
If we do not use a local and instead just write
a = 1
it compiles to
; x86 standard (32-bit, little endian, doubles)
; function [0] definition (level 1)
; 0 upvalues, 0 params, 2 stacks
.function 0 0 2 2
.const "a" ; 0
.const 1 ; 1
[1] loadk 0 1 ; 1
[2] setglobal 0 0 ; a
[3] return 0 1
; end of function
As you can see, an extra instruction has been added to the mix: setglobal. The name "a" is now itself a string constant (constant 0), loadk puts the value 1 (constant 1) into register 0, and setglobal 0 0 stores register 0 in the global table under the name held in constant 0.
If we assign two globals, one number and one string,
a, b = 1, "ball"
it would look like
; x86 standard (32-bit, little endian, doubles)
; function [0] definition (level 1)
; 0 upvalues, 0 params, 2 stacks
.function 0 0 2 2
.const "a" ; 0
.const "b" ; 1
.const 1 ; 2
.const "ball" ; 3
[1] loadk 0 2 ; 1
[2] loadk 1 3 ; "ball"
[3] setglobal 1 1 ; b
[4] setglobal 0 0 ; a
[5] return 0 1
; end of function
and a combination of local and global, such as
local a = 1
b = "ball"
compiles to
; x86 standard (32-bit, little endian, doubles)
; function [0] definition (level 1)
; 0 upvalues, 0 params, 2 stacks
.function 0 0 2 2
.local "a" ; 0
.const 1 ; 0
.const "b" ; 1
.const "ball" ; 2
[1] loadk 0 0 ; 1
[2] loadk 1 2 ; "ball"
[3] setglobal 1 1 ; b
[4] return 0 1
; end of function
Nevertheless, bytecode can still be decompiled or disassembled, and those who want to will do so; compiling only keeps your source code away from casual prying eyes. The second problem is that when you have several Lua files, they can all be compiled into bytecode, but managing so many compiled files gets a bit difficult. Some documentation suggests combining them with a command like the one below (luac writes to luac.out by default rather than to standard output, so the output file is named with -o):
luac -o myapplua.out main.lua lib1.lua lib2.lua lib3.lua
whereas other sources suggest listing dependencies first. luac combines the files into a single chunk that runs them in the order given, so files that define what the others use should come first:
luac -o myapplua.out lib1.lua lib2.lua lib3.lua main.lua
To manage that, one suggested way is to place all the compiled Lua files into a single file, in the style of RIFF or of the ZIP format, whose local file header looks like
Header                 4 bytes (0x04034B50)
version required       2 bytes
general purpose flag   2 bytes
compression method     2 bytes
last mod file time     2 bytes
last mod file date     2 bytes
crc-32                 4 bytes
compressed size        4 bytes
uncompressed size      4 bytes
filename length        2 bytes
extra field length     2 bytes
file name              (variable size)
extra field            (variable size)
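Python's standard `zipfile` module can build a small archive in memory, and `struct` can then read the fixed 30-byte part of the first local file header back in the order of the table above. `read_local_header`, the file name `main.luac` and its placeholder contents are illustrative assumptions.

```python
import io
import struct
import zipfile

# Fixed 30-byte part of a ZIP local file header (all fields little endian).
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")

def read_local_header(buf: bytes, pos: int = 0) -> dict:
    (sig, version, flags, method, mtime, mdate, crc, csize, usize,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, pos)
    if sig != 0x04034B50:
        raise ValueError("not a local file header")
    start = pos + LOCAL_HEADER.size
    name = buf[start:start + name_len].decode("utf-8")
    return {"name": name, "method": method, "crc32": crc,
            "compressed_size": csize, "uncompressed_size": usize,
            # The file data follows the name and the extra field.
            "data_start": start + name_len + extra_len}

# Build a one-file archive in memory, stored without compression.
mem = io.BytesIO()
with zipfile.ZipFile(mem, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("main.luac", b"\x1bLua...")
archive = mem.getvalue()
info = read_local_header(archive)
print(info["name"], info["compressed_size"], archive[info["data_start"]:][:7])
```

The signature 0x04034B50, stored little endian, is why every ZIP file begins with the bytes "PK\x03\x04".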
In ZIP, each such header is followed by the file data, and the pair is repeated for each file in the archive. We could do something similar: let's say we have a Lua project with the files main.lua, lib1.lua, lib2.lua and lib3.lua.
We could then have something like
HEADER             4 bytes (CLUA)
BLOCK IDENTIFIER   4 bytes (FILE_BLOCK)
SIZE OF BLOCK      4 bytes (size of the block)
NO. OF RECORDS     4 bytes (number of entries)
followed by the file name records, as in the ZIP format:
BLOCK IDENTIFIER   4 bytes (FILE_NAME)
START_POSITION     4 bytes (position in the file where the data starts)
FILENAME_LENGTH    4 bytes (length of the file name)
FILENAME           STRING (the name, followed by a \0)
which are then followed by the file data blocks:
BLOCK_IDENTIFIER   4 bytes (FILE_DATA)
END_OF_BLOCK       4 bytes (position where the block ends)
BLOCK_LENGTH       4 bytes (size of the compiled file)
DATA               variable length
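A minimal Python sketch of such a container, under several assumptions that the layout leaves open: the 4-byte tags `FBLK`, `FNAM` and `FDAT` are hypothetical stand-ins for FILE_BLOCK, FILE_NAME and FILE_DATA; all integers are little endian; SIZE OF BLOCK is taken as the total size of the name records; START_POSITION points at the start of the matching FILE_DATA block; FILENAME_LENGTH excludes the trailing \0. `pack`, `read_file` and the placeholder bytecode are illustrative only.

```python
import struct

# Hypothetical 4-byte tags standing in for FILE_BLOCK, FILE_NAME and FILE_DATA.
MAGIC, TAG_BLOCK, TAG_NAME, TAG_DATA = b"CLUA", b"FBLK", b"FNAM", b"FDAT"

def pack(files: dict) -> bytes:
    """Pack {file name: compiled bytes} into one container (little endian)."""
    names = [name.encode("utf-8") for name in files]
    # Each name record is tag + START_POSITION + FILENAME_LENGTH + name + NUL.
    table_size = sum(12 + len(n) + 1 for n in names)
    pos = 16 + table_size                 # the first FILE_DATA block starts here
    table, blocks = b"", b""
    for name, data in zip(names, files.values()):
        table += TAG_NAME + struct.pack("<II", pos, len(name)) + name + b"\x00"
        end = pos + 12 + len(data)        # absolute offset where this block ends
        blocks += TAG_DATA + struct.pack("<II", end, len(data)) + data
        pos = end
    header = MAGIC + TAG_BLOCK + struct.pack("<II", table_size, len(names))
    return header + table + blocks

def read_file(container: bytes, wanted: str) -> bytes:
    """Look a file up in the name table, then seek straight to its data block."""
    if container[:4] != MAGIC or container[4:8] != TAG_BLOCK:
        raise ValueError("not a CLUA container")
    _table_size, count = struct.unpack_from("<II", container, 8)
    pos = 16
    for _ in range(count):
        if container[pos:pos + 4] != TAG_NAME:
            raise ValueError("corrupt name record")
        start, length = struct.unpack_from("<II", container, pos + 4)
        name = container[pos + 12:pos + 12 + length].decode("utf-8")
        pos += 12 + length + 1            # skip the record and its trailing NUL
        if name == wanted:
            if container[start:start + 4] != TAG_DATA:
                raise ValueError("corrupt data block")
            _end, size = struct.unpack_from("<II", container, start + 4)
            return container[start + 12:start + 12 + size]
    raise KeyError(wanted)

# Placeholder bytes stand in for the output of luac.
project = {"lib1.lua": b"\x1bLua-lib1", "main.lua": b"\x1bLua-main"}
container = pack(project)
print(read_file(container, "main.lua"))
```

Because every name record carries an absolute START_POSITION, `read_file` never has to scan through the data blocks of other files, which is the point of the position fields.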
I have tried to see whether Lua would run such a file directly, but it will not, as the Lua interpreter does not understand a custom RIFF-style wrapper. This is why the format has special fields such as the position in the file: they make it quick to seek to a position and pick up a block of code. The idea is that while a compiled file on its own can be decompiled, placing it in a wrapper like this makes that harder and adds a layer of protection to the code. If you are using this from within a C/C++ or an Objective-C app, these files can be extracted to a /tmp location and then executed; for more security, they can also be encrypted, so that extraction works only with a particular key that you set in your app. The only question left to try is: if a file is extracted at runtime into /tmp, can it be executed, and will it still find the resources in the resources directory?
Stay tuned if this interests you, for when I try to answer that question: will it work when executed from the /tmp directory with the resources in a /resource directory, and is there a better protection method than this wrapping or encryption?
Sources
Zip format
http://www.pkware.com/documents/casestudies/APPNOTE.TXT
Lua VM Instructions - Kein-Hong Man
RIFF Format
http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html
http://www.daubnet.com/en/file-format-riff
IEEE 754 64-Bit double numbers
http://babbage.cs.qc.cuny.edu/IEEE-754.old/References.xhtml
http://speleotrove.com/decimal/
http://en.wikipedia.org/wiki/IEEE_754-2008