Haxathon Supremacy Virtual Machine
endeav0r/hsvm
HAXATHON SUPREMACY VIRTUAL MACHINE v1.2
Design Document
[email protected]
19 SEP 2012

==========================
= Table of Contents
==========================

This document is broken into the following sections

  - Background
  - Bug Reports
  - HSVMv1 Conventions
  - Instruction Encoding
  - Registers
  - Instructions
  - System Calls

==========================
= BACKGROUND
==========================

The HSVMv1 implements a big-endian RISC CPU with 16-bit registers, 32-bit
fixed-width instructions and 64 KB of byte-addressable memory. It has been
developed to facilitate binary analysis tasks in the contest Haxathon
Supremacy.

There are four components to HSVMv1: assembler, disassembler, vm and debugger.

This document outlines the details of the HSVMv1 instruction set.

==========================
= BUG REPORTS
==========================

Just like every challenge released during Haxathon Supremacy, HSVMv1 is 100%
guaranteed bug-free. If you think you've found a bug, you haven't. If you
_really_ think you've found a bug, maybe you should keep it to yourself
because MAYBE, just _maybe_, *perhaps* there's a /tiny/, [slight] possibility
it's important.

==========================
= HSVMv1 Conventions
==========================

The r0 register is used for returning results. All registers _except_ for r0
must be callee saved. Arguments are passed on the stack. Caller is responsible
for stack cleanup. This is the same as the cdecl calling convention.

Program execution will begin at address 0.

There is no convention for format of data and code. The two are often found
freely mixed throughout binaries. This may frustrate disassembly.
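To make the memory model from the Background section and the entry-point convention concrete, here is a minimal Python sketch. The `Memory` class and its method names are hypothetical and are not part of HSVM. Wrapping addresses at 64 KB is an assumption of the sketch; this document does not specify what happens on out-of-range accesses.

```python
MEM_SIZE = 0x10000  # 64 KB of byte-addressable memory


class Memory:
    """Hypothetical helper modelling HSVMv1 memory; not part of HSVM itself."""

    def __init__(self, image=b""):
        if len(image) > MEM_SIZE:
            raise ValueError("image larger than 64 KB")
        self.data = bytearray(MEM_SIZE)
        # Execution begins at address 0, so the program image is placed there.
        self.data[: len(image)] = image

    def read8(self, addr):
        # Assumption: addresses wrap at 64 KB.
        return self.data[addr & 0xFFFF]

    def write8(self, addr, value):
        self.data[addr & 0xFFFF] = value & 0xFF

    def read16(self, addr):
        # Big-endian: the high byte is stored at the lower address.
        return (self.read8(addr) << 8) | self.read8(addr + 1)

    def write16(self, addr, value):
        self.write8(addr, value >> 8)
        self.write8(addr + 1, value)


mem = Memory(bytes([0x52, 0x00, 0x12, 0x34]))
mem.write16(0x0040, 0xBEEF)
print(hex(mem.read16(0x0002)), hex(mem.read8(0x0040)))
```

Because code and data share this one address space, nothing in the image distinguishes the four bytes above as an instruction rather than data.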
System calls are performed by loading the number of the desired system call
into r0 and its arguments into registers r1-r7 in ascending order.

==========================
= INSTRUCTION ENCODING
==========================

Instructions are encoded according to the following struct:

    struct _instruction {
        uint8_t opcode;
        uint8_t operand_0;
        union {
            struct {
                uint8_t operand_1;
                uint8_t operand_2;
            } __attribute__((packed));
            uint16_t lval;
        };
    } __attribute__((packed));

This enables the following encodings:

      0-7            8-15          16-23          24-31          Encoding Name
|--------------|--------------|--------------|--------------|
|    OPCODE    |              |              |              |    Encoding-A
|--------------|--------------|--------------|--------------|
|    OPCODE    |  REGISTER-A  |              |              |    Encoding-B
|--------------|--------------|--------------|--------------|
|    OPCODE    |  REGISTER-A  |  REGISTER-B  |              |    Encoding-C
|--------------|--------------|--------------|--------------|
|    OPCODE    |  REGISTER-A  |  REGISTER-B  |  REGISTER-C  |    Encoding-D
|--------------|--------------|--------------|--------------|
|    OPCODE    |  REGISTER-A  |            LVAL             |    Encoding-E
|--------------|--------------|--------------|--------------|
|    OPCODE    |              |            LVAL             |    Encoding-F
|--------------|--------------|--------------|--------------|

The format of the instruction is determined by the opcode. Not all
instructions require all 32 bits. However, the size of instructions is fixed
at 32 bits. Instruction bytes not required for the instruction may contain any
value and will not invalidate the instruction.

While all instructions are 32 bits wide, instructions have no alignment
requirements.

==========================
= REGISTERS
==========================

There are 10 general purpose registers. The first eight are of the form:
  r0, r1, r2, r3, r4, r5, r6 and r7
The last two GPRs are rsp and rbp. The register rsp is modified directly by
the instructions push, pop, call and ret.

The instruction pointer is rip and is directly accessible by the user in all
places where other GPRs are accessible.
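The instruction layout from the Instruction Encoding section can be unpacked with a few lines of Python. This is a sketch under stated assumptions, not the VM's own decoder. The `Instruction` tuple and `decode` function are hypothetical names. Reading `lval` as big-endian is an assumption based on the statement that the CPU is big-endian.

```python
import struct
from collections import namedtuple

# Field names mirror struct _instruction; lval overlays operand_1/operand_2.
Instruction = namedtuple(
    "Instruction", "opcode operand_0 operand_1 operand_2 lval"
)


def decode(raw, offset=0):
    """Split the 4 bytes at `offset` into the fields of struct _instruction.

    Assumption: lval is read big-endian. Instructions have no alignment
    requirement, so any byte offset is accepted.
    """
    opcode, operand_0, operand_1, operand_2 = struct.unpack_from(
        "4B", raw, offset
    )
    (lval,) = struct.unpack_from(">H", raw, offset + 2)
    return Instruction(opcode, operand_0, operand_1, operand_2, lval)


# add r0, r1, r2 (Encoding-D, opcode 0x10 per the Instructions section)
print(decode(bytes([0x10, 0x00, 0x01, 0x02])))
# add r0, 1 (Encoding-E, opcode 0x11), starting at the unaligned offset 1
print(decode(bytes([0xFF, 0x11, 0x00, 0x00, 0x01]), offset=1))
```

Every field is always returned. Which fields are meaningful depends on the opcode, matching the rule that unused bytes may hold any value.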
There is a flags register unavailable to the user. It is set by the result of
the cmp instruction.

The encodings for the user-accessible registers are:
  r0  : 0x00
  r1  : 0x01
  r2  : 0x02
  r3  : 0x03
  r4  : 0x04
  r5  : 0x05
  r6  : 0x06
  r7  : 0x0A
  rbp : 0x08
  rsp : 0x09
  rip : 0x07

It may seem odd that rip is 0x07 while r7 is 0x0A. This decision was made
carefully... check your ASCII charts.

==========================
= INSTRUCTIONS
==========================

The HSVMv1 instructions are:
  add, sub, mul, div, mod, and, or, xor
  cmp, je, jne, jl, jle, jg, jge
  call, jmp, ret
  load, loadb, stor, storb, push, pop
  in, out
  mov, hlt, nop, syscall

--------------------------
- Arithmetic Instructions
--------------------------

add, sub, mul, div, mod, and, or, xor

All arithmetic instructions support Encoding-D and Encoding-E.

When given Encoding-D, the interpretation is:
  A = B OP C
  ex: add r0, r1, r2 ; r0 = r1 + r2

The opcode encodings are:
  add : 0x10
  sub : 0x12
  mul : 0x14
  div : 0x16
  mod : 0x18
  and : 0x1A
  or  : 0x1C
  xor : 0x1E

When given Encoding-E, the interpretation is:
  A = A OP LVAL
  ex: add r0, 1 ; r0 = r0 + 1

The opcode encodings are:
  add : 0x11
  sub : 0x13
  mul : 0x15
  div : 0x17
  mod : 0x19
  and : 0x1B
  or  : 0x1D
  xor : 0x1F

------------------------------------
- Conditional Branching Instructions
------------------------------------

cmp, je, jne, jl, jle, jg, jge

The cmp instruction supports Encoding-C and Encoding-E.

When given Encoding-C, the interpretation is:
  flags = A - B
  ex: cmp r0, r2 ; flags = r0 - r2

The opcode encoding is:
  cmp : 0x53

When given Encoding-E, the interpretation is:
  flags = A - LVAL
  ex: cmp r0, 2 ; flags = r0 - 2

The opcode encoding is:
  cmp : 0x54

The je, jne, jl, jle, jg and jge instructions support Encoding-F, where the
LVAL is added to rip if the condition is met.
The conditions for each are:
  je  -> flags == 0
  jne -> flags != 0
  jl  -> flags < 0
  jle -> flags <= 0
  jg  -> flags > 0
  jge -> flags >= 0

The opcode encodings are:
  je  : 0x21
  jne : 0x22
  jl  : 0x23
  jle : 0x24
  jg  : 0x25
  jge : 0x26

----------------------------------------
- Non-Conditional Branching Instructions
----------------------------------------

call, jmp, ret

The call instruction supports Encoding-B and Encoding-F.

When given Encoding-B, the interpretation is:
  mem[rsp] = rip
  rsp = rsp - 2
  rip += A
  ex: call r0

The opcode encoding is:
  call : 0x28

When given Encoding-F, the interpretation is:
  mem[rsp] = rip
  rsp = rsp - 2
  rip += LVAL
  ex: call 48

The jmp instruction supports Encoding-F.

When given Encoding-F, the interpretation is:
  rip += LVAL
  ex: jmp -24

The opcode encoding is:
  jmp : 0x20

The ret instruction supports Encoding-A.

When given Encoding-A, the interpretation is:
  rsp = rsp + 2
  rip = mem[rsp]
  ex: ret

The opcode encoding is:
  ret : 0x29

----------------------------
- Memory-Access Instructions
----------------------------

load, loadb, stor, storb, push, pop

There are three families of memory access instructions.
  load, stor   = load and store 16-bit values
  loadb, storb = load and store 8-bit values
  push, pop    = push and pop 16-bit values to/from the stack

load, stor, loadb and storb support Encoding-C and Encoding-E.

The opcode encodings for Encoding-C are:
  load  : 0x31
  loadb : 0x33
  stor  : 0x35
  storb : 0x37

Pseudocode for these instructions is:
  load  r0, r1 : r0 = mem[r1]
  loadb r0, r1 : r0 = mem[r1] & 0x00FF
  stor  r0, r1 : mem[r0] = r1
  storb r0, r1 : mem[r0] = r1 & 0x00FF (only modifies a single byte)

The opcode encodings for Encoding-E are:
  load  : 0x30
  loadb : 0x32
  stor  : 0x34
  storb : 0x36

Pseudocode for these instructions is:
  load  r0, 0x0040 : r0 = mem[0x0040]
  loadb r0, 0x0040 : r0 = mem[0x0040] & 0x00FF
  stor  0x0040, r0 : mem[0x0040] = r0
  storb 0x0040, r0 : mem[0x0040] = r0 & 0x00FF (only modifies a single byte)

push supports Encoding-B and Encoding-F.

When given Encoding-B, the interpretation is:
  rsp = rsp - 2
  mem[rsp] = A
  ex: push r0

The opcode encoding is:
  push : 0x42

When given Encoding-F, the interpretation is:
  rsp = rsp - 2
  mem[rsp] = LVAL
  ex: push 0x0090

The opcode encoding is:
  push : 0x43

pop supports Encoding-B.

When given Encoding-B, the interpretation is:
  A = mem[rsp]
  rsp = rsp + 2
  ex: pop r0

The opcode encoding is:
  pop : 0x44

------------------
- I/O Instructions
------------------

in, out

These instructions deal with stdin/stdout one byte at a time.

in and out support Encoding-B.

When given Encoding-B, the interpretation of in is:
  A = getchar();
  ex: in r0

The opcode encoding is:
  in : 0x40

When given Encoding-B, the interpretation of out is:
  putchar(A);
  ex: out r0

The opcode encoding is:
  out : 0x41

----------------------------
- Miscellaneous Instructions
----------------------------

mov, hlt, nop, syscall

The mov instruction is used to move values into registers and to move values
between registers.
mov supports Encoding-C and Encoding-E.

When given Encoding-C, the interpretation of mov is:
  A = B
  ex: mov r0, r1

The opcode encoding is:
  mov : 0x51

When given Encoding-E, the interpretation of mov is:
  A = LVAL
  ex: mov r0, 0x0040

The opcode encoding is:
  mov : 0x52

The hlt instruction terminates execution of the VM.

The nop instruction advances the instruction pointer and performs no
operation.

The syscall instruction performs a system call and is handled directly by the
VM. More information about system calls is detailed in the System Calls
section.

nop, syscall and hlt support Encoding-A.

The opcode encodings are:
  hlt     : 0x60
  nop     : 0x90
  syscall : 0x61

==========================
= SYSTEM CALLS
==========================

System calls are executed with the syscall instruction. r0 is set to a value
corresponding to the desired system call. The registers r1-r7 are set to the
arguments of the system call. The system call result is placed in r0 on
return.

There are four implemented system calls in HSVMv1.
They are:
  0: OPEN
  1: READ
  2: WRITE
  3: CLOSE

------
- open
------

uint16 open (const char * path, int oflag);

oflag is a bitmask of the following values:
  READ   - 1 - Open the file for reading
  WRITE  - 2 - Open the file for writing
  APPEND - 4 - Append to the file
  CREATE - 8 - Create the file if it does not exist

HSVM Assembly Example:

  filename: "thefile"

  mov r1, filename ; r1 holds the address of string "thefile"
  mov r2, 1        ; r2 is set to READ
  mov r0, 0        ; r0 is set to OPEN
  syscall          ; r0 now holds the file descriptor, or 0xffff on error

------
- read
------

uint16 read (uint16 filedes, void * buf, uint16 buf_size);

HSVM Assembly Example:

  filename: "thefile"

  mov r1, filename ; r1 holds the address of string "thefile"
  mov r2, 1        ; r2 is set to READ
  mov r0, 0        ; r0 is set to OPEN
  syscall          ; r0 now holds the file descriptor, or 0xffff on error
  mov r1, r0       ; set r1 to file descriptor
  mov r2, rbp
  sub r2, 0x80     ; buf is 128 (0x80) bytes below base pointer
  mov r3, 64       ; buffer is 64 bytes large
  mov r0, 1        ; r0 is set to READ
  syscall

-------------
- write, close
-------------

write and close do not include assembly examples.

uint16 write (uint16 filedes, void * buf, uint16 buf_size);
uint16 close (uint16 filedes);
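As a rough illustration of how the open example maps onto the instruction encodings, the Python sketch below emits the bytes for that four-instruction sequence. The helper names (`mov_lval`, `syscall`, `assemble_open`) and the string address 0x0100 are hypothetical. Storing LVAL big-endian is an assumption based on the statement that the CPU is big-endian. The bundled HSVM assembler may differ in how it fills unused bytes.

```python
import struct

# Register and opcode values copied from the tables in this document.
REGISTERS = {
    "r0": 0x00, "r1": 0x01, "r2": 0x02, "r3": 0x03, "r4": 0x04, "r5": 0x05,
    "r6": 0x06, "r7": 0x0A, "rbp": 0x08, "rsp": 0x09, "rip": 0x07,
}
MOV_LVAL = 0x52  # mov, Encoding-E
SYSCALL = 0x61   # syscall, Encoding-A

OPEN = 0         # system call number for open
READ_FLAG = 1    # oflag bit for reading


def mov_lval(reg, value):
    """Encode `mov reg, value` as opcode, register, big-endian LVAL."""
    return struct.pack(">BBH", MOV_LVAL, REGISTERS[reg], value & 0xFFFF)


def syscall():
    """Encode `syscall`; the three unused bytes may hold any value, 0 here."""
    return struct.pack(">BBH", SYSCALL, 0, 0)


def assemble_open(path_addr, oflag):
    """Emit the four-instruction open sequence from the open example."""
    return b"".join([
        mov_lval("r1", path_addr),  # r1 = address of the path string
        mov_lval("r2", oflag),      # r2 = oflag bitmask
        mov_lval("r0", OPEN),       # r0 = system call number
        syscall(),
    ])


# 0x0100 is a hypothetical address for the "thefile" string.
code = assemble_open(0x0100, READ_FLAG)
print(code.hex())
```

Each instruction occupies exactly four bytes, so the sequence is 16 bytes long whatever the argument values are.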