Test Driven Development - Synacore VM
Created: March 24, 2019, 1:27 a.m.
Last Updated: March 24, 2019, 1:27 a.m.
So one thing I noticed was absent from a lot of my public projects were good unit tests. I browse /r/C_programming a lot and came across two posts that sprung me into action. The first was this post by /u/mort96 about his testing library snow. It looked super cool and so I figured I'd try it out, but I needed a project to use it with. Luckily a few weeks later I saw this post about a challenge to implement a simple VM. All the pieces came together and now we have this blog post (albeit 3 months after I finished the project). The full code for my implementation of the VM is located on github here.
Registers & Writing to bit-fields:
To really focus on making proper tests, I decided to write the tests as I implemented each feature. The first feature I implemented was the registers. From the Synacore spec we know our registers need to hold 16 bit values. The most basic test we can write is to test whether we can write and read from our registers. For the register implementation I decided to be as compact as possible and made them as bit fields within a struct:
struct registers {
unsigned r0:REG_SIZE;
unsigned r1:REG_SIZE;
unsigned r2:REG_SIZE;
unsigned r3:REG_SIZE;
unsigned r4:REG_SIZE;
unsigned r5:REG_SIZE;
unsigned r6:REG_SIZE;
unsigned r7:REG_SIZE;
} __attribute__( ( packed ) );
typedef struct registers registers;
However I quickly learned that bit fields themselves are not addressable, so on my first attempts to read/write from specific registers using something like:
memcpy(&reg->r0, data, REG_SIZE/8);
Would cause some nasty compiler errors like:
But that wasn't too much of a deterrent. Bytes are addressable and luckily for us the Synacore architecture is 16 bit which is a multiple of 8. So we can do some quick math and determine which byte we should write to based on the register number and the pointer to the registers struct. For instance register 0 will be at byte offset 0, register 1 will be at byte offset 2 etc. So we can extrapolate this to
Our snow test is going to be simple; write some stuff to the registers and read it back, then compare to our expected result:
describe(registers) {
it("write/read registers") {
void * buf = malloc(REG_SIZE+1);
registers * reg = malloc(sizeof(struct registers));
assertneq(buf, NULL);
assertneq(reg,NULL);
// set registers
asserteq(set_register(reg, 1, "dead", strlen("dead")),0);
asserteq(set_register(reg, 2, "beef", strlen("beef")),0);
// verify first is "de"
asserteq(get_register(reg, 1, buf), 0);
asserteq_buf(buf, "dead", REG_SIZE/8);
// verify second is "be"
asserteq(get_register(reg, 2, buf), 0);
asserteq_buf(buf, "beef", REG_SIZE/8);
defer(free(reg));
defer(free(buf));
}
}
With the test ready to go and an implementation idea, I was able to write these two handy methods that pass the tests:
int set_register(registers * reg, int num, void * data, int len) {
//printf("Setting reg %d to %04X len %d\n", num, *(uint16_t *)data, len);
if(!memcpy(((char*)reg) + REG_SIZE_BYTES*num, data, len > 2 ? 2 : len)) {
return -1;
}
return 0;
}
int get_register(registers * reg, int num, void * data) {
if(!memcpy(data, (char *)reg + REG_SIZE_BYTES*num, REG_SIZE_BYTES)) {
return -1;
}
return 0;
}
Stacks on Stacks on Stacks (Finding The First Bug)
Based on the spec I knew the stack neded the following functionality:
dynamically resizes ("infinite" stack)
push items on the stack from a register
pop items off the stack into a register
So those were 3 straightforward requirements to test my implementation for. After writing the following "stack resize test" I got my first unit test win; the test code triggered a heap buffer overflow:
it("stack resizes") {
stack_info * stack = stack_init(INIT_STACK_SIZE);
registers * reg = malloc(sizeof(struct registers));
assertneq(stack, NULL);
// test data
char * tmp, *res;
tmp = calloc(1,REG_SIZE);
res = calloc(1,REG_SIZE);
memcpy(tmp, "12345678912345600", REG_SIZE-1);
// push > INIT_STACK_SIZE data onto stack
// to force a re-size
while(stack->total_size <=
INIT_STACK_SIZE + REG_SIZE_BYTES*STACK_RESIZE*2) {
set_register(reg, 0, tmp, REG_SIZE_BYTES);
asserteq(push_stack(stack,reg,0), 0);
asserteq_buf(stack->mem, tmp, REG_SIZE_BYTES);
}
// clear stack and check values
while(stack->num_elements > 0) {
// pop off stack
pop_stack(stack, reg, 1);
get_register(reg, 1, res);
asserteq_buf(res, tmp, REG_SIZE_BYTES);
}
defer(cleanup_stack(stack));
defer(free(reg));
defer(free(tmp));
defer(free(res));
}
The heap overflow is triggered when we hit the code path that resizes the stack via realloc. Below is the code failing the resize test, see if you can spot the bug:
int push_stack(stack_info * stack, registers * reg, int reg_num) {
char * old_mem;
// check resize
old_mem = stack->mem;
if(stack->total_size < REG_SIZE_BYTES*(stack->num_elements+1)) {
stack->mem = realloc(stack->mem, stack->total_size + REG_SIZE_BYTES*STACK_RESIZE);
if(stack->mem == NULL) {
stack->mem = old_mem;
return -1;
}
}
// push register onto stack
get_register(reg, reg_num, stack->mem + REG_SIZE_BYTES*stack->num_elements);
stack->num_elements++;
return 0;
}
Notice I forgot to update
write to position: 0 w/stack size: 80
write to position: 16 w/stack size: 80
write to position: 32 w/stack size: 80
write to position: 48 w/stack size: 80
write to position: 64 w/stack size: 80
write to position: 80 w/stack size: 128 // resized to 128
write to position: 96 w/stack size: 128 // realloced again to 128 (bug)
write to position: 112 w/stack size: 128 // realloced again to 128 (bug)
write to position: 128 w/stack size: 128 // overflow
On that last iteration it attempts to write another 16 byte entry at position 128. Here is the AddressSanitizer output for the overflow:
This creates the heap buffer overlow
int push_stack(stack_info * stack, registers * reg, int reg_num) {
char * old_mem;
// check resize
old_mem = stack->mem;
if(stack->total_size < REG_SIZE_BYTES*(stack->num_elements+1)) {
stack->mem = realloc(stack->mem, stack->total_size + REG_SIZE_BYTES*STACK_RESIZE);
stack->total_size += REG_SIZE_BYTES*STACK_RESIZE;
if(stack->mem == NULL) {
stack->mem = old_mem;
return -1;
}
}
// push register onto stack
get_register(reg, reg_num, stack->mem + REG_SIZE_BYTES*stack->num_elements);
stack->num_elements++;
return 0;
}
Creating a Virtual Memory Space
To create a 15 bit addressable memory space that stores 16 bit values, we need to allocate a buffer with that amount of space. 15 bits allows for
#define REG_SIZE 16
#define REG_SIZE_BYTES REG_SIZE/8
#define VM_MEM_SIZE 32768*REG_SIZE_BYTES // 15 bit addressable
Now it would be nice if we could easily convert virtual addresses to the actual "physical" address our emulator needs to reference the data. I put physical in quotation marks because our emulator actually has a virtual address space of its own within the host OS's userspace. But since we're emulating the Synacore architecture lets use the virtual/physical terminology to describe addresses inside/outside the emulated program respectively. So I made these quick define statements to easily check if the address is valid and to convert a virtual address to the actual location in our buffer:
#define VALID_MEMORY(x) ((x) <= 32767 && (x) >= 0)
#define VIRT_TO_PHYS(x) ((x)*REG_SIZE_BYTES)
Now allocating this space and iterating throught the binary is pretty straightforward:
// get filesize
fseek(f, 0, SEEK_END);
file_len=ftell(f);
fseek(f, 0, SEEK_SET);
if(file_len <= 0) {
return -1;
}
// create memory space for the "process"
buffer = malloc(VM_MEM_SIZE);
if(!buffer) {
return -1;
}
// read the entire file into memory
fread(buffer, file_len, 1, f);
fclose(f);
printf("Read %lu bytes from disk\n", file_len);
// set io redirection
in = stdin;
out = stdout;
// step through instructions
i=0;
while(i < file_len) {
i+=run_instruction(buffer, i, reg, stack);
}
Implementing Instructions
As we iterate through the binary, we need to interpret each 16 bit value as an opcode and then depending on opcode length iterate a certain amount of bytes (i.e. REG_SIZE_BYTES multiplied by the number of opcode arguments plus the opcode itself) to the next opcode.
To easily grab the first two bytes and iterpret it as an integer I made this quick define:
#define GET_RIGHTMOST_16_BITS(x) ((x) & (unsigned int)0x0000FFFF)
Once we have that, a switch statement is easy to implement and can handle all of our opcodes:
switch(GET_RIGHTMOST_16_BITS(buffer[i])) {
case 0: //halt
exit(0);
case 1: // set register <a> to <b>
a = *(uint16_t*)(buffer+i+REG_SIZE_BYTES);
if(!VALID_REGISTER(a)) {
exit(-1);
}
b = get_reg_immediate(reg, *(uint16_t*)(buffer+i+REG_SIZE_BYTES*2));
set_register(reg, a-32768, &b, REG_SIZE_BYTES);
return REG_SIZE_BYTES*3; // *3 because the set opcode has 2 arguments
....
}
I won't bore you guys describing every single test for each instruction. But if you want to check it out the instructions are in instructions.c and the tests are in tests.c.
Overall the snow library is pretty awesome and makes beautiful test output, so I think it's worth a try if you want to play around with a testing library:
And of course give the Synacore challenge a shot if you haven't already! I may try to implement it again in Rust to get more familiar with the language.