<-- home

Test Driven Development - Synacore VM

So one thing I noticed was absent from a lot of my public projects were good unit tests. I browse /r/C_programming a lot and came across two posts that sprung me into action. The first was this post by /u/mort96 about his testing library snow. It looked super cool and so I figured I'd try it out, but I needed a project to use it with. Luckily a few weeks later I saw this post about a challenge to implement a simple VM. All the pieces came together and now we have this blog post (albeit 3 months after I finished the project). The full code for my implementation of the VM is located on github here.

Registers & Writing to bit-fields:

To really focus on making proper tests, I decided to write the tests as I implemented each feature. The first feature I implemented was the registers. From the Synacore spec we know our registers need to hold 16 bit values. The most basic test we can write is to test whether we can write and read from our registers. For the register implementation I decided to be as compact as possible and made them as bit fields within a struct:

struct registers {
        unsigned r0:REG_SIZE;
        unsigned r1:REG_SIZE;
        unsigned r2:REG_SIZE;
        unsigned r3:REG_SIZE;
        unsigned r4:REG_SIZE;
        unsigned r5:REG_SIZE;
        unsigned r6:REG_SIZE;
        unsigned r7:REG_SIZE;
} __attribute__( ( packed ) );
typedef struct registers registers;

However I quickly learned that bit fields themselves are not addressable, so on my first attempts to read/write from specific registers using something like:

memcpy(&reg->r0, data, REG_SIZE/8);

Would cause some nasty compiler errors like:

But that wasn't too much of a deterrent. Bytes are addressable and luckily for us the Synacore architecture is 16 bit which is a multiple of 8. So we can do some quick math and determine which byte we should write to based on the register number and the pointer to the registers struct. For instance register 0 will be at byte offset 0, register 1 will be at byte offset 2 etc. So we can extrapolate this to reg+(REG_SIZE/8)*num.

Our snow test is going to be simple; write some stuff to the registers and read it back, then compare to our expected result:


describe(registers) {
        it("write/read registers") {
                void * buf = malloc(REG_SIZE+1);
                registers * reg = malloc(sizeof(struct registers));
                assertneq(buf, NULL);
                assertneq(reg,NULL);

                // set registers
                asserteq(set_register(reg, 1, "dead", strlen("dead")),0);
                asserteq(set_register(reg, 2, "beef", strlen("beef")),0);

                // verify first is "de"
                asserteq(get_register(reg, 1, buf), 0);
                asserteq_buf(buf, "dead", REG_SIZE/8);

                // verify second is "be" 
                asserteq(get_register(reg, 2, buf), 0);
                asserteq_buf(buf, "beef", REG_SIZE/8);
                defer(free(reg));
                defer(free(buf));
        }
}

With the test ready to go and an implementation idea, I was able to write these two handy methods that pass the tests:


int set_register(registers * reg, int num, void * data, int len) {
    //printf("Setting reg %d to %04X len %d\n", num, *(uint16_t *)data, len);
    if(!memcpy(((char*)reg) + REG_SIZE_BYTES*num, data, len > 2 ? 2 : len)) {
        return -1;
    }   
    return 0;
}

int get_register(registers * reg, int num, void * data) {
    if(!memcpy(data, (char *)reg + REG_SIZE_BYTES*num, REG_SIZE_BYTES)) {
        return -1;
    }
    return 0;
}

Stacks on Stacks on Stacks (Finding The First Bug)

Based on the spec I knew the stack neded the following functionality:


  • dynamically resizes ("infinite" stack)

  • push items on the stack from a register

  • pop items off the stack into a register

So those were 3 straightforward requirements to test my implementation for. After writing the following "stack resize test" I got my first unit test win; the test code triggered a heap buffer overflow:

it("stack resizes") {
        stack_info * stack = stack_init(INIT_STACK_SIZE);
        registers * reg = malloc(sizeof(struct registers));
        assertneq(stack, NULL);

        // test data
        char * tmp, *res;
        tmp = calloc(1,REG_SIZE);
        res = calloc(1,REG_SIZE);
        memcpy(tmp, "12345678912345600", REG_SIZE-1);

        // push > INIT_STACK_SIZE data onto stack
        // to force a re-size
        while(stack->total_size <= 
                INIT_STACK_SIZE + REG_SIZE_BYTES*STACK_RESIZE*2) {
                set_register(reg, 0, tmp, REG_SIZE_BYTES);
                asserteq(push_stack(stack,reg,0), 0);
                asserteq_buf(stack->mem, tmp, REG_SIZE_BYTES);
        }

        // clear stack and check values
        while(stack->num_elements > 0) {

                // pop off stack
                pop_stack(stack, reg, 1);
                get_register(reg, 1, res);
                asserteq_buf(res, tmp, REG_SIZE_BYTES);
        }
        defer(cleanup_stack(stack));
        defer(free(reg));
        defer(free(tmp));
        defer(free(res));       
}

The heap overflow is triggered when we hit the code path that resizes the stack via realloc. Below is the code failing the resize test, see if you can spot the bug:

int push_stack(stack_info * stack, registers * reg, int reg_num) {
        
        char * old_mem;
        
        // check resize
        old_mem = stack->mem;
        if(stack->total_size < REG_SIZE_BYTES*(stack->num_elements+1)) {
                stack->mem = realloc(stack->mem, stack->total_size + REG_SIZE_BYTES*STACK_RESIZE);
                if(stack->mem == NULL) {
                        stack->mem = old_mem;
                        return -1;
                }
        }

        // push register onto stack
        get_register(reg, reg_num, stack->mem + REG_SIZE_BYTES*stack->num_elements); 
        stack->num_elements++;
        return 0;
}

Notice I forgot to update stack->total_size after the call to realloc! This means stack->total_size + REG_SIZE_BYTES*STACK_RESIZE will always evaluate to 128. So when the test runs it will perform the following iterations:

write to position: 0 w/stack size: 80
write to position: 16 w/stack size: 80
write to position: 32 w/stack size: 80
write to position: 48 w/stack size: 80
write to position: 64 w/stack size: 80
write to position: 80 w/stack size: 128  // resized to 128 
write to position: 96 w/stack size: 128  // realloced again to 128 (bug)
write to position: 112 w/stack size: 128 // realloced again to 128 (bug)
write to position: 128 w/stack size: 128 // overflow

On that last iteration it attempts to write another 16 byte entry at position 128. Here is the AddressSanitizer output for the overflow:

This creates the heap buffer overlow WRITE of size 16 located 0 bytes to the right of our 128-byte region. As you can see the AddressSanitzer output is extremely helpful for debugging, which is why I always try to use it with my projects. So our fix is to just increase the stack->total_size variable:

int push_stack(stack_info * stack, registers * reg, int reg_num) {
        
        char * old_mem;
        
        // check resize
        old_mem = stack->mem;
        if(stack->total_size < REG_SIZE_BYTES*(stack->num_elements+1)) {
                stack->mem = realloc(stack->mem, stack->total_size + REG_SIZE_BYTES*STACK_RESIZE);
                stack->total_size += REG_SIZE_BYTES*STACK_RESIZE;
                if(stack->mem == NULL) {
                        stack->mem = old_mem;
                        return -1;
                }
        }

        // push register onto stack
        get_register(reg, reg_num, stack->mem + REG_SIZE_BYTES*stack->num_elements); 
        stack->num_elements++;
        return 0;
}

Creating a Virtual Memory Space

To create a 15 bit addressable memory space that stores 16 bit values, we need to allocate a buffer with that amount of space. 15 bits allows for 2^15 = 32768 unique addresses, and each 16 bit value will require 2 bytes. So 32768*(16/8) is the size of our virtual address space.

#define REG_SIZE 16
#define REG_SIZE_BYTES REG_SIZE/8
#define VM_MEM_SIZE 32768*REG_SIZE_BYTES // 15 bit addressable

Now it would be nice if we could easily convert virtual addresses to the actual "physical" address our emulator needs to reference the data. I put physical in quotation marks because our emulator actually has a virtual address space of its own within the host OS's userspace. But since we're emulating the Synacore architecture lets use the virtual/physical terminology to describe addresses inside/outside the emulated program respectively. So I made these quick define statements to easily check if the address is valid and to convert a virtual address to the actual location in our buffer:

#define VALID_MEMORY(x) ((x) <= 32767 && (x) >= 0)
#define VIRT_TO_PHYS(x) ((x)*REG_SIZE_BYTES)

Now allocating this space and iterating throught the binary is pretty straightforward:

// get filesize
fseek(f, 0, SEEK_END);
file_len=ftell(f);
fseek(f, 0, SEEK_SET);
if(file_len <= 0) {
    return -1;
}

// create memory space for the "process"
buffer = malloc(VM_MEM_SIZE);
if(!buffer) {
    return -1;
}

// read the entire file into memory
fread(buffer, file_len, 1, f);
fclose(f);
printf("Read %lu bytes from disk\n", file_len);

// set io redirection
in = stdin;
out = stdout;

// step through instructions
i=0;
while(i < file_len) {
    i+=run_instruction(buffer, i, reg, stack);  
}

Implementing Instructions

As we iterate through the binary, we need to interpret each 16 bit value as an opcode and then depending on opcode length iterate a certain amount of bytes (i.e. REG_SIZE_BYTES multiplied by the number of opcode arguments plus the opcode itself) to the next opcode.

To easily grab the first two bytes and iterpret it as an integer I made this quick define:

#define GET_RIGHTMOST_16_BITS(x) ((x) & (unsigned int)0x0000FFFF)

Once we have that, a switch statement is easy to implement and can handle all of our opcodes:

switch(GET_RIGHTMOST_16_BITS(buffer[i])) {
        
        case 0: //halt
            exit(0);
        case 1: // set register <a> to <b>
            a = *(uint16_t*)(buffer+i+REG_SIZE_BYTES);
            if(!VALID_REGISTER(a)) {
                exit(-1);
            }
            b = get_reg_immediate(reg, *(uint16_t*)(buffer+i+REG_SIZE_BYTES*2));
            set_register(reg, a-32768, &b, REG_SIZE_BYTES);
            return REG_SIZE_BYTES*3;  // *3 because the set opcode has 2 arguments
        ....
}

I won't bore you guys describing every single test for each instruction. But if you want to check it out the instructions are in instructions.c and the tests are in tests.c.

Overall the snow library is pretty awesome and makes beautiful test output, so I think it's worth a try if you want to play around with a testing library:

And of course give the Synacore challenge a shot if you haven't already! I may try to implement it again in Rust to get more familiar with the language.


References:

The Challenge Website