Tracking down bugs in GCC

Tue 01 March 2016

OK, I'll admit, the title is kind of clickbait: what I think I have is a bug in msp430-gcc, the GNU C compiler for the MSP430 series of microcontrollers by Texas Instruments. Sorry to disappoint. While I was hacking on a tiny event loop to power the devices in my personal intranet of things, I discovered that something was going a bit crazy with my code. Cracking out mspdebug, I noticed that at a certain point control was jumping to a seemingly random location in memory that did not contain valid instructions, causing the microcontroller to reset. Weird! Where was this happening, and why? Questions, questions; time to go digging!

I managed to track the problem down to the main event loop that pops events off a FIFO and acts on them. An "event" in this context is a two-element C-struct consisting of a function pointer and a data pointer. Even when I provided a perfectly valid function pointer, my code was still jumping to an arbitrary position in memory and resetting. The plot thickens; it looks like I'm going to have to get my hands dirty and dig around a bit in the generated assembly! After some more back-and-forth between my C source and the assembly I managed to construct a minimal example that illustrates the problem that I am having:

// problem_test.c
typedef struct {
    void (*function)(void*) ;
    void *data ;
} event_t ;

extern void placeholder(event_t*) ;
extern void test_function(void*) ;

int main(void) {
    event_t e ;
    e.function = test_function ;  // set to valid function pointer
    e.data = (void*) 0x03 ;  // arbitrary data
    placeholder(&e) ;  // prevent everything from being optimised away
    e.function(e.data) ;
}

When the above code is compiled with optimisations disabled it produces correct output. The output of msp430-gcc -O0 -S -c problem_test.c is shown below. For clarity I have removed the assembler directives and have added in-line comments.

    ; stack setup and allocation of space for `event_t e`
    mov r1, r4
    add #2, r4
    sub #4, r1
    ; `e.function = test_function`
    mov #test_function, -6(r4)
    ; `e.data = 0x03`
    mov #3, -4(r4)
    ; call `placeholder(&e)`
    mov r4, r15
    add #llo(-6), r15
    call    #placeholder
    ; call `e.function(e.data)`
    mov -6(r4), r14
    mov -4(r4), r15
    call    r14
    ; de-allocate stack space for `e`
    add #4, r1

This code is correct. However, if we now enable optimisations by compiling with msp430-gcc -O1 -S -c problem_test.c (-O1 and -O2 produce the same output for the above C code), we get the following assembly:

    ; allocate space for `event_t e` on the stack
    sub #4, r1
    ; `e.function = test_function`
    mov #test_function, @r1
    ; `e.data = 0x03`
    mov #3, 2(r1)
    ; call `placeholder(&e)`
    mov r1, r15
    call    #placeholder
    ; move `e.data` into r15
    mov 2(r1), r15
    ; ??? call `e.data` ???
    call    2(r1)
    ; de-allocate stack space for `e`
    add #4, r1

The two instructions just before the final stack clean-up are the important ones. We know that r1 points to the top of the stack, so the values of e.function and e.data can be found at 0(r1) and 2(r1) respectively, as each is a pointer and hence 2 bytes wide on the MSP430 architecture. Despite this we clearly see a call 2(r1): the program is going to jump to the address held in e.data and start executing whatever it finds there as if it were machine code! Clearly, for sufficiently arbitrary data we will very quickly run into something that is not a valid machine instruction, and the microcontroller will reset.
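The expected offsets can also be checked from C itself rather than read off the assembly. The following is a minimal sketch using the standard offsetof macro; the helper names (function_offset, data_offset, check_event_layout) are hypothetical and not part of the original test case. On the MSP430, with 2-byte pointers and assuming no padding, the two offsets should come out as 0 and 2; on a desktop machine the second one will be larger.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    void (*function)(void*) ;
    void *data ;
} event_t ;

/* Offset of the function pointer: always 0, as it is the first member. */
size_t function_offset(void) {
    return offsetof(event_t, function) ;
}

/* Offset of the data pointer: expected to be 2 on the MSP430
   (2-byte pointers, assuming the compiler inserts no padding). */
size_t data_offset(void) {
    return offsetof(event_t, data) ;
}

/* Hypothetical sanity check: the call target must sit at offset 0,
   and the data pointer must come after it. */
void check_event_layout(void) {
    assert(function_offset() == 0) ;
    assert(data_offset() >= sizeof(void (*)(void*))) ;
}
```

If check_event_layout() passes on the target, then a correct call through e.function has to read from 0(r1), not 2(r1), in the optimised listing above.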

So, it appears that we have found the source of the problem, although it is still not clear why the wrong offsets are calculated when optimisations are enabled; I will submit a bug report when I have a moment. As a workaround I noticed that if I use a global variable for the event_t then everything works correctly, even with optimisations enabled. Luckily for my actual use case this is a viable option, so I will be able to keep working until a fix is released.
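For illustration, here is one way the workaround might look in a small event loop. This is only a sketch under assumptions: the ring buffer, its capacity, and every name in it (QUEUE_SIZE, event_push, event_run_pending, current_event, increment) are hypothetical and not taken from the real code. The one idea it borrows from the workaround is that the event_t being dispatched lives at file scope instead of on the stack.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    void (*function)(void*) ;
    void *data ;
} event_t ;

#define QUEUE_SIZE 8  /* hypothetical capacity */

static event_t queue[QUEUE_SIZE] ;
static unsigned head = 0 ;   /* index of the next event to pop */
static unsigned count = 0 ;  /* number of events currently queued */

/* The event being dispatched is a global, not a stack variable. */
static event_t current_event ;

/* Push an event; returns 0 on success, -1 if the queue is full. */
int event_push(void (*function)(void*), void *data) {
    if (count == QUEUE_SIZE) return -1 ;
    unsigned tail = (head + count) % QUEUE_SIZE ;
    queue[tail].function = function ;
    queue[tail].data = data ;
    count++ ;
    return 0 ;
}

/* Pop and dispatch every queued event; returns how many were run. */
unsigned event_run_pending(void) {
    unsigned handled = 0 ;
    while (count > 0) {
        current_event = queue[head] ;
        head = (head + 1) % QUEUE_SIZE ;
        count-- ;
        assert(current_event.function != NULL) ;
        current_event.function(current_event.data) ;
        handled++ ;
    }
    return handled ;
}

/* Example handler: adds 1 to the int that `data` points to. */
void increment(void *data) {
    *(int*)data += 1 ;
}
```

Pushing increment twice with a pointer to the same int and then calling event_run_pending() should run both handlers in FIFO order. Whether this particular layout sidesteps the miscompilation would still need to be confirmed by inspecting the generated assembly.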