It’s been quite a while since I took compilers course in college and I wanted to re-hash at least some of the lost knowledge while at RC. Granted, Paul Biggar did PhD in compilers/interpreters and agreed to do a quick lecture. One idea that resulted from it was to write a brainf*ck interpreter in Python.

Brainf*ck is a minimalistic language with only 8 one-character commands, that is however turing-complete. Its machine consists of a tape (initialized with all cells equal to 0) and a pointer to current cell. You can move left or right, increment/decrement value of the cell, read from standard input to a cell and write cell contents to standard output. There are also two control-flow commands that make while loops possible.

A couple of examples:

,.

will read a character and print it.

++++++++[>++++[>++>+++>+++>+<<<<-]>+>->+>>+[<]<-]>>
.>>---.+++++++..+++.>.<<-.>.+++.------.--------.>+.>++.

prints “Hello World!”

You might have guessed this is probably the easiest language to implement an interpreter for and together with Sarah Ransohoff we were able to write it in a few hours. Although, Brian Lee who was sitting next to us, wrote a better version in only 15 minutes.

Here’s how we did it.

Constructor for the main class:

class Brainfuck:
    def __init__(self, code, stdin=sys.stdin, stdout=sys.stdout):
        self.stdin = stdin # overriding stdin for testing purposes
        self.stdout = stdout # overriding stdout for testing
        self.code = code
        self.index_in_code = 0
        self.tape_size = 40
        self.tape = [0]*self.tape_size
        self.pointer = 0
        self.commands = {'+': self.plus, '-': self.minus, '<': self.left, 
                         '>': self.right, '[': self.start, ']': self.end, 
                         '.': self.out, ',': self.read}

By default, the interpreter will use standard in and out but for unit testing we wanted to be able to use io.StringIO objects. After passing the code in, setting the tape and pointer, all we need to do is list functions that handle different commands. Here are some of them (others are basically mirrored in logic and semantics):

def plus(self):
    self.tape[self.pointer] += 1

def left(self):
    if self.pointer == 0:
        self.pointer = self.tape_size-1
    else:
        self.pointer -= 1

def start(self):
    if self.tape[self.pointer] == 0:
        bracket_counter = 0
        i = self.index_in_code + 1
        while i < len(self.code):
            if bracket_counter == 0 and self.code[i] == ']':
                self.index_in_code = i
                break
            if self.code[i] == '[':
                bracket_counter += 1
            elif self.code[i] == ']':
                bracket_counter -= 1
            i += 1
        if i == self.tape_size:
            raise Exception('Unbalanced brackets')

Start function, which is responsible for the beginning of while loop, and its twin, end, are not particularly elegant. This is where Brian’s version was better. What we’re doing here is searching the code for matching bracket every time ‘[‘ or ‘]’ are encountered. What Brian did was to parse the code first and treat every while loop as a list of commands, potentially building a tree of nested loops and passing every node of that tree to interpreter to execute.

Here’s the last piece of the puzzle, execute function:

def execute(self):
    code = list(self.code)
    while self.index_in_code < len(self.code):
        command = code[self.index_in_code]
        if command in self.commands:
            self.commands[command]()
        self.index_in_code += 1

All in all, it’s a simple piece of code (I didn’t include unit testing though) but it does require an overhaul. I might get back to this and rewrite it.

Share →