Felix Ple┼čoianu, 2009-06-07 (DRAFT)

Make Your Own Programming Language
Part 4

This is the fourth in a 5-part tutorial on how to implement a programming language. It is intended for people with some programming experience, who want to know how their compiler, interpreter or virtual machine works. Hint: it's not magic.

In Part 3 you've learned how to extend Scratch from within, with user-defined words. The only thing left (and I'm sure you're wondering why I waited so long) are control structures.

Decisions, decisions...

The thing with stack-based languages is, they give you so much choice on how to achieve a particular result. We've seen two distinct ways of implementing variables, two implementations for user-defined words (and we'll discover a third in this installment), and we've barely scratched the surface of what is possible. Similarly, two main styles of control structures are possible.

First, there is the Forth way:

( if-then-else statement )
condition IF do when true ELSE do when false THEN

( while loop )
BEGIN condition WHILE do various things REPEAT

Leaving aside the backwardedness of the syntax (which is more than can be blamed on the reverse polish notation), implementing these words would involve manipulating the compilation buffer (which isn't even available outside of a word definition), with the help of a second stack and some really low-level words. But before we turn this tutorial into a full-blown compiler course, let's see how modern stack languages do it.

condition { do when true } { do when false } ifelse % Postscript

[ condition ] [ do when true ] [ do when false ] if ( Factor )
[condition] [do when true] [do when false] ifte ( Joy )

As you can see, they are all variations on a common theme, which can be summed up to "gather several words in a list, then interpret them as needed". Which, in turn, sounds a lot like what DEF does. With one little exception: lists must be able to contain other lists.

var ListWords = {
	"[": function (terp) {
		var list = [];
		var old_stack = terp.stack;
		terp.stack = list;
		
		do {
			var next_word = terp.lexer.nextWord();
			if (next_word == null) throw "Unexpected end of input";
			if (next_word == "]") break;
			
			next_word = terp.compile(next_word);
			if (next_word.immediate)
				terp.interpret(next_word);
			else
				terp.stack.push(next_word);
		} while (true);

		terp.stack = old_stack;
		terp.stack.push(list);
	}
};
ListWords["["].immediate = true;

Note how [ uses what is effectively a compilation buffer, like the one built into the interpreter, except it's a private one. Using Javascript's own stack saves us from the trouble of dealing with yet another data structure ourselves, which is just fine for our purposes.

Lists and lists

(With apologies to Andrew Plotkin.)

Now you can enter things like [ 1 2 3 ] and [ 1 2 3 [ 4 5 6 ] ] and check that the interpreter does the right thing. Except, of course, you'll have to use the browser's address bar for that, as there are no words in Scratch to actually manipulate lists. Fortunately, that's easily fixed.

ListWords["LENGTH"] = function (terp) {
        if (terp.stack.length < 1) throw "Not enough items on stack";
        var temp = terp.stack.pop();
        terp.stack.push(temp.length);
};
	
ListWords["ITEM"] = function (terp) {
        if (terp.stack.length < 2) throw "Not enough items on stack";
        var key = terp.stack.pop();
        var obj = terp.stack.pop();
        if (typeof obj == "object") {
                terp.stack.push(obj[key]);
        } else {
                throw "Object expected";
        }
}

Now you can easily check the length of — and extract items from — a list. Feel free to develop the concept.

[ 1 2 3 ] length                        // Yields 3
[ 1 2 [ 3 4 5 ] ] 2 item                // Yields 3, 4, 5
[ 1 2 [ 3 4 5 ] ] 2 item 2 item         // Yields 5

3 length                                // Yields undefined

(By the way, having a simplistic interpreter eases understanding, but there's a price: I always forget to type a space before and after square brackets. You may want to use begin/end keywords instead.)

Now we can finally do what we've set out to achieve in this installment.

Simple control stuctures

If you don't see the connection yet, try entering the following in the interpreter: [ 1 " hello" print ]. On my machine it yields the following: 1,hello,function (terp) { if (terp.stack.length < 1) throw "Not enough items on stack"; var tos = terp.stack.pop(); alert(tos); }. The dictionary word PRINT has been added to the list like an ordinary data item! Indeed, Scratch is what they call a homoiconic language. This helps a lot, because if code is data, then data is code, and we can run a list just as if it was a word.

var ControlWords = {
	"RUN": function (terp) {
		if (terp.stack.length < 1) throw "Not enough items on stack";
		var temp = terp.stack.pop();
		if (temp.constructor != Array) throw "List expected";
		
		terp.interpret(makeWord(temp));
	}
};

This one's straight out of Logo. It looks like a good fit, so let's continue in the same vein.

[ do some stuff ] 5 times
condition [ do some stuff ] iftrue
condition [ do different stuff ] iffalse
condition [ do some things ] [ do other things ] ifelse
[ condition ] [ repeat actions ] while

Look at the source code for this installment to see how they're implemented. While you're there, note how complex is WHILE. It's a consequence of trying to shoehorn in control structures from a different kind of language. But a better suited loop will require additional language support.

Language support for advanced features

The astute reader might have noticed I didn't bother to mention comparison and logical operations. They are, of course pretty much required for control structures to function, but they are also trivially implemented. Something we don't have — and can't have, with the current implementation — is a way to perform jumps. That's because the variable code_pointer is local to makeWord(). Can you think of a better place?

function makeWord(code) {
	return function (terp) {
		var old_pointer = terp.code_pointer;
		terp.code_pointer = 0;
		while (terp.code_pointer < code.length) {
			terp.interpret(code[terp.code_pointer]);
			terp.code_pointer ++;
		}
		terp.code_pointer = old_pointer;
	};
}

Now the code pointer lives in the interpreter, it's trivial to implement the traditional continue, as a word that sets it to Infinity. Throw in an extra flag, and you can have break as well. But wait, there's a trick to them. You can't just call them as you would in Javascript (condition [ break ] iftrue) because it would just leave the innermost list. Not exactly useful. What you need is a conditional version of each, that checks the value on top of the stack first. Let's call them ?break and ?continue. Add a looping construct that minds the aforementioned flag and you're good to go.

var ControlWords = {
        
        ...
        
	"?CONTINUE": function (terp) {
		if (terp.stack.length < 1) throw "Not enough items on stack";
		var cond = terp.stack.pop();
		if (cond) terp.code_pointer = Infinity;
	},
	
	"?BREAK": function (terp) {
		if (terp.stack.length < 1) throw "Not enough items on stack";
		var cond = terp.stack.pop();
		if (cond) {
			terp.code_pointer = Infinity;
			terp.break_state = true;
		}
	},
	
	"LOOP": function (terp) {
		if (terp.stack.length < 1) throw "Not enough items on stack";
		var code = terp.stack.pop();
		if (code.constructor != Array) throw "List expected";
		
		var code_word = makeWord(code);
		var old_break_state = terp.break_state;
		terp.break_state = false;
		do { code_word(terp); } while (!terp.break_state);
		terp.break_state = old_break_state;
	}
}

With these you can simulate any conditional loop. For example:

1 [ dup 3 > ?break dup print 1 + ] loop

A little cryptic, perhaps, but it works. Feel free to choose betters-sounding words.

There are many more possibilities, but I'm running out of easily explained techniques. Let's call it a day. In Part 5 we're going to try and draw some conclusions, as well as outline future directions.

Try Scratch
Stack content:

Creative Commons License
Make Your Own Programming Language by Felix Pleşoianu is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.